I have a good understanding of what clock speed is, but why does it matter?
As a second question: the new i9-14900K, for example, has a base clock speed of 3.2 GHz, whereas my previous desktop CPU, the i7-4790K, had a base clock speed of 4.0 GHz. Why hasn't this number steadily gone up through the years?
There are 3 different factors at play here.
1. Clock speed limit
1. CPUs can only go so fast because there is a propagation delay within the silicon and transistors as a signal moves across the chip. At multiple points in the chip, dedicated hardware has to boost the signals because they get too weak before they can reach the other side of the CPU. Modern CPUs run so fast that parts of the chip are executing 4-10 clock cycles behind the main clock signal, and even more so when you're talking about external buses like PCIe.
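To put rough numbers on that propagation-delay limit, here is a back-of-the-envelope calculation. The figures (a 5 GHz clock, signals moving at about half the speed of light) are illustrative assumptions, not measurements of any specific chip:

```python
# Rough arithmetic on the propagation-delay limit. All figures are
# illustrative assumptions, not measurements of any particular CPU.

clock_hz = 5e9                  # assume a 5 GHz clock
cycle_s = 1 / clock_hz          # one clock period = 0.2 ns
light_mps = 3e8                 # speed of light in a vacuum, m/s
signal_mps = 0.5 * light_mps    # on-chip signals travel well below c; 0.5c is a guess

distance_mm = signal_mps * cycle_s * 1000
print(f"One cycle at 5 GHz lasts {cycle_s * 1e9:.1f} ns")
print(f"A signal covers roughly {distance_mm:.0f} mm in that time")
# A desktop die is on the order of 20 mm across, so there is barely time for
# a signal to cross it once per cycle -- which is why repeaters are needed
# and why distant parts of the chip run several cycles behind.
```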
2. IPC Improvements.
1. Each clock cycle just represents a step that the rest of the hardware can synchronize around as it pushes and moves data. So to add 2 numbers together, you would need to load number A into register 1, load number B into register 2, have the ALU add registers 1 & 2 together, and copy the ALU result into register 3. That is 4 clock cycles per add instruction, which is an IPC (instructions per cycle) of 0.25.
1. Now optimize the add instruction by copying both numbers into the registers while the ALU is executing, and copying out the old result before it finishes. Now it gets done in 2 clock cycles for an IPC of 0.5, i.e. twice the speed while remaining at the same clock speed.
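The cycle counting above can be sketched as a toy model. This is purely illustrative bookkeeping, not how a real pipeline is modelled:

```python
# Toy cycle-accounting for the add example: 4 cycles per add when every step
# runs in sequence, 2 cycles once the loads and the result copy overlap the
# ALU. Purely illustrative; real pipelines are far more involved.

def unpipelined_cycles(num_adds):
    # load A + load B + ALU add + copy result = 4 sequential cycles per add
    return 4 * num_adds

def pipelined_cycles(num_adds):
    # the next add's loads and the previous add's copy-out overlap the ALU,
    # so each add effectively costs 2 cycles
    return 2 * num_adds

adds = 1000
print(unpipelined_cycles(adds), "cycles ->", adds / unpipelined_cycles(adds), "IPC")
print(pipelined_cycles(adds), "cycles ->", adds / pipelined_cycles(adds), "IPC")
```

The same instruction stream finishes in half the cycles, which is exactly a doubling of IPC with no change in clock speed.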
3. Pipeline Improvements.
1. CPUs have gotten a lot more advanced, with many layers of pipelining to keep data flowing. Compared to main RAM, a CPU core is hundreds of times faster: it can execute hundreds of instructions in the time it takes RAM to return a single piece of data. This can dramatically reduce speed if you're unable to keep enough data fed to the CPU to prevent it from twiddling its thumbs.
1. Branch predictors, which guess which way upcoming branches will go and speculatively execute that code while the CPU is still waiting on the actual results, so the pipeline doesn't sit idle.
2. Caches, which store frequently accessed data close to the CPU, usually in multiple levels with each level larger but slower than the last (L1-L3 is common, sometimes with an L4).
3. Superscalar and out-of-order execution, which lets multiple parts of the CPU work at the same time. If the ALU is busy grinding through a long division, other units such as the floating-point hardware can work ahead on later instructions.
4. Wider/vector instructions, which operate on chunks of data instead of single values. Instead of adding two arrays together element by element, a vector instruction can consume a whole chunk of each array in a single instruction executed on dedicated silicon.
5. Multiple/mixed cores. Instead of one CPU doing one task at a time, we now have 16 cores each running 2 tasks at once (hyper-threading), where some of those cores can be dedicated to lighter/lower-end work, leaving less context-switching overhead on the larger cores.
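The multi-core idea in point 5 can be sketched in a few lines: split one big job into chunks and hand each chunk to a worker. This uses Python's standard `concurrent.futures` with threads because that is self-contained and safe to run anywhere; for genuinely parallel CPU-bound Python you would want processes instead (the GIL prevents threads from running Python bytecode in parallel), so treat this as the shape of the technique, not a benchmark:

```python
# Sketch of splitting one job across several workers. The worker count of 4
# is an arbitrary choice for illustration.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # each worker handles one independent slice of the data
    return sum(chunk)

def parallel_sum(data, workers=4):
    # carve the data into one chunk per worker
    step = (len(data) + workers - 1) // workers
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # sum the partial results as workers finish their chunks
        return sum(pool.map(partial_sum, chunks))

data = list(range(1_000_000))
print(parallel_sum(data) == sum(data))  # both give the same total
```

The chunking is the important part: each worker's slice is independent, so no worker ever has to wait on another's result.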