I have a good understanding of what clock speed is, but why does it matter?
For the second question, I was wondering because, for example, the new i9-14900K has a base clock speed of 3.2 GHz, whereas my previous desktop CPU, the i7-4790K, had a base clock speed of 4.0 GHz. Why hasn’t this number steadily gone up through the years?
In: Technology
It’s very simple. If you want to add two numbers, it takes a certain number of actions (moving data from memory into registers, manipulating it, writing it back to memory). The more actions you can perform in a given time, the sooner your numbers will be added.
Think of it like a car being built on an assembly line. If you have 10,000 parts and you can install one every second, you’ll have the car done faster than the guy who takes two seconds per part.
In a computer, everything you ask the software to do requires millions of actions. The higher the clock speed, the faster those things happen.
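To put that idea in rough numbers, here is a quick back-of-the-envelope sketch in Python. The 3 million "actions" is a made-up workload; the 3.2 and 4.0 GHz figures are just the base clocks from the question:

```python
# If one "action" takes one clock tick, a higher clock finishes
# the same (made-up) workload sooner.
actions = 3_000_000  # hypothetical workload: 3 million actions

for ghz in (3.2, 4.0):
    ticks_per_second = ghz * 1e9          # clock ticks per second
    seconds = actions / ticks_per_second  # time to finish the workload
    print(f"{ghz} GHz: {seconds * 1e3:.3f} ms")
```

The real picture is more complicated (instructions can take several ticks, or several can finish per tick), but the proportionality is the point: more ticks per second, sooner done.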
We are approaching physical limits to how much we can increase clock speed.
A signal in a circuit can travel no faster than the speed of light. Imagine a signal has a distance of 10 atoms to travel in a circuit.
That is unrealistically, stupidly small. Inconceivably tiny.
Now imagine a circuit where the signal has 20 atoms to travel. That is still inconceivably small… but even at the speed of light, it takes *twice* as long.
Now multiply this concept by *billions* of microscopic components all working together to get from input to output.
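A quick calculation makes the distances concrete. This assumes signals travel at the full speed of light, which real signals in silicon do not reach, so these are generous upper bounds:

```python
# Distance light travels during one clock tick at a few frequencies.
C_LIGHT = 299_792_458  # speed of light in a vacuum, m/s

for ghz in (1, 3, 5):
    cycle_time = 1 / (ghz * 1e9)               # seconds per tick
    distance_mm = C_LIGHT * cycle_time * 1000  # millimetres per tick
    print(f"{ghz} GHz: ~{distance_mm:.0f} mm per tick")
```

At 5 GHz a signal gets at most about 6 cm of travel per tick, and real on-chip signals are considerably slower than light, which is why total path length matters so much.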
Ok with me so far? Now imagine you have those billion components in front of you (at a human scale, like a giant warehouse), along with billions of wires.
Your job is to interconnect all of these components (correctly, in the right order) with the shortest possible distance between all of them.
We are at a point where the components can’t really physically get smaller. All that’s left is to arrange them in the most efficient configuration, such that an electronic signal travels a shorter path overall.
This is no easy task, and is why that clock speed number isn’t growing quickly.
Clock speed matters in the sense that it is the only standardized way to compare speeds of processors. Because processors have their own architecture and instruction sets, the speed to complete any given task can vary from one processor to another. While one processor may complete a given task faster, it might complete a different task slower than another processor.
Clock speed – however – means the same thing on every processor. It is the maximum rate at which the processor can pulse. While this doesn’t translate directly into “speed” in terms of how fast tasks are completed, it’s useful because it’s a standardized measurement that doesn’t depend on anything else about the processor – so all processors are comparable in terms of how quickly they pulse.
Clock speeds haven’t gone up much for several reasons. Components have limits on how quickly they can transition between a 0 and a 1. The processor cannot pulse faster than this transition time, or the results become unpredictable (you can’t know whether the previous instruction completed before you try to use its results and perform the next instruction).
But even when the components do allow faster transitions, higher transition speeds result in more heat generation. Given the relatively small size of modern CPUs, there is only so much surface area to pull heat out of the CPU. The more heat generated within the CPU, the higher the risk that the heat cannot be removed and begins to build up. If heat builds up in the CPU, it can damage or destroy transistors, rendering the CPU unusable.
The clock is the pulse of the CPU. It triggers new instructions, and then allows the results of those instructions to be synchronized when they are finished, ready for the next instruction. Generally, instructions are a single clock cycle, but some complex instructions do take more.
The reason you need a clock is that the individual binary gates (AND, OR, NOT, etc.) that make up the CPU have a “propagation time”. So if you send a binary “1” into a sequence of five gates, it will produce a result faster than if you send it through a sequence of twenty gates. But you need all those results at the same time. At the end of each gate sequence is a latch that holds the result, and then the clock signal transition releases all those latches at the same time. So your maximum clock speed depends on the longest gate propagation time of a single-clock instruction.
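That latch-and-release idea can be sketched with made-up gate delays. The 20 ps per gate below is an assumption for illustration, not a real figure for any process:

```python
# The slowest gate chain sets the fastest safe clock: the clock
# period must be at least the longest propagation delay.
GATE_DELAY_PS = 20  # assumed delay of one gate, in picoseconds

for gates_in_path in (5, 20):
    path_delay_ps = gates_in_path * GATE_DELAY_PS
    max_clock_ghz = 1000 / path_delay_ps  # 1 / (delay in ps), expressed in GHz
    print(f"{gates_in_path} gates: {path_delay_ps} ps -> max ~{max_clock_ghz:.1f} GHz")
```

Shortening the longest path (fewer gates, or faster gates) is what lets the clock go up.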
But wait, there’s more. The clock has to be distributed to all the parts of the CPU, and it actually takes some time for that signal to propagate as well. The larger and more complex the physical CPU, the longer it takes for the clock to reach all parts. This is why CPUs using smaller features can run faster, while more complex CPUs that are physically larger run slower. Putting several cores on a die allows management of feature size, complexity, clock propagation, and performance.
There is a cost to keeping everything in step like this – heat. Heat is the result of gates changing state. So clock speed also manages heat in the CPU. This is why modern CPUs use boost clocks and throttling to manage the balance between performance and heat.
I know that ARM and others were looking at clock-less “asynchronous” CPUs at one stage. Here, everything just cascades and once all the latches are in a final state, the next instruction proceeds. Saves on power and reduces heat, but much harder to design and ensure that everything actually has completed before moving on. I suspect it has turned out to be a computational dead end.
The CPU’s timing comes from a small bit of quartz crystal (the crystal itself sits on the motherboard and feeds the CPU a reference signal). These crystals have a special property that allows them to vibrate at specific speeds when electricity is applied. Those vibrations can be used to control electricity passing through different parts of the CPU with extremely precise timing.
Let’s say you press the “K” button on your keyboard. The keyboard needs to send pulses of electricity to the CPU so that it can make the letter appear on your screen. These pulses have either “High” voltage or “Low” voltage. These highs and lows are often displayed in binary as 1s and 0s. The binary code for the letter K is “01001011”
So, imagine a single pin on the CPU is used to input keyboard commands. The CPU has to read the voltage on this pin 8 times to determine the letter. That’s where the quartz timing comes in. It controls how often the CPU will check this pin and read either a 0 or a 1 based on the voltage. Higher rates of pulsing allow the CPU to read all 8 values faster.
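That bit-by-bit read can be sketched in a few lines of Python, with one loop iteration standing in for one clock pulse:

```python
# Sample the "pin" once per clock tick and shift each bit into place.
bits = "01001011"  # the 8 voltage readings for the letter K

value = 0
for bit in bits:                 # one iteration = one clock pulse
    value = (value << 1) | int(bit)

print(value, chr(value))  # 75 K
```

A faster clock just means these samples happen closer together in time, so the whole letter arrives sooner.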
Each time the clock pulses, all the pins on the CPU are read, and electricity moves around inside the CPU through billions of little transistors. Electricity doesn’t like moving around and actively resists it. You can think of this like friction inside the wires. That friction causes everything to heat up, and too much heat will melt or burn out parts.
So to recap, the clock speed is important because it controls how frequently voltage values are sent to and received from the CPU, and therefore how quickly the CPU can perform tasks. It also controls how quickly heat is generated from moving those voltages.
Modern day CPUs have additional resources to allow them to perform certain tasks much more efficiently and can accomplish some tasks faster, even with a slower clock speed. In addition, modern CPUs are really multiple CPU cores combined into a single package. Each core can work on a different task simultaneously, making everything faster. It’s like cloning yourself 3 times. It’d only take you 2 hours to finish your work day instead of 8!
Because of these modern efficiencies, CPUs can operate with less electricity and slower clock speeds, and still complete tasks much faster than older models.
This isn’t really a 5-year-old answer but there’s an equation that describes dynamic power consumption in CMOS devices, which is one of the more popular processes for fabricating CPUs.
P = C * V^2 * f
P is power, C is switching capacitance, V is voltage, and f is frequency.
If you want to optimize power consumption, voltage matters a lot (it’s squared), and the operating frequency matters as well. C naturally gets smaller with more advanced process nodes. This is why IPC, or instructions per clock, is so important for efficiency: doing more work per cycle lets you hit the same performance at a lower frequency and voltage.
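Plugging made-up numbers into that equation shows why voltage dominates. The capacitance, voltage, and frequency values below are arbitrary, chosen only to show the ratios:

```python
# P = C * V^2 * f: halving voltage cuts dynamic power 4x,
# halving frequency only cuts it 2x.
def dynamic_power(c, v, f):
    return c * v**2 * f

base   = dynamic_power(1e-9, 1.2, 4e9)  # hypothetical: 1 nF, 1.2 V, 4 GHz
half_v = dynamic_power(1e-9, 0.6, 4e9)  # same chip at half the voltage
half_f = dynamic_power(1e-9, 1.2, 2e9)  # same chip at half the frequency

print(round(base / half_v, 6))  # 4.0
print(round(base / half_f, 6))  # 2.0
```

This is why chips that can get the same work done at a lower voltage and frequency come out so far ahead on power.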
1) You can do something every time the clock ticks. In fact, it’s the pulse of the “tick” that makes things happen. i.e., imagine pressing a button, and each press makes everybody in a factory do one thing… lift an arm, move one foot. So the more times and the faster you press, the more you get done.
2) Engineers have come up with ways to make CPUs do MORE THAN ONE THING per clock tick. A CPU from 10 years ago could maybe do 1,000 things simultaneously per tick; today one can do 5,000 things simultaneously per tick. Therefore you can use fewer clock ticks.
3) Why you want slower but more efficient CPUs: it takes more power and creates a lot more heat the faster you press the button. Being more efficient per clock tick and lowering the clock speed lets you stay inside the power/heat envelope. ALSO, the reason some CPUs are more efficient is that more “things” are packed in to handle more tasks, but that means they must be smaller too, which means you have to use a slower clock speed to avoid problems with the smaller components (transistors, actually). BUT since each component does more per clock tick, it works out better.
ELI5 Imagine a factory with 1000 robots in it. Each time you clap, the robots EACH do one thing. Say they are making toy cars. One robot grabs a wheel, one robot attaches a wheel, but it takes multiple movements to do each thing (move arm. Open Hand. Lower arm. Close hand on wheel. Lift arm. Move arm to car. Push wheel on car). So the more you clap, the more things get done. The faster you clap, the more claps you can get done in an hour, and therefore the more toy cars you can make.
Fast forward 10 years. Now you have 5000 robots in a factory. But you’ve also made things more efficient. Now when a robot picks up a wheel, another one picks up the car, then they move the car and the wheel together in one step, then they attach the wheel. Also, each robot now has three arms instead of two. With each clap, they can actually do three things.
CPUs are similar. Plus, part of the efficiency is in the programming too. For example, say your task is to make either a red car or a blue car depending on whether the customer is a boy or a girl. Normally you would first check whether the customer is a boy or a girl, then paint a car red or blue based on the answer. What you do instead is this: when a customer arrives, you make one red car and one blue car. When you find out whether the customer is a boy or a girl, you give them the correct car and throw the other one away. OR you keep the other car and immediately make another car of the opposite color, ready for the next customer. Now you are much more efficient, because you never have to wait to find out whether the customer is a boy or a girl before you start painting.
There are 3 different factors at play here.
1. Clock speed limit
1. CPUs can only go so fast because there is a propagation delay within the silicon and transistors as the signal moves across the chip. At multiple points in the chip, dedicated hardware has to boost the signals because they get too weak before they can reach the other side of the CPU. Modern CPUs run so fast that multiple parts of the chip execute 4-10 clock cycles behind the main clock signal, and even more so when you’re talking about external buses like PCIe.
2. IPC Improvements.
1. Each clock cycle just represents a step that the rest of the hardware can synchronize around as it pushes and moves data. So to add 2 numbers together, you would need to load number A into register 1, load number B into register 2, have the ALU add registers 1 & 2, and copy the ALU result into register 3. That is 4 clock cycles per add instruction, which is an IPC of 0.25.
1. Now optimize the add instruction by copying both numbers to the registers while the ALU is executing, and copying out the old result before it finishes. Now it gets done in 2 clock cycles, for an IPC of 0.5, a 100% increase in speed while remaining at the same clock speed.
3. Pipeline Improvements.
1. CPUs have gotten a lot more advanced due to many layers of pipelining to keep data flowing. Compared to RAM, CPUs are 2-3 orders of magnitude faster: the CPU can execute hundreds of instructions in the time it takes RAM to return a single piece of data. This can dramatically reduce speed if you’re unable to keep enough data fed to the CPU to stop it twiddling its thumbs.
1. Branch predictors, which guess the outcome of results the CPU is still waiting on and speculatively execute the future code, so the pipeline doesn’t sit idle.
2. Cache lines, which store frequently accessed data close to the CPU, often in multiple layers, with each layer larger but slower (L1-L3 is common, sometimes with an L4).
3. Pipelines, which let multiple parts of the CPU execute at the same time. If the ALU is busy doing some heavy long division, something like the branch predictor can come in and borrow the floating-point units to work ahead.
4. Wider/vector instructions, which operate on chunks of data instead of single values. Instead of adding 2 arrays together element by element, you can use a vector instruction that consumes a whole chunk of the array in a single instruction executed on dedicated silicon.
5. Multiple/mixed cores. Instead of 1 CPU doing 1 task at a time, we now have 16 cores doing 2 tasks each (hyper-threading), where some of those cores can be dedicated to lighter/lower-end work, leaving less context overhead on the larger cores.
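The IPC arithmetic from point 2 above can be checked with a few lines (the 1,000 adds is an arbitrary workload):

```python
# Cycle counts for the add-instruction example: 4 cycles per add
# naively, 2 cycles once loads/copies overlap with the ALU.
adds = 1000  # arbitrary number of add instructions

naive_cycles = adds * 4    # load A, load B, ALU add, copy result
overlap_cycles = adds * 2  # loads/copy overlap with the ALU work

print(adds / naive_cycles)    # IPC = 0.25
print(adds / overlap_cycles)  # IPC = 0.5
```

Same clock, same instructions, half the cycles: that is the kind of gain that has replaced raw clock-speed increases.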