Why can’t we just pack more and more ALUs in a CPU to increase processing throughput instead of increasing clock speeds? Wouldn’t the gain be just as significant?

Anonymous 0 Comments

A 5-year-old doesn’t know what an ALU is, but here’s an ELI5 explanation: I want you to color this picture, and we are going to time how long it takes. Now I want you to color this picture, but use both hands. Why didn’t you get it done in half the time? Sometimes the two spots you want to color are close together, and your hands are trying to take up the same space. Sometimes they are far apart, and you can’t look at both of them at the same time. Sometimes all the parts that are left need to be colored blue, and you only have one blue crayon. And all the time, it’s very challenging for your brain to move two crayons at once and stay inside the lines.

A more technical explanation: It sounds like you are familiar with a basic CPU model with one register file, one arithmetic-logic unit, one load-store unit, and some miscellaneous parts. No modern processor that I am familiar with is like that; they all have multiple copies of these parts so they can work in parallel. But however many ALUs a processor has, it could always have more, right? So let’s talk about why we hit diminishing returns rather quickly.

Another commenter already explained the concept of data dependence, but it is important, so I’m going to cover it too. If your first instruction is “c = a + b” and your second instruction is “d = c - 4”, you can’t start working on the second instruction until you know the answer to the first. Now you are only using one of your ALUs at a time. Maybe your instructions can be re-ordered so that some later instruction “e = f + g” can be done at the same time as “c = a + b”. In fact, modern processors all do this. But the hardware needed to work out which instructions can start when, and how to forward results between instructions, is very big and complicated, and big and complicated electronics are also (relatively) slow, power-hungry, and heat-producing.
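To make the dependence problem concrete, here is a toy Python sketch (not how any real CPU scheduler works). It assumes every instruction takes exactly one cycle, that each result is written only once, and that the processor can pick any ready instruction from the whole program. The function name `cycles_needed` and the tuple format are made up for this illustration.

```python
def cycles_needed(program, num_alus):
    """Count cycles to run `program` on a toy CPU with `num_alus` ALUs.

    `program` is a list of (dest, sources) tuples in program order.
    Assumptions of this sketch: one cycle per instruction, each dest is
    written once, and any ready instruction may be issued out of order.
    """
    producers = {}  # variable name -> index of the instruction that computes it
    deps = []       # deps[i] = set of earlier instructions that i must wait for
    for idx, (dest, sources) in enumerate(program):
        deps.append({producers[s] for s in sources if s in producers})
        producers[dest] = idx

    done = set()
    cycles = 0
    while len(done) < len(program):
        # An instruction is ready once everything it depends on has finished.
        ready = [i for i in range(len(program))
                 if i not in done and deps[i] <= done]
        done.update(ready[:num_alus])  # at most one instruction per ALU
        cycles += 1
    return cycles


# The three instructions from the paragraph above.
example = [
    ("c", ("a", "b")),  # c = a + b
    ("d", ("c",)),      # d = c - 4   (must wait for c)
    ("e", ("f", "g")),  # e = f + g   (independent)
]

# A chain where every instruction needs the previous result.
chain = [
    ("c", ("a", "b")),
    ("d", ("c",)),
    ("e", ("d",)),
    ("f", ("e",)),
]

print(cycles_needed(example, 1))  # 3
print(cycles_needed(example, 2))  # 2
print(cycles_needed(example, 4))  # 2
print(cycles_needed(chain, 4))    # 4
```

In this model, going from two ALUs to four buys nothing for the three-instruction example, and the fully dependent chain takes four cycles no matter how many ALUs are available.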

Even if we didn’t have the headache of trying to figure out how to use all of our hardware as much as possible without getting invalid results, just the fact that the processor has more parts is already a problem. More total parts means a longer distance between the two parts that are furthest apart, which means that it takes longer for a signal to get from one to the other, which forces your clock to slow down. As mentioned earlier, more parts also means more energy consumed, which means more heat produced. And heat is a big problem, because you don’t want your CPU to melt.
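For a rough sense of scale, here is a back-of-the-envelope sketch of how far a signal could travel in one clock cycle if it moved at the speed of light in a vacuum. That is only an upper bound: as far as I know, real on-chip signals are much slower because of the resistance and capacitance of the wires. The function name is made up for this illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def max_distance_per_cycle_cm(clock_hz):
    """Upper bound on the distance (in cm) a signal can cover in one clock cycle."""
    cycle_time_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * cycle_time_s * 100.0


for ghz in (1, 2, 4):
    distance = max_distance_per_cycle_cm(ghz * 1e9)
    print(f"{ghz} GHz: at most {distance:.1f} cm per cycle")
```

At 1 GHz the bound is about 30 cm, and at 4 GHz it is about 7.5 cm. Since actual signals on a chip fall well short of that bound, the physical size of the chip really does eat into how fast you can run the clock.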

So the trend over the last decade or more has been neither toward higher clock speeds nor toward more complex processors, but instead toward a larger number of relatively slow, relatively simple processors all working on independent things. Going back to the ELI5 explanation, you are giving another page and another box of crayons to a friend. Since the two of you are working on independent problems using independent tools, you now have double the coloring speed, whereas trying to use both hands yourself was probably even slower than using one hand.
