Why can’t we just pack more and more ALUs into a CPU to increase processing throughput instead of increasing clock speeds? Wouldn’t the gain be just as significant?

Anonymous

That is done, and it plays a large role in recent performance increases.

For example, since the original Pentium (earlier if you count exotic processors), desktop CPUs have been able to execute more than one instruction per cycle, which normally requires more than one ALU. That is not as effective as a 2x clock speed increase would be, because it has some limitations. For example, if you had `a + b + c`, then at 2x speed you could compute it in half the time. With 2 simultaneous instructions you get no speedup, because the second addition needs the result of the first, so the second instruction has to wait for that result.
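To make the `a + b + c` point concrete, here is a rough sketch of a toy CPU model in Python. Everything in it is made up for illustration (`cycles_needed`, the one-cycle latency, the in-order issue rule); real processors are far more complicated, but the dependency effect is the same.

```python
def cycles_needed(instructions, width):
    """Count cycles to run `instructions` on a toy CPU that can start up to
    `width` instructions per cycle (hypothetical model, not a real CPU).

    Each instruction is (dest, sources) and takes one cycle. It may start
    only once all of its sources are ready.
    """
    ready_at = {}  # register name -> first cycle in which its value is usable
    pending = list(instructions)
    cycle = 0
    while pending:
        issued = 0
        # In-order issue: stop at the first instruction that cannot start yet.
        while pending and issued < width:
            dest, sources = pending[0]
            if any(ready_at.get(s, 0) > cycle for s in sources):
                break
            ready_at[dest] = cycle + 1  # result is usable from the next cycle
            pending.pop(0)
            issued += 1
        cycle += 1
    return cycle


# a + b + c: the second add needs the result of the first.
dependent = [("t", ("a", "b")), ("r", ("t", "c"))]
# a + b and c + d: the two adds have nothing to do with each other.
independent = [("x", ("a", "b")), ("y", ("c", "d"))]

print(cycles_needed(dependent, 1), cycles_needed(dependent, 2))      # 2 2
print(cycles_needed(independent, 1), cycles_needed(independent, 2))  # 2 1
```

In this model the second ALU only helps the independent pair. The dependent chain takes 2 cycles no matter how wide the CPU is, whereas doubling the clock speed would halve the time for both.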

As another example, you can add special instructions that do a lot of work in one go, such as “add these 4 numbers to these other 4 numbers”. Multiple ALUs make operations like that fast. This is what SIMD is. It has further limitations on top of only working for independent operations. E.g. maybe you can do 4 additions, or 4 multiplications, but you can’t mix and match, and if you only needed 3 additions then the 4th still happens but its result is wasted. As an analogy, it’s like having a bunch of clones of you to help out who exactly copy your movements: you can’t use them to do different chores at the same time, and they don’t take out the trash any faster than you would on your own, but they would be great for raking the lawn.
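The wasted 4th addition can be sketched the same way. This is plain Python pretending to be SIMD, not real vector instructions; `LANES`, `simd_add` and `add_arrays` are names invented for the example, and the width of 4 is just an assumption to match the numbers above.

```python
LANES = 4  # assumed vector width for this toy model


def simd_add(xs, ys):
    """One pretend SIMD instruction: the same operation (add) in every lane."""
    assert len(xs) == LANES and len(ys) == LANES
    return [x + y for x, y in zip(xs, ys)]


def add_arrays(a, b):
    """Add two equal-length lists with as few simd_add calls as possible.

    Returns (results, instructions_used, wasted_lanes).
    """
    results, instructions = [], 0
    for i in range(0, len(a), LANES):
        chunk_a, chunk_b = a[i:i + LANES], b[i:i + LANES]
        used = len(chunk_a)
        pad = [0] * (LANES - used)  # unused lanes still compute 0 + 0
        results += simd_add(chunk_a + pad, chunk_b + pad)[:used]
        instructions += 1
    wasted = instructions * LANES - len(a)
    return results, instructions, wasted


print(add_arrays([1, 2, 3], [10, 20, 30]))
# ([11, 22, 33], 1, 1)  -> one instruction, one lane did useless work
```

Eight additions would take 2 instructions with no waste, but 3 additions still cost a full instruction, and there is no way to make one lane multiply while the others add.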
