If 64-bit CPUs all run on a 64-bit x86 architecture, how are they getting faster?



So my question is this, if all modern day computer CPUs (central processing units) more or less have the same or similar clockspeeds from generation to generation as of late, and all run on a 64 bit architecture with an x86 instruction set, what about the processor is being altered or changed to yield performance gains from one gen to the next?

I understand that more cache is available, and transistor sizes are getting smaller and smaller, but at the end of the day, it is still running an x86 instruction set at 64 bits.

So last gen's 4-core 4.5 GHz processor should theoretically run exactly as quickly as a current-gen 4-core 4.5 GHz processor, but that is not the case.

In: Technology

The biggest problem with making current CPUs go faster is feeding them data to work on so they aren’t ever sitting idle. It doesn’t matter if they crunch the numbers in 1 five-billionth of a second if they don’t have the numbers on hand to crunch.

A lot of the Instructions Per Clock (IPC) gains in recent years have come from always having data for the CPU to work on. One common technique that modern CPUs have is that when a program is waiting on something to happen that requires a decision to be made, the CPU will guess what the outcome will be and start executing instructions based upon that guess. This is called branch prediction. The more accurately a CPU guesses the correct branch, and the faster it can recover from a wrong guess, the more data it can process.
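To make the guessing idea concrete, here is a minimal sketch of one textbook scheme, a 2-bit saturating counter. It is not how any particular CPU does it (real predictors are far more elaborate), and the function name and the loop pattern are made up for illustration.

```python
def predict_branches(outcomes):
    """Return how many branch outcomes a 2-bit saturating counter guesses right.

    `outcomes` is a list of booleans: True means the branch was taken.
    """
    counter = 2  # 0-1 predict "not taken", 2-3 predict "taken"
    correct = 0
    for taken in outcomes:
        guess = counter >= 2
        if guess == taken:
            correct += 1
        # Nudge the counter toward what actually happened, staying within 0..3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct


# Hypothetical loop branch: taken 9 times, then falls through once, 10 times over.
loop_pattern = ([True] * 9 + [False]) * 10
print(predict_branches(loop_pattern), "of", len(loop_pattern), "guessed correctly")
```

The only wrong guesses in this pattern are the loop exits, which is why very regular code tends to be cheap to predict.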

Also, as you alluded to, caches are really important. Having reliable, quick access to the data means that the CPU spends less time waiting for data to be transferred. One of the big ways that the most recent generation of AMD processors sped up IPC was by making it so that more cores could access the same pool of cache memory. That way there is a higher likelihood that the data a core needs is already in cache instead of having to be loaded from RAM.
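A rough way to see why cache size matters is to count hits in a toy cache. This sketch assumes a simple least-recently-used policy and a made-up access pattern; real caches work on lines and sets, so treat the numbers as illustrative only.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_lines):
    """Fraction of accesses served from a toy LRU cache holding `cache_lines` entries."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(addresses)


# Hypothetical program that loops over the same 8 memory locations 100 times.
accesses = list(range(8)) * 100
print("4-entry cache:", hit_rate(accesses, cache_lines=4))
print("8-entry cache:", hit_rate(accesses, cache_lines=8))
```

In this particular pattern the smaller cache keeps evicting data just before it is needed again, while the larger one misses only on the first pass.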

There are a lot of other behind-the-scenes performance improvements that CPUs can implement that don’t involve changing the x64 instruction set.

Even if the operations are the same, you can implement them with different levels of efficiency. E.g., take a shift operation. That is, we take a bunch of bits: `00110010` and shift them 1 position to the left: `01100100`. Well, we can do that in different ways. For instance, we could do it digit by digit:

1. Digit 7 moves to digit 8.
2. Digit 6 moves to digit 7
3. Digit 5 moves to digit 6
4. Digit 4 moves to digit 5
5. Digit 3 moves to digit 4
6. Digit 2 moves to digit 3
7. Digit 1 moves to digit 2
8. Fill empty digit 1 with a 0

And that takes us 8 steps. So let’s say 8 clock cycles.

Instead of that you could have 8 copies of the “move digit circuitry”, and perform all of these steps at the same time. Now it takes 1 cycle, but it’s more complex and takes more room. We have 8 times the speed, but it also takes at least 8 times more room.
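The two approaches can be mimicked in software. This is only an analogy, since real hardware does this with wires and gates rather than loops, and both function names are invented for the example.

```python
def shift_left_serial(bits):
    """Shift a list of bits (leftmost first) left by one, moving one digit per step."""
    bits = list(bits)
    steps = 0
    for i in range(len(bits) - 1):
        bits[i] = bits[i + 1]  # each digit takes the value of its right-hand neighbour
        steps += 1
    bits[-1] = 0  # fill the emptied rightmost digit with a 0
    steps += 1
    return bits, steps


def shift_left_parallel(bits):
    """Same result, modelled as every digit moving at once in a single step."""
    return list(bits[1:]) + [0], 1


value = [0, 0, 1, 1, 0, 0, 1, 0]  # 00110010
print(shift_left_serial(value))
print(shift_left_parallel(value))
```

Both return the same bits; only the step count differs, which is the whole point of spending extra circuitry.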

One thing to add to this answer is that the semiconductor processes used to manufacture these things have gotten a lot better at making really tiny transistors. So N times reduction in feature size leads to an N^2 reduction in area. Now, for the same amount of area, you can play around with chip architectures that increase throughput, such as parallelism. So now we can fit even more cores in a CPU than we could twenty or even ten years ago. So with more cores, we can execute more instructions in the same clock cycle.

The change from 32 to 64 bits did not speed up most programs, because it is relatively rare for a program to need a lot of operations on integers larger than about 4 billion.
Running 64-bit operations can in fact be slower than 32-bit ones if the extra bits are not needed, because you need to transfer more data from RAM and the bandwidth is limited. The wider values also limit the number of variables you can fit in the cache.
So even in a 64-bit program, a lot of the integer operations still use 32-bit operations because they are faster.
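A quick back-of-the-envelope check of the memory cost, using Python's `struct` module with fixed standard sizes. The 32 KiB cache size is just an assumed round number for the example, not a claim about any specific chip.

```python
import struct

count = 1_000_000  # hypothetical number of integers a program works on

# "<i" is a 32-bit integer and "<q" a 64-bit integer in struct's standard sizes.
bytes_32 = struct.calcsize("<i") * count
bytes_64 = struct.calcsize("<q") * count

cache_bytes = 32 * 1024  # assumed cache size, for illustration only
fit_32 = cache_bytes // struct.calcsize("<i")
fit_64 = cache_bytes // struct.calcsize("<q")

print(bytes_32, "bytes as 32-bit values,", bytes_64, "bytes as 64-bit values")
print(fit_32, "32-bit values fit in the cache, but only", fit_64, "64-bit values")
```

Twice the bytes per value means twice the traffic from RAM and half as many values in the same cache.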

The change from 32 to 64 bits is primarily about memory, because now the OS can handle more than 4 gigabytes of RAM. On a 32-bit system that limit in practice results in around 3 gigabytes of usable RAM. For x86, the change to 64-bit also added some other stuff, like more registers, and that did increase the speed.
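The 4 gigabyte figure falls straight out of the address width, as this small calculation shows. Note that real 64-bit chips wire up fewer than 64 address bits, so the second number is a theoretical ceiling rather than something you can install.

```python
GIB = 2**30  # bytes in one gibibyte

addressable_32 = 2**32  # distinct byte addresses with a 32-bit pointer
addressable_64 = 2**64  # distinct byte addresses with a 64-bit pointer

print(addressable_32 // GIB, "GiB reachable with 32-bit addresses")
print(addressable_64 // GIB, "GiB reachable with 64-bit addresses (theoretical)")
```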


The clock frequency is also not a good measurement, because it only tells you how fast the clock in the CPU is. It does not tell you how many cycles an operation takes. It is not the case that a CPU performs 4.5 billion operations per second just because it runs at 4.5 GHz.
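Put as arithmetic, throughput is roughly clock frequency times the average number of instructions finished per cycle (IPC). The IPC values below are invented purely to show the effect and do not describe any real processor.

```python
def instructions_per_second(clock_hz, ipc):
    """Rough throughput: cycles per second times instructions completed per cycle."""
    return clock_hz * ipc


clock_hz = 4_500_000_000  # both hypothetical chips run at 4.5 GHz

older_chip = instructions_per_second(clock_hz, ipc=1.0)  # assumed IPC
newer_chip = instructions_per_second(clock_hz, ipc=1.5)  # assumed IPC

print("speedup at the same clock:", newer_chip / older_chip)
```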

The Pentium 4 CPU had an architecture optimized for high clock speed. It divided each instruction into up to 31 steps. One instruction could start each cycle, so up to 31 instructions were running at the same time, at different stages of completion.
So it is like having an assembly line where you move stuff from station to station. It takes time to move stuff between stations. The Pentium 4 got to a high clock frequency because the amount of work at each station was small, but a lot of each cycle's time was used just to end one step and start the next.

It is also a problem if there is a step where the next instruction depends on the output of a previous one. The CPU either has to wait for that instruction to complete, which takes up to 31 cycles, or it has to guess. If it guesses correctly it has already done some useful work, but if it guesses incorrectly it needs to restart. The impact of an incorrect guess is a lot larger if you split the work up into more steps, so the Core architecture that followed got better performance at half the clock frequency.
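Here is a very crude cycle-count model of that trade-off. It assumes every wrong guess costs one full pipeline refill, which is a simplification, and the instruction count, misprediction count and the 14-stage comparison pipeline are illustrative numbers. It only counts cycles, so it says nothing about the other reasons the later design was faster.

```python
def cycles_needed(instructions, mispredictions, pipeline_depth):
    """Toy model: fill the pipeline once, finish one instruction per cycle after
    that, and pay a full refill of `pipeline_depth` cycles per wrong guess."""
    fill = pipeline_depth
    steady_state = instructions - 1
    penalty = mispredictions * pipeline_depth
    return fill + steady_state + penalty


deep = cycles_needed(instructions=1000, mispredictions=50, pipeline_depth=31)
shallow = cycles_needed(instructions=1000, mispredictions=50, pipeline_depth=14)

print("31-stage pipeline:", deep, "cycles")
print("14-stage pipeline:", shallow, "cycles")
```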


A CPU today takes instructions and breaks them apart into micro-operations. It then checks whether they depend on each other. If instructions are independent, they are executed at the same time. The CPU has multiple execution units that can each do a calculation at the same time.
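The dependency check can be sketched as a tiny scheduler. This is a toy model under the assumption that every operation takes one cycle, and the operation names and the `schedule` function are made up for the example.

```python
def schedule(ops, units):
    """Group operations into cycles.

    `ops` maps an operation name to the set of names it depends on.
    Each cycle issues at most `units` operations whose inputs are all finished.
    """
    done = set()
    remaining = dict(ops)
    cycles = []
    while remaining:
        ready = [name for name, deps in remaining.items() if deps <= done]
        issued = ready[:units]
        if not issued:
            raise ValueError("circular dependency between operations")
        cycles.append(issued)
        done.update(issued)
        for name in issued:
            del remaining[name]
    return cycles


# a, b and c are independent; d needs the results of a and b.
ops = {"a": set(), "b": set(), "c": set(), "d": {"a", "b"}}
print(schedule(ops, units=4))
print(schedule(ops, units=1))
```

With several execution units the independent operations share a cycle, while the dependent one still has to wait for its inputs.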

You can look at AMD's [Zen architecture overview](https://cdn.wccftech.com/wp-content/uploads/2016/08/AMD-Zen-CPU-Architecture-7.png). It can decode 4 instructions each cycle. It can then send out 6 micro-ops to the part of the CPU that executes them.
It has 4 ALUs, which are the parts that do integer math, and two AGUs (address generation units) that handle the transfers to and from memory.

To that, you should add that CPUs today have multiple cores, so there are effectively multiple CPUs on the same chip.
Today you typically have at least 4 cores, which is like having four Pentium 4 CPUs, because those only had a single core.

So there are a lot of improvements that have been made, and most of them are about doing multiple instructions at the same time.

One CPU model can be faster than another, while having the same architecture and running the same program at the same clock frequency, thanks to several clever tricks:

1. Running several instructions at the same time.
2. Using statistics to predict the outcome of troublesome ‘branch’ instructions that would otherwise stall execution.
3. Introducing more cores, effectively allowing several programs (or many copies of the same program) to run simultaneously, side by side.
4. Having larger on-die memory caches for instructions and intermediate calculations.
5. Having a faster memory bus to allow for faster memory access.