Why does hyperthreading increase CPU performance?


I am not able to get this.

Let’s say that 1 CPU core takes 1 second to do 1 operation, and it requires full focus on that 1 task to finish it in 1 second. So, in this case, a 4-core CPU will do 4 operations per second. How does hyperthreading increase that to 5 or 6 tasks per second?

Where is the extra power coming from?

In: Technology

The idea is that not every task requires the full power of the CPU core to execute, so the CPU is sent two tasks to do simultaneously (which will use the full power) to speed up task execution.

As a crude metaphor, imagine you are putting boxes on a shelf. It takes you 3 seconds to pick up a box, put it on the shelf, and reach down for the next one. However, each box only weighs 10lbs and you can lift 20. You can speed things up by stacking two boxes one on top of the other, effectively _doubling_ your ability to put boxes on the shelf with the same movement.

The CPU isn’t a single unit; each core is actually made of multiple blocks.

Suppose I give you an array of 10 numbers and tell you: for each item in that array, first add 4, then multiply by 3. If we’re restricted to a single operation at a time, there’s a lot of wasted time, because the adder sits empty while you’re multiplying and the multiplier sits empty while you’re adding.

Superscalar architecture splits up the instructions to keep the blocks busier: while it’s multiplying for the first item it’s adding for the second, so nothing sits empty.
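As a toy sketch of that scheduling idea (not how real hardware is implemented): model the adder and multiplier as two blocks that can each fire once per time step, but only on different items.

```python
def serial_steps(n):
    # One block active per step: every add and every multiply
    # takes its own step, so n items need 2n steps.
    return 2 * n

def pipelined_steps(n):
    # Both blocks may fire in the same step, on different items:
    # the adder works on item i while the multiplier finishes item i-1.
    added, multiplied = set(), set()
    steps = 0
    while len(multiplied) < n:
        next_add = next((i for i in range(n) if i not in added), None)
        next_mul = next((i for i in sorted(added) if i not in multiplied), None)
        if next_add is not None:
            added.add(next_add)
        if next_mul is not None:
            multiplied.add(next_mul)
        steps += 1
    return steps

print(serial_steps(10), pipelined_steps(10))  # 20 vs 11
```

With 10 items, strictly serial execution takes 20 steps, but overlapping the two blocks finishes in 11: only the first and last steps have an idle block.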

But in real programs you often won’t be filling all the blocks all the time. Hyperthreading lets two different programs run on the core at the same time, each with its own set of registers so their work doesn’t get jumbled. This does an even better job of ensuring that all of the blocks are doing something as much as possible.

Consider cooking a large meal with a meat, a sauce, and a pasta. Do you cook the meat in a pan and finish it, then the sauce in another pan, then the pasta? Or do you cook all three on the stove at the same time? You already have spare pans and burners; what you need is the capacity to keep track of multiple things at once. That’s what hyperthreading gives the CPU.
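You can actually see this from software: the OS treats each hardware thread as a separate logical CPU, so a 4-core chip with hyperthreading typically shows up as 8 CPUs. A quick Python check (the `/proc/cpuinfo` parsing is Linux-specific, and the exact counts depend on your machine):

```python
import os

print("logical CPUs:", os.cpu_count())

# On Linux, physical cores can be counted as unique
# (physical id, core id) pairs from /proc/cpuinfo; with
# hyperthreading each pair usually backs 2 logical CPUs.
try:
    pairs, phys = set(), None
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("physical id"):
                phys = line.split(":", 1)[1].strip()
            elif line.startswith("core id"):
                pairs.add((phys, line.split(":", 1)[1].strip()))
    if pairs:
        print("physical cores:", len(pairs))
except OSError:
    pass  # /proc/cpuinfo only exists on Linux
```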

A CPU core has many parts. One accesses memory, one performs simple arithmetic (addition, subtraction, etc), multiplication is usually a separate piece, floating point is separate, encryption is separate, etc.

Hyperthreading takes advantage of this by trying to run two jobs at the same time and having them share the parts of the CPU. If one wants to execute a multiplication instruction and the other wants to decrypt data, both can run at the same time. It’s only when they want to use the same part that they have to take turns. So hyperthreading isn’t a doubling of speed, because of these situations where they bump into each other, but it can really help.

It’s like having two cooks in a kitchen. Though there’s still only one sink, one oven, and one cutting board, you can still get dinner prepared faster by having another person working. Not double speed, but a lot better.
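A toy scheduler makes that “take turns” cost concrete. This is not a real pipeline model; the unit names are made up, and the rule is simply one operation per unit per cycle:

```python
from collections import deque

def cycles_to_finish(stream_a, stream_b):
    # Each op names the unit it needs (e.g. "MUL", "DEC").
    # Per cycle, a unit serves at most one thread; a thread
    # whose next op needs a busy unit waits a cycle.
    streams = [deque(stream_a), deque(stream_b)]
    cycles = 0
    while any(streams):
        busy = set()
        for s in streams:
            if s and s[0] not in busy:
                busy.add(s.popleft())
        cycles += 1
    return cycles

print(cycles_to_finish(["MUL"] * 4, ["DEC"] * 4))  # 4: no conflicts
print(cycles_to_finish(["MUL"] * 4, ["MUL"] * 4))  # 8: always colliding
```

When the two streams need different units, they finish in the same time one alone would; when they fight over one unit, the total is the same as running them back to back. Real workloads land somewhere in between.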

>it requires full focus on that 1 task to finish it in 1 second

This just isn’t true of modern CPUs.

Modern CPU design involves the use of what are known as “execution units.” The CPU decodes each instruction and dispatches it to an execution unit. While that instruction is being executed, which might take a few cycles, the CPU starts handling other instructions, using some complex logic to make sure the new instructions aren’t dependent on the old ones.

So integer addition is performed by the integer unit, memory loads and stores by the load/store unit, floating point additions by the floating point unit, and so on. And each core has multiple of these units, so that it can do, say, several floating point additions at the same time.

So what simultaneous multithreading (hyperthreading in Intel-speak) does is give each core two of these “instruction decode, dispatch, and register file” front ends attached to one “execution engine.” The two threads share the same execution engine, and if some units would otherwise sit idle (say one thread isn’t doing floating point additions and the other is), everything works out: there are free execution units for floating point additions, so the two threads can share the same core.

The processor is very fast: it can do hundreds, even *thousands*, of operations in the time it waits for data to arrive from the memory subsystem. So it’s very possible to improve efficiency by doing calculations on other threads that already have their data loaded into the processor’s local cache.

There’s a lot of waiting in a computer. In the time it takes you to type one letter, the processor can do a hundred million calculations, and hard disks, printers, the internet, etc. aren’t very fast either (they’re faster than human typing speed, but still very slow in terms of processor speed).

So basically the architecture is set up with several layers of cache: memory banks inside and near the processor, each smaller in capacity but faster than the last, holding temporary chunks of data for the processor to work on while it waits for everything else.

Let’s say thread 1 is chugging along at 4 operations per second, but it comes across an operation it doesn’t immediately have the data for. It has to wait for that data to be fetched from RAM, which, at this scaled-down speed, might stall the processor for ten seconds or so.

Hyperthreading lets the processor switch to a second thread while the first one is waiting on the memory access, keeping utilization higher.
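That switching happens in hardware, but the same “hide the wait” idea shows up in software too. A rough Python analogy, with `time.sleep` standing in for a memory stall: two stalled tasks run in the time of one, because the waits overlap.

```python
import threading
import time

def stalled_task():
    time.sleep(0.3)  # stands in for waiting on a fetch from RAM

start = time.perf_counter()
workers = [threading.Thread(target=stalled_task) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
elapsed = time.perf_counter() - start
print(f"two waits overlapped: {elapsed:.2f}s")  # ~0.3s, not 0.6s
```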