As an Electrical Engineering major, I have taken several introductory computer engineering courses. I have studied ARMv8 for a long time now. I know about registers, instruction fetch, arithmetic instructions, branching instructions, pipelining, and data forwarding. I know that some of these are specific to ARM only. ARMv8 is the only architecture that I know.
However, I am curious to know what exactly cores and threads are. And specifically for cores, how are instructions distributed to each core? And if a dependency exists from one core to another core’s instructions, is there such a thing as data forwarding from one core to another?
Lastly, kinda unrelated, but what is a Graphics Card and what are the differences between GPU architecture and the ARMv8 architecture that I have studied?
If someone could please answer these three questions, I would greatly appreciate it.
A CPU with multiple cores is, effectively, multiple CPUs all living on the same chip that can all function more or less independently from one another. If you have a quad-core PC, what you really have, in essence, is a PC with 4 separate processors that can multitask 4 different things at the same time. There’s some nuance involved with how the cores share memory and caches, but that’s the high-level overview of it.
A thread is essentially “a task that needs to be done”. When you run a program on a typical computer, that program will have at least one thread created that represents everything the program needs the CPU to do. In general, one CPU core can be working on one thread at any given time. So a quad-core computer can, in essence, be thinking about 4 separate threads at any given time.
Some programs can even be *multithreaded*, spawning several independent threads that each handle a different part of the program’s logic that can be run independently. A common use case for this would be creating one thread to run the graphical UI logic, and another one to handle reading and writing data. Splitting the two up like this means that even if the reading and writing part hits a snag, the UI will never hang, because the threads are separate. Or, if you’re doing something really calculation intensive like rendering video, the work can be divided into as many threads as you have cores and your CPU can divide and conquer the workload.
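To make the divide-and-conquer case concrete, here’s a minimal Rust sketch (my own example; the 4-way split and the sum-of-squares task are invented for illustration): four worker threads each handle a quarter of the range, and the main thread adds up the partial answers.

```rust
use std::thread;

fn main() {
    // Split one big job (summing squares of 0..1_000_000) into 4 chunks,
    // one per core on a hypothetical quad-core machine.
    let handles: Vec<_> = (0..4u64)
        .map(|i| {
            let range = (i * 250_000)..((i + 1) * 250_000);
            // Each spawned thread can run on its own core.
            thread::spawn(move || range.map(|n| n * n).sum::<u64>())
        })
        .collect();

    // join() waits for a worker to finish and hands back its partial sum.
    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("sum of squares = {total}");
}
```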
GPUs are special kinds of processors that are designed for a very specific kind of job. A typical CPU will be very flexible, and can do just about anything you want it to do, but it has to process instructions one at a time. A GPU, by contrast, is only able to do really simple calculations, but it’s able to process hundreds of thousands of calculations all in one go.
The main task of most GPUs is rendering display graphics (hence the name, Graphics Processing Unit). Graphics calculations aren’t hard, but you generally need to be able to calculate the color and lighting values for every single pixel on a screen. On a massive 8K screen, that’s a LOT of pixels to consider. And the refresh rate might be 144 times *every second*. A CPU is never going to be fast enough to do all of that calculation, but a strong GPU can crank out those really simple calculations in big batches.
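To put rough numbers on that: an 8K display is 7680 × 4320 ≈ 33 million pixels, so refreshing all of them 144 times a second means roughly 33,177,600 × 144 ≈ 4.8 billion pixel values to compute every second.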
If you’re still confused about GPUs, Adam and Jamie of Mythbusters put together [a pretty elegant demonstration of the difference between a CPU and a GPU](https://www.youtube.com/watch?v=-P28LKWTzrI) that I think will make this click for you.
Cores are just CPUs. One day CPUs got so small they could pack more than one CPU into the same chip.
Threads are when one CPU pretends to be two CPUs. Why? Well, the CPU has a bunch of circuits in it to calculate different kinds of instructions, like addition, subtraction, multiplication, division, etc. And there’s a part (the front end) that reads the instructions in the program and feeds them to those different circuits (execution units). If the numbers for one calculation depend on the answers from another calculation that’s still being calculated, the front end has to wait before it sends that calculation to its execution unit. One day the engineers at Intel realized they had so many different execution units that it was basically impossible for one program to use them all at once – so they figured they could use more of them at once by making the CPU (core) run two programs at once, and those are threads.
Cores are real CPUs – a 4-core CPU can run 4 things at full speed at the same time. Threads are fake – a core *can’t* run two things at full speed at the same time, *but* it can usually run them at more than 50% speed, so it’s still an improvement.
Edit: this is the hardware kind of threads. Software threads / operating system threads are a different thing (but related)
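One way to see the hardware kind from software: Rust’s standard library can ask the OS how many hardware threads it will run at once, which on an SMT (“hyperthreaded”) machine is usually double the physical core count. A minimal sketch, my own example:

```rust
use std::thread;

fn main() {
    // available_parallelism() reports how many threads the OS will run
    // simultaneously. On an SMT machine this is usually twice the
    // number of physical cores.
    match thread::available_parallelism() {
        Ok(n) => println!("logical CPUs (hardware threads): {n}"),
        Err(e) => println!("could not query parallelism: {e}"),
    }
}
```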
I’ll go a bit more complex than ELI5 since you seem to know quite a bit.
> what exactly are cores
TLDR: Cores are multiple CPUs on the same chip.
From about 1970 to 2005, technology kept improving, and we were able to make the transistors inside a chip smaller and faster every year. For example, a CPU from 1990 ran at about 10 MHz, while a CPU from 2000 ran at about 1000 MHz.
Then around 2005 or so, we started to hit physical limits of speed, but not size. Transistors kept getting smaller, but didn’t get much faster. So they started putting multiple CPUs on a single chip.
Then the marketing and distribution people said, “Wait a minute. You’re telling me you’re putting multiple CPUs on a CPU? How does that even make any sense?”
After a bit of confusion they realized that it was a language problem. To an engineer, a CPU is “a thing that executes programs.” To a distributor, a CPU is “a physical chip we can put in a box and sell to customers.”
So they solved the vocabulary problem by deciding CPU == chip, and core == thing that executes programs.
Sometimes you still see this vocabulary issue pop up. For example, some OSes or programs might report “4 CPUs” when running on a system that has a single 4-core CPU.
> threads
A CPU runs a list of instructions. To run the list of instructions, it needs to keep track of:
– (a) What is the next instruction to execute (instruction pointer / program counter)
– (b) Temporary storage (general purpose registers / stack)
A thread is basically an independent copy of (a) and (b).
Threads are mostly a software concept implemented by operating system software. But the OS does use some hardware features: e.g., it uses a timer interrupt to stop running the current thread and jump to OS code that saves the current thread’s registers, figures out which thread to switch to, restores the registers of the new thread, and then jumps to the new thread. (The OS logic for figuring out which thread to execute next is called the *scheduler*.)
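Here’s a toy Rust model of those two pieces of state plus a round-robin scheduler tick (my own sketch; the struct, field names, and addresses are invented for illustration – a real OS saves and restores actual CPU registers in assembly):

```rust
// A toy model of what the OS keeps per thread, and what a timer-driven
// round-robin scheduler does. Plain data stands in for real registers.
struct ThreadContext {
    program_counter: u64,  // (a) next instruction to execute
    registers: [u64; 4],   // (b) temporary storage (simplified)
}

fn main() {
    let mut threads = vec![
        ThreadContext { program_counter: 0x1000, registers: [0; 4] },
        ThreadContext { program_counter: 0x2000, registers: [0; 4] },
        ThreadContext { program_counter: 0x3000, registers: [0; 4] },
    ];
    let mut current = 0;

    // Pretend the timer interrupt fired 5 times.
    for tick in 0..5 {
        // "Run" the current thread a little: it advances its PC.
        threads[current].program_counter += 4;
        // Scheduler: the struct already holds the saved state, so just
        // pick the next thread round-robin and "restore" it.
        current = (current + 1) % threads.len();
        println!("tick {tick}: switching to thread {current} at pc={:#x}",
                 threads[current].program_counter);
    }
}
```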
If you run multiple programs, each one runs in a different thread. But a single program can also have multiple threads.
Each core can execute only one thread at a time. So on a 4-core system, at most 4 threads can execute simultaneously.
A typical modern PC will have 100 threads or more. This seemingly far exceeds the capability of a typical processor, but it works out for a couple reasons:
– The OS scheduler frequently switches threads. So for 0.01 seconds it executes threads A, B, C, D, then after 0.01 seconds it switches to threads E, F, G, H, then after 0.01 seconds it switches to threads I, J, K, L, and so on…
– A lot of programs use threads that spend most of their time waiting for something to happen (timer, I/O, or something happens in another thread). The OS usually removes these threads from consideration by the scheduler until the thing they’re waiting for happens.
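To see that second point in action, here’s a tiny Rust sketch (my own example): 100 threads that spend nearly all their lifetime blocked, which a handful of cores handles easily because the scheduler simply doesn’t run a sleeping thread until its timer expires.

```rust
use std::{thread, time::Duration};

fn main() {
    // Spawn far more threads than any CPU has cores. Each one spends
    // almost all its time blocked in sleep(), so it costs essentially
    // no CPU time while it waits.
    let handles: Vec<_> = (0..100)
        .map(|i| {
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(500));
                i // each thread "wakes up" and returns its id
            })
        })
        .collect();

    let finished = handles.into_iter()
        .filter_map(|h| h.join().ok())
        .count();
    println!("{finished} threads ran to completion on a handful of cores");
}
```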
> how are instructions distributed to each core?
Each core has its own instruction pointer (program counter). Each core talks to memory over the bus to fetch instructions and data. Typically all the cores in a physical chip share one set of bus lines and the outermost level(s) of the CPU cache (L2 / L3 cache).
> if a dependency exists from one core to another core’s instructions, is there such a thing as data forwarding from one core to another?
Each core has its own general purpose registers. So core A has no way to access registers on core B.
On the other hand, in theory core A and core B share memory. *But* they have their own L1 caches. And there are multi-CPU systems that have multiple CPU chips in their own motherboard sockets; those CPU’s don’t even share L2 / L3 caches.
So if you want to share memory between threads, software needs to use special instructions to make sure the memory’s adequately synchronized.
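In Rust, those special instructions are reached through the atomic types. Here’s a minimal sketch (my own example, not from the answer above) where four threads bump a shared counter; on ARMv8, the `fetch_add` below compiles down to the architecture’s atomic read-modify-write instructions (e.g. LDADD, or an LDXR/STXR loop on older cores):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// A counter shared by all threads. AtomicU64 makes the compiler emit
// the CPU's atomic instructions so increments from different cores
// can't be lost.
static COUNTER: AtomicU64 = AtomicU64::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1_000_000 {
                    // A plain `counter += 1` would be a separate load,
                    // add, and store; two cores could both read the same
                    // old value and one increment would vanish.
                    COUNTER.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    // Always prints 4000000; a non-atomic version usually wouldn't.
    println!("{}", COUNTER.load(Ordering::Relaxed));
}
```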
(Multithreaded programming has a reputation among software people for being difficult and mind-bending, partly because there are a lot of possibilities for subtle bugs that are hard to test for or reproduce: they depend on the exact timing and sequencing of how different threads interact.
The best way to handle this is to design the program, or even the [entire programming language](https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.html), from the beginning to only use provably safe inter-thread communication patterns.
For example, well-designed multithreaded software often doesn’t directly share memory; instead it passes messages through queues and lets the queue library handle low-level issues like memory correctness.)
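A minimal Rust sketch of that message-passing pattern, using the standard library’s `mpsc` channel (the worker and the messages are invented for illustration):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // A channel is a thread-safe queue: the library handles all the
    // low-level memory synchronization internally.
    let (tx, rx) = mpsc::channel();

    let worker = thread::spawn(move || {
        for n in 0..5 {
            // Send a message instead of writing to shared memory.
            tx.send(format!("result {n}")).unwrap();
        }
        // tx is dropped here, which closes the channel.
    });

    // The receiving loop ends automatically when the sender is dropped.
    for msg in rx {
        println!("main thread got: {msg}");
    }
    worker.join().unwrap();
}
```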