CPUs use a few fast cores and are much better at complex linear tasks, while GPUs use many weak cores and are better at parallel tasks. To use an analogy, the CPU does the hard math problems and the GPU does many, many easy problems all at once. Together they can tackle any task quickly and efficiently.
A typical CPU these days will have something like 8 cores/16 threads, meaning it can do up to 16 things at once. Each core is very powerful and designed to be general-purpose, so it can handle a wide range of work. The tasks best done on a CPU are serial ones, where each step has to finish before the next can start because its result feeds into the next step.
A typical GPU may have something like 2304 stream processors, meaning that it can do up to 2304 things at once, but what each stream processor can do is much more limited. What a GPU is most suited for is doing math on a big grid of numbers. With a CPU, you'd have to calculate those numbers 16 at a time (actually fewer, because the CPU has other things to do), but with a GPU you can do math on them 2304 at a time.
But it turns out that graphics are pretty much nothing more than a big grid of numbers representing the pixels. And a lot of scientific calculation involves doing math on huge grids of numbers.
GPUs are good at solving a lot of simple problems at once. A good example is graphics: I need to take every pixel (and there are a million of them!) and multiply each of them by 0.5. Anything you can convert into adding or multiplying big groups of numbers together, a GPU can do really fast, and that is exactly what rendering graphics needs. But they can't do all operations. They are very specialized: working with a big list of numbers is about all a GPU can really do, and it only supports a handful of operations on those numbers. If the operation isn't supported, you're basically out of luck. Luckily the things it can do are common ones, and they overlap with what artificial intelligence and physics simulation need as well. But it doesn't do well with instructions that involve a lot of decisions; a GPU wants to work on a whole list of things at once.
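For the curious, here's roughly what that "multiply every pixel by 0.5" job looks like on each kind of processor. This is just a sketch with made-up names, assuming the image is a single grayscale channel stored as floats, written as CUDA:

```cuda
#include <cuda_runtime.h>

// CPU version: one core walks the whole image, one pixel at a time.
void darken_cpu(float* pixels, int n) {
    for (int i = 0; i < n; ++i)
        pixels[i] *= 0.5f;
}

// GPU version: each thread handles exactly one pixel, and thousands of
// threads run at the same time.
__global__ void darken_gpu(float* pixels, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        pixels[i] *= 0.5f;
}

// Hypothetical launch for a ~1 megapixel image, assuming d_pixels already
// lives in GPU memory:
//   int n = 1024 * 1024;
//   darken_gpu<<<(n + 255) / 256, 256>>>(d_pixels, n);
```

Same math either way; the difference is how many pixels get worked on per clock tick.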
CPUs are good at doing lots of different types of tasks quickly; they're a jack of all trades. A CPU can work with big lists of numbers too, it's just slower at it. But it can do all sorts of things the GPU can't. CPUs are good at following instructions that involve a lot of decisions. Everything from making the keyboard work with the computer to talking to the internet requires a lot of decision making, and with that ability to make decisions you can come up with some kind of solution to any problem.
A CPU has a few cores clocked very high. The Ryzen 7 3700X is a pretty mainstream CPU with 8 cores.
A GPU these days has a few thousand cores clocked low. A Radeon RX 5700 XT has 2560 cores. That's 320 times the core count of one of the most popular desktop CPUs.
This difference in clock speed comes down to many things, but mostly power consumption and heat. Double something's clock speed and its power usage *more* than doubles, because higher clocks generally need higher voltage and dynamic power scales with frequency times the voltage squared. (This is why downclocking a video card just a little bit can save a lot of power for a small loss in performance.)
In addition to the core count, the underlying architecture of a GPU and a CPU is different. Keep in mind, a video card is basically a mini computer on a card: it has its own processor (the chip we call the GPU) and its own RAM.
* GPUs are very efficient at one particular operation: multiply-add, which is *very* common in 3D rendering. They can take three sets of 4 numbers, multiply the first two together, then add the result to the third. CPUs are capable of this too, but it's almost cute given the difference in core count. (There's a rough code sketch of this a little further down.)
* The bigger difference comes in how a video card can use its local memory vs a CPU using system memory. System RAM traditionally (DDR4, these days) is built to be accessed in lots and lots of small chunks. One number here, four numbers there, two numbers yonder. It is low latency but relatively low bandwidth (not a lot of data at once but a very small delay). A GPU’s RAM (GDDR6, most recently) is high latency but much higher bandwidth (a shitload of data but often a large delay).
This difference in architecture means that the two can serve polar opposite functions. A CPU can process a long string of calculations with data coming from all over RAM very quickly, but don’t ask it to do too much at one time. A GPU can process a shitload of calculations all at the same time but don’t ask it to access lots of different bits of RAM.
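To make the multiply-add point above concrete, here's a hedged sketch (invented names, arrays assumed to already be in GPU memory) of "take three sets of 4 numbers, multiply the first two, add the third" as a CUDA kernel; `fmaf` is the fused multiply-add the hardware is built around:

```cuda
#include <cuda_runtime.h>

// out = a * b + c, done on 4-number bundles (float4), one bundle per thread.
// The card runs thousands of these threads at the same time.
__global__ void multiply_add(const float4* a, const float4* b,
                             const float4* c, float4* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float4 va = a[i], vb = b[i], vc = c[i];
        out[i] = make_float4(fmaf(va.x, vb.x, vc.x),
                             fmaf(va.y, vb.y, vc.y),
                             fmaf(va.z, vb.z, vc.z),
                             fmaf(va.w, vb.w, vc.w));
    }
}
```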
And finally, one of the shitty parts about how computers are built is that the CPU controls data going in and out of the GPU. This communication can be slow as shit. See: the purpose of DirectX12/Vulkan over DirectX11/OpenGL.
A CPU can do a few things quickly, and a GPU can do a lot of things slowly.
Imagine you have to get from New York to California and back as fast as you can. You can take any car you want, but only you are allowed to drive. You’d get the fastest sports car you could, and drive it as fast as you can. But if you had to take 30 people, you’d want to take one trip with a bus instead of 30 trips with the sports car.
CPUs and GPUs are the same idea. When you make a picture for a game or video, each pixel can be computed without worrying about the other pixels – so you have a few million pieces of math that have to be done, and it's better to do them slowly but in big batches than quickly but one at a time.
(ELI25 notes)
There are also some fundamental differences in the memory model and instruction sets between CPUs and GPUs. GPUs are designed to perform operations important to graphics programming quickly – for example, trigonometric functions that take many cycles on a CPU typically complete in just a few GPU cycles (often only one). GPUs also have many areas of memory with different sharing characteristics, while CPUs generally just have RAM and varying levels of cache.
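As a small illustration of both points (again just a sketch, with made-up names): CUDA exposes `__sinf`, a hardware sine approximation handled by the GPU's special-function units, and `__shared__` memory, a little scratchpad that the threads of one block share, which has no direct CPU equivalent:

```cuda
#include <cuda_runtime.h>

// Take the sine of every input value, then sum each block's results using
// the block's shared scratchpad memory. Assumes blocks of 256 threads.
__global__ void sine_then_block_sum(const float* in, float* block_sums, int n) {
    __shared__ float scratch[256];                  // per-block shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    scratch[threadIdx.x] = (i < n) ? __sinf(in[i]) : 0.0f;  // hardware sine
    __syncthreads();                                // wait for the whole block

    // Simple tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        block_sums[blockIdx.x] = scratch[0];
}
```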
In addition to what others have said, CPUs are good at things like:
– Compare the coordinates of the bullet object and the opponent object.
– If they are the same, then:
* Read the score stored at a certain location in memory.
* Add 10 to it.
* Write the number back to the memory location where the score is stored.
* Look up the memory location where the start of the “show opponent dying animation” routine is stored.
* Remember what part of the program we’re currently at.
* Temporarily go to the “dying animation” part of the program we found earlier.
And so on, and so on, and so on. CPUs are really, *really* good at doing relatively complicated steps like each of the above. But because each step might have lots of nitty gritty details, they take a lot of work for the CPU to actually do them. (Read about [instruction pipelining](https://en.wikipedia.org/wiki/Instruction_pipelining) if you want to go down the rabbit hole of how complicated a modern CPU actually is behind the scenes).
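If you want to see what that kind of branchy, jump-around work looks like as ordinary CPU code, here's a hedged sketch; every name in it (bullet, opponent, score, play_dying_animation) is invented for illustration:

```cuda
// Plain host (CPU) code: comparisons, a branch, a memory read/modify/write,
// and a jump into another routine and back. All names are made up.
struct Position { int x, y; };

void play_dying_animation();  // defined elsewhere in this hypothetical game

void check_hit(Position bullet, Position opponent, int& score) {
    // Compare the coordinates of the bullet and the opponent.
    if (bullet.x == opponent.x && bullet.y == opponent.y) {
        score += 10;              // read the score, add 10, write it back
        play_dying_animation();   // temporarily jump to another routine
    }                             // ...and come back to where we were
}
```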
GPUs can’t do anything nearly that complicated. Their “programs” are more like:
– Find the chunk of memory starting at a particular location.
– Add 3 to the first 1,000 numbers you find there.
Or:
– Here’s a list of 10,000,000 decimal numbers, like 2.3 and 4.7. Add each pair of numbers and divide them by 2, and put the results in another list. Oh, and if it lets you go a little faster to pretend that 2.3 is really 2.2999999987, go for it: raw speed is more important than perfect math here.
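Here's roughly what that second "program" might look like as a CUDA kernel. This is just a sketch, assuming the 10,000,000 numbers are already sitting in the card's memory; the "pretend 2.3 is really 2.2999999987" part corresponds to building with nvcc's `--use_fast_math` option, which trades a little precision for speed:

```cuda
#include <cuda_runtime.h>

// Average each adjacent pair of numbers: out[i] = (in[2i] + in[2i+1]) / 2.
// One thread per pair; thousands of pairs get averaged at the same time.
__global__ void average_pairs(const float* in, float* out, int n_pairs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs)
        out[i] = (in[2 * i] + in[2 * i + 1]) * 0.5f;
}
```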
They can’t do things like make complicated decisions or jump around to another part of their programming; they don’t have the circuitry for that stuff. But those simple little instructions I described? They’re smoking fast at those, and at doing *a whole awful lot* of them at the same time. A CPU can do all the same things a GPU can, but it doesn’t have the circuitry for the “do this one thing a gazillion times” kind of operations.
Or TL;DR:
– A CPU is like having a mathematician sitting at her desk solving hard problems.
– A GPU is like having a thousand kindergartners counting to 10 on their fingers, but all at the same time.
Imagine a CPU like a sports car moving at 100kph. It holds 2 people and gets them to point B very very quickly.
Now imagine a GPU like a big ass bus moving at 10 kph. It can hold 50 people. But gets them to point B very slowly.
Basically, a CPU does a few things fast, and a GPU does many things at once at the cost of individual speed.
A CPU is a general-purpose calculator. It is excellent at nothing, but also bad at nothing.
A GPU is a specialised calculator. It is excellent at graphics work, and bad as a general-purpose calculator.
The reason is simple: graphics work is a set of instructions that repeats itself a lot, so it is worth combining many standard instructions into a single one and super-optimising that function. Since the function will only ever be used in this context, its flexibility can be sacrificed for a gain in speed.
As an admittedly rough example, it’s like having to calculate the volume of a polyhedron. The CPU would do it the hard way, like you would by hand. But the GPU would have a “gimme the 3D coordinates and I will tell you the volume” function. So you throw the 3D points at the GPU, it uses its super-optimised function (maybe even getting help from some lookup tables), and it returns the result in a fraction of the time it would normally take a CPU.
Also, a CPU will have a few cores, while a GPU these days often has several thousand. GPU cores are slower, so you have to split the problem into many small pieces, which is fine for a 3D image: it’s full of polygons, so you just send a few thousand at a time to be processed. A CPU may finish each one faster, but it can’t compete with the other’s thousands of cores.
Another thing a GPU is good at: sorting big batches of data. Feed it a list, and a sorted one comes back. A CPU can sort too, of course, but GPU libraries provide massively parallel sorting routines, because a GPU deals with that kind of bulk data. A lot.
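A sketch of what that looks like in practice, using Thrust (the parallel-algorithms library that ships with CUDA); note this is a heavily optimized parallel routine running on the GPU's cores, not a magic "sort" instruction baked into the silicon:

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <cstdlib>

int main() {
    // Fill a big list with random numbers on the CPU side...
    thrust::host_vector<float> h(10000000);
    for (size_t i = 0; i < h.size(); ++i)
        h[i] = static_cast<float>(std::rand());

    // ...copy it to the card, and let the GPU sort the whole thing in parallel.
    thrust::device_vector<float> d = h;
    thrust::sort(d.begin(), d.end());
    return 0;
}
```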
But… thousands of slow cores also means that each individual result takes longer to come out. For a single, simple task, the CPU will most likely do it faster: its single-core performance is higher for general-purpose use, sometimes by a big margin! However, if you have thousands of repetitive tasks that can be done in parallel, then the GPU will probably beat it.
CPUs “waste” silicon trying to predict the future (branch prediction), to remember the past (caches), and to have their different cores agree with each other (coherency protocols).
A GPU is the dumb but effective approach: everybody does the exact same thing, on data that sit right next to each other. The cores can’t do anything else, they can’t wait, they don’t “think”, they don’t talk to their neighbors, they just do.