[ELI5] Why is SIMD still important to include in a CPU, when GPUs exist and serve the exact same purpose?

SIMD, to my understanding, is supposed to accelerate 3D processing and other multimedia processing. GPUs do the exact same thing, except they're dedicated to that role. I heard SIMD was an early attempt to make 3D processing possible on CPUs.

“GPU is a compute device which implements SIMD (Single Instruction, Multiple Data) in a much more multi-threaded fashion; in fact SIMD is coined as SIMT (Single Instruction, Multiple Thread) in a GPU. So basically a GPU is an extension of the SIMD paradigm with large-scale multi-threading, streaming memory and dynamic scheduling”

-Gary Cole, Quora

Unless I’m wrong and there are many other things SIMD is useful for besides multimedia, like helping with traditional CPU tasks such as integer operations, but I haven’t found any info on SIMD being used outside of multimedia. It could do operations related to audio, which is very handy, but couldn’t those instead be handled by a digital signal processor (DSP)?

My understanding of computer science is limited, so I expect to learn a lot from this post.

In: Technology

5 Answers

Anonymous 0 Comments

There is quite a bit of overhead in transferring work/data between the GPU and CPU. There are many cases where the overhead of setting up the GPU to do the work outweighs the savings, and this is even more likely if you need to get the results back to the CPU afterwards. Doing the work directly on the CPU bypasses this overhead; there may be some other minor overhead due to alignment or register requirements, but it is extremely limited. A lot of the GPU overhead comes from having to transfer the data across the bus to the GPU (PCIe), which is generally *much* slower than the CPU’s access to main memory. This means that you ideally want the bulk of the data to be transferred once and reused – think data like textures or meshes, which are loaded once and reused across many hundreds of frames.
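A minimal sketch of what that setup overhead looks like, assuming the CUDA runtime API and a hypothetical element-wise “scale” job. The kernel itself is omitted, so the GPU path below only shows the data movement; the CPU version of the same job is just a loop.

```cpp
// Sketch only: assumes a CUDA-capable GPU and the CUDA runtime (link with -lcudart).
// The kernel launch is omitted, so the GPU version here only demonstrates the
// extra allocate/copy-in/copy-out steps that the plain CPU loop never pays.
#include <cuda_runtime.h>
#include <vector>

void scale_on_gpu(std::vector<float>& data, float factor) {
    const size_t bytes = data.size() * sizeof(float);

    float* d_data = nullptr;
    cudaMalloc((void**)&d_data, bytes);                               // allocate GPU memory
    cudaMemcpy(d_data, data.data(), bytes, cudaMemcpyHostToDevice);   // copy across the bus

    (void)factor;  // would be passed to the kernel; kernel launch omitted in this sketch

    cudaMemcpy(data.data(), d_data, bytes, cudaMemcpyDeviceToHost);   // copy results back
    cudaFree(d_data);
}

void scale_on_cpu(std::vector<float>& data, float factor) {
    for (float& x : data) x *= factor;   // no transfer, no setup: just do the work
}
```

For a small or one-off job, the two `cudaMemcpy` round trips alone can easily cost more than the CPU loop, which is exactly the trade-off described above.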

Additionally, the GPU is optimized to process extremely large chunks of data identically. Most GPUs run big blocks of *identical* operations at once, typically 32* and sometimes more, where only the input values can change. That is, if you tell the GPU to run a multiply, it’s going to run 32 multiplies with 32 different inputs – even if you only have a single input. If you want to run 33 operations, it will end up running 64. When the number of operations doesn’t fill those blocks neatly, this wasted work can be a net negative. For comparison, a typical SIMD instruction on the CPU works on merely 4 inputs, though there is some variance.
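For the CPU side, here is a minimal sketch of what a 4-wide SIMD multiply looks like using x86 SSE intrinsics (x86-64 assumed; wider variants such as AVX handle 8 or 16 floats per instruction, and other architectures use different intrinsics, e.g. NEON on ARM):

```cpp
// Minimal sketch of a 4-wide CPU SIMD multiply using x86 SSE intrinsics.
#include <immintrin.h>

void multiply4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);      // load 4 floats from a
    __m128 vb = _mm_loadu_ps(b);      // load 4 floats from b
    __m128 vr = _mm_mul_ps(va, vb);   // one instruction: 4 multiplies at once
    _mm_storeu_ps(out, vr);           // store the 4 results
}
```

Four (or eight, or sixteen) at a time is tiny compared to a GPU block, but there is no setup and no wasted lanes when you only have a handful of values to process.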

The combination of these means there is still a lot of benefit to running smaller operations directly on the CPU. This even applies in cases where you need to do some moderate-sized bulk operations but would have to completely rearrange the data first, or where the data layout is complex. In most cases, the GPU overhead is only worthwhile for things such as physics processing, graphics processing, and *some* cases of encryption or compression. Most encryption and compression are better done on the CPU, due to how they use memory/data, but they still benefit from SIMD calculations.
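As a concrete example of SIMD helping with a very ordinary, non-multimedia task (the kind of inner loop parsers, compressors and string libraries rely on), here is a minimal sketch that scans a buffer for a byte 16 at a time using SSE2 intrinsics. `find_byte` is just an illustrative name, and `__builtin_ctz` assumes GCC or Clang:

```cpp
// Sketch: find the first occurrence of a byte, checking 16 bytes per step.
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

ptrdiff_t find_byte(const uint8_t* buf, size_t len, uint8_t target) {
    __m128i needle = _mm_set1_epi8((char)target);                    // 16 copies of the target
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i*)(buf + i));  // load 16 bytes
        __m128i eq    = _mm_cmpeq_epi8(chunk, needle);               // 16 comparisons at once
        int mask      = _mm_movemask_epi8(eq);                       // one bit per byte
        if (mask != 0)
            return (ptrdiff_t)(i + __builtin_ctz(mask));             // index of first match
    }
    for (; i < len; ++i)                                             // scalar tail
        if (buf[i] == target) return (ptrdiff_t)i;
    return -1;
}
```

Nothing about this is “multimedia”, yet it is exactly where CPU SIMD shines: the data is already in CPU memory, the chunks are small, and shipping it to a GPU would cost far more than the work itself.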

There has also been some recent movement towards supporting hardware-accelerated loading, which is mostly useful for game loading but can benefit other applications. The idea here is to provide methods by which compressed data can be read off disk straight into GPU memory and decompressed there, with the decompressed data written back into GPU memory, completely bypassing the bulk of the CPU overhead in the process.

TLDR: Good usage of the GPU requires more engineering work and applying specific design constraints. These are not always worth the costs involved, even if the calculation could benefit.

* The exact value depends on the exact GPU and configuration. A minimum block size of 32 is pretty common, while 1024 is often the highest supported size.
