Why do programs have to be manually optimized for multiple CPU cores? Why is single-core performance such a bottleneck?


For a long time, single-core performance has been the most important feature for gaming. Though we are getting better multi-threaded games, we are still pushing for maximum single-core performance instead of more cores. Why can't 16 × 2 GHz cores do roughly as good a job as 8 × 4 GHz cores (of the same CPU architecture), especially in gaming?

They say that software programmers have to manually split the work across multiple cores. Why? Why does the OS even need to manage multiple cores in the first place? To me this sounds like bad workplace management, where the results depend on pushing the limits of the same few people (cores) instead of splitting up the work. I feel like a big bunch of cheap cores would give better performance for the money than tons of research into the best possible elite cores. This works for encoding jobs, but not for snappy game performance.

Now, one limitation that comes to mind is sequential jobs: things where the steps need to be done in a certain order, each depending on the results of the previous step. In that case a higher clock speed has the advantage, and you wouldn't even be able to utilize multiple cores. Still, I feel like a clock speed of 4,000,000,000 cycles per second can't be the limiting factor for running a game at over 150 frames per second. Or is it? Are the CPU jobs in game programming really that sequential? Is there any way to speed up simple sequential jobs with the help of more cores?
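To put rough numbers on this "sequential part" intuition, here is a minimal Python sketch of Amdahl's law. The 90% parallel fraction is a made-up assumption for illustration, not a measurement from any real game:

```python
# Amdahl's law: the speedup from n cores when only a fraction p of the
# work can be parallelized; the (1 - p) serial part never gets faster.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Hypothetical frame where 90% of the work parallelizes:
print(round(speedup(0.9, 8), 2))   # → 4.71 (8 cores)
print(round(speedup(0.9, 16), 2))  # → 6.4  (16 cores)
```

Doubling the core count from 8 to 16 only moves the speedup from about 4.7× to 6.4×, while doubling the clock speeds up everything, serial part included. That is one reason 8 × 4 GHz can beat 16 × 2 GHz even when most of the work is parallel.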

Bonus question: How do multiple instructions per cycle work if a job is sequential?

Bonus question 2: GPUs have tons of low power cores. Why is this okay? Is it just because the job of rendering is not sequential at all?

In: Engineering

6 Answers

Anonymous

>Bonus question 2: GPUs have tons of low power cores. Why is this okay? Is it just because the job of rendering is not sequential at all?

This shouldn’t be surprising. Much of the work that can be easily parallelized already runs on the GPU, and as GPUs become more general-purpose they will absorb even more parallelizable work beyond rendering. So what’s left for the CPU is precisely the part that is hard or impossible to parallelize — which already answers part of the question.

>Bonus question: How do multiple instructions per cycle work if a job is sequential?

Even in a mostly sequential program there are often small operations whose order doesn’t matter, so there’s potential for a degree of micro-parallelism. For example, with x = 2 + 2, y = 4 + 5, z = x + y, you need both x and y before you can compute z, but x and y themselves can be computed in parallel within a single core.
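The dependency chain above can be sketched with a toy scheduler. The `deps` table and `cycle_of` helper are hypothetical, just to show how a superscalar core finds instructions it can issue in the same cycle:

```python
# Each op lists the ops whose results it needs, mirroring
# x = 2 + 2; y = 4 + 5; z = x + y.
deps = {"x": [], "y": [], "z": ["x", "y"]}

def cycle_of(op):
    # An op with no dependencies can issue immediately (cycle 0);
    # otherwise it issues one cycle after its slowest dependency.
    if not deps[op]:
        return 0
    return 1 + max(cycle_of(d) for d in deps[op])

print({op: cycle_of(op) for op in deps})  # → {'x': 0, 'y': 0, 'z': 1}
```

Three operations finish in two cycles instead of three, because x and y issue together — that is multiple instructions per cycle despite the program looking sequential.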

But ironically, when you want to share variables between threads you may lose some of these optimizations in order to keep the data consistent between them. Sharing data makes multithreading harder, but it can’t always be avoided. With a single thread, the number of possible execution orders is limited, and under that assumption modern compilers can rearrange and optimize the code to make it perform better; with multiple threads, the developers must be very specific about how the threads share data and interact with each other.
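As a minimal sketch of what "being very specific about sharing data" means in practice, here is a hypothetical Python example where a lock guards a counter shared by four threads:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # counter += 1 is a read-modify-write; without the lock, two
        # threads can read the same old value and one update gets lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # → 400000
```

The lock forces the increments into a strict order, which is exactly the kind of serialization (and lost compiler/CPU reordering freedom) the answer is describing.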
