Why do programs have to be manually optimized for multiple CPU cores? Why is single-core performance such a bottleneck?


For a long time, single-core performance has been the most important CPU spec for gaming. Even though multi-threaded games are getting better, we are still chasing maximum single-core performance instead of more cores. Why can't 16 cores at 2 GHz do roughly as good a job as 8 cores at 4 GHz (on the same CPU architecture), especially in gaming?

They say that software programmers have to manually split the work across multiple cores. Why? Why can't the OS just spread the work over the cores by itself? To me this sounds like bad workplace management, where results depend on pushing the limits of the same few people (cores) instead of dividing the work up. I feel like a big bunch of cheap cores should give better performance for the money than tons of research into the best possible elite cores. That works for encoding jobs, but not for snappy game performance.

Now, one limitation that comes to mind is sequential jobs: tasks where the steps have to happen in a certain order because each step depends on the result of the previous one. There, a higher clock speed wins and you couldn't even use multiple cores. Still, I find it hard to believe that a clock speed of 4,000,000,000 cycles per second is the limiting factor for running a game above 150 frames per second. Or is it? Is the CPU work in game programming really that sequential? Is there any way to speed up a simple sequential job with the help of more cores?
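
To make that concrete, here's a toy sketch of the two kinds of work I mean (the numbers and names are made up, not from any real game):

```c
#include <stdio.h>

#define STEPS    1000   /* sequential physics steps */
#define ENTITIES 1000   /* independent game entities */

int main(void) {
    double position = 0.0, velocity = 1.0;

    /* Sequential: step N needs the result of step N-1, so extra cores
     * can't help; only a faster core finishes this loop sooner. */
    for (int step = 0; step < STEPS; step++) {
        velocity += 0.01;      /* depends on the previous velocity */
        position += velocity;  /* depends on the previous position */
    }

    /* Parallel-friendly: every entity is independent, so 16 slow cores
     * could split this loop about as well as 8 fast ones. */
    double health[ENTITIES];
    for (int i = 0; i < ENTITIES; i++)
        health[i] = 100.0 - (double)(i % 7);  /* no cross-iteration dependency */

    printf("position=%.2f health[0]=%.2f\n", position, health[0]);
    return 0;
}
```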

Bonus question: How do multiple instructions per cycle work if a job is sequential?

Bonus question 2: GPUs have tons of low-power cores. Why is that okay? Is it just because rendering is not sequential at all?

In: Engineering

6 Answers

Anonymous 0 Comments

Imagine you’re changing all the tires on a car.

With just one person, it takes forever. With two people, you can work on two tires at once. But why stop there? If we throw 16 or 64 people at it, can we beat an Indy 500 pit crew? Not really. You're still limited by your equipment, just as with single-core performance: it doesn't matter how many people you have, or how fast the other steps are, if your jack is a slow scissor jack. And you need a human to decide which tasks get grouped in parallel, so you don't have one guy trying to screw lug nuts onto a new tire before another guy has gotten the old tire off.
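
In code, "a human deciding which tasks get grouped" looks something like this minimal sketch, assuming POSIX threads (the tire numbering is just for the analogy):

```c
#include <pthread.h>
#include <stdio.h>

/* One worker per tire. The programmer chose this split and must make
 * sure no two workers touch the same tire at the same time. */
static void *change_tire(void *arg) {
    int tire = *(int *)arg;
    printf("worker changing tire %d\n", tire);
    return NULL;
}

int main(void) {
    pthread_t workers[4];
    int tires[4] = {0, 1, 2, 3};

    /* Spawn one worker per tire... */
    for (int i = 0; i < 4; i++)
        pthread_create(&workers[i], NULL, change_tire, &tires[i]);

    /* ...and wait for all four, so we don't drive off while a tire
     * is still half-attached. */
    for (int i = 0; i < 4; i++)
        pthread_join(workers[i], NULL);

    return 0;
}
```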

Having help on each tire can still speed things up, though, not because you're creating more teams to handle more groups of tasks, but because you've built a pipeline: one person can be readying the next task while the previous person finishes the last one.

(There are actually ways for the CPU itself, rather than the programmer, to decide what can run in parallel: superscalar and out-of-order execution let a core issue several instructions at once. But this happens only a few instructions at a time, and each instruction may depend on data produced by the instructions right before it, which is why the trick works inside one core's pipeline but can't be scaled up into splitting a program across cores. Hyperthreading is a different idea: it lets one core juggle two instruction streams so the pipeline stays busy.)
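
As a tiny made-up illustration of those per-instruction dependencies:

```c
/* Made-up snippet: which of these three lines can overlap? */
int demo(int a, int b) {
    int c = a + b;  /* instruction 1 */
    int d = c * 2;  /* instruction 2 must wait: it needs c from 1 */
    int e = a - b;  /* instruction 3 is independent, so an out-of-order
                       core can issue it in the same cycle as 1 */
    return c + d + e;
}
```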

It’s also worth pointing out here that not only is upgrading equipment (like a hydraulic jack) faster, it’s also more efficient. Even if you could change four tires with 100 low-paid guys using cheap equipment, imagine the amount of body heat that mosh pit crew would create.

But what if we had a really repetitive job, like buffing and polishing the whole car? If we got 50 people polishing at once, we'd finish way faster than a skilled team of three. That's the idea behind GPUs: for certain tasks, like graphics, the same calculation has to be run hundreds or thousands of times, and all of those runs can happen in parallel.
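
A rough sketch of that idea, using a made-up brightness filter: every pixel gets the same tiny, independent job, so thousands of simple cores can each take one.

```c
#define WIDTH  1920
#define HEIGHT 1080

/* Made-up example: scale every pixel's brightness. No pixel depends on
 * any other, so a GPU can hand each iteration to its own tiny core and
 * run them all at once instead of looping the way a CPU does here. */
void brighten(float *image, float factor) {
    for (int pixel = 0; pixel < WIDTH * HEIGHT; pixel++)
        image[pixel] *= factor;
}
```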
