It is common for modern video games to be badly optimised, partly due to inefficient use of the multiple cores on a CPU.
I’d assume that developers know this, but don’t have the time to rectify it.
So what are the difficulties in utilising the various cores of a CPU effectively? Why does it require so much focus, money, or time to implement?
Any time you want to work on multiple threads consistently, everything is exponentially harder.
That’s on top of other complexity. For example…
Open-world games introduce a lot of complexity; make it multiplayer and everything becomes exponentially more difficult.
Then do both in an asynchronous, multithreaded way and you raise the complexity by another exponent.
Btw, complexity should be read as the man-hours needed to deliver the task: the higher the complexity, the more time it takes to deliver a mechanic in the game.
At that point, do you want to make a nice, stable 60 fps game, or do you accept some part of it running at less than 40 fps to have some fun mechanics?
It’s a tradeoff.
Because parallel execution is hard, debugging it is a **MASSIVE** pain, and the bugs are hard to reproduce.
It works for things that shouldn’t interact with each other too much (your auto-save process, your next-area preload, etc…), but race conditions are a pain.
And it gets exponentially more complex as you add threads.
If you think it’s easy, try giving 5 forks to 7 people at the table over a 3-course meal, with no course collisions and nobody starving while staring at their full plate.
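As a rough, generic C++ sketch of the race conditions mentioned above (my own illustration, not from the original answer): two threads bump a shared counter, and without synchronisation some of the increments simply get lost.

```cpp
// Minimal race-condition sketch: two threads update one shared counter.
#include <iostream>
#include <mutex>
#include <thread>

int unsafe_count = 0;            // shared, unprotected
int safe_count = 0;              // shared, guarded by a mutex
std::mutex count_mutex;

void worker() {
    for (int i = 0; i < 100000; ++i) {
        ++unsafe_count;          // data race: lost updates are likely
        std::lock_guard<std::mutex> lock(count_mutex);
        ++safe_count;            // correct, but the threads now take turns
    }
}

int main() {
    std::thread a(worker), b(worker);
    a.join();
    b.join();
    // unsafe_count is usually less than 200000; safe_count is always 200000.
    std::cout << unsafe_count << " vs " << safe_count << '\n';
}
```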
If you’ve ever cooked, you’ll know that adding another person to help you doesn’t necessarily make it go faster.
You can’t just add a second chef and go twice as fast, you have to coordinate: which parts will each of you work on? Who will use the knife and cutting board when? How will you make sure you’re not adding the same ingredients twice? How will you make sure the other chef isn’t waiting around for you to finish something?
Multi-core is like adding another chef (or 16). You need to plan and change how you’re going to cook. Video games in particular are long and complicated recipes, with lots of steps that rely on each other and lots of single cutting boards to share. If you’re getting by with one chef, why add another?
Let’s say a game is getting ready to render a single frame. Before it can render that frame, it needs to complete 1000 tasks/computations.
Some are straightforward: you send that task off to another CPU core (thread), it comes back with an answer, and it’s done for that frame.
Some tasks stack: you can’t run task C until task B completes, and task B can’t run until task A completes. That limits the advantages of multiprocessing.
But we are trying to go as fast as possible, so we’ll have more complicated situations where tasks A and B can be run at the same time before we can run task C, or some more complicated version of this, again to go as fast as possible.
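As a hedged sketch of that “A and B in parallel, then C” pattern in plain C++ (the task names and values here are made up for illustration):

```cpp
// Sketch: tasks A and B run in parallel, task C needs both results.
#include <future>
#include <iostream>

int task_a() { return 2; }                 // placeholder work, e.g. an animation pass
int task_b() { return 3; }                 // placeholder work, e.g. a physics pass
int task_c(int a, int b) { return a + b; } // depends on the results of A and B

int main() {
    auto fa = std::async(std::launch::async, task_a);
    auto fb = std::async(std::launch::async, task_b);
    // C cannot start until both A and B have finished.
    int c = task_c(fa.get(), fb.get());
    std::cout << "frame result: " << c << '\n';
}
```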
Some tasks also take longer to complete than others, so you want to process those tasks in the most efficient order.
Now we also have to consider memory, as some tasks (fairly unrelated ones) need access to some data in memory. In multi-core systems there are several layers of memory: the closer to the CPU core, the faster the memory, but the smaller it is, and the more you have to copy that data between cores.
There are memory pools like RAM that can be accessed without copying data around, but that’s the slowest memory. And remember we’re optimizing here, so now we have to figure out what order these tasks need to run in, considering the data we need to access, copying the data to the appropriate core / memory pool, trying to group tasks that access the same data, and the stacking of tasks.
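To make the “copying data between cores” cost concrete, here is a small, generic C++ sketch of false sharing (my own illustration, not from the answer above): two threads hammering counters that share one cache line force that line to bounce between cores, while padding each counter onto its own line removes the contention.

```cpp
// Sketch of data bouncing between cores: counters on a shared cache line
// (false sharing) versus counters padded onto separate cache lines.
#include <atomic>
#include <thread>

struct Shared {                        // both counters sit on one cache line
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

struct Padded {                        // each counter gets its own cache line
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <typename T>
void hammer(T& counters) {
    std::thread t1([&] { for (long i = 0; i < 10'000'000; ++i) counters.a++; });
    std::thread t2([&] { for (long i = 0; i < 10'000'000; ++i) counters.b++; });
    t1.join();
    t2.join();
}

int main() {
    Shared shared;
    Padded padded;
    hammer(shared);  // typically slower: the cache line ping-pongs between cores
    hammer(padded);  // typically faster: no contention on the line
}
```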
On top of all that, we are trying to optimize the calculations/algorithms of all 1000 tasks, which in itself is a bit of an art, balancing pure speed against code complexity. That balance may need to be updated, rebalanced, and reoptimized when a change is introduced elsewhere. You might put dozens of hours into multithreading some tasks only for it to be cut due to a feature change. That’s a lot of wasted time that could have gone to bug fixes or more content.
So, as you can imagine, balancing all of these things across cores to run as fast as possible, in an environment where features are often changing, is complicated.
Two reasons
1) One is legacy: 15 years ago a game could actually be slower if it was multithreaded, as there is a cost to context switching. While modern CPUs are far better at this, it sometimes takes a while for everyone to adapt. However, the latest CPUs have heavy (performance) versus light (efficiency) cores, so now you still kind of want one big heavy thread going to make sure the OS and CPU assign it to a performance core.
2) Locking isn’t free, nor is it easy. I’m trying to think of a comparable example. Maybe think of things like synchronised swimming competitions, an air show with multiple planes, or even a ballet.
It’s a very different problem to have one person or object (a plane) doing tricks. But when you have 20 of them, each one has to be careful not to interfere with the others except at designated points in time.
The same thing goes for just doing work. If you know what needs to be done, you can just do it yourself. But if you get 20 people together for the same job, it’s not going to finish 20 times faster. On top of that, you’ll spend far more time trying to “manage the chaos” than you would doing the work yourself.
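A small C++ sketch of both points (again my own illustration, not part of the original answer): 20 worker threads funnelled through one lock mostly queue behind each other, while workers with their own independent slots need no coordination until the very end.

```cpp
// Sketch: 20 workers funnelled through one lock vs. working independently.
#include <functional>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

constexpr int  kWorkers   = 20;
constexpr long kItemsEach = 1'000'000;

long shared_total = 0;
std::mutex total_mutex;

void contended_worker() {
    for (long i = 0; i < kItemsEach; ++i) {
        std::lock_guard<std::mutex> lock(total_mutex); // everyone queues here
        ++shared_total;
    }
}

void independent_worker(long& my_total) {
    for (long i = 0; i < kItemsEach; ++i) ++my_total;  // no coordination needed
}

int main() {
    // Contended version: 20 threads serialise on one mutex.
    std::vector<std::thread> threads;
    for (int i = 0; i < kWorkers; ++i) threads.emplace_back(contended_worker);
    for (auto& t : threads) t.join();

    // Independent version: each thread has its own slot, combined at the end.
    std::vector<long> totals(kWorkers, 0);
    threads.clear();
    for (int i = 0; i < kWorkers; ++i)
        threads.emplace_back(independent_worker, std::ref(totals[i]));
    for (auto& t : threads) t.join();
    long combined = std::accumulate(totals.begin(), totals.end(), 0L);
    (void)combined; // both approaches compute the same total; the timings differ a lot
}
```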
Because most of the actual calculations done in a video game run on the GPU… and are already parallelized to take advantage of that architecture.
The calculations that do run on the CPU tend to have nightmarish dependency trees that need to be managed. If you botch parallelizing your code, you get even worse performance… and it makes debugging an absolute nightmare.
All great answers. But please also remember that most video games don’t really need to tap all the available CPU power; what they are trying to do is easily doable with modest CPU power. That is why some games today come with a performance mode where they lower the render resolution or other GPU-dependent effects and deliver double the framerate. These games don’t sacrifice CPU-dependent effects because they aren’t using much CPU to begin with.
“nine women can’t make a baby in one month”
All games have parts that have to be done step by step and can never be parallel, and parts that can be parallelized; doing the parallel parts can be very time-consuming and generate an endless number of bugs.
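The quote above is essentially Amdahl’s law; here is a tiny sketch of the formula (the 70% figure is only an example I picked, not from the answer):

```cpp
// Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of
// the work that can be parallelised and n is the number of cores.
#include <cstdio>

double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main() {
    // Even if 70% of a frame parallelises perfectly, 16 cores only buy ~3x.
    std::printf("%.2fx on 4 cores\n",  amdahl_speedup(0.7, 4));   // ~2.11x
    std::printf("%.2fx on 16 cores\n", amdahl_speedup(0.7, 16));  // ~2.91x
    std::printf("%.2fx cap with infinite cores\n", 1.0 / (1.0 - 0.7)); // 3.33x
}
```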
So the studio has to decide: do we dedicate 6 months to making some of these things parallel, or do we spend that time on the game itself?
And building the game wins every time. The low-hanging fruit may get a multi-core implementation if there is time, but there is never enough time in game development.