Hello good fellows.
Recently, the devs for the game Helldivers 2 mentioned that one of the biggest issues they’re facing is that their backend code was not designed to be able to handle the sheer amount of players they’re getting, and that they’re working hard to optimize it.
I understand the broad strokes of what they mean. TLDR: things were made with X people in mind, they ended up getting 10X, and things are oversaturated. But how can code only work with so many people at a time? I thought it was a matter of hardware resources like RAM or processing power taken by the code per user, but if that were the case I’d imagine just adding more servers would solve the issue, something they’ve stated won’t really help that much in the long run. So I’m left wondering what really goes into scaling that kind of stuff to accommodate more users.
Thanks in advance.
> taken by the code per user
You’re assuming a linear increase in resources used per user. Even a linear increase can be a problem, but that’s actually very decent scaling. Quite often, though, when something wasn’t designed for anywhere near the number of users it ends up getting, there’s a good chance that something somewhere in the system scales worse than linearly. To take a simple (maybe unrealistic, I dunno) example…
Every time a player fires a gun, we need to send a message to every other player this gun was fired.
So if you have 10 players, every time a gun is fired, 10 messages.
With 100 players, every time a gun is fired, 100 messages, BUT guns are also fired 10 times as often so you get in total 100 times more messages, not 10 times more. That would be quadratic scaling.
So that is usually where the problems lie with scalability: some part of the system does not scale linearly, or it scales linearly up to a point and then quite simply doesn’t at all, etc.
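To put rough numbers on that, here’s a toy sketch in Go (made-up figures and function names, nothing to do with Helldivers 2’s actual netcode) that just counts how many messages that naive “tell every other player” broadcast generates:

```go
package main

import "fmt"

// totalMessages models a naive broadcast: every shot gets relayed
// to every other player, and more players also means more shots.
// shotsEach is a made-up firing rate, purely for illustration.
func totalMessages(players, shotsEach int) int {
	return players * shotsEach * (players - 1)
}

func main() {
	for _, p := range []int{10, 100, 1000} {
		fmt.Printf("%4d players -> %8d messages\n", p, totalMessages(p, 5))
	}
}
```

Going from 10 players (450 messages) to 100 players (49,500 messages) multiplies the total by roughly 100, not 10. That’s the quadratic growth in action.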
It would be pretty difficult to give specific answers unless the information came right from the devs or someone actually had a chance to look at their code. But really it boils down to the fact that games and things like that are fairly complex systems made up of lots of smaller subsystems that all have to work semi-independently while still working with one another, and sometimes while being used by hundreds of thousands of people on different hardware doing unpredictable things. And just bumping up the resources in one of those subsystems won’t necessarily fix the problem by itself since that’ll also require updating all the other subsystems it interacts with.
I think I remember the head of Arrowhead saying what they were doing is basically trying to retrofit a Vespa so it could compete in F1, and that feels appropriate. Problem number 1 is your Vespa is too slow, so the obvious solution is to just put a bigger motor in it. Problem solved, you’re ready to compete, right? Except to get the power from the motor to your wheels, you need to deliver it through a drivetrain, and your drivetrain can’t handle all the new power. So you replace that, but then your scooter wheels just burn up. So now you need to swap in better wheels. But you can’t just swap out the wheels, because they won’t fit unless you update the frame of your scooter. But if you update the frame, your new drivetrain won’t fit either. So now you’ve got to go back and update that again, and so on.
Obviously that’s not exactly a perfect 1 to 1 analogy, but the servers being full doesn’t necessarily mean you can just add more servers and call it a day. The data from all those servers has to connect to other systems that will also need updates to handle the extra data, and that will require extra time and testing to make sure those updates don’t break other things. And if they do break other things, then you’ve got to update those things and test them, and so on.
Often when writing code, you have to make decisions about how much memory gets allocated to variables. If you expect a variable will never hold very large values, you might use a smaller size to save space. That has to be decided ahead of time, because the computer needs to know which bits belong to that variable, and it’s probably stored right next to data that belongs to other variables. So if the values turn out bigger than you expected, you have to go back and rewrite code in places to allocate more space.
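As a concrete illustration (a deliberately tiny, made-up example, not anything from a real game): a 16-bit unsigned integer can only count to 65,535, and in Go it silently wraps around once you go past that:

```go
package main

import "fmt"

func main() {
	// A 16-bit unsigned integer can hold values from 0 to 65535.
	// (kills is a hypothetical stat counter, just for illustration.)
	var kills uint16 = 65535

	// One more increment wraps it back around to 0; unsigned
	// overflow in Go silently wraps rather than raising an error.
	kills++

	fmt.Println(kills) // prints 0
}
```

Bugs like that don’t show up until someone actually reaches a value the original author assumed was impossible, and fixing them means changing the type everywhere that data gets stored or passed around.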
Another reason code might not be scalable is that some algorithms show no significant performance difference at a low number of iterations but become significantly slower at higher ones. There are algorithms where the average time taken per iteration drops as you do more iterations. There are algorithms that scale linearly, meaning the average time taken per iteration stays the same. And then there are algorithms whose total time grows much faster than the input does (quadratically or worse). So the code may have been written in a way that wasn’t optimal, but it wasn’t a problem when there was a low number of users, and then the increase in users revealed how inefficient the algorithm was.
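A generic example of that last category (a sketch in Go, not anything from Arrowhead’s code): checking each of n incoming IDs by scanning a list does on the order of n × n comparisons, while a hash map makes each check roughly constant time:

```go
package main

import "fmt"

// countKnownSlice scans the whole list for every lookup:
// n incoming IDs times n comparisons each, so quadratic work.
func countKnownSlice(known, incoming []int) int {
	hits := 0
	for _, id := range incoming {
		for _, k := range known {
			if k == id {
				hits++
				break
			}
		}
	}
	return hits
}

// countKnownMap does the same job with a hash map, so each
// lookup is roughly constant time and the total work is linear.
func countKnownMap(known map[int]bool, incoming []int) int {
	hits := 0
	for _, id := range incoming {
		if known[id] {
			hits++
		}
	}
	return hits
}

func main() {
	known := []int{1, 2, 3}
	knownSet := map[int]bool{1: true, 2: true, 3: true}
	incoming := []int{2, 3, 4}

	fmt.Println(countKnownSlice(known, incoming)) // 2
	fmt.Println(countKnownMap(knownSet, incoming)) // 2
}
```

Both versions give the same answer, and at small sizes you’d never notice the difference. The slow one only becomes a problem once the lists get big, which is exactly the “fine at launch scale, painful at 10X the players” pattern.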