I was stuck in a very long queue for an upcoming online game (a few thousand people in front of me), and the reasoning given is that they need to stress the server so they can optimise it and the queue, and then have servers ready for the release date. It’s always like that during betas, stress events and release dates. Before release day, any queue is justified by ‘we need players to test the server for us’, and after release day the justification is ‘we didn’t expect so many players’.
But why do you need real people to connect to your server? If a DDoS can overwhelm the server, why can’t you use one to test it, treating the flood as fake players that stress your server?
DDoS intentionally does the worst things possible. Players don’t always do the things that a DDoS would do.
Trying to make the servers handle stress takes money, and the company wants to save as much as it can. Simulating players means writing bots that behave like players. That’s effort, and effort costs engineering time that could otherwise be spent fixing bugs. The players, meanwhile, are a huge crowd of people willing to do that labor for free.
So unless they *already* have a sophisticated botnet, it’s just smarter to let your volunteer army do the testing for you.
A DDoS attack isn’t usually stressing the whole game, just a specific, internet-exposed portion of it – usually the login services. Think of it like a crowd of people all banging on the front door of a building – they don’t even want *in* the building, they’ll just walk away if the door is opened for them, but while they’re around the real customers can’t get through the crowd to enter.
That will certainly test your login servers, but an online game has a whole mess of infrastructure and services *behind* that which aren’t seeing the DDoS and need to be tested. Actual players doing actual gameplay are the best way to test since you want to test the whole thing top-to-bottom.
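A toy way to see that difference, as a Python sketch: the service names below (`login`, `matchmaking`, `world`, `inventory`, `chat`) are hypothetical stand-ins rather than any real game’s architecture, and the “player” is a scripted bot with made-up action counts. It only counts which services each kind of traffic touches.

```python
from collections import Counter

# Hypothetical service names; a real game's backend will look different.
LOGIN, MATCHMAKING, WORLD, INVENTORY, CHAT = (
    "login", "matchmaking", "world", "inventory", "chat")


def ddos_request(hits: Counter) -> None:
    """A flood request: it hammers the exposed front door and goes no further."""
    hits[LOGIN] += 1


def player_session(hits: Counter) -> None:
    """A (very) simplified scripted player: log in, then actually play."""
    hits[LOGIN] += 1
    hits[MATCHMAKING] += 1
    for _ in range(3):       # a few moves through the game world
        hits[WORLD] += 1
    hits[INVENTORY] += 2     # open the backpack, equip something
    hits[CHAT] += 1


def run(traffic, n: int) -> Counter:
    """Send n units of the given traffic type and tally hits per service."""
    hits = Counter()
    for _ in range(n):
        traffic(hits)
    return hits


print("DDoS:   ", dict(run(ddos_request, 1000)))
print("Players:", dict(run(player_session, 1000)))
```

The flood only ever shows up in the `login` count, while the scripted players also put load on every service behind it, which is the part a DDoS never reaches.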
Sure, you *could* create a whole fleet of programmed robot “players” which can stress your game, and there are advantages to that, but you run into problems:
* Building out a big enough set of bots is potentially expensive (in time or infrastructure costs), especially if you’re creating a game that’s supposed to support very high player counts.
* Bots will never act like real players act, so it’s not a very realistic test. If the bots all spread out in an MMO but the players all cluster together into a few areas, the bot test and the player test will get very different results.
* You miss out on free marketing. Players who participate in a stress test get a free sneak peek at the game, which makes them happy (if your game’s worth being excited about!) and generates buzz (so, free social media coverage and grassroots discussion/engagement).
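Here’s a rough Python sketch of the clustering problem from the second point. Every number in it is hypothetical: 1,000 clients, 10 zones, and an assumed pattern where 80% of real players pile into two popular zones. Same headcount, very different load on the busiest zone.

```python
from collections import Counter


def peak_zone_load(zone_of_each_player: list[int]) -> int:
    """Return the player count in the busiest zone."""
    return max(Counter(zone_of_each_player).values())


ZONES = 10
PLAYERS = 1000

# Bots: a naive script sends bot i to zone i % ZONES, so they spread out evenly.
bots = [i % ZONES for i in range(PLAYERS)]

# Real players (hypothetical pattern): 80% crowd into zones 0 and 1
# (say, the starting town and a new raid), the rest spread out.
hot = PLAYERS * 8 // 10
players = [i % 2 for i in range(hot)] + [i % ZONES for i in range(PLAYERS - hot)]

print("bots:   ", peak_zone_load(bots))     # 100
print("players:", peak_zone_load(players))  # 420
```

A zone server sized from the bot test (100 players) would be badly undersized wherever the real players actually gather (420).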
A DDoS and players actually playing are two very different things internally. A DDoS sends so many random requests that the server can’t keep up with the responses, even if it’s just denying them. A player stress test doesn’t aim to overload request handling; it aims to see which game mechanics cause stress for the server and, possibly, what the maximum number of players per server is. That way you can either fix the worst game mechanics or know when to put players on another server. A DDoS just tells you where the limit of your internet connection is.
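Once a stress test tells you roughly how many players one server can handle, “knowing when to put players on another server” can start as simply as this Python sketch. The cap of 2,500 is a made-up number, and real games usually balance by region, friends, current load and so on rather than filling servers strictly in order.

```python
def place_players(n_players: int, cap_per_server: int) -> list[int]:
    """Fill servers in order, opening a new one whenever the current one is full.

    Returns the population of each server.
    """
    if cap_per_server <= 0:
        raise ValueError("cap_per_server must be positive")
    servers: list[int] = []
    for _ in range(n_players):
        if not servers or servers[-1] >= cap_per_server:
            servers.append(0)  # current server is at its tested limit: open another
        servers[-1] += 1
    return servers


def servers_needed(n_players: int, cap_per_server: int) -> int:
    """Same count without the loop: integer ceiling division."""
    return -(-n_players // cap_per_server)


CAP = 2_500  # hypothetical per-server limit found during a stress test
print(servers_needed(100_000, CAP))  # 40
print(place_players(5, 2))           # [2, 2, 1]
```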
The one thing that’s extremely difficult to do is accurately simulate being ignorant. As a simple example, it’s very hard for you to imagine getting lost in your own house. You could try to imagine what it would be like to be lost in there and think about where you would put up signs so people don’t get lost, but until a person actually comes into your house for the first time and gets lost looking for something, you don’t know where they’ll get lost. It’s a lot faster to watch 3 people get lost and put a sign there than to evaluate every corner of every room to see where it could happen.
Users get lost, click wrong buttons, click stupid buttons, try to break things in ways you would never consider, and do countless other things that a developer who knows the game inside and out wouldn’t think of.
It’s rarely just the login that causes issues; it’s usually something far more complicated. In cases where user turnout did vastly exceed expectations, there’s nothing you can do about that. It’s far better for a company to have to buy more servers after the fact than to have massively underutilized servers that it paid for up front, especially a small company that may not have the money for that until the game starts selling well.
DDoS is very different from legitimate traffic and will not reveal bottlenecks or performance issues that legitimate traffic may encounter.
There are other methods of creating simulated traffic, but they depend on the developer accurately predicting what the user traffic will be like, and things can easily be overlooked.
So, to make sure that they haven’t missed anything, after many tests behind closed doors using simulated traffic, the developers will invite the public in for a stress test with real user traffic.
—
Launch day is different too. The resources the developer allocates are based on what they predict they’ll need and not much more because more costs money. If the playerbase greatly exceeds that prediction they often cannot just “press button for more servers” to solve it, especially if the launch day rush uncovers a new bottleneck that even the stress test was too small to find.
A stress test isn’t about overwhelming the server’s network connection; that’s all a DDoS attack does. It’s about testing all the resources.
You need real people because headless clients simply can’t give you the whole picture. Real people will have different behaviours that can’t be predicted accurately. You need people to do what they will do, so you can catch any vulnerabilities. If a headless client doesn’t do something players do, you don’t know if that behaviour will be a problem.
Think of it like “soft opening” a restaurant a few weeks before the “grand opening”.
You *could* bring in the full staff that day and run through a full evening with a handful of invited guests… but test-running a “slow night” like that wouldn’t really tell you whether your staff can keep up on a busy night (like the upcoming “grand opening”). So instead, you maybe rope off half the restaurant and only bring in half the staff; now your same number of invited guests will feel twice as busy to the servers and cooks who are working. Or, take it even further and rope off 1/4 of the restaurant and bring in 1/4 of the staff… etc.
These kinds of “practice runs” can really help you figure out if your staff can keep up with the customers or if you have to hire way more staff, cut back on certain menu items, etc. They are also much better at finding unforeseen issues than “DDoS”-type stress tests. Like, you could totally hire a person to make a million reservations by phone or to open and close the front door a million times to check if your phones and doors can handle a million customers… but neither of those tests can tell you anything unforeseen the way a guest asking *”Do you have any toothpicks?”* or *”Is this menu item gluten-free?”* might.
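The roped-off-section trick is just a ratio. Here it is as a small Python sketch, assuming the load spreads evenly over whoever is working (staff in the restaurant, or servers if you carry the analogy over to a game and open only part of your capacity to testers; that mapping is an assumption, not a description of how any particular studio runs its tests). The 40 guests are hypothetical.

```python
from fractions import Fraction


def load_multiplier(capacity_fraction: Fraction) -> Fraction:
    """How much busier each remaining worker (or server) is when only this
    fraction of the usual capacity serves the same crowd."""
    if not 0 < capacity_fraction <= 1:
        raise ValueError("capacity_fraction must be in (0, 1]")
    return 1 / capacity_fraction


def equivalent_full_crowd(guests: int, capacity_fraction: Fraction) -> Fraction:
    """Crowd size that would make the fully open venue feel this busy."""
    return guests * load_multiplier(capacity_fraction)


# Hypothetical soft opening with 40 invited guests.
print(equivalent_full_crowd(40, Fraction(1, 2)))  # 80
print(equivalent_full_crowd(40, Fraction(1, 4)))  # 160
```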
Others have noted the main issues with this, but I think it’s worth looking at WHY the servers are unprepared after a stress test.
Before the test, we make some assumptions about the number of players that will connect. We can figure that out from our marketing and engagement, but it’s still a loose prediction. We also don’t want to DDoS our own servers by letting everyone in at once. We’re trying to see what happens when we stress the game systems, not what happens when our login servers get nuked.
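Not letting everyone in at once is essentially what the queue from the original question does. A minimal Python sketch, assuming the simplest possible policy (admit a fixed number of players per tick, first come, first served); the batch sizes are hypothetical, and real queues also have to handle disconnects, reconnects and priority.

```python
from collections import deque


def admit_in_batches(queue_size: int, admits_per_tick: int) -> list[int]:
    """Let at most admits_per_tick players through per tick.

    Returns how many players are still waiting after each tick.
    """
    if admits_per_tick <= 0:
        raise ValueError("admits_per_tick must be positive")
    waiting = deque(range(queue_size))  # player IDs in arrival order
    history = []
    while waiting:
        for _ in range(min(admits_per_tick, len(waiting))):
            waiting.popleft()  # admit the player at the front of the line
        history.append(len(waiting))
    return history


def ticks_until_admitted(position: int, admits_per_tick: int) -> int:
    """Tick on which the player at 0-based `position` gets in."""
    return position // admits_per_tick + 1


print(admit_in_batches(10, 4))     # [6, 2, 0]
print(ticks_until_admitted(9, 4))  # 3
```

With a few thousand people ahead of you, your wait in this model is roughly your position divided by the admission rate; the idea is that the rate only goes up once testing shows the systems behind login can take it.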
Based on the stress test, we’re able to identify where we might have problems. Maybe the players’ item storage was creating too many requests and we need to fix that. We also get a sense of the real player numbers. We guess 50k players, and we get 100k during the stress test: in the ballpark, but still pretty far off. But people enjoyed what they saw in the stress test, so we can also assume we’ll get more players than that. We need to make another prediction of the actual player count at launch. Make it too small, and we’ll have unhappy players sitting in the queue. Too big, and we might end up spending a ton of money on server space we don’t need. We’d rather have some players wait than burn potentially millions of dollars on servers we don’t need. So we make a conservative estimate. Often it’s too low.
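That “conservative estimate” is a trade-off between paying for idle servers and making people wait, similar in spirit to the textbook newsvendor problem. Here’s a hedged Python sketch with entirely invented demand scenarios and costs; it only shows why, when a server slot costs more than a queued player is judged to cost, the cheapest pick is the low estimate even though that estimate is too low most of the time.

```python
# Every number here is made up for illustration.
SCENARIOS = {100_000: 30, 150_000: 40, 300_000: 30}  # launch peak -> chance in %
COST_PER_SLOT = 2   # hypothetical server cost per player slot
QUEUE_PENALTY = 1   # hypothetical cost per player left waiting in the queue


def expected_cost(capacity: int, queue_penalty: int = QUEUE_PENALTY) -> int:
    """Server bill plus the probability-weighted cost of players stuck in queue."""
    server_bill = capacity * COST_PER_SLOT
    queue_cost = sum(pct * max(0, demand - capacity) * queue_penalty
                     for demand, pct in SCENARIOS.items()) // 100
    return server_bill + queue_cost


def chance_too_low(capacity: int) -> int:
    """Percent chance that launch demand exceeds the chosen capacity."""
    return sum(pct for demand, pct in SCENARIOS.items() if demand > capacity)


best = min(SCENARIOS, key=expected_cost)
print(best, chance_too_low(best))  # 100000 70

# If queued players are judged more costly, the best choice moves up.
print(min(SCENARIOS, key=lambda c: expected_cost(c, queue_penalty=5)))  # 150000
```

With these made-up numbers the cheapest option is the 100,000-player estimate, which is too small in 70% of the scenarios: the “often it’s too low” outcome. Only when a queued player is treated as more expensive does the math push toward a bigger launch.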
A good example is Helldivers 2. Helldivers 1 had an all-time peak player count on Steam of about 7,000 players. When making the sequel, they know they’re going to get MORE players, but how many is up in the air. So they hope for 5x as many players. And the game explodes. Stress tests are utterly overwhelmed. They make an even bigger server for launch. That gets overwhelmed too. It took nearly 2 weeks for things to stabilize.
Predictions are hard, and sometimes your player base is insane.