Think of servers like a home. Your home can only hold so much, right? You’ve got people coming in and out, new furniture arriving, old trash going out, etc.
Servers are always taking in new data and sending out old data.
Servers also have something called databases, which store information for the server to retrieve later on. Think of it as an organized drawer in your home.
Now, let’s imagine your house has a broken window. Not good for security, right? Well, the same goes for servers. New vulnerabilities are being discovered constantly, so sometimes servers need to go down for maintenance to fix those new security issues.

Another reason a server can go down is that too many people are trying to get on it. Let’s imagine your house again. If it’s just you and your family in your house, it’s fine, right? Well, what if your entire neighborhood came into your house? It becomes hard to move around, and you get slower because you have to navigate through all these people. Servers also only have so much room, and sometimes they need maintenance done to make more room for more people.
I work in an operations center for a very large internet company. Maintenance is usually one of two things.
First, there may be a new version of the software that runs on these servers. The developers are constantly working to add new features and fix old bugs. When you deploy this new software to the servers you usually have to take each server offline for at least a short time.
Second, you might need to replace the hardware itself for either break-fix or upgrade work. If a component of a server fails you might have to turn it off in order to replace it, depending on the component. Alternatively, if you’re upgrading a component or the entire server to something bigger and faster you will need to turn off the old server.
There are ways to do all of this work without taking down the overall service, so that customers never notice you did maintenance. But these methods are more complicated and require more total servers, so you can take a batch offline while the rest still handle the live load, which costs more money.
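To make that concrete, here’s a minimal toy sketch in Python of how a “rolling” update like that might work. Everything in it (the server names, the in-rotation set, the fake deploy and health-check steps) is invented for illustration; a real setup would talk to an actual load balancer and deployment tooling.

```python
# Toy sketch of a rolling update: take servers out of rotation a batch
# at a time, update them, and put them back so users never notice.
import time

SERVERS = ["web-01", "web-02", "web-03", "web-04"]
in_rotation = set(SERVERS)  # servers currently receiving user traffic

def deploy_new_version(server: str) -> None:
    print(f"deploying new software to {server} ...")
    time.sleep(0.1)  # stand-in for the real install/restart work

def is_healthy(server: str) -> bool:
    return True  # stand-in for a real health check (e.g. an HTTP probe)

def rolling_update(servers: list[str], batch_size: int = 1) -> None:
    for i in range(0, len(servers), batch_size):
        batch = servers[i:i + batch_size]
        for s in batch:
            in_rotation.discard(s)   # drain: stop sending users here
        for s in batch:
            deploy_new_version(s)    # safe to update while out of rotation
        for s in batch:
            if is_healthy(s):
                in_rotation.add(s)   # restore before touching the next batch
        print(f"live servers: {sorted(in_rotation)}")

rolling_update(SERVERS)
```

The batch size is the cost trade-off mentioned above: the more servers you pull out at once, the faster the update finishes, but the more spare capacity you have to pay for to keep the service up.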
The most obvious one is patching. You can apply OS-level patches that either improve performance or fix software vulnerabilities. Sometimes you can forego the performance ones, but the security ones tend to be required sooner rather than later. Having unpatched security issues is like leaving the front door unlocked and hoping nobody tries to walk in.
Another reason is physical real-world logistics. For example, many companies had to move servers out of data centers in the UK and into mainland Europe after Brexit in order to comply with having data stored in the EU. This could also be simpler stuff, such as consolidating rack space in a data center (if you were to downsize how many racks you had, or had to move servers around for cooling reasons or to make more efficient use of space).
Another common reason is that many servers are Virtual Machines these days, and the Virtual Machine Host needs similar maintenance. The VMs can sometimes be seamlessly moved to another device, but not always.
The last one is software updates. This can be a major one, especially because many pieces of software don’t play nicely while updating. For example, if you want to update database software, the database likely needs to be brought offline, which means the application pointing to it also needs to be brought offline. Only after the new database version is up and running can you bring the software that uses it back up.
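Here’s a rough sketch of that ordering, with made-up placeholder functions rather than any real ops tool; the point is just the sequence, with the application going down first and coming back up last.

```python
# Toy sketch of a database upgrade: the app must stop before the database,
# and restart only after the new database version is up.

def stop(service: str) -> None:
    print(f"stopping {service}")

def start(service: str) -> None:
    print(f"starting {service}")

def upgrade(service: str) -> None:
    print(f"upgrading {service} to the new version")

def upgrade_database_stack() -> None:
    stop("application")   # nothing may touch the DB during the upgrade
    stop("database")
    upgrade("database")
    start("database")     # confirm the new version is healthy first...
    start("application")  # ...then let the app reconnect and serve users

upgrade_database_stack()
```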
An internet server is no different from any other computer.
Parts fail, and software upgrades are needed.
With a professional server, you can swap out hard drives or power supplies while the system is on, but replacing other parts may require it to be turned off.
You also have security updates and such that require taking the server offline. Those problems are easily solved if you have multiple servers: taking one offline while the others are up shouldn’t have an impact.
But one huge issue can often be database maintenance. If you need to make changes to the database, you don’t want people writing new data during the transition. It’s just going to cause issues. So something like that could require the entire site to be taken offline while the updates are made.
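As a rough illustration, here’s a toy sketch of blocking writes during such a transition. The flag and the handler are invented for this example, not taken from any particular framework; reads can sometimes stay up, but writes are the dangerous part.

```python
# Toy sketch of a "maintenance mode" guard that refuses writes
# while the database changes are being made.

MAINTENANCE_MODE = True  # flipped on for the duration of the transition

def handle_request(method: str, path: str) -> str:
    if MAINTENANCE_MODE and method in ("POST", "PUT", "DELETE"):
        return "503 Service Unavailable: down for maintenance, try again soon"
    return f"200 OK: handled {method} {path}"

print(handle_request("GET", "/page"))      # 200: serving old data is fine
print(handle_request("POST", "/comment"))  # 503: no new writes mid-migration
```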
Think of it like a car: when you use it a lot (servers often run 24/7 all year, so they are used constantly), you’ll need to go to the garage from time to time to change the tires or maybe change the oil. Similar operations exist for servers, so when a part wears out, the server has to be shut down to replace it, just like you can’t drive a car while something on it is being changed. (Modern servers now have everything doubled, so you can change one part without shutting down everything; that’s part of why Google is never down.) This doesn’t happen with our own computers because we don’t use them all the time.

Also, progress in this sector is so fast that you quite often have to replace your servers just to stay competitive, again like the car industry: a car from the ’70s is worse in every way than one from the 2010s. It’s the same with servers, except the time scale is divided by 5 or 10.