I work in an operations center for a very large internet company. Maintenance is usually one of two things.
First, there may be a new version of the software that runs on these servers. The developers are constantly working to add new features and fix old bugs. When you deploy this new software to the servers you usually have to take each server offline for at least a short time.
Second, you might need to replace the hardware itself for either break-fix or upgrade work. If a component of a server fails, you might have to turn the server off in order to replace it, depending on the component. Alternatively, if you’re upgrading a component or the entire server to something bigger and faster, you will need to turn off the old server.
There are ways to do all of this work without taking down the overall service, so that the customer never notices that you did maintenance. But these methods are more complicated, and they require more total servers: you need enough spare capacity to take a batch offline while the remaining servers still support the live load, which costs more money.
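To make the cost trade-off concrete, here is a rough sketch of the arithmetic behind doing maintenance one batch at a time. The function names, the numbers, and the assumption that every server handles the same amount of load are all made up for illustration; real capacity planning is messier.

```python
import math

def servers_needed(peak_load, per_server_capacity, batch_size):
    """Servers required to carry peak load while one batch is offline.

    Assumes (for illustration only) that all servers are identical.
    """
    base = math.ceil(peak_load / per_server_capacity)  # enough to serve the load
    return base + batch_size                           # plus the batch being worked on

def rolling_batches(servers, batch_size):
    """Yield the servers to take offline, one batch at a time."""
    for i in range(0, len(servers), batch_size):
        yield servers[i:i + batch_size]

# Hypothetical numbers: 1000 units of load, 100 units per server, 2 offline at once.
total = servers_needed(1000, 100, 2)
fleet = [f"server-{n}" for n in range(total)]
for batch in rolling_batches(fleet, 2):
    print("offline for maintenance:", batch)
```

With these made-up numbers you would need 12 servers instead of 10, and those 2 extra servers are the added cost of never showing the customer any downtime.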