If a social media platform is running smoothly, but the engineers leave, why can’t a platform continue to run on autopilot?


I guess this is applicable to any social media platform or other similar systems. Is it because there are always bugs to address, so it’s never really running smoothly, or other reasons?


38 Answers

Anonymous 0 Comments

In theory, if the code was perfect, it could run on autopilot (outside of content moderation). Perfect code is a goal, but not a likely occurrence. Sometimes the silliest things break your code, and even though it only happens in this somewhat unlikely situation, you still need to fix it.

Even with perfect code, vulnerabilities are being discovered and patched (regarding the underlying language or libraries used from outside sources). Sometimes you discover vulnerabilities within your own code that need to be fixed. Any time you update something, you potentially break it. It’s not the same as updating your phone, although in a perfect world it probably would be.

Anonymous 0 Comments

If my car is running fine now, why might I need a mechanic later?

Anonymous 0 Comments

Technology isn’t perfect. It has errors, flaws, and issues, even ones as simple as not being as efficient as it could be. So people are constantly trying to fix those, so that things work better, or continue working even when other people try to attack them.

Those little changes interact with each other, and affect each other. When you have complicated systems, there are loads of interactions that need to be monitored and maintained. A little change in system x, maintained by Anna at MegaCorp, means another change is needed in system y, maintained by Bob at AwesomeSoft. When Charlie fires Bob, and Anna rolls out a little fix the next day, system y falls over even though nothing in it, or in anything else AwesomeSoft runs, has changed.

That’s in addition to the simple constant maintenance, e.g. hard drives filling up with all the new data being added.
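To put a rough number on the "hard drives filling up" point, here is a minimal sketch. The capacity, usage, and growth figures are invented for illustration and the function name is hypothetical; real capacity planning would use measured growth rather than a flat rate.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Estimate how many days remain before a disk fills up.

    Assumes data grows at a constant rate, which is a simplification.
    """
    if growth_gb_per_day <= 0:
        raise ValueError("growth must be positive for the disk to fill up")
    free_gb = capacity_gb - used_gb
    return free_gb / growth_gb_per_day


# Made-up example: a 1000 GB disk that is 700 GB full and grows 15 GB a day.
remaining = days_until_full(capacity_gb=1000, used_gb=700, growth_gb_per_day=15)
print(f"Disk is full in about {remaining:.0f} days if nobody intervenes")
```

Someone still has to notice that number shrinking and add storage or clean up old data before it reaches zero.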

Anonymous 0 Comments

Unique conditions with million-to-one odds happen several times a day when you’re processing billions of user requests and trying to squeeze every fraction of a percent of efficiency out of your systems. Saving 1/1000 of a penny on something can mean a million dollars a year in extra profit.
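The arithmetic behind that can be sketched quickly. The traffic figure below (2 billion requests a day) is an assumption picked for illustration, not a number from any real platform, and the function names are hypothetical.

```python
def rare_events_per_day(requests_per_day, one_in):
    """Expected daily occurrences of a condition with 1-in-`one_in` odds."""
    return requests_per_day / one_in


def yearly_saving_usd(requests_per_day, thousandths_of_a_cent_saved):
    """Dollars saved per year when each request gets slightly cheaper."""
    requests_per_year = requests_per_day * 365
    # 100 cents per dollar * 1000 thousandths per cent = 100_000 per dollar
    return requests_per_year * thousandths_of_a_cent_saved / 100_000


ASSUMED_REQUESTS_PER_DAY = 2_000_000_000  # illustrative assumption only

print(rare_events_per_day(ASSUMED_REQUESTS_PER_DAY, 1_000_000))
print(yearly_saving_usd(ASSUMED_REQUESTS_PER_DAY, 1))
```

Under that assumed traffic, a "million to one" bug shows up about 2,000 times a day, and shaving 1/1000 of a cent off each request is worth several million dollars a year.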

Anonymous 0 Comments

Trying to explain it in simpler words.

Big websites like Twitter or Google are a bit like big cities: very complex, constantly changing systems made up of many simpler systems. Think about all the roads, water and electricity facilities, but also museums, police, schools, trash collection and so on. In order for the city to “work”, meaning to be a place where people want to live and easily can, all of those systems need to be working at all times to some level. Streets need to allow for deliveries, and for people to move around. Trash cans need to be collected, electricity needs to work, etc.

Any of those systems can fail, sometimes for unexpected reasons: lightning strikes a local power plant, or a change in policy causes all the garbage collectors to go on strike. If trash is not collected for some time, the city eventually starts to stink and becomes unpleasant. If it gets worse and trash piles up on the streets, some roads may get blocked. If roads get blocked, people and deliveries cannot get around. The longer it takes to resolve, the worse it gets. So one failure can trigger another.

Some failures will have a bigger impact on the city than others. All museums closing for a week would mostly be an inconvenience. But with no electricity there would probably be chaos, Armageddon, possibly many people dying. And again, you can imagine one failing system pulling others down, and the longer they are down, the worse it gets.

So you want to be able to fix things quickly. In a city, that would be the responsibility of the management of the specific city companies, probably together with the city government, and likely with a set of people whose only job is handling unexpected problems like that.

Now, coming back to computer systems. Each of the city’s systems corresponds to what is called a microservice in computing: the same program running on one or more servers. Microservices are also interrelated, and one failing often pulls another down. They also need some common infrastructure to work. In a city it’s roads, sewers, etc.; in computing it’s network and power. Each microservice is usually owned by a team that takes care of it, the same way city companies have management. Each team will usually own more than one microservice, which means that even small companies have tens of them, and larger ones hundreds, thousands, or more, depending on company size. Twitter likely has somewhere in the high hundreds of them.

Now, what has happened at Twitter in the last week is basically 90% of the city companies’ management quitting all at once. There is almost no one left who knows that a pipe under the main square is about to burst, or that the electricity will start failing in parts of the city unless it is checked weekly. There are also not many people left to coordinate fixes if something breaks. And even if they are around, chances are they have no knowledge of the specific thing that broke, and without that a fix will take days or weeks. By which point the city may be in flames, with people escaping in droves.
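To make "one failing often pulls another down" concrete, here is a minimal sketch. The service names and the dependency map are entirely hypothetical, and the sketch assumes a service stops working as soon as anything it depends on is down; real systems often degrade more gracefully than that.

```python
from collections import deque

# Hypothetical map: each service lists the services it needs in order to work.
DEPENDS_ON = {
    "network": [],
    "auth": ["network"],
    "timeline": ["auth", "network"],
    "notifications": ["timeline"],
    "ads": ["timeline"],
    "image-resizer": ["network"],
}


def affected_by(failed, depends_on):
    """Return the set of services that go down when `failed` goes down."""
    # Invert the map so we can look up who relies on a given service.
    dependents = {name: [] for name in depends_on}
    for service, needs in depends_on.items():
        for dependency in needs:
            dependents[dependency].append(service)

    # Breadth-first walk outward from the failed service.
    down = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents[current]:
            if service not in down:
                down.add(service)
                queue.append(service)
    return down


print(sorted(affected_by("auth", DEPENDS_ON)))
print(sorted(affected_by("image-resizer", DEPENDS_ON)))
```

In this toy map, losing `auth` also takes out `timeline`, `notifications`, and `ads`, while losing `image-resizer` affects nothing else. Knowing which case you are in is exactly the kind of knowledge that lives with the owning team.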

Anonymous 0 Comments

Imagine a Swan. It glides through the water because its feet are powering away unseen under the water. If the feet stop paddling, then the Swan will stop too.

Anonymous 0 Comments

Physics alone keeps the plates spinning for a bit if the performer walks away, but sooner or later gravity wins.

Anonymous 0 Comments

Something else to consider is the ability to be agile in addressing issues or to put time into innovation. With a skeleton crew you’re “running smoothly,” but there’s no one working on new features or products while your competition is putting more effort in. It’d be a good way to get left behind.