If a social media platform is running smoothly, but the engineers leave, why can’t a platform continue to run on autopilot?

I guess this is applicable to any social media platform or other similar systems. Is it because there are always bugs to address, so it’s never really running smoothly, or other reasons?

38 Answers

Anonymous 0 Comments

Operating systems update. The app needs to update along with them or it won’t work. Plus security updates are needed. At a minimum.

Anonymous 0 Comments

Because, like a plane, it will run into problems, both naturally and from surrounding conditions. If you don’t keep the entire thing maintained, the wrong problem left unchecked can completely break it apart.

Anonymous 0 Comments

A site like Twitter is not fully self-contained. It uses many (probably thousands) third-party libraries. These libraries are constantly being updated for new features, security fixes, stability, etc.

That means you need to frequently update your app to at the very least use the new libraries. Not doing so won’t break it right away, but sooner or later (hint: usually sooner) there will be a breaking change such as an older version being deprecated, or a field name being changed, that requires you to not only update the library you tell your program to use, but to make some changes internally as well.
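To make the "field name being changed" case concrete, here is a minimal sketch. Everything in it is made up for illustration: `get_profile_v1`, `get_profile_v2` and the field names do not come from any real library. It only shows how app code written against an old version stops working after an upgrade until someone edits it.

```python
# Hypothetical example: none of these names come from a real library.

def get_profile_v1(user_id):
    """Old version of an imaginary third-party library."""
    return {"id": user_id, "user_name": "alice"}


def get_profile_v2(user_id):
    """New version: the 'user_name' field was renamed to 'username'."""
    return {"id": user_id, "username": "alice"}


def render_greeting(get_profile, user_id):
    """App code written against v1; it assumes the old field name."""
    profile = get_profile(user_id)
    return "Hello, " + profile["user_name"]


def render_greeting_fixed(get_profile, user_id):
    """The same app code after an engineer updates it for v2."""
    profile = get_profile(user_id)
    return "Hello, " + profile["username"]


# Old app code with the old library: works.
print(render_greeting(get_profile_v1, 1))

# Old app code with the new library: the lookup fails.
try:
    print(render_greeting(get_profile_v2, 1))
except KeyError as missing:
    print("Broke after the upgrade, missing field:", missing)

# Updated app code with the new library: works again.
print(render_greeting_fixed(get_profile_v2, 1))
```

The middle case is the one that needs a human: swapping in the new library is not enough, the code that calls it has to change too.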

Plus anything running at the scale of Twitter has a whole lot of infrastructure supporting it, usually in the cloud, that requires specific types of engineers (DevOps, DevSecOps, etc).

Anonymous 0 Comments

Who keeps it up to date with new hardware and software? The whole rest of the internet will continue to move forward. How long until their app no longer works on phones, or their website displays disjointedly on modern browsers?

What happens when some little thing goes wrong, as is often the case with computers, and nobody’s there to fix it?

Anonymous 0 Comments

In my experience with IT, it’s rare to have a completely uneventful day.
– Hardware goes down
– Networks stop responding
– Software becomes obsolete
– Operating Systems need to be patched

There are certain things that you’ll be able to keep working for a while. Then it gets to a point where other employees can find a workaround without having to get to the guts of the server room….

But at some point the workarounds create a drain on productivity, then just stop working altogether.

Sometimes things can be fixed just by doing a reboot, but that’s not always easy.

I work for a small company with fewer than 100 office workers, and doing a complete reboot can easily take 30 minutes.

Some things will automatically start working again. For others, you’ll have to manually log into a part of the system and force things to start back up.

Plus, a system is only as reliable as its least experienced user…. people open e-mails with viruses, leave passwords unsecured, forget passwords…. With an average user running things on autopilot, things break very easily.

Anonymous 0 Comments

The site is running smoothly *because* all the staff are constantly doing things. And it’s not just the engineers. Moderators are removing bad content, lawyers are responding to requests from governments, project managers are making sure projects run on time, and accounting staff are paying all the bills.

It’s like saying “this hotel is running very smoothly. Why would it matter if 80% of the staff left?” It’s the constant, almost invisible effort of the humans that keeps it going. Sure, the building isn’t going to fall down. But there’s not going to be enough staff left to wash and change the sheets, make guest keys, change the air filters, start the giant coffee pots in the morning, receive deliveries of soap, or pay the electric bill.

There’s a whole class of people called Site Reliability Engineers (SREs) whose whole job is to keep large websites working. Here’s a very fascinating thread from an experienced SRE just listing all the ways a large tech company can collapse:

Anonymous 0 Comments

Every system, whether digital or physical, requires routine maintenance to ensure all its features are functional. That’s where engineers and technicians come in: they’re the ones who check and maintain their respective components in the system.

In addition to maintenance, the system also needs to be updated regularly to maintain cross compatibility with other systems.

So in the context of social media platforms, routine maintenance may be for stuff like the hardware that holds account information, media files, etc. or for UI interactions on different platforms.

And updates could be stuff like OS compatibility (especially for mobile apps that require optimisation for multiple OSes), addition of new features, or fixing of bugs.

These things are not something that can be fully automated, if at all.

(I do engineering work in a different field, so I’m not sure how accurate this info is with regard to digital infrastructure and systems, but it should be similar enough.)

Anonymous 0 Comments

A hard drive fills up. That can crash a server. And take down any services that rely on that server.
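As a rough illustration of the kind of check someone has to set up and then act on, here is a small sketch using Python's standard `shutil.disk_usage`. The 90% threshold and the function names are my own assumptions, not anything a particular company uses.

```python
import shutil


def is_nearly_full(used_bytes, total_bytes, threshold=0.9):
    """Return True if the used fraction is at or above the threshold.

    The 0.9 default is an arbitrary value chosen for this sketch.
    """
    if total_bytes == 0:
        return False  # nothing sensible to report for an empty filesystem
    return used_bytes / total_bytes >= threshold


def check_disk(path=".", threshold=0.9):
    """Look up real disk usage for the drive holding `path` and apply the check."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return is_nearly_full(usage.used, usage.total, threshold)


if check_disk("."):
    print("Disk is nearly full: someone needs to free space or add storage.")
else:
    print("Disk usage looks fine for now.")
```

The script can raise the alarm, but it can't decide which files are safe to delete or order more storage. That part still needs a person.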

That’s just one example of a small failure that if left unchecked degrades the system. Enough small failures and you start to have reliability issues across the system. It starts as a few things slowing down or not functioning until cascading failures bring the whole thing down.
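Here is a toy model of how one small failure can spread. The service names and the dependency map are invented, and it assumes the worst case where a service goes down as soon as anything it depends on is down. Real systems usually have redundancy that softens this.

```python
def cascade(depends_on, initially_failed):
    """Return every service that ends up down.

    depends_on maps a service name to the list of services it needs.
    Simplifying assumption: a service fails if ANY of its dependencies failed.
    """
    failed = set(initially_failed)
    changed = True
    while changed:  # keep sweeping until no new failures appear
        changed = False
        for service, deps in depends_on.items():
            if service not in failed and any(dep in failed for dep in deps):
                failed.add(service)
                changed = True
    return failed


# Invented example system.
depends_on = {
    "database": ["storage_server"],
    "timeline": ["database"],
    "login": ["database"],
    "website": ["timeline", "login"],
    "status_page": [],
}

# One storage server dies (say, its disk filled up).
print(sorted(cascade(depends_on, {"storage_server"})))
```

In this made-up setup, the one dead storage server takes the database, login, timeline and website down with it. Only the status page, which depends on nothing, stays up.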

Anonymous 0 Comments

So when the platform was first being created, the developers had to make a bunch of tradeoffs in order to meet deadlines and solve immediate issues. The price they paid was code that would create problems down the road and require additional workarounds. A lot of the code that is still in the codebase is this legacy code. The engineers know about these problems and can anticipate when they are going to become a real issue. Without the engineers, the platform can run okay for a little while, but the built-in problems will eventually compound and it will crash.

Anonymous 0 Comments

Lots of reasons but I’ll give you 3.
1. Day-to-day fires. Projects at Twitter scale stress the limits of systems in different ways based on lots of factors, and you need people around to adjust for those changes.
2. Security and privacy. Twitter is now a massive hacking target for bad actors around the world. If no engineers are around, it becomes an even easier target.
3. Tribal knowledge. Knowing how a system behaves and all of its idiosyncrasies, how systems work together, why decisions were made in the past, what lessons were learned on the way: all of these things are more important to running a system than the bits.