ELI5: Why is it so hard for social media companies to remove spam and bot accounts?


It's not really hard, but it consumes resources, and the spurious accounts can be counted in the user totals reported to investors.

In addition to some good technical explanations that I’m sure this thread will get, another reason is that they don’t want to.

Bot accounts are generally good for social media companies (up to the point where it becomes too obvious that they're bots). They make the platform appear more popular, and popularity is perhaps the most important metric for a social media company.

It means celebrities and influencers have more followers. It means advertisers have more reason to pay for ads. In general, it means there is more activity on that particular social network.

From my understanding, a lot of bots exist solely to save a copy of a tweet (or whatever) the second it's posted, so that if the person deletes it there is still a record.
A few years down the road, those saved tweets can be used for background, for perspective, or as evidence of truthfulness or dishonesty. Many media outlets and thousands of individuals do this. Not really a bad thing.

It is important to remember that the volume of traffic on modern websites is so large that tasks like “remove bot accounts” must be done by automated systems.

The thing is, how exactly do you identify which accounts are bots? It might be easy enough for a human to tell, but can you specify a set of unambiguous logical rules that flawlessly separates bots from humans? If you start banning real users by mistake, you can cause big problems for your business.
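
To see why mistakes matter at this scale, here is a back-of-the-envelope sketch in Python. Every number in it is made up for illustration (a hypothetical million accounts, 5% of them bots, and a detector that catches 99% of bots while wrongly flagging 1% of humans), and the function name `flagged_breakdown` is just for this example.

```python
def flagged_breakdown(total_accounts, bot_share, true_positive_rate, false_positive_rate):
    """Return (bots_caught, humans_wrongly_flagged, share_of_flags_that_are_human).

    All inputs are hypothetical rates chosen for illustration, not real platform data.
    """
    bots = total_accounts * bot_share
    humans = total_accounts - bots
    bots_caught = bots * true_positive_rate          # bots the detector correctly flags
    humans_flagged = humans * false_positive_rate    # real people flagged by mistake
    share_human = humans_flagged / (bots_caught + humans_flagged)
    return bots_caught, humans_flagged, share_human


caught, wrongly, share = flagged_breakdown(1_000_000, 0.05, 0.99, 0.01)
print(f"bots caught: {caught:,.0f}")
print(f"real users wrongly flagged: {wrongly:,.0f}")
print(f"share of flagged accounts that are real users: {share:.1%}")
```

With those made-up rates, the detector catches 49,500 bots but also flags 9,500 real people, so about 16% of the accounts it would ban belong to humans. A bigger platform means proportionally more wrongly banned users, even with the same "99% accurate" detector.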

It is a harder task than you might think. You can't just search for certain words, because nothing is really unique to spam. Repeating the same message in several places might be a clue, but you are just as likely to catch Aunt Betty posting the same Bible verse to several groups, or to trigger on someone's signature.
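
A minimal Python sketch of the naive "same message in several places" rule. The function names, the `min_groups` threshold, and the sample posts are all hypothetical; the point is only to show that the rule by itself cannot tell a spammer from Aunt Betty.

```python
from collections import defaultdict


def normalize(text):
    # Lowercase and collapse whitespace so trivial edits don't hide a repeat.
    return " ".join(text.lower().split())


def flag_repeat_posters(posts, min_groups=3):
    """posts: iterable of (account, group, text) tuples.

    Flag accounts that post the same normalized text in at least
    `min_groups` different groups (a hypothetical rule for illustration).
    """
    seen = defaultdict(set)  # (account, normalized text) -> set of groups
    for account, group, text in posts:
        seen[(account, normalize(text))].add(group)
    return {account for (account, _), groups in seen.items() if len(groups) >= min_groups}


posts = [
    ("spambot42", "cooking", "Cheap watches at example.invalid!"),
    ("spambot42", "gardening", "Cheap watches at example.invalid!"),
    ("spambot42", "knitting", "cheap watches   at example.invalid!"),
    ("aunt_betty", "church", "John 3:16 For God so loved the world"),
    ("aunt_betty", "family", "John 3:16 For God so loved the world"),
    ("aunt_betty", "book_club", "John 3:16 For God so loved the world"),
]
print(sorted(flag_repeat_posters(posts)))  # ['aunt_betty', 'spambot42']
```

Both accounts get flagged: the rule sees identical behavior and has no way of knowing that one is advertising watches and the other is sharing a verse with her friends.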

Remember that it is also a continual competition between the spammers and the providers. As soon as the providers find a filter that works, the spammers are off finding a way around it. They can add randomized garbage to their posts so that the copies aren't easily recognized as the same message. They can even copy parts of other people's posts to blend in.
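
Here is a rough Python sketch of why padding with garbage beats exact matching, and of one generic countermeasure: comparing posts by their overlapping three-word sequences ("shingles") using Jaccard similarity instead of requiring an exact match. This is a standard near-duplicate technique, not something any particular platform is known to use; the junk words, sample posts, and the 0.5 threshold are assumptions for illustration.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Similarity of two sets: size of intersection / size of union (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.5):
    # Hypothetical cutoff: posts sharing most of their shingles count as "the same message".
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


spam = "limited offer buy cheap watches today at our online store with free shipping"
# Two copies padded with different junk words, the way a spammer might do it.
copy_1 = spam + " qzxv plmot wruek"
copy_2 = spam + " bnaye tkdsl oxcvu"
unrelated = "does anyone have a good recipe for sourdough bread without a starter"

print(copy_1 == copy_2)                                  # False: exact match fails
print(round(jaccard(shingles(copy_1), shingles(copy_2)), 2))  # 0.65
print(is_near_duplicate(copy_1, copy_2))                 # True
print(is_near_duplicate(copy_1, unrelated))              # False
```

The arms race continues from here: a spammer who pads each post with more junk, or splices in chunks of other people's posts, drags the score below whatever threshold you pick, while lowering the threshold starts catching unrelated human posts that happen to share common phrases.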

They are trying to find a needle in a haystack and the needles are actively hiding.