These systems are, as you may have guessed, created to help tell robots and real humans apart. Back when the internet was first created, there wasn’t really a worry that whoever was interacting with a website would be anything other than human. But as the internet became more popular and more interactive websites started popping up, people figured out that it was really easy to write a computer program that interacts with a website automatically.
This became a huge deal once websites started letting you post comments, send messages, and more. The internet turned into a battleground: website owners with comment sections or account sign-ups had to fight the people writing programs to abuse them. It eventually became clear that the website owners were losing, so somebody needed to build a system that could tell whether the thing interacting with your website was a computer program or a human.
This led to the creation of the “Completely Automated Public Turing test to tell Computers and Humans Apart”, or CAPTCHA. The theory was basically that people are pretty good at reading text that has been distorted in some way, but computers were *really* bad at it. So you generate a string of random letters and numbers, pass it through a set of filters that make the letters all wavy, and show the resulting image to whoever (or whatever) is trying to use your website, asking them to type the letters back. If they type it correctly, they’re probably a human and their submission goes right on through. If they get it wrong, they’re probably a robot and the submission gets blocked.
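If it helps to see it as code, here’s a toy sketch of that flow in Python (using the Pillow imaging library; the sizes, font, and distortions are made up for illustration and nowhere near as sophisticated as a real CAPTCHA):

```python
import random
import string

from PIL import Image, ImageDraw, ImageFilter, ImageFont  # Pillow library

def make_captcha(length: int = 6) -> tuple[str, Image.Image]:
    """Generate a random code and a lightly distorted image of it."""
    code = "".join(random.choices(string.ascii_uppercase + string.digits, k=length))

    img = Image.new("RGB", (200, 70), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # a real site would use a bigger TrueType font

    # Draw each character at a slightly random height so the text isn't uniform.
    for i, ch in enumerate(code):
        draw.text((15 + i * 30, 20 + random.randint(-8, 8)), ch, fill="black", font=font)

    # Add noise lines and a blur as a crude stand-in for the "wavy" distortion.
    for _ in range(8):
        draw.line(
            [(random.randint(0, 200), random.randint(0, 70)),
             (random.randint(0, 200), random.randint(0, 70))],
            fill="grey",
        )
    img = img.filter(ImageFilter.GaussianBlur(radius=1))

    return code, img  # the server stores `code` and sends only `img` to the visitor

# The server then compares whatever the visitor types against the stored code:
code, img = make_captcha()
guess = "ABC123"  # whatever the visitor typed
is_probably_human = guess.upper() == code
```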
But then an up-and-coming company called Google, which already let you search most of the internet, got hold of a huge number of books and wanted to digitize them so they could be easily searched by computer. Since computers were pretty decent at reading printed text, they could convert almost all of the words in the scanned books into text a computer could understand. But the conversion process wasn’t perfect, and there were words the computers couldn’t read. Google’s engineers noticed that the distortions in a CAPTCHA looked a lot like the smudged, hard-to-read words from their book scans and said “Let’s make this better.” So they bought reCAPTCHA, a company built around exactly this idea, and made a few modifications.
From then on, reCAPTCHA would show you two words: one where the system knew exactly what the word was, and one that was actually a picture of a word from a book they were trying to scan and weren’t sure about. If you got the known word right, you weren’t a robot, and your answer to the second word didn’t matter for passing. But behind the scenes, Google was collecting those answers, and once enough people agreed on what a particular unknown word said, that reading could be fed back into the book-digitization process.
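The bookkeeping behind that is conceptually pretty simple. Here’s a rough sketch of what it might look like (the names, the consensus threshold, and the data structures are my own guesses, not Google’s actual system):

```python
from collections import Counter, defaultdict

# Guesses collected so far for each unknown (scanned) word image, keyed by image id.
votes: dict[str, Counter] = defaultdict(Counter)
CONSENSUS = 3  # how many independent visitors must agree before we trust a reading

def grade_challenge(known_answer: str, typed_known: str,
                    unknown_image_id: str, typed_unknown: str) -> bool:
    """Return True if the visitor passes; quietly record their guess for the unknown word."""
    if typed_known.strip().lower() != known_answer.lower():
        return False  # failed the word we *do* know about -> treat as a robot

    # They proved themselves on the known word, so their reading of the
    # unknown word is worth recording as a "vote".
    votes[unknown_image_id][typed_unknown.strip().lower()] += 1
    return True

def resolved_reading(unknown_image_id: str) -> str | None:
    """Return the agreed-upon text for a scanned word, once enough people concur."""
    counter = votes[unknown_image_id]
    if not counter:
        return None
    word, count = counter.most_common(1)[0]
    return word if count >= CONSENSUS else None
```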
But because those answers were being used to teach computers to read distorted text, computers eventually got good at solving that kind of CAPTCHA, so Google had to think of something else. Self-driving car technology was starting to be built around that time, computers still weren’t very good at recognizing what they were looking at, and Google had a massive database of billions of photos of roads around the world. So they proposed a new test: click on the road sign in this image, or select the crosswalk, or the traffic lights, or the motorcycle, or some other thing a self-driving car would want to recognize. For some of the image tiles they already knew the answer, and those were used to decide whether you were human; but mixed in were tiles they *didn’t* know the answer to, and your clicks on those were collected. So this was again used to improve how computers see the world.
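Grading that kind of image grid could work along the same lines, something like this sketch (again, the tile layout and helper names are invented for illustration):

```python
# Recorded clicks on the "probe" tiles, i.e. the ones nobody has a label for yet.
labels: dict[int, list[bool]] = {}

def record_label(tile_id: int, clicked: bool) -> None:
    labels.setdefault(tile_id, []).append(clicked)

def grade_grid(selected: set[int], known_positive: set[int],
               known_negative: set[int], probe_tiles: set[int]) -> bool:
    """Pass the visitor only if they got every tile we already know about right."""
    missed = known_positive - selected    # known traffic-light tiles they didn't click
    wrong = selected & known_negative     # tiles they clicked that we know are wrong
    if missed or wrong:
        return False
    # They behaved like a human on the known tiles, so their clicks (and non-clicks)
    # on the probe tiles are recorded as training labels for image-recognition models.
    for tile in probe_tiles:
        record_label(tile, tile in selected)
    return True
```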
But because that information was used to make computers good at identifying objects on the road, computers got good at solving those CAPTCHAs, too. So there’s a newer test: since the sites are collecting all of your information anyway, why not look at the differences between the data a computer program produces and the data a real person produces? The newest type of CAPTCHA uses signals that aren’t publicly disclosed: things like how you moved your mouse before ticking the “I’m not a robot” checkbox, how big your screen is, what web browser you’re using, how up to date your device is, and so on. Exactly how they decide who’s a robot and who’s a human right now is genuinely a mystery. Eventually computers will probably get good at beating the current system too, and then they’ll come up with another one. It’s a cat-and-mouse game.
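Nobody outside Google knows how the scoring really works, but conceptually it’s a risk score built up from lots of little signals. A deliberately made-up toy version might look like this (every signal and weight here is invented):

```python
def risk_score(signals: dict) -> float:
    """Combine a few behavioral signals into a rough 0-1 'probably a bot' score.
    The signals and weights are purely illustrative; real systems weigh far more
    data and keep the details secret."""
    score = 0.0
    # A mouse path with almost no recorded points (a perfectly straight jump to
    # the checkbox) looks more scripted than a wobbly human one.
    if signals.get("mouse_path_points", 0) < 5:
        score += 0.4
    # Headless browsers often report odd screen sizes or telltale user agents.
    if signals.get("screen_width", 0) < 200:
        score += 0.3
    if "HeadlessChrome" in signals.get("user_agent", ""):
        score += 0.3
    return min(score, 1.0)

# The site then decides: a low score passes silently, a high score gets a harder puzzle.
signals = {"mouse_path_points": 2, "screen_width": 1920, "user_agent": "Mozilla/5.0 ..."}
needs_extra_challenge = risk_score(signals) > 0.5
```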
All of this is also protected by a TON of patents, and Google doesn’t exactly want its competitors using technology it owns the patents for. So alternatives to CAPTCHA were created: sliding a puzzle piece across the screen, clicking on the bananas, rotating a picture until it’s the right way up, or just plain typing out a distorted word like before. These have varying levels of success, but from a website owner’s point of view, if someone is determined to bypass your “not a robot” check you probably have bigger problems, so you use what you can.
All of this exists just to keep automated programs from using websites to send spam, create thousands of fake accounts, spread viruses, or do whatever other bad things people want to do with well-known websites!