ELI5: How do CAPTCHAs know the correct answer to things, and beyond verification, what is their purpose?


I have heard that they are used to train AI and self-driving cars and whatnot, but if that’s the case, how do they know the right answers to things? If they need to train AI to know what a traffic light is, how do they know I’m actually selecting traffic lights? And could we just collectively agree to only select the top right square over and over, and would their systems eventually start to believe that this was the right answer? Sorry, this is a lot of questions.


Well, the AI is already pretty well trained for the captchas – they are just refining rather than building from scratch. So, for example, maybe _one_ of those images or words is confusing to the AI and _that_ is the one getting trained but the others are all known.

Regardless, they aren’t actually verifying you based on your answers; they are tracking your mouse movements to make sure there is enough noise in the data to ensure you are human. That is what verifies you, not your answers.

If you’re looking at one of those picture grids where it wants you to do something like picking all the traffic lights, then you have 9 pictures to start with.

There’s at least 1 picture that it definitely knows has a traffic light.

There’s at least 1 picture that it definitely knows doesn’t have a traffic light.

Then there are up to 7 pictures that it isn’t sure whether or not they have traffic lights.

When you make your selection, the system is making sure you selected the positive control, making sure you didn’t select the negative control, and assuming those are correct, it passes your CAPTCHA, and it also adds the data about the unknown pictures that you entered.
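The control check described above can be sketched roughly like this. It is only an illustration under stated assumptions: the function name `grade_grid`, the 0 to 8 tile numbering, and the returned dictionary are made up for the example, not how any real CAPTCHA service is implemented.

```python
def grade_grid(selected, positive_controls, negative_controls, tile_count=9):
    """Return (passed, votes) for one picture-grid attempt.

    All arguments are sets of tile indices. The names and the 0..8
    numbering are hypothetical, chosen only for this sketch.
    """
    selected = set(selected)
    # Pass only if every known-positive tile was picked
    # and no known-negative tile was picked.
    passed = positive_controls <= selected and not (negative_controls & selected)
    if not passed:
        return False, {}
    # For the tiles the system is unsure about, record what this user said.
    controls = positive_controls | negative_controls
    votes = {tile: tile in selected
             for tile in range(tile_count) if tile not in controls}
    return True, votes


# Tile 0 is a known traffic light, tile 8 is known not to be one.
passed, votes = grade_grid({0, 4}, positive_controls={0}, negative_controls={8})
print(passed, votes)
```

Note that in this sketch a failed attempt contributes no labels at all, which is one simple way to keep obviously wrong answers out of the collected data.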

The whole “traffic light CAPTCHA being used to train AI cars” is actually a myth, at least with respect to Google and Waymo. They have explicitly refuted the idea that they’re using CAPTCHA data to train automated cars.


Recently I had one where it asked me to identify all the “cabs”. 2-3 of the images clearly had a cab, which I selected. One image had a yellow car that was not a cab. The captcha continued to fail until I selected the yellow car as well, even though it was clearly not a cab.
I felt much worse than I probably should have about providing wrong information.

I did m-turk for a while. One of the common things you got paid for was to do the pictures for the captcha. They would ask you to select the ones with cars or traffic lights or whatever.

I’m relatively sure those captcha things don’t actually use AI at all and just rely on you answering the tiles it knows are correct.

Captcha is not really selling an image-recognition product. The whole purpose of a captcha is to stop automated inquiries while still allowing humans to navigate in a natural flow. It’s not an IQ test either.

The images you select, or puzzles you complete, are only the simple, visible part of it.

Captchas actually look at a lot more data. They capture some mouse movements, keystrokes, the last page visited, how you entered the website, your browser, attempts made, and some other super secret information.

You see the puzzles because you have a monitor, but a computer doesn’t actually need a monitor, or a display, to browse through the internet. A bot can successfully complete the captcha and still be denied entry to a website.

Captcha is an umbrella term for a variety of tests to identify whether the answering party is a human, hence the name: Completely Automated Public Turing test to tell Computers and Humans Apart.

The first captcha tests were randomly generated characters. Since the computer generated the answer and kept it on the server side, it was assumed the answer could not be stolen, only answered. But computers and developers are good at solving problems. They used character recognition tools to solve them. Then it became an arms race: they started warping text, using math questions, etc. In all cases the computer randomly generated an answer and a question to go with it. The only thing the server needed to do was check your answer.
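A minimal sketch of that old-style flow, assuming a six-character code and case-insensitive matching (both are choices made for this example, and `new_challenge` / `check_answer` are hypothetical names). Rendering the code as a warped image is left out.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Generate the answer on the server. Only a distorted picture of it
    would be sent to the browser; the text itself stays server-side."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, submitted):
    """All the server has to do is compare the reply to what it generated."""
    return submitted.strip().upper() == expected


answer = new_challenge()
print(check_answer(answer, answer.lower()))  # a correct reply, typed in lowercase
```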

Then someone got a clever idea: people will try to answer as best they can, so we should start asking questions that we don’t know the answers to. Character recognition is not bulletproof, so ask about the words we are not sure of. If enough people say that a word is “triangulation”, the computer will use this information to improve future recognition performance. This is called blind entry, where multiple people are asked to identify the same thing without knowing what the others answered, and it has been in use for data entry tasks. Captcha is a way to utilize free labor.
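Here is a rough sketch of how blind-entry answers could be combined. The thresholds (`min_votes`, `min_share`) and the function name `consensus` are assumptions made up for the example; real systems do not publish theirs.

```python
from collections import Counter


def consensus(answers, min_votes=3, min_share=0.75):
    """Return the agreed-upon word, or None if people have not agreed yet.

    Each answer comes from a different person who could not see the others.
    """
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough independent answers yet
    word, count = Counter(normalized).most_common(1)[0]
    return word if count / len(normalized) >= min_share else None


print(consensus(["triangulation", "Triangulation ", "triangulation", "triangulaton"]))
```

With these made-up thresholds, one person typing nonsense does not change the result, but a word only gets accepted once a clear majority of independent answers match.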

Today we are using pictures because we are done with words (probably). Another computing term is computer vision, where we process images to extract information: find barcodes, read text, identify an object or plant, do face ID. Computer vision systems also use recognition models, the most common being neural networks. A neural network is a very complex system that you train by giving it thousands of taxi images and telling it “hey, if you see something like this, say it is a cab.” Then you feed it pictures of other cars and birds and planes. When you feed it a new picture, the system will say it looks 60% like a car, but also 35% like a boat. The computer will find some pictures very confusing, but it will provide a probability for each object type it learned before.
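To make the “60% car, 35% boat” idea concrete: a common way for a classifier to turn its raw scores into percentages is the softmax function. The sketch below assumes that approach and uses invented scores; the comment above does not say which method any particular captcha system uses.

```python
import math


def softmax(scores):
    """Turn raw per-label scores into probabilities that add up to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Invented raw scores for one confusing picture.
probs = softmax({"car": 2.0, "boat": 1.5, "bird": -1.0})
for label, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {p:.0%}")
```

A picture where the top two probabilities are close together is exactly the kind of “confusing” image that is worth showing to humans.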

Now you see the pattern: for training, people need pictures of objects and the names of those objects. To get this data you need people to identify them; this is where we come into the picture, literally.

The computer will select some pictures it is sure of and some it is not, and use us as data entry operators for blind entry.

With the two-word ones that used to be more common before the select-a-picture ones, there was one easy-to-read word and one difficult-to-read word. The easy one was the control word, so you could just answer that one correctly and put whatever you wanted for the difficult one.

An important thing about Captcha the top answer didn’t cover: you are training artificial intelligence. The reason you are getting fire hydrants and bicycles in your captcha grids is because you’re training self-driving car software. Before that it was spelling out what words an image contained. Same thing; you were training image-to-text AI.

Most of the data in CAPTCHAs has already been verified by humans in control runs. So that grid will have a reference in a database that essentially says “Correct Panels: 1, 3, 5”.

What you do as a human that helps AI train is contribute your results as error metrics. “Even humans get this wrong” is a great help to AI, since it can then be taken in as a somewhat acceptable parameter. Let’s say the answers are 1, 3, 5, and 7, but 95+% of humans only mark 1, 3, and 5.

That now becomes a passing result for an AI as well. It will still try to get 7, but remember, humans also fail that particular piece, so if the AI misses it, it’s not counted as part of the error.

Top answer is correct, but omits some critical information. After all, some Captchas ask you to simply check a box. Asking you to identify the correct images is only half the puzzle.

In the background, it also checks HOW you select the pictures. Computers being robotic, and humans being.. well… humans, we both have very different ways of clicking on things.

A very good example is the timing. Computers generally measure time in milliseconds. There are 1000 milliseconds in a second. If I ask you to click on five objects, the number of milliseconds between each click would vary greatly. 500…295…106…952…431.. roughly half a second on average, but all over the place.

Computers have very structured processes. They almost always complete the same action in almost the exact same time (specific to the actual computer, how fast it can generally do things, and how much else its trying to do at the same time). If I was to ask a computer to click on five objects, the milliseconds between would look more like 50… 80… 30… 70…100. They still vary.. but nowhere near as much as a human.

Yes, in this case you could tell the computer to wait a random time between each click, but there are many other details about the way they click that give them away as computers.
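Just to illustrate the timing idea with the numbers above, here is a toy check that looks at how spread out the gaps between clicks are. The 100 ms threshold and the name `looks_human` are invented for the example, and as already said, a bot adding random waits would beat a check this simple, so real systems must look at much more.

```python
from statistics import stdev


def looks_human(gaps_ms, min_spread_ms=100.0):
    """Toy heuristic: human click gaps vary a lot, scripted ones barely do.

    gaps_ms is the list of milliseconds between consecutive clicks.
    The threshold is an assumption for this sketch, not a real value.
    """
    if len(gaps_ms) < 2:
        return False  # not enough data to measure any spread
    return stdev(gaps_ms) >= min_spread_ms


human_gaps = [500, 295, 106, 952, 431]  # from the example above
bot_gaps = [50, 80, 30, 70, 100]        # from the example above

print(looks_human(human_gaps), looks_human(bot_gaps))
```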

We don’t know the full scope of this. If we did, it would be that much easier to make a bot that could fool the system, so companies will not tell you the exact details.

TLDR; They look at the finer details of your mouse clicks (how long it takes between each click as a basic detail, for example), and computer vs human input is very, very different. They still check the right pictures, as others have said, but that’s only half of it. We live in a world of machine learning. Computers can tell which pictures have traffic lights in them pretty easily.

They don’t. They examine your mouse movement and response times to verify you aren’t a robot.

Why did they stop being letters? Were algorithms developed that could defeat the letter captcha?

There are a few types of captcha, but I’m going to explain the modern and familiar one from your example, with the traffic lights.

Imagine it’s your job as a human to decide if I am a robot. We are in the same room. You have some pictures, some of which you know are correct, some of which you know are not, and some of which you don’t know.

You show me the pictures and I get the ones that are right, don’t choose the wrong ones and choose some of the unknown ones.

Because I got the right ones right and didn’t choose the wrong ones, I am pre-qualified. You now have to decide if you think I’m human based on what you saw while you watched me make the decisions. If I was made of shiny metal and stiff-armed and jerky, moving like C-3PO, you know I’m a robot. If I look pretty human but am still stiff, you might be suspicious as well. You then either let me go, or give me another chance.

Aside from verification, when someone takes a captcha, whether they were cleared as being human, the choices they made, and how they behaved are all recorded. How they behaved is used to train the program that watches the person to see if they look like C-3PO. The answers to the traffic lights and other objects are collected as datasets, which are used to help further research into computer-based learning, as well as for AIs that are used to identify road-based features.

Sometimes they don’t know. I had one the other day that kept rudely telling me to click all the buses. I had clicked all the buses. I triple checked. It still wouldn’t let me progress. It definitely thought there was at least one more bus and I was a moron. There wasn’t. I had to click the refresh to get different pictures. I now live in fear that one day I will be declared a robot by a robot with no right of appeal.