eli5: How do CAPTCHAs know the correct answer to things, and beyond verification, what is their purpose?

I have heard that they are used to train AI, self-driving cars, and whatnot, but if that’s the case, how do they know the right answers? If they need to train AI to know what a traffic light is, how do they know I’m actually selecting traffic lights? And could we all collectively agree to only select the top-right square over and over, so that their systems would eventually start to believe this was the right answer? Sorry, this is a lot of questions.

16 Answers

Anonymous 0 Comments

Well, the AI is already pretty well trained for the captchas – they are just refining rather than building from scratch. So, for example, maybe _one_ of those images or words is confusing to the AI and _that_ is the one getting trained but the others are all known.

Regardless, they aren’t actually verifying you based on your answers; they are tracking your mouse movements to make sure there is enough noise in the data to ensure you are human. That is what verifies you, not your answers.

Anonymous 0 Comments

If you’re looking at one of those picture grids where it wants you to do something like picking all the traffic lights, then you have 9 pictures to start with.

There’s at least 1 picture that it definitely knows has a traffic light.

There’s at least 1 picture that it definitely knows doesn’t have a traffic light.

Then there are up to 7 pictures that it isn’t sure whether or not they have traffic lights.

When you make your selection, the system is making sure you selected the positive control, making sure you didn’t select the negative control, and assuming those are correct, it passes your CAPTCHA, and it also adds the data about the unknown pictures that you entered.
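The control-picture check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `GridChallenge`, the tile numbering 0 to 8, and the `votes` store are all made up here, and no real CAPTCHA service is known to work exactly this way.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class GridChallenge:
    """Hypothetical 3x3 picture grid with known control tiles."""
    positive_controls: Set[int]  # tiles known to contain the object
    negative_controls: Set[int]  # tiles known NOT to contain the object
    # Collected labels for the unknown tiles: tile index -> list of yes/no votes.
    votes: Dict[int, List[bool]] = field(default_factory=dict)

    def submit(self, selected: Set[int], grid_size: int = 9) -> bool:
        # Pass only if every positive control was picked
        # and no negative control was picked.
        passed = (self.positive_controls <= selected
                  and not (self.negative_controls & selected))
        if passed:
            controls = self.positive_controls | self.negative_controls
            # Record the user's opinion on every tile that is not a control.
            for tile in range(grid_size):
                if tile not in controls:
                    self.votes.setdefault(tile, []).append(tile in selected)
        return passed
```

In this sketch, an answer that misses a control tile is rejected and contributes no labels at all, so only people who got the known tiles right get to "vote" on the unknown ones.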

Anonymous 0 Comments

The whole “traffic light CAPTCHA being used to train AI cars” is actually a myth, at least with respect to Google and Waymo. They have explicitly refuted the idea that they’re using CAPTCHA data to train automated cars.

Anonymous 0 Comments

Recently I had one where it asked me to identify all the “cabs”. Two or three of the images clearly had a cab, which I selected. One image had a yellow car that was not a cab. The captcha continued to fail until I selected the yellow car as well, even though it was clearly not a cab.
I felt much worse than I probably should have about providing wrong information.

Anonymous 0 Comments

I did m-turk for a while. One of the common things you got paid for was to do the pictures for the captcha. They would ask you to select the ones with cars or traffic lights or whatever.

I’m relatively sure those captcha things don’t actually use AI at all and just rely on you answering the tiles they already know the answers to.

Anonymous 0 Comments

Captcha is not really selling an image-recognition product. The whole purpose of a captcha is to stop automated inquiries while still allowing humans to navigate in a natural flow. It’s not an IQ test either.

The images you select, or the puzzles you complete, are just the simplified, visible part of it.

Captchas actually look at a lot more data. They capture some mouse movements, keystrokes, the last page visited, how you entered the website, your browser, the number of attempts made, and some other super secret information.

You see the puzzles because you have a monitor, but a computer doesn’t actually need a monitor, or a display, to browse through the internet. A bot can successfully complete the captcha and still be denied entry to a website.
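To illustrate how a bot could solve the puzzle and still be turned away, here is a toy sketch that combines the puzzle result with a few behavioural signals. Everything in it is an assumption made for illustration: the `Session` fields, the weights, and the threshold are invented, and real services do not publish how they score visitors.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical signals collected alongside the puzzle."""
    puzzle_solved: bool
    mouse_events: int      # number of mouse movements recorded
    failed_attempts: int   # earlier failed tries in this session
    has_referrer: bool     # arrived from another page rather than from nowhere


def allow(session: Session, threshold: float = 0.5) -> bool:
    # A wrong puzzle answer is always rejected.
    if not session.puzzle_solved:
        return False
    # Made-up weights: each human-looking signal adds to the score.
    score = 0.0
    if session.mouse_events >= 20:
        score += 0.5
    if session.has_referrer:
        score += 0.25
    if session.failed_attempts == 0:
        score += 0.25
    # A correct puzzle is necessary but not sufficient.
    return score >= threshold
```

With these made-up weights, `allow(Session(True, 0, 5, False))` is rejected even though the puzzle was solved, because none of the other signals look human.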

Anonymous 0 Comments

Captcha is an umbrella term for a variety of tests that check whether the answering party is a human, hence the name: Completely Automated Public Turing test to tell Computers and Humans Apart.

The first captcha tests were randomly generated characters. Since the computer generated the answer and kept it on the server side, it was assumed the answer could not be stolen, only worked out. But computers and developers are good at solving problems, so they used character recognition tools to read the text. Then it became an arms race: captchas started warping the text, using math questions, etc. In all cases the computer randomly generated an answer and a question to go with it. The only thing the server needed to do was check your answer.
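A minimal sketch of that "server makes up the answer and only checks yours" idea, assuming a 6-character code drawn from uppercase letters and digits. The function names are invented, and rendering the code as a warped image is left out.

```python
import secrets
import string

# Assumed alphabet for the challenge text.
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length: int = 6) -> str:
    """Generate a random code; the server keeps it and shows it only as an image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected: str, submitted: str) -> bool:
    """Compare the typed answer with the stored one, ignoring case and outer spaces."""
    a = expected.upper().encode("utf-8")
    b = submitted.strip().upper().encode("utf-8")
    # compare_digest does a constant-time comparison of the two byte strings.
    return secrets.compare_digest(a, b)
```

The server never needs to "understand" the picture it sends out; it only has to remember the string it generated and compare it with what comes back.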

Then someone got a clever idea: people will try to answer as best they can, so we should start asking questions that we don’t know the answers to. Character recognition is not bulletproof, so ask about the words we are not sure of. If enough people say that a word is “triangulation”, the computer will use this information to improve future recognition. This is called blind entry, where multiple people are asked to identify the same thing without knowing what the others answered, and it has long been used for data entry tasks. Captcha is a way to utilize free labor.
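Blind entry can be sketched as a simple vote over independent answers. The thresholds below (at least 3 answers, at least 75% agreement) are assumptions picked for the example, not numbers any real system is known to use.

```python
from collections import Counter
from typing import List, Optional


def consensus(answers: List[str],
              min_votes: int = 3,
              min_share: float = 0.75) -> Optional[str]:
    """Return the agreed-upon word, or None if people do not agree enough."""
    # Normalize so "Triangulation " and "triangulation" count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough independent answers yet
    word, count = Counter(normalized).most_common(1)[0]
    # Accept the most common answer only if a large share of people gave it.
    return word if count / len(normalized) >= min_share else None
```

Under these assumptions a handful of people typing nonsense would not change the result; a coordinated wrong answer would have to outvote everyone else, which is roughly what the original question was asking about.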

Today we are using pictures because we are done with words (probably). Another computing term is computer vision, where images are processed to extract information: finding barcodes, reading text, identifying an object or plant, face ID. Computer vision systems rely on recognition models, most commonly neural networks. A neural network is a very complex system that you train by giving it thousands of taxi images and telling it “hey, if you see something like this, say it is a cab.” Then you feed it pictures of other cars, birds, and planes. When you feed it a new picture, the system will say it looks 60% like a car but also 35% like a boat. The computer will find some pictures very confusing, but it will still give a probability for each object type it has learned.
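The "60% car, 35% boat" part is typically the last step of a classifier, where raw scores are turned into probabilities that add up to 1. A common way to do that is the softmax function, sketched here; the labels and score values in the usage note are made up, and the network that would produce the scores is not shown.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-label scores into probabilities that sum to 1."""
    # Subtracting the largest score keeps exp() from overflowing.
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}
```

For example, `softmax({"car": 2.0, "boat": 1.46, "bird": -0.5})` gives "car" the highest probability, "boat" a somewhat lower one, and "bird" a small remainder. A picture where the top two probabilities are close together is the kind the system would count as confusing.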

Now you see the pattern: for training, people need pictures of objects and the names of those objects. To get this data you need people to identify them, and this is where we come into the picture, literally.

The computer will select some pictures it is sure of and some it is not, and use us as data entry operators for blind entry.

Anonymous 0 Comments

With the two-word ones that used to be more common before the select-a-picture ones, there was one easy-to-read word and one difficult-to-read word. The easy one was the control word, so you could just answer that one correctly and put whatever you wanted for the difficult one.

Anonymous 0 Comments

An important thing about Captcha the top answer didn’t cover: you are training artificial intelligence. The reason you are getting fire hydrants and bicycles in your captcha grids is that you’re training self-driving car software. Before that, it was spelling out what words an image contained. Same thing: you were training image-to-text AI.