How does reverse image searching work?




Instead of submitting a string of words to a search engine, you're uploading an image, and the engine then compares it to everything else it has indexed.

The search engine indexes loads of photos and keeps a note of the relative location of some (or all?) of the pixels. When you search with an image, the search engine looks at the relative location of the pixels in your image and compares that to its database of images and presents pictures that match.

A “perceptual image hash” of your sample image is calculated. Depending on the scheme, this is a value anywhere from 8 bytes to a few hundred bytes in length. It contains information about the image in a condensed form. Two images that are similar would have similar, but probably not identical, hashes.

In simplest terms, they compare the hash of your image against the hashes of every image in their database. Other images whose hashes differ only by a few bits are very likely to look similar.

There are algorithms to find matches much faster than brute force.
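One such trick (a sketch only; search engines don't publish exactly what they use) is multi-index hashing: split each 64-bit hash into four 16-bit chunks and build an exact-lookup table per chunk. By the pigeonhole principle, two hashes within Hamming distance 3 must agree exactly on at least one chunk, so a few cheap exact lookups find every close match without scanning the whole database:

```python
from collections import defaultdict

def chunks(h):
    """Split a 64-bit hash into four 16-bit pieces."""
    return [(h >> (16 * i)) & 0xFFFF for i in range(4)]

def hamming(a, b):
    """Number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")

class HashIndex:
    """Toy multi-index. If two hashes differ in at most 3 bits,
    those 3 differing bits can touch at most 3 of the 4 chunks,
    so at least one chunk matches exactly in both hashes."""

    def __init__(self):
        self.tables = [defaultdict(list) for _ in range(4)]

    def add(self, image_id, h):
        # Index the image under each of its four chunks.
        for i, c in enumerate(chunks(h)):
            self.tables[i][c].append((image_id, h))

    def query(self, h, max_dist=3):
        # Gather everything that shares at least one chunk...
        candidates = set()
        for i, c in enumerate(chunks(h)):
            candidates.update(self.tables[i][c])
        # ...then verify with the real Hamming distance.
        return [img for img, h2 in candidates if hamming(h, h2) <= max_dist]
```

The image IDs and hash values here are invented for illustration; a real system would also shard these tables across many machines.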

A super simple “perceptual image hash” works like this: convert the image to grayscale, then scale it WAY down to a 9×8 pixel image with 256 grayscale levels. Then scan each row. If the pixel to the right is brighter than the one before it, write a ‘1’. If the next pixel is the same or darker, write a ‘0’. Scan the entire row, then the other rows. (Each row of 9 pixels gives 8 comparisons, and there are 8 rows, so 64 bits.)

This will result in an 8-byte ‘hash’ of the image. Other images that look similar might have fairly similar patterns of brighter/dimmer pixels when scaled down so tiny.
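The row-scanning scheme above (often called a “difference hash”) is only a few lines of Python. This sketch assumes the grayscale-and-shrink step has already happened, so the input is just an 8-row grid of 9 brightness values per row:

```python
def dhash(pixels):
    """Compute a 64-bit difference hash from a 9-wide by 8-tall
    grid of grayscale values (0-255).

    `pixels` is a list of 8 rows, each a list of 9 brightness values.
    In practice you'd first shrink a real image to 9x8 with an image
    library; that step is omitted here.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # 1 if the pixel to the right is brighter, else 0.
            bits = (bits << 1) | (1 if right > left else 0)
    return bits  # 8 rows x 8 comparisons = 64 bits = 8 bytes
```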

If you XOR your image’s hash against another image’s hash, the only bits left set will be the differences. If the number of differing bits is small, the images are probably similar. Two totally unrelated images would tend to differ by about 32 bits, but images that differ by, say, 6 bits probably look similar.
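That XOR-and-count-the-leftover-bits comparison is known as the Hamming distance, and it's essentially a one-liner (the hash values below are made up for illustration):

```python
def hamming_distance(hash_a, hash_b):
    """Count the bits where two hashes differ.
    XOR leaves a 1 only in positions where the bits disagree."""
    return bin(hash_a ^ hash_b).count("1")

# A 64-bit hash, and a copy of it with 3 bits flipped:
a = 0b1011001110001111101100111000111110110011100011111011001110001111
b = a ^ 0b101000000000001  # flip bits 0, 12, and 14
print(hamming_distance(a, b))  # -> 3, so probably similar images
```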

There are FAR better perceptual hashes than the one I described.

It depends on what kind of search you mean.

If you’re asking about finding the source or a higher-resolution version of an image, like what TinEye provides, that involves taking the input image, crushing it down to a fuzzy mess, and comparing the crushed version to all the other crushed versions it has already seen. Two images that were very similar should produce even more similar crushed results. Think how hard it would be to tell two nearly identical twins apart if you had poor eyesight and took off your glasses. This allows the algorithm to run way more comparisons, since the crushed versions contain way less data, and it weeds out almost all the garbage that didn’t even look remotely similar to the original. From there, you can do as many extra passes as you want, with progressively less crushing each time, until you reach a satisfactory level of similarity.
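The crush-compare-then-crush-less idea can be sketched in plain Python. Everything here is an assumption made for illustration: the pool sizes, the thresholds, and the sum-of-absolute-differences metric are invented, and images are just grids of grayscale values rather than real files:

```python
def pool(img, size):
    """Average-pool a square grayscale image (a list of rows of
    0-255 values) down to size x size -- the 'crushing' step."""
    block = len(img) // size
    out = []
    for by in range(size):
        row = []
        for bx in range(size):
            vals = [img[by * block + y][bx * block + x]
                    for y in range(block) for x in range(block)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def distance(a, b):
    """Sum of absolute pixel differences between equal-size grids."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b)
               for pa, pb in zip(ra, rb))

def search(query, database, sizes=(2, 4), thresholds=(200, 400)):
    """Coarse-to-fine filtering: compare heavily crushed versions
    first, then re-check only the survivors at higher resolution."""
    candidates = list(database.items())
    for size, threshold in zip(sizes, thresholds):
        q = pool(query, size)
        candidates = [(name, img) for name, img in candidates
                      if distance(q, pool(img, size)) <= threshold]
    return [name for name, _ in candidates]
```

Each pass is cheap because the crushed grids are tiny, and each pass shrinks the candidate list that the next, more detailed pass has to examine.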

Searches that actually try to guess what is *in* the image and subsequently try to find more of it, like Google reverse image search, are far more complex. These tend to be computer systems that are fed an obscene amount of training data to teach them how to identify certain objects, until they become good enough at finding those objects in images they have never seen before. (I would suggest CGP Grey’s video *How Machines Learn* for an ELI5-friendly introduction to one of the simpler ways this can be done.)

These search tools can also store metadata they scrape up along with each image while out on their adventures collecting new images for their database. This metadata can include things like the file’s name, when and where it was taken or made, what website it was on, and what other text was on the page with it. That data can be fed into more mundane keyword-search algorithms as extra clues to what the image may contain.