How can the camera on my phone identify when it’s seeing text or a face or somethingelse?

92 views
0

I mean, aren’t all things the camera sees are just ones and zeros? How do they program it to understand what’s what?

In: 2

Short answer: Algorithms

Long answer: Your brain also just gets electrical inpulses from your eye nerves and has to interpret that stuff somehow. Computers can emulate that with deep neural networks

Yes, it’s all digital. But there’s nothing that says that the computer can’t also analyze what it’s displaying. A “1” has a certain shape, which means that the ones-and-zeroes have a certain pattern as well. You can code the computer to recognize that pattern of ones-and-zeroes as a “1”, or something that looks like a face.

Most modern image recognition is done using machine learning.

The problem with “directly” programming a computer to recognize things like faces is that many of the real-world concepts we take for granted are actually really, really complex. Like, imagine trying to describe what a face looks like to an alien who has never been to Earth or seen a human before. It might go something like this:

**You**: A face has two eyes, two ears, a mouth, and a nose.

**Alien**: I see. What are these “eyes” of which you speak, and how may I come to recognize them?

**You**: Well, an eye has a circular part called an “iris”, which is usually blue, green, or brown in color, surrounded by white stuff.

**Alien**: Fascinating. A human confection called “blueberry parfait” has recently become quite popular on my planet. I had no idea it was made with eyes.

**You**: Uh, no. Although some eyes are blue, they’re never *that* shade of blue. They’re also glossier than blueberries and have wavy patterns running through them.

And so on. With a computer, you would also need to explain what “wavy” and “glossy” mean, exactly what shades of blue are acceptable, etc. And what happens if a face does *not* have two eyes visible, so our first statement doesn’t apply? What if lighting conditions cause eye whites to not actually appear white? Basically, trying to articulate exactly what makes something a face will inevitably lead to a never-ending series of explanations, clarifications, and exceptions, and there’s no way you’ll ever be able to think of everything.

Instead of designing a program specifically to recognize faces, a much better approach is to design a program that can learn how to recognize any kind of object, which you can then *teach* to recognize faces. The way you teach such a program to recognize a specific object is similar to how you would teach a human: by showing it things that *are* examples of that type of object, and things that are *not* examples of that object. In this case, pictures of faces and pictures of not-faces.

As for *how* these programs learn, state-of-the art systems use models called “deep neural networks”. You can think of a neural network as an incredibly complex mathematical formula with millions of unknown values (called “parameters”) to be solved for. At first, all of these parameters are set to random numbers.

The model is trained by feeding it images in batches. For each image, the formula will output a number representing how the model thinks it is that the image is a face. Initially, these outputs will be nonsense because we used random numbers for the parameters. However, using math (specifically calculus), we can work out how to tweak the parameters so that the model’s answers come closer to the correct answers. After being fed enough batches of images, the neural network will eventually learn what makes a face a face.

How do we recognise faces in a picture? The simplest way is that we scan over the image and look for obvious features – two eyes, a nose and a mouth for example. When we find something that matches that we can then examine it in more detail and see if it has the other elements that make up a human.

Computers do the same, but instead of looking at a page and looking for colours, they scan through all the ones and zeros and look for the same patterns.