How does Siri hear me say “Hey Siri” if it’s not constantly listening to my conversation or me speaking?

28 Answers

Anonymous 0 Comments

Well, Siri is designed to listen for the specific phrase “Hey Siri” to activate. When you say “Hey Siri,” it triggers Siri to start listening for your command. It’s like giving Siri a special wake-up call! So, it’s not constantly listening to your conversations, but rather waiting for that specific phrase to be spoken.

Anonymous 0 Comments

Speech assistants today have two components: a small device with a speaker, a microphone, and Internet access that can record your speech, and a bunch of servers that analyze these recordings and convert them into commands. The full analysis takes too much power to be done on the smaller devices, but they can do some analysis. If the device is only looking for one specific phrase, and that phrase is easy to detect, it can be set to look for just that. When it finds something that might match the phrase, it sends the audio to the service to verify whether it is right or not.

What you end up with is a device that is constantly listening to your conversations, but it is not constantly uploading them to the Internet. It only uploads when it thinks you mentioned the key word. The issue here is that, because of its limited processing power, the device produces a lot of false positives. So you might be having a normal conversation and the device might still think you said “Hey Siri”. It then uploads this recording to the speech recognition service, which analyzes it with better hardware and detects that you were just having a normal conversation. The service may then store this recording for later training without your knowledge.
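
To make the two-stage idea concrete, here is a rough Python sketch. Everything in it is made up for illustration: the function name `two_stage_filter`, the scores, and both thresholds are assumptions, not how any real assistant is implemented. It only shows the shape of the flow: a cheap check on the device decides what gets uploaded, and a stronger check on the server decides what counts as a real wake phrase.

```python
from typing import Callable, Dict, List


def two_stage_filter(
    clips: List[str],
    cheap_score: Callable[[str], float],
    strong_score: Callable[[str], float],
    cheap_threshold: float = 0.5,   # assumed value, for illustration only
    strong_threshold: float = 0.9,  # assumed value, for illustration only
) -> Dict[str, List[str]]:
    """Return which clips leave the device and which the server accepts."""
    uploaded: List[str] = []
    accepted: List[str] = []
    for clip in clips:
        # Stage 1 (on device): low-power check, prone to false positives.
        if cheap_score(clip) < cheap_threshold:
            continue  # never leaves the device
        uploaded.append(clip)
        # Stage 2 (on server): more accurate check on better hardware.
        if strong_score(clip) >= strong_threshold:
            accepted.append(clip)
    return {"uploaded": uploaded, "accepted": accepted}


# Toy scores standing in for real acoustic models (hypothetical numbers).
CHEAP = {"hey siri": 0.95, "hey seriously": 0.6, "pass the salt": 0.1}
STRONG = {"hey siri": 0.97, "hey seriously": 0.2, "pass the salt": 0.0}

result = two_stage_filter(
    list(CHEAP),
    cheap_score=lambda clip: CHEAP[clip],
    strong_score=lambda clip: STRONG[clip],
)
```

In this toy run, "hey seriously" is the false positive: it fools the cheap check and gets uploaded, but the stronger check rejects it.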

Anonymous 0 Comments

It does constantly listen to your conversations. However, it doesn’t transmit anything until after hearing “hey Siri.”

Anonymous 0 Comments

There are two chips.

The first chip listens for the “start phrase”. It will parse all sound it receives, but is a “dumb device” and can’t do anything other than match the start phrase.

Once the start phrase is matched, the chip will *pass through* the sound data to the main processor. The main processor runs the required software and has the internet access needed to decode the sound into an interpretation. That interpretation could be an instruction, a message, or anything else the software supports.

TLDR: the first chip hears everything, but can’t do anything except check for the activating phrase. Only once that phrase is detected is the sound passed on to the main chip, which can understand everything in the audio.
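
If it helps, the gating can be sketched in a few lines of Python. This is a toy model, not real firmware: audio "frames" are just strings, and `always_on_chip`, `is_wake_phrase`, and `command_frames` are hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, List


def always_on_chip(
    frames: Iterable[str],
    is_wake_phrase: Callable[[str], bool],
    command_frames: int = 2,  # assumed: how many frames to forward after waking
) -> List[str]:
    """Return only the frames that would be forwarded to the main processor."""
    forwarded: List[str] = []
    remaining = 0  # frames still to pass through after a wake phrase
    for frame in frames:
        if remaining > 0:
            # Gate is open: hand the audio to the main processor.
            forwarded.append(frame)
            remaining -= 1
        elif is_wake_phrase(frame):
            # Gate opens for the frames that follow; the trigger itself is not forwarded here.
            remaining = command_frames
        # Otherwise: the frame is heard, checked, and dropped.
    return forwarded


frames = ["chatter", "hey siri", "set a timer", "for ten minutes", "more chatter"]
forwarded = always_on_chip(frames, is_wake_phrase=lambda f: f == "hey siri")
```

Here only "set a timer" and "for ten minutes" reach the main processor; the chatter before and after never gets past the first chip.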

Anonymous 0 Comments

An analogy would be battery-powered security cameras. They are always watching, but don’t start recording and uploading to their cloud until there is motion.

Anonymous 0 Comments

Think of it kind of like the way you, as a human, aren’t constantly paying close attention to all the sounds around you. You hear them, yes, but you don’t always pay attention. When someone says your name, that’s like saying “Hey Siri”. The phone starts listening when it hears its audio cue, and only then will it pay attention to what you say.

Anonymous 0 Comments

The device *is* constantly listening to you, but it simply ignores everything it hears until it picks up the start phrase, and *then* starts transmitting the rest over the internet.

Anonymous 0 Comments

If you’re interested in going (way) beyond the ELI5 version, Apple has a Machine Learning Research blog post where they explain how it works in great detail: [Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant](https://machinelearning.apple.com/research/hey-siri). That link is from 2017, but you can also [search](https://machinelearning.apple.com/research?domain=Speech+and+Natural+Language+Processing&page=1&q=siri) for more recent posts showing subsequent updates.

One of the most interesting things, I think, in the original post is how the threshold for detection is dynamic. It calculates how confident it is that you actually said “Hey Siri” and if it’s just below the threshold, it temporarily lowers the threshold so that if you immediately repeat the request, it will be caught the second time.
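
A minimal Python sketch of that "near miss lowers the bar for a moment" behavior, to show the logic. The class name `WakeDetector` and every number in it (thresholds, the three-second window) are invented for the example and are not taken from Apple's post.

```python
class WakeDetector:
    """Toy wake-phrase trigger with a temporarily relaxed threshold."""

    def __init__(
        self,
        threshold: float = 0.8,      # normal bar for triggering (assumed)
        near_miss: float = 0.6,      # scores at or above this count as "almost" (assumed)
        relaxed: float = 0.65,       # lowered bar after a near miss (assumed)
        relax_seconds: float = 3.0,  # how long the lowered bar lasts (assumed)
    ) -> None:
        self.threshold = threshold
        self.near_miss = near_miss
        self.relaxed = relaxed
        self.relax_seconds = relax_seconds
        self.relaxed_until = 0.0

    def hear(self, score: float, now: float) -> bool:
        """Return True if this confidence score should wake the assistant."""
        # Use the lowered bar only while the relaxed window is still open.
        active = self.relaxed if now < self.relaxed_until else self.threshold
        if score >= active:
            self.relaxed_until = 0.0  # triggered, go back to normal
            return True
        if score >= self.near_miss:
            # Close but not enough: be more lenient for a short while.
            self.relaxed_until = now + self.relax_seconds
        return False


detector = WakeDetector()
first_try = detector.hear(0.7, now=0.0)   # just under the normal bar
second_try = detector.hear(0.7, now=1.0)  # same score, repeated right away
```

With these made-up numbers, the first attempt is rejected but opens the relaxed window, so the identical second attempt a second later gets through. Wait longer than the window and the same score is rejected again.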