Eli5: How does Google ‘listen’ to you? Is there actually a program running in the background on your phone that records your voice and sends it to google?

13 Answers

Anonymous 0 Comments

On your phone and other “personal” devices it waits until you press a button and then it listens.

On a home automation or other “smart” device, there are two programs running at once. One program records a certain number of seconds on a loop; this recording is local only and runs all the time, overwriting itself as it goes.

A second program looks at the first program while it is running and if it detects the “wake” word it sends the snippet of voice to the cloud api to translate into an action.

Your voice only leaves your device after it detects a wake word though this is not a perfect system and false positives occur.
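To make the two-program idea concrete, here is a minimal sketch in Python. It is a toy, not how any real device is implemented: the "audio" is just words, the class name `WakeWordListener` and the buffer size are made up for illustration, and the "cloud" is a plain list.

```python
from collections import deque


class WakeWordListener:
    """Toy model: a rolling local buffer plus a detector that watches it."""

    def __init__(self, wake_word, buffer_chunks=5):
        self.wake_word = wake_word
        # A deque with maxlen silently drops its oldest item when full,
        # which mimics a recording that overwrites itself as it goes.
        self.buffer = deque(maxlen=buffer_chunks)
        # Hypothetical stand-in for a network call to a cloud API.
        self.sent_to_cloud = []

    def hear(self, word):
        """Store the word locally, then check the buffer for the wake word."""
        self.buffer.append(word)
        text = " ".join(self.buffer)
        if self.wake_word in text:
            # Only now does anything leave the device.
            snippet = text[text.index(self.wake_word):]
            self.sent_to_cloud.append(snippet)
            self.buffer.clear()
            return True
        return False


listener = WakeWordListener("ok google", buffer_chunks=4)
for word in "did you see that ludicrous display ok google".split():
    listener.hear(word)
print(listener.sent_to_cloud)
```

Everything said before the wake word just falls off the end of the buffer and is never stored anywhere else. A real device would also keep streaming whatever you say right after the wake word, which this toy leaves out.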

*If your phone is listening without a button press, that is an accessibility feature you enabled, and you should probably disable it to save on your battery even if you aren’t paranoid.

Anonymous 0 Comments

For sure there’s an algorithm that listens to what one says. This is the one that starts the assistant when one says “ok Google” or “hey Google”. We don’t know the full extent of that algorithm. We only know it turns the assistant on in the cases explained above. But technically it might record much more data. Of course, if that’s the case, it should be explained in the terms and conditions, and it is not.

Actually, you can even find the recordings of your voice in your Google account:

https://myactivity.google.com/activitycontrols/webandapp?utm_source=my-activity&hl=en&product=29

Anonymous 0 Comments

I was talking to a friend about a pregnant person teaching yoga when we were having dinner last week; next day I got an ad for pregnancy yoga on my Instagram.

I am a 29-year-old man.

Anonymous 0 Comments

Eli5 for me too: Is it possible to confirm either way if our phones/Echo Dots/etc collect data by spying on our conversations?

Anonymous 0 Comments

If you’re talking about the Google Assistant, remember when it asked you to say “Ok Google” when you set it up? It uses that and a mic. It keeps listening, and if what you said in the mic matches your own unique “Ok Google”, it will record what you say after that and search the internet.

If you’re talking about ads, unless you’re using the Google Assistant, Google isn’t actually listening to you.

It just knows which websites you visit frequently if they’re using Google Analytics or visited from Google Search, which apps you log in to with your Gmail or Google Workspace account, which videos you like to watch on YouTube, or if you’re on Android, everything you do on your phone.

Based on these and several other signals, it creates your digital twin – an identical version of who you are online. Then it serves ads for your twin. Above 95% of the time, that twin is actually you.

Anonymous 0 Comments

Well… IMO Google listening is infeasible. Processing and analyzing audio on a large scale requires a huge amount of computation…

What is feasible is that Google has your habits, the habits of your friends and family, what you did, when, your ISP data and everything…

That metadata gives them the ability to make a machine learning model of you… kind of creepy but google knows you better than your wife… sometimes even yourself…

Your desires, present and future, your intent, your lifecycle… Compared to computers… Humans are easy to figure out….

Anonymous 0 Comments

Nothing is being sent to Google. Imagine a vintage clap-on, clap-off lamp. It doesn’t send anything to anybody. It just works if it detects a specific sound, meaning a specific frequency and a specific amplitude.

Now, Google, Echo and similar devices are smarter clap lamps. The phone or smart home device has its microphone always on (you can turn it off, but don’t be that paranoid), and the output of the microphone is fed to a pretrained neural network (when you’re setting up “ok Google” you repeat the phrase multiple times). This algorithm is not power hungry, so it can run in the background looking for that specific sound. After it detects the pattern, it wakes up and does its thing.

Nothing is being sent to Google as long as you don’t say “ok Google”. You can verify this by disconnecting from the network and saying “hey Google”: it will still recognize it without a network, so the detection really is run by your device. It’s nothing complex, just a clap lamp on steroids.

Anonymous 0 Comments

Speech-to-text is computationally intensive. Listening all the time for any and all possible words on a phone would drain the battery too quickly, as your phone would essentially be “always” thinking hard.

However, it’s a lot less work to just try to scan for the *one* short sound you’re explicitly looking for, and ignore everything that’s not a close match. “Did he say a word? Well it wasn’t the one I’m looking for so I don’t care. I won’t spend any time thinking about it.”

And it takes even less work to just recognize “there was a new sound of some sort just now.” That just means looking for spikes in volume.

So there’s a principle in computing called short-circuit evaluation: you check the very quick stuff first and skip the more complex check if the quick check proved the complex check can’t be true.

So in this case that means you do things in this order:

Step 1 – Is there even a speech-like rise in volume at all?
(If “no”, then abort.)

Step 2 – Did it match the short wakeup sound I’m looking for?
(If “no”, then abort.)

Step 3 – Okay, now start engaging the computationally expensive speech-to-text algorithm to look at the rest of the sound.
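The three steps above can be sketched in a few lines of Python. This is only an illustration of the ordering, under loud assumptions: a "clip" here is a made-up dict with a `volume` number and the spoken `text`, the 0.2 threshold is arbitrary, and real wake-word detection works on sound, not on text. The `trace` list just records which steps actually ran.

```python
WAKE_WORD = "ok google"


def normalize(text):
    """Lowercase and drop commas so 'Ok, Google,' matches 'ok google'."""
    return text.lower().replace(",", "")


def process(clip):
    """Run the cheap checks first; return (result, steps_that_ran)."""
    trace = ["volume"]
    if clip["volume"] <= 0.2:  # Step 1: assumed loudness threshold
        return None, trace     # abort, nothing else runs

    trace.append("wake_word")
    words = normalize(clip["text"])
    if not words.startswith(WAKE_WORD):  # Step 2: only one phrase matters
        return None, trace               # abort, content is ignored

    trace.append("speech_to_text")       # Step 3: the expensive part
    return words[len(WAKE_WORD):].strip(), trace


print(process({"volume": 0.0, "text": ""}))
print(process({"volume": 0.6, "text": "Did you see that ludicrous display last night?"}))
print(process({"volume": 0.6, "text": "Ok, Google, pizza delivery near me"}))
```

Only the third clip gets as far as the `speech_to_text` step. The quiet room stops after one check, and the football chatter stops after two.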

So during the hours your phone sits in your pocket and the average volume of the room doesn’t really rise at all, your phone just sits on Step 1 over and over. “No rise in volume? Okay, how about now? No? Okay, how about now?” This doesn’t take much thinking at all. There’s a little bit of “smoothing” logic in there so sharp sounds like knocking on a table don’t wake it up. It has to get a volume rise that has a bit of a duration to count as being “maybe speech”.
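One simple way that “smoothing” could work is to require the volume to stay up for several frames in a row. This is a guess at the general idea, not the actual logic in any phone; the function name, the 0.3 threshold and the 3-frame minimum are all made up for the example.

```python
def is_maybe_speech(levels, threshold=0.3, min_run=3):
    """True if loudness stays above threshold for min_run consecutive frames."""
    run = 0
    for level in levels:
        # Extend the current loud streak, or reset it on a quiet frame.
        run = run + 1 if level > threshold else 0
        if run >= min_run:
            return True
    return False


knock = [0.0, 0.0, 0.9, 0.0, 0.0]          # one sharp spike
speech = [0.0, 0.5, 0.6, 0.4, 0.5, 0.0]    # sustained rise
print(is_maybe_speech(knock), is_maybe_speech(speech))
```

A knock is loud but lasts only one frame, so the streak never reaches three and the check stays cheap.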

Then there’s chatter where people in the room say “Did you see that ludicrous display last night?” “What was Wenger thinking sending Walcott on that early?” And this seems to fit the volume pattern of speech, so that time around your phone gets past Step 1 and gets to Step 2. Was there anything in there that sounded like the wakeup sound “Ok, Google” (or “Alexa”, or whatever your service uses as its wakeup sound)? When the answer is no, then it still doesn’t bother engaging Step 3 yet. It knows that “Did you see that ludicrous display last night?” doesn’t contain “Ok, Google”, but it doesn’t know what it *does* contain. It doesn’t know that it contained the word “ludicrous” for example. It just knows it didn’t contain “Ok, Google” so it ignored it.

Then someone says, “Ok, Google, pizza delivery near me”, and THIS time, it gets all the way to Step 3. It had the sustained volume pattern that speech has. It had the magic sound “Ok, Google”, so THEN it started running the speech-to-text algorithm, which is expensive and power-hungry, to work out the “pizza delivery near me” part of it. It’s power-hungry, but it only needs to do it for a few seconds and then it’s done, rather than leaving it on at all times.

To make all this possible, it also has a small rolling audio buffer that keeps the last several seconds of audio. It needs that because by the time it decides your sounds are worthy of speech-to-text, you’ve already said them a second ago. They’re in the past.

As to why it seems like it’s listening to you all the time in a creepy way, that’s because it’s *really good* at guessing from other context clues (in a way that really is creepy).

Let me give an example. Me and a group of strangers were sitting at a table in a gaming store. We didn’t know each other and were there waiting for an event to start. We had no prior social contact, no Facebook links or anything like that. We were talking about movies. The subject of The Addams Family came up. We talked about how good the child actor who played Wednesday was at nailing the role. Then we moved on to other stuff.

At no point did I google anything about it. At no point did I take my phone out of my pocket. And the subject of “Wednesday Addams” wasn’t a thing I had mentioned or looked up for years. But later that evening, there it was in my auto-complete: as soon as I typed a “w” into Google’s search bar, the first suggestion was “Wednesday Addams”, and I was like, “WHAAA?” It sure sounds like it’s listening in all the time, otherwise how would it know the subject was ever even mentioned?

Well, the answer was almost even worse than that. No matter how much you try to turn off “location tracking”, various services keep insisting on having it on to work at all. I hadn’t typed anything about “Wednesday Addams” at all, but one of the other strangers at the table had searched for “who played Wednesday Addams” on *their* phone, and then someone else at the table looked up the IMDB page for her. What Google had done was use the location tracking to conclude “Two people who weren’t you performed Google searches on the same subject during the time that they were in very close proximity to you. You remained in proximity to them for about a half hour. So you were probably having a conversation with them about it.”

For another example of a creepy but useful thing the location tracking does – it’s the reason Google Maps instantly knows where the traffic jams are, and reports the real current travel times along the roads, not the hypothetical ‘speed limit’ travel times. It’s because lots of the people in those cars happen to have Android phones, and their location tracking is on. Google deduces the traffic speed by watching those phones move.
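As a rough illustration of how moving phones could turn into a traffic speed, here is a toy calculation. It is an assumption-heavy sketch, not Google’s method: each phone is reduced to a list of made-up `(seconds, metres_along_road)` pings, and the road’s speed is taken as the median across phones.

```python
from statistics import median


def speed_kmh(pings):
    """Average speed between the first and last (seconds, metres) ping."""
    (t0, x0), (t1, x1) = pings[0], pings[-1]
    return (x1 - x0) / (t1 - t0) * 3.6  # m/s to km/h


def road_speed_kmh(phones):
    """Median across phones, so one parked phone doesn't skew the result."""
    return median([speed_kmh(pings) for pings in phones])


phones = [
    [(0, 0), (60, 500)],     # moved 500 m in a minute
    [(0, 100), (60, 600)],   # same pace, further along the road
    [(0, 0), (60, 0)],       # sitting in a parked car
]
print(road_speed_kmh(phones))  # roughly 30 km/h
```

If the speed limit on that stretch is 80 km/h and the phones are crawling along at about 30, that looks like a jam.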

Anonymous 0 Comments

I don’t think they do this, but if you’re talking about tracking your conversations for ads and things like that, they wouldn’t send your conversations back to a server to process. That would take an enormous amount of data. It’d be much more efficient to just have a background process running that listens for a relatively small set of keywords, and when it hears them, just sends that with some context information back to a server to add to your profile.

Anonymous 0 Comments

I talked about a person I had never been able to find on Facebook, and a couple of days later they showed up on my “People You May Know” list for the first time ever.