If you’re interested in going (way) beyond the ELI5 version, Apple has a Machine Learning Research blog post where they explain how it works in great detail: [Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant](https://machinelearning.apple.com/research/hey-siri). That link is from 2017, but you can also [search](https://machinelearning.apple.com/research?domain=Speech+and+Natural+Language+Processing&page=1&q=siri) for more recent posts showing subsequent updates.
One of the most interesting things in the original post, I think, is that the detection threshold is dynamic. The detector calculates a score for how confident it is that you actually said “Hey Siri.” If that score lands just below the normal threshold, the system switches to a more sensitive state for a few seconds, so that if you immediately repeat the phrase, it gets caught the second time.
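To make the idea concrete, here is a rough Python sketch of a two-threshold trigger. It is not Apple's code: the class name, the threshold values, and the 3-second window are all made up for illustration, and the real system scores audio with a neural network rather than taking a number as input.

```python
from dataclasses import dataclass


@dataclass
class TriggerDetector:
    """Toy two-threshold trigger. All names and numbers are hypothetical."""

    normal_threshold: float = 0.80    # assumed value, not from Apple's post
    lower_threshold: float = 0.60     # assumed value, not from Apple's post
    sensitive_window_s: float = 3.0   # assumed stand-in for "a few seconds"
    _sensitive_until: float = float("-inf")

    def observe(self, score: float, now: float) -> bool:
        """Return True if this confidence score should wake the assistant."""
        # While the sensitive window is open, the lower threshold applies.
        sensitive = now < self._sensitive_until
        threshold = self.lower_threshold if sensitive else self.normal_threshold

        if score >= threshold:
            # Triggered: go back to the normal state.
            self._sensitive_until = float("-inf")
            return True

        if score >= self.lower_threshold:
            # Near miss: open a short window in which a repeat will trigger.
            self._sensitive_until = now + self.sensitive_window_s
        return False
```

With these made-up numbers, `TriggerDetector().observe(0.7, now=0.0)` is a near miss and returns `False`, but calling `observe(0.7, now=1.0)` on the same detector returns `True` because the repeat arrives inside the sensitive window. Wait ten seconds instead and the same score is rejected again.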