Your device (Google, Apple, Amazon) stores little snippets of you saying certain letters and words. When you look for something, it compares what you are saying against those recordings, and that's how it works. Also, every time you use voice recognition you are improving it, because your new voice recordings are saved and analysed too.
These days it works much like it does in our brains: the flow of sound is separated into sets of different frequencies, then fed to a neural network that remembers how each word sounds (or, to be specific, what it looks like as a sequence of sets of frequencies). The neural network can compress, scale, or transform sounds, so it can recognize words even if there is some noise in the recording, or if the word is said at a different pitch characteristic of a person's voice, or at a different tempo.
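Just to illustrate the first step, here is a minimal sketch (not any vendor's actual pipeline) of turning a sound wave into the "sequence of sets of frequencies" that a speech model consumes. The `spectrogram` function, its parameters, and the fake test tone are all illustrative assumptions:

```python
# Rough sketch: slice a waveform into short overlapping frames and take the
# magnitude of the FFT of each frame, giving one row per time step and one
# column per frequency band. A neural network would read these rows in order.
import numpy as np

def spectrogram(wave: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    window = np.hanning(frame_len)          # smooth the edges of each frame
    frames = []
    for start in range(0, len(wave) - frame_len, hop):
        frame = wave[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))   # strength of each frequency
    return np.array(frames)

# Fake one second of "speech" at 16 kHz: a 300 Hz tone plus a little noise.
rate = 16000
t = np.arange(rate) / rate
wave = np.sin(2 * np.pi * 300 * t) + 0.1 * np.random.randn(rate)

spec = spectrogram(wave)
print(spec.shape)   # (98, 201): 98 time steps x 201 frequency bands
```

A real system would then hand each row (or a stack of neighbouring rows) to the network, which learns which sequences of these frequency patterns correspond to which words.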