how on earth does Shazam work?


I’m always utterly amazed that my phone can hear a song and match it. How does it do that?




Other comments are missing what a fingerprint is.

A spectrogram is the result of applying a short-time Fourier transform to the input signal: the signal is cut into short overlapping frames and each frame is transformed. This produces a matrix shaped `number of frequencies X time instants`, so the energy at any frequency at any point in time is now known.
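To make that concrete, here is a minimal sketch of a spectrogram in plain Python. The names `spectrogram`, `frame_size` and `hop`, the Hann window, and the specific sizes are my own assumptions for illustration. It uses a slow naive DFT so it runs without any libraries; real code would use an FFT.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return magnitudes as a matrix indexed [frequency_bin][frame]."""
    # A Hann window (an assumed choice) reduces leakage between neighbouring bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1  # a real signal only needs the lower half
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(n_bins):
            # Naive DFT of one frame: O(frame_size) work per frequency bin.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    # Transpose so rows are frequencies and columns are time instants.
    return [list(row) for row in zip(*frames)]
```

Each column is one time instant and each row one frequency, matching the `number of frequencies X time instants` shape described above.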

Then a set of points (local maxima) is selected so that they are spread across the whole spectrogram. Since these points are local maxima, they’re likely to survive even if the recording comes from a noisy environment.
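A rough sketch of the peak selection, assuming the spectrogram is a list of rows indexed `[frequency][time]`. The function name `find_peaks` and the `neighborhood` and `min_energy` parameters are hypothetical; a point counts as a peak here only if it is strictly larger than everything in a small box around it.

```python
def find_peaks(spec, neighborhood=2, min_energy=0.0):
    """Return (frequency_bin, frame, energy) for every local maximum."""
    n_freqs = len(spec)
    n_frames = len(spec[0]) if spec else 0
    peaks = []
    for f in range(n_freqs):
        for t in range(n_frames):
            value = spec[f][t]
            if value <= min_energy:
                continue  # too quiet to be a useful landmark
            is_peak = True
            # Compare against every neighbour inside the box around (f, t).
            for df in range(-neighborhood, neighborhood + 1):
                for dt in range(-neighborhood, neighborhood + 1):
                    ff, tt = f + df, t + dt
                    if (df or dt) and 0 <= ff < n_freqs and 0 <= tt < n_frames:
                        if spec[ff][tt] >= value:
                            is_peak = False
            if is_peak:
                peaks.append((f, t, value))
    return peaks
```

A larger `neighborhood` gives fewer, more widely spread peaks; a smaller one gives a denser set.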

Each of those maxima is paired with another maximum that is close in frequency and time. The pairs with lower energy content are discarded (energy is the value of a point).

A fingerprint is the result of applying a hashing function to a pair of points; the hash takes the frequency and time instant of each point into account.
N pairs = N fingerprints.
For any song, a LOT of fingerprints are produced and stored in a database.
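Pairing and hashing could look something like the sketch below. Everything specific is an assumption on my part: the name `fingerprints`, the `fan_out`, `max_dt` and `max_df` limits, the use of SHA-1, and hashing the time difference between the two points (rather than absolute times) so that the same pair gives the same hash wherever the recording starts.

```python
import hashlib


def fingerprints(peaks, fan_out=3, max_dt=10, max_df=20):
    """peaks: list of (frequency_bin, frame, energy).

    Returns a list of (hash, anchor_frame), one entry per kept pair.
    """
    ordered = sorted(peaks, key=lambda p: (p[1], p[0]))  # by time, then frequency
    result = []
    for i, (f1, t1, _energy) in enumerate(ordered):
        # Candidate partners: later in time and close in both time and frequency.
        candidates = [p for p in ordered[i + 1:]
                      if 0 < p[1] - t1 <= max_dt and abs(p[0] - f1) <= max_df]
        # Keep only the strongest partners; lower-energy pairs are discarded.
        candidates.sort(key=lambda p: p[2], reverse=True)
        for f2, t2, _ in candidates[:fan_out]:
            key = f"{f1}|{f2}|{t2 - t1}".encode()
            digest = hashlib.sha1(key).hexdigest()[:10]
            result.append((digest, t1))
    return result
```

The anchor frame is kept next to each hash so a database could also record where in the song the pair occurred.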

When you send a recording to Shazam, it goes through this process of fingerprint extraction. The extracted fingerprints are then used to query their database and if you’re lucky there will be some (many) matches.

Those matches are then filtered to exclude false positives. For example:
* song A: 100 fingerprints matched
* song B: 20 matched
* song C: 10 matched

It’s likely the recording you sent is taken from song A.
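The counting step above can be sketched as a simple vote. The database layout (a dict from hash to a list of song names), the function name `best_match`, and the `min_votes` and `min_ratio` thresholds are all made up for illustration; this only counts matches per song, as in the example.

```python
from collections import Counter


def best_match(query_hashes, database, min_votes=5, min_ratio=2.0):
    """Return (song or None, vote counts) for a list of query hashes."""
    votes = Counter()
    for h in query_hashes:
        for song in database.get(h, ()):
            votes[song] += 1
    if not votes:
        return None, votes
    ranked = votes.most_common(2)
    best, best_votes = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Reject weak or ambiguous results instead of reporting a false positive.
    if best_votes < min_votes or best_votes < min_ratio * runner_up:
        return None, votes
    return best, votes
```

With 100 votes for song A against 20 and 10 for the others, song A clears both thresholds. If two songs tie, the function returns `None` rather than guessing.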

SOURCE: I’ve implemented a similar audio fingerprint algorithm
