Songs are made of sounds. Sounds (more generally, any kind of wave) can be mumbled, jumbled, and mixed, but they have a nice property: even after two notes (frequencies) are mixed together, they can be mathematically separated again into something called a spectrogram, which is basically a list of all the notes being played at any single moment. This is really nice, because even if a sound is jumbled and mumbled, you can still break it apart and get a nice fingerprint of the song. Each instrument, voice, and hence song has its own peculiar spectrogram, which is what our brain uses to discern different sounds. Notes are like the colors of sound.
What Shazam does is calculate this fingerprint, and since different songs have different sounds, it can be used to identify a song. And like colors, it's really difficult to distort a sound so much that it cannot be recognized, because frequencies tend to stay the same even through noise or obstacles. Amplitude (volume) can also be used to recognize songs, but only if the recording is really, really accurate, because noise and obstacles have a much greater impact on amplitude than on frequency.
Other comments are missing an explanation of what a fingerprint actually is.
A spectrogram is the result of applying a Fourier transform to the input signal; it produces a matrix shaped `number of frequencies × time instants`. Basically, the content of every frequency at every point in time is now known.
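As a rough sketch of how that matrix is produced: split the signal into short frames and take the Fourier transform of each one. The frame size of 1024 samples and the lack of window overlap are simplifying assumptions; real implementations use windowing functions and overlapping frames.

```python
import numpy as np

def spectrogram(signal, frame_size=1024):
    # Split the signal into non-overlapping frames of frame_size samples.
    n_frames = len(signal) // frame_size
    frames = signal[:n_frames * frame_size].reshape(n_frames, frame_size)
    # rfft gives one complex value per frequency bin; take the magnitude.
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return spec.T  # shape: (number of frequencies, time instants)

# One second of a 440 Hz tone sampled at 44100 Hz: the energy shows up
# concentrated in the frequency bin closest to 440 Hz.
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(tone)
print(spec.shape)  # (513, 43): 513 frequency bins, 43 time frames
```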
Then a set of points (local maxima) is selected so that they spread across the whole spectrogram. Since these points are local maxima, it's likely they'll survive even if the recording comes from a noisy environment.
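One common way to pick those local maxima is a maximum filter over the spectrogram: a point is kept if it equals the maximum of its neighborhood. The neighborhood size and threshold below are made-up illustrative values, not anything Shazam publishes.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def find_peaks(spec, neighborhood=20, threshold=1.0):
    # A point is a local maximum if it equals the max of its neighborhood.
    local_max = maximum_filter(spec, size=neighborhood) == spec
    # Drop low-energy "maxima" (e.g. flat silent regions) with a threshold.
    peaks = np.argwhere(local_max & (spec > threshold))
    return [(int(f), int(t)) for f, t in peaks]  # (frequency bin, time frame)

# Tiny synthetic spectrogram with two isolated peaks.
demo = np.zeros((100, 50))
demo[30, 10] = 5.0
demo[70, 40] = 8.0
print(find_peaks(demo))  # [(30, 10), (70, 40)]
```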
Each of those maxima is paired with another maximum that is close in terms of frequency and time; the pairs with lower energy content are discarded (energy being the value of a point in the spectrogram).
A fingerprint is the result of applying a certain hashing function to a pair of points; it takes the frequency and time instant of each point into account.
N pairs = N fingerprints
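A sketch of the pairing-and-hashing step: hash the two frequencies together with the time gap between the points, so the same pair of notes produces the same hash no matter where it occurs in the song. The exact fields and hash function Shazam uses are not public; this just shows the idea.

```python
import hashlib

def fingerprint(peak_a, peak_b):
    # A peak is a (frequency bin, time frame) pair.
    f1, t1 = peak_a
    f2, t2 = peak_b
    # Hash the two frequencies and the time DELTA (not absolute times),
    # so the fingerprint is the same wherever the pair occurs in a song.
    key = f"{f1}|{f2}|{t2 - t1}".encode()
    return hashlib.sha1(key).hexdigest()[:10], t1  # (hash, anchor time)

# Pair each peak with nearby later peaks: N pairs -> N fingerprints.
peaks = [(30, 10), (45, 12), (70, 14)]
pairs = [(a, b) for i, a in enumerate(peaks)
         for b in peaks[i + 1:] if b[1] - a[1] <= 5]
prints = [fingerprint(a, b) for a, b in pairs]
print(len(prints))  # 3 pairs -> 3 fingerprints
```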
For any song a LOT of fingerprints are produced and stored in a database.
When you send a recording to Shazam, it goes through this process of fingerprint extraction. The extracted fingerprints are then used to query their database and if you’re lucky there will be some (many) matches.
Those matches are then filtered to exclude false positives. For example:
* song A: 100 fingerprints matched
* song B: 20 matched
* song C: 10 matched
It’s likely the recording you sent is taken from song A.
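The matching step above can be sketched as a simple count: store each fingerprint hash in a lookup table, then tally which song the query's fingerprints hit most often. The dict-as-database and the hash strings here are purely illustrative.

```python
from collections import Counter

db = {}  # fingerprint hash -> list of songs that contain it

def index_song(song, fingerprints):
    for fp in fingerprints:
        db.setdefault(fp, []).append(song)

def best_matches(recording_fps):
    # Count, per song, how many of the recording's fingerprints hit it.
    counts = Counter(song for fp in recording_fps for song in db.get(fp, []))
    return counts.most_common()

index_song("song A", [f"hash{i}" for i in range(200)])
index_song("song B", [f"hash{i}" for i in range(20)])  # shares 20 hashes with A
recording = [f"hash{i}" for i in range(100)]
print(best_matches(recording))  # [('song A', 100), ('song B', 20)]
```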
SOURCE: I’ve implemented a similar audio fingerprinting algorithm.
Chunks of every song get turned into numbers called vectors, and those vectors get stored in a database.
When you record a bit of a song in the Shazam app, your recorded data gets turned into the same kind of vector numbers that they put in the database.
They compare the vector numbers from your recording to the vectors stored in the database. The closest set of numbers is probably a chunk of the song you’re looking for.
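The "closest set of numbers" comparison described in the linked Elastic post is typically a nearest-neighbor search over the vectors, e.g. by cosine similarity. The three-dimensional vectors below are made up for illustration; real audio embeddings have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; near 0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_song(query, database):
    # Return the stored song whose vector is most similar to the query.
    return max(database, key=lambda song: cosine_similarity(query, database[song]))

database = {
    "song A": np.array([0.9, 0.1, 0.0]),
    "song B": np.array([0.1, 0.8, 0.3]),
}
query = np.array([0.85, 0.15, 0.05])  # a noisy recording of a chunk of song A
print(closest_song(query, database))  # song A
```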
https://www.elastic.co/blog/searching-by-music-leveraging-vector-search-audio-information-retrieval
If you want to find out the details, this course covers it: https://www.coursera.org/learn/audio-signal-processing (free)
It’s one of the last modules, so you will need to work your way through the FT, STFT, Harmonic model, etc. to get the technical knowledge to really understand audio feature extraction.
I’ve done this course myself and it’s very good, if you like mathematics and audio signal processing.