At some point, when you enabled that feature, Google downloaded a file onto your phone covering thousands (millions?) of popular songs. If the entire songs were downloaded, they would take up more space than your phone has, so a “fingerprint” method is used instead. If you’ve ever used audio editing software, you’ve seen that graph of the song going up and down with the volume. If it’s recorded with a few mics, you might see a graph for each “channel” (input). A stereo track shows two graphs, one for the left and one for the right.
Shazam (the original song identifier) took this idea and realized that just by looking at that graph, which is much less information than the actual song, you could create a “fingerprint” of the song and identify it. However, volume changes alone weren’t quite enough information. Instead, they broke the song down into frequencies (how high or low pitched the sounds are). They then kept only the loudest points in a few narrow frequency ranges and tracked how those change over time. Now you have a few layers you can compare over time, and a sample can be matched to any point within a song. That is why it can still identify a song even if you start midway through (like recognizing part of a picture, with volume, frequency and time being the dimensions instead of up/down and left/right).
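To make the matching idea concrete, here is a toy sketch in Python. It is not Shazam's actual algorithm, just a simplified illustration: the "song" is a made-up sequence of pure tones, the fingerprint is simply the loudest frequency in each short slice of audio, and the names (`FRAME`, `fingerprint`, `find_offset`, `tone`) are all invented for this example. The clip is cut from the middle of the song and has some random noise added to it.

```python
import cmath
import math
import random

FRAME = 64  # samples per slice of audio (an assumed value for this toy)

def strongest_bin(frame):
    """Index of the loudest frequency bin in one slice (naive DFT)."""
    n = len(frame)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):  # skip bin 0, which is just the average level
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin

def fingerprint(samples):
    """One small number per slice instead of the full audio."""
    return [strongest_bin(samples[i:i + FRAME])
            for i in range(0, len(samples) - FRAME + 1, FRAME)]

def find_offset(song_fp, clip_fp):
    """Slide the clip along the song; return (best_offset, matching_slices)."""
    best = (0, -1)
    for off in range(len(song_fp) - len(clip_fp) + 1):
        hits = sum(a == b for a, b in zip(song_fp[off:], clip_fp))
        if hits > best[1]:
            best = (off, hits)
    return best

def tone(bin_index, n=FRAME):
    """A pure sine wave that lands exactly on one frequency bin."""
    return [math.sin(2 * math.pi * bin_index * i / n) for i in range(n)]

# A made-up "song": twelve notes, each one slice long.
notes = [5, 9, 12, 7, 20, 3, 15, 9, 11, 6, 18, 4]
song = [s for b in notes for s in tone(b)]

# A clip starting at slice 4, with background noise mixed in.
rng = random.Random(0)
clip = song[4 * FRAME: 8 * FRAME]
noisy_clip = [s + rng.uniform(-0.2, 0.2) for s in clip]

print(find_offset(fingerprint(song), fingerprint(noisy_clip)))  # prints (4, 4)
```

The whole song is reduced to twelve small numbers, and the noisy clip still lines up at slice 4 with all four slices matching. A real system stores far richer fingerprints and looks them up in an index rather than sliding through every song, but the "match anywhere in the song" behavior comes from the same kind of comparison.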
One huge advantage of looking at specific, narrow frequencies is that it keeps the data in the fingerprint precise, and it also lets the system filter out the background noise your phone picks up while identifying the song.
Before you say “well that sounds easy”, you have to realize there is also a TON of math going on in the background converting songs to a digital format, and using that format to create the digital fingerprint. Analog (actual sound) travels in smooth waves. Digital recordings store that wave as a long list of individual measurements (samples) taken thousands of times per second. A ton of math (the Fourier transform, mostly) goes into turning those samples back into waves and frequencies. Shazam (or Google) takes advantage of this math to further compress their fingerprints.
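As a rough illustration of that math, the sketch below samples a 440 Hz sine wave the way a digital recording would, then uses a plain discrete Fourier transform to recover the pitch from nothing but the list of samples. The sample rate, sample count, and function names are assumptions chosen for the example, and real software uses a much faster version of the same transform (the FFT).

```python
import cmath
import math

SAMPLE_RATE = 8000  # measurements per second (assumed for this example)
N = 400             # number of samples, i.e. 0.05 seconds of audio

def sample_wave(freq_hz):
    """Turn a smooth sine wave into a list of discrete measurements."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(N)]

def dominant_frequency(samples):
    """Find the strongest frequency in the samples with a naive DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2):  # only frequencies below half the sample rate
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    k_best = max(range(len(mags)), key=mags.__getitem__)
    return k_best * SAMPLE_RATE / n  # convert bin index to Hz

print(dominant_frequency(sample_wave(440.0)))  # prints 440.0
```

With these settings each frequency bin is 20 Hz wide, so 440 Hz falls exactly on a bin. A pitch between bins would be reported as the nearest bin, which is one reason real systems use longer windows and more careful peak picking.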
An interesting add-on: the same kind of fingerprinting is used by law enforcement and by companies like Google and Microsoft to combat child pornography (Microsoft’s PhotoDNA is one well-known example). When CP is confiscated, a fingerprint of it can be created that doesn’t contain the actual image. This fingerprint can be used to efficiently scan databases for that same image being stored elsewhere, even if the copy has been resized or re-compressed, all with minimal workload on the server’s part.
TLDR: Shazam, the original song identifier, creates a barebones fingerprint of each song. Your phone can download those fingerprints, which take up minimal space, and compare them against what it is listening to.