How music identification apps work

Since its debut in 1999, Shazam has been used to identify songs over fifty billion times, and that is without counting identifications from SoundHound, MusicID and other sound recognition apps.

From a user's perspective, it's simple: launch the app, press a button, and let your phone listen to the song. After a few seconds, even with background noise and distortion, the app will tell you what the song is. It works so fast and so well that it seems almost magical but, like most magical things these days, it's mostly driven by algorithms.

What's the idea behind these apps?

Simply put, the process looks like this:

  1. The app's database contains a massive collection of song "fingerprints": small bits of data about each song's unique sound patterns.
  2. When a user presses the "Record" button, the app listens to the music and creates a fingerprint based on the few seconds of audio it hears.
  3. This fingerprint is checked against the database of existing fingerprints. If your ten-second fingerprint matches part of a song, you get that song as the result (hopefully the correct one). If not, you get an error.

If you're just looking for a surface-level explanation, that's all you need to know. The really interesting part is how you get that fingerprint.

Song fingerprints

It all starts with a spectrogram, like the one published in an article by Avery Wang, one of the founders of Shazam. A spectrogram is basically a graph with time on the x-axis (horizontal), frequency on the y-axis (vertical) and amplitude represented by different levels of color intensity. Any sequence of sounds can be converted into a spectrogram, and any point in the spectrogram can be assigned a set of coordinates. Just like that, notes become numbers.
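To make the idea concrete, here is a minimal sketch of how a spectrogram could be computed using only Python's standard library. The function name, the frame size and the hop size are illustrative assumptions, and a real app would almost certainly use a fast Fourier transform rather than the slow, naive transform shown here.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame lists one magnitude per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window softens the frame edges to reduce spectral leakage.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        # Naive discrete Fourier transform; for real-valued input only the
        # first half of the bins carries distinct information.
        magnitudes = []
        for k in range(frame_size // 2):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# A pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames; loudest bin in first frame:", loudest_bin)
```

Each entry `spec[t][f]` is one point on the graph: `t` is the position on the time axis, `f` the position on the frequency axis, and the stored value is the amplitude.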

If all you had to do was match a few sounds to each other, you could stop here. If you want to search a database of millions of songs, however, a detailed spectrogram contains far too many data points to compare at any useful speed.

The big breakthrough in music recognition was the realization that you can identify sounds with just a few pieces of data: the peaks, or loudest parts. Getting rid of most of the low-energy parts of a song not only shrinks the spectrogram, it also makes apps less likely to mistake muffled, steady background noise for part of the target sound. Imagine a city skyline: the most identifiable parts are the tops of the buildings, not the floors in between, and the tops are what you can see from farthest away.
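Peak picking can be sketched as a search for local maxima in the time-frequency grid. The version below is a simplified illustration, not the method any particular app uses; the `neighborhood` and `min_magnitude` parameters are assumptions chosen for the toy data.

```python
def find_peaks(spec, neighborhood=1, min_magnitude=1.0):
    """Return (time_index, freq_bin) pairs that are local maxima in the grid."""
    peaks = []
    for t, frame in enumerate(spec):
        for f, value in enumerate(frame):
            if value < min_magnitude:
                continue  # drop quiet points such as low-level background noise
            is_peak = True
            for dt in range(-neighborhood, neighborhood + 1):
                for df in range(-neighborhood, neighborhood + 1):
                    tt, ff = t + dt, f + df
                    inside = 0 <= tt < len(spec) and 0 <= ff < len(frame)
                    if (dt or df) and inside and spec[tt][ff] > value:
                        is_peak = False  # a louder neighbor exists
            if is_peak:
                peaks.append((t, f))
    return peaks


# Toy spectrogram: rows are time frames, columns are frequency bins.
toy = [
    [0.1, 0.2, 0.1, 0.0, 0.3],
    [0.2, 5.0, 0.3, 0.1, 0.2],
    [0.1, 0.4, 0.2, 3.0, 0.4],
    [0.0, 0.1, 0.1, 0.5, 0.2],
]
toy_peaks = find_peaks(toy)
print(toy_peaks)
```

Of the twenty points in the toy grid, only the two "rooftops" survive, which is the kind of data reduction the skyline analogy describes.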

So every second of every song is whittled down to a handful of its most intense data points; everything in the skyline is removed except the rooftops. But that is still not efficient enough to be immediately searchable, so the next step is to "hash" that sequence of peaks. Hashing simply takes a set of inputs, runs them through an algorithm, and assigns them an integer output. In this case, the hash is generated by taking two of the high-intensity peaks, measuring the time between them, and combining that time with their two frequencies.
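As a rough illustration of the pairing step, the sketch below links each peak to a few peaks that follow it and records the two frequencies plus the time gap. The `fan_out` and `max_dt` limits, and the choice to skip peaks that occur at the same instant, are assumptions made for this example.

```python
def pair_peaks(peaks, fan_out=3, max_dt=10):
    """Pair each anchor peak with a few later peaks.

    Returns a list of ((f1, f2, dt), anchor_time) tuples.
    """
    ordered = sorted(peaks)  # sort by time, then by frequency
    pairs = []
    for i, (t1, f1) in enumerate(ordered):
        paired = 0
        for t2, f2 in ordered[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-sorted, so later ones are farther still
            if dt == 0:
                continue  # assumption: only pair with peaks later in time
            pairs.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return pairs


# Hypothetical peaks as (time_index, freq_bin) pairs.
example_peaks = [(0, 40), (2, 55), (3, 12), (20, 40)]
example_pairs = pair_peaks(example_peaks)
for key, anchor_time in example_pairs:
    print(key, "anchored at", anchor_time)
```

Keeping the anchor time next to each key matters later: it records where in the song that particular pair of peaks occurs.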

The result is a string of numbers, easily storable and searchable. When a computer reads a hash, it recognizes the numbers as representing two frequencies and the time difference between them. Each hash can be packed into a single 32-bit number, and once all the peaks in the song have been identified and hashed, the transformation is complete: the set of hashes serves as the song's fingerprint in the database. More importantly, every second of the song is represented by these numbers.
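One way two frequencies and a time gap could fit into a single 32-bit integer is plain bit packing. The layout below (10 bits per frequency bin, 12 bits for the time gap) is an assumption chosen so the fields add up to 32 bits; it is not a documented format of any app.

```python
def pack_hash(f1, f2, dt):
    """Pack two 10-bit frequency bins and a 12-bit time gap into 32 bits."""
    if not (0 <= f1 < 1024 and 0 <= f2 < 1024 and 0 <= dt < 4096):
        raise ValueError("value out of range for this bit layout")
    return (f1 << 22) | (f2 << 12) | dt


def unpack_hash(h):
    """Recover (f1, f2, dt) from a packed hash."""
    return (h >> 22) & 0x3FF, (h >> 12) & 0x3FF, h & 0xFFF


packed = pack_hash(40, 55, 2)
print(packed, unpack_hash(packed))
```

Because the packing is reversible, the integer still "means" two frequencies and a time difference, yet it can be stored and compared as cheaply as any other number.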

When your phone hears music, it goes through exactly this process: it filters out all but the highest peaks, hashes them, and creates a fingerprint for the few seconds it recorded. Once that's done, the app just needs to find where the matching strings of numbers appear in the database, which lets it match the detected frequencies and timing to the right song and send the answer back to you in a matter of seconds.
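The lookup itself can be sketched as a vote over time offsets: if the sample really comes from a song, its hashes should all line up with that song at the same offset. The in-memory dictionary, the song names and the hash values below are made-up stand-ins for a real database.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """Map each hash to the (song_id, time) places where it occurs."""
    index = defaultdict(list)
    for song_id, hashes in songs.items():
        for h, t in hashes:
            index[h].append((song_id, t))
    return index


def best_match(index, sample_hashes):
    """Vote for (song, offset) pairs; a true match piles votes on one offset."""
    votes = Counter()
    for h, sample_t in sample_hashes:
        for song_id, song_t in index.get(h, []):
            votes[(song_id, song_t - sample_t)] += 1
    if not votes:
        return None  # nothing in the database shares a hash with the sample
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count


# Hypothetical fingerprints: lists of (hash, time) pairs per song.
songs = {
    "song_a": [(101, 0), (202, 3), (303, 7), (404, 9)],
    "song_b": [(101, 1), (505, 4), (606, 8)],
}
index = build_index(songs)
sample = [(202, 0), (303, 4), (404, 6)]  # recorded 3 time steps into song_a
result = best_match(index, sample)
print(result)
```

Voting on the offset rather than just counting shared hashes is a design choice that helps reject coincidental matches, since random collisions rarely agree on where in the song the sample starts.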

Music and more

This technology has been most widely used for music recognition, but sound recognition apps can also work with movies, commercials, TV shows, birdsong, and more. Shazam and SoundHound are the best known, but you can now also ask Google what song is playing and get an accurate answer.

And if you're wondering, "Do these companies keep track of which songs are being asked about?" the answer is yes. Music ID stats have actually been able to predict the success of songs and artists with a fairly high level of accuracy, and major record labels like Warner have contracted with apps like Shazam to help find promising artists. So if you want to support an artist, you might as well do your part and search for their song! You might just help them take off.