There are a few things behind it.
First, music notes are waves with a particular frequency. So any note can be translated to the number of waves per second. For example, one note might be 440 waves per second.
Second is that if the number of waves per second of one note is a simple multiple of another's, the peaks and valleys of the waves line up regularly, and when one is exactly double the other they sort of sound the same. So a note that was 220 waves per second and a note that was 880 waves per second would both line up with the 440 waves per second note from earlier. We consider those to be the same note in a different octave.
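To make the 220, 440, 880 example concrete, here is a minimal sketch that lists the octaves around a starting note by repeatedly halving and doubling. The function name `octaves_of` and its parameters are made up for illustration.

```python
def octaves_of(frequency_hz, octaves_down=1, octaves_up=1):
    """Return the frequencies whole octaves below and above frequency_hz.

    Each octave down halves the frequency; each octave up doubles it.
    """
    return [frequency_hz * 2 ** k for k in range(-octaves_down, octaves_up + 1)]


print(octaves_of(440.0))  # [220.0, 440.0, 880.0]
```

All three numbers are the "same" note (an A) in different octaves.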
Third is that some combinations of waves sound pleasant to us while others sound grating, like nails on a chalkboard. It’s all based on the ratios between the frequencies of the notes.
And that leads to the way we divide up notes for music: each doubling of frequency is considered the same note, just in a different octave. The standard tuning most places use these days is A=440 (which also makes A=220 and A=880, since those are doublings), but the ratios matter more than the exact number, and in the past some places tuned to A=435 or A=450. Each octave is divided into twelve notes, which is a good compromise: plenty of choices for soothing ratios between notes without being too complicated for musicians to play.
Because of the way the ratios work, we typically only play seven of the twelve notes at a time, so those seven main notes are always labeled A-G. Sometimes the start of a piece will say “unless we say otherwise, play all the Fs a half step up” (to the next of the twelve notes) or “unless we say otherwise, play all the Es and Bs a half step down” if those combinations will produce the right ratios.
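As a rough sketch of how seven notes get picked out of the twelve, the code below walks up the twelve notes using the usual major scale spacing (two steps, two, one, two, two, two, one). That spacing is not spelled out above, so treat it as an assumption about which seven-note set is meant. The names `NOTE_NAMES`, `MAJOR_PATTERN`, and `major_scale` are made up for this example.

```python
# The twelve notes in one octave, written with sharps only for simplicity.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumed spacing of a major scale, counted in twelfths of an octave.
MAJOR_PATTERN = [2, 2, 1, 2, 2, 2, 1]


def major_scale(start):
    """Return the seven note names of the major scale beginning on start."""
    index = NOTE_NAMES.index(start)
    scale = [NOTE_NAMES[index]]
    # The last step of the pattern only returns to the starting note, so skip it.
    for step in MAJOR_PATTERN[:-1]:
        index = (index + step) % 12
        scale.append(NOTE_NAMES[index])
    return scale


print(major_scale("C"))  # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(major_scale("G"))  # ['G', 'A', 'B', 'C', 'D', 'E', 'F#']
```

Starting on C uses only the plain letters, while starting on G needs F# in place of F, which is the "play all the Fs higher" instruction at the start of a piece.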
As far as how the in-between steps are chosen, there are a couple of ways to do it. The “best” way to do it from a sound perspective is to start with a base note (like A=440) and calculate all the other notes using the ratios from there. But that would require you to change the tuning of the notes for different songs, so instruments like pianos usually space the notes evenly on a logarithmic scale: each note is the same ratio above the one before it, rather than the same number of waves per second, because of the way hearing works. That keeps the ratios close enough that they still sound good even though they’re a little bit off from the perfect ratios.
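Here is a small sketch comparing the two approaches, starting from A=440. Evenly spaced tuning multiplies the frequency by the twelfth root of 2 for each of the twelve steps, so twelve steps give exactly one doubling. The "perfect" ratios used here (5/4, 4/3, 3/2) are the commonly quoted ones and are an assumption, since the text above does not list them. The difference is reported in cents (hundredths of one evenly spaced step), and all names in the code are made up for illustration.

```python
import math

A4 = 440.0  # reference pitch from the text


def equal_tempered(steps_from_reference, reference_hz=A4):
    """Frequency a given number of evenly spaced steps above the reference."""
    return reference_hz * 2 ** (steps_from_reference / 12)


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200 * math.log2(ratio)


# Assumed "perfect" ratios: name -> (steps above the reference, ratio).
JUST_RATIOS = {
    "major third": (4, 5 / 4),
    "perfect fourth": (5, 4 / 3),
    "perfect fifth": (7, 3 / 2),
    "octave": (12, 2 / 1),
}

for name, (steps, ratio) in JUST_RATIOS.items():
    even = equal_tempered(steps)
    perfect = A4 * ratio
    off_by = cents(even / perfect)
    print(f"{name:15s} even={even:8.2f} Hz  perfect={perfect:8.2f} Hz  off by {off_by:+.1f} cents")
```

The octave matches exactly, the fourth and fifth land within about two cents of the perfect ratio, and the major third is the furthest off at roughly fourteen cents, which is the "little bit off" compromise.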