We use 44.1 kHz because it was a reasonably small rate that was still comfortably higher than 20 kHz.
There are two effects to consider. The first sets the big picture (double the highest audible frequency), and the second nudges the rate up a bit (a few kilohertz more).
First, to be able to reproduce any waveform that contains frequencies of up to 20 kHz, we need to sample at a minimum of 40 kHz. (This is the Nyquist–Shannon sampling theorem: you can perfectly reproduce any frequency below half the sample rate.)
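If you want to see that in action, here is a tiny Python sketch (the 40 kHz rate and the 19.9 kHz test tone are just illustrative numbers, and numpy is assumed): a tone just below half the sample rate still shows up exactly where it should after sampling.

```python
import numpy as np

fs = 40_000                        # sample rate in Hz (the bare minimum for 20 kHz audio)
f0 = 19_900                        # a test tone just below half the sample rate
t = np.arange(fs) / fs             # one second of sample times
x = np.sin(2 * np.pi * f0 * t)     # the sampled tone

# The strongest frequency bin lands right back at 19 900 Hz:
spectrum = np.abs(np.fft.rfft(x))
print(np.argmax(spectrum) * fs / len(x))   # 19900.0
```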
If we just sampled our original analogue audio at 40 kHz, there would be a huge problem: aliasing. Say there is an ultrasound component at 29 kHz. Sampled at 40 kHz, it becomes indistinguishable from an 11 kHz sound. An inaudible sound just became a nasty audible artefact because of our sampling. The same would happen to any ultrasound at 51 or 69 or … kHz.
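Here is a small sketch of that 29 kHz example (I use cosines so the two sets of samples come out exactly equal; with sines the 29 kHz tone still aliases to 11 kHz, just with its phase flipped):

```python
import numpy as np

fs = 40_000           # sampling at exactly 40 kHz, as in the example above
n = np.arange(16)     # a handful of sample indices

ultrasound = np.cos(2 * np.pi * 29_000 * n / fs)   # inaudible 29 kHz tone
audible    = np.cos(2 * np.pi * 11_000 * n / fs)   # audible 11 kHz tone

# The two sets of samples are identical: 29 kHz has aliased onto 11 kHz.
print(np.allclose(ultrasound, audible))            # True
```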
Fortunately, there is a conceptually simple solution to aliasing: Before we sample the sound at 40 kHz, we filter out any sound above 20 kHz. No more aliasing, and we get a perfect reproduction of all audible frequencies.
Unfortunately, such a perfectly sharp filter cannot be realised without large compromises elsewhere in the recording equipment. It is very hard to pass, say, 19.9 kHz whilst blocking, say, 20.1 kHz.
Fortunately, there was a simple fix suitable for the technology of the 1970s: rather than sampling at exactly 40 kHz, sample a bit faster, at 44.1 kHz. We now only need to filter out anything above 22.05 kHz (half the new rate), and since 20 kHz is well below that, such a filter is comparatively easy to build even with analogue electronics.
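The arithmetic behind that extra headroom, just restating the numbers above:

```python
fs_minimum = 40_000        # bare-minimum rate: the Nyquist limit sits exactly at 20 kHz
fs_cd      = 44_100        # the rate that was actually chosen

passband_edge = 20_000                   # highest frequency we want to keep
print(fs_minimum / 2 - passband_edge)    # 0.0     -> no room at all for the filter to roll off
print(fs_cd / 2 - passband_edge)         # 2050.0  -> about 2 kHz of transition band to work with
```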