To give a bit more background than some of the other answers have:
Sound, when we hear it, is your eardrum being vibrated back and forth by slight changes in the air pressure right next to it in your ear. Or, if your head happens to be underwater right now, the water pressure. Either way, something is moving next to your eardrum, pushing it back and forth a bunch. Your ear picks that up, transforms it into nerve signals to your brain, and you perceive “sound”. You already knew that.
If the air in your ear is going from “the most pressure right now” to “the least” and then back to “the most” again 1000 times per second, we say that is a frequency of 1000Hz (or 1kHz – a thousand cycles per second). It’s often said that humans can “hear from 20Hz to 20kHz” – but as we get older, that 20kHz top end drops off (our ears get a bit less good at the high-frequency stuff). Most of the important information in human speech sits between about 300Hz and 3kHz.
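If it helps to see “cycles per second” as numbers rather than words, here’s a tiny Python sketch (the 1kHz choice and the variable names are mine, purely for illustration) printing how the air-pressure offset of a 1kHz tone swings up and down over a couple of milliseconds:

```python
import math

frequency_hz = 1000.0   # 1 kHz: one thousand full back-and-forth cycles per second
duration_s = 0.002      # look at just 2 milliseconds of it

# Pressure offset (relative to "normal" air pressure) at a few points in time.
# Over 2 ms a 1 kHz tone completes two full swings: 0, +1, 0, -1, 0 - and then again.
for step in range(9):
    t = step * duration_s / 8
    pressure = math.sin(2 * math.pi * frequency_hz * t)
    print(f"t = {t * 1000:.3f} ms   pressure offset = {pressure:+.2f}")
```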
A microphone is kinda like your eardrum, but it turns the air vibrations into an electrical signal that we then process with other electronic stuff. Before we got into CDs and “digital sound”, this was all analogue – meaning the signal level can be basically anything from zero to “totally overloaded” and anything in between. When we were recording to analogue things like magnetic tape (or earlier, wax cylinders or shellac discs cut with a needle), there were limits on how high an audio frequency could be recorded and played back. With magnetic tape, in the era of reel-to-reel machines, you could choose how fast the tape moved through the machine, depending on whether you cared more about higher audio quality (faster speed) or fitting more recording time on the same length of tape (slower speed).
When we move over to digital sound, the recording equipment detects and stores “what was the value of the electrical sound signal right now? (as a number)” a bunch of times every second. A single capture of “the signal was at X level” is known as a “sample”, and the number of times every second that these measurements are taken is called the “sample rate”. Because each individual snapshot of signal level is a number in a fixed range (for CDs that’s 65,536 different possible level values – being 16 bits), the snapshots are no longer infinitely variable in the way that the original electrical signal was – they’re digitized. Turned into digits. Ignoring fancy compression technologies (because CDs don’t use them), this means the higher the sample rate, the more computery-data is generated per second of recording time. CDs are stereo, so both Left and Right channels are recorded. Two bytes per channel for each sample, times two channels, times the sample rate, comes to 176,400 bytes per second. So you’d fit maybe 8 seconds of that onto a 1.44MB floppy disk (y’know, those “save icon” things).
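If you want to check those sums yourself, here’s a quick back-of-the-envelope Python sketch (it just redoes the arithmetic from the paragraph above – nothing official about it):

```python
sample_rate = 44_100          # CD samples per second, per channel
bytes_per_sample = 2          # 16 bits = 65,536 possible level values
channels = 2                  # stereo: Left and Right

bytes_per_second = sample_rate * bytes_per_sample * channels
print(bytes_per_second)       # 176400

# A "1.44MB" floppy actually holds 1,440 * 1,024 = 1,474,560 bytes
floppy_bytes = 1_440 * 1_024
print(floppy_bytes / bytes_per_second)   # ~8.36 seconds of CD audio
```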
The “44.1kHz” number for a CD is the sample rate – that is, the number of times every second that the analogue electrical signal from a microphone is measured. The absolute highest sound frequency that can possibly be represented by a stream of 44,100 samples per second is half that number – or 22.05 kHz. And that’s the best case – if sample #1 lands on the positive peak (maximum electrical signal level), sample #2 lands on the negative peak (minimum electrical signal level), and sample #3 catches the next positive peak, and so on. If the air is vibrating faster than that, meaning the electrical signal is changing faster than that, those changes cannot be recorded. That is the “Nyquist-Shannon sampling theorem” stuff that the other answers are jumping straight to – they’re entirely correct, I just preferred to take the scenic route.
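If you’d rather see that limit in action than take the theorem’s word for it, here’s a small Python sketch (the 30kHz tone is my own pick, purely for illustration): a tone above half the sample rate produces exactly the same stream of samples as a slower “alias” tone, so the recording literally cannot tell them apart:

```python
import math

sample_rate = 44_100
too_fast = 30_000                  # above the 22,050 Hz limit
alias = sample_rate - too_fast     # 14,100 Hz - what the samples actually describe

# Compare the first few samples of both tones: they come out identical
# (up to floating-point rounding), so the faster tone can't be captured as itself.
for n in range(5):
    t = n / sample_rate
    a = math.cos(2 * math.pi * too_fast * t)
    b = math.cos(2 * math.pi * alias * t)
    print(f"sample {n}: {a:+.6f} vs {b:+.6f}")
```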
**But** because humans wouldn’t be able to hear any audio frequencies above 22kHz anyway, it doesn’t matter that CDs can’t record them (or play them back).