A lot of people are discussing how we can tell because of compressed audio or low-quality equipment, but there’s also a distinct difference even when you’re listening to a high-quality podcast recorded in a studio. Here’s an explanation for how we can tell, even with high-quality recording and playback equipment where background noise and other factors are eliminated:
Ever had someone whisper into your ear? You know how certain sounds, like a T sound or a P sound, feel much louder than the rest? These sounds are called “plosives,” and they happen when you block off your airway for a second and release that stored-up energy all at once, rather than steadily like you would with an A sound or an E sound. Informally they’re often called something like hard consonants because of this (linguists also call them stops).
In normal speech, these are generally close enough in volume to the other sounds that they don’t really register as different. When they’re spoken directly into your ear, though, you can think of the extra burst of air that comes out as making the sound feel a bit louder and more intense. In this sense, a microphone is basically an ear; even with a pop filter (which is designed to stop that extra air from hitting the microphone), some of that plosive energy is still picked up. This effect, on top of the extra sharpness and clarity you get from speaking so close to your “listener” (the microphone), means that the sound is picked up in a completely different way than it would be by someone listening in person. When the sound is reproduced, all of this extra information gets reproduced as well, so even if you’re listening through speakers from a distance, you’re still hearing as though your ear were where the microphone was when the words were spoken. That’s the information that was fed to the microphone, so that’s what the speakers are told to reproduce.
This process is called “mediation,” where the information we use to perceive the world goes through an extra process before it reaches our primary senses and our brain interprets it. Beyond the ones I already described, there are a bunch of other ways microphones add extra mediation compared to hearing something directly with your own ears, but I could sit here all day talking about the nuances of different conditions, and that’d veer even further out of ELI5 territory than I already have.
Source: communications degree incl. experience in sound studies
TL;DR: We can tell when something is a digital reproduction because the way we are listening (our position relative to the “speaker,” the volume we’re hearing the words at compared to the intensity they’re being spoken with, etc.) doesn’t line up with what we’re actually hearing.