Sound is a wave; you can imagine it as a line on a graph.
Think back to school. Remember that lesson where they drew little rectangles under a curve?
They said that the more rectangles you drew, the more closely the rectangles would match the curve.
Well, digital audio does that.
The sound moves a magnet a tiny bit, and they record how far the magnet has moved at that instant.
This forms one box under the sound curve.
Now they just have to make several tens of thousands of boxes for every second of music.
For example, a CD has 44,100 of those boxes for every second.
For the human ear that’s close enough to the real curve that we recognize it as sound.
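If it helps to see the "boxes" as actual numbers, here is a rough Python sketch. It assumes a pure 440 Hz tone as the sound (my choice, nothing special about it), and the function name `sample_sine` is made up for illustration. Each list entry is the height of one box.

```python
import math

SAMPLE_RATE = 44_100  # boxes per second, the CD rate mentioned above


def sample_sine(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Measure the height of a sine wave sample_rate times per second."""
    count = round(duration_s * sample_rate)
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(count)
    ]


# One second of an assumed 440 Hz tone: one number per box.
samples = sample_sine(440.0, 1.0)
print(len(samples))
```

A real recording is messier than a single sine wave, but the idea is the same: one measurement per box, tens of thousands of boxes per second.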
If you think of a digital picture, you essentially have a bunch of tiny, single-color pixels which, when put together, make a pretty good recreation of whatever you take a picture of. The same is true for a digital audio file: it’s essentially a ton of tiny “pictures” of the audio signal, and when you put them together and play them back, you get a fairly accurate and sometimes exact replica of the original audio, depending on the audio format. You have an analog-to-digital converter which takes in the audio as input and measures it at regular intervals, often 44,100 times per second, so a few minutes of music adds up to millions of measurements. Each of these measurements is an approximate record of the audio signal during that fraction of a second. When you play an audio file, the computer stitches all these tiny fragments back into a continuous sound which is played by a speaker.
Audio isn’t unique. That was your first mistake.
Sound is just a pressure wave in air. Sound only has meaning because you’ve learnt that certain sounds have a meaning.
If you know what a sine wave looks like, that’s basically what sound is. A compression and a rarefaction in air.
That can easily be turned into an AC electrical signal because it’s the same thing, except positive and negative voltage.
That can easily be turned into a bunch of ons and offs, 0s and 1s, which is what digital is.
A digital representation of a waveform carves it up into slices. For a CD, as others have said, that is 44,100 times a second.
The Nyquist-Shannon sampling theorem states that to recreate a waveform, any waveform, you need to sample it at more than twice its highest frequency. Humans can hear between roughly 20 Hz and 20,000 Hz. Two samples per cycle of the highest audible frequency therefore means 40,000 samples per second. Add a bit of margin (a real filter needs some room to cut off everything above 20,000 Hz) and you get 44,100 Hz.
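To see why the sample rate has to be at least double the highest frequency, here is a small Python sketch of what goes wrong otherwise. The 30,000 Hz tone and the helper name `sample_cosine` are just assumptions for the demo. Sampled at 44,100 Hz, that too-high tone produces the same numbers as a 14,100 Hz tone, so once it is stored nobody can tell them apart.

```python
import math

SAMPLE_RATE = 44_100


def sample_cosine(freq_hz, count, sample_rate=SAMPLE_RATE):
    """Take `count` samples of a cosine wave at the given sample rate."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(count)
    ]


# 30,000 Hz is above half the sample rate (22,050 Hz), so it cannot be captured.
too_high = sample_cosine(30_000, 100)
# Its samples coincide with those of a lower tone at 44,100 - 30,000 = 14,100 Hz.
alias = sample_cosine(SAMPLE_RATE - 30_000, 100)

indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(too_high, alias))
print(indistinguishable)
```

This is why anything above half the sample rate gets filtered out before sampling, and why that bit of margin above 40,000 Hz is useful.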
Nothing unique about it. Simple physics.
Pretty much every week ELI5 has some question about how this or that can possibly be stored in binary. The answer is the same in every case.
Binary is just a number system. Rather than having 10 digits like we have in our decimal system, binary has 2 digits, which we represent as 0 and 1. Any whole number you can think of can be represented in binary. Computers use binary because they’re electronic devices, and it’s easier for them to tell apart two different electrical voltage levels (on/off, high/low etc) than it is to try and measure a range of different values with varying voltage. “Bits” and “Bytes” are just words we use for computers working with individual binary digits (eg. an 8-bit number, aka a Byte, is just a binary number with 8 digits (10101101), just like “47” is a 2-digit number). “Bit” is actually shorthand for “Binary digit”.
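For anyone who wants to try it, here is a minimal Python sketch of turning a whole number into binary digits by repeatedly dividing by 2. The function name `to_binary` and the default width of 8 digits are my own choices for illustration.

```python
def to_binary(number, width=8):
    """Write a non-negative whole number as a string of binary digits."""
    digits = ""
    while number > 0:
        digits = str(number % 2) + digits  # the remainder is the next digit
        number //= 2
    return digits.rjust(width, "0")  # pad with leading zeros to fill the byte


print(to_binary(173))  # 10101101, the byte used as an example above
print(to_binary(47))   # 00101111
```

Python's built-in `format(173, "08b")` gives the same result; the loop is only there to show the mechanics.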
Computer instructions can be represented by numbers. CPUs are built with a list of instructions they can perform, where each instruction is indicated by a number, which then also tells the CPU to expect more numbers as parameters for that instruction. So you could build a CPU where 0110 is the instruction for “Add”, which will then add the two 4-bit numbers that are sent next. So you send 0110 0001 0101 to “Add 1 and 5” and get 0110 (6) as output. x86 is one such standard instruction set for CPUs.
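Here is a toy Python version of that made-up CPU, just to show the idea. Everything in it is hypothetical: the `run` function, the single `ADD` opcode, and the choice to keep only 4 bits of the result. No real instruction set is this simple.

```python
ADD = 0b0110  # hypothetical opcode for "Add", as in the example above


def run(program):
    """Execute one instruction given as space-separated 4-bit binary words."""
    opcode, first, second = (int(word, 2) for word in program.split())
    if opcode == ADD:
        result = (first + second) & 0b1111  # keep only the lowest 4 bits
        return format(result, "04b")
    raise ValueError(f"unknown opcode: {opcode:04b}")


print(run("0110 0001 0101"))  # 0110, i.e. 1 + 5 = 6
```

Note that the opcode and the answer happen to look identical (0110). The CPU only knows which is which from the position of the number in the stream.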
But data can also be represented as numbers. Text can be represented as numbers, with each character being represented by a number small enough to fit in a single byte. This is the ASCII standard (it defines 128 characters), but other standards have been developed (eg. Unicode) so we can use non-English characters and other stuff like Emoji too.
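A quick Python sketch of text as numbers, using the built-in `str.encode`. The sample strings are arbitrary.

```python
# Each ASCII character maps to one number that fits in a single byte.
ascii_numbers = list("Hi".encode("ascii"))
print(ascii_numbers)  # [72, 105]

# Characters outside ASCII need a Unicode encoding such as UTF-8,
# where a single character may take more than one byte.
utf8_numbers = list("é".encode("utf-8"))
print(len(utf8_numbers))  # 2
```

Going the other way, `bytes([72, 105]).decode("ascii")` gives back the text "Hi".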
Images can be represented as numbers. At the basic level all you have to do is specify the amount of Red, Green and Blue for each pixel in order from left to right, top to bottom. Add stuff like compression, transparency, metadata etc and that becomes the specification for a file format like JPEG or GIF.
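As a rough Python illustration of the "basic level" described above, here is a made-up 2x2 image flattened into a plain run of numbers, left to right, top to bottom. The helper name `flatten_pixels` is hypothetical, and real formats like JPEG or GIF add headers and compression on top of this.

```python
def flatten_pixels(rows):
    """Turn rows of (red, green, blue) pixels into one flat run of bytes."""
    data = bytearray()
    for row in rows:  # top to bottom
        for red, green, blue in row:  # left to right
            data.extend((red, green, blue))
    return bytes(data)


# A hypothetical 2x2 image: red, green on top; blue, white below.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
raw = flatten_pixels(image)
print(list(raw))
```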
Sound can be represented as numbers. Sound is just a wave of pressure changing over time from high pressure (compression) to low pressure (expansion). Measure the amount of pressure, say, 44100 times per second, and you can recreate the pressure wave. Store those measurements as a list of numbers, and now you have essentially a WAV file. Add compression and metadata, now you have an MP3.
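To make the WAV part concrete, here is a hedged Python sketch using the standard library's `wave` module. It assumes a 440 Hz test tone, mono, 16-bit samples, and writes to memory rather than to disk; the function name `make_wav_bytes` is mine.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100


def make_wav_bytes(freq_hz=440.0, duration_s=0.01):
    """Measure a sine tone 44,100 times per second and pack it as a WAV."""
    count = round(duration_s * SAMPLE_RATE)
    frames = b"".join(
        # Scale each measurement to a signed 16-bit integer, little-endian.
        struct.pack(
            "<h",
            round(32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)),
        )
        for i in range(count)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per measurement
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return buffer.getvalue()


wav_bytes = make_wav_bytes()
print(len(wav_bytes))
```

Writing `wav_bytes` to a file ending in `.wav` should give something an ordinary audio player can open. The file is little more than a short header followed by the list of measurements.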
So in answer to your question, sound isn’t unique. It can be measured, and represented as numbers (not to an infinite level of detail, but enough to more or less resemble the original) just like most other information can, and anything that can be represented as a set of numbers can be stored in binary on a computer. And as long as you store those numbers in a specific known format, people can write software that can understand it and possibly use the machine’s output devices (monitor, speakers, printer etc) to recreate it.