The simplest way to understand it is to picture a typical sine wave with a horizontal line through the middle of the wave. Now pick a point on the wave above the line and draw a perpendicular line from that point down to the middle line. The length of that line is one measurement.
Computers store sound as a series of these measurements. A positive number says the point is above the line, a negative number says the point is below the line.
You’ll see sound quality represented as both a frequency (the sample rate) and a bit value, e.g. 8-bit, 16-bit, 24-bit.
Frequency represents how many of these measurements you’re making per second. So for 44kHz sound you have 44,000 measurements to represent each second of sound (CD audio is actually 44.1kHz, so 44,100 per second).
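To make the "measurements per second" idea concrete, here is a rough Python sketch that takes one measurement of a sine wave per sample tick. The function name `sample_sine` and the 440Hz tone are just illustrative choices, not anything standard.

```python
import math

def sample_sine(freq_hz, sample_rate, duration_s):
    """Measure a sine wave once per sample tick.

    Each measurement is a float between -1.0 (furthest below the middle
    line) and +1.0 (furthest above it).
    """
    count = int(sample_rate * duration_s)  # total number of measurements
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(count)
    ]

# One second of a 440Hz tone (an arbitrary example) measured at 44kHz.
samples = sample_sine(440, 44000, 1.0)
print(len(samples))  # 44000
```

Doubling the sample rate doubles the number of measurements for the same length of sound.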
The bit value is the resolution of your measurement. So for 8-bit sound, the highest value you can measure is +127, the lowest is -128, and it’s one byte per measurement. For 16-bit it’s -32,768 to +32,767, and two bytes per measurement. For 24-bit sound it’s three bytes per measurement.
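The ranges come straight from how many values fit in that many bits. A small Python sketch, assuming the measurements are stored as signed integers (some file formats store 8-bit samples as unsigned 0 to 255 instead); `signed_range` and `quantize` are made-up helper names for illustration:

```python
def signed_range(bits):
    """Lowest and highest value a signed integer of this size can hold."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def quantize(value, bits):
    """Turn a measurement in -1.0..1.0 into the nearest whole-number step."""
    low, high = signed_range(bits)
    step = round(value * high)
    return max(low, min(high, step))  # clamp anything out of range

for bits in (8, 16, 24):
    low, high = signed_range(bits)
    print(f"{bits}-bit: {low} to {high}, {bits // 8} byte(s) per measurement")
```

More bits means more steps between the quietest and loudest value, so each measurement lands closer to the real height of the wave.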
Now if the sound is stereo then that’s two streams of sound, one channel for left and one for right. Each stream is stored separately. If the audio is Dolby 5.1 then that’s 6 audio streams, all stored separately.
So putting it all together, if you have stereo 16-bit 44kHz audio, that’s 2 channels × 2 bytes × 44,000 measurements = 176,000 bytes of data to store one second of sound.
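The same arithmetic as a tiny Python sketch, in case you want to try other combinations. The function name `pcm_bytes` is just a label for this example, and it assumes plain uncompressed storage with a whole number of bytes per measurement.

```python
def pcm_bytes(channels, bits, sample_rate, seconds=1):
    """Bytes needed for uncompressed sound: channels x bytes x measurements."""
    bytes_per_measurement = bits // 8
    return channels * bytes_per_measurement * sample_rate * seconds

print(pcm_bytes(2, 16, 44000))      # stereo, 16-bit, 44kHz, one second: 176000
print(pcm_bytes(2, 16, 44000, 60))  # the same sound for one minute: 10560000
```

So a minute of that stereo sound is a bit over 10 million bytes before any compression.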
Now there are various compression and encoding algorithms that store sound data in fewer bytes, but this is the simplest way to store sound digitally.