Computers can only represent a limited number of steps when storing something like an image or sound. Even when a computer is computing decimal values, there is always going to be a limit on how many decimal places it can work with.
This has the side effect that if there’s something like an image with a smooth gradient, it has to represent that gradient from dark to light as something like `[0.0, 0.5, 1.0, 1.5, 2.0, …]`, and depending on the distance between the gaps, the jumps between each “step” can be extremely noticeable. Sound is similar, in that each “sample” has to be represented as a value, so the lower the precision, the larger the steps between them, and the worse the audio can sound.
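As a rough illustration, here’s what snapping a smooth 0-to-1 gradient onto only five available levels looks like in Python (the gradient and the level count are made up for the example):

```python
# A smooth 0-to-1 gradient, forced onto only 5 levels
# (0.00, 0.25, 0.50, 0.75, 1.00) — illustrative values only.
smooth = [i / 20 for i in range(21)]
levels = 4  # number of steps between 0 and 1

# Quantize: snap each value to the nearest available level
quantized = [round(x * levels) / levels for x in smooth]

print(quantized)
```

Neighboring values that were slightly different in the smooth version all collapse onto the same step, which is exactly the visible “banding” in a low-precision gradient.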
Dithering fixes this by jumbling the data a bit, sort of like “stippling” an image, to _fake_ a smoother image/sound without actually having to increase the precision.
This picture shows what dithering is with images:
[https://upload.wikimedia.org/wikipedia/commons/1/18/Dithered_bgw.png](https://upload.wikimedia.org/wikipedia/commons/1/18/Dithered_bgw.png)
Basically if you want to smoothly transition from black to white but you don’t have a lot of shades of gray, you use patterns of dots instead.
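A tiny text-mode sketch of that idea in Python (purely illustrative: `#` stands for a black pixel, `.` for white, and the gradient is made up):

```python
import random

random.seed(0)  # fixed seed so the output is repeatable
width = 40
# brightness ramp from 0.0 (black) to 1.0 (white)
gradient = [x / (width - 1) for x in range(width)]

# Without dither: a hard threshold at 0.5 gives one sharp edge
hard = ["#" if g < 0.5 else "." for g in gradient]

# With dither: compare each pixel to a *random* threshold, so the
# density of black dots tracks the brightness instead of snapping
dithered = ["#" if g < random.random() else "." for g in gradient]

print("".join(hard))
print("".join(dithered))
```

The hard-threshold row is half solid black, half solid white; the dithered row scatters dots so the dot density fades from dense to sparse, which your eye reads as a smooth ramp.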
The same is true for audio. CDs are a digital format, which means they use whole numbers like 1, 2, 3 to represent the audio signal.
Sometimes, especially for very quiet music, there aren’t quite enough “levels of gray” to represent the audio signal quite right. So they use dithering to approximate it.
It turns out that if you don’t dither properly, it can add some audible noise / sound artifacts that you can hear. With good dithering the artifacts are smoothed out so they’re not perceptible.
The Wikipedia article is long but it has some examples of audio dithering you can listen to.
[https://en.wikipedia.org/wiki/Dither](https://en.wikipedia.org/wiki/Dither)
You might not be able to hear the difference unless you’re using high-quality headphones connected directly to a PC. If you’re using cheap speakers or Bluetooth headphones, you probably won’t hear the difference, because it’s too small for them to reproduce.
It’s randomization to prevent undesirable patterns when quantizing something.
For example, when recording sound digitally you measure the sound and give it a value. If you want to reduce the number of samples by half (to make the file smaller), you’d have two samples you need to turn into one. You could round them or discard one, but because sound is a repeating pattern, doing it the same way every time creates weird distortions in the sound.
Dithering randomizes these choices, so instead of a distortion you can hear, there’s just a little more noise in the sample that doesn’t really affect how it sounds.
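Here’s a small Python sketch of that rounding idea, applied to quantizing sample *values* (the usual place dither is used) rather than to halving the sample count; the step size and the quiet sine wave are made-up numbers chosen so the effect is obvious:

```python
import math
import random

random.seed(1)  # fixed seed so the result is repeatable

def quantize(x, step):
    """Snap a sample to the nearest available level."""
    return round(x / step) * step

step = 0.1  # deliberately coarse levels
# a very quiet sine wave whose peaks never reach half a step
signal = [0.04 * math.sin(2 * math.pi * i / 50) for i in range(500)]

# Plain rounding: every sample snaps to 0.0 — the quiet wave is erased
plain = [quantize(s, step) for s in signal]

# Dither: add random noise *before* rounding, so how often a sample
# rounds up or down now tracks how close it was to each level
dithered = [quantize(s + random.uniform(-step / 2, step / 2), step) for s in signal]
```

Without dither the quiet signal becomes pure digital silence; with dither, some information about it survives as slightly noisy samples.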
TLDR dithering is a tiny, quiet amount of noise that’s added to the audio file so when it’s converted to a CD-quality audio file, the file converts more cleanly.
This is a tricky one for me to explain to a 5YO, but I’ll do my best to get as close to a 5YO-level explanation as I can. First, try to imagine the typical graph of a sound wave, the big wavy line that goes above *and* below the x-axis. This is the image I’ll be working from.
Let’s first talk about capturing audio on a computer. Computers work with binary code – all information stored is either a 1 or a 0. This makes it hard to record a constantly moving, constantly changing source like a sound wave, a signal that has infinite points over time. So, the best way to record audio on a computer is to take “snapshots” of the audio. As the sound comes into the computer, we take a picture of where we are above or below the x-axis.
This leads us to two terms:
Sample rate – how many times per second we take a picture
Bit depth – how many 1s and 0s we have at our disposal to say how far above or below the x-axis we are. The higher the bit depth, the more precisely we can describe the loudness of each snapshot.
Most audio these days is recorded at a sample rate of 48,000 Hz, which means we’re taking 48,000 pictures a second, and at a bit depth of 24-bit, which means we have 24 1s and 0s as our loudness resolution.
CDs are written at 44,100 Hz and 16-bit, so we lose quality when we go down. When we trim 8 bits from every snapshot, we run the risk of introducing noise into the audio file. Because we cut those 8 bits from the bottom of the audio wave, low-level detail that exists below the 16-bit threshold gets registered as silence, but sometimes that detail will peak back up into the 16-bit range and suddenly you have noise. So in the conversion from 24-bit to 16-bit, we can get stretches of digital silence interrupted by little spurts of low-level noise. Not ideal.
This is where dither comes in. It’s super quiet noise that is just loud enough to still be representable in the 16-bit world, but so quiet that virtually no one can hear or notice it. We put dither on an audio wave before we convert it to CD quality – we just print it onto the audio file.
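A rough Python sketch of that 24-bit-to-16-bit step (everything here is an illustrative assumption: real converters work on whole audio buffers, and the triangular “TPDF” noise shape is one common choice among several):

```python
import random

random.seed(42)  # fixed seed so the result is repeatable

def to_16bit(sample_24, dither=False):
    """Scale a 24-bit sample value down by 2**8 and round to 16-bit.
    With dither=True, triangular (TPDF) noise of about +/-1 16-bit
    step is added before rounding. Illustrative sketch only."""
    noise = (random.random() + random.random() - 1.0) if dither else 0.0
    return round(sample_24 / 256 + noise)

# A constant, very quiet 24-bit level that sits below one 16-bit step
quiet = [100] * 5000

plain = [to_16bit(s) for s in quiet]
dithered = [to_16bit(s, dither=True) for s in quiet]
```

Without dither, every sample rounds to 0 and the quiet level vanishes into digital silence. With dither, individual samples bounce between levels, but their *average* stays near the true value (100/256 ≈ 0.39), so the quiet detail survives as faint noise instead of disappearing.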