How can float numbers of the same bit depth as fixed ones give better resolution (SNR in audio) while both can represent the same number of unique values?

Let’s say you have 6 digits and you want to represent positive values from 0 to 999 999. Easy: we just store the value as it is.

This gives us a fixed precision regardless of what value we want to store. Unfortunately that’s not great for audio. A human ear can’t really tell the difference between 999 998 and 999 999; both are ear-splittingly loud and differ by only 0.0001%, so we have far more precision than we need up there. On the other hand, the difference between 5 and 6 is huge, ~20%, meaning that quiet sounds are heavily quantized. This gets worse if you later boost their volume, since the precision is already gone.
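
Here’s a rough Python sketch of that effect (plain integer rounding stands in for the 6-digit fixed-point store; the sample values are just illustrative):

    # Quantize by rounding to the nearest step (step = 1), mimicking
    # a 0..999999 fixed-point store.
    for true_value in (5.4, 999998.6):
        stored = round(true_value)
        rel_error = abs(stored - true_value) / true_value
        print(true_value, stored, f"{rel_error:.4%}")
    # 5.4      -> 5       (~7.4% error: audible distortion on quiet sounds)
    # 999998.6 -> 999999  (~0.00004% error: far more precision than the ear needs)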

Ideally we’d want to shift some of the precision at the top down to the smaller values. That’s where floating point numbers come in. Instead of just storing 6 digits, we remove two digits and use them as an exponent instead, like in scientific notation:

X.XXX * 10^YY

The first part (X.XXX) is essentially our significant digits, while the second part (YY) is the magnitude of the number.

This greatly changes what numbers we can represent using the same number of digits. The smallest number we can represent is still 0 = (0.000 * 10^00), but the next one up is (0.001 * 10^00) = 0.001. Before, the step from 0 went straight to 1, so that’s a 1000x improvement in precision for small values!
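
A toy version of this decimal format in Python (my own sketch, just to make the steps concrete):

    import math

    def to_toy_float(x):
        """Round x to the d.ddd * 10**ee toy format above
        (4 significant digits, exponent ee from 00 to 99)."""
        if x == 0:
            return 0.0
        exponent = max(0, math.floor(math.log10(abs(x))))  # ee can't go below 00
        significand = round(x / 10 ** exponent, 3)         # keep d.ddd
        return significand * 10 ** exponent

    print(to_toy_float(0.0004))    # 0.0       -- below the smallest step, rounds away
    print(to_toy_float(5.4321))    # 5.432     -- quiet values now keep 4 digits
    print(to_toy_float(1234567))   # 1235000.0 -- big values lose digits instead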

Not only that, we can now represent much larger values too! Our old max was ~10^6, but we can now do (9.999 * 10^99), which is an unimaginably huge number!

As expected, the drawback is the precision of larger values. 1 000 000 is now written as (1.000 * 10^06). The next number after that is (1.001 * 10^06) = 1 001 000, a step of 1000. Our precision up there is therefore only 1/1000th of what it used to be, but it’s still good enough for a recording of a jet engine.
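
Continuing the toy sketch, the spacing near one million works out like this:

    # Near 1.000 * 10**6 the spacing is one unit in the last of the
    # 4 significant digits: 0.001 * 10**6 = 1000 (vs. 1 in fixed point).
    step = 0.001 * 10**6
    print(step)                       # 1000.0
    print(1.000e6, 1.000e6 + step)    # 1000000.0 1001000.0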

A major advantage of floating point is that scaling does not affect precision. If we multiply a number by 10, we just add 1 to the exponent. We still have the same relative precision.
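
You can see the same behaviour in Python’s own (binary, 64-bit) floats, assuming Python 3.9+ for math.ulp:

    import math

    # The gap between a float and its next neighbour (the "ulp") grows
    # with the value, so the *relative* gap stays roughly constant.
    for x in (1.0, 10.0, 1e6, 1e30):
        print(x, math.ulp(x), math.ulp(x) / x)
    # The last column hovers around 2e-16 at every scale.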

32-bit floats work about the same, but in binary and with a few more tricks.

±1.mantissa * 2^exponent

1 bit is used for the sign, so that we can represent negative numbers too. We then have 8 exponent bits, which (after subtracting the bias) give normal numbers an exponent from -126 to +127; the two remaining exponent patterns are reserved. The remaining 23 bits form the mantissa (significant binary digits), with the leading 1 left implicit. The reserved patterns encode zero, tiny "subnormal" numbers, positive/negative infinity, and NaN (sqrt(-1), 0/0, inf/inf, etc).
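
You can pull those fields out of a real 32-bit encoding with Python’s struct module (a quick sketch for normal numbers; the helper name is my own):

    import struct

    def float32_fields(x):
        """Split the 32-bit float encoding of x into sign/exponent/mantissa."""
        bits = int.from_bytes(struct.pack(">f", x), "big")
        sign     = bits >> 31
        exponent = ((bits >> 23) & 0xFF) - 127   # stored with a bias of 127
        mantissa = bits & 0x7FFFFF               # 23 bits; leading 1 implicit
        return sign, exponent, mantissa

    print(float32_fields(6.5))    # (0, 2, 5242880): 6.5 = +1.625 * 2**2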

This range and relative precision are amazing, letting us multiply together values of massively different scales, such as the mass of the sun (~10^30 kg) and the gravitational constant G (~10^-11), and still get great relative precision.
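
A rough sketch of that multiplication, simulating 32-bit floats with a struct round-trip (the rounded physical constants are just for illustration):

    import struct

    def as_float32(x):
        """Round a Python float (a 64-bit double) to 32-bit precision."""
        return struct.unpack(">f", struct.pack(">f", x))[0]

    m_sun = as_float32(1.989e30)    # kg, ~10**30
    G     = as_float32(6.674e-11)   # m**3 kg**-1 s**-2, ~10**-11
    product32 = as_float32(m_sun * G)
    product64 = 1.989e30 * 6.674e-11
    print(product32)                               # ~1.327e+20
    print(abs(product32 - product64) / product64)  # ~1e-7: still ~7 good digits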

And if that *still* isn’t enough? 64-bit double precision floats have 11 exponent bits and 53 significant binary digits (52 stored, plus the implicit leading 1).
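
Python’s own float is a 64-bit double on virtually every platform, so you can ask the interpreter directly:

    import sys

    print(sys.float_info.mant_dig)   # 53 significant binary digits
    print(sys.float_info.max_exp)    # 1024
    print(sys.float_info.min_exp)    # -1021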
