How can float numbers of the same bit depth as fixed ones give better resolution (SNR in audio) when both can represent the same number of unique values?

9 Answers

Anonymous 0 Comments

Floating point numbers have a logarithmic scale to them. That is, they might have, say, eight unique values like 0.01, 0.03, 0.1, 0.3, 1, 3, 10 and 30, while integers would have something like 0, 1, 2, 3, 4, 5, 6 and 7. This is not a real example, just a demonstration of the principle.
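
For the curious, a quick Python sketch along these lines prints the gap to the next representable 64-bit float at a few magnitudes; the absolute step grows with the value while the relative step stays roughly constant:

```python
import math

# math.ulp(x) is the gap between x and the next representable double.
for x in [0.001, 1.0, 1000.0, 1_000_000.0]:
    print(f"{x:>12}: step = {math.ulp(x):.3e}  (relative: {math.ulp(x) / x:.1e})")
```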

Anonymous 0 Comments

Let’s say you have 6 digits and you want to represent positive values from 0 to 999 999. Easy, we just store the value as it is.

This gives us a fixed precision regardless of what value we want to store. Unfortunately that’s not great. A human ear can’t really tell the difference between 999 998 and 999 999; they are both ear-splittingly loud and only differ by 0.0001%, so we have way more precision than we need there. On the other hand, the difference between 5 and 6 is huge, ~20%, meaning that quiet sounds are heavily quantized. This gets worse if you later boost their volume, since the precision is already gone.
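
As a tiny illustration of that last point (my own toy numbers, not anything specific to an audio format), rounding a quiet value to whole steps bakes in an error that boosting only magnifies:

```python
quiet_true = 5.3                   # the "real" level of a quiet sound
quiet_stored = round(quiet_true)   # fixed-point storage only keeps whole steps: 5
print(quiet_stored * 1000)         # boosted later: 5000, but the true value was 5300 (~6% off)
```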

Ideally we’d want to shift some of the precision at the top to the smaller values. That’s where floating point numbers come in. Instead of just storing 6 digits, we remove two digits and use them as an exponent instead, much like scientific notation.

X.XXX * 10^YY

The first part (X.XXX) is essentially our significant digits, while the second part (YY) is the magnitude of the number.

This greatly changes what numbers we can represent using the same number of digits. The smallest number we can represent is still 0 = (0.000 * 10^00), but the next number is (0.001 * 10^00) = 0.001. Before, we could only go from 0 to 1, so that’s a 1000x improvement in precision for small values!

Not only that, we can now represent much larger values too! Our old max was ~10^6, but we can now do (9.999 * 10^99), which is an unimaginably huge number!

As expected, the drawback is the precision of larger values. 1 000 000 is now written as (1.000 * 10^06). The next number after that is (1.001 * 10^06) = 1 001 000, so the step size there is 1000 instead of 1. Our absolute precision near the top is only 1/1000th of what it used to be, but it’s still good enough for a recording of a jet engine.
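
Here is a minimal Python sketch of this toy decimal format (four significant digits, as in the X.XXX * 10^YY scheme above), just to make both ends of the trade-off concrete:

```python
def to_4_sig_digits(x: float) -> float:
    """Round x to 4 significant decimal digits, mimicking the X.XXX * 10^YY format."""
    return float(f"{x:.3e}") if x != 0 else 0.0

print(to_4_sig_digits(0.001234))    # 0.001234  -- tiny values keep very fine steps
print(to_4_sig_digits(999_999))     # 1000000.0 -- rounds to 1.000 * 10^6
print(to_4_sig_digits(1_000_400))   # 1000000.0 -- the step near 10^6 is now 1000
print(to_4_sig_digits(1_000_600))   # 1001000.0
```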

A major advantage of floating point is that scaling does not affect precision. If we multiply a number by 10, we just add 1 to the exponent. We still have the same relative precision.

32-bit floats work about the same, but in binary and with a few more tricks.

+- 1.mantissa * 2^(exponent)

1 bit is used for the sign, so that we can represent negative numbers too. We then have 8 exponent bits; with the biased encoding IEEE 754 uses, normal numbers get an exponent from -126 to +127 (the remaining exponent codes are reserved). The last 23 bits form the mantissa (significant binary digits), and the implicit leading 1 brings that to 24 significant bits. The reserved codes give us special values for positive/negative infinity, as well as NaN (sqrt(-1), 0/0, inf/inf, etc).
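
If you want to poke at the actual bit layout, here is a small Python sketch (using the standard struct module; the example values are mine) that splits a number into those three fields:

```python
import struct

def float32_fields(x: float):
    """Return (sign, unbiased exponent, mantissa bits) of x's 32-bit float encoding."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # the raw 32 bits as an integer
    sign     = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127                 # stored with a bias of 127
    mantissa = bits & 0x7FFFFF                             # 23 stored bits, implicit leading 1
    return sign, exponent, mantissa

print(float32_fields(1.0))    # (0, 0, 0)        ->  +1.0   * 2^0
print(float32_fields(-6.5))   # (1, 2, 5242880)  ->  -1.625 * 2^2
```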

This range and relative precision are amazing, letting us multiply together values of massively different scales, such as the mass of the sun (~10^30 kg) and the gravitational constant G (~10^-11), and still get great relative precision.
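
A quick check of that claim (my own numbers, using numpy’s 32-bit floats):

```python
import numpy as np

m_sun = np.float32(1.989e30)    # kg
G     = np.float32(6.674e-11)   # m^3 kg^-1 s^-2
print(m_sun * G)                # about 1.32746e+20, still good to ~7 significant digits
```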

And if that *still* isn’t enough? 64-bit double precision floats have 11 exponent bits and 53 significant binary digits (52 stored plus the implicit leading 1).

Anonymous 0 Comments

Part of the bits in the number select a floating window that the remaining bits can vary within at the given instant. With an integer format you have only one window that spans the entire range.

Suppose your signal is gradually decreasing in loudness, like at the end of a programme. In this example it can range from 0 to 15 (4 bits). At the beginning you use the whole range, sixteen coarse steps. That’s fine, because any tiny variations are masked by the loud sound.

At the midpoint of our interval the signal is between 0 and 7, so with an integer format you can use only eight steps. The ear has adapted to the quieter level and could start to discern more detail. Towards the end, the level is between 0 and 1, and the integer format has only two steps to work with. This is really bad: our ear is now at its most sensitive, yet 14 of the steps on our scale will never be used.

With floating point you can use additional bits to note which range the current value is within. Let there be four ranges, each half the size of the previous one but still divided into sixteen steps. At the beginning the selected range is 3 and nothing changes, except that you have spent 2 extra bits for no gain. But at the midpoint the selected range is 2, and you still have sixteen steps, now half as coarse. Near the end the range is 0: it only reaches up to about 2, but there are still sixteen steps within it, so now we can record 0, 0.125, 0.25, 0.375…
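
A minimal Python sketch of that toy format (my own illustration of the idea above, not a real audio codec): 2 bits select one of four ranges and 4 bits count sixteen steps within it.

```python
def encode(x: float) -> tuple[int, int]:
    """Toy float: 2 range bits pick the step size, 4 bits give sixteen steps within it."""
    for r in range(4):                  # try the finest range first
        step = 0.125 * 2 ** r           # range 0 -> 0.125, 1 -> 0.25, 2 -> 0.5, 3 -> 1
        level = round(x / step)
        if level <= 15:                 # the value fits in sixteen steps of this range
            return r, level
    return 3, 15                        # clip at the format's maximum, 15.0

def decode(r: int, level: int) -> float:
    return level * (0.125 * 2 ** r)

print(decode(*encode(0.3)))     # 0.25 -- a quiet value still gets steps of 0.125
print(decode(*encode(13.7)))    # 14.0 -- a loud value gets coarse steps of 1
```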

Think about where you can hear noise in a practical example. It will be during quiet sections.

There are digital compression and analogue companding systems that work on a similar principle of adapting to the current level to improve dynamic range.

Note that floating point doesn’t let quiet and loud detail coexist in the same value. When we are in the 0..15 range, we can only work with whole steps and can’t add 0.25. Signal processing that sums many quiet signals, such as reverb, still needs a generous number of bits for the intermediate results, but the final output can be more compact.
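
You can see the same absorption effect in real 32-bit floats (a quick numpy demonstration, with numbers of my own choosing):

```python
import numpy as np

big   = np.float32(16_777_216.0)   # 2^24: float32 steps here are already 2.0 apart
small = np.float32(1.0)
print(big + small == big)          # True -- the quiet contribution is simply absorbed

# Doing the sum with more bits (float64) keeps the detail in the intermediate result.
print(np.float64(big) + np.float64(small))    # 16777217.0
```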
