Why can floating point store more values than integers?

91 views
0

In a 32-bit floating point, it was said that the highest possible value is 3.4028235 x 10^38. However, when we evaluate this, it will be equal to 340282346638528860000000000000000000000. This whole number would require more than 100 integer bits right? My question is: If that is the case, how come this number requiring more than 100 bits fitted in a 32-bit floating point?

In: 3

The same way you were able to write 3.4028235 x 10^38 instead of 340282346638528860000000000000000000000, a floating point number stores both the first n digits of the number and the exponent, saving a ton of space. The exact way it does this is a bit complicated and I forget the details, but it’s explained well [here](https://youtu.be/dQhj5RGtag0) if you want to get into it

The problem is… see all those zeros on the end of your integer value? Those digits aren’t stored in the floating point. If you add 1 to your floating point, the number won’t change at all – it’ll just disappear as a rounding error.

Floating point point numbers have a wide *range* having a large number of possible digits left and right of the decimal point, but they lack *precision*, only storing those ~7 digits and then filling in the extra spaces with zeroes. The fact that they are binary may result in what looks like higher precisions when you try to write them out in base 10, but they’re not. Past 7 digits, anything you see is effectively noise.

Or alternatively, “3.4028235 x 10^38” is literally (well, in binary) how your number is stored with only that much room to work with.

In floating point the number is stored in a similar fashion to the scientific notation example you gave, only that the powers are in binary not decimal. There are certain number of bits for the mantissa (23 in single precision), the remaining bits for the exponent (8) and one for the sign. This allows a wide range of orders of magnitude to be encoded in the same list of numbers. This advantage is lost when converted to an integer.

A 32-bit float can actually store fewer values than a 32-bit integer. The floating point values cover a much larger range but many of the possible values are exactly equal to other values. Most obviously, floating point has negative zero.

It can’t store more values, it’s just that the values it can store are distributed differently. Within 8,388,608 of 0, 32 bit floating point numbers are packed closer than 1 apart. More than 16,777,215 away from 0, they are more than 1 apart.

That ‘more than 1 apart’ is what allows them to store much bigger numbers than integers of the same size. The *difference* between the largest 32 bit floating point number and the second largest has 31 digits when written in decimal so they get *really* spread out.

If you include positive and negative infinity, positive and negative zero, and positive and negative “not a number” all as separate values, there are the same number of 32 bit floating point numbers as there are 32 bit integers.