Why isn’t 0.1+0.2=0.3 in most programming languages?


0.1+0.2==0.3 evaluates to false in most programming languages because the result comes out as something like 0.30000000000000004. What’s the reason behind this?

In: Engineering

5 Answers

Anonymous 0 Comments

This is a problem introduced by what we call **floating-point numbers**, or simply “**floats**” for short. Another answer here pretty succinctly states what the problem is broadly; I’ll go into a little more detail about why floats work the way they do (while keeping it relatively simple).

I’m going to assume you’re already familiar with what an **integer** is in the context of programming and how it’s stored. Integers are whole numbers that are allotted some fixed number of bits. The number of bits you give them and the exact encoding determine how big or small a number you can store. Most computers these days store a typical “int” as a 32-bit value, which can hold any number between 0 and 4,294,967,295. You can optionally sacrifice one of the bits to serve as a sign (creating a “signed” integer), which halves the number of positive values you have access to but lets you store just as many negative numbers.
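To make those ranges concrete, here’s a quick sketch in Python (Python’s own integers are arbitrary-precision, so the 32-bit limits below are just arithmetic, not something the language enforces):

```python
# Ranges of a 32-bit integer, unsigned vs. signed.
# (Python ints have no fixed width; these limits are only illustrative.)
unsigned_max = 2**32 - 1
signed_min, signed_max = -(2**31), 2**31 - 1

print(unsigned_max)            # 4294967295
print(signed_min, signed_max)  # -2147483648 2147483647
```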

If you want to store a *fractional* value (not a whole number), one way you could do it is to store a literal fraction, i.e., a numerator and a denominator: 1 and 10 to represent 1/10, for example. Done this way, the error you’re seeing would never occur, as long as neither piece overflowed. One problem is that this approach tends to be very wasteful of bits: imagine using two 32-bit integers to store a single fraction. You’re now spending twice as many bits on a fractional value as on an integer value. You could cut each piece in half so the composite is the same size, but that vastly reduces the range your numerator and denominator can cover. These implementations also tend to be slow to compute with, since the computer has to keep track of which piece is the numerator and which is the denominator, and has to do a lot of extra shuffling to keep the two from getting mixed up during math operations.
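Python happens to ship a rational-number type in its standard library, which shows how the fraction approach sidesteps the rounding problem entirely, at the space/speed cost just described:

```python
from fractions import Fraction

# Store the numerator and denominator explicitly: no rounding ever happens.
a = Fraction(1, 10)
b = Fraction(2, 10)

print(a + b)                      # 3/10
print(a + b == Fraction(3, 10))   # True -- exact, unlike 0.1 + 0.2 == 0.3

# The trade-off: the numerator and denominator can keep growing as you do
# more math, which is exactly the cost described above.
```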

To solve both of these problems, **floats** were created as a compromise. They have a fixed number of bits, usually 32 (64-bit floats also exist and are often called “doubles”, meaning “double precision”). The bits are essentially split into two pieces, the “base” and the “mantissa”. The “base” is, more or less, just a whole number that uses *only* the minimum number of bits necessary to describe its value. All remaining bits are assigned to the mantissa, which for the purposes of this ELI5 can simply be thought of as “the magic decimal garbage”. **NOTE** that the actual way the number is stored in memory isn’t an int followed by extra garbage denoting the fractional piece; that would still give us the computational speed problem. The real implementation is far more convoluted than that, but thinking of it this way for now makes it simpler to digest.
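If you want to peek at that “far more convoluted” real layout: a 32-bit IEEE 754 float is actually stored as a sign bit, an 8-bit exponent, and a 23-bit fraction. A small Python sketch can split those fields apart:

```python
import struct

def float32_fields(x: float) -> str:
    """Show the raw IEEE 754 single-precision bits of x, split into
    sign (1 bit), exponent (8 bits), and fraction (23 bits)."""
    [raw] = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an unsigned int
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

print(float32_fields(0.1))
# 0 01111011 10011001100110011001101
# The fraction bits of 1/10 repeat forever in binary, so they get cut off
# and rounded at the last bit -- that's the rounding discussed below.
```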

The *float* type gets its name from the key fact that the dividing line between the two pieces “floats” around: it doesn’t sit in any one fixed place but varies based on the number actually being stored. If the base is a large number, it eats more bits and pushes the boundary over; if it’s a very small number, it eats fewer bits and the boundary moves the other way. The more bits the mantissa gets, the more precise the fractions it can describe. This yields a pretty elegant system where numbers very close to 0 (small bases, big mantissas) can hold very precise fractions, while numbers that are extremely large (huge bases, tiny mantissas) start to lose precision. Since, as described before, floats aren’t actually int + decimal, this trading of range for precision can even be completely inverted, to the point where whole numbers themselves start being skipped over. That makes floats useless for counting at very high values, but lets them store way, WAY larger values than integers can if you don’t care about being super duper accurate.
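You can watch that trade-off happen in Python, whose `float` is a 64-bit double (the `math.ulp` helper needs Python 3.9+): near 1 the gap between adjacent floats is tiny, but once values get big enough, even adding 1 falls between the steps and gets rounded away.

```python
import math

# Spacing between adjacent doubles ("unit in the last place") grows with magnitude.
print(math.ulp(1.0))    # ~2.2e-16 near 1
print(math.ulp(1e16))   # 2.0 -- the gaps are now wider than 1

# Past 2**53, consecutive whole numbers can no longer all be represented:
big = float(2**53)
print(big + 1 == big)   # True: the +1 is rounded away
```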

The result of this compromise is that a float can only store a finite number of fractions between any two whole numbers. Those fractions sit at more or less equal intervals, where the interval size is determined by the number of bits the mantissa has access to. This creates granular “steps” between the values a float can actually hold. Anything that comes up in a math operation and doesn’t land neatly on one of those steps gets rounded to the closest available step. **Your error happens because the fraction `1/10` does NOT land on one of these little steps.** The steps are built out of powers of two, and 1/10 can’t be written exactly in binary for the same reason 1/3 can’t be written exactly in decimal, so it gets rounded to something very close to, but *not quite equal to*, 1/10. If this were a calculation with very messy decimals and you only cared about two or three decimal places, it wouldn’t matter; but if you add two numbers like these together and expect a clean, exact output, you’ll quickly notice the tiny errors building up.
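Printing a few extra digits in Python (again, 64-bit doubles) makes the rounded steps visible, along with the usual workaround of comparing within a tolerance instead of with `==`:

```python
import math

# The nearest representable steps to 0.1, 0.2, and 0.3 are not the exact values:
print(f"{0.1:.20f}")        # 0.10000000000000000555...
print(f"{0.2:.20f}")        # 0.20000000000000001110...
print(f"{0.3:.20f}")        # 0.29999999999999998890...
print(f"{0.1 + 0.2:.20f}")  # 0.30000000000000004441...

print(0.1 + 0.2 == 0.3)              # False: the sum lands on a different step than 0.3
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare within a tolerance instead
```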
