Why can’t computers calculate decimals as floats properly?

9 Answers

Anonymous 0 Comments

The basic version is that many decimal fractions cannot be represented exactly in binary, so we use approximations. Look up IEEE 754 if you are curious about the inner workings, but that is far too complicated a beast to describe here.

Anonymous 0 Comments

Computers work in binary, also known as base-2. So representing 1/2, 1/4 or even 3/64 can be done very accurately.

The problem is that 10 is five times two (10 = 5 * 2). To represent 1/10 exactly you need a base that has both 2 and 5 as prime factors, such as base 10. A base without the factor 5, like base 2, cannot store this fraction exactly.

To tag along with what /r/pocok5 said: if a computer operated in ternary (base-3) it could represent 1/3 perfectly.
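A quick way to see this on a real machine is the small Python sketch below. It assumes the usual 64-bit binary floats that Python uses; `Fraction(x)` recovers the exact value a float stores, and `is_power_of_two` is just a helper name made up for this example.

```python
from fractions import Fraction

# Fraction(float) recovers the exact value the float actually stores.
half = Fraction(0.5)
odd_one = Fraction(3 / 64)
tenth = Fraction(0.1)

print(half)       # 1/2, stored exactly
print(odd_one)    # 3/64, stored exactly
print(tenth)      # a nearby fraction, not 1/10

def is_power_of_two(n: int) -> bool:
    """Hypothetical helper: True if n is 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0

# Every float is some integer divided by a power of two.
print(is_power_of_two(tenth.denominator))   # True
print(0.1 + 0.2 == 0.3)                     # False
```

The stored value for 0.1 is very close to 1/10, but its denominator has to be a power of two, so it can never be 1/10 itself.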

Anonymous 0 Comments

Most numbers cannot be written with a finite number of digits in decimal form. 1/3 = 0.33333333333333333333333… repeating forever. In fact, there are infinitely many numbers, such as pi, whose digits never settle into a repeating pattern. For those numbers you would need infinite memory to write them down exactly, and since infinite memory cannot exist in real life, you need to compromise hard between being able to write numbers to a limited accuracy and buying a RAM stick the size of the planet to run Windows Calculator.

Anonymous 0 Comments

You can, we just have to understand what “properly” means in this context. In order to represent a larger span of numbers with the same amount of bit space you have to sacrifice accuracy. This means that, in a sense, each float number actually represents a *range* of numbers, not just a single number. By that I mean there are many numbers which map to the same specific float representation.

So when you give the computer a number, tell it to represent it as a float, and then do some arithmetic (or even just try to get back your original number), the answer you get from your computer is going to be “wrong.” But it’s not really wrong, it’s right according to the agreed-upon protocol for how float numbers are supposed to work. The computer is doing everything properly; it is just that you have used an inappropriate format to get the answer you want.

When you choose to use float, you must accept that you will lose this level of accuracy for whatever operations you wish to perform and either account for that in other ways or use a different number format.
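The "range of numbers" idea can be sketched in Python as follows, assuming 64-bit floats and Python 3.9 or newer (for `math.ulp`). The variable names are only for illustration.

```python
import math

# Two different decimal literals that round to the same 64-bit float.
a = 0.1
b = 0.1000000000000000055511151231257827
print(a == b)            # True: both map to the same stored value

# math.ulp(x) is the gap between x and the next float of larger magnitude.
print(math.ulp(1.0))     # tiny gap near 1.0
print(math.ulp(1e16))    # much wider gap near 1e16

big = 2.0 ** 53
print(big + 1.0 == big)  # True: 2**53 + 1 has no float of its own
```

The gap between neighbouring floats grows with the size of the number, which is the trade of accuracy for span described above.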

Anonymous 0 Comments

Decimals come from writing numbers in base 10. They are a way to represent fractions in the decimal number system: 0.1 = 1/10 and 0.01 = 1/100, so each digit is a multiple of 1/10^n.

The standard floating point numbers computers use are binary and use binary fractions, not decimals. So the first digit after the point has a value of 1/2, the next 1/(2*2) = 1/4, and so on; each is a multiple of 1/2^n. The problem is that 1/10 is not the sum of a finite number of 1/2^n fractions, so it can’t be represented exactly by a binary floating point number.

If I am not mistaken, all binary fractions can be represented exactly by decimal fractions. The reason is that 1/2 = 0.5 and all the others are 0.5 multiplied by itself some number of times.

It is not because base 10 is special. 10 can be evenly divided by 5 and 2, which means fractions from bases 5 and 2 work fine. So do bases 4 and 8, because they are 2 x 2 and 2 x 2 x 2, and any other base that has only the prime factors 2 and 5. This means bases 3, 6, 7, 9, 11, 12, 13, 14, 17 and so on have fractions that cannot be represented exactly as a decimal number. A simple example is 1/3 = 0.33333333 and so on forever. But in base 3 it is exactly 0.1.

So because 10 has the prime factors 2 and 5 but 2 has only 2, there are decimal fractions that cannot be represented exactly by a binary fraction.
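The prime factor rule can be checked mechanically. Here is a minimal Python sketch; the function name `terminates` is made up for this example, and it only looks at fractions of the form 1/denominator.

```python
from math import gcd

def terminates(denominator: int, base: int) -> bool:
    """True if 1/denominator has a finite expansion in the given base."""
    d = denominator
    g = gcd(d, base)
    while g > 1:
        # Strip every prime factor the denominator shares with the base.
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    # Anything left over is a prime factor the base does not have.
    return d == 1

print(terminates(10, 10))  # True:  1/10 is 0.1 in decimal
print(terminates(10, 2))   # False: the factor 5 is missing from base 2
print(terminates(8, 10))   # True:  1/8 is 0.125 in decimal
print(terminates(3, 10))   # False: 0.3333...
print(terminates(3, 3))    # True:  0.1 in base 3
```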

That said, you can have a floating point number with base 10. Scientific notation is just that, so 1.25 * 10^3 is a floating point number equal to 1250, and 1.1 * 10^-2 = 0.011. You can do calculations like that on a computer too; most CPUs just have no dedicated hardware for it, so you need to do the maths with regular instructions. It can be done, and you can find libraries that do that, like https://github.com/libdfp/libdfp
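As an illustration of base-10 floating point done in software, here is a short sketch using Python's standard `decimal` module. This is not the libdfp library linked above, just a readily available stand-in for the same idea.

```python
from decimal import Decimal

# Build Decimals from strings so the digits are taken literally.
total = Decimal("0.1") + Decimal("0.2")
print(total)                                   # 0.3
print(total == Decimal("0.3"))                 # True

# Scientific notation with a base-10 exponent, as in the text.
print(Decimal("1.25E+3") == Decimal("1250"))   # True
print(Decimal("1.1E-2") == Decimal("0.011"))   # True
```

The arithmetic is carried out by ordinary instructions rather than a dedicated decimal unit, which is why it is slower than binary floats.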

Anonymous 0 Comments

A lot of the issues come from the base conversion.
Imagine instead of converting from base 10 to base 2, we were converting to base 9.

Converting integers is easy, e.g. 11 becomes 12, 23 becomes 25 etc. Fractions are harder though. 0.5 == 1/2 becomes 0.4444… repeating, 1/3 becomes exactly 0.3, but what should 0.3 become? It works out to 0.2626… repeating, so the best you can do is approximate.

The same thing happens when our decimal numbers are converted to binary: we have to come up with an approximation. It’s this approximation that introduces the errors you see.
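To make the conversion concrete, here is a small Python sketch that produces the digits after the point in any base. The function name `fractional_digits` is made up for this example; it uses exact fractions so the conversion itself adds no rounding error.

```python
from fractions import Fraction

def fractional_digits(value: Fraction, base: int, count: int) -> list[int]:
    """First `count` digits after the point of `value` (0 <= value < 1) in `base`."""
    digits = []
    for _ in range(count):
        value *= base
        digit = int(value)   # the whole part is the next digit
        digits.append(digit)
        value -= digit       # keep only the fractional remainder
    return digits

print(fractional_digits(Fraction(1, 3), 9, 4))    # [3, 0, 0, 0]
print(fractional_digits(Fraction(3, 10), 9, 6))   # [2, 6, 2, 6, 2, 6]
print(fractional_digits(Fraction(1, 10), 2, 9))   # [0, 0, 0, 1, 1, 0, 0, 1, 1]
```

The last line is the binary case: 0.1 turns into 0.000110011… and never ends, so any fixed number of bits has to cut it off somewhere.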

Anonymous 0 Comments

They actually can, if they are programmed to do so. But they don’t do this “natively” – so to speak. There are certain libraries that handle what we call “arbitrary precision arithmetic” and they can do exactly what you describe, but at a very hefty performance cost. (Warning, this post will be very long. But I’ll try to explain exactly why).

The reason that arbitrary precision arithmetic isn’t really the default way we handle decimals is because it’s not the “language” that the CPU speaks. The computer only innately understands binary, and so if we’re trying to do things related to base-10 fractional numbers, we have to find ways of trying to decide how to represent it in binary first. Luckily, this isn’t particularly complicated for whole numbers. We can easily calculate what any number’s binary equivalent will be, and vice-versa back to base-10. But what about decimals?

This is where things get way more complicated. Let’s take the number 15.1. How do we represent this in binary? Do we make a number like 00001111.00000001? If we decide to represent it this way, how would we differentiate between 15.1 and, say, 15.01? Do we also make that number 00001111.00000001? How do we tell the difference between these two?

As you can tell, it creates a problem. Where do we put the decimal point? It’s totally ambiguous what this number could mean because we have no idea how many 0s are supposed to be after the decimal point. And in fact, our problems don’t just end there. Our base 10 number used 10 digits (0 to 9), but we had a separate symbol to represent the decimal point, so we really have 11 symbols. The computer only has two “symbols” for binary, and those are 1 and 0. They represent either a wire being on or off. We would need some kind of tri-state transistor that could take three symbols to even represent the decimal point at all, and CPUs don’t use these, they do everything in pure binary. So, as you can see, we have multiple problems to solve here.

**To solve this (and to represent fractional numbers properly in binary), we’re going to need to create some kind of convention/format that we can agree on.** One primitive way to start would be to basically do what we, as humans, do when we’re solving arithmetic on paper, and just split everything up one digit at a time. Let’s assume that we want to convert decimal to binary like this, and since we only have digits 0 through 9 to worry about, we only really NEED 4 bits (a total of 16 combinations) to represent this. This leaves us some extra ones, but luckily, this is a good thing since we need to pick one to signify our decimal point too. I’m going to pick 1111 to signify our totally arbitrarily decided “binary decimal point” – so anywhere you see 1111 below, it’s a decimal point.

15.1 becomes: 0001 0101 1111 0001

15.01 becomes: 0001 0101 1111 0000 0001

1000.16 becomes: 0001 0000 0000 0000 1111 0001 0110

23,598.776 becomes: 0010 0011 0101 1001 1000 1111 0111 0111 0110

*(Remember, we decided to make 1111 signify our decimal point. We decided this arbitrarily, we could have made it anything that wasn’t already taken, but 1111 is convenient because it’s easy to remember.)*
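The digit-by-digit convention above can be sketched in a few lines of Python. The names `POINT` and `encode` are made up for this example, and the choice to skip thousands separators is an assumption to make the "23,598.776" case work.

```python
POINT = "1111"  # the arbitrary "decimal point" marker chosen in the text

def encode(number: str) -> str:
    """Encode a decimal string one digit at a time, four bits per symbol."""
    groups = []
    for ch in number:
        if ch == ".":
            groups.append(POINT)
        elif ch in "0123456789":
            groups.append(format(int(ch), "04b"))
        # anything else, such as a "," thousands separator, is skipped
    return " ".join(groups)

print(encode("15.1"))        # 0001 0101 1111 0001
print(encode("15.01"))       # 0001 0101 1111 0000 0001
print(encode("23,598.776"))  # 0010 0011 0101 1001 1000 1111 0111 0111 0110
```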

**This would work, but there is still a problem: This is hugely inefficient. We just created a massive number of additional calculations we have to do.**

Think about it. We’ve already used this many digits just for small numbers, and if we were to try to add them up, we would have to do tons of little calculations on each digit (just like we do as humans on a sheet of paper). And this would become monstrously expensive with larger numbers, because we’ve effectively “wasted” a lot of bits doing this. *As you can probably see, even though our convention makes intuitive sense to us, it’s not really the most efficient way for the CPU to handle things since we’re having to create a totally abstract number system, translate to it and process it, then translate back.*

So, to get around this, we came up with a different language/format for representing fractional numbers in the 1980s, and this was called IEEE 754. It involves a totally different way of representing the numbers that uses a binary form of scientific notation instead. Rather than inefficiently trying to force a base 10 representation of the digits and performing calculations on them individually, we can now perform calculations on the entire number in binary and do it all at once.

A 32-bit IEEE 754 float basically has three parts: a sign (the very first bit, which tells us whether it’s positive or negative), another 8 bits that signify an exponent, and 23 bits (called the “mantissa”) that signify the fraction following the binary point. We can perform calculations on this “in native binary” rather than breaking things up into arbitrary decimal-like representations, and it makes calculations FAR more efficient for the CPU and saves us a lot of hassle and time.
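To peek at those three parts, here is a minimal Python sketch that uses the standard `struct` module to get at the raw bits of a 32-bit float. The function name `float32_fields` is made up for this example.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to a 32-bit float, into (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits of fraction
    return sign, exponent, mantissa

print(float32_fields(0.15625))   # (0, 124, 2097152): +1.25 * 2**(124 - 127)
print(float32_fields(-2.0))      # (1, 128, 0):       -1.0  * 2**(128 - 127)
```

For ordinary (normalized) numbers the stored mantissa is the part after an implied leading 1, so 2097152 out of 2**23 means 0.25, giving 1.25.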

*You might be wondering why we represent every number using just 32 bits, but believe it or not, we can actually handle an incredibly wide range of numbers using just these 32 bits. We can represent small numbers with very decent precision like this, and if we’re willing to tolerate a larger margin of error, we can represent huge numbers too, up to roughly 3.4 * 10^38 (Numbers with dozens of digits? No problem.). We also have 64 bit floats that are known as “double precision” floats; these have more bits, can be more precise when we need them to be, and reach roughly 1.8 * 10^308 (Powerball jackpot multiplied by the number of atoms in the observable universe? Coming your way, give or take a rounding error.).*
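The precision trade-off can be sketched in Python by rounding values to 32 bits and back with the standard `struct` module. The helper name `to_float32` is made up for this example; Python's own floats are 64-bit.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(to_float32(16777216.0))   # 16777216.0, which is 2**24
print(to_float32(16777217.0))   # 16777216.0: the odd neighbour is not representable
print(to_float32(0.1) == 0.1)   # False: the 32-bit value is a coarser approximation
print(to_float32(1e38))         # still in range, but only about 7 digits are meaningful
```

Past 2**24 a 32-bit float can no longer count every whole number, which is the larger margin of error mentioned above.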

**IEEE 754 became standardized in the 80s, and nearly all CPUs (and even most microcontrollers) now have dedicated hardware and CPU instructions that can process these floats.** They are far more efficient, way easier to calculate in binary, and are usually more than precise enough for most of what we do (and even give us tons of extra flexibility to do things we couldn’t do in ordinary binary, such as represent numbers with dozens of digits with only 32 bits). Even though there is a huge list of reasons why it was adopted, there are still cases where it isn’t quite practical (bank transactions are a perfect example). We have arbitrary precision arithmetic libraries to handle these kinds of situations, but they are inefficient enough that we don’t usually use these by default.

Anonymous 0 Comments

They can! You can write software to do it, using the operations supported by the computer.

There are two big differences between “proper” decimal numbers and floats: one is that floats work with binary numbers rather than decimal (base-10). That’s just a matter of building the right hardware, and we *could* certainly make computers which operated directly on base-10 numbers. It has been proposed before, but it hasn’t caught on.

The *other* difference is trickier. Computer instructions work with fixed-length data. Add a 64-bit number to a 64-bit number, for example.

And true decimals might have any length. 1.6345454234123424534524234 or 456347772234424767435234255634234235234 or whatever number you can think of. It might not fit into any specific length. So the computer can’t handle those natively in hardware. But you can still write software to deal with them.
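Python happens to ship with this kind of software: its integers grow to any length, and the standard `fractions` module keeps exact ratios. A small sketch, using the numbers from above:

```python
from fractions import Fraction

# Python integers grow as needed, so this long value is stored exactly.
big = 456347772234424767435234255634234235234
print(big + 1 - big)                     # 1

# The same arithmetic in 64-bit floats loses the +1 entirely.
print(float(big) + 1.0 - float(big))     # 0.0

# Fraction keeps an exact numerator and denominator of any length.
x = Fraction("1.6345454234123424534524234")
print(x * 3 / 3 == x)                    # True
```

The exact versions work on variable-length data in software, which is why they are slower than a single fixed-width hardware instruction.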

Anonymous 0 Comments

Since you understand enough to know what a float is, you should read David Goldberg’s paper “What Every Computer Scientist Should Know About Floating-Point Arithmetic”.