https://attackofthefanboy.com/articles/the-quake-iii-algorithm-that-defies-math-explained/
I know next to nothing about programming and physics, but I do know this article intrigues me. I’m mostly interested in how the algorithm “allowed the game to function faster and more smoothly,” and why it’s “seemingly mathematically impossible.”
In: 119
That article kinda misrepresents it.
The algorithm is just a very whacky way to get a very good first guess for a previously well known algorithm (newtons method).
The algorithm takes a shortcut by subtracting a random looking variable from the value in integer form, wich happens to be the “best guess” for the root, not the actual root it has to calculate. This is then used as the starting point of newtons method wich iteratively approaches the real solution from there.
It’s kinda a rule of thumb solution. Like, “if you want to know someones weight, half their height in cm to get the weight in kg”. That’s obviously wrong, but it’s a good starting point for an algorithm that plays “warmer, colder” with the correct number.
~~Modern hardware kinda made this obsolete, back when Quake3 came out there was no dedicated 3D GPU hardware available, so they had do use this workaround basically.~~
>allowed the game to function faster and more smoothly
You need to run this calculation a lot (once per surface, on every frame), so speed is paramount. Traditional division and root calculation is rather slow, while multiplication and addition are much faster on a computer. So by using this wacky algorithm, you get more frames per second with the same hardware.
>seemingly mathematically impossible
It (ab)uses the IEEE 754 format (which is a standard on how floating point numbers are represented in bits) in a way that wasn’t intended. Namely by interpreting the bits in a completely different way, a whole number. From this, you can calculate the logarithm of a number very easily by exploiting the properties of IEEE 754.
If you just look at what the formats are on paper, this seems impossible, because this conversion was never intended to be done in this way.
My attempt at a no-math ELI6 version. It’s long, but most of it is setting up exactly what the problem *is*, so we can get to what the algorithm is actually doing to avoid that problem.
### What we’re trying to do
Quake III is a 3D game. In 3D games, worlds tend to be made up of a patchwork of triangle-shaped flat spots. To calculate convincing lighting when drawing the game world to the screen, the computer has to look at all the triangles currently visible on the screen and run some light-bouncing calculations on each one.
### How computers do math
Computer math is a little different from “normal” math. Computers can’t really solve algebraic equations in quite the same way a human mind does with pencil and paper. (You can program a computer to mimic it, but that’s Very Hard^(tm) to do, so most computers don’t bother.) Instead, a computer has to rely on some ways of going about it that look very roundabout to a human.
Sometimes, it has to go through lots of steps that slowly meander their way towards the answer they are looking for instead of solving it directly. Like, remember long division from gradeschool, and all the steps that took? Something closer to that. Two operations that are like this are calculating square roots, and division (unless it’s division by 2, that one is special and super easy).
Now, computers have the benefit of being Very Fast^(tm). Even if you give them curveballs that need the roundabout algorithms, they’re still going to solve them a lot faster than you can. Even the weakest pocket calculator is using these strategies, and would you want to race one of those in a math exam?
### The problem
Even so, Very Fast^(tm) can still be slow if you bog it down with enough busywork to do. Remember that task of running lighting equations on triangles from earlier? One of those equations happens to involve both division AND a square root. Both of the slow ones at the same time. And the computer has to do it for *every* triangle on the screen. That’s going to chew up a tremendous amount of time. Almost certainly more time than the game has to run smoothly on the hardware it was designed for.
### The solution (the “algorithm that defies math”)
The “algorithm that defies math” (known as the [fast inverse square root](https://en.wikipedia.org/wiki/Fast_inverse_square_root)) is a clever workaround that takes advantage of some really weird and specific quirks in both mathematics itself and the way computers happen to be built. When we do this, we can effectively convert our algorithm involving square root and division into an alternative algorithm that that only has multiplication (which is fast), division by 2 (which, as mentioned before, is a special case and is even faster), and logarithms (which should be *horrendously* slow, but we have a clever trick for that).
One caveat of this new alternative algorithm is that it doesn’t give you a *precisely correct* answer, but after a fairly small number of iterations it gives you a *pretty close* answer, close enough for the lighting engine’s purposes. It’s overall faster than doing the division and square rooting the normal way, meaning we can save computation time. This in turn means the game can run at a faster speed.
### A side tangent about floats
That “clever trick” for computing logarithms comes from the fact that a computer has two major ways of thinking about numbers. One way is only for numbers that have no decimal part (integers), and the other way is for numbers that do have a decimal part (so-called “floats”).
Computers have to handle floats differently, because keeping track of that pesky little decimal point and what is on the left and right of it is not as straightforward as you’d think.
A naive way to go about it would be to store the left and right chunks separately. This means you’d need to reserve spaces in memory to hold two potentially very big numbers. We would call that “fixed point” notation, because you always know where the decimal point is–it’s in between the two chunks. But this method tends to be wasteful in practice, since in most practical situations if one side has a lot of digits in it, the other side tends to not (or they become unimportant anyway). It also means that when you do math with numbers stored this way you have to keep track of two pieces at the same time, which can be cumbersome.
An alternative is to instead set it up in such a way where you cram the entire number into a single chunk, and then keep track of where the decimal point inside the chunk is supposed to go, kind of like [scientific notation](https://en.wikipedia.org/wiki/Scientific_notation). This is called “floating point”, as the decimal point now gets to “float around”, which is where the name “float” comes from.
Computers have special dedicated hardware to transform one number into the other kind (either integers to floats, or floats into integers). This special hardware is Very Very Fast^(tm) because it is baked directly into the physical system that does this and *only* this job.
The magic here is all in the fact that, by a really weird coincidence, if you convert an integer into a float using the special hardware, then try to read that converted data as integer data *without* converting it back, what you essentially get is the logarithm of the number you started with. Which is kind of crazy.
So, whenever you need a logarithm calculated, instead of going about it the super long way, you can just abuse this integer-to-float conversion hardware (which is Very Very Fast^(tm)) as a makeshift “dedicated logarithm calculator”.
Suppose you are tasked with converting speed limit signs from miles per hour (MPH) to kilometers per hour (KPH). Your supervisor tells you, “Don’t worry, just multiply the MPH by 1.609344,” and leaves you to get to work.
You don’t have a calculator.
But – you make the observation that you get close enough to the right answer by adding:
KPH ~ MPH + MPH/2 + MPH/10
All of which is pretty easy to do in your head, and will give you an answer close enough that it won’t matter on a sign.
The inverse square algorithm seeks to get close enough to the right answer by doing quick calculations rather than slow complicated calculations.
That article is very sensationalistic.
It is just a regular optimization tricks and hardly any thing that “defies math”.
Instead of using the “exact” math function (which takes many CPU cycles to perform) one can use an approximation instead which is faster but not as precise.. but precise “enough” for a computer game.
First it gets an approximation of the value and refines this approximation using a single iteration of Newton’s method.
Newton’s method is not as interesting here because it is used in many other situations as well. One thing that is important here is that our initial approximation (that the method receives) must not be totally off because otherwise our result gets worse with each iteration and quickly blows up. For example we want to calculate inverse square root of 123, if we told the method that our initial guess was 1 (which is always a better guess than the number itself), the first iteration of the method would give us back the number -60 which is a negative number and not acceptable for a square root. After the second iteration the value becomes 13283910 and there is no way to recover from that.
The interesting part is the initial approximation. So how could we estimate the inverse square root of a number like 123? Well, the number can equivalently be represented as 1.23 * 10^2. We ignore the 1.23 part and focus on the 10^2 part. We can apply the inverse square root directly on this part. (10^2)^(-0.5) is the same as 10^(2*-0.5) which is 10^(-1). So we multiply the exponent by -0.5 (or divide by -2) to get our estimate. In this case, the result would be 10^-1 = 0.1 which would be our initial approximation. Using Newton’s method then turns it into 0.0885 which is not too far from the expected value of 0.0902. If we run a second iteration of Newton’s method we end up with 0.901. The important part is that getting the (inverse) square root of a number like 100 or 10000 is extremely simple whereas dealing with a number like 1.23 is a huge challenge.
So we can get a decent approximation just by looking at the exponent. And this exponent is essentially the log10 of the number. If you have the number 123 then the log10 is 2.09 where the 2 corresponds to the 10^2 part from before (and the 0.09 corresponds to 1.23 because 10^0.09 is 1.23). We already know what to do with the log10. We divide it by -2 and use that as our new exponent. I.e. -2.09/2 is our new exponent which means the number is 10^(-1.045) and that is exactly the inverse square root.
Having the log of the number basically tells you how many digits (minus 1) it has. Now, the computer works in binary but the idea is the same. The first step is to calculate the log2 of a floating point number (represented as exponent E and mantissa M) and it happens that log2(x) = 1/2^23 * (E*2^23 + M) – 126.9549534 is a good approximation. How they arrived at that approximation is not that important right now; it’s just a mixture of a Taylor series of log(x+1) and some fine tuning in the interesting region of x values.
The important part is that the formula above incorporates both exponent and mantissa (whereas our initial approach only uses the exponent) and it uses them in a very inexpensive way. It is just addition and subtraction and multiplication.
The (E*2^23 + M) is x interpreted as an integer, which we call i. So log2(x) = 1/2^23 * i – 126.9549534. To get the inverse square root in log space, we divide by -2 and get -log2(x)/2 = 126.9549534/2 – 1/2^23 * (i/2).
The last obstacle is to go back from the log2 representation into the (exponent, mantissa) representation. It’s basically just some algebra and they arrive at (exponent, mantissa) = 3/2 * 126.9549534*2^23 – i/2 where the right hand side is calculated as an integer and then interpreted as a floating point number.
Latest Answers