Eli5 Halting Problem


I watched videos and read about it but I still don’t understand why is important and proof of it. Can you please explain me what it is?

In: 60

In computer programs, some run through and end, and others may continue forever.
Imagine you want to make a different computer program, that could determine if any computer program will either stop at a point in time (halt), or continue forever (not halt).

The methodology of proving it is a bit difficult, but in short, **IF** you *could* make such a program (I’ll call it **P**) that correctly determines whether ANY program halts or not, you could make a different program (that has **P** as a part of it), that would trick **P**. This would be a contradiction, as **P** was supposed to predict ANY program, and this one doesn’t.

Therefore, it is IMPOSSIBLE to create a program **P** that correctly predicts if **any given program** either halts or not.

Consider a computer program that looks like:

let i be an integer, starting at 0
while i < 10
i = i + 1

This program will loop 10 times and then end (halt). Now consider another program that looks like:

let i be an integer, starting at 0
let j be an integer, starting at 0
while i < 10
j = j + 1

This program sits in a loop always incrementing j, but it never touches i. It is an infinite loop.

The halting problem poses the question: if you are given a program, how do you determine if it will ever end? There are plenty of programs like the two here where the answer is fairly obvious just from inspection. The challenge, though, is how do you come up with a procedure that will always, 100% of the time, tell you the answer?

What makes this problem interesting is that it is actually unsolvable–and you can prove that fact! No matter how you lay out the process by which you could analyze a program there will always be some programs that your solution gets the wrong answer for. We prove this fact through contradiction, where we assume one thing and then take a bunch of logically sound steps until you arrive at a statement that is clearly false. As long as all of the logic is sound the only remaining conclusion is that the assumption was false.

As a building block here it’s useful to note that programs are themselves just data. At one point in computer science history (at a time where “computer science” was really just a branch of math) that’s a novel concept, but these days people are used to the idea since you can download a program as easily as a gif of a cat jumping into a box and falling over. Programs take in data and emit data, but since a program *is* data you can have a program that takes in a program and tells you something about it.

We take our first step in the proof by assuming the existence of a program that takes in a program and then tells you whether that program would halt or not. Perhaps we call that program HALT(x), which is to say that you pass in program x to the HALT program and it returns True or False based on whether it halts or not. Building from there we consider another program:

if HALT(x):
loop forever

This program also takes a program. It asks HALT if the program will halt, then if that program would halt then LIAR will loop forever. If the program would loop forever then LIAR immediately halts. This leads us to a problem: what happens if you call LIAR(LIAR)? You take the data representing LIAR and pass it into the LIAR program. It takes that data and asks HALT If the program will halt or not. If HALT says it halts then LIAR turns around and loops forever–a contradiction! If HALT says that LIAR will loop forever then LIAR turns around and immediately exits! No matter what HALT is guaranteed to have given the wrong answer when it was presented with this program. This means that no matter how good HALT is at figuring out if a program will loop or not it will *always* give the wrong answer for some cases.

As for why this is important, it would be way more important if things went the other way. Imagine if we could come up with a program that can tell if a program will finish. We could then write a program that amounts to:

let i be an integer, starting at 1
loop forever:
if <some mathematical conjecture is false for i>:
keep looping

If we took this program and fed it into a working HALT function then we’d be able to immediately prove that conjecture–if HALT says the program halts then there must be some value of i that allowed the program to reach the halt statement; if HALT says the program runs forever then there’s no way to reach the halt statement.

Since no such HALT function can exist we’re left with the less useful state of knowing when we should stop looking for a solution. Since we can prove that it’s impossible to find a HALT function that means that if we can show some other problem to be equivalent we immediately know that it’s unsolvable, too. That becomes more powerful thanks to the extremely basic terms in which the computers and programs of the halting problem are defined.

Consider what happens to your computer when you try to overload its poor little brain. Everything slows down, the cursor turns into something spinning, and eventually you get a popup that says ‘I’m broken, want me to stop?’ Sometimes you want it to because it is just not working, but sometimes it just needs a little more time to finish what it’s doing. It would be great if there was a way to tell which is which.

That is the halting problem; can you tell when a computer will ever finish what it’s doing?


The proof:

Imagine that you have written a program that solves the halting problem (I’ll call this the halting program). You can run the program, feed it the program and input of any arbitrary program, and it will output ‘yes’ if the program will ever finish, or ‘no’ if the program will never finish.

Now, you also write a second program (I’ll call this the no program). If you tell this program ‘no’ it does nothing. If you tell this program ‘yes’, then it will do an infinite loop of ’10:PRINT, “Hello” 20:GOTO 10′, and will never finish.

Now you write a third program (I’ll call this the liar program) that calls on the first two programs. First, it runs the halting program, then feeds the output of that into the no program.

Now, you run the liar program, and put its code as its input. This code will first run through the halting program, which will output ‘yes’ or ‘no’. If the halting program outputs ‘no’, then the no program will do nothing, and the liar program will finish its calculations. But, if the liar program finishes its calculations then it has halted, and the halting program should have said ‘yes’. But if the halting program says ‘yes’, then the no program should run forever, so the liar program doesn’t halt, and the halting program should have said ‘no’

Whatever the halting program calculates, the no program makes that output incorrect. Since it is trivially easy to show that the no program exists, that means the halting program can’t exist, and therefore there is no way to tell if a computer will ever finish what it is doing that is always correct.

I’m going to leave the mathematics mostly behind and take you on a descriptive journey.

You leave your house and go to the grocery store. Just before you enter someone walks in in front of you. Where are they going to go?

Do they turn left? Right? Do they freeze and grab a basket? Are they getting a phone call and they stop to answer the phone? How fast are they going to walk?

The scope of what they *could* do at any point is huge, and the Halting Problem basically posits that there will be no simpler way to predict what they do than to have the real live human do whatever they’re going to do, live, right in front of you. It’s easier for the universe to manufacture that exact human on that exact day than to simulate, with certainty, what the person will do.

Let’s move to computers. The Halting Problem says that there’s no better way to predict when a program will end (halt) than just running the program to see what it actually does.

This matters because it means (here comes math again) operating systems can’t predict the behavior of programs running on them in *polynomial time*–basically there’s no straightforward equation to tell you how much longer to wait for any arbitrary process to finish, become responsive, take free resources, etc… that’s better than than just letting it run.

So the best we *can* do is spinning beach balls and task managers that allow you to kill programs and frustrating progress bars that never seen to know how long something will really take. Because math says nothing will be better than letting the thing actually happen and seeing how long it takes.

The Halting Problem takes a program (and it’s input) as it’s input, and tells you whether or not that program (run on the provided input) will eventually stop (halt) or if it will keep running forever (loop). This is what’s called a *decision problem*, meaning that it takes an input and gives you a yes or no answer as it’s output, but that’s less important.

You can just thing of the problem as of a computer program. So solving the Halting Problem really means writing a computer program that will tell you whether or not a program that it gets as input will halt or loop. This part is important, and I feel like it trips many people up: a program is really just text, so it can be an input of another program.

What’s important about the Halting Problem is that it’s undecidable, which means that it’s impossible to write a program that will tell you wether any program will loop or halt. And when I say impossible, I don’t mean we haven’t came up with it yet. I mean we have proven that it’s impossible. Now that on it’s own isn’t that interesting, there are actually many problems that are undecidable, but the Halting Problem is an interesting intro to this, because it’s relatively simple, the sketch of the proof is not that complicated, and, most importantly, it’s a program that we would really like to have (because it would make writing safe code much easier).

The full proof is beyond the scope of this subreddit, there are many videos on it (don’t feel discouraged if you don’t get it on your first try, it gets a little confusing, that’s just how high level math is). But here’s the idea:

Imagine that we can solve the Halting Problem. That means there is some program that will tell us wether a given program will loop or halt, for any program. We’ll call this program the Oracle (this is actually a technical term for this kind of program, which I just find funny). Now, using this oracle, let’s make another program that we’ll call Smith, which will take another program as it’s input, and work like this:

1. It will check if the input program (A), loops or halts, using the Oracle.
2. If A halts, then Smith will loop forever.
3. If A loops, then Smith will halt.

Now this might sound strange, but it is a valid program that you could write on your computer – if you had the oracle.

But here’s the thing. What happens if we give Smith itself as it’s input? Well, let’s say that the Oracle says it loops. But then Smith will halt! What if the Oracle says that it halts? Well, then Smith will loop! The Oracle was wrong in both cases! Which means, that the Oracle can’t exist. This is called a proof by contradiction: we first assumed that something is true, then we found some example that contradicts our initial assumption, which means it must be false. This proof is also very simplified, since it doesn’t take into account that the program also has inputs. But this is the basic idea.