Eli5 When applications crash, why do they provide an obscure error code instead of a small description of what the issue actually is?




The error code can let the developer know exactly what the error is.

Using a small description of the error isn’t useful to anyone and just creates more confusion. It’s also more work for the developer since they would need to include descriptions for hundreds of error codes.

If it’s an extremely common error, they often do include descriptions. For instance, open a file on your computer, then try to cut-paste it somewhere else. Windows will specifically tell you that it can’t do that operation because the file is currently in use.

However, if the error is uncommon, or if the developers don’t think the average user will be able to resolve it themselves, an obscure error code is the better choice. The code is extremely descriptive for the technical support team, whereas a generic description gives the user nothing they can act on and isn’t specific enough for support to diagnose the issue immediately.
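As a rough sketch of that trade-off in Python: the codes, the wording and the `report` function below are all made up for illustration, not taken from any real application. Common errors get a sentence the user can act on, and everything else falls back to a code that support can look up.

```python
# Hypothetical error catalog: the codes and messages are invented for this example.
USER_MESSAGES = {
    "E1001": "The file is open in another program. Close it and try again.",
}

def report(code: str) -> str:
    """Return what the user sees for a given internal error code."""
    friendly = USER_MESSAGES.get(code)
    if friendly is not None:
        # Common error the user can fix themselves: describe it plainly.
        return friendly
    # Uncommon error: show only the code, which is precise for the support team.
    return f"Something went wrong. Please contact support and quote error code {code}."

print(report("E1001"))
print(report("E7342"))
```

Only the handful of errors a user can actually fix need a written description. The rest stay as codes.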

Of course it depends on the application. Some do intentionally write out an obscure code. However, most of the time the error message is a perfect description of the problem, but it is written for developers who have the code in front of them. Many error messages actually come from third-party libraries and tools used to build the application. Those libraries have no idea what the application actually does, so they cannot give a description that any user can understand.
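Here is a small Python sketch of the library situation, using the standard `json` module as a stand-in for a third-party library. The `load_settings` function and the wording of its message are assumptions made up for this example. The library can only say where the parsing went wrong; only the application knows that the text was a settings file.

```python
import json

def load_settings(text: str) -> dict:
    """Parse the (hypothetical) settings file of an application."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # The library's message talks about lines and columns of JSON text.
        # The application adds the context a user can understand and keeps
        # the original error attached for the developer.
        raise RuntimeError(
            "Your settings file is damaged. Delete it to restore the defaults."
        ) from exc

try:
    load_settings("{broken")
except RuntimeError as err:
    print("User sees:     ", err)
    print("Developer sees:", repr(err.__cause__))
```

This kind of wrapping only exists where the developer thought about the failure in advance. Where they didn't, the raw library message is all you get.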

If the application crashed, it means that something the developer did not predict happened. The error therefore cannot be described briefly, since the developer needs a full description to fix the bug. For developers, this description is quite understandable.

Imagine you write a letter to your parents, congratulating them on their birthday. You put the usual stamp on it, like every year, and send it off at the post office. But the post office has increased its prices, so your letter is underpaid and gets returned to you. Now imagine your parents’ perspective: they don’t get the annual letter. They might get confused and call you to ask if everything is okay.

You wouldn’t expect an accurate description of what happened from your parents; how should they know?
You also wouldn’t expect the post office to tell you that your birthday greetings could not be delivered. You would expect a message that your letter was underpaid, and it would be up to you to figure out that this prevented your parents from getting the card.

If a program crashes, it’s a bug.

Bugs are unanticipated behaviour: if the developer had realised what the problem was during development, they would have designed the program to avoid that failure or to fail gracefully.

Therefore a bug message can rarely convey any information that is useful to the end user. If the program crashed because the file type you tried to read in was wrong, then unless the developer wrote the code to check for that, they aren’t going to be able to give you a helpful error message. And if they did write the code to check the file type, they probably would have prevented the crash in the first place!

A lot of errors are really buried in the details of the code and unless you have intimate knowledge of the software they won’t help. If you got an error message:

IndexError: list index out of range

What do you do with that information?
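To make that concrete, here is a minimal Python sketch of how such a message comes about. The function names and the save-slot scenario are invented for illustration. The one-line message is all a user would see, while the traceback at least tells the developer which function failed.

```python
import traceback

def third_save_slot(slots):
    # Buried assumption: there are always at least three save slots.
    return slots[2]

def load_game():
    # This (hypothetical) player only has two slots, so the lookup fails.
    return third_save_slot(["slot A", "slot B"])

try:
    load_game()
except IndexError as exc:
    user_view = f"{type(exc).__name__}: {exc}"
    developer_view = traceback.format_exc()

print(user_view)       # IndexError: list index out of range
print(developer_view)  # full traceback naming load_game and third_save_slot
```

Nothing in the short message mentions save slots. That connection only exists in the source code.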

Also a lot of software is compiled, so the actual machine code run by your computer is difficult to directly link to the source code. The error message will at least be some sort of reference to tell the developer what function failed.

Finally the developer may not want to post full stack traces of errors, since that may reveal too much about the inner workings of the software.

A husband and wife are packing for a trip. The husband goes into the bathroom to pack his toiletries and the wife asks him to grab hers too while he’s in there. Now consider these two possible scenarios:

1. While the husband is gathering everything, he accidentally drops her toothbrush into the toilet.
2. When he arrives in the bathroom, he finds her toothbrush already in the toilet.

In both cases, he has to report back to his wife that her toothbrush is in the toilet and that she’ll need a new one. In the first scenario, the husband knows exactly how it got there so he can fully describe what went wrong and he can probably figure out what to do next time so it doesn’t happen again.

But in the second scenario, the toothbrush was already in the toilet so he doesn’t know exactly what happened. All he knows is that *something* happened and the toothbrush being in the toilet was the end result. A Google search suggests that a child or pet is likely responsible for it getting there, but that’s just the most common reason other people have identified for why their toothbrush is in the toilet. There’s no guarantee that that’s what happened to this toothbrush.

Error codes in software are much the same. Often the program only knows about the final state that includes the error, but doesn’t know enough about how it got into that state to give the user an accurate description of what happened. The error codes are designed to describe that error state, and people can take that (and sometimes other diagnostic information, such as logs) and reverse engineer what happened. They then share the root causes online so the next time someone googles that error code, they can see a list of previously known causes and try those solutions first. But there’s never any guarantee that the solutions Google finds will work for any given instance of an error code because every instance is different.

The problem is that the thing doing the reporting doesn’t know exactly what caused the chain of events.

Let’s say you have a video game that allocates some memory to store each enemy’s hit points. When an enemy’s shot, it loads the hit points from the memory location belonging to a particular enemy, subtracts 1 HP of damage, and stores the new hit point number in the memory location. Except if the damage was lethal, it instead de-allocates the memory for that particular enemy’s hit points.

It’s a simple and common design for this type of thing.

Now whenever a program accesses memory, the OS checks for access to memory that’s been de-allocated (or was never properly allocated in the first place). All the OS “knows” is that an instruction attempted to access memory at address 93441720. So the error message “Process coolgame.exe invalid memory access at address 93441720” is literally all the OS knows about the error.

So what actually happened? Well in this particular level, there’s a helper character that fires shots at enemies in addition to the player. The helper character’s shot hit the enemy, killing it. Then the game processed the player’s shot, which also killed the enemy on the same frame, and attempted to decrease the enemy’s hit points by 1 — but the memory storing the enemy’s hit points was already de-allocated.
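Here is a loose Python analogue of that sequence. It is only a sketch: a dictionary stands in for the allocated memory, deleting an entry stands in for de-allocation, and Python’s `KeyError` plays the role of the OS’s invalid-access error. All names are invented for illustration.

```python
hit_points = {"enemy_7": 1}  # one entry per living enemy

def apply_shot(enemy_id):
    hp = hit_points[enemy_id] - 1     # load the hit points, subtract 1 HP
    if hp <= 0:
        del hit_points[enemy_id]      # lethal damage: "de-allocate" the entry
    else:
        hit_points[enemy_id] = hp     # store the new hit point number

def process_frame(shots):
    for enemy_id in shots:
        apply_shot(enemy_id)

try:
    # The helper's shot and the player's shot hit the same enemy on one frame.
    process_frame(["enemy_7", "enemy_7"])
    crash = None
except KeyError as exc:
    crash = f"KeyError: {exc}"

print(crash)  # KeyError: 'enemy_7'
```

Just like the OS message, the error names the thing that was missing and nothing else. It says nothing about helpers, shots or frames.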

You should know that the OS’s “de-allocation detector” is very coarse-grained. Because of the way the CPU design works, it can only flag regions of roughly 4000 bytes (4096-byte pages on typical x86 systems) as “allocated” or “unallocated.” And there’s also some time delay involved; the OS has to run some bookkeeping to determine when regions become allocated / unallocated, so regions that become unallocated are only flagged when the OS bookkeeping code runs.

So an incredibly specific sequence of events has to occur to cause this crash:

– You have to be playing in a particular part of a particular level for the helper character to show up
– You and the helper character have to shoot at the same time
– Both shots have to hit the same enemy on the same frame
– The enemy has to have exactly one hit point remaining so the first shot is lethal
– No other enemy data is located in the same 4000-byte region as the dying enemy, making it possible for the OS to enable the “unallocated” flag for the whole 4000-byte region
– The OS’s unallocated-region bookkeeping code happens to run in between processing the first and second shot, causing it to actually flag the region as unallocated so the game crashes

Again, the OS, which is the thing that’s giving you the error message, has no idea about any of this. It doesn’t know that the memory is used to store enemy hit point data, or anything about the sequence of events that caused the program to de-allocate that memory. Heck, the OS doesn’t even know your program is a game. All the OS knows is that the program attempted to access memory address 93441720, which was not properly allocated at the time of the access attempt.

Instead of having the OS terminate the program, the developer could write their own code to check for invalid memory access. But the developer can’t make much better error messages.

You see, in order to write the error message and the code to trigger it, the developer would have had to realize it’s possible for damage to be applied to an enemy whose hit point memory has been de-allocated.

But of course, if the developer had this knowledge, they wouldn’t write the detailed error message! Instead they’d simply fix the bug, and you’d never see the crash, because the developer fixed it before you got the game.
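In a simplified Python version of the game logic, that fix might be a two-line check. This is a sketch under the same made-up setup as the story: a dictionary of hit points stands in for the memory, and the names are hypothetical.

```python
def apply_shot(hit_points, enemy_id):
    """Apply 1 HP of damage, ignoring shots at enemies that are already dead."""
    if enemy_id not in hit_points:
        return  # enemy was killed earlier this frame: nothing to do, no crash
    hp = hit_points[enemy_id] - 1
    if hp <= 0:
        del hit_points[enemy_id]   # lethal damage: remove the enemy's entry
    else:
        hit_points[enemy_id] = hp

enemies = {"enemy_7": 1}
apply_shot(enemies, "enemy_7")  # helper's shot kills the enemy
apply_shot(enemies, "enemy_7")  # player's shot on the same frame is ignored
print(enemies)  # {}
```

Notice there is no error message anywhere in the fixed version. Once the developer understands the cause well enough to describe it, they understand it well enough to make it go away.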