I understand that machines fail for numerous physical reasons. However, I’ve never understood how computers or programs that were working fine all along can suddenly crash or break down when there are no moving parts and the code hasn’t been recently patched or updated. This has bugged me for over 25 years, and it finally occurred to me that I should ask.
To add on to the other answers, it is possible for software that is working correctly, running on an operating system that is working correctly, on hardware that has no other problems, to suddenly fail.
It’s rare, but because the individual electronic components on an integrated circuit are extremely small, they can be disturbed by cosmic rays. A single particle strike can flip one bit in memory or in a register to the wrong value.
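To make the effect concrete, here is a minimal Python sketch of what one flipped bit does to a stored number. The `flip_bit` helper and the `balance` value are made up for illustration; real upsets happen in hardware, not through a function call.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit position inverted (XOR with a one-bit mask)."""
    return value ^ (1 << bit)


balance = 100                     # binary 1100100
corrupted = flip_bit(balance, 6)  # invert bit 6, which is worth 64
print(balance, "->", corrupted)   # prints "100 -> 36"
```

Nothing in the program's logic changed, yet the value it reads back is different, which is why this kind of failure looks like it came from nowhere.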
It’s possible to engineer software to be tolerant of single bit failures, but it isn’t cheap to do so and is almost never worth the investment for common software products.
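One common way to get that tolerance (an example I'm picking, not the only approach) is to keep three copies of a value and take a bitwise majority vote when reading it back. A minimal sketch in Python, with a hypothetical `majority_vote` helper:

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies: each result bit is the value
    held by at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


# Three copies of the same value; one copy suffers a single-bit upset.
copies = [100, 100, 100]
copies[1] ^= 1 << 6                # bit 6 flips in the second copy: 100 becomes 36

recovered = majority_vote(*copies)
print(recovered)                   # prints 100
```

This also shows where the cost comes from: three times the storage, extra work on every read, and it still only covers one bad copy per bit position.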
Where it is worth the investment is for things like avionics boxes that control weapons release on aircraft. There are government regulations for things like this under which a command to launch a weapon cannot be a single bit in a message. It has to be at least two bits, they have to have opposite values, they have to be in different message words (words are usually 8, 16, or 32 bits), and they can’t be adjacent. There are other reasons for having such regulations, but the general idea is that a single isolated bit failure can’t cause a weapon to launch inadvertently. (There are usually a lot of other things that prevent weapons from launching, but each step in the process gets this kind of treatment.)
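Here is a rough Python sketch of the two-bit idea. The message layout is entirely invented for illustration (three 16-bit words, an "arm" bit in word 0 that must be 1, a "confirm" bit in word 2 that must be 0), and I'm reading "not adjacent" as "not in neighboring words". Real message formats are defined by the relevant standards, not by this example.

```python
WORD_BITS = 16
ARM_WORD, ARM_BIT = 0, 3           # hypothetical position: must be 1 to command release
CONFIRM_WORD, CONFIRM_BIT = 2, 12  # hypothetical position: must be 0 to command release


def get_bit(word: int, bit: int) -> int:
    """Extract one bit from a message word."""
    return (word >> bit) & 1


def release_commanded(words: list[int]) -> bool:
    """True only if both command bits hold their required, opposite values."""
    arm = get_bit(words[ARM_WORD], ARM_BIT)
    confirm = get_bit(words[CONFIRM_WORD], CONFIRM_BIT)
    return arm == 1 and confirm == 0


# Idle message: arm bit is 0 and confirm bit is 1, so both must change to command release.
idle = [0x0000, 0x0000, 1 << CONFIRM_BIT]

# Try every possible single-bit flip of the idle message.
false_triggers = 0
for w in range(len(idle)):
    for b in range(WORD_BITS):
        damaged = list(idle)
        damaged[w] ^= 1 << b
        if release_commanded(damaged):
            false_triggers += 1

print(false_triggers)  # prints 0
```

Because the two bits need opposite values, a fault that forces a whole word to all zeros or all ones can't satisfy both conditions either, which is presumably part of why the opposite-value requirement exists.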