If you have anything that runs for days or longer, you want to design your program to have “saves”, i.e. checkpoints.
For an ELI5 example, let’s say you are searching for a really big prime by brute force, checking numbers one after another. You could simply record the last number you checked, and then continue the search from there.
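To make the idea concrete, here is a minimal sketch of a brute-force prime search that saves its position to disk and picks up from there on the next run. The file name `prime_search.json`, the function names and the `every` parameter are all made up for illustration; a real long-running search would use a much faster primality test.

```python
import json
import os

CHECKPOINT = "prime_search.json"  # hypothetical file name for the "save"


def is_prime(n):
    """Naive trial division; fine for a demo, far too slow for huge numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def load_checkpoint(path, default):
    """Return the saved next candidate, or `default` if no save exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["next"]
    return default


def save_checkpoint(path, next_candidate):
    """Write to a temp file and rename it, so a crash mid-write cannot corrupt the save."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next": next_candidate}, f)
    os.replace(tmp, path)


def search(start, stop, path=CHECKPOINT, every=1000):
    """Find the first prime in [start, stop), resuming from a save if one exists."""
    n = load_checkpoint(path, start)
    while n < stop:
        if is_prime(n):
            return n
        n += 1
        # Saving on every step would be slow, so only save now and then.
        if n % every == 0:
            save_checkpoint(path, n)
    return None
```

If the program is killed, at most `every` numbers get rechecked on the next run, which is a small price compared to starting over.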
It depends on how they have designed the computer and written the calculations. But it is not hard to imagine that they keep a note of where they are in the calculation as they go along. So if something happens, they can always go back to the stored data and pick up right where they left off. You do the same yourself. If you are doing some long calculations, you probably keep a pencil in your hand to write down the numbers as you go along. If you get distracted by something, or even if you just get tired and want to go to sleep, you can pick up again where you left off. You might need to go back a few steps to the last number you wrote down, but in the grand scheme of things this does not matter that much.
You can’t, unless your entire system supports hot swapping and has redundancies. For instance, servers usually have hard drives that are hot-swappable, meaning they can be replaced while the machine is still running. Some external GPUs, I believe, can also be hot swapped; I have yet to hear about CPUs and RAM, though.
Edit: to answer your second question: we usually design software so that it can be safely stopped and restarted. More commonly, we use the divide-and-conquer method and write software that works on smaller chunks of data, adding each chunk's result to a bigger collective result, so it can be stopped before it moves on to the next chunk.
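As a rough sketch of that chunked approach, the hypothetical function below sums a list a few items at a time and can be stopped between chunks. Everything it needs to continue lives in a small `state` dictionary; the names `process_in_chunks`, `state` and `max_chunks` are invented for this example, and a real program would write the state to disk after each chunk.

```python
def process_in_chunks(data, chunk_size, state=None, max_chunks=None):
    """Sum `data` chunk by chunk and return a state that can be used to resume."""
    if state is None:
        state = {"offset": 0, "total": 0}
    done = 0
    while state["offset"] < len(data):
        if max_chunks is not None and done >= max_chunks:
            break  # safe stopping point: we are between chunks
        start = state["offset"]
        chunk = data[start:start + chunk_size]
        # Build a fresh state only after the chunk is fully processed,
        # so a stop never leaves a half-finished chunk behind.
        state = {"offset": start + len(chunk), "total": state["total"] + sum(chunk)}
        done += 1
    return state


data = list(range(10))
partial = process_in_chunks(data, chunk_size=3, max_chunks=2)  # stop early
final = process_in_chunks(data, chunk_size=3, state=partial)   # resume later
```

Here `max_chunks` just stands in for whatever makes the program stop (a shutdown signal, a time limit); the point is that stopping only ever happens on a chunk boundary.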