Start a timer with a fixed interval, like a stopwatch or a clock on the wall. Eventually the hands go all the way around. That’s it.
The bug is that you can’t know how many times the hands have gone around. You don’t know if it’s the first time around, the tenth time around, or even the millionth time around.
Many computers keep a timer that started at midnight UTC on 1 January 1970. The timer is a signed 32-bit number, ticking once per second. When it advances past 03:14:07 UTC on 19 January 2038, that timer loops back around.
The way most computers store time is by counting the number of seconds since Jan 1 1970. But older systems can only store numbers up to 32 (binary) digits. One of those digits is used to record whether the number is negative, so there are really only 31 digits for the count itself. That means those computers eventually run out of room to keep counting.
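Fortunately, most computers have already moved to 64 bits, so they’ve doubled the number of digits they can use. Doubling the digits does far more than double the range: a signed 64-bit count of seconds lasts roughly 292 billion years.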
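To get a feel for how much room the extra digits buy, here is a rough sketch in Python. The helper name `max_signed` and the 365.25-day average year are assumptions made for illustration, not anything an operating system actually uses.

```python
# Largest value a signed counter with the given number of bits can hold.
# One bit is the sign, so bits - 1 are left for the count itself.
def max_signed(bits: int) -> int:
    return 2 ** (bits - 1) - 1

# Average year length in seconds (an approximation that smooths over leap years).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

years_32 = max_signed(32) / SECONDS_PER_YEAR
years_64 = max_signed(64) / SECONDS_PER_YEAR

print(f"32-bit signed counter lasts about {years_32:.1f} years")
print(f"64-bit signed counter lasts about {years_64:.3e} years")
```

The 32-bit figure comes out at about 68 years, and the 64-bit figure at a few hundred billion years.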
It’s 2038 not 2039. It’s similar in a way to the Y2K issue. Time calculation can be very important for computers: timing loops, working out when certain events occur, keeping the calendar date, etc. Linux and Unix computers have traditionally used a 32-bit value as a counter, specifying the seconds since Jan 1, 1970. It’s actually a signed value, so on 19 January 2038 it goes negative (bit 31 is set), due to the way computers represent positive/negative numbers.
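The jump to a negative number can be simulated in Python. This is only a sketch: Python integers never overflow, so the hypothetical helper `as_int32` imitates the 32-bit wraparound with modular arithmetic, and `to_utc` is likewise just an illustrative name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def as_int32(n: int) -> int:
    """Wrap n the way a signed 32-bit (two's complement) counter would."""
    return (n + 2**31) % 2**32 - 2**31

def to_utc(seconds: int) -> datetime:
    """Turn a count of seconds since the epoch into a UTC date and time."""
    return EPOCH + timedelta(seconds=seconds)

last_good = 2**31 - 1              # largest value the counter can hold
wrapped = as_int32(last_good + 1)  # one more tick sets bit 31

print(last_good, to_utc(last_good))
print(wrapped, to_utc(wrapped))
```

One tick after 03:14:07 UTC on 19 January 2038 the counter reads -2,147,483,648, which a system interpreting it literally would treat as a moment in December 1901.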
Back in the 1970s, the people developing the UNIX computer operating system decided to design the system clock like this:
– Pick a particular date and time in the past (the “epoch”)
– The system clock keeps track of the number of seconds since the epoch.
They picked midnight on January 1, 1970 for the epoch. This is reasonable so far, but the 2038 problem is due to a single bad design decision:
– They used 31 bits to store the number. (It’s actually a 32-bit counter, but one bit is reserved as a sign bit.)
Now with 2 bits, you have 4 possible patterns: 00, 01, 10, 11. You can use these 4 patterns to store the numbers 0-3.
With 3 bits, you have 8 patterns: 000, 001, 010, 011, 100, 101, 110, 111.
With 4 bits, you have 16 patterns: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111.
Whenever you add a bit, you get twice as many patterns. (Because when you have one more bit, you can do all the patterns you had before starting with an extra 0, then do all the patterns you had before *again* starting with an extra 1.)
With say 5 bits, you don’t have to list all the patterns to know how many there are. You just multiply the number of 4-bit patterns by 2, to give 16×2 = 32 possible patterns for 5 bits. You can also figure this out “from scratch” by multiplying five twos together, that is 32 = 2×2×2×2×2 = 2^5.
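If you’d rather let a computer do the listing, a small Python sketch can check the doubling rule. The function name `patterns` is just an illustrative choice.

```python
from itertools import product

def patterns(bits: int) -> list[str]:
    """List every pattern of 0s and 1s with the given number of bits."""
    return ["".join(p) for p in product("01", repeat=bits)]

for bits in range(2, 6):
    current = patterns(bits)
    shorter = patterns(bits - 1)
    # All the shorter patterns with an extra 0 in front, then all of them
    # again with an extra 1 in front: exactly twice as many.
    doubled = ["0" + p for p in shorter] + ["1" + p for p in shorter]
    assert current == doubled
    print(bits, "bits:", len(current), "patterns")
```

The loop confirms that each extra bit gives the old list twice over, once with a leading 0 and once with a leading 1.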
So for 31 bits, you have 2^31 possible patterns, which works out to 2^31 = 2,147,483,648. Those patterns cover the numbers 0 through 2,147,483,647, so the clock overflows when it tries to count second number 2,147,483,648. We can translate that number of seconds into years as follows:
– There are 60 seconds in a minute.
– There are 60 minutes in an hour, so an hour is 60×60 = 3600 seconds.
– There are 24 hours in a day, so a day is 60×60×24 = 86,400 seconds.
– There are 365 days in a year (ignoring leap years), so a year is 60×60×24×365 = 31,536,000 seconds.
Dividing 2,147,483,648 by 31,536,000 gives 68.096, meaning the clock runs out of patterns approximately 68 years after January 1, 1970. (A more precise calculation that accounts for leap years shows the exact time it occurs is when the clock ticks over from 03:14:07 to 03:14:08 UTC on January 19, 2038.)
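Both the rough estimate and the precise moment can be reproduced with Python’s standard `datetime` module, which handles the leap years for us. This is a sketch of the arithmetic only; the variable names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 365-day year, no leap days
MAX_SECONDS = 2**31 - 1                # largest value 31 bits can hold

# Rough estimate: the number of patterns divided by seconds in a year.
rough_years = 2**31 / SECONDS_PER_YEAR
print(f"roughly {rough_years:.3f} years after 1970")

# Precise answer: add the seconds to the epoch, leap days included.
last_tick = EPOCH + timedelta(seconds=MAX_SECONDS)
print("last representable second:", last_tick)
```

The small gap between the two answers comes from the 17 leap days between 1970 and 2038, which the 365-day estimate leaves out.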
When they decided to use 31 bits, the original designers of UNIX may have been aware of this problem, but thought “Nobody will still be using our operating system in 68 years.” If that’s what they thought, they would have been wrong: The descendants of the UNIX operating system are very widely used today, especially in servers and mobile devices (including most smartphones, as the phone OSes of both Apple and Google are descended from or modeled on UNIX).
Operating systems are finally [starting to fix the problem](https://en.wikipedia.org/wiki/Year_2038_problem#Implemented_solutions), but this problem is far bigger than just an OS issue. It affects a whole ecosystem, for example:
– You need one fix for your operating system, another fix for your programming language, and a third fix for programs written in that programming language
– If you have a program that saves dates/times in a file, you have to update how it’s stored, but you still want to be able to read older files that were created with a non-updated version.
– If your file has fixed-size records then you’re going to have a big problem because where do you put more bits for the timestamp without making the record bigger and breaking all the code that works with the old record size?
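The fixed-size record problem can be sketched with Python’s `struct` module. The record layout here (a timestamp followed by a 16-byte name) and the names `OLD_RECORD`, `NEW_RECORD` and `fits_old_format` are invented for illustration and don’t describe any particular file format.

```python
import struct

# Hypothetical record layouts, little-endian with no padding:
OLD_RECORD = struct.Struct("<i16s")  # 4-byte signed timestamp + 16-byte name
NEW_RECORD = struct.Struct("<q16s")  # 8-byte signed timestamp + 16-byte name

def fits_old_format(timestamp: int) -> bool:
    """Return True if the timestamp can be stored in the old 4-byte field."""
    try:
        OLD_RECORD.pack(timestamp, b"event")
        return True
    except struct.error:
        # struct refuses values outside the signed 32-bit range.
        return False

print("old record size:", OLD_RECORD.size, "bytes")
print("new record size:", NEW_RECORD.size, "bytes")
print("last second of the old range fits:", fits_old_format(2**31 - 1))
print("the next second fits:", fits_old_format(2**31))
```

Widening the timestamp grows every record from 20 to 24 bytes, so any code that assumes the old size will read new files at the wrong offsets.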
We faced a similar problem when dates rolled over from 1999 to 2000, which turned out to be a non-problem partly because it was easy to explain the problem and convince companies / governments to put resources into fixing it.
With the 2038 problem things are worse:
– It’s harder to explain “We’ll have a problem rolling over from 03:14:07 to 03:14:08 on January 19, 2038” than it is to explain “We’ll have a problem when the year rolls over from 99 to 00”, so it’s harder to convince people that we should be spending resources fixing it.
– We have far more computers and are far more dependent on them than we were in 2000 (and nobody thinks this trend will reverse itself between now and 2038)
– The ease of sharing code over the Internet means a lot of software isn’t self-contained: Your software may have a 2038 problem because you used Bob’s code, and Bob used Charlie’s code, and Charlie used Dan’s code, and Dan’s code has a 2038 bug. It’s not uncommon for a website to have like 1000 different pieces of code from different people.