Pretty much the title.
I understand this is a law, but is there a reason why it happens?
For background: for almost all real-life data like population, GDP, and other real-world stats, the probability of the leading digit being 1 is almost 30%, and it keeps decreasing from there, with 9 being the least probable.
But why does this happen? Is it just a fascinating pattern in randomness?
Say you are looking at the average household salary of a population. If most of the range is between 40k and 200k, then 100k out of the 160k-wide range of salaries starts with a 1 (everything from 100k to 200k). That's basically what causes Benford's law: when you cross an order of magnitude, half your values will start with a 1 until you get up to the 2s.
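To make that concrete, here's a minimal sketch in Python (the 40k–200k range and the sample size are just the hypothetical numbers from above):

```python
import random
from collections import Counter

# Draw salaries uniformly between 40k and 200k and tally leading digits.
random.seed(0)
salaries = [random.uniform(40_000, 200_000) for _ in range(100_000)]
counts = Counter(str(int(s))[0] for s in salaries)

for digit in "123456789":
    print(digit, f"{counts[digit] / len(salaries):.1%}")
# About 62% of values start with 1 (the 100k-wide slice of a 160k-wide range),
# while 2 and 3 never appear as leading digits in this range at all.
```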
An important distinction is that this is NOT true for _any_ data. Benford's law is specifically about numbers spanning several orders of magnitude.
The reason it works is that in a range spanning a few orders of magnitude, there simply tend to be more numbers starting with 1.
For example, from 0 to 200 there are 111 numbers that start with 1, only 12 that start with 2, and 11 each for the digits 3-9.
And as you increase the range to include more hundreds, it starts to equalise: if you looked at the numbers from 0 to 999, the first digits _would_ be distributed equally. But in real life you won't have _exactly_ the numbers from 0 to 999 in a data set. It'll be a rough range, and as soon as you start getting into the 1000s you start having more numbers beginning with 1 again. So any data set that doesn't have a strictly defined cut-off and spans several orders of magnitude is more likely to contain more numbers starting with 1 than with any other digit.
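A quick sketch of that counting argument in Python (the cutoffs are just illustrative):

```python
from collections import Counter

def leading_digit_counts(n):
    """Count the leading digits of the integers 1..n."""
    return Counter(str(i)[0] for i in range(1, n + 1))

# With a cutoff of 200, numbers starting with 1 dominate (111 of them);
# with a cutoff of exactly 999, every digit gets an equal share (111 each);
# push past 1000 and the 1s pull ahead again.
for cutoff in (200, 999, 2000):
    counts = leading_digit_counts(cutoff)
    print(cutoff, [counts[d] for d in "123456789"])
```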
It works when the numbers are spread equally *logarithmically*. For example, consider a set of numbers equally distributed in the range 1..6. Now look at a second set of numbers formed by raising 10 to the power of each of those first-set numbers. You now have a set of numbers in the range 10 .. 1,000,000, but they're not evenly distributed. You'll find that about a fifth of them are in the range 10 .. 99, a fifth in the range 100,000 .. 999,999, and so on. When you look at the first digits of those numbers, they obey Benford's law.
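Here's a sketch of that construction in Python, comparing the observed first-digit shares with Benford's predicted log10(1 + 1/d):

```python
import math
import random
from collections import Counter

# Exponents drawn uniformly from 1..6, then used as powers of 10,
# give numbers spread evenly on a logarithmic scale over 10..1,000,000.
random.seed(0)
numbers = [10 ** random.uniform(1, 6) for _ in range(100_000)]
counts = Counter(str(int(x))[0] for x in numbers)

for d in range(1, 10):
    observed = counts[str(d)] / len(numbers)
    benford = math.log10(1 + 1 / d)  # Benford's predicted share
    print(d, f"observed {observed:.3f}  predicted {benford:.3f}")
```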
I had it explained like this:
To move from 1 to 2, you have to go up by 100%
To move from 2 to 3, only 50%
From 3 to 4, 33%
…
From 10 to 20? 100% again.
From 20 to 30? Same thing as 2 to 3: 50%.
It takes more relative movement to go from a lower starting number to the next number than it does from a higher starting number, so you will tend to spend more time in the lower numbers.
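Those percentages in a few lines of Python, for each leading digit d:

```python
# Relative growth needed to move from leading digit d to d + 1:
for d in range(1, 10):
    print(f"{d} -> {d + 1}: +{(d + 1) / d - 1:.0%}")
# 1 -> 2 needs +100%, 2 -> 3 needs +50%, ..., 9 -> 10 needs only +11%,
# so anything growing at a steady rate lingers longest on a leading 1.
```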
Imagine a statistic that grows by 10% every year. Let’s say the statistic starts at 100,000 in the first year.
In the second year, it will be 110,000.
Third year, it will be 121,000. Fourth year, 133,100. Fifth year, 146,410. Sixth year, 161,051. Seventh year, 177,156. Eighth year, 194,872.
Finally, having spent 8 years in the “starts with 1” category, it will graduate to the “starts with 2” category in the 9th year, hitting 214,359, and it will stay in the “starts with 2” category for just three more years, until the 13th year, when it hits 313,843.
The “starts with 3” category lasts for three years in total.
Then the categories for 4, 5, and 6 last for two years each.
Some years later, the statistic in year 24 sits at 895,430, and will grow to 984,973 in year 25, which is the only year that the statistic spends in the “starts with 9” category, until it grows into the millions the following year.
Once it is in the millions, it will again start with 1 from year 26 (at 1,083,471) to year 32 (at 1,919,434).
So for statistics that grow at a natural rate, they tend to spend more time in the “starts with 1” category, because that’s where they grow the slowest. Once they reach the other “starts with *x*” categories, they grow faster and faster, thus spending less time in those categories.
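Here's a minimal sketch of that worked example: simulate the 10%-a-year statistic and count how many years it spends on each leading digit.

```python
from collections import Counter

# A value growing 10% per year; count years spent on each leading digit.
value = 100_000
years_per_digit = Counter()
for year in range(100):
    years_per_digit[f"{value:.0f}"[0]] += 1
    value *= 1.10

for d in "123456789":
    print(d, years_per_digit[d])
# Leading digit 1 collects roughly twice as many years as digit 2,
# closely matching Benford's log10(1 + 1/d) proportions.
```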
I always thought of the carnival hammer game where you try to ring the bell. Data is made from something happening, so you start at one first, then go through all the numbers before reaching one again when you start a new digit. So imagine the hammer game: when you hit it, you're more likely to land on a one than a nine, because you have to go through one to get to nine, and if you do get through nine, you go right back to one again.
For many things, units are arbitrary. What I mean by that is that, for example, there is no particular reason why we measure lengths in metres. We could use *any* length as a base for units. If there is a “normal” way first digits are distributed, then that distribution shouldn't change if we change all our measurements to be based on another unit.
Eg let's say we are measuring the lengths of something which can vary by many orders of magnitude – eg the lengths of rivers. If we measure them and express their lengths in metres, we can look at the distribution of leading digits. But we can then convert them to any other unit of measurement by multiplying all the lengths by some factor. So if we double all the measurements, then we are expressing the lengths in units of “half-a-metre”.
Now, let's see which numbers in the first set of measurements map to numbers beginning with 1 in the second set. Every number beginning with 5, 6, 7, 8 or 9 will map to a number beginning with 1.
So if there is a standard distribution of first digits, 1 must appear as a first digit as often as 5, 6, 7, 8 and 9 combined!
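Here's a sketch of that invariance check in Python, using hypothetical log-uniform “river lengths”:

```python
import random
from collections import Counter

def first_digits(values):
    # Scientific notation makes the leading digit easy to read off.
    return Counter(f"{v:e}"[0] for v in values)

# Hypothetical river lengths, spread log-uniformly over several decades.
random.seed(0)
metres = [10 ** random.uniform(0, 4) for _ in range(100_000)]
half_metres = [2 * m for m in metres]  # the same lengths in half-metre units

in_metres, in_halves = first_digits(metres), first_digits(half_metres)
for d in "123456789":
    print(d, in_metres[d], in_halves[d])
# Both columns show (approximately) the same Benford shares:
# rescaling the unit leaves the leading-digit distribution unchanged.
```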
By considering other multiples, we can find the distribution which is impervious to multiplication by a factor, and that is the one given by Benford's law. To be precise, 1 appears with a proportion of log(2/1), 2 with a proportion of log(3/2), 3 with a proportion of log(4/3), etc. – where all the logs are taken base 10.
We can, in fact, extend Benford's law to other bases by using the same formula but changing the base of the logarithms.
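That formula in a short Python sketch, for base 10 and (as an assumed example) base 8:

```python
import math

def benford_share(digit, base=10):
    """Benford's predicted proportion for a leading digit in a given base."""
    return math.log(1 + 1 / digit, base)

# Base 10: the familiar 30.1%, 17.6%, 12.5%, ... for digits 1..9.
print([round(benford_share(d), 3) for d in range(1, 10)])

# Base 8: same formula with base-8 logs, over the digits 1..7.
print([round(benford_share(d, base=8), 3) for d in range(1, 8)])
```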