How are seemingly random events predicted with such a high degree of probability?

Main question: Why does a particular highway have roughly the same number of automobile-related deaths every year if a car crash is unexpected and dependent on the actions of individual drivers?

More detail: in Tennessee, electronic signs on the interstate occasionally show the number of roadway fatalities in the state for the current year, with the previous year’s total beneath it. The numbers are almost always close, within a reasonable margin of error and accounting for slightly more drivers on the road each year due to population growth.

Having been in a serious car crash, I understand the seemingly arbitrary way a sequence of events can play out depending on multiple factors such as driver awareness, vehicle dependability, roadway conditions, weather, etc. Even a highly skilled and fully focused driver in a perfectly functional car on a road with no issues can be involved in a crash, say due to a different driver’s condition, a deer running out into the road, or a sudden gust of wind blowing a newspaper onto their windshield and obscuring their vision for a few seconds.

This may be a multi-part question, but it’s been in the back of my mind for years now, and I woke up today wondering about radioactive decay, so I started reading about isotope decay. That’s not really what this question is about, but the seemingly random probability of an isotope decaying in an independent manner (not related to the actions of any other isotopes nearby) made me connect these two ideas and start to wonder about it again.

So getting back to the roadway fatalities, in this case for the state of Tennessee, I found the 20 year statistical data at the site linked below. It shows some ups and downs over the years, presumably with the reductions coming from improved safety features in cars, yet the total number for 2020 (1,221) is closer to the total for 2001 (1,251) than it is to the total for even the previous year 2019 (1,148).

So despite population growth and enhanced safety features, we are kind of right back where we started, or rather, where we’ve always been.

I could also expand this question to cover other events that should ideally never occur, such as murder. Why does something as abominable and world-shattering as murder, at least from an individual perspective, happen at roughly the same rate when looking at a large sample size? Shouldn’t something like that be the exception and not the norm? Is it somehow related to density?

This has me wondering about probability, fate, design, and all sorts of things both rational and irrational.

Anyway, thanks for reading this. Even if nobody responds, I think it’s helped to just get it out in writing for the next time I think about this in a few months.

TLDR: Why do independent actions and events that deviate from the norm happen with almost certain predictability?

10 Answers

Anonymous 0 Comments

It’s known formally as the law of large numbers. Over a large number of random trials, the observed frequency of an outcome converges to its true probability.

It’s easiest to talk about a coin toss. What is the true probability of a coin toss? 50.000000000…% heads. We’ll assume a fair coin, no funny business. Can you predict a coin toss? Absolutely not. There are way too many factors going on. It’s very complicated, and even a slight deviation can cause weird bounces that drastically change what happens. Either way, it’s still going to result in heads or tails. You’re not predicting that a 40-year-old left-handed male flips it with his thumb, the coin flies 2 m into the air, spins 20 times about the northwest axis at a 3.6% tilt, hits the table at 30°, bounces east, flies 0.5 m sideways, slides off the table, falls 1.5 m, lands on its edge at 1 m/s, rolls 3 m along a curve of 5 m radius, and then falls heads side up, and that it’s a Mexican peso from 2015 and the side with the bird eating a snake is what you decided was heads. Those details don’t matter. It’s heads.

Toss it once. It’s either heads or tails. Let’s pretend each of these is its own timeline. If you’re in one of them, that is all you see. You don’t know about the other cases. So we have 100% heads or 0% heads as timelines. So far, not looking good. Neither of the two possible timelines is anywhere near the true probability. Tossing a coin once is not a good way to find out its true probability.

Toss it two times. What are our options? HH, HT, TH, and TT. Well, we could have 0% heads, 100% heads, or 50% heads. So now 2 of 4 timelines have converged to the true probability. Looking better. And we’re only at 2 tosses.

Toss it four times. What are our options? Well, there are sixteen of them. I’m not writing them all out, but the 50% cases are HHTT, HTHT, HTTH, THHT, THTH, TTHH. So 6 of 16 are the 50%, or 38% of timelines. It’s the most common result, but it dropped from 50% last time. However, there are a lot of cases that aren’t 50% but aren’t as bad as 0% or 100%. HTTT, for example, is 25% heads. The 0% and 100% cases are now only 2 of 16.

Toss a coin 1 million times, and the share of timelines with 50% heads (rounded to the nearest percent) is going to be astronomically high, well over 99.9999% of timelines. The odds of the 0% heads case are 1 in 2^1000000. That’s an absurdly large number. It’s over 300,000 digits long if you were to write it out.
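The timeline counting above can be checked exactly for small numbers of tosses. What follows is a minimal sketch, not part of the original answer: the function name `share_near_half` and the "within so many percentage points" tolerance are assumptions made for illustration. It counts the toss sequences whose heads share is close to 50% and divides by all 2^n sequences.

```python
from math import comb


def share_near_half(n_tosses: int, tol_points: int = 0) -> float:
    """Fraction of the 2**n equally likely toss sequences whose share of
    heads is within `tol_points` percentage points of 50%.

    Uses integer arithmetic for the comparison:
    |heads/n - 1/2| <= tol/100  is the same as  |2*heads - n| * 50 <= tol * n.
    """
    favourable = sum(
        comb(n_tosses, heads)
        for heads in range(n_tosses + 1)
        if abs(2 * heads - n_tosses) * 50 <= tol_points * n_tosses
    )
    return favourable / 2 ** n_tosses


if __name__ == "__main__":
    # Exactly 50% heads: 0 of 2, 2 of 4 and 6 of 16 timelines, as counted above.
    for n in (1, 2, 4):
        print(n, share_near_half(n))
    # Within one percentage point of 50%: the share grows with the toss count.
    for n in (100, 10_000):
        print(n, share_near_half(n, tol_points=1))
```

The exact-50% share keeps shrinking as tosses are added, while the "close to 50%" share keeps growing. That second number is the one the law of large numbers is about.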

So, onto highways. Yes, it’s very complicated. Even more than a coin toss. There’s no way to predict a crash. However, you’ve narrowed all those factors down into two outcomes just like a coin toss. Fatality or no fatality.

Well, clearly the odds of crashing and dying are way less than a coin toss. If they were 50%, we wouldn’t do it. The rate is about 1.3 fatalities per 100 million vehicle miles in the US. So driving one mile and trying to estimate the probability of dying is clearly not going to get you even close to finding the true probability. However, there are a LOT of miles driven in the US. It’s about 3 trillion a year. So yes, you’re converging to a meaningful and consistent probability that is not lost in noise. It doesn’t matter how complex any given crash is; enough miles are driven that the true probability is being approached. This probability hides all the complexities that can’t be predicted, just like the probability of a coin toss hides how hard it is to predict a given coin toss.

For a given highway, the data is going to be less strong, as you have fewer data points, so it will jump around more from year to year. Just like tossing a coin 10 times has a real chance of not giving you close to 50%. The highway in question is probably not exactly 1.3 per 100 million vehicle miles, and you might see swings of tens of percent each year. The busier the road, the more consistent the data will be, as you have more data points. Same with things like murder rates. The murder rate for a small island country is always going to be really good or really bad, as one random murder swings the numbers. For the US or Brazil, the murder rate is going to be stable (and changes over time are likely to reflect actual changes to the true probability). For radioactive decay, the 2^1000000 I had for 1 million coin tosses is a tiny, child’s-play type of number. You’ll never not get the half-life. It’s just so laughably improbable; the numbers are mind-boggling.
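To put rough numbers on "busier roads are more consistent", here is a small sketch. It assumes fatalities behave like independent rare events (a Poisson model, where the standard deviation of a yearly count is the square root of its expected value). That is a simplification, since real years also differ in weather, traffic and so on. The 1-billion-mile road is a made-up example, and the function name is hypothetical.

```python
from math import sqrt

# Rate quoted in the answer above: about 1.3 fatalities per 100 million miles.
FATALITIES_PER_MILE = 1.3 / 100_000_000


def expected_and_relative_noise(vehicle_miles: float) -> tuple[float, float]:
    """Expected yearly fatalities and the typical year-to-year swing as a
    fraction of that expectation, under the Poisson assumption."""
    expected = FATALITIES_PER_MILE * vehicle_miles
    relative_sd = 1 / sqrt(expected)  # sqrt(expected) / expected
    return expected, relative_sd


if __name__ == "__main__":
    cases = {
        "whole US (about 3 trillion miles)": 3e12,
        "one road (hypothetical 1 billion miles)": 1e9,
    }
    for label, miles in cases.items():
        expected, noise = expected_and_relative_noise(miles)
        print(f"{label}: about {expected:,.0f} expected, swing of roughly {noise:.1%}")
```

Under these assumptions the national count wobbles by well under one percent from chance alone, while a single road with about 13 expected deaths can swing by tens of percent, which matches the description above.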

Anonymous 0 Comments

Pick a random event. It does not matter if it is one atom that is about to decay or an auto accident on a specific section of road. Now take 100,000 atoms, or 100,000 autos passing that section of road. You can never identify the exact auto that will have an accident, or the exact atom that will decay.

However, it is possible to say that out of 100,000 cars that pass this section of road, about 2 will have an accident. The same goes for atoms. You never know which auto it is, or which atom.

You’re conflating predicting the exact event with predicting how many events happen among the many potential ones.

Anonymous 0 Comments

These are just examples of the law of large numbers. If you have a large number of trials in which an event can occur with a given probability, the fraction of trials in which the event occurs will be close to that probability. The larger the number of trials, the closer it will be. More generally, if you keep repeating a measurement and the results are independent and always follow the same probability distribution, the average result will approach the mean of that probability distribution as you take more and more measurements.

With radioactive decay, you’re often dealing with unimaginably large numbers of individual decays, so the results often effectively behave as if they are completely deterministic. That’s why plots showing radioactive decay often have nice smooth curves instead of random jumps.
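A quick simulation shows the smoothing effect. This is only a sketch under made-up parameters: each atom independently decays with probability 0.1 per time step, and the names `surviving_fraction` and `P_DECAY` are illustrative. With few atoms the surviving fraction lands noticeably off the ideal curve; with many atoms it sits almost on top of it.

```python
import random

P_DECAY = 0.1   # hypothetical chance that one atom decays in one time step
N_STEPS = 10    # number of time steps to simulate


def surviving_fraction(n_atoms: int, rng: random.Random) -> float:
    """Simulate independent decays and return the fraction of atoms left."""
    remaining = n_atoms
    for _ in range(N_STEPS):
        # Every remaining atom gets its own independent "dice roll".
        decayed = sum(1 for _ in range(remaining) if rng.random() < P_DECAY)
        remaining -= decayed
    return remaining / n_atoms


if __name__ == "__main__":
    ideal = (1 - P_DECAY) ** N_STEPS  # the smooth curve's value after N_STEPS
    rng = random.Random(0)            # fixed seed so reruns give the same output
    for n_atoms in (100, 100_000):
        fraction = surviving_fraction(n_atoms, rng)
        print(f"{n_atoms:>7} atoms: {fraction:.4f} left (ideal curve: {ideal:.4f})")
```

A real sample contains vastly more atoms than 100,000, so the remaining wobble is far too small to see on a plot.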

Anonymous 0 Comments

The simplest way to visualize probabilities is probably to think of dice. If you have a 100-sided die, predicting a specific outcome is really hard. But if the die is fair, then when you roll it a million times, each side will come up approximately 10,000 times.

Now something like a fatal car crash could be modelled as a complicated chain of dice rolls.

One die for the speed of the driver, one for the reaction time of another driver, one for the mechanical integrity of the cars, etc. Now these dice are not weighted uniformly, i.e. most people will be driving at roughly the same speed. But if all of these things have some defined probability, there is also a combined probability for all the combinations of those things that result in a fatal crash.

Given the number of driver interactions on a big roadway is really big, a lot of those chains of dice rolls will even out to something predictable.
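As a toy version of the chain of dice rolls, the sketch below multiplies a few per-encounter probabilities into one combined probability and then scales it by a large number of encounters. Every number and label in it is invented for illustration, and it assumes the factors are independent and that all of them must go wrong at once, which real crashes do not require.

```python
from math import prod, sqrt

# Made-up probabilities for a single driver interaction (illustration only).
dice = {
    "driver A is going too fast": 0.05,
    "driver B reacts too slowly": 0.02,
    "a brake or tire fails at that moment": 0.001,
}

# Independent events that must all happen together: multiply the probabilities.
p_crash = prod(dice.values())

interactions_per_year = 500_000_000  # hypothetical count for a big roadway

expected_crashes = p_crash * interactions_per_year
typical_swing = sqrt(expected_crashes)  # Poisson-style spread of a rare-event count

if __name__ == "__main__":
    print(f"chance per interaction: {p_crash:.2e}")
    print(f"expected crashes per year: {expected_crashes:.0f}")
    print(f"typical year-to-year swing: about {typical_swing:.0f}")
```

No single interaction is predictable, but the yearly total built from hundreds of millions of them comes out with a spread of only a few percent.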

Anonymous 0 Comments

Recently someone asked whether you could squish bacteria on a flat surface, and the answer was that even the flattest surface would appear craggy up close, with lots of places to hide. This is the opposite of that question.
The little quirks that cause one event may seem so special and important, but if you pull back far enough, it all flattens out.

Anonymous 0 Comments

The key point that you aren’t getting, which is obvious from your tldr, is that these events aren’t random in the way you think they are.

It’s also the key behind using statistical methods to predict much larger populations, like using a sample of 1,300 people to reasonably accurately reflect the opinions of a pool of voters that’s more like 180 million or something.
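For the polling example, the usual back-of-the-envelope figure is the 95% margin of error for a simple random sample. The sketch below uses the standard normal-approximation formula with the worst case of a 50/50 split. Treating the poll as a simple random sample is an assumption (real polls use weighting), and the function name is illustrative.

```python
from math import sqrt


def margin_of_error(sample_size: int, share: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sampled proportion.

    Assumes a simple random sample drawn from a population much larger
    than the sample, so the population size itself drops out.
    """
    return z * sqrt(share * (1 - share) / sample_size)


if __name__ == "__main__":
    for n in (100, 1_300, 10_000):
        print(f"sample of {n:>6}: about ±{margin_of_error(n):.1%}")
```

For 1,300 respondents this comes to a bit under 3 percentage points, and the formula never mentions the 180 million voters: the sample size drives the precision, not the population size.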

Let’s take your example of road fatalities in Tennessee. Tennessee has millions of people driving on the roads. They drive, in total, about 55 billion vehicle-miles a year. The roads in Tennessee are not changing very quickly. The driving habits of the people in Tennessee aren’t changing very quickly. The weather in Tennessee isn’t changing very quickly. The safety features of the cars in Tennessee aren’t changing very quickly. So why wouldn’t you expect there to be almost exactly the same number of bad things happening every year?

You are correct that any given accident can be interpreted as the end result of a long sequence of events, some of which are relatively rare. But I think you’re fundamentally underestimating how many of those possible events there are. This isn’t a problem with you. Human beings are extremely bad at dealing with very big numbers. But that’s what’s going on. Whether a specific person gets in a car accident can’t be predicted with a high degree of certainty (although I’m sure you would agree that a given person getting in an accident is much more likely if the roads are icy, even if that’s all you know). But how many people over an entire state, in an entire year, get in car accidents is pretty easy to predict from the history, because of all the things that don’t change much from year to year.

Let me give you an example that might help. The odds of dealing a royal flush in poker are about 1 in 650,000. This means that if you shuffle your deck randomly and you deal out 650,000 hands, the likelihood that you’ll get at least one royal flush is about 63% (I can explain why it’s not 100% if you care, but it’s not important to my point here).
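For anyone who does care why it is about 63% rather than 100%: the chance of seeing no royal flush in one hand is 1 - p, the hands are independent, so the chance of seeing none in n hands is (1 - p)^n. A short sketch, assuming five-card hands dealt from a freshly shuffled 52-card deck each time (the variable names are illustrative):

```python
from math import comb

# 4 royal flushes (one per suit) among all possible 5-card hands.
TOTAL_HANDS = comb(52, 5)   # 2,598,960
p_royal = 4 / TOTAL_HANDS   # exactly 1 in 649,740

hands_dealt = 650_000

# P(at least one) = 1 - P(none), with P(none) = (1 - p) ** n for independent hands.
p_at_least_one = 1 - (1 - p_royal) ** hands_dealt

if __name__ == "__main__":
    print(f"1 in {1 / p_royal:,.0f} per hand")
    print(f"chance of at least one in {hands_dealt:,} hands: {p_at_least_one:.1%}")
```

The result is close to 1 - 1/e, which is what you get whenever the number of tries roughly equals "one in that many" odds.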

If you are a casino and you deal 650 million hands of poker a year, you know you should end up dealing about 1,000 royal flushes per year. If you end up with a number that’s a lot bigger or a lot smaller, you should get concerned about whether you’re not shuffling fairly or whether somebody’s cheating. Actually, there are specific statistical ways to figure out whether the number you get is so different from what you expect that something is wrong or different.
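One common way to run that kind of check is a z-score: how many standard deviations the observed count sits from the expected count n·p. The sketch below uses the normal approximation to the binomial. The observed counts are invented examples, and treating anything beyond about 3 standard deviations as suspicious is a rule of thumb assumed here, not a fixed standard.

```python
from math import comb, sqrt

P_ROYAL = 4 / comb(52, 5)  # 1 in 649,740 per five-card hand


def z_score(observed: int, hands: int, p: float = P_ROYAL) -> float:
    """Standard deviations between the observed count and the binomial mean."""
    expected = hands * p                 # about 1,000 for 650 million hands
    sd = sqrt(hands * p * (1 - p))       # binomial standard deviation
    return (observed - expected) / sd


if __name__ == "__main__":
    hands = 650_000_000
    for observed in (980, 1_020, 1_200):   # hypothetical yearly tallies
        z = z_score(observed, hands)
        verdict = "worth investigating" if abs(z) > 3 else "within normal variation"
        print(f"{observed} royal flushes: z = {z:+.2f} ({verdict})")
```

A tally a few dozen away from the expected value is ordinary luck. A tally a couple of hundred away would be extremely unlikely with fair shuffling.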

How are you able to predict such a rare event with pretty good accuracy? The answer is, there are so many opportunities for it to happen that you end up getting a bunch of rare events. You don’t have to know the exact conditions of how the royal flush was dealt, how the deck was shuffled, etc. to make this prediction. In fact, that’s kind of the whole point.

The bottom line is just that, when you’re talking about 55 billion miles traveled by vehicles, it doesn’t matter that crashes are rare and each crash can be decomposed into a bunch of steps, any one of which might have prevented the crash if it was different. All of those other chances get built in to your observed frequency. Unless something changes in the factors that drive the road fatalities, no pun intended, you’re going to get the same every year.

Anonymous 0 Comments

How many times do you think 100 coin flips will land on heads? Each flip is random, yet you’d probably be pretty close with your guess.

Anonymous 0 Comments

So this is a statistical principle. When you have an event with a certain probability, like a car accident, there will be some variation day-to-day or week-to-week. But the larger the data set you have, the more likely the numbers are to inch closer to the true statistical average.

So let’s say 1 in 1,000 cars that pass by crash on this highway. (That would be really high, but for our purposes we will use a clean number.) If you start counting at a random time and the first 1,000 cars pass, you should see about 1 accident, but you may see zero accidents or you may see 2 or 3. The sample is small enough that you’ll have some random variation. If you watch 5,000 cars pass, you’re still likely to not see exactly 5 crashes. But if you watch a million cars go by, it’s highly unlikely that the crash rate you measure will deviate much from the true statistical likelihood of a crash.
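The 1-in-1,000 example can be worked out exactly with the binomial formula. This is a sketch that assumes every car crashes independently with the same probability. The function names are illustrative, and the exact formula is only evaluated for the small samples, where floating-point arithmetic copes.

```python
from math import comb, sqrt

P_CRASH = 1 / 1_000  # the deliberately high "clean number" from the example


def prob_exactly(k: int, n_cars: int, p: float = P_CRASH) -> float:
    """Binomial probability of seeing exactly k crashes among n_cars."""
    return comb(n_cars, k) * p**k * (1 - p) ** (n_cars - k)


def relative_spread(n_cars: int, p: float = P_CRASH) -> float:
    """Standard deviation of the crash count as a fraction of its mean."""
    return sqrt(n_cars * p * (1 - p)) / (n_cars * p)


if __name__ == "__main__":
    for k in range(4):
        print(f"1,000 cars, exactly {k} crashes: {prob_exactly(k, 1_000):.1%}")
    print(f"5,000 cars, exactly 5 crashes: {prob_exactly(5, 5_000):.1%}")
    for n in (1_000, 5_000, 1_000_000):
        print(f"{n:>9,} cars: count typically off by about {relative_spread(n):.0%}")
```

With 1,000 cars, zero crashes and one crash are about equally likely. With a million cars the count is still not exactly 1,000, but it typically lands within a few percent of it.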

The data you’re citing will be for a highway that millions of cars travel each year, so the stats will be fairly stable, accounting of course for changes to the population, the road itself, the types of cars on it, and whatever disruptions COVID caused to people’s driving patterns.

Overall, driving a car is much safer than it used to be. The total number of vehicle fatalities is lower than it was in the ’70s and ’80s, despite there being twice as many cars on the road and a lot more miles driven on average.

Anonymous 0 Comments

Thank you to everyone who has answered so far, I really appreciate your insight and responses. The car crash thing makes a little more sense now, but I’m still a bit stuck on the homicide numbers being roughly equal every year. I could see them remaining stagnant if the killers are never caught, which allows the same people to continue killing year after year. And I know some cities do have low solve rates for murder, but if we could magically find out who was behind every murder somehow, I believe statistically there would be extremely few repeat offenders in the larger sample size.

I guess the underlying question in my mind, from a problem-solving point of view, is how can we as a society reduce those factors that contribute in larger ways to these undesirable events like roadway fatalities and murder?

What is the largest population size possible to not have a single murder in any given year? I know small villages and towns around the world can go decades without a single murder, so it’s not impossible in a relative microcosm of larger society. And I remember from anthropology class that someone determined a long time ago that the optimum number of people in any given living area is something like 12 to 18 people per square kilometer or something. I may be misremembering but basically it was determined that bad things start to pop up due to inherent human nature when we exceed that number.
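One way to frame the "largest population with zero murders" question is to treat murders as independent rare events, so the yearly count is roughly Poisson and the chance of a murder-free year is exp(-rate × population). The sketch below does that. The rate of 5 per 100,000 people per year is a placeholder rather than a sourced statistic, the names are illustrative, and the model ignores everything that makes places differ.

```python
from math import exp, log

RATE_PER_PERSON = 5 / 100_000  # hypothetical yearly murder rate, for illustration


def prob_zero_murders(population: int, rate: float = RATE_PER_PERSON) -> float:
    """Poisson probability of a year with no murders at all."""
    return exp(-rate * population)


def population_for_even_odds(rate: float = RATE_PER_PERSON) -> float:
    """Population at which a murder-free year is a 50/50 proposition."""
    return log(2) / rate


if __name__ == "__main__":
    for population in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{population:>9,} people: {prob_zero_murders(population):.1%} chance of zero")
    print(f"even odds at about {population_for_even_odds():,.0f} people")
```

Under this placeholder rate a village of 1,000 sees a murder-free year about 95% of the time, which fits the observation about small towns going decades without one, while a city of a million essentially never does. There is no hard cutoff, just odds that fade as the population grows.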

So is it primarily a matter of population density that drives up the murder rate? I could probably understand that. Of course there are aberrant genetic factors that can predispose individuals towards violent behavior, which we can more easily correlate to a statistical model, basically turning psychology into biology and mathematics.

Anonymous 0 Comments

It’s the law of large numbers. A sufficiently unlikely thing becomes likely when done enough times. 

Think of it this way. It’s big news for you if you win the lottery. It’s so unlikely that it is incredibly unexpected. But it’s not really that exciting that somebody won the lottery. That happens all the time.