Data analyst here. Forgive me if this is a bit meandering or inaccurate in some fashion. I welcome any critiques or corrections. Typing all this out on my mobile device.
It’s a looooooot of factors combined into a general statistical model. Sports, horse racing, etc. are notoriously difficult to nail down in a reliable way because, as you point out, there are a lot of factors. Sports are super chaotic, and the number of games played matters too: the fewer games there are, the harder things get (e.g., it’s easier to predict who would win the World Series than the Super Bowl because there are many times more baseball games than football games, so the law of averages pans out more reliably over many dozens of games).
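To put a rough number on the law-of-averages point, here’s a minimal sketch. It assumes a made-up 55% chance for the stronger side in any single game and treats games as independent, which real seasons aren’t. The function name and the game counts are just for illustration, not real league schedules.

```python
from math import comb

def prob_wins_majority(p_game, n_games):
    """Chance the stronger side wins more than half of n_games.

    Assumes every game is independent with the same win chance p_game
    (a simplification). Use an odd n_games so there are no ties.
    """
    need = n_games // 2 + 1
    return sum(
        comb(n_games, k) * p_game**k * (1 - p_game) ** (n_games - k)
        for k in range(need, n_games + 1)
    )

# Hypothetical game counts: a single game, a short season, a long season.
for n in (1, 15, 151):
    print(n, round(prob_wins_majority(0.55, n), 3))
```

The same small edge shows up far more dependably over the long stretch than in a single game.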
So, more simply, let’s look at dice odds as an example. Each die can represent a factor. If you’re playing a game with 2 dice, you have a 2.78% chance to roll a total of 2 on a single roll, because a 1 has a 16.67% chance of coming up on each die, and 16.67% × 16.67% is 2.78%. You can scale this up indefinitely, depending on the count of dice. With 100 dice, you could roll totals anywhere from 100 to 600, with varying likelihoods, and the average of all 100d6 rolls is approximately 350. The more times you roll, the closer the average of all rolls gets to that middle number. The chance of any one specific combination (like all 1s) is always 1/6 × 1/6 × 1/6…, with the number of 1/6s equal to the number of dice you use.
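If you want to check the dice numbers yourself, here’s a small sketch using exact fractions. It assumes fair six-sided dice, and the function names are just mine for illustration.

```python
from fractions import Fraction
from math import sqrt

SIDES = 6  # assuming fair six-sided dice

def prob_all_ones(num_dice):
    """Chance that every die shows a 1: (1/6) multiplied num_dice times."""
    return Fraction(1, SIDES) ** num_dice

def expected_total(num_dice):
    """Average total: each fair die averages (1 + 6) / 2 = 3.5."""
    return Fraction(SIDES + 1, 2) * num_dice

def spread_of_average(num_dice):
    """Standard deviation of the per-die average across num_dice dice.

    One fair die has variance (SIDES**2 - 1) / 12; averaging over more
    dice divides that variance by num_dice.
    """
    single_die_variance = (SIDES**2 - 1) / 12
    return sqrt(single_die_variance / num_dice)

print(prob_all_ones(2))     # 1/36, i.e. about 2.78%
print(expected_total(100))  # 350
print(spread_of_average(1), spread_of_average(100))
```

The last line shows the “narrowing”: the spread of the average with 100 dice is a tenth of the spread with one die.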
You can then apply this to variables in a sports match. I’ll use a simple one-on-one example like bowling. Person X and Person Y are going head to head. There are going to be a bunch of different variables (or “dice”) that we think will be “rolled” in this match. But these aren’t simple 6-sided dice with a 1/6 (or 16.67%) chance for any given number. They’re all many-sided and represented as percentages. And the game that bowlers X and Y play represents one “roll” of the dice. If that roll is over .5, X wins. Below .5, Y wins. And all the “dice” are combined to show where the average is more likely to fall.
So let’s say X has a higher bowling average than Y: 295 vs. 290. That’s roughly a 1.7% advantage in average score. Assuming they’re only playing one game, you could say this tilts things toward X at about a 50.83% likelihood to win. Still pretty damn close to coin-flip odds. But let’s say Y injured their hand 2 weeks ago. Maybe someone speculates that this gives X a significant advantage, so it becomes one of the factors we combine in. Say it’s worth 75% in X’s favor. Assuming the *weight* is the same for average score and injury, we average the two, (50.83% + 75%) / 2, and get 62.92% odds in favor of X. Maybe we just round it to 3:2 odds (or a 60% chance of winning).
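The bowling arithmetic can be sketched as a weighted average. This is only one simple way to combine factors, not how a real sportsbook model works. The 50.83% and 75% come from the example above, while the jet lag number and its weight are made up to show what a lighter weight does.

```python
def combine(factors):
    """Weighted average of per-factor chances that X wins.

    factors is a list of (probability_x_wins, weight) pairs. Equal weights
    reduce this to a plain average.
    """
    total_weight = sum(weight for _, weight in factors)
    return sum(prob * weight for prob, weight in factors) / total_weight

# Equal weights: average score edge (50.83%) and Y's hand injury (75%).
equal = combine([(0.5083, 1.0), (0.75, 1.0)])
print(f"{equal:.1%}")  # 62.9%

# Hypothetical extra factor: jet lag slightly favors Y (45% for X),
# but it only gets a fifth of the weight of the other two.
with_jet_lag = combine([(0.5083, 1.0), (0.75, 1.0), (0.45, 0.2)])
print(f"{with_jet_lag:.1%}")

# Over 0.5 favors X, under 0.5 favors Y.
print("X favored" if with_jet_lag > 0.5 else "Y favored")
```

Because jet lag carries so little weight here, it only nudges the result instead of dragging it toward 45%.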
So every variable we think is *reasonably likely* to have an impact on the performance of bowler X or Y at the time of the match gets factored in. Average score, health or injury, venue, style of play, how the lanes are oiled, whether they had to travel a long distance (jet lag), etc. could all become variables. I mentioned weight before, which adds the additional factor of proportional impact. A bowler’s average score and their health/injury status are gonna carry the most weight, so they’re likely to move the needle on the odds pretty heavily. But jet lag, maybe not so much. So when it’s factored in, perhaps its weight means it shifts the likelihood of a win by only a fraction of its value. In the end, you get a percentage that is then simplified to an odds ratio. If Y has a 40% chance to win, that’s 2:3 odds. X has 60%, or 3:2. The ratios can be confusing compared with straight percentages, but they’re useful for showing what payouts to expect. You bet on X, you get 2 bucks for every 3 you wagered if X wins. You bet on Y, you get 3 bucks for every 2 you wagered if Y wins.
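Converting a percentage to an odds ratio and a payout can be sketched like this. It assumes “fair” odds with no cut taken by whoever is running the bets, and the function names are just for illustration.

```python
from fractions import Fraction

def odds_in_favor(win_percent):
    """Turn a whole-number win percentage into (for, against) odds."""
    p = Fraction(win_percent, 100)
    odds = p / (1 - p)
    return odds.numerator, odds.denominator

def fair_profit(stake, win_percent):
    """Profit on a winning bet at fair odds (no house margin assumed)."""
    p = Fraction(win_percent, 100)
    return stake * (1 - p) / p

print(odds_in_favor(60))   # (3, 2)
print(odds_in_favor(40))   # (2, 3)
print(fair_profit(3, 60))  # 2
print(fair_profit(2, 40))  # 3
```

So a 3-buck bet on the 60% favorite wins 2, and a 2-buck bet on the 40% underdog wins 3, matching the ratios above.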