As others note, the birthday paradox describes a certain counterintuitive behavior of statistics when applied to probabilities that involve pairs. It is important to realize that it is *not* intended to be applied literally; it is more of a thought experiment.
Let’s start with the assumption that we have a room full of people whose dates of birth are independent and uniformly distributed. This isn’t really true in the real world, but it makes the analysis much easier. Now we can ask various questions about the people in the room, like:
>What are the odds a randomly selected person is born on August 17th?
We can answer this question by enumerating all the possible events for our person:
>{ Jan 1st, Jan 2nd, Jan 3rd, … , Dec 29th, Dec 30th, Dec 31st }
From our assumptions, the odds of any particular date are 1/N, where N is the size of the set. Here N = 365 (ignoring leap days), so the odds are 1/365. So far so good, everything makes intuitive sense.
Now let’s ask a more complex question:
>What are the odds two randomly selected people are both born on August 17th?
We answer by the same process – we’ll enumerate all the birthdays for the first and second person selected:
>{ Jan 1st and Jan 1st, Jan 1st and Jan 2nd, Jan 1st and Jan 3rd, … Dec 31st and Dec 29th, Dec 31st and Dec 30th, Dec 31st and Dec 31st}
Counting up the size of the set, we see a total of 365*365 = 133,225 equally likely events, of which only 1 satisfies the requirement (both born on Aug 17th). So the odds are 1/(365*365), or 1/133,225.
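To make the counting concrete, here is a minimal Python sketch of the same enumeration. It assumes a 365-day year (2023 is used only as an arbitrary non-leap year), and names such as `days`, `target`, and `both_on_target` are purely illustrative.

```python
from datetime import date, timedelta
from fractions import Fraction
from itertools import product

# Assumption: a 365-day year; 2023 is just an arbitrary non-leap year.
days = [date(2023, 1, 1) + timedelta(days=i) for i in range(365)]
target = date(2023, 8, 17)

# Every ordered (first person, second person) birthday pair.
pairs = list(product(days, repeat=2))

# Pairs where both people were born on the target date.
both_on_target = [p for p in pairs if p == (target, target)]

total_pairs = len(pairs)
hits = len(both_on_target)
probability = Fraction(hits, total_pairs)
print(total_pairs, hits, probability)  # 133225 1 1/133225
```

Brute-force enumeration is only practical because the set is small; it is here to mirror the argument, not to be efficient.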
Now we can ask a slightly different question:
>What are the odds two randomly selected people share the same birthday?
The enumerated set of dates is the same as before, except this time our criterion is different: we’re now looking for *any* matching dates, not just a *specific* matching date. The events that qualify are:
>{ Jan 1st and Jan 1st, Jan 2nd and Jan 2nd, Jan 3rd and Jan 3rd, … Dec 29th and Dec 29th, Dec 30th and Dec 30th, Dec 31st and Dec 31st }
There is exactly one match for each day of the year, hence a total of 365 matching pairs. As above, our enumerated set has 365*365 pairs in total, so the odds of landing on a matching birthday are 365/(365*365), which reduces to 1/365.
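The same count can be checked with a short Python sketch. It assumes 365 equally likely birthdays labelled 0 to 364; `N_DAYS`, `matching`, and `p_same` are illustrative names only.

```python
from fractions import Fraction
from itertools import product

N_DAYS = 365  # assumption: 365 equally likely birthdays, no leap day

# Label days 0..364 and enumerate every ordered pair of birthdays.
pairs = list(product(range(N_DAYS), repeat=2))

# Keep the pairs where both people share a birthday, whatever the date.
matching = [(a, b) for a, b in pairs if a == b]

# Fraction reduces 365/133225 to lowest terms automatically.
p_same = Fraction(len(matching), len(pairs))
print(len(matching), len(pairs), p_same)  # 365 133225 1/365
```

The only thing that changes from the specific-date question is the filter: `a == b` accepts 365 pairs, where requiring both people to hit one particular day accepts just one.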
This latest result is counterintuitive: just by changing the selection criterion from a specific date to a matching date, the probability went up by a factor of 365, from 1/(365*365) to 1/365. That is the same as the odds of a single person being born on one specific date. This is the core truth behind the birthday paradox: the way we phrase probability questions in English isn’t very good at capturing the distinctions that matter in the math.