# eli5: if a certain phenomenon occurs in 5% of a sample, does that mathematically mean it’s 5% likely to happen for any member of said sample? if yes, how?



No. It means that if you pick a random member of the sample, there’s a 5% chance it has the phenomenon.

This is subtly different from what I think you’re asking, which is “Does every member of the sample have a 5% chance of the phenomenon?” That would only be true if every member of the sample had the same chance of the phenomenon, and that’s generally not the case.
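To make that concrete, here is a small Python sketch with made-up numbers (an assumption for illustration, not data from the question): half the group has a 1% chance and the other half a 9% chance. The overall share still averages out to 5%, even though no single member has a 5% chance.

```python
from fractions import Fraction

# Hypothetical group of 100: 50 members with a 1% chance of the
# phenomenon, 50 members with a 9% chance. Nobody has exactly 5%.
individual_probs = [Fraction(1, 100)] * 50 + [Fraction(9, 100)] * 50

# The share you would expect to see across the whole group is the
# average of the individual probabilities.
expected_share = sum(individual_probs) / len(individual_probs)

print(expected_share)                        # 1/20, i.e. 5%
print(Fraction(1, 20) in individual_probs)   # False: no member is at 5%
```

Exact fractions are used so the 5% comes out exactly rather than as a rounded float.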

For example, let’s suppose our phenomenon of interest is “cat” and we have a sample of 95 dogs and 5 cats. 5% of my sample is cats. If I randomly choose an animal from the sample, there’s a 5% chance I choose a cat. But for each individual animal in the sample, the probability is either 0% or 100%: you either are a cat (100%) or you’re not (0%). You can’t have an animal that’s 5% cat and 95% dog.
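A quick Python simulation of the cats-and-dogs example shows both halves of the point. The 5% describes the random pick, while each animal’s own value is 0 or 1. The trial count and seed are arbitrary choices for the sketch.

```python
import random

# The sample from the example: 95 dogs and 5 cats.
sample = ["dog"] * 95 + ["cat"] * 5

# Share of the sample that is a cat: exactly 5%.
cat_share = sample.count("cat") / len(sample)

# Each individual animal is either a cat (1.0) or not (0.0); none is 0.05.
per_animal = [1.0 if animal == "cat" else 0.0 for animal in sample]

# Picking an animal uniformly at random many times lands on a cat about
# 5% of the time. The seed only makes the run repeatable.
rng = random.Random(0)
trials = 100_000
hits = sum(rng.choice(sample) == "cat" for _ in range(trials))

print(cat_share)        # 0.05
print(set(per_animal))  # only the values 0.0 and 1.0 appear
print(hits / trials)    # close to 0.05
```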

The same logic extends to continuous properties, like height. If I have a random mix of men and women and 25% of the sample is taller than 6′, that does *not* mean each individual person in the sample has a 25% chance of being taller than 6′, because the height distributions for men and women are different. In general, the distribution of the whole sample does not tell you the probability for any single member.
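Here is a rough Python sketch of the height example using the standard library’s `statistics.NormalDist`. The means and standard deviations below are invented for illustration (not real measurements), so the pooled figure won’t match the 25% above. The point is that the pooled share lands between the two groups and describes neither of them.

```python
from statistics import NormalDist

# Hypothetical height distributions in inches (illustrative numbers only,
# not measured data): men ~ N(70, 3), women ~ N(64.5, 2.8).
men = NormalDist(mu=70, sigma=3)
women = NormalDist(mu=64.5, sigma=2.8)
SIX_FEET = 72

# Probability that a member of each group is taller than 6 feet.
p_man_tall = 1 - men.cdf(SIX_FEET)
p_woman_tall = 1 - women.cdf(SIX_FEET)

# In a 50/50 mix, the share of the whole group over 6' is the average of
# the two group-level probabilities.
p_pooled = 0.5 * p_man_tall + 0.5 * p_woman_tall

print(f"man over 6':   {p_man_tall:.3f}")
print(f"woman over 6': {p_woman_tall:.3f}")
print(f"pooled share:  {p_pooled:.3f}")
```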

It only tells you that 5% of the sample belongs to a certain class, but nothing about the individual instances themselves (assuming the sample actually represents the underlying distribution).

If 10% of computer science students are female, it does not mean that every student has a 10% chance of becoming female by studying CS. That was already decided before they began to study.

So the information is a one-way street.

No.

If I have 20 people and 1 of them is sick, then 5% of the people in the sample are sick. That does not mean that if I pick a person from the sample at random, there is a 5% chance they will get sick; it means that if I pick a random person, there’s a 5% chance I get the sick one.