If there are extreme outliers, the mean doesn’t work well. Two examples:
If Bill Gates walked into an elevator with 9 ordinary people, the average net worth of everyone there would be around $10 billion, which obviously doesn’t tell you anything about the other 9.
If aliens asked for the average number of testicles across all of humanity, they would conclude that the vast majority of people have one testicle, which almost nobody actually does.
Using the mean and the median together gives you a good idea of the income distribution.
In the unusual case where the mean is lower than the median, you might well scratch your head: it means more than half of the people are roughly equally well off, while a select few outliers earn almost nothing.
Also, a true statistics nerd who just wants his quick fix would prefer a consolidated graph with three curves: one for the percentile median, one for the percentile mean, and one for the percentile quotient.
They tell us different things.
The mean is the result of adding up everyone’s salaries and dividing the total by the number of people in your sample. The median is the result of sorting everyone by salary and picking the person in the middle, so that as many people earn at least the median as earn at most the median.
Let’s say we have five people, and these are their salaries.
Alice: $5
Bob: $15
Carla: $20
Dave: $25
Emma: $1000
The mean here would be (5+15+20+25+1000)/5=$213, and the median would be $20. You can see that the mean is heavily skewed by Emma’s earnings, making it look like wages are much higher than they are. The median, on the other hand, is less susceptible to extreme values. It doesn’t matter how much Emma earns, or how little Alice earns, we get the value squarely in the middle. But the median doesn’t tell us anything about the distribution of the values. If Alice and Bob were only making $1, while Dave and Emma made millions, the median would still be Carla’s $20.
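The definitions above are easy to sketch directly. This is a minimal example using the five salaries from the list; note that picking the single middle element only works for an odd-length list (with an even count, the median averages the two middle values):

```python
salaries = [5, 15, 20, 25, 1000]  # Alice, Bob, Carla, Dave, Emma

# Mean: add everything up and divide by the count
mean = sum(salaries) / len(salaries)           # (5+15+20+25+1000)/5 = 213.0

# Median: sort and take the middle element (odd-length list)
median = sorted(salaries)[len(salaries) // 2]  # 20
```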
So which one is better? Neither. They convey different pieces of information.
No one statistic is good for everything. As others have mentioned, the mean can be swayed by outliers, so the median is a good snapshot of the middle of the pack.
Mode (the most common data point) is a bit of a meme, but it’s good to check if a data set is “bimodal” (or multimodal). For example, a test could have a bunch of students that got an A, and a bunch that got a D, and a couple with Bs and Cs. The mean and median would both say the average is a B or a C, but most people didn’t get that mark; the modes would reveal the A and D clusters.
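The grade example can be checked with Python’s `statistics` module. The scores below are made up to match the description (4 = A down to 1 = D); `statistics.multimode` returns every peak, not just one:

```python
import statistics

# Hypothetical scores: a cluster of A's (4) and D's (1), a couple of B's and C's
grades = [4, 4, 4, 4, 3, 2, 1, 1, 1, 1]

mean = statistics.mean(grades)        # 2.5: "the average is a B/C"
median = statistics.median(grades)    # 2.5: same story
modes = statistics.multimode(grades)  # the two peaks, 4 and 1
```

Neither the mean nor the median hints that almost nobody actually scored a B or C; only the modes expose the two clusters.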
Another thing that is good to look at is the interquartile range (the range from the 25th to the 75th percentile). It’s a pretty robust way to see what most people have.
Finally, standard deviation (roughly, the typical amount by which a data point differs from the mean) is also good to be aware of. This shows how much the average individual data point does or does not resemble the mean.
Let’s say you have five people. One person living in poverty makes $10,000 a year. Three middle-class people make $40,000, $60,000, and $80,000 a year. The final person, who is very wealthy, makes $150,000,000 a year. Their mean salary is $30,038,000, but their median salary is $60,000. Which of those is closer to what most people make?
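You can verify those figures, and also see how much the one wealthy person distorts the mean, by recomputing it with the top value dropped (a crude “trimmed mean”; the numbers just restate the example above):

```python
import statistics

incomes = [10_000, 40_000, 60_000, 80_000, 150_000_000]

mean_all = statistics.mean(incomes)           # 30,038,000: dominated by one person
median_all = statistics.median(incomes)       # 60,000: the middle person's salary
mean_trimmed = statistics.mean(incomes[:-1])  # 47,500: mean with the outlier dropped
```

Dropping a single data point moves the mean by a factor of over 600, while the median wouldn’t budge at all.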
One problem with the mean is that outliers on one end of the spectrum can distort it. The median is less susceptible to that. Salary is one area with a lot of outliers on the wealthier end, while there can’t be comparable outliers on the lower end, since a salary can’t go below $0. As a result, the mean salary is highly distorted by the ultra-wealthy, while the median salary is more representative of the average person.