What is the statistical importance of “median” and “mode”?

Inspired by a question about standard deviation, this one is a question that has been bugging me for years. I’ve used the mean and stdev a lot in my work and during my college years, but I never really used the median and mode outside of when they were introduced. I’ve seen no use for them so far. So, ELI5? Especially for those of you who use them frequently.

Median is good for limiting the influence of extreme values. Consider the difference between the mean income of a nation and the median income.

The mean income is enormously skewed by the Gates and Bezos of the world, who have extreme wealth. In contrast, the median income gives us a far better sense of what we really want to know: how much money a ‘normal’ person makes.

Mode is generally only used with discrete or categorical data, so it doesn’t really fit the same sort of problems that mean and median do. An example would be a multiple-choice quiz you give your class. The right answer is C, but the mode answer is B. This lets you identify a common misunderstanding in your class.

Mode is useful where the data isn’t continuous. For example, what is the most popular car color?

Median is useful where the distribution isn’t symmetric about the mean and outliers have an outsized impact; income is a fairly common example here. A small but very wealthy group can pull the mean income up to the point where it tells you little about a population’s economic standing.

Not sure about mode, but I’ll take a stab at median. Median is often preferable to the mean/average when the data is skewed. The reason is that the mean/average can be pulled around by outliers. Thus, the median is often a better representation of the “typical” value for skewed distributions.

Example: In the U.S., mean/average household income is about $90,000. But the U.S. has lots of ultra-wealthy people who are skewing that mean. *Median* household income is only around $60,000, which is a better measure of the “typical” income of an American household.

The median is the middle value of the ordered list of all values. If the number of values is even, it’s the average of the two middle ones. It is often used in data analysis because it’s more resistant to outliers than the mean.
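The odd/even rule above is short enough to write out by hand. This is a minimal sketch; the function name `median` is just an illustrative choice (Python’s built-in `statistics.median` does the same job).

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: exactly one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0, i.e. (3 + 7) / 2
```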

If you pull data from some kind of sensor and it messes up for a brief period, producing values far above the usual range, the average is gonna go up as well. The median will barely move, though, because it only depends on which value sits in the middle, not on how extreme the values at the upper end are. As long as fewer than half of the readings are bad, the middle value stays in the normal range.
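A common way to use this with sensor data is a sliding-window (“rolling”) median. The sketch below compares a rolling mean and a rolling median over a window of 5 samples; the readings and the helper name `rolling` are made up purely for illustration.

```python
from statistics import mean, median

def rolling(values, window, stat):
    """Apply `stat` to each full sliding window of length `window`."""
    return [stat(values[i:i + window]) for i in range(len(values) - window + 1)]

# Hypothetical readings that normally sit around 20, with a brief two-sample glitch.
readings = [20, 21, 20, 500, 480, 21, 20, 22, 21]

print(rolling(readings, 5, mean))    # jumps above 100 while the glitch is in the window
print(rolling(readings, 5, median))  # [21, 21, 21, 22, 21]
```

Because the glitch never fills more than 2 of the 5 slots in a window, the middle value of each window is always a normal reading.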

The mode is the most common value in the list. It shows up in statistics because it is the single most likely value to get if you pick one entry from the list at random.

Median would be the middle person. If you lined up 99 people by height… the 50th person would be taller than 49 of the others and shorter than the other 49… and you’d take their height as the “average”. It’s a good way to represent an average when it doesn’t make sense to just add everything up and divide (like the mean). For example, with income… if you take 100 people where 99 of them earn $10 an hour, but 1 person earns $1000 an hour… $10 is the median but $19.90 is the mean. $10 is clearly more representative of what the average person earns.
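The income figures can be checked with Python’s standard `statistics` module, reading the example as 100 people in total, 99 earning $10 an hour and one earning $1000 (the split that gives the quoted $19.90 mean):

```python
from statistics import mean, median

# 100 hourly wages: 99 people earn $10, one person earns $1000.
wages = [10] * 99 + [1000]

print(median(wages))  # 10.0, the two middle values are both 10
print(mean(wages))    # 19.9, i.e. 1990 / 100
```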

The mode is another way of doing the above, but for “most popular” choices. It’s usually used for categories like favourite colour, where you can’t rank them. If you take 100 people and 50 of them prefer blue, 30 green, 10 red, and 10 yellow… saying “the average person prefers blue” would be using the mode.
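For categorical answers like colours, the mode is just a frequency count. A minimal sketch using `collections.Counter` with the survey numbers above:

```python
from collections import Counter

# 100 survey answers: 50 blue, 30 green, 10 red, 10 yellow.
answers = ["blue"] * 50 + ["green"] * 30 + ["red"] * 10 + ["yellow"] * 10

counts = Counter(answers)
colour, n = counts.most_common(1)[0]  # the most frequent category and its count
print(colour, n)  # blue 50
```

Note that there is no meaningful mean or median here, since colours can’t be added up or ordered.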

Most of the time in math the word “average” is treated as interchangeable with “arithmetic mean.” But in more casual language, the word “average” is defined as the value/outcome that is most typical or normal.

For a lot of data sets, using the arithmetic mean as the average works out just fine, because the mean value is a good predictor of what you’d expect a typical element of the data set to be. But this isn’t always the case.

Consider a group of you, 9 of your friends, and Bill Gates. If you wanted to compute the average net worth of this sample and you used the arithmetic mean as the average, you’d find that Bill Gates’ presence in the sample pushes the average up into the billions. And, unless you have some *very* rich friends, the arithmetic mean is hardly representative of the net worth of a typical member of the sample.

On the other hand, if you calculated the average using the **median**, you’d get the middle value of the 11 people: five people have more money than this value and five people have less. Since it’s likely that you and your friends have similar net worths, the median is a much better predictor of the typical net worth of this sample.

Now suppose you go to the store and buy 100 boxes of donuts. Each box is advertised as having a dozen donuts, but 15 of them are defective (a hungry store worker ate a few from each before resealing and shelving it) and have fewer than a dozen.

Now let’s say that, just out of curiosity, you want to know how many donuts are in an average box. The arithmetic mean would land somewhere below 12, likely on a fractional number of donuts that no actual box contains. The median also comes out to 12 here, but the **mode**, defined as the most frequently occurring element of the data set, is the measure that answers the question directly: 85% of the boxes have 12 donuts, so the typical box has 12.
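To make the donut example concrete, the sketch below assumes (purely hypothetically, since the example only says “a few”) that each of the 15 short boxes has 9 donuts:

```python
from statistics import mean, median, mode

# 85 full boxes of 12. ASSUMPTION: each of the 15 short boxes holds 9 donuts.
boxes = [12] * 85 + [9] * 15

print(mean(boxes))    # 11.55, a count no real box has
print(median(boxes))  # 12.0
print(mode(boxes))    # 12
```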

TL;DR: The median is used as the “average” when most of the data is clustered together but there are one or more large outliers; the mode is used as the “average” when a particular value is repeated a lot.

Median is a more accurate “average” in many cases because a few outliers can throw off the mean. Let’s say you’re looking for a city to live in and want to know the average home price in Redditville… one rich guy with a $10m estate for sale could throw off the mean, while the median shows the price where half the homes cost more and half cost less. Similarly, something like the average salary at a company could be thrown off by the CEO or a handful of highly paid C-level execs compared with the rank-and-file workers… the median shows the salary at which half the company’s workers make more and half make less.

Mode is less commonly useful, but it could be used to describe typical family size, e.g. a mode of 2 kids. That describes the most common family, rather than saying the typical family has 2.3 kids.

Median is the number that is physically in the middle when you list all the data from least to greatest (or greatest to least). Mode is the number that appears most often. Mean (as you know) is the sum of all the numbers divided by how many there are. Range is the difference between the largest number and the smallest number. Having all of this information gives you a good idea of what kind of data you are looking at.
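All four of these summary numbers take one line each in Python. In this sketch, `summarize` is a hypothetical helper name and the data list is made up for illustration:

```python
from statistics import mean, median, mode

def summarize(data):
    """Return mean, median, mode and range for a list of numbers (illustrative helper)."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        "range": max(data) - min(data),
    }

print(summarize([2, 3, 3, 5, 7, 10]))
# {'mean': 5, 'median': 4.0, 'mode': 3, 'range': 8}
```

Here the three “averages” all disagree (5, 4 and 3), which is itself a hint that the data is not symmetric.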

The significance depends on the actual data you are looking at. It has been 30ish years since I took statistics, so I can’t give you anything more specific.

At the zoo I work at, we have started shifting to using median life expectancy instead of the average.

Let’s say you had animals that lived to 28, 29, 30, 31, and 50. The average (mean) age would be 33.6, but the median age would be 30. The animal that lived to 50 is an outlier that pulls the average age up much more than it affects the median (the middle value, where 50% of animals die before that age and 50% die after it). So the median is a little more representative. In this small sample, the average is skewed: only one animal actually lived past the average age.
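The zoo numbers check out with the standard `statistics` module, including the point that only one animal outlived the mean:

```python
from statistics import mean, median

ages = [28, 29, 30, 31, 50]  # lifespans in years, from the example above
avg = mean(ages)

print(avg)                           # 33.6, i.e. 168 / 5
print(median(ages))                  # 30
print(sum(a > avg for a in ages))    # 1, only the 50-year-old outlived the mean
```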

I’ll tell you in literal terms:
You want to figure out the most accurate average income for a population. The richest guy has 100 billion dollars, and the poorest guy is $130,000 in medical debt and living on the street outside that billionaire’s hotel. But out of 1 million people in that city, 23% earn $39,000 a year, the plurality income. This is the mode.

Say, however, that the 500,000th richest person earns $42,000 a year and is thus right in the middle of the earnings chart. This is called the median.

Say you add up the total income of the whole population and divide it by a million. This is called the mean, and it is an inaccurate representation of the income distribution and the economic situation of a society because of the oligarchic society we live in; it’s usually around 50% higher than the mode or median.