When should one of mean, mode, and median be used over the others?


Anonymous

It all depends on what you want to achieve with your analysis.

Each tool has some very interesting properties that you should keep in mind:

(Caveat: “set” here means a data set, not a mathematical set, so values can repeat.)

How many means, modes, or medians can a data set have?

Mode: Multiple

Mean: Only 1

Median: Only 1
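
A minimal check of these counts with Python’s standard `statistics` module (the data values are made up for illustration): `statistics.multimode` returns every value tied for most frequent, while `mean` and `median` each return a single number.

```python
import statistics

data = [1, 2, 2, 3, 3, 4]  # hypothetical data: 2 and 3 are tied for most frequent

modes = statistics.multimode(data)  # every value tied for most frequent
mean = statistics.mean(data)        # always exactly one value
median = statistics.median(data)    # always exactly one value

print(modes, mean, median)
```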

Is the calculated mean, mode, or median itself one of the values in the data set?

Mode: Always

Mean: Sometimes

Median: Always in a set with an odd count; sometimes in a set with an even count where values can repeat (only when the two middle values are equal); never in an even-count set where values can’t repeat
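
A small sketch of the three median cases above using `statistics.median`, with made-up data sets:

```python
import statistics

examples = {
    "odd count": [1, 3, 5],
    "even count, distinct values": [1, 3, 5, 7],
    "even count, repeated middle value": [1, 3, 3, 7],
}

results = {}
for label, data in examples.items():
    med = statistics.median(data)
    # For an even count the median is the average of the two middle values,
    # so it only lands on a data value when those two values are equal.
    results[label] = (med, med in data)
    print(f"{label}: median={med}, in data set: {med in data}")
```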

How much information about all the data values in a set does the mean, mode, or median provide?

Mode: The only information is that every other value occurs less often than the mode(s).

Median: About half of the values are less than the median and about half are greater than it.

Mean: The differences between the values and the mean sum to zero. Equivalently, the total distance above the mean of all the values greater than the mean equals the total distance below the mean of all the values less than the mean.
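
The zero-sum property can be verified exactly with Python’s `fractions.Fraction`, which avoids floating-point rounding (the data set is hypothetical):

```python
from fractions import Fraction

data = [2, 4, 4, 5, 10]  # hypothetical data set
mean = Fraction(sum(data), len(data))  # exact arithmetic, no rounding noise

deviations = [x - mean for x in data]
above = sum(d for d in deviations if d > 0)  # total distance of values above the mean
below = sum(d for d in deviations if d < 0)  # total distance of values below the mean (negative)

print(mean, sum(deviations), above, below)
```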

How can the mean, median, or mode be affected if I add a new value to the set?

Mode: Either no change, or creates an additional mode, or culls multiple modes into a single mode.

Median: Likely a small change; the size of the change doesn’t grow with how far from the median the added value is.

Mean: Likely a small change, but values far from the mean move it more than values close to it. The mean is also affected by how many values are already in the set: adding a value to a small set can cause a much bigger change than adding the same value to a large set.
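
A sketch of how one added value moves the median versus the mean, using hypothetical data. Adding 6 or 100 to the same base set shifts the median by the same amount but shifts the mean very differently, and the same far-away value barely moves the mean of a large set:

```python
import statistics

base = [1, 2, 3, 4, 5]  # hypothetical data: median 3, mean 3

changes = {}
for new_value in (6, 100):
    data = base + [new_value]
    changes[new_value] = (statistics.median(data), statistics.mean(data))
    median, mean = changes[new_value]
    print(f"add {new_value}: median={median}, mean={mean:.2f}")

# The same added value moves a small set's mean more than a large set's.
small = [3] * 5
large = [3] * 500
print(statistics.mean(small + [100]), statistics.mean(large + [100]))
```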

How many of the previous values do I need to remember to calculate the new mean, median, or mode when I add a new value to the set?

Mode: I need to remember all values (or at least how many times each distinct value has appeared)

Median: I need to remember all values

Mean: I only need to remember the current mean and how many values were in the set. The values themselves can be forgotten.
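
A minimal sketch of that running-mean update, assuming values arrive one at a time. `RunningMean` is a hypothetical helper, not a library class; it relies on the identity new mean = old mean + (x - old mean) / new count:

```python
class RunningMean:
    """Hypothetical helper: keeps only the count and the current mean."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add(self, x):
        self.count += 1
        # Move the old mean toward x by 1/count of the gap; x is then discarded.
        self.mean += (x - self.mean) / self.count
        return self.mean


rm = RunningMean()
for value in [4, 8, 15, 16, 23, 42]:
    rm.add(value)
print(rm.count, rm.mean)
```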

As for when to use each, keeping in mind some of the properties I talked about already, here are some general guidelines:

Mode: Useful for determining which value is most popular. Which restaurant got the most votes among your group of friends before a night out? Who won the election?

Median: If I want to split a set into two half-size sets by value, the median is the split point. If I would normally use the mean as the average but the data set has one extremely low or high value, the median will be much closer to all the other values than the mean. This is typically why median income is used as a statistic instead of mean income: the presence of CEOs with very large compensation packages makes the mean income much higher than most people’s income.

Mean: Useful when trying to find a single value that all of the values in a set are “close” to. What test score is closest to the scores all of the students received? If I want to replace all of my workers with robots that have constant output, how much output does each robot need to match my human workers?
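
A rough illustration of the robot example, with invented output figures: if each robot produces the mean output, the robots together match the humans’ total output exactly.

```python
import statistics

# Hypothetical hourly outputs for four human workers.
human_output = [8, 10, 12, 14]

per_robot = statistics.mean(human_output)     # constant output each robot needs
total_robots = per_robot * len(human_output)  # same head count, robot edition

print(per_robot, total_robots, sum(human_output))
```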

Anonymous

Mean = the average value (the sum of the values divided by how many there are)
Mode = the most common value
Median = the middle value when you put them in order
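
The three definitions can be sketched with Python’s `statistics` module (the values are made up):

```python
import statistics

values = [10, 2, 4, 5, 4]  # hypothetical data

mean = statistics.mean(values)      # sum of values / number of values
mode = statistics.mode(values)      # most common value
median = statistics.median(values)  # middle value after sorting: [2, 4, 4, 5, 10]

print(mean, mode, median)
```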

Anonymous

The mean on its own intuitively describes “normal” data, i.e. distributions with a big spike right in the middle that taper off symmetrically on each side.

The median is useful to add alongside the mean if the data is “skewed” (leans in one direction) or has “outliers” (data points that are way high or low and drag the mean around). A good example is household income, where a few super wealthy households drag up the mean so that it overstates the income of the “typical” household, which may be better described by the median.
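
A sketch of the income effect with invented numbers, where a single very wealthy household drags the mean far above what the other households earn while the median stays among them:

```python
import statistics

# Hypothetical household incomes; the last one is a single very wealthy household.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 20_000_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean:   {mean_income:,.0f}")
print(f"median: {median_income:,.0f}")
```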

The mode is most useful for qualitative data like “favorite ice cream flavor”, but it can also add information about a distribution with an unexpected spike in it. For example, if the mean on a test was 75 but the mode was 84, consider the possibility that some kids cheated together.
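
For qualitative data the mode is just a count of labels. A minimal sketch with `collections.Counter` and hypothetical survey answers; a mean or median of these labels would be meaningless:

```python
from collections import Counter

# Hypothetical survey answers to "favorite ice cream flavor".
flavors = ["vanilla", "chocolate", "mint", "chocolate", "strawberry", "chocolate", "vanilla"]

counts = Counter(flavors)
favorite, votes = counts.most_common(1)[0]  # the mode and how often it appears
print(favorite, votes)
```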

Of course, this is all an oversimplification. It’s best practice to include both the mean and the median when describing distributions (they will be about the same for normal data). The standard deviation is also important to include, as it describes how spread out the data is (how quickly it tapers off).
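
A sketch of reporting the mean, median, and standard deviation together with `statistics`, using hypothetical, roughly symmetric test scores. `statistics.stdev` gives the sample standard deviation; `statistics.pstdev` would give the population version.

```python
import statistics

scores = [70, 72, 75, 78, 80]  # hypothetical, roughly symmetric

mean = statistics.mean(scores)
median = statistics.median(scores)
spread = statistics.stdev(scores)  # sample standard deviation

print(mean, median, round(spread, 3))
```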

Anonymous

It depends on how much you want to know about the distribution. Let’s say you’re shopping for brake pads. There are a bunch of different brands available.

You don’t know anything about brake pads, so you just ask the guy what kind people usually get. That’s the mode.

Or maybe you don’t want to trust the crowd: they probably just picked the cheap, low-quality one that’s going to wear out faster. But you don’t want the overpriced name-brand one either; you just want the middle-of-the-road brake pad. That’s the median.

In this case the mean doesn’t make much sense. You can calculate it, but you get a number that doesn’t correspond to an actual brake pad, so you’d have to pick the brake pad closest to that calculated mean.

Sometimes the mean is a lot higher or a lot lower than the median. That would happen if you accidentally included a professional race car brake pad in your calculation. A nice thing about the median and mode is that they aren’t thrown off by outliers. There are better ways to measure the distribution, but if you’re looking for simple figures, the mean captures the distribution a little bit.
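
A sketch of the brake pad example with invented prices: the single racing pad pulls the mean above every ordinary pad, and the mean doesn’t match any actual pad, while the median and mode stay among the everyday options.

```python
import statistics

# Hypothetical brake pad prices; the last one is a professional racing pad.
prices = [20, 30, 30, 40, 60, 70, 450]

mode_price = statistics.mode(prices)      # what people usually buy
median_price = statistics.median(prices)  # the middle-of-the-road pad
mean_price = statistics.mean(prices)      # dragged up by the racing pad

print(mode_price, median_price, mean_price, mean_price in prices)
```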

Anonymous

@OP did you get your answer?

Anonymous

All three of these measures can be used to think about what a ‘normal’ value for a variable is.

**When should you use the mean?**

I’m going to disagree with many of the other posts and say that in many real-world situations, the mean is not a very good measure to use to describe what a ‘normal’ value is.

* A lot of real-world variables, and I’m mainly thinking about money-related examples here, are not symmetrically distributed and the mean is a poor descriptor for what is normal.
* Many real-world problems involve taking a single sample from a distribution, where what matters is the value you’re likely to get rather than the long-run average.

Want to know what a normal income is? What is a normal price to pay for a house? How long do people spend commuting to work? The mean would be awful for these because they are all heavily influenced by a few extremely large values.

So when should you use the mean?

* The mean is great for when you need to understand what a variable does in the long term or if you are accumulating things over time. For example, given the distribution of daily rainfall, the mean is a good measurement to use to understand the expected amount of total rain over a month or a year.
* The mean can be good for describing properties of groups of things based on their individual properties. E.g. on average, 1 in every 400 widgets will fail within two years.
* The mean minimises the average squared error: if you have to describe the ‘middle’ of a data set with a single number and you are penalised by the square of how far off you are from each value, the mean is the best choice.
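
A brute-force check of the squared-error claim, with made-up data containing one large value. Searching a grid of guesses, the guess with the smallest total squared error lands on the mean, while the guess with the smallest total absolute error lands on the median. The grid search is only for illustration, not how you’d compute either statistic.

```python
import statistics

data = [1, 2, 3, 4, 100]  # hypothetical values with one large outlier


def total_squared_error(guess):
    """Sum of squared distances from the guess to every value."""
    return sum((x - guess) ** 2 for x in data)


def total_absolute_error(guess):
    """Sum of absolute distances from the guess to every value."""
    return sum(abs(x - guess) for x in data)


# Candidate guesses from 0.0 to 100.0 in steps of 0.1.
candidates = [i / 10 for i in range(1001)]
best_squared = min(candidates, key=total_squared_error)
best_absolute = min(candidates, key=total_absolute_error)

print(best_squared, statistics.mean(data))     # lands on the mean
print(best_absolute, statistics.median(data))  # lands on the median
```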

**When should you use the median?**

The median is a great measure to use to describe what an average person’s experience is like. It is the ‘middle of the pack’ and is influenced by where the pack is, not what happens at the extremes.

**When should you use the mode?**

Others have stated this pretty well. This measure is most useful with categorical variables to answer the question of what the ‘most likely’ result is. E.g. What is the most common colour of car?

Anonymous

Say you paid $100 to enter a prize raffle (or perhaps to play slots).

The median prize value is $10. It helps you understand whether you did better or worse than “most people”.

The mean prize value is $80. It tells you that you can expect to lose 20% of your money if you enter many times.

The mode prize value is $5. You’ll get $5 back more often than any other amount.

The max prize value is $1000000. You could win this, but only in the best-case scenario, and it is very unlikely.

The min prize value is $0. Very occasionally you won’t get anything.
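
A sketch that reproduces the raffle figures. The prize table below is entirely invented, chosen only so that the mean, median, mode, min, and max match the numbers above; real raffles will differ.

```python
import statistics

# Hypothetical prize table: prize value -> number of tickets out of 100,000.
prize_counts = {
    0: 1_000,
    5: 40_000,
    10: 30_000,
    20: 21_665,
    50: 1_334,
    1_000: 6_000,
    1_000_000: 1,
}

# Expand the table into one entry per ticket.
prizes = [value for value, count in prize_counts.items() for _ in range(count)]
entry_fee = 100

mean_prize = sum(prizes) / len(prizes)
print("mean:", mean_prize)
print("median:", statistics.median(prizes))
print("mode:", statistics.mode(prizes))
print("min/max:", min(prizes), max(prizes))
print("expected loss per entry:", entry_fee - mean_prize)  # 20% of the fee
```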