Why is the Median a thing? Why would someone need to find the Median of a data set?


I know it’s a common saying that statistics isn’t intuitive to humans. I’ve read my Taleb. *Intuitively*, I can see why one might need to find the mean (average) of a data set as well as the mode.

But where and why would someone need to find the Median? I’ve never calculated the median of a data set in daily life. On the other hand, I compute the mean of several values multiple times a week sometimes. I don’t calculate Modes that much, but I can see **why** someone would care about the most-occurring value.

Can someone explain the relevance of finding the Median? I’m sure there are plenty of useful applications and I’m just unaware of them.

Thanks in advance!


14 Answers

Anonymous 0 Comments

The mean (or average) tends to be more useful when your data is roughly symmetric, like a normal distribution, meaning the data points are spread fairly evenly on both sides of the center.

Imagine you had 100 students and 20 got A’s, 20 got B’s, 20 got C’s, 20 got D’s and 20 got F’s. The mean will be a pretty good indicator of the typical grade.

But if you have a much more skewed data set, like real estate prices in a city, the mean becomes less informative. For instance, say 10 houses out of 100 are worth $100 million but the other 90 houses are worth between $200,000 and $400,000. Well, those $100 million houses are gonna screw up the average cost *a lot*, making you think *all* the houses are super expensive in that city. But if you use the *median* you get a better idea of what a typical house *actually* costs.

Anonymous 0 Comments

The median is useful when you have a skewed dataset. For example, median income tells you something that is lost in the mean, because the mean is dragged up by a small number of very high earners. 50% of people earn less than the median and 50% earn more.

Anonymous 0 Comments

The median is often important as a threshold in a distribution. For example, if you pick a random person there is a 50% chance that they are above the median and a 50% chance that they are below it, so the median gives a good idea of how common certain value ranges are.

This becomes very apparent in very skewed distributions.

Wealth is a good example. If you ask “can a normal person afford X,” it doesn’t help to know the mean, because that’s mostly dominated by the billionaires, who are a small group holding most of the wealth. If you pick a random person they will most likely own less than the mean.

If your distribution is the values 1 to 98 and then 1,000,000 twice, the mode is a million and the mean is just above 20,000. But a number picked at random from that distribution will almost certainly be under 100, in the neighborhood of the median of 50.5, and nowhere near the mean or the mode.
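To check those numbers yourself, here is a small Python sketch using the standard library's `statistics` module. The variable names are just for illustration.

```python
import statistics

# The values 1 to 98, followed by 1,000,000 twice (100 values in total).
data = list(range(1, 99)) + [1_000_000, 1_000_000]

mean_value = statistics.mean(data)      # pulled far up by the two huge values
median_value = statistics.median(data)  # halfway between the 50th and 51st values
mode_value = statistics.mode(data)      # the only value that appears more than once

print(f"mean   = {mean_value}")    # 20048.51
print(f"median = {median_value}")  # 50.5
print(f"mode   = {mode_value}")    # 1000000
```

Only the median lands anywhere near the values you would actually draw 98 times out of 100.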

There is a more general variant of the median: a percentile tells you what percentage of values fall below a given value. So if your IQ is in the 95th percentile, it means only about 1 in 20 people scores higher than you. The median is the 50th percentile.
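A rough sketch of the percentile idea in Python. The helper `percent_below` is made up for this example, and it uses the simplest possible definition (share of values strictly below a given value). Real statistics packages use slightly different conventions.

```python
import statistics

def percent_below(values, x):
    """Return the percentage of values that are strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)

# Hypothetical test scores: the whole numbers 1 to 100.
scores = list(range(1, 101))

median_score = statistics.median(scores)

print(percent_below(scores, median_score))  # 50.0, so the median sits at the 50th percentile
print(percent_below(scores, 96))            # 95.0, so a score of 96 beats 95% of the group
```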

Anonymous 0 Comments

Say you are sorting a list. You want to find the center of the list so you can break it into 2 lists: hello, median. Anything greater than the median goes in the upper list, anything below it goes in the lower list. You can then split those lists again and again, which is how some sorting operations work.
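Here is a toy Python sketch of that splitting idea. The function names are made up for illustration, and it is not how a real sorting routine works: `statistics.median` sorts internally, so this only shows the "split around the median, then repeat" logic, not an efficient algorithm.

```python
import statistics

def split_at_median(values):
    """Split values into those below, equal to, and above the median."""
    m = statistics.median(values)
    lower = [v for v in values if v < m]
    equal = [v for v in values if v == m]
    upper = [v for v in values if v > m]
    return lower, equal, upper

def median_sort(values):
    """Sort by repeatedly splitting around the median (illustrative only)."""
    if len(values) <= 1:
        return list(values)
    lower, equal, upper = split_at_median(values)
    return median_sort(lower) + equal + median_sort(upper)

print(median_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```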

It also lets you know if there is a skew. If you have 100 samples ranging from 1 to 100 and the median is 25, you know you've got a crap ton of low values in that list. If the median is 75, you've got a lot of high values.

Anonymous 0 Comments

Both median and average are measures of “centrality,” or some kind of an attempt to find the middle of the data. For things like height, average is useful: people tend to all be about the same height and very tall or very short people are rare.

But for some other things, that’s not the case. If you look at “average number of followers on Twitter,” the number is like 700. But most people don’t have a lot of followers and a few people have millions. That would be like looking for average height but one of your friends is 3 km tall. It wouldn’t tell you anything about the “middle” of the data.

So you can use median to get a better idea and see that the median Twitter user has fewer than 200 followers.

Anonymous 0 Comments

The problem with the mean is that it can be strongly affected by extremely large values or small values.

Let's take a small town with 10 people, one of whom is a billionaire. Yearly salaries are 9 people at $50,000 and 1 at $1,000,000,000.

If we take the mean, then the average salary in town is over $100 million a year ($100,045,000).

But if you take the median, it’s $50,000
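A quick Python check of that arithmetic, plus one extra step that is not in the original example: what happens to each number if the billionaire moves away.

```python
import statistics

# Nine people at $50,000 and one at $1,000,000,000, as in the example above.
salaries = [50_000] * 9 + [1_000_000_000]

mean_with = statistics.mean(salaries)      # 100,045,000
median_with = statistics.median(salaries)  # 50,000

# Extra step for illustration: the billionaire leaves town.
salaries_without = [50_000] * 9
mean_without = statistics.mean(salaries_without)      # drops to 50,000
median_without = statistics.median(salaries_without)  # still 50,000

print(f"mean:   {mean_with:,.0f} -> {mean_without:,.0f}")
print(f"median: {median_with:,.0f} -> {median_without:,.0f}")
```

One person leaving moves the mean by about $100 million, while the median does not move at all.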

If we take a less ridiculous example: in the USA, mean salary is $60,575 while median salary is $56,420.

The median gives a more accurate idea of how much people are making, since it ignores the pull of the few rich people who drag the mean way up.

Anonymous 0 Comments

I am a microscopist and part of my analysis involves looking at “blinking” molecules, where if you’re looking in a microscope the pixels are dark most of the time but sometimes they are bright and on. As part of processing the image we like to subtract the “background” of the image, the part of the image that is visible the whole time, from each frame. Picture a video of an immobile cell, where sometimes molecules randomly blink “on” and you get occasional bright spots. If you take every frame of the video and subtract off the cell, what you get is a video of just the blinking, making it easier to analyze.

For each pixel, we *could* subtract off the average brightness of the pixel over the whole video. However, the bright spots will affect the average. If the pixel value is ~10 most of the time but sometimes it goes up to 200, then its average value will be higher than surrounding pixels without blinking molecules. This may be ok, but it means when you subtract off the average of the background, the blinking will not be as distinct.

Instead, if we subtract off the *median* pixel intensity, for the same movie the background median is going to be ~10, because the very high pixel intensities are rare outliers, and the median only cares about the middle value, not about how bright those outliers are.
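A minimal sketch of this for a single pixel, in plain Python. The intensity values are invented for illustration (mostly around 10, with two blinks at 200). Real image processing would do this for every pixel at once with an array library.

```python
import statistics

# Made-up brightness of one pixel over 10 frames: ~10 normally, 200 when it blinks.
trace = [10, 11, 9, 200, 10, 10, 12, 200, 10, 9]

mean_background = statistics.mean(trace)      # inflated by the two blinks
median_background = statistics.median(trace)  # stays at the "dark" level

# Subtract each background estimate from every frame.
mean_subtracted = [v - mean_background for v in trace]
median_subtracted = [v - median_background for v in trace]

print(f"mean background   = {mean_background}")    # 48.1
print(f"median background = {median_background}")  # 10.0
print(f"blink height after mean subtraction:   {max(mean_subtracted):.1f}")    # 151.9
print(f"blink height after median subtraction: {max(median_subtracted):.1f}")  # 190.0
```

With the median as background, the dark frames land near zero and the blinks keep almost their full height.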

This is just an example, but for general measurements where you want to characterize the “center” of the data but you don’t want very large or small outliers to affect that measurement, the median is generally your first choice.

Anonymous 0 Comments

It’s useful for showing the true middle… let’s say you want to know what the midpoint housing cost is in a particular town. Let’s say most homes sell for $300k-$400k but there’s an estate on the edge of town that sold for $10m. If you used the mean, that one sale would hugely skew the result and might make it seem that the town is much less affordable than it is. By finding the median, meaning 1/2 sold for less and 1/2 sold for more, you see a more accurate mid-point for home sale price. Same for looking at pay within a company, where a CEO making millions might skew the mean, whereas most people want to know rank & file pay.

Anonymous 0 Comments

Well, imagine that you want to know the approximate net worth of a group of 1,000 people. One of them is a billionaire with $1B and 999 of them are broke with $0 net worth.

If you look at the average, congratulations: the average person in the group is a millionaire.

With the median, you know that these people are actually broke.

Anonymous 0 Comments

If I were applying for a sales job, I would be more interested in the median income of salesmen than the average, since the average could be greatly increased by even one super-salesman.