Boxplot outliers: how can there be values outside of the minimum and maximum whiskers?

I don’t know much about statistics, but looking at a boxplot, there’s a minimum whisker to the left and a maximum whisker to the right. Then there are outliers lying outside of both whiskers.

If the minimum whisker marks the lowest value for the data, why are there values outside of the minimum or maximum whiskers?

3 Answers

Anonymous 0 Comments

Alright, imagine you have a bunch of toys in a box, and you want to see how heavy they are. You line them up from lightest to heaviest and draw a picture to show their weights. That picture is like a boxplot.

Now, most of your toys might be around the middle, with a few really light ones and a few really heavy ones. The box in the middle of the picture shows where the middle half of the toys are. The line in the middle of the box is the “median” weight, which means half the toys are heavier and half are lighter.

The “whiskers” are like lines that show how far most of the toys go. The whiskers reach up to where most of the toys are, but they don’t show every single toy. So, if there’s a really, really heavy toy or a really, really light toy, it might be outside of the whiskers. These are called “outliers.”

Think of it like this: most kids might have toys that are not too heavy or too light, but there might be a few kids with really heavy or really light toys that are different from the rest. These outliers show up as points outside of the whiskers on a boxplot.

Anonymous 0 Comments

If there are outliers beyond the whiskers, then the whiskers are representing something other than minima/maxima. One common choice is the smallest/largest data value within 1.5 × IQR of Q1/Q3, where IQR = Q3 − Q1 is the interquartile range; anything beyond that range is drawn as an individual outlier point. A proper figure legend should state explicitly what the whiskers mean, though.
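To make the 1.5 × IQR rule concrete, here is a minimal Python sketch. The function name `tukey_whiskers`, the parameter `k`, and the sample data are made up for illustration, and the quartiles use the standard library's "inclusive" method; plotting software may compute quartiles slightly differently, so the exact fences can vary from tool to tool.

```python
from statistics import quantiles

def tukey_whiskers(values, k=1.5):
    """Return (low whisker, high whisker, outliers) under the k * IQR rule.

    Hypothetical helper for illustration; not taken from any plotting library.
    """
    data = sorted(values)
    # Quartiles by linear interpolation between data points ("inclusive" method).
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    # Whiskers stop at the most extreme data values still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    # Everything outside the fences is drawn as an individual point.
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return inside[0], inside[-1], outliers

# Made-up sample: nine ordinary values and one extreme one.
low, high, outliers = tukey_whiskers([1, 2, 2, 3, 3, 3, 4, 4, 5, 40])
print(low, high, outliers)  # 1 5 [40]
```

Here the upper whisker ends at 5, the largest value below the upper fence of 6.625, and 40 is plotted separately as an outlier.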

Anonymous 0 Comments

Outliers are values so far away from the rest of the data that it’s not helpful to use them as the minimum or maximum values, so they are left out of the whiskers and drawn as separate points instead. Let’s say you wanted to weigh various small objects around your house. They mostly weigh a couple of pounds, and you even find one that weighs ten pounds. You make a boxplot of the data and it gives you a good idea of what the distribution looks like.

Then you remember you have a tungsten cube up on one of your shelves and it weighs over 40 pounds. If you included that in the max value whisker, then suddenly that whisker would be something like 8 times longer and become functionally useless. The whisker is supposed to tell you where the upper 25% of values lie, but because of one single value, you’ve skewed the entire plot to make it look like most of that quartile is far above where those values actually fall. As such, it is far more informative to just say “yes, we technically have a 40 lb object in our data, but most of our stuff is actually way down here in this whisker.”
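As a rough illustration of how much one heavy object stretches the whisker, here is a small Python sketch. The weights are invented to loosely match the story (a ten-pound item plus a 40-pound cube), the function name `upper_whisker_lengths` is hypothetical, and the 1.5 × IQR cutoff is assumed as the whisker rule; the exact ratio depends on the data and on how the quartiles are computed.

```python
from statistics import quantiles

def upper_whisker_lengths(values, k=1.5):
    """Return the upper whisker length drawn two ways:
    (up to the true maximum, up to the largest value within k * IQR of Q3).

    Hypothetical helper for illustration only.
    """
    data = sorted(values)
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    high_fence = q3 + k * (q3 - q1)
    to_max = data[-1] - q3
    to_fence = max(x for x in data if x <= high_fence) - q3
    return to_max, to_fence

# Invented household weights in pounds; 40 stands in for the tungsten cube.
weights = [1, 2, 2, 3, 3, 4, 5, 6, 7, 10, 40]
to_max, to_fence = upper_whisker_lengths(weights)
print(to_max, to_fence)  # 33.5 3.5
```

With these made-up numbers, a whisker drawn all the way to the cube would be almost ten times longer than one that stops at the ten-pound object, even though only a single value lies out there.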