Eli5: Box Cox Transformation


I’m working on getting my Lean Six Sigma Green belt and I pretty much understand every concept except this one. I’ve watched dozens of YouTube videos but none of them are “dumbed down” enough for me to understand them.

All I understand about them is that you have to take a data point and add an exponent to it…but why? What does adding this exponent do? Do you add the same one to every data point? How do you decide what exponent to use?

TIA!


Anonymous 0 Comments

In statistics we love data that is normally distributed. A whole bunch of our standard statistical tests and tools rely on data being (roughly) normally distributed, so if our data isn’t, those tools can give us meaningless results.

Box-Cox transformations are a way of taking data that isn’t normally distributed and turning it into data that is at least roughly normal. Basically we take messy data and, by applying some simple rules to it, turn it into data that is easier to work with.

The [NIST page](https://www.itl.nist.gov/div898/handbook/eda/section3/eda336.htm) on them has [this handy graphic](https://www.itl.nist.gov/div898/handbook/eda/section3/gif/boxcox.gif), showing how this works in practice:

The top-left graph shows the original data. It doesn’t look very normal (in particular, it isn’t symmetric). We take the data, plug it into our fancy statistics software, and it gives us the top-right graph: a plot of how close to a normal distribution the transformed data would be for each possible value of the transformation parameter, λ. The score on the vertical axis is a correlation coefficient, where 1 would mean a perfect match with a normal distribution. We want as good a normal distribution as possible, so we pick the λ value that gives the highest score (the one closest to 1).
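As a rough sketch of what the software does to build that kind of plot, the Python below tries a grid of λ values and scores each one by the correlation coefficient of a normal probability plot. The sample data, the grid from -2 to 2, and the variable names are all assumptions made for illustration; they are not taken from the NIST page.

```python
import numpy as np
from scipy import stats

# Made-up, right-skewed sample (an assumption for illustration only).
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# Try a grid of candidate lambdas and score each one.
lambdas = np.linspace(-2.0, 2.0, 81)
scores = []
for lam in lambdas:
    transformed = stats.boxcox(data, lmbda=lam)  # same lambda for every point
    # probplot returns the normal probability plot points and a line fit;
    # r is the correlation coefficient of that plot (1 = perfectly normal).
    _, (_, _, r) = stats.probplot(transformed, dist="norm")
    scores.append(r)

scores = np.array(scores)
best_lambda = lambdas[np.argmax(scores)]
print(f"best lambda on this grid: {best_lambda:.2f}, score: {scores.max():.4f}")
```

In practice you would usually let the library pick λ for you: calling `scipy.stats.boxcox(data)` without a `lmbda` argument returns both the transformed data and a fitted λ. Note that it chooses λ by maximum likelihood, a slightly different criterion from the correlation score used here.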

λ is our magical transformation parameter. It is the thing we will use to transform our original data into our new data. Once we have picked it, we apply the same λ to every data point, using the formula:

> X = (x^λ - 1) / λ

unless λ = 0 is our best option, in which case we use X = log(x), the natural logarithm. Either way, the formula only works when every data point is positive.
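To make the formula concrete, here is a minimal sketch of it in Python. The function name `box_cox` and the example numbers are made up for illustration. It applies one λ to every value and switches to the natural log when λ is 0.

```python
import numpy as np

def box_cox(x, lam):
    """Apply the Box-Cox formula with a single lambda to every value in x."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox needs strictly positive data")
    if lam == 0:
        return np.log(x)            # special case: natural logarithm
    return (x ** lam - 1.0) / lam   # general case: (x^lambda - 1) / lambda

data = np.array([1.0, 4.0, 9.0, 16.0])
half = box_cox(data, 0.5)    # square-root-like transform
logged = box_cox(data, 0.0)  # log transform
print(half)
print(logged)
```

With λ = 0.5 the formula works out to 2 × (√x - 1), so big values get pulled in much more than small ones. That squeezing of the long right-hand tail is what makes skewed data look more symmetric. The λ = 0 case isn’t arbitrary either: as λ gets closer and closer to 0, (x^λ - 1) / λ gets closer and closer to log(x).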

That then gives us the bottom-left graph: our transformed data set. We can see it looks a lot more normal.
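If you would rather check "looks more normal" with a number than by eye, one simple option is to compare skewness before and after the transformation (0 means perfectly symmetric). The sketch below uses a made-up skewed sample and assumes λ = 0 is the right choice for it; neither comes from the NIST example.

```python
import numpy as np
from scipy import stats

# Made-up, right-skewed sample (an assumption for illustration only).
rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.75, size=500)

# lambda = 0 is the log transform, which suits this particular sample.
transformed = stats.boxcox(data, lmbda=0.0)

skew_before = stats.skew(data)
skew_after = stats.skew(transformed)
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```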

[Disclaimer: I know almost nothing about Six Sigma stuff or whether this is actually of any use there, but Box-Cox transformations are real things.]