What is rotation in Principal Components Analysis, please?


What’s rotation? And what is the difference between orthogonal rotation and oblique rotation?


Let’s say you’ve done an experiment and you’ve measured some leaves. You’ve measured the length and width of loads of leaves. You then plot a graph with width on the x-axis and length on the y-axis, plotting a point for each leaf.

As a general rule, the wider a leaf is, the longer it is, although there are some exceptions. So when you plot the graph, you can see a pattern where there is a positive slope. You can draw a line of best fit which slopes upwards.

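Now imagine rotating the data points about the centre of the cloud (the average width and length), so that the best fit line you saw earlier now lies along the x-axis. What you have now is data which is spread along the x-axis, but with some deviation above and below it. That act of rotating the data points to line up the best fit line with an axis is the basic process of principal components analysis. (Strictly, the "best fit line" PCA uses is the one that minimises the perpendicular distances to the points, not the usual regression line, but the picture is the same.)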

In this very simple 2D example, your new rotated data points have a new x-coordinate which is principal component 1, and a new y-coordinate which is principal component 2. In this leaf example, the two principal components tell you about the leaf shape in a different way. PC1 tells you the overall size of the leaf, and PC2 tells you the shape (say, negative for broad and fat, positive for skinny and long; which way round the sign falls is arbitrary).
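If it helps to see that rotation as code, here is a minimal sketch using NumPy. The leaf measurements are made-up numbers generated for illustration, and the variable names (`width`, `length`, `scores`) are my own, not from any particular package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical leaf measurements (made-up numbers, in cm): wider leaves tend to be longer.
width = rng.normal(5.0, 1.0, size=200)
length = 2.0 * width + rng.normal(0.0, 0.8, size=200)
X = np.column_stack([width, length])

# Centre the cloud so the rotation happens about its middle.
Xc = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix give the directions of the new axes.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]           # re-order so the largest variance comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rotate the points (possibly with a flip, since eigenvector signs are arbitrary):
# the new x-coordinate is PC1 and the new y-coordinate is PC2.
scores = Xc @ eigvecs

# The covariance of the rotated data is diagonal: PC1 and PC2 are uncorrelated.
print(np.round(np.cov(scores, rowvar=False), 3))
```

The matrix `eigvecs` is orthogonal, so multiplying by it only turns (or mirrors) the cloud; distances between leaves are unchanged.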

PCA just rotates your data, as if it were points on a graph, so as to give you a different “view” of the data (we turned leaf length and width into size and shape in the earlier example). Of course, PCA can work with more dimensions. For example, if you have 3 variables, you rotate your best fit line onto the x-axis, then you look for a second best fit line at 90 degrees to the first, and you rotate that onto the y-axis. Whatever variation is left over ends up along the z-axis.

It gets difficult to imagine when you have more than 3 variables, but mathematically the process is identical. However, one of the key things about PCA is that the first principal component is the most “important” (it has the most variance), and each later one you extract has less variance than the one before. This makes it possible to throw away the later ones, because PCA concentrates the variance into the earlier components, so you can analyse fewer variables with more concentrated information.
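Here is a rough sketch of that "keep the early components, drop the late ones" step. The data are simulated (5 variables driven by 2 hidden factors), and the 95% cut-off is an arbitrary choice for illustration, not a rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 300 observations of 5 variables driven by 2 hidden factors plus noise.
hidden = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = hidden @ mixing + 0.1 * rng.normal(size=(300, 5))

Xc = X - X.mean(axis=0)

# SVD of the centred data: the rows of Vt are the principal directions, and the
# squared singular values are proportional to each component's variance.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)

# Keep the fewest leading components that cover at least 95% of the variance.
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
reduced = Xc @ Vt[:k].T

print("share of variance per component:", np.round(explained, 3))
print("components kept:", k)
```

`reduced` has one column per retained component, so later analysis works with `k` columns instead of 5.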

The problem with PCA is that the principal components with the highest variance aren’t necessarily easy to understand – because they may be composed of data from many different and unrelated variables.

So, after we’ve done the PCA and thrown away the low-variance principal components, we now want a new view of the data which is easier to understand. We can now apply the concept of rotation again, but this time we rotate the axes of the components we kept, looking for an easier-to-understand view.

Usually we want a view where after rotation, each principal component combines data from a few original variables, and is unaffected by most of the others. This usually makes it easy to understand what a principal component is telling you. The common rotation processes search for a rotation where this type of “simple” structure is brought out.
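To make "searching for a rotation with simple structure" concrete, here is a sketch of varimax, one widely used orthogonal rotation. The iteration below is the usual SVD-based one, but treat it as an illustration rather than a reference implementation; the loadings are invented, and `simplicity` is a helper name I made up for the quantity varimax tries to increase.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-8):
    """Return (rotated loadings, rotation matrix) for an orthogonal varimax rotation."""
    p, k = loadings.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ R
        # Gradient-like matrix of the varimax criterion; its SVD gives the next rotation.
        grad = loadings.T @ (rotated**3 - rotated * (np.sum(rotated**2, axis=0) / p))
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        if np.sum(s) < previous * (1 + tol):   # stop once the objective stalls
            break
        previous = np.sum(s)
    return loadings @ R, R

def simplicity(L):
    """Varimax criterion: variance of the squared loadings, summed over components."""
    return np.sum(np.var(L**2, axis=0))

# Hypothetical loadings with a clean structure (3 variables per component) ...
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
# ... hidden by turning the axes 30 degrees, so every variable loads on both components.
theta = np.deg2rad(30)
spin = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]])
mixed = simple @ spin

rotated, R = varimax(mixed)
print(np.round(rotated, 2))
print("simplicity before/after:", simplicity(mixed), simplicity(rotated))
```

Because `R` is orthogonal, each variable's total squared loading (its communality) is the same before and after; only how it is split between components changes.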

Orthogonal rotations keep the axes at right angles to each other, which amounts to assuming that there are no correlations between the rotated components. The problem with this is that if the underlying factors really are strongly correlated, then the search for a rotation cannot reach the views that would show them, and the resulting data may be uninterpretable, with the principal components being composed of mashed-up data from loads of different variables.

Oblique rotations don’t make this assumption and will allow correlations to come through in the final results. The result is that the principal components (or factors) aren’t as cleanly separated as with an orthogonal rotation, but you are much more likely to get a result where there is a “simple” structure which you can easily understand.
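As an illustration of an oblique rotation, here is a sketch of promax, one common oblique method (it is my choice of example; nothing above depends on it). It starts from an orthogonal solution, sharpens it into a "target" by raising the loadings to a power, and then fits a transformation that is allowed to be non-orthogonal. The starting loadings and the `power=4` setting are assumptions made for the example.

```python
import numpy as np

def promax(orthogonal_loadings, power=4):
    """Return (pattern loadings, factor correlation matrix) for a promax-style rotation."""
    L = orthogonal_loadings
    # Target: same signs as L, but small loadings are pushed much closer to zero.
    target = L * np.abs(L) ** (power - 1)
    # Least-squares transformation from L to the target (not constrained to be orthogonal).
    U, *_ = np.linalg.lstsq(L, target, rcond=None)
    # Rescale the columns so that the implied factors have unit variance.
    d = np.diag(np.linalg.inv(U.T @ U))
    U = U * np.sqrt(d)
    pattern = L @ U
    phi = np.linalg.inv(U.T @ U)   # correlations between the oblique factors
    return pattern, phi

# Hypothetical loadings after an orthogonal rotation: every variable still has a
# noticeable cross-loading, a hint that the underlying factors are correlated.
L = np.array([[0.80, 0.30], [0.70, 0.30], [0.75, 0.25],
              [0.30, 0.80], [0.25, 0.70], [0.30, 0.75]])

pattern, phi = promax(L)
print(np.round(pattern, 2))   # cross-loadings shrink
print(np.round(phi, 2))       # off-diagonal entry is the factor correlation
```

The cross-loadings have not vanished for free: they have been absorbed into the correlation between the two factors reported in `phi`, which is exactly the trade-off described above.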