What does ‘orthogonal’ mean in a statistics context?



I have tried many times to understand this and thus far failed every time. What does it mean to say that data is orthogonal? Can you provide an example of data that would be orthogonal versus data that would not be orthogonal?


Simply put, orthogonality means “uncorrelated.” An orthogonal model is one in which all the independent variables are uncorrelated with one another. If any two of the independent variables are correlated with each other, then that model is non-orthogonal.


I like to remember it by thinking about what “orthogonal” means in geometry. Two lines are orthogonal if they are at 90 degrees to each other in Euclidean (x and y) space.
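
To make the geometry-to-statistics link concrete, here is a minimal sketch in plain Python. The column names `x1`, `x2`, `x3` and their values are made up for illustration. Two vectors are at 90 degrees when their dot product is zero, and the correlation between two variables is that same dot product taken after subtracting each variable's mean. Here `x1` and `x2` already have mean zero, so "orthogonal" and "uncorrelated" coincide.

```python
import math

def dot(u, v):
    """Sum of element-wise products; zero means the vectors are at 90 degrees."""
    return sum(a * b for a, b in zip(u, v))

def center(v):
    """Subtract the mean from every value."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def correlation(x, y):
    """Pearson correlation, written as a dot product of centered vectors."""
    cx, cy = center(x), center(y)
    return dot(cx, cy) / math.sqrt(dot(cx, cx) * dot(cy, cy))

# Hypothetical data: two settings (-1 / +1) of three variables over four runs.
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]   # every combination with x1 appears once: orthogonal
x3 = [-1, 1, 1, 1]    # tends to move together with x1: not orthogonal

print(dot(x1, x2), correlation(x1, x2))   # dot product 0, correlation 0.0
print(dot(x1, x3), correlation(x1, x3))   # dot product 2, correlation about 0.58
```

So `x1` and `x2` would count as orthogonal data, while `x1` and `x3` would not.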

In the math of statistics, variables are represented as the axes of a space with several dimensions (let’s just think of two to make it simple). If I can chart one variable along the x axis and one along the y axis, then the math becomes much simpler because I don’t have to take into account any relationship between the two. That means your variables are orthogonal and you can simplify the equations.
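
One way to see the "simpler math" is in a regression. The sketch below uses made-up numbers and hypothetical helper names (`simple_slope`, `two_predictor_slopes`); it is only an illustration, not a full regression routine. With orthogonal predictors, the slope for each variable is the same whether or not the other variable is in the model. With correlated predictors, the slopes shift once both are included.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def simple_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    cx, cy = center(x), center(y)
    return dot(cx, cy) / dot(cx, cx)

def two_predictor_slopes(xa, xb, y):
    """Least-squares slopes of y on xa and xb together (2x2 normal equations)."""
    ca, cb, cy = center(xa), center(xb), center(y)
    saa, sbb, sab = dot(ca, ca), dot(cb, cb), dot(ca, cb)
    say, sby = dot(ca, cy), dot(cb, cy)
    det = saa * sbb - sab ** 2
    return (sbb * say - sab * sby) / det, (saa * sby - sab * say) / det

# Hypothetical data, chosen only for illustration.
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]   # uncorrelated with x1
x3 = [-1, 1, 1, 1]    # correlated with x1
y = [1, 5, 2, 8]

# Orthogonal pair: the joint slopes match the one-at-a-time slopes.
print(simple_slope(x1, y), simple_slope(x2, y))   # 2.5 1.0
print(two_predictor_slopes(x1, x2, y))            # (2.5, 1.0)

# Correlated pair: the slope for x1 changes once x3 is in the model.
print(simple_slope(x1, y), simple_slope(x3, y))   # 2.5 2.0
print(two_predictor_slopes(x1, x3, y))            # (2.25, 0.5)
```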

Independent, unrelated.

For example, whether someone likes cats or dogs more is (probably, didn’t check) orthogonal to their body weight, but weight is correlated with overall health status.

Orthogonal means that the two datasets do not share any common factor that might lead to them being correlated. This is usually a consideration when you are using 2 different datasets or experiments to bolster your confidence in the result.

As an example, you might say ‘men are clumsier than women because they tend to bang their head a lot on doorframes’.

You measured this by looking at dents on doorframes used by males, and also by looking at concussion statistics in hospitals. At the outset, these seem like very different datasets, so you might think that if they both gave you the same result, your conclusion is very robust.

But the hidden factor here is that men are generally taller than women and have slightly thicker skulls/mass. So it is likely that they dent doorframes more often and also end up at the hospital more often. Thus the datasets are not orthogonal: dents in doorframes and concussions at the hospital are dependent on each other through height. So in fact you are sort of measuring the same thing in the 2 apparently different datasets, and the second one does not contain any additional information. Sort of like saying the glass is half full and then saying the glass is half empty. It does not increase your confidence in your result, because the data is not orthogonal.
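
The hidden-factor idea can be sketched with simulated data. Everything below is an assumption made for illustration: the numbers are random, not real measurements, and the model (each quantity is height plus independent noise) is deliberately crude. The two "datasets" come out correlated only because both depend on height. Once the part explained by height is removed from each, almost no correlation is left.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def correlation(x, y):
    cx, cy = center(x), center(y)
    return dot(cx, cy) / math.sqrt(dot(cx, cx) * dot(cy, cy))

def residuals(y, x):
    """What is left of y after removing its straight-line dependence on x."""
    cx, cy = center(x), center(y)
    slope = dot(cx, cy) / dot(cx, cx)
    return [b - slope * a for a, b in zip(cx, cy)]

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Simulated, standardized values: NOT real data.
height = [random.gauss(0, 1) for _ in range(n)]
dents = [h + random.gauss(0, 1) for h in height]        # driven by height
concussions = [h + random.gauss(0, 1) for h in height]  # also driven by height

# Correlated through the shared factor (around 0.5 under this model).
raw_r = correlation(dents, concussions)

# After removing height from both, the leftover parts are nearly uncorrelated.
partial_r = correlation(residuals(dents, height),
                        residuals(concussions, height))

print(round(raw_r, 2), round(partial_r, 2))
```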