What is homoscedasticity, and what is its relevance to statistical analysis?

Anonymous

Homoscedasticity is not a word a 5-year-old would understand. But when you analyze groups of things and want to compare the averages (means) of the groups, it is a way to think about how different the individuals within each group are from one another. You generally assume that the differences among people in a group are about the same from group to group, even though the averages might differ. E.g. maybe one class of students has a 92/100 average on a test and another class has a 56/100 average, but the spread of scores around those averages is approximately the same (homoscedasticity), so on the whole there is a “smart” class and a “less smart” class.

When the spread differs between the groups, that violates the “assumption of homoscedasticity.” The reason this is a problem is that you use the spread of scores to determine how far apart the averages could be expected to land purely by chance. In the classic tests (such as Student’s t-test and ANOVA), that calculation pools the groups’ spreads into a single estimate, and if one group’s scores are much more spread out than another’s, that single number no longer describes either group well. That makes it hard to tell whether the difference between the two classes (e.g. 92 vs 56) is a “real” difference or just something that could have happened by chance.

There’s no “fix” for a dataset that violates homoscedasticity, but there are a number of alternative approaches to take if the violation is detected (for example, versions of the t-test and ANOVA that don’t assume equal spreads). By far the best way to deal with it, though, is to identify why one (or more) of the groups has a wildly different spread of scores than the other(s).

Source: taught college statistics for over a decade.
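The answer above says the spread of scores is what tells you how far apart two averages could land by chance. Here is a minimal sketch of that idea, assuming the classic two-sample (Student’s) t-test as the comparison and Welch’s t-test as one example of an alternative approach that doesn’t assume equal spreads. The class scores and the function names `student_t` and `welch_t` are hypothetical, chosen for illustration only.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Classic two-sample t statistic: pools both groups' variances,
    which is only justified if the groups have equal spread."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error

def welch_t(a, b):
    """Welch's t statistic: each group keeps its own variance,
    so equal spread is not assumed."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical test scores: a small class with a wide spread
# and a larger class whose scores are tightly bunched.
class_a = [60, 80, 100]
class_b = [68, 69, 70, 71, 72]

print(f"Student's t: {student_t(class_a, class_b):.2f}")
print(f"Welch's t:   {welch_t(class_a, class_b):.2f}")
```

With these made-up scores, Student’s t comes out larger (about 1.18 versus 0.86) because pooling lets the tightly bunched, bigger class dominate the variance estimate, which understates the uncertainty in the small, spread-out class’s average. When both groups have the same size the two statistics are identical; Welch’s version then differs only in its degrees of freedom, which this sketch doesn’t compute.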

Anonymous

Homoscedasticity is another word for homogeneity of variances. Let’s take a simple example: you have two groups of 10 people, each group playing the same video game on a different console. In group 1, the fastest player finishes a particular part of the game in 30 minutes, while the slowest takes 5 hours. In group 2, the fastest player finishes that same part in 32 minutes, while the slowest takes 52 minutes. The times in group 1 are far more spread out, so their variance is way bigger than in group 2. So the assumption of homoscedasticity is violated in this case.
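To put a number on “way bigger,” here is a small sketch that computes the sample variance of each group’s times. Only the fastest and slowest times in each group come from the example; the eight values in between are made up for illustration.

```python
from statistics import variance

# Hypothetical completion times in minutes: only the fastest and slowest
# values come from the example above; the eight in between are made up.
group1 = [30, 45, 60, 75, 90, 120, 150, 200, 250, 300]
group2 = [32, 34, 36, 38, 40, 42, 44, 46, 50, 52]

var1 = variance(group1)  # sample variance (divides by n - 1)
var2 = variance(group2)
ratio = max(var1, var2) / min(var1, var2)

print(f"group 1 variance: {var1:.1f}")
print(f"group 2 variance: {var2:.1f}")
print(f"ratio of larger to smaller: {ratio:.1f}")
```

With these invented middle values, group 1’s variance is well over a hundred times group 2’s, which is the kind of gap the example describes.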

Why is this relevant for statistical analysis? You basically have two flavors of statistical tests: parametric and non-parametric tests. Many common parametric tests, such as the classic two-sample t-test and one-way ANOVA, assume homoscedasticity in order to make sense. So when that assumption is violated (as in the example I just gave), one of the basic assumptions no longer holds, and the results, p-values in particular, can be misleading. When one or more assumptions of a parametric test are violated, using the non-parametric variant (or a parametric variant that doesn’t require equal variances, such as Welch’s t-test) is often advised.
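Before switching tests you first have to notice the violation. One common check is Levene’s test; below is a plain-Python sketch of the statistic for its Brown-Forsythe variant, which measures each observation’s distance from its group’s median and then runs a one-way ANOVA on those distances. The function name `brown_forsythe_w` and the middle values of the game times are hypothetical, and turning W into a p-value (which needs the F distribution) is left out.

```python
from statistics import mean, median

def brown_forsythe_w(*groups):
    """Levene-type W statistic computed on absolute deviations from each
    group's median (the Brown-Forsythe variant). W near 0 means the groups
    have similar spreads; a large W suggests the spreads differ.
    Assumes at least two groups and not-all-identical deviations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Replace every observation by its distance from its own group's median.
    deviations = []
    for g in groups:
        centre = median(g)
        deviations.append([abs(x - centre) for x in g])
    group_means = [mean(d) for d in deviations]
    grand_mean = mean([z for d in deviations for z in d])
    # One-way ANOVA F statistic applied to the deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, group_means) for z in d)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical completion times in minutes (only the fastest and slowest
# values in each group come from the video-game example).
group1 = [30, 45, 60, 75, 90, 120, 150, 200, 250, 300]
group2 = [32, 34, 36, 38, 40, 42, 44, 46, 50, 52]

print(f"W = {brown_forsythe_w(group1, group2):.2f}")
```

W would be compared against an F distribution with k - 1 and N - k degrees of freedom (here 1 and 18). In practice, SciPy’s `scipy.stats.levene(group1, group2, center='median')` should report the same statistic together with a p-value.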