ELI5: the Kolmogorov-Smirnov test


I just need to know the purpose of it and what exactly it does, I don’t need to know how to do it yet.


The KS test helps you compare two distributions (think of curves like a bell curve, which is a normal distribution, versus a flat line, which is a uniform distribution) to see whether they could plausibly be the same. You can use it to compare two sets of data with each other, or one set of data against a theoretical distribution. Like most frequentist statistics, “the same” here basically means “any differences are small enough to be explained by random chance and fluctuations”.

An example might be comparing the age distributions (how many people are X years old) of two countries. Sure, you could do a simpler test that only compares the typical age, like a t-test or a Wilcoxon rank-sum test. But imagine a scenario where one group is just 100 people all age 20, and another is 50 people age 10 and 50 people age 30. They’ll have the exact same average, but they are clearly different distributions. A KS test helps you identify those differences in a statistical way.
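To make that example concrete, here is a minimal Python sketch using only the standard library. The variable names are just for illustration; the numbers are the two made-up groups from the paragraph above.

```python
from statistics import mean, pstdev

# The two hypothetical groups from the example above.
group_a = [20] * 100               # 100 people, all age 20
group_b = [10] * 50 + [30] * 50    # 50 people age 10, 50 people age 30

# Both averages come out the same, so a comparison of averages sees no difference.
mean_a, mean_b = mean(group_a), mean(group_b)

# The spread (population standard deviation) is very different, though.
spread_a, spread_b = pstdev(group_a), pstdev(group_b)

print("means:  ", mean_a, mean_b)
print("spreads:", spread_a, spread_b)
```

The averages match while the spreads do not, which is the kind of difference in shape that a test looking only at averages would miss.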

More specifically, the KS test is non-parametric, so it doesn’t need to make many assumptions about the underlying data (in the simpler example, a t-test is a parametric test, a Wilcoxon test is non-parametric). The “test statistic” (the thing you actually calculate to help determine your answer) is the maximum difference between the two cumulative distributions. A cumulative distribution tells you, for each value, what fraction of the data is at or below that value, so it is a curve that climbs from 0 to 1. If you plotted the two cumulative curves on top of each other, what is the longest vertical line you can draw between them? The bigger this maximum difference, the stronger the evidence that the two distributions are not the same.
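If it helps to see the “longest vertical line” idea as code, here is a rough sketch in plain Python. The helper names `ecdf` and `ks_statistic` are made up for this illustration, and it only computes the test statistic, not a p-value. It reuses the two age groups from the earlier example.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function giving the fraction of `sample` that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two cumulative curves."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # The curves only jump at observed values, so checking those is enough.
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)

group_a = [20] * 100               # 100 people, all age 20
group_b = [10] * 50 + [30] * 50    # 50 people age 10, 50 people age 30

d = ks_statistic(group_a, group_b)
print(d)  # 0.5
```

A gap of 0.5 means that at some age, the fraction of people at or below that age differs by half between the two groups, even though the averages are identical. In practice you would let a statistics package turn that gap into a p-value.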

If you need either more or less detail/jargon, it might help to know the context in which you’re encountering this and what your background with other statistics is.