T, Chi-squared and F distributions

Hello. I was studying medical statistics earlier and I got to probability distributions. The Binomial, Poisson and Normal distributions are fairly intuitive. Then I got to the t distribution, the chi-squared distribution and the F distribution, and frankly I didn’t understand much.

Can somebody please explain them to me in simple terms? I would appreciate it!

Anonymous 0 Comments

The T distribution, also known as “Student’s T Distribution”, is essentially a wider, heavier-tailed version of the Normal Distribution that you’d use for very small data sets. Like, if you *think* a distribution is probably normal but you only have 5 points of data, you can’t trust your estimate of the spread very much, and the T distribution accounts for that extra uncertainty. To use the T distribution you need to say which version you want, called the “Degrees of Freedom”, which for a single sample is the number of data points you have minus 1. So if you have 5 data points, you use the version of the T distribution marked “4 degrees of freedom”. As you add more and more degrees of freedom the T distribution gets narrower, and by somewhere around 20 to 30 Degrees of Freedom (a common rule of thumb) it is practically indistinguishable from the Normal Distribution.
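If it helps to see the “heavier tails” idea in action, here is a minimal simulation sketch. It draws many small samples from a truly normal population and counts how often the t statistic lands beyond ±1.96, the cutoff that would catch about 5% of cases under the Normal Distribution. The function names, the seed and the number of trials are all my own choices for illustration.

```python
import math
import random

random.seed(0)  # arbitrary seed so the run is repeatable


def t_statistic(sample, true_mean):
    """(sample mean - true mean) divided by the estimated standard error."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance uses n - 1, which is where "degrees of freedom" comes from
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - true_mean) / math.sqrt(variance / n)


def tail_fraction(n, trials=20000, cutoff=1.96):
    """Fraction of simulated samples of size n whose |t| exceeds the cutoff."""
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / trials


small = tail_fraction(5)    # 4 degrees of freedom
large = tail_fraction(50)   # 49 degrees of freedom
print(f"n=5:  share of |t| > 1.96 = {small:.3f}")
print(f"n=50: share of |t| > 1.96 = {large:.3f}")
```

With only 5 data points, noticeably more than 5% of the t statistics should fall past the cutoff, while with 50 data points the share should sit close to 5%. That gap is exactly what the T distribution corrects for.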

Chi-Squared is used to test whether two categorical variables are associated, and is usually set up as an either/or type question. For example (I’m making all the numbers up) the question is: is cat ownership associated with a person’s race? So I could do some measurements and observe that 40% of white people own cats. I then go and observe that 44% of Black people have cats and 37% of Indigenous American people have cats. Do these observations support my claim that cat ownership *is* related to race? In other words, *how far apart would the %s have to be before we can’t reasonably attribute the differences to random noise in the data and they become evidence that race and cat ownership are related?* The test compares the counts you actually observed with the counts you’d expect if the two things had nothing to do with each other, so sample size matters: the same percentages from 50 people and from 5,000 people can give very different answers. IIRC in a test like the above Chi-squared doesn’t answer the question further, meaning it won’t tell you *which race* is obsessed with cats or hates them, just whether race and cat ownership do, or do not, appear to be associated. It also can’t show that race *causes* cat ownership, only that the two go together.
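Here is a rough sketch of the arithmetic behind that test, using the made-up percentages above. The sample size of 200 people per group is an extra assumption of mine (the percentages alone aren’t enough), and the function name is hypothetical. For a table with 3 groups and 2 outcomes there are 2 degrees of freedom, and in that particular case the p-value has the simple closed form exp(-x/2), which is what the code uses.

```python
import math

# Hypothetical counts (owners, non-owners), assuming 200 people surveyed per group
observed = {
    "White": (80, 120),                # 40% own cats
    "Black": (88, 112),                # 44% own cats
    "Indigenous American": (74, 126),  # 37% own cats
}


def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Count expected in this cell if group and ownership were unrelated
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


chi2, dof = chi_squared_statistic(observed)
# Valid only because dof == 2 here: the chi-squared tail probability is exp(-x/2)
p_value = math.exp(-chi2 / 2)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p_value:.2f}")
```

With these invented counts the p-value comes out well above the usual 0.05 threshold, so differences of this size could easily be random noise. For tables of other shapes you would need a general chi-squared tail function from a statistics library rather than the shortcut used here.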

The F Distribution: let’s say I want to compare types of cucumbers. I go out and find 5 different varieties of cucumbers and I buy 10 of each one. I can do some summary stats on the cucumbers. They have sample mean lengths of u1, u2, u3, u4 and u5 for example. But even if u1 = 4 inches and u4 = 5 inches, and those are clearly different values, these are just small sample sets. Can I truly declare as a fact that cucumber variety 4 has a longer mean length than cucumber variety 1? Essentially, if I measure a sample mean of 5 inches, let’s say the real mean is somewhere between 4.25 and 5.75. Similarly, back to variety 1 with a sample mean of 4 inches, the real mean is between 3.25 and 4.75. Comparing the two you see there is actually an overlap: it’s possible the real mean of variety 1 is 4.75 and the real mean of variety 4 is 4.25. Because this overlap exists we *cannot* confidently conclude that variety 4 is truly longer on average than variety 1. The formal version of this comparison is called an “ANOVA” (analysis of variance) test. Instead of checking intervals pair by pair, it compares how much the sample means differ from each other with how much individual cucumbers vary within each variety. If all the varieties really have the same mean, the ratio of those two variances follows the F distribution, so an unusually large ratio is evidence that at least one variety differs from the others.
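To make the “ratio of variances” concrete, here is a small sketch that computes the F statistic for one-way ANOVA by hand. The cucumber lengths are randomly generated, and the variety means, the spread of 0.8 inches and the function name are all invented for illustration. Turning the F statistic into a p-value needs the F distribution’s tail probability, which this stdlib-only sketch leaves out; a statistics library would handle that step.

```python
import random

random.seed(1)  # arbitrary seed so the run is repeatable


def one_way_anova_f(groups):
    """Return (F statistic, between-groups dof, within-groups dof)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # How far each group's mean sits from the overall mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # How far individual values sit from their own group's mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical true mean lengths (inches) for 5 varieties, 10 cucumbers each
true_means = [4.0, 4.3, 4.5, 5.0, 4.2]
cucumbers = [[random.gauss(mu, 0.8) for _ in range(10)] for mu in true_means]

f_stat, df_between, df_within = one_way_anova_f(cucumbers)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

An F value near 1 means the varieties differ from each other about as much as cucumbers differ within a variety, which is what you’d expect from noise alone. The further F climbs above 1, the harder it is to explain the differences in means as chance.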