Trying to understand, in statistics (related to psychology), what do statistical significance and confidence interval mean exactly?

As said above, I am trying to understand what statistical significance and confidence interval mean. I am a psychology major who’s reading a chapter for my Research Methods in Psychology class. The book’s definition is not helpful. Investopedia’s definition is better, but not enough.

So far, to my understanding, statistical significance refers to a claim that a set of data is not the result of pure chance, but is instead the result of a specific cause. What is a great example of statistical significance?

As for confidence interval (CI), it seems to be a probability that a parameter will fall between a set of values. In my case, this relates to the correlation coefficient (r). What is a great example and explanation of CI?

4 Answers

Anonymous 0 Comments

Statistical significance is related to the chance that a measured difference between a treatment and what it is being compared against (the control) could have arisen by chance alone. A p value below 0.05 means that, if the treatment truly had no effect, a difference at least as large as the one observed would turn up less than 5% of the time. The chosen threshold (commonly 0.05) lets the study set, in advance, how unlikely a result must be under “no effect” before the difference is called statistically significant. With a threshold of 0.05, a result with a p value of 0.10 (one that chance alone would produce 10% of the time) would not be considered statistically significant.

If a study produces a result in the form of a specific number, the result should be reported with a confidence interval, which denotes the range of values that are plausible for the true value of that result. Essentially, if the same experiment were repeated, it would likely produce slightly different results each time. Because of this it is impossible to know the exact true value of the result, so the number reported includes a confidence interval, which for a 95% interval is commonly the estimate plus or minus about two standard errors (the standard deviation of the collected values divided by the square root of the number of values).
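To make the interval idea concrete, here is a minimal sketch of a 95% confidence interval for a mean. The scores are made-up numbers, the function name `mean_ci_95` is just for illustration, and the 1.96 multiplier assumes a normal approximation (with a sample this small, a t-based multiplier would give a somewhat wider interval).

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_95(values):
    """Approximate 95% CI for the mean: estimate +/- 1.96 standard errors."""
    n = len(values)
    estimate = mean(values)
    standard_error = stdev(values) / sqrt(n)  # sample SD divided by sqrt(n)
    margin = 1.96 * standard_error            # normal approximation (assumption)
    return estimate - margin, estimate + margin

# Hypothetical depression-test scores for one treatment group
scores = [10, 12, 11, 13, 9, 12, 11, 10]
low, high = mean_ci_95(scores)
print(f"mean = {mean(scores):.2f}, 95% CI = {low:.2f} to {high:.2f}")
```

Note that the margin shrinks as the number of values grows, which is why larger studies report tighter intervals.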

I hope this makes sense! This is my understanding of the concepts as a first-year master’s student. Anyone in the comments, feel free to correct me if I have misrepresented the meaning of these terms!

Anonymous 0 Comments

As you mentioned, statistical significance is about whether the outcome of a study reflects a real effect or could plausibly have come from chance alone.

A simple example of this is flipping a coin. If a “fair” coin is flipped, it should have a 50% chance of heads and a 50% chance of tails. But let’s say you want to do an experiment to prove that. You could flip a coin once, get heads, and truthfully report that in your experiment 100% of the time you flipped a coin it came up heads. Or you could flip it three times, get heads, tails, heads, then report that in your experiment you get heads 66% of the time.

We know logically that this isn’t true because the sample size was too small, but a poorly designed experiment could come to that conclusion. With the example of coins, it’s easy to see where the error came from. In a more complex study, it wouldn’t be so easy to just look at it and say that flipping once or three times isn’t enough, so you use a confidence interval to help decide whether or not you should accept the results. Before starting your experiment you need to decide how confident you want to be (set your significance level / confidence level). Then you look at the parameter you’re trying to measure, apply some more complex statistics, and come up with the sample size (in this case, the number of times the coin needs to be flipped) needed to get a sufficiently precise answer.
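A quick way to see why three flips can’t tell you much is to compute an exact p value: the probability, assuming the coin is fair, of an outcome at least as unusual as the one you saw. The sketch below is one way to do that with only the standard library; the function name `binomial_two_sided_p` is made up for this example, and “at least as unusual” is taken to mean “no more likely than the observed outcome”, which is one common convention for a two-sided test.

```python
from math import comb

def binomial_two_sided_p(heads, flips, p_null=0.5):
    """Exact two-sided p value for a coin assumed to land heads with
    probability p_null: sums the probability of every outcome that is
    no more likely than the observed one."""
    probs = [
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(flips + 1)
    ]
    observed = probs[heads]
    # Small tolerance so outcomes tied with the observed one are counted
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)

print(binomial_two_sided_p(2, 3))       # 1.0: entirely consistent with a fair coin
print(binomial_two_sided_p(600, 1000))  # far below 0.05
```

Two heads out of three is exactly what a fair coin tends to produce, so there is no evidence against fairness. Six hundred heads out of a thousand would be very hard to explain with a fair coin.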

The confidence interval is basically a range of values for the true answer that are consistent with your study. The narrower the range, the more precisely your study pins the answer down. Again, you have to predetermine what level of confidence you want. The standard in most sciences is 95%. After you do your study, you do some math with the results, and get a range of possible values that are supported by your study.

So going back to the coin toss example. Let’s say your hypothesis is that a flipped coin will be heads 50% of the time and you want to disprove that it will be heads 100% of the time. You pre-establish a 95% confidence interval.

You flip the coin once, get heads. Your result would be heads 100%, but your confidence interval (using rough numbers) could be something like 3% – 100%; a percentage of heads can’t go above 100%, so the interval stops there. This means that based on your experiment you are 95% confident that the true value is between 3% and 100%. But since you’re trying to disprove 100% and your confidence interval includes 100%, you can’t be sure which one is actually correct. The null hypothesis (the hypothesis you’re trying to disprove, that a coin will be heads 100% of the time) falls within your confidence interval, which means it could be the true result, so your study is not statistically significant.

But if you flip the coin a thousand times, you might get a result of 52% with a confidence interval of 49% – 55%. This means your study found 52% heads, but you are 95% confident that the true value is between 49% and 55%. Since your confidence interval does not include the null hypothesis of 100%, you can conclude that your study is statistically significant.
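The thousand-flip interval can be reproduced with a short calculation. This is a sketch using the normal approximation for a proportion (estimate plus or minus 1.96 standard errors); the function name `proportion_ci_95` is just for illustration. The approximation is reasonable for large samples but breaks down for tiny ones such as a single flip, where an exact method would be needed.

```python
from math import sqrt

def proportion_ci_95(successes, trials):
    """Approximate 95% CI for a proportion (normal approximation)."""
    p_hat = successes / trials
    margin = 1.96 * sqrt(p_hat * (1 - p_hat) / trials)
    # A proportion cannot fall outside 0..1, so clip the interval
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

low, high = proportion_ci_95(520, 1000)  # 520 heads in 1,000 flips
print(f"{low:.3f} to {high:.3f}")        # 0.489 to 0.551
print(low <= 1.0 <= high)                # False: 100% heads is outside the interval
```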

So in a way that it might be applied to psychology, let’s say you want to prove that therapy is effective at treating depression. So you establish a cut off that therapy will be 20% effective at treating depression and your null hypothesis (the hypothesis you’re trying to “disprove”) is that therapy provides 0% improvement. Again, you select a 95% confidence interval.

You identify a bunch of people with depression and randomly separate them into two groups, one that receives therapy and one that doesn’t. You give them all a depression test with a numerical answer, then give half of them therapy, then sometime later have them all repeat the depression test and measure the change in their scores between the first test and the second test.

Let’s say your results show that people who received therapy have 18% less depression.

If your confidence interval is 0% to 22%, this would be a negative study since the confidence interval includes 0%. You’re 95% sure that there was a change in depression between 0% and 22%, but since the null hypothesis of 0% is included in the confidence interval, it’s possible that therapy was no more effective than no therapy so you can’t say that your study was positive.

If your confidence interval is 10% to 19%, this would also be a negative study. It doesn’t include zero, but it also doesn’t include your hypothesized 20% improvement, so it doesn’t meet your pre-specified amount of improvement. In this case, you might conclude that there is an improvement that is statistically distinguishable from zero, but that it falls short of your predetermined criteria.

On the other hand, if your confidence interval is 15% to 22%, that would be a positive study. You are 95% confident that the true value is between 15% and 22%. Your predetermined criteria of 20% is included in the confidence interval, and your null hypothesis of 0% is not within the confidence interval, so this is a positive study.
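The three scenarios above follow one decision rule, which can be written out as a small function. This is only a sketch of the rule as described in this answer, not a standard statistical procedure: the name `classify_study` and the wording of the labels are made up here, and treating an interval that lies entirely above the 20% target as positive is an added assumption.

```python
def classify_study(ci_low, ci_high, null_value=0.0, target=20.0):
    """Apply the answer's rule to a confidence interval given in percent."""
    if ci_low <= null_value <= ci_high:
        # "No effect" is still plausible
        return "negative: interval includes the null value"
    if ci_high < target:
        # Effect is distinguishable from zero but below the pre-set target
        return "negative: interval falls short of the target"
    return "positive"

for low, high in [(0, 22), (10, 19), (15, 22)]:
    print(f"{low}% to {high}%: {classify_study(low, high)}")
```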

Anonymous 0 Comments

When you are doing an experiment, you get the control results and the treatment results (where you changed something). You look at the results and see if what you’re testing is an “improvement” over the control, but then you have to ask yourself, what if it was just luck? What if the numbers just happened to land in a lucky way that makes my result look good? Was it a fluke?

Well, what you can do is see what randomness looks like and compare your result to that. We can take all the results we got, throw them in a hat and shuffle them around, and then randomly re-distribute them out and see what the result looks like. That’s one example of what randomness can produce. Then we can do it again and look at that result. And again. 1,000 times again. Now we have 1,000 results of what random looks like.

Now we can look at our original result again and see where it falls in the distribution of randomness. **How extreme is our result? How much of an outlier is it? Is it so extreme that it would be really unlikely for randomness to produce it?** Is it at least more extreme than 95% of the random results? If so, then we can call it **statistically significant,** because it has met the 0.05 (5%) burden that we set for ourselves.
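The “throw everything in a hat and reshuffle” procedure is usually called a permutation test, and it is short enough to sketch in code. The data below are made-up scores, and the function name `permutation_p_value` is just for illustration. One assumption to flag: this version counts shuffles whose difference is at least as large in either direction (a two-sided test), whereas you could also count only differences in the direction you expected.

```python
import random
from statistics import mean

def permutation_p_value(control, treatment, n_shuffles=1000, seed=0):
    """Fraction of random reshuffles whose group difference is at least
    as extreme as the difference actually observed."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)  # everything into the hat
    n_control = len(control)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomly re-deal the results
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_shuffles

# Hypothetical scores for each group
control = [12, 14, 11, 13, 15, 12, 14, 13]
treatment = [17, 19, 16, 18, 20, 17, 18, 19]
p = permutation_p_value(control, treatment)
print(p, "statistically significant" if p < 0.05 else "not significant")
```

If the returned fraction is below 0.05, the original result is more extreme than 95% of what shuffling alone produced.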

Anonymous 0 Comments

Hey OP, just a heads up, when it comes to the task of interpreting p values and confidence intervals, one should approach with extreme caution as they are among the most misunderstood and abused tools in all of science. It has been my experience that most people inadvertently include at least one partially incorrect interpretation in their attempt to explain them to beginners.

This article does an excellent job summarizing the various pitfalls that must be avoided when interpreting them.
[https://link.springer.com/article/10.1007/s10654-016-0149-3](https://link.springer.com/article/10.1007/s10654-016-0149-3)