Test Reliability and Validity


I’m struggling in my stats class with all of the formulae and I have a billion questions. What is Cronbach’s Alpha? What is a correlation of determination? What is Pearson’s Product-Moment Correlation? How do you test validity? How do you know what test items are good and which are bad?

What even are reliability and validity?

So the best thing to do would be to ask the professor if possible. Still, I’ll take a crack at it.

Let’s say you have a survey of 20 questions. The goal of the survey is to see how much a community likes dogs. You develop 20 questions relating to that, you give the survey to a sample of the community, and you start analyzing the results.

So first things first, reliability and validity. Reliability is about consistency. The kind I’m describing here is internal consistency, meaning how well the questions on the survey agree with each other. Let’s say one question on the survey is “Have you ever had a dog?” and another is “Do you currently have a dog?”. In a reliable scenario, everyone who answers yes to the second question will also answer yes to the first. See, the goal of a survey is to have all the questions consistently get at the same point. In other words, in a well-constructed survey, there should be patterns in the answers. That’s reliability: how consistently the survey measures whatever it is measuring.

The usual way to test for this kind of reliability is a statistic known as Cronbach’s Alpha. Now I’ll be honest and say I don’t know the formula offhand, as when I did statistics work I used IBM’s SPSS program, which more or less calculates it for you, but you still need to know how to interpret the results.
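For anyone curious, the textbook formula is short: alpha = k/(k-1) * (1 - (sum of each question's variance) / (variance of the total scores)), where k is the number of questions. Here is a rough Python sketch of that. The function name and the survey answers are made up for illustration, and this is not what SPSS does internally, just the standard formula written out.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of questions (items)
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each question across respondents
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Made-up answers: 5 respondents, 4 questions, each scored 1 to 5
survey = [
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(survey), 3))
```

When every question moves in lockstep across respondents, the result is exactly 1. When the questions are unrelated to each other, it drops toward 0.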

The basic gist of interpreting the results is looking at the coefficient it outputs. If it’s greater than .70, that’s typically considered acceptable, though different people have different thresholds and some demand a minimum of .90. What’s interesting about the software I used is that it also calculated which questions in a survey were bringing the score down, and thus which questions you might delete for being bad questions.
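That “which questions are dragging the score down” check can be sketched by recomputing alpha with each question left out in turn. This is only my guess at the idea behind that kind of output, not SPSS’s actual implementation, and the function names and data below are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Standard Cronbach's alpha; rows are respondents, columns are questions."""
    k = len(rows[0])
    item_var_sum = sum(variance([r[i] for r in rows]) for i in range(k))
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)

def alpha_if_item_deleted(rows):
    """Return {question_index: alpha recomputed without that question}."""
    k = len(rows[0])  # needs at least 3 questions so 2 remain after dropping one
    result = {}
    for drop in range(k):
        reduced = [[v for i, v in enumerate(r) if i != drop] for r in rows]
        result[drop] = cronbach_alpha(reduced)
    return result

# Made-up data: questions 0 to 2 agree with each other, question 3 does not
survey = [
    [1, 1, 1, 3],
    [2, 2, 2, 1],
    [3, 3, 3, 4],
    [4, 4, 4, 2],
]
print(cronbach_alpha(survey))
print(alpha_if_item_deleted(survey))
```

A question whose removal raises alpha noticeably above the full-survey value is a candidate for rewording or deletion.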

Anyways, I can’t help you with the coefficient of determination (which I’m assuming is what you mean by “correlation of determination”).

Pearson’s Correlation is a measure of, well, how correlated two sets of data are. Correlated data is data that is linked: when one value goes up, the other tends to go up (or down) along with it. The result is a number between -1 and 1, where values near 0 mean little or no linear link. Basically, going back to my proposed survey questions, you would expect anybody who answered the second question with a yes to answer the first question with a yes. Thus, there’s a correlation between people who have ever had dogs and people who currently have dogs. Correlation shows links in data, sometimes unexpected ones. Pearson’s coefficient is a way of finding those links.
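Here is a minimal sketch of how Pearson’s r can be computed by hand, assuming yes/no answers are coded as 1 and 0. The function name and the answers are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How much the two variables vary together
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_deviation / (spread_x * spread_y)

# Made-up answers from 6 people (1 = yes, 0 = no)
ever_had_dog = [1, 1, 1, 0, 0, 1]
has_dog_now = [1, 0, 1, 0, 0, 0]
print(round(pearson_r(ever_had_dog, has_dog_now), 3))
```

A value near 1 means the two answers rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means no clear linear link. Note that the function fails with a division by zero if everyone gives the same answer to one of the questions, since there is then no variation to correlate.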

Validity is (in a basic way) whether your survey actually measures your topic. Let’s say on my dog survey I had no questions on dogs at all, and instead I asked a bunch of questions about cats. Well, the results could still be reliable (as they are consistent), but they wouldn’t really be valid, as my goal is dogs and the questions are about cats.

Now as far as I know, measuring validity is harder because it requires things like interviews, pilot surveys, and serious analysis by experts to fully figure out. There isn’t really a magical formula for it.

So how do you know which items are good and which are bad? Well, I kinda already answered it, but to recap: if a question brings down Cronbach’s Alpha or doesn’t have anything to do with your topic, it’s a bad question for the survey.

I should state that I’ve actually done this stuff before for classes, administering a survey and analyzing the data, so I’m pulling a lot from memory and refreshers on Google. I’d welcome anyone with a better understanding of this stuff to fix any of my mistakes.