Sample size (number of samples) vs the size of your samples (sample effort?)


Let’s say you have limited time and resources to conduct an experiment. Which would be the most effective way to determine whether or not your results indicate a true effect? Taking more smaller samples or taking fewer but larger samples?

Everything points to larger sample sizes being better for reducing variance, but nothing I can find compares the number of samples against the effort put into each sample. Obviously assuming independent sampling, etc.

For example, to determine bug community composition in soil… Should one take many small soil samples, or a couple of large volumes of soil?

It depends on the type of study being done. One of the professors in my field of research (functional MRI scanning) is currently quantifying this, and the indication is that scan time per subject matters most up to ~30 minutes; past that point, adding more subjects becomes more important.

I don’t know much about other areas, but it seems like something that is highly variable based on the specifics of the research.

Seconding /u/Emyrssentry that this is going to be hugely variable depending on exactly what you’re researching.

What you’re describing here is two separate sources of variation, namely the variation within individuals and the variation between individuals (though with soil the definition of an individual becomes a bit more complicated: plots of land with clearly separate conditions?). You want each measurement to be representative of the whole individual, e.g. if I were testing for some biomarker in human blood I might need a minimum volume not just for the actual testing, but also to ensure I get a representative sample of that individual’s entire several-liter blood volume.

That’s all just optimizing your methodology, and statistics should generally come after that. But again, very much specific to your field.

I believe this depends on the variability within the sample unit vs between sample units.

Let’s say you want to know the average grade students get. You could sample school averages or sample individual students.

Let’s imagine that every student in a school gets the same grade. In that case you can use schools as your sample unit. You won’t get any more information from looking at the individual students within a school, because they’re all the same.

The more students within a school differ from one another, the more sampling error you introduce by looking at only a few of them.

So to know what sampling approach is best you need to know the variance within schools (or estimate it, since to know it you’d have to know all the grades already…). You can then use maths to work out what will get you the most accurate results. Don’t ask me how to do those maths.
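For what it’s worth, the textbook version of those maths is fairly short if you assume a simple two-level setup: each school has its own mean (variance between schools `var_between`), students scatter around that mean (variance within schools `var_within`), and every school contributes the same number of students. Under those assumptions the variance of the overall mean is `var_between / n + var_within / (n * m)` for `n` schools and `m` students per school. The function names, the cost parameters and the numbers below are made up for illustration, so treat this as a sketch rather than a recipe.

```python
import math


def variance_of_mean(n_units, m_per_unit, var_between, var_within):
    """Variance of the overall mean for n_units groups with m_per_unit
    measurements each, assuming a balanced two-level design."""
    return var_between / n_units + var_within / (n_units * m_per_unit)


def optimal_m(cost_per_unit, cost_per_measurement, var_between, var_within):
    """Measurements per unit that minimise the variance for a fixed budget,
    assuming total cost = n * (cost_per_unit + m * cost_per_measurement)."""
    if var_between == 0:
        # No differences between units: one unit measured many times is enough.
        return math.inf
    return math.sqrt(
        (cost_per_unit / cost_per_measurement) * (var_within / var_between)
    )


# Hypothetical numbers: both designs use 100 students in total.
many_small = variance_of_mean(50, 2, var_between=4.0, var_within=1.0)
few_large = variance_of_mean(5, 20, var_between=4.0, var_within=1.0)

# Hypothetical costs: reaching a new school costs 16x one extra student.
best_m = optimal_m(16.0, 1.0, var_between=4.0, var_within=1.0)
```

With these made-up inputs, many schools with few students each gives a much smaller variance than few schools with many students each, even though the total number of students is the same. If `var_within` is zero (every student in a school identical), the second term vanishes and extra students per school change nothing, which is the case described above.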

That example is relatively easy because there’s a basic sample unit that you can’t go smaller than: the individual. With something like a soil sample it’s a bit more complicated and I don’t know how you’d approach it.

> Which would be the most effective way to determine whether or not your results indicate a true effect?

Conducting an appropriate statistical analysis. It may be possible to conclude that you have almost certainly found a real effect without taking any more data. Or it may be possible to conclude that an infeasible amount of data would be required to settle the issue.

> Taking more smaller samples or taking fewer but larger samples?

> Obviously assuming independent sampling, etc.

I’m not sure I understand what point you’re making. If everything is random and independent, then taking “fewer but larger samples” won’t make any difference if the overall sample size stays the same. This distinction is only important if you expect the data to have some kind of hierarchical structure. For example, you might expect that there will be a difference between taking 50 blood samples from one person and taking 1 blood sample from 50 people. But this depends strongly on the setting and on what you’re trying to achieve.
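To make the blood-sample comparison concrete, here is a small simulation sketch. It assumes a simple model in which each person has their own true level (standard deviation `sd_between` across people) and each blood sample adds independent noise (standard deviation `sd_within`). The function names and all numbers are hypothetical; the point is only to compare how much the estimated population mean bounces around under each design.

```python
import random
import statistics


def simulate_estimate(n_people, n_per_person, sd_between, sd_within, rng):
    """Draw one data set and return its overall mean."""
    values = []
    for _ in range(n_people):
        person_mean = rng.gauss(0.0, sd_between)  # this person's true level
        for _ in range(n_per_person):
            values.append(rng.gauss(person_mean, sd_within))
    return statistics.fmean(values)


def spread_of_estimates(n_people, n_per_person, sd_between, sd_within,
                        reps=2000, seed=1):
    """Standard deviation of the estimated mean over many repeated studies."""
    rng = random.Random(seed)
    estimates = [
        simulate_estimate(n_people, n_per_person, sd_between, sd_within, rng)
        for _ in range(reps)
    ]
    return statistics.stdev(estimates)


# Hierarchical case: people genuinely differ from each other.
one_person = spread_of_estimates(1, 50, sd_between=1.0, sd_within=1.0)
fifty_people = spread_of_estimates(50, 1, sd_between=1.0, sd_within=1.0)

# Flat case: no person-to-person differences, so only total sample size matters.
flat_one = spread_of_estimates(1, 50, sd_between=0.0, sd_within=1.0)
flat_fifty = spread_of_estimates(50, 1, sd_between=0.0, sd_within=1.0)
```

Under this model the theory says the spread should be roughly sqrt(1 + 1/50) ≈ 1.0 for 50 samples from one person versus sqrt(2/50) = 0.2 for one sample from each of 50 people. When `sd_between` is zero the two designs should give about the same spread, which is the "everything is independent" case where the split makes no difference.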