In randomization, what are population and seed?


I keep trying to figure it out, but every resource I find uses the same wording.

In: Mathematics

Your population is simply the group that you’d be selecting from. If you want to randomly pick individuals that represent a country, you could say your population is everyone living in that country.

The seed is basically where you want to start your randomization. Ordinary computer programs have no way of producing truly random numbers on their own. Instead they use a pseudorandom number generator: a fixed formula that churns out a long sequence of numbers that look random but are completely determined once you know where it started. You can picture that sequence as a very long list of numbers the computer reads in order to create “random”. The seed simply tells the computer where to start. The value is that if you want a “random” distribution but also want your calculations to stay consistent while running code, you can set the seed (e.g. seed=14). You still get a random-looking list, but it’s the same every time you run the code, which is handy while you are developing or during a presentation.
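Here’s a minimal sketch in Python of the seed=14 idea, using the standard library’s random.Random class (which happens to use the Mersenne Twister algorithm). The helper name draw and the range 1 to 100 are just made up for illustration.

```python
import random

def draw(seed, count=5):
    # Build a fresh generator each time, so the seed alone decides the output.
    rng = random.Random(seed)
    return [rng.randint(1, 100) for _ in range(count)]

first_run = draw(14)
second_run = draw(14)
print(first_run)
print(first_run == second_run)  # True: same seed, same "random" numbers

# With no explicit seed, Python seeds from the operating system (or the clock),
# so these numbers normally change from run to run.
print(random.Random().randint(1, 100))
```

Run it twice and the first list stays the same both times, while the last number will usually differ.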

In statistics, your population is the pool of items that you are going to pull a random subset from. Knowing any biases that exist in your overall population is important. For instance, if you run a survey about household income in New York City, you’ll get very different data than if you ran the same survey in rural Montana.

A seed has to do with random number generation in computers. Since computers are inherently predictable, “random” is redefined to mean unpredictable, with every possible answer equally likely. To do this, the computer runs some math that repeatedly scrambles a really large number (its internal state) and then gives you part of that number. Since the algorithm is fixed, the only way to get a different series of numbers is to choose a different starting point. That starting point is your seed. Common seeds are based either on the current time or on some measurement of something physical, like mouse movement, keystrokes, or hard drive response time.
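To make “scramble a big number, hand back part of it” concrete, here’s a toy sketch of one classic technique, a linear congruential generator (LCG). The class ToyLCG is hypothetical and only for illustration; the multiplier and increment are the well-known 32-bit constants from Numerical Recipes. Python’s own random module uses a different, stronger algorithm, so don’t use this for anything real.

```python
import time

class ToyLCG:
    """Hypothetical toy pseudorandom generator, for illustration only."""

    A = 1664525        # multiplier (Numerical Recipes constants)
    C = 1013904223     # increment
    M = 2**32          # the state wraps around at 2**32

    def __init__(self, seed):
        # The seed is simply the starting value of the internal state.
        self.state = seed % self.M

    def next_value(self):
        # Scramble the big number: multiply, add, wrap around.
        self.state = (self.A * self.state + self.C) % self.M
        # Hand back only part of it: the top 16 bits.
        return self.state >> 16

# Same seed -> same sequence, because the formula never changes.
a = ToyLCG(42)
b = ToyLCG(42)
print([a.next_value() for _ in range(3)] == [b.next_value() for _ in range(3)])  # True

# Seeding from the current time gives a different starting point on (almost) every run.
clock_seeded = ToyLCG(time.time_ns())
print(clock_seeded.next_value())
```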

A population is the group of people you’re taking a sample from. For example, if I wanted a simple random sample of 25 women in Missouri for an experiment, the population would be ALL the women in Missouri. Not the number of women, but literally the concept of all the women in Missouri. If my experiment used a sample of 170 Americans, then my population would be all Americans. It’s basically about who you can generalize the statistics to when you’re drawing conclusions.
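Here’s a minimal sketch of drawing a simple random sample of 25 with Python’s random.sample. The population is a hypothetical stand-in (a list of made-up ID labels), since a real population like “all the women in Missouri” isn’t something you’d have sitting in memory; the size 10,000 and the seed are arbitrary.

```python
import random

# Hypothetical stand-in for the population: one label per member.
population = [f"person_{i}" for i in range(10_000)]

rng = random.Random(2024)            # arbitrary seed so the sample is reproducible
sample = rng.sample(population, 25)  # 25 members, drawn without replacement

print(len(sample))  # 25
print(sample[:3])
# Conclusions drawn from `sample` generalize to `population`, not beyond it.
```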

Seed has less to do with sampling and more to do with random number generation. One of the ways you can draw a random sample from the population is by using a computer to generate a list of random numbers. But the way computers work makes them literally incapable of coming up with a truly random number on their own. Instead, they use a formula that produces a long, fixed sequence of random-looking numbers, which you can think of as a long list. The seed tells the computer where in that list to start looking for the first number. Think of it like this: when you give a computer the seed 301, it starts on the 301st number and lists from there.
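The “generate random numbers, then pick those people” approach can be sketched like this, assuming the population members have been numbered 1 to N. The function name pick_by_random_numbers and the sizes are hypothetical; the seed 301 is borrowed from the analogy above. (Strictly speaking, in a real generator the seed sets the starting internal state rather than an index into a stored list, but the effect is the same: same seed, same numbers.)

```python
import random

def pick_by_random_numbers(population_size, sample_size, seed):
    """Number members 1..population_size, then keep drawing random numbers
    until sample_size distinct members have been picked."""
    if sample_size > population_size:
        raise ValueError("sample can't be larger than the population")
    rng = random.Random(seed)
    chosen = []
    seen = set()
    while len(chosen) < sample_size:
        number = rng.randint(1, population_size)  # inclusive on both ends
        if number not in seen:  # skip repeats, like ignoring a name already drawn
            seen.add(number)
            chosen.append(number)
    return chosen

print(pick_by_random_numbers(population_size=1000, sample_size=10, seed=301))
# Same seed -> exactly the same members are chosen on every run.
print(pick_by_random_numbers(1000, 10, 301) == pick_by_random_numbers(1000, 10, 301))  # True
```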