Is there a way to compare averaged ratings, each with a different count of participants?

— **Closed**. Many satisfying answers. Thanks! —

Let’s say people on the street are asked to rate three different movies on a scale of 1-10, but each movie only if they have seen it. We get the following, *already averaged* data:

* Movie A: 1000 people gave A an 8.1 on avg.
* Movie B: 10’000 people gave B a 6.9 on avg.
* Movie C: 20’000 people gave C a 7.4 on avg.

Going by the averaged ratings alone, the movie to watch would be movie A. But since far fewer people watched A, its rating does not seem as accurate (i.e. as trustworthy) as the ratings of the other two movies: C seems to be best suited for the average viewer, while A seems to be the choice if you like that type of movie (or it is a hidden gem).

So back to my question: Is there a way to calculate one value (preferably in the original rating range) for each movie that allows accurate (approximate) comparison of these movies – i.e. that takes the count of participants into account?

*What I’ve tried:*

* Dividing and multiplying these values:

|Movie|Participants (*ps*)|Avg. rating (*rg*)|*rg* / *ps* × 1000|*ps* / *rg*|*ps* × *rg* / 1000|
|:-|:-|:-|:-|:-|:-|
|A|1000|8.1|8.10|123.5|8.1|
|B|10000|6.9|0.69|1449.3|69.0|
|C|20000|7.4|0.37|2702.7|148.0|
||||useless, I think|better, but probably not fair|maybe even better, but still unfair|
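
For reference, the three derived columns of the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above: `ps` and `rg` follow the table's abbreviations, while the `movies` dictionary and the `derived_columns` helper are names made up for illustration.

```python
# Participants (ps) and average rating (rg) per movie, taken from the table.
movies = {
    "A": (1000, 8.1),
    "B": (10000, 6.9),
    "C": (20000, 7.4),
}

def derived_columns(ps, rg):
    """Return the three trial scores from the table for one movie."""
    return (
        rg / ps * 1000,  # rating divided by participants, scaled by 1000
        ps / rg,         # participants divided by rating
        ps * rg / 1000,  # sum of all ratings given, in thousands
    )

for name, (ps, rg) in movies.items():
    a, b, c = derived_columns(ps, rg)
    print(f"{name}: {a:.2f}  {b:.1f}  {c:.1f}")
```

Because the participant counts differ by a factor of up to 20 while the ratings differ by barely more than one point, all three scores end up tracking the participant count far more than the rating, which may be why they feel unfair.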

* Getting the total of all participants and averaging the ratings accordingly

In total, 31000 people participated and gave the three movies a 7.3 on average [sum of (ps × rg) for A, B and C, divided by 31000]. I feel like this could be helpful, but then again I have not gained much with this new average. I simply cannot grasp how I could find such a number…
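
The pooled average described here can be checked with a short Python sketch. The numbers come from the question; the names `movies` and `pooled_average` are just illustrative.

```python
# (participants, average rating) per movie, from the question.
movies = {"A": (1000, 8.1), "B": (10000, 6.9), "C": (20000, 7.4)}

def pooled_average(data):
    """Participant-weighted mean of the per-movie averages."""
    total_people = sum(ps for ps, _ in data.values())
    total_points = sum(ps * rg for ps, rg in data.values())
    return total_points / total_people

print(round(pooled_average(movies), 2))  # 7.26
```

The result, about 7.26, rounds to the 7.3 quoted above. It is a single number for all three movies combined, so on its own it does not rank them.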

Any help/explanation appreciated!

4 Answers

Anonymous

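In statistics, a rule of thumb is that once a random sample reaches about 1000 individuals, it is considered to be representative. Asking more people still improves the accuracy, but only slowly: the uncertainty of an average shrinks roughly with the square root of the sample size, so beyond that point the gain is small.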
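
To put rough numbers on how sample size affects accuracy, one can look at the standard error of a mean, which shrinks with the square root of the number of people asked. The sketch below assumes a random sample and a spread (standard deviation) of 2 rating points between individual raters. That value is purely a guess for illustration and does not come from the data in the question, and `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(n, sd=2.0, z=1.96):
    """Approximate 95% margin of error for an average of n ratings.

    sd is an ASSUMED standard deviation of individual ratings (hypothetical);
    z = 1.96 is the usual normal-approximation factor for 95% confidence.
    """
    return z * sd / math.sqrt(n)

for n in (100, 1000, 10000, 20000):
    print(f"n = {n:>5}: average is good to about +/- {margin_of_error(n):.2f}")
```

Under that assumption, going from 1000 to 20000 raters tightens the estimate from roughly ±0.12 to roughly ±0.03 rating points, so the gain beyond 1000 is real but small compared with the differences between the movies' averages.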

This assumes that the sample (the people participating in a study, such as a questionnaire about movies) is randomly selected. If the group is not randomly selected, for example if you pick 1000 people from a horror movie club or from a retirement home, this will influence the results and make them unrepresentative of the general population. The horror movie fans will likely favor a horror movie and the older people a golden-age classic. The result is biased.

The problem you bring up is real, though. Imagine you could only find 3 people to ask about movie A, 200 people for movie B and 1000 for movie C. You can still calculate an average for each movie, but for movies A and B the average is less representative, because individual idiosyncrasies (all the ways in which people are NOT like other people) have a much stronger influence on the result. For movie A, one of the three people might just flat-out hate movies and give it a 0. Even if the other two participants both give it a 10, the average comes out at only about 6.7.
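
The effect of a single outlier on a tiny group versus a large one can be illustrated in a few lines of Python. The three-person case mirrors the example above; the 1000-person group is a made-up comparison, not data from the question.

```python
def average(ratings):
    """Plain arithmetic mean of a list of ratings."""
    return sum(ratings) / len(ratings)

# Three raters: two give a 10, one gives a 0.
small = [10, 10, 0]
# Hypothetical larger group: 999 people give a 10, one gives a 0.
large = [10] * 999 + [0]

print(round(average(small), 2))  # 6.67
print(round(average(large), 2))  # 9.99
```

The same single 0 pulls the three-person average down by more than three points, but barely moves the average of the large group.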

The problem
