I’m not asking what AB testing is; I’m asking how the actual statistics work. I’m going to use MrBeast as my example.
MrBeast changes his thumbnails up to 5 times per video, and most people assume it’s for AB testing. I’m aware that AB testing is meant to figure out which thumbnail gets clicked on the most. What I don’t understand is how this data can be verified or trusted, considering the number of variables:
* The 10mil people (example number) who saw the first thumbnail are completely different from the set of 10mil who would see the second thumbnail.
* Those first 10mil people who already watched the video are very likely not going to watch the video again after the changed thumbnail.\*
* The YouTube algorithm more than likely has a large hand in who gets recommended certain content.
My main confusion is from the first two points. How are you going to determine what thumbnail works better if it’s not the same set of people choosing between two thumbnails? In my mind, the way I see a definitive answer of a choice between two things is to have one group of people choose between those things. But if you have two different groups (or more), and the first group already watched the video so they’re not going to click the second, I can’t wrap my head around how this is even useful information.
In terms of YouTube videos specifically, I imagine most of the views come within maybe 12 hours of posting, at least with MrBeast. After that, I imagine view growth slows down or even stagnates. This is another major reason I don’t understand how AB testing works: the data from the 4th thumbnail change probably comes from the video’s “slow down” phase, so again, I don’t see how it’s accurate or useful info.
\*To expand on my second point, let’s say MrBeast’s team expects the video to get 500mil views. With the first thumbnail, 100mil people watched it. Assuming they don’t plan to rewatch the video, those 100mil people are now removed from the pool of people who could click on the second thumbnail. So the second thumbnail is being tested on a pool of 400mil instead of 500mil.
You can calibrate this with videos that don’t change their thumbnail. You make a few of them, and you observe that (hypothetical numbers) the second day typically has 50% of the views of the first day. You make a video where you exchange the thumbnail after the first day, and you see that the second day has 70% of the views of the first day. It’s not guaranteed, but the new thumbnail was probably better.
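To make "probably better" a bit more concrete, here is a rough sketch of that calibration. Every number is made up for illustration, and `z_score` is just a helper name I picked. It asks how far the 70% figure sits from what unchanged-thumbnail videos normally do, measured in units of their usual spread.

```python
from statistics import mean, stdev

# Hypothetical day-2 / day-1 view ratios for videos whose thumbnail never changed.
baseline_ratios = [0.48, 0.52, 0.50, 0.47, 0.53, 0.50]

def z_score(observed_ratio, baseline):
    """Distance of the observed ratio from the baseline mean, in baseline standard deviations."""
    return (observed_ratio - mean(baseline)) / stdev(baseline)

# Hypothetical video whose thumbnail was swapped after day 1: day 2 had 70% of day-1 views.
z = z_score(0.70, baseline_ratios)
print(f"baseline mean = {mean(baseline_ratios):.2f}, z = {z:.1f}")
```

A large positive value means the swapped video held its views much better than usual. It still can't rule out other causes, such as the video being shared somewhere on day 2.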
People who watch the videos early tend to be more active on YouTube and more likely to be subscribed. That can matter, but we can take this into account by adding more data points. After the second day, go back to the original thumbnail and you can see how it performs with the average third-day viewers. Maybe use the other option again for the fourth day. You can get a pretty good idea how both of them perform under very similar conditions.
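Here is a minimal sketch of that alternating schedule, assuming you already have a typical decay curve from videos that never changed thumbnails. The curve, the view counts, and the name `lift_by_thumbnail` are all hypothetical. Each day's views are divided by what the decay curve predicts for that day, and the results are averaged per thumbnail.

```python
from statistics import mean

# Hypothetical: expected views per day as a fraction of day 1,
# learned from videos whose thumbnail never changed.
expected_fraction = [1.00, 0.50, 0.30, 0.20]
# Hypothetical observed views (millions) and the thumbnail live on each day.
observed = [40.0, 28.0, 12.0, 11.2]
thumbnail = ["A", "B", "A", "B"]

def lift_by_thumbnail(observed, expected_fraction, thumbnail):
    """Average observed/expected ratio per thumbnail, with day 1 as the reference."""
    day1 = observed[0]
    lifts = {}
    for views, frac, name in zip(observed, expected_fraction, thumbnail):
        lifts.setdefault(name, []).append(views / (day1 * frac))
    return {name: mean(values) for name, values in lifts.items()}

print(lift_by_thumbnail(observed, expected_fraction, thumbnail))
```

A ratio above 1 for a thumbnail means the days it was live beat the usual decay. Note that day 1 is the reference, so its own ratio is 1 by construction. This is a simplification and not a full model.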
Some effects can mess with this analysis, but you can correct for them too. Videos might generally perform better on weekends, for example, but over time you learn how much better and can adjust for it.
—-
Finding the best thumbnail for a single video only helps you for that video, of course. Ideally you learn something that can be generalized, e.g. “showing MrBeast / a participant / an explosion / … in the thumbnail is positive/negative”. You can test this for individual videos and see if there is a consistent pattern. Patterns can be more complex, too: “showing x in the thumbnail performs better on the first day”, “showing y in the thumbnail performs better on weekends”, or whatever.
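One simple way to check for a "consistent pattern" across videos is a sign test: count how often the thumbnail with the feature won, and ask how likely that many wins would be if the feature made no difference. This is only a sketch with invented counts, and `sign_test_p_value` is a name I made up.

```python
from math import comb

def sign_test_p_value(wins, trials):
    """One-sided chance of at least `wins` successes in `trials` fair coin flips."""
    return sum(comb(trials, k) for k in range(wins, trials + 1)) / 2 ** trials

# Hypothetical: on 9 of 10 videos, the thumbnail with an explosion beat the alternative.
p = sign_test_p_value(9, 10)
print(f"p = {p:.4f}")
```

A small value suggests the wins are unlikely to be pure luck. It ignores how big each win was, so it is a crude first check and not a full analysis.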