Beyond the obvious point that they use different models, and one may simply be higher quality than the other, there are also differences in expected usage that make direct comparisons difficult.
The most likely explanation, however, is that TikTok auto captions simply [aren’t as automatic as you think](https://newsroom.tiktok.com/en-us/introducing-auto-captions):
>With this feature, creators have the power to edit the text of their captions once they’re generated.
YouTube technically allows the same, but it’s likely done less often given the volume of content, the average length of videos, etc.
Caption accuracy also depends greatly on the microphone quality, background noise, usage of proper nouns/non-English words, and above all, the pronunciation and speed of the person speaking. TikTok creators tend to use high-quality phones or professional mics AND speak directly into the mic, for an audience.
Imagine this fast, casual conversation in a loud bar:
* “Did you eat yet?”
* “No, but I could go for some Poke Bros or Hibachi.”
* “Let’s go eat!”
What this conversation ACTUALLY sounds like is:
* “Jeweet-yet?”
* “Nah but I could gopher some pokey prose or he bot chee”
* “Skweet!”
That’s why the computer can have trouble with auto-generation.
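To put a rough number on how far off that is, here is a minimal sketch of word error rate (WER), a common way to score transcripts: count the word-level insertions, deletions, and substitutions needed to turn the transcript into what was really said, then divide by the number of words actually spoken. The function names and the simple tokenizer are my own for illustration, not anything TikTok or YouTube is known to use.

```python
import re


def tokenize(text):
    # Hypothetical, very simple tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic dynamic-programming edit distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("Did you eat yet?", "Jeweet-yet?"))  # 0.75
```

Only "yet" survives from the first line of the bar conversation, so three of the four spoken words count as errors.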