How does auto-generated subtitling work for live videos?


Yesterday I was watching NASA’s live press conference about Perseverance, and I couldn’t wrap my head around how the auto-generated subtitles could show a word or phrase well before the speaker said it.

How does it happen?

Live broadcasts often aren’t truly live. A delay is deliberately built in: the footage is captured, held briefly, and then sent on to be broadcast. This lets the broadcaster cut the feed or edit out gatecrashers, speakers going off on unexpected tangents, technical problems, and any number of other things. The delay is often only a few seconds long, but that is enough for a speech-to-text program to transcribe the audio and send the text out with the ‘live’ feed.
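To make the timing concrete, here is a rough sketch in Python. It is only a toy model, not how any real broadcaster or captioning system works. The function names and the numbers (a 7-second delay, a 2-second transcription time) are made up for illustration, and it assumes the caption is shown as soon as it is ready instead of being lined up with the delayed audio.

```python
def caption_lead_seconds(broadcast_delay, transcription_latency):
    """How far ahead of the delayed audio a caption can appear, in seconds."""
    return broadcast_delay - transcription_latency


def viewer_timeline(spoken_words, broadcast_delay, transcription_latency):
    """For each (spoken_at, word) pair, return (word, caption_at, audio_at).

    Assumption: the caption is shown as soon as the transcript is ready,
    while the audio is held back by the full broadcast delay.
    """
    timeline = []
    for spoken_at, word in spoken_words:
        caption_at = spoken_at + transcription_latency  # text is ready
        audio_at = spoken_at + broadcast_delay          # viewer hears the word
        timeline.append((word, caption_at, audio_at))
    return timeline


if __name__ == "__main__":
    # Hypothetical numbers: 7 s broadcast delay, 2 s transcription latency.
    words = [(0.0, "Perseverance"), (1.0, "has"), (2.0, "landed")]
    for word, caption_at, audio_at in viewer_timeline(words, 7.0, 2.0):
        print(f"{word!r}: caption at {caption_at:.1f}s, audio at {audio_at:.1f}s")
```

With these made-up numbers, every caption is ready 5 seconds before the viewer hears the matching word, which would look like the subtitles running ahead of the speaker.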

You could try speech-to-text in Google Docs. You will see how fast it is.