It works because modern video codecs spend most of their effort describing *changes* between frames. Instead of sending a complete picture of your face dozens of times per second, which would be a huge amount of data, the encoder sends instructions for how parts of the image have changed: your face just shifted a few pixels to the left, and your eyes are now pointed left. By transmitting only the regions of the image that are in motion, it greatly reduces the amount of data you have to send. These small packets go out many times a second, so you can't really perceive any significant delay when the changes are applied on the other side.
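To make the idea concrete, here is a toy sketch in Python of this "send only what changed" approach. It is not how any real codec works internally (real codecs use motion vectors, transforms, and entropy coding), just an illustration of diffing two frames and shipping only the changed blocks; all function names here are made up for the example.

```python
# Toy "delta encoding" sketch: frames are small grids of pixel values,
# and only blocks that changed between frames get sent as an update.

def diff_frames(prev, curr, block=2):
    """Return a list of (row, col, new_block) for blocks that changed."""
    updates = []
    for r in range(0, len(curr), block):
        for c in range(0, len(curr[0]), block):
            old = [row[c:c + block] for row in prev[r:r + block]]
            new = [row[c:c + block] for row in curr[r:r + block]]
            if old != new:
                updates.append((r, c, new))
    return updates

def apply_updates(frame, updates):
    """Patch the changed blocks into a copy of the previous frame."""
    out = [row[:] for row in frame]
    for r, c, new in updates:
        for i, row in enumerate(new):
            out[r + i][c:c + len(row)] = row
    return out

prev = [[0] * 4 for _ in range(4)]       # last frame the receiver already has
curr = [row[:] for row in prev]
curr[0][1] = 9                           # one pixel moved, e.g. a slight shift

updates = diff_frames(prev, curr)
print(len(updates))                      # → 1 (only 1 of 4 blocks is sent)
print(apply_updates(prev, updates) == curr)  # → True (frame reconstructed)
```

The receiver keeps the previous frame and just patches in the updates, which is why a tiny packet is enough to keep the picture moving smoothly.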