ELI5: What really happens under the hood when we video chat in real time with someone on the other side of the world?

I’m wondering how this information is actually transferred, and how it can be so fast that we can see each other with seemingly no latency at all.

Thanks in advance,

6 Answers

Anonymous

OK, well, I’ll assume you want to start with infrastructure. There are [regional registries](https://en.wikipedia.org/wiki/Regional_Internet_registry) that dole out blocks of IP addresses and autonomous system (AS) numbers to network operators. An AS is a network run by a single operator, like a university or an ISP, and it is identified by its AS number. These ASes [peer](https://en.wikipedia.org/wiki/Peering) with each other using [BGP](https://en.wikipedia.org/wiki/Border_Gateway_Protocol), often through a mutual gentlemen’s agreement to let internet traffic pass through each other’s networks, but sometimes in an upstream/downstream relationship where one entity buys bandwidth from another. When your packet has to cross these networks, their routers pick a route (a chain of ASes) to send it through based on the ‘cost’ of transmission, a metric that can factor in business deals, latency, route length, and other things.
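To make the "pick the cheapest chain of ASes" idea concrete, here is a minimal sketch. Big caveat: real BGP is a path-vector protocol driven by operator policy, not a plain shortest-path search, so this only illustrates the "lowest cost route" intuition. The AS numbers (taken from the private-use range) and the link costs in `AS_LINKS` are made up for illustration.

```python
import heapq

# Hypothetical AS-level map: each AS number maps to its neighbours and an
# illustrative link cost. None of this is real routing data.
AS_LINKS = {
    64512: {64513: 1, 64514: 4},
    64513: {64512: 1, 64514: 1, 64515: 5},
    64514: {64512: 4, 64513: 1, 64515: 1},
    64515: {64513: 5, 64514: 1},
}


def cheapest_as_path(links, source, destination):
    """Return (total_cost, [AS, ...]) for the lowest-cost chain of ASes.

    Uses Dijkstra's algorithm as a stand-in for route selection; real BGP
    applies local policy and compares AS-path attributes instead.
    """
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, current, path = heapq.heappop(queue)
        if current == destination:
            return cost, path
        if current in visited:
            continue
        visited.add(current)
        for neighbour, link_cost in links[current].items():
            if neighbour not in visited:
                heapq.heappush(
                    queue, (cost + link_cost, neighbour, path + [neighbour])
                )
    return None  # destination is unreachable


print(cheapest_as_path(AS_LINKS, 64512, 64515))
```

Notice the chosen route goes through more ASes than the alternatives, because the cheap links add up to less than the two shorter routes.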

Nearly all of the long-haul physical infrastructure between ISPs, or between an ISP and an IXP (internet exchange point), is fiber optic cable, so pretty much all international traffic travels through fiber optic cables on the ocean floor. Wireless towers are mostly used for the last mile between consumers and service providers. Eventually towers, and even satellites, transmit signals to a base station on the ground, which forwards them across fiber optic cables.
There are many different media through which signals can pass, but the gist of it is to rapidly transmit pulses (of light in fiber, or of electricity or radio waves in other media) representing on/off (0/1) signals between network links. In the case of wireless links, users have to ‘share’ the channel for the same reason that it’s difficult to hear when multiple people speak simultaneously. That is why wireless links have less capacity and efficiency than cables: everyone has to share the same medium and take turns talking. Large networks are laid out so that even if many people are transmitting at once, they are not all overloading the same machines. The load is spread across many switches and repeaters with internal packet queues, and most of that traffic eventually gets processed by a router.
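On the "no latency at all" part of the question: there is latency, it is just small. A rough back-of-the-envelope sketch is below. It assumes a refractive index of about 1.47 for glass fiber (a typical textbook value, not something from this thread) and a perfectly straight 20,000 km cable, which is roughly half of Earth's circumference. Real routes are longer and add queueing and processing delay on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumed typical value for silica fiber


def one_way_delay_ms(distance_km, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Propagation delay in milliseconds for light travelling through fiber."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_in_fiber * 1000.0


half_way_around_km = 20_000  # idealised distance to the other side of the world
delay = one_way_delay_ms(half_way_around_km)
print(f"one way: {delay:.0f} ms, round trip: {2 * delay:.0f} ms")
```

So even in the ideal case the signal needs on the order of a tenth of a second to get there, which is short enough that a conversation still feels live.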

When you send an [Internet Protocol](https://en.wikipedia.org/wiki/Internet_Protocol) (IP) packet, it includes a header with the source IP address and destination IP address specified by the originator, just like physical mail (except with some additional data like options and checksums). And just like with mail, there is no guarantee that a packet will even arrive, let alone arrive intact or in order. That is why protocols like [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) have been built on top of IP: they retransmit packets that get corrupted or dropped and keep track of packet order. Even though the internet is packet-switched, TCP creates a virtual ‘connection’ from point to point, without there actually being a dedicated physical connection between those points (which is what circuit switching would give you). Waiting for retransmissions costs time, though, which hurts real-time traffic, so there are also protocols like [UDP](https://en.wikipedia.org/wiki/User_Datagram_Protocol) that are used when minor packet loss is acceptable, like in your video chat.
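Here is a toy sketch of why that trade-off matters for video. It is not real TCP or UDP code, and the function names are made up. It only compares two receiver strategies for numbered packets: hand everything to the app strictly in order (TCP-like, and retransmission is left out, so a lost packet simply stalls delivery), or show whatever is newest and skip anything stale (what a video app on UDP can choose to do).

```python
def in_order_delivery(arrivals):
    """TCP-like: deliver strictly in sequence and wait at any gap."""
    buffered = set()
    next_expected = 0
    delivered = []
    for seq in arrivals:
        buffered.add(seq)
        # Release every packet that is now contiguous with what was delivered.
        while next_expected in buffered:
            delivered.append(next_expected)
            buffered.remove(next_expected)
            next_expected += 1
    return delivered


def latest_only_delivery(arrivals):
    """UDP-style video: show each packet on arrival, drop ones that are late."""
    newest_shown = -1
    shown = []
    for seq in arrivals:
        if seq > newest_shown:
            shown.append(seq)
            newest_shown = seq
    return shown


# Packet 4 was lost in transit and packet 2 arrived after packet 3.
arrivals = [0, 1, 3, 2, 5, 6]
print("in order:   ", in_order_delivery(arrivals))
print("latest only:", latest_only_delivery(arrivals))
```

The in-order receiver is stuck holding packets 5 and 6 until packet 4 shows up, while the latest-only receiver keeps the picture moving and just loses a frame or two.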

Also, it takes some really beefy datacenters to send, receive, and process huge quantities of video data, so many video call programs have ‘peer to peer’ functionality. Instead of using a central service to relay your video data to other people, your computers just use the central server to tell each other what their IP addresses are and then connect to each other directly. Protocols like [STUN](https://en.wikipedia.org/wiki/STUN) help each computer discover its public address, and [TURN](https://en.wikipedia.org/wiki/Traversal_Using_Relays_around_NAT) provides a relay to fall back on when a direct connection can’t be made.
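If you are curious what a STUN message looks like on the wire, here is a minimal sketch that builds the 20-byte header of a Binding Request (the "what is my public address?" question). The field layout and the magic cookie value come from the STUN spec. The helper names are mine, and the sketch deliberately does not send anything over the network or include any attributes.

```python
import os
import struct

STUN_BINDING_REQUEST = 0x0001    # message type: Binding method, request class
STUN_MAGIC_COOKIE = 0x2112A442   # fixed value defined by the STUN spec


def build_binding_request(transaction_id=None):
    """Build a bare STUN Binding Request header with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # random ID used to match the reply
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Network byte order: 2-byte type, 2-byte attribute length, 4-byte cookie.
    header = struct.pack("!HHI", STUN_BINDING_REQUEST, 0, STUN_MAGIC_COOKIE)
    return header + transaction_id


def parse_header(message):
    """Split a STUN header into (type, attribute length, cookie, transaction ID)."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    return msg_type, length, cookie, message[8:20]


request = build_binding_request()
print(len(request), request.hex())
```

A real client would send these bytes over UDP to a STUN server and read its own public IP and port out of the response.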

Abstracting away everything involved in actually *displaying the video* (lol), like video compression, libraries, drivers, the kernel, and the underlying hardware, video data is just blocks of binary values that an output device interprets as color. Each pixel has three 8-bit values (0-255) for red, green, and blue, and their combination determines the color of that pixel on your screen. I have no idea how those output devices work physically, but I hope I cleared up some confusion about wtf the internet is.
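To put numbers on the pixel part, here is a small sketch that packs one pixel into three bytes and works out how much data raw, uncompressed video would need. The 1280x720 resolution and 30 frames per second are just example values I picked, and the function names are made up.

```python
def pack_rgb(red, green, blue):
    """Pack one pixel as three 8-bit channel values (0-255 each)."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits (0-255)")
    return bytes([red, green, blue])


def raw_video_bits_per_second(width, height, fps, bytes_per_pixel=3):
    """Bit rate of uncompressed video at 3 bytes (24 bits) per pixel."""
    return width * height * bytes_per_pixel * 8 * fps


orange_pixel = pack_rgb(255, 128, 0)
rate = raw_video_bits_per_second(1280, 720, 30)  # assumed 720p at 30 fps
print(orange_pixel.hex(), f"{rate / 1e6:.1f} Mbit/s uncompressed")
```

That works out to hundreds of megabits per second for a single stream, which is why the video compression I skipped over is doing a lot of heavy lifting in any real video call.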
