It works because modern video encoding (video codecs) spends most of its time just looking at *changes* to the video. So instead of sending a full picture of your face tens of times a second, which would be a lot of data, it sends instructions about how different parts of the image are changing, such as how your face just shifted a few pixels to the left and your eyes are now pointed to the left. By communicating only the areas of the image that are moving, it greatly reduces the amount of data you have to send. You can then send these small packets many times a second, and you can’t really perceive any significant delay when those changes are applied on the other side.
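To make that idea concrete, here is a minimal sketch of “send only the pixels that changed” between two frames. This is a toy illustration, not how any real codec works (real codecs operate on blocks and motion vectors), but the principle is the same:

```python
# Toy illustration of "send only the changes", not a real video codec.
# Frames here are just flat lists of pixel values.

def diff_frames(previous, current):
    """Return (index, new_value) pairs for pixels that changed."""
    return [(i, cur) for i, (prev, cur) in enumerate(zip(previous, current)) if prev != cur]

def apply_diff(previous, changes):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in changes:
        frame[i] = value
    return frame

frame1 = [0, 0, 0, 0, 0, 0, 0, 0]
frame2 = [0, 0, 9, 9, 0, 0, 0, 0]     # only two pixels changed

changes = diff_frames(frame1, frame2)
print(changes)                         # [(2, 9), (3, 9)] -- far less data than a whole frame
print(apply_diff(frame1, changes))     # [0, 0, 9, 9, 0, 0, 0, 0]
```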
There is a delay, it’s just often not noticeable because of how fast the signals travel. Some of the delay is actually just processing the information, and some of it is the actual physical travel time.
Information travels mostly through fibre optics, where the signal moves at roughly two-thirds the speed of light. So in this case it’s literally a fluctuating light beam with a particular pattern that routers on both ends understand. It’s like Morse code, but it’s not Morse code. There are actually massive fibre optic cables that connect continents.
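As a rough sketch of “like Morse code but not Morse code”, here is how a couple of characters could be turned into the on/off pattern that light pulses might carry. Real fibre links use more sophisticated line codes and modulation than this, so treat it purely as an illustration of text-becomes-bits-becomes-pulses:

```python
# Toy illustration: turn text into an on/off (1/0) pattern that could be sent
# as light pulses. Real fibre uses fancier encodings, but it's still patterns
# of light in the end.

def to_bits(text):
    return ''.join(f'{byte:08b}' for byte in text.encode('utf-8'))

bits = to_bits("hi")
print(bits)                                          # 0110100001101001
print(''.join('#' if b == '1' else '.' for b in bits))  # light on / light off
```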
It can also travel through cell towers, like to and from your phone network: each tower has a microwave transmitter that sends signals to other towers, which have receiving dishes. It’s important to note here that microwave radiation is a form of electromagnetic radiation, just like visible light, and it travels through air at essentially the speed of light. Your phone also has a small transmitter and receiver and relays its information to the nearest cell tower. It doesn’t need to be powerful because the cell tower is.
It can also travel via satellites, for example if you have Starlink internet. The signal relays from one satellite to the next, and then back down to the ground. This is also around the speed of light, and can sometimes even be marginally faster end to end than fibre, because much of the journey is through vacuum or thin atmosphere rather than glass.
In all cases it’s sending the information about each pixel of a frame as 0s and 1s.
Pure red might be the message 11111000 00000000, for example, in a 16-bit colour format.
But since screens have many millions of pixels, to reduce the amount of information it will average chunks of pixels together and only send information about which pixels have changed, which is sometimes why your video quality drops in favour of staying fast.
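To ground the “pure red” example above, here is a sketch of packing a colour into 16 bits using the common RGB565 layout (5 bits red, 6 bits green, 5 bits blue). Real video formats usually use other colour encodings, so this is only to show the colours-as-bits idea:

```python
# Pack an (R, G, B) colour (each 0-255) into a 16-bit RGB565 value:
# 5 bits for red, 6 for green, 5 for blue.

def to_rgb565(r, g, b):
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

pure_red = to_rgb565(255, 0, 0)
print(f'{pure_red:016b}')   # 1111100000000000, i.e. "11111000 00000000"
```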
It is doing the same thing for the audio at the same time, sending information about the audio channels in sequence, many times a second.
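As a rough worked example of “many times a second” (the 48 kHz sample rate and 20 ms packet interval are just typical assumptions, not what any particular app is guaranteed to use):

```python
# Rough arithmetic for how audio gets chopped into frequent small packets.
# 48 kHz sampling and 20 ms packets are common-ish assumptions, not a rule.
sample_rate = 48_000        # samples per second, per channel
packet_interval = 0.020     # seconds between packets (20 ms)

samples_per_packet = int(sample_rate * packet_interval)  # 960 samples per channel
packets_per_second = int(1 / packet_interval)            # 50 packets every second
print(samples_per_packet, packets_per_second)
```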
There are different types of connections made over the internet; the two primary ones are TCP and UDP. The acronym meanings aren’t all that important compared to how they function.
TCP is what’s known as a “connection-oriented” protocol. It ensures data integrity, which is important for things like email or file downloads, where you don’t want to miss bits of data. TCP is also slower, as it takes time and multiple back-and-forths to make sure all the data arrived, in order and unaltered.
UDP, on the other hand, is a stateless protocol, more of a “fire & forget” data stream. The sending device doesn’t care whether the receiver gets it all; it just keeps sending data. It’s used for streaming, audio calls, webcams, and the like. Without spending time and power verifying that every last bit arrived intact and in order, it’s fast enough to keep up for audio-visual purposes.
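Here is a minimal sketch of that difference using Python’s standard socket API. The addresses and ports are placeholders from documentation ranges, so nothing will actually answer on them; a real call app would also use a media library on top of UDP rather than raw sockets:

```python
import socket

# TCP: set up a connection first; the OS retransmits anything lost or out of
# order. Good for files and email. (The address below is a placeholder, so
# connect() will only succeed if something real is listening there.)
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("198.51.100.7", 443))
tcp.sendall(b"this must arrive complete and in order")
tcp.close()

# UDP: no connection, no retransmission. Each sendto() is fire-and-forget,
# which keeps latency low for live audio/video.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"one chunk of a video frame", ("198.51.100.7", 5004))
udp.close()
```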
Network infrastructure, and especially cabling, has also improved tremendously over the last 30 years in how fast and how much signal it can carry.
>how it can be that fast that we can see without any latency at all
Actually there is some latency.
These days we have variable bit rates, so instead of cutting out completely when latency gets too high, your video will temporarily drop in quality.
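A hand-wavy sketch of that idea follows; the thresholds and quality steps are made-up numbers, not what any real app uses:

```python
# Toy version of adaptive quality: pick a video quality based on how much
# bandwidth the connection seems to have right now. Real apps measure this
# continuously and also react to packet loss and latency.

def pick_quality(measured_kbps):
    ladder = [(3000, "1080p"), (1500, "720p"), (700, "480p"), (300, "240p")]
    for needed_kbps, quality in ladder:
        if measured_kbps >= needed_kbps:
            return quality
    return "audio only"

print(pick_quality(4000))   # 1080p
print(pick_quality(800))    # 480p -- quality drops instead of the call cutting out
```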
Eventually we may be able to send data around the world, reliably, at close to the speed of light, but it’s going to be a while still (and even quantum technology can’t carry information faster than light).
>I’m wondering how is this information actually transferred
As ones and zeros, represented by pulses of electricity or light, across wires, fiber optic cables, and radio waves.
Over many decades, we have improved both the quality and capacity of those physical mediums, the manner in which that data is prepared for transmission (encoding), and the process of transmission itself (protocol).
Between you and the person you are calling, your video data is encoded, possibly encrypted, and compressed. That could all be done on your phones themselves, but it would use a lot of CPU and battery and make them pretty hot. More likely this workload is shared between your phones and servers in the network of whichever application you are using.
Ok well I’ll assume you want to start with infrastructure. There are [regional registries](https://en.wikipedia.org/wiki/Regional_Internet_registry) which dole out blocks of IP addresses to autonomous systems (ASes). An AS is just a number for a network, like a university or an ISP. These ASes [peer](https://en.wikipedia.org/wiki/Peering) with each other using the [BGP protocol](https://en.wikipedia.org/wiki/Border_Gateway_Protocol), often through a mutual gentlemen’s agreement to let internet traffic pass through each other’s networks, but sometimes in an upstream/downstream relationship where one entity buys bandwidth from another. When a signal traverses their networks, they determine a route (a chain of ASes) to send your packet through based on the ‘cost’ of transmission, a metric which can factor in manual business deals, latency, route length, and other things.
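A very simplified sketch of “pick the route with the lowest cost” is below. The AS numbers and costs are invented for illustration; real BGP policy involves far more than a single number:

```python
# Toy route selection: each candidate route is a chain of AS numbers with a
# made-up "cost" combining business preference, latency, path length, etc.
candidate_routes = [
    {"path": [64512, 64513, 64515], "cost": 40},
    {"path": [64512, 64520, 64521, 64515], "cost": 25},  # longer, but cheaper/preferred
    {"path": [64512, 64530, 64515], "cost": 90},
]

best = min(candidate_routes, key=lambda route: route["cost"])
print("send the packet via ASes:", best["path"])
```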
All of the physical infrastructure between ISPs, or between an ISP and an IXP, uses fiber optic cable to ferry data around the world, so pretty much all international traffic traverses fiber optic cables at the bottom of the ocean. Wireless towers are mostly used for the last mile between consumers and service providers; eventually towers and even satellites transmit signals to a base station on the ground, which forwards them across fiber optic cables.
There are many different media through which signals can pass, but the gist of it is to transmit pulses of light representing on/off (0/1) signals rapidly between network links. In the case of wireless links, users have to ‘share’ the bandwidth channel for the same reason that it’s difficult to hear when multiple people speak simultaneously, which is why wireless links have less capacity and efficiency than cables: everyone shares the same medium and has to take turns talking. As for the network topology of large networks, even if many people are transmitting signals on a network they will not all be overloading the same machines; the load is distributed across many switches/repeaters with internal packet queues, most of which eventually get processed by a router.
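As a toy picture of those internal packet queues (nothing like real switch firmware, just the store-and-forward idea):

```python
from collections import deque

# Toy store-and-forward: packets from many senders land in a switch's queue
# and are forwarded one at a time, so a busy moment causes a little queuing
# delay rather than any single machine being overloaded outright.
queue = deque()

# several users transmit at roughly the same time
for sender in ["phone A", "laptop B", "phone C"]:
    queue.append(f"packet from {sender}")

# the switch drains the queue in order, forwarding toward the next hop
while queue:
    packet = queue.popleft()
    print("forwarding", packet)
```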
When you send an [Internet Protocol](https://en.wikipedia.org/wiki/Internet_Protocol) (IP) packet, it includes a header with the source IP address and destination IP address specified by the originator, just like with physical mail (except with some additional data like options and checksums). And just like with mail, there is no guarantee that a packet will even arrive, let alone arrive intact or in order, so protocols like [TCP](https://en.wikipedia.org/wiki/Transmission_Control_Protocol) have been built on top of IP that retransmit packets that get corrupted or dropped and keep track of packet order. Even though the internet is packet-switched, TCP creates a virtual ‘connection’ from point to point, without there actually being a direct physical connection between those points (as in circuit switching). There can be some performance costs to using TCP to transfer large amounts of data, though, so there are also protocols like [UDP](https://en.wikipedia.org/wiki/User_Datagram_Protocol) that are used when minor packet loss is acceptable, like in your video chat.
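To show what “a header with the source and destination IP address” looks like in practice, here is a sketch that packs the fixed 20-byte IPv4 header layout with Python’s struct module. The addresses are placeholder documentation addresses, and the checksum is left at zero, which a real sender (or the OS) would fill in:

```python
import socket
import struct

# Minimal IPv4 header (20 bytes, no options), following the standard field layout.
version_ihl = (4 << 4) | 5           # IPv4, header length = 5 * 4 bytes
header = struct.pack(
    "!BBHHHBBH4s4s",
    version_ihl,                      # version + header length
    0,                                # type of service
    20 + 8,                           # total length: header plus an 8-byte payload
    0x1234,                           # identification
    0,                                # flags + fragment offset
    64,                               # TTL: how many hops before the packet is dropped
    17,                               # protocol number: 17 = UDP
    0,                                # checksum (left 0 in this sketch)
    socket.inet_aton("192.0.2.10"),   # source IP address
    socket.inet_aton("198.51.100.7"), # destination IP address
)
print(header.hex())
print(len(header), "bytes")           # 20
```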
Also, it takes some really beefy datacenters to send, receive, and process huge quantities of video data, so many video call programs have ‘peer to peer’ functionality. Instead of using some central service to relay your video data to other people, your computers just use the central server to tell each other what their IP addresses are and then connect to each other directly, using protocols like [STUN](https://en.wikipedia.org/wiki/STUN) and [TURN](https://en.wikipedia.org/wiki/Traversal_Using_Relays_around_NAT).
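A very rough sketch of the idea behind STUN follows. This is not the actual STUN message format, and the server and peer addresses are placeholders, so it won’t run against anything real; it just shows the “a helper server tells you your own public address, then the peers talk directly” concept:

```python
import socket

# Not the real STUN protocol -- just the concept: ask a helper server
# "what address do you see me as?", then share that address with the peer
# so the two machines can send UDP to each other directly.

RENDEZVOUS = ("203.0.113.5", 3478)    # placeholder server address

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"who am I?", RENDEZVOUS)
reply, _ = sock.recvfrom(1024)        # e.g. b"198.51.100.7:50123"
my_public_address = reply.decode()

# The rendezvous server passes my_public_address to the other caller (and
# theirs to me); after that the video packets flow peer to peer:
peer_ip, peer_port = "198.51.100.9", 50124   # learned via the server
sock.sendto(b"video packet straight to you", (peer_ip, peer_port))
```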
Abstracting away everything involved in actually *displaying the video* (lol), including video compression techniques, libraries, drivers, the kernel, and the underlying hardware: video data is just blocks of binary signals which an output device can interpret as color. So you have several 8-bit values (0-255) used for red/green/blue, and their combination determines the color of a pixel on your screen. I have no idea how those output devices work physically, but I hope I cleared up some confusion about wtf the internet is.