Eli5: what happens on the server when one selects to watch a movie on Netflix and how the architecture allows 500k other people to watch the same movie almost concurrently?


Content Distribution Networks (CDN for short) are used. Basically, you request to watch a film on Netflix, then Netflix responds with “Sure, here is your film!” – behind the scenes, a CDN that is located near you will provide the actual data for that film to you.

500k people can watch the same film at the same time from Netflix, but the data sent to each of those 500k people will be coming from one of the many, many CDN servers that Netflix owns.

Two words: caching and locality. Streaming services like Netflix have smaller servers scattered all around the world that serve content to the users geographically close to them.

If a user requests a movie that isn’t cached on their nearest server, that server can ask around and receive a copy from somewhere else in the CDN.
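To make the "ask around" idea concrete, here is a minimal sketch of a pull-through cache. The class names (`Origin`, `EdgeServer`) and the lookup order (local, then peers, then origin) are assumptions made for illustration, not a description of Netflix's actual software.

```python
class Origin:
    """Stand-in for the central library that holds every title (hypothetical)."""

    def __init__(self, catalog):
        self.catalog = dict(catalog)
        self.requests = 0  # how often somebody had to come all the way here

    def fetch(self, title):
        self.requests += 1
        return self.catalog[title]


class EdgeServer:
    """Hypothetical edge server with a local cache and a list of peers."""

    def __init__(self, name, origin, peers=None):
        self.name = name
        self.origin = origin
        self.peers = peers if peers is not None else []
        self.cache = {}

    def get(self, title):
        # 1. Local hit: serve straight from our own disk.
        if title in self.cache:
            return self.cache[title], "local"
        # 2. Ask neighbouring servers whether one of them has a copy.
        for peer in self.peers:
            if title in peer.cache:
                self.cache[title] = peer.cache[title]
                return self.cache[title], f"peer:{peer.name}"
        # 3. Last resort: go back to the origin and keep a copy.
        self.cache[title] = self.origin.fetch(title)
        return self.cache[title], "origin"


origin = Origin({"movie-a": b"...bytes of movie a..."})
london = EdgeServer("london", origin)
paris = EdgeServer("paris", origin, peers=[london])

print(london.get("movie-a")[1])  # origin
print(paris.get("movie-a")[1])   # peer:london
print(paris.get("movie-a")[1])   # local
```

Only the very first request reaches the origin; every later viewer is served from a cache.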

First of all, not everyone streams simultaneously in perfect sync. You might start watching now and I might start a few minutes later, so we would be watching different parts of the same video.

Content is stored close to you using a technology called a CDN, and likewise close to anyone else watching it. If you are lucky, the server may even be in the same city as you. You just download from a nearby point.
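As a rough illustration of "download from a close point", the sketch below picks the server with the smallest great-circle distance to the viewer. The server list and coordinates are made up for the example, and real CDNs generally steer by network distance and server load rather than pure geography, so treat this as a simplification.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Hypothetical server locations (approximate city coordinates).
SERVERS = {
    "amsterdam": (52.37, 4.90),
    "new-york": (40.71, -74.01),
    "singapore": (1.35, 103.82),
}

def nearest_server(user_location, servers=SERVERS):
    """Return the name of the server geographically closest to the user."""
    return min(servers, key=lambda name: haversine_km(user_location, servers[name]))

print(nearest_server((48.86, 2.35)))    # a viewer in Paris
print(nearest_server((42.36, -71.06)))  # a viewer in Boston
```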

Also, since video streaming is a one-way thing, unlike video chat, they can use protocols tuned for throughput rather than low delay, and the player can download ahead of what you are watching.

One nice fact is that Netflix stores multiple copies of the same movie in different qualities and switches between them according to your connection speed. According to their research, a 7% change in quality is not noticed, so the quality can change without you even noticing.
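The switching logic can be sketched in a few lines. The bitrate ladder and the 80% safety margin below are invented numbers for illustration, not Netflix's real encoding settings; the point is only that the player picks the best copy that fits the speed it measures.

```python
# Hypothetical bitrate ladder, sorted from lowest to highest quality:
# (label, kilobits per second needed to play it smoothly).
LADDER = [
    ("240p", 400),
    ("480p", 1200),
    ("720p", 3000),
    ("1080p", 5800),
    ("4K", 16000),
]

def pick_rendition(measured_kbps, ladder=LADDER, safety=0.8):
    """Pick the best quality that fits within a safety margin of the measured speed."""
    budget = measured_kbps * safety
    best = ladder[0]  # always fall back to the lowest quality
    for rendition in ladder:
        if rendition[1] <= budget:
            best = rendition
    return best[0]

for speed in (300, 2000, 8000, 25000):
    print(speed, "kbps ->", pick_rendition(speed))
```

A real player re-runs this kind of decision every few seconds, which is why the picture can get sharper or softer mid-film.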

They put their own cache servers inside ISPs too. They have, or had, a page to request them. It’s a win-win: Netflix gets content closer to its customers, and the ISP doesn’t pay for the traffic on the internet side.
The page is still there; it’s their Open Connect Appliance. “Our appliances are provided free of charge for ISP partners who meet our basic requirements, but they are not for sale to other parties.”

The question to ask is how they are streaming hundreds of thousands of different shows concurrently, which I guess has the same answer as the replies here: CDNs and satellite servers.

Across the world Netflix has thousands of cheap servers with a lot of storage whose only job is to handle downloads.

Two people don’t stream the _same_ movie file. The server lets them each download their own copy of the file _as they are watching it_.

Streaming is just watching the parts of a movie you’ve already downloaded while you’re downloading it. Buffering is when you catch up to what you’ve downloaded.
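Here is a toy simulation of that idea, stepping through time one second at a time. The function name, the 2-second startup buffer and the rates are all assumptions chosen for the example; a real player is far more sophisticated.

```python
def simulate_playback(download_rate, seconds, startup_buffer=2.0):
    """Count how many 1-second ticks are spent stalled ("buffering").

    download_rate: seconds of video downloaded per real second.
    startup_buffer: seconds of video needed before playback (re)starts.
    """
    buffered = 0.0       # video downloaded but not yet watched, in seconds
    playing = False
    stalled_ticks = 0
    for _ in range(seconds):
        buffered += download_rate
        if not playing and buffered >= startup_buffer:
            playing = True
        if playing:
            if buffered >= 1.0:
                buffered -= 1.0   # watch one second of video
            else:
                playing = False   # caught up with the download: buffering
                stalled_ticks += 1
        else:
            stalled_ticks += 1
    return stalled_ticks

print(simulate_playback(2.0, 60))  # fast connection
print(simulate_playback(0.5, 60))  # connection slower than the video plays
```

With a download rate above 1.0 the buffer keeps growing and playback never stops; below 1.0 the viewer keeps catching up with the download and stalls.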

In addition to what has already been mentioned, Netflix doesn’t put their whole catalog into CDN and cache. They would bankrupt themselves if they tried to do that. The clever part is guessing which films and content to cache where in a way that minimizes latency across the whole system.

Plain statistics help, but being able to predict which films individuals will want to watch with better accuracy than the overall statistics is how they make money. They improve their bottom line by saving money on storage and bandwidth.
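A very simplified version of "guess what to cache where" is to rank titles by predicted views per gigabyte and fill the disk greedily. The titles, view predictions, sizes and the greedy rule are all hypothetical; how Netflix actually makes this decision is not described in this thread.

```python
def choose_titles_to_cache(predicted_views, sizes_gb, capacity_gb):
    """Greedily pick titles with the most predicted views per GB until the disk is full."""
    ranked = sorted(
        predicted_views,
        key=lambda title: predicted_views[title] / sizes_gb[title],
        reverse=True,
    )
    chosen, used = [], 0.0
    for title in ranked:
        if used + sizes_gb[title] <= capacity_gb:
            chosen.append(title)
            used += sizes_gb[title]
    return chosen

# Made-up predictions for one region's cache server.
predicted = {"big-hit": 90000, "new-series": 40000, "old-classic": 3000, "niche-doc": 500}
sizes = {"big-hit": 20, "new-series": 40, "old-classic": 10, "niche-doc": 10}

print(choose_titles_to_cache(predicted, sizes, capacity_gb=70))
# ['big-hit', 'new-series', 'old-classic']
```

The better the predictions, the fewer requests have to travel back to a distant server.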


For the technical gurus: Netflix uses Docker containers, and they have published about this several times: https://netflixtechblog.com/the-evolution-of-container-usage-at-netflix-3abfc096781b

This is nearly an hour but it does detail some of the many microservices that let Netflix work.

Apologies I only have a YouTube link.


Some grey-area streaming programs out there have the ability to leverage some of the internet’s backbone routers to multicast a stream. Basically one stream in, two come out.

They have 1k different content endpoint servers so that each only has to handle 500 users, which almost any silly server with a bunch of disk space could do. And those endpoints are closer (in network-distance) to the user, ideally.

Just to add to the answers already given: half a million people watching the same content isn’t as stressful to the system as OP might assume. In fact, it’s the opposite (because it can use the cached data). A worst-case scenario for Netflix would be loads of people all watching **different** things.

A lot of the comments also miss out that the film is not one big 5GB file. It’s delivered in thousands of tiny little chunks. You’re watching it at 05:08 and someone else is at 02:12:23, you’re looking at different files.
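To see why those two viewers are looking at different files, here is a small sketch that maps a timestamp to a chunk. The 10-second chunk length and the file naming scheme are assumptions for illustration only.

```python
def to_seconds(timestamp):
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    total = 0
    for part in timestamp.split(":"):
        total = total * 60 + int(part)
    return total

def chunk_name(title, timestamp, chunk_seconds=10):
    """Return the (hypothetical) file name of the chunk covering this timestamp."""
    index = to_seconds(timestamp) // chunk_seconds
    return f"{title}/chunk_{index:05d}.mp4"

print(chunk_name("movie-a", "05:08"))     # movie-a/chunk_00030.mp4
print(chunk_name("movie-a", "02:12:23"))  # movie-a/chunk_00794.mp4
```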

A lot of people have given some good answers. I think there is a bit more to it than just that CDNs exist and they are close to you. That is still true, but there is more.

Think of a large billboard or sign that everyone wants to read. At first, it doesn’t matter if 10 or 100 people try to read it, the sign can let everyone read at the same time. They don’t even have to read the same part at the same time. It’s just there. This is how reading a movie file works on the internet.

Now imagine this was really important information on this billboard and thousands of people come to read it. They all get in each other’s way and the people in the back are too far away to see the words. They need to wait their turn to get closer. This is the network bandwidth. The bandwidth can handle a lot of connections, but will eventually get clogged up.

Now imagine this billboard was copied to different parts of the city. You just go to the one closest to you. This is the Content Distribution Network (CDN). Now the thousands of people are spread out over 10 different identical copies of the billboard so they can read it without getting in each other’s way.

Now, let’s say that this is a really really popular billboard. Millions of people want to read it. The message on the billboard is long and takes a while to read. Even with the 10 identical copies, people are still waiting in the back for the people in the front to finish. To help, they split the billboard into 20 smaller billboards. There are still 10 areas in the city that have this billboard, but each of these 10 areas now has 20 smaller billboards spread out. You walk up and read the first one. You then walk to the next one when you are done. People can start reading a lot faster because the crowd keeps moving along instead of everyone standing in one place for the entire message. Netflix will split a movie into chunks, you download a few at a time, then the Netflix app only asks for more chunks when you are ready. Slow readers (slow internet speed) can take their time and not cause a lot of issues. People who don’t finish reading the billboard (movie) can just walk back up to the chunk they left off at, without having to start at the beginning.

Hope this is all clear… kinda longer than I thought it would be.

ELI5: they duplicate a lot of machines to do the same thing. They match you with one of those machines.

You ask for ice-cream, your parents go to the nearby shop and get it for you.

Similarly, some kid in another town asks for ice-cream, their parent goes to their nearby shop to get the ice-cream.

The parents aren’t going to the ice-cream factory. Ice-cream factory sends ice-cream to relevant shops, so that ice-creams are available for everyone!

In Netflix’s case, the ice-cream is a movie. The nearby shop is a Netflix machine that’s close to your home. You ask for a movie, and it’s served by a machine near your home.

So far I don’t see any really stellar explanations… so here’s my swing. First I’ll get into some tech details and then compare those to more ELi5 type learning so those reading can grasp both versions of the answer.

Most on-demand streaming, Netflix included, runs over ordinary HTTP connections (TCP, usually encrypted with TLS) rather than UDP. The player on the end client requests chunks of the video file one after another, and TCP takes care of the handshake, error checking and retransmission. It is not multicast either: every viewer gets their own connection and their own copy of each chunk. The chunks are served over massive-bandwidth networks that together form the CDN. Fun fact: a lot of early streaming tech was actually built by MLB (yes, Major League Baseball) long before the end of Blockbuster.

Now, to rephrase this to fit into an ELI5:

Most streaming apps like Netflix use technology that can be likened to using a string-can phone (the string being the network connection).

Imagine your home has a string-can phone, just like you would have had when you were 5. Instead of the other end of this can going to your neighbor’s (and best friend Timmy’s) house, it connects to a physical Netflix server, typically one of the cache boxes sitting in a data center or inside your ISP’s network. The Amazon-owned warehouses (referred to as “the cloud” or AWS) handle the browsing and account side of Netflix rather than the video itself.

You can pick up the can at any time, and ask whoever’s listening on the other side to hear/see/stream some content to you (aka a “request” from you, the end client).

When you do that, the other end starts sending you what you requested over the connection, but instead of hearing some muffled audio, your TV/watching device interprets the data being sent and displays it as video content on your screen.

What’s being sent isn’t the whole movie/show/etc but a chunk of it every few seconds based upon where you are in the timeline of said content (a bit stream). Part of what allows Netflix to operate are the streaming techniques that were first established (and then matured) by an internal team at MLB (if I remember correctly) in the late 90s, and without those capabilities streaming wouldn’t look the way it does today.

Each endpoint (the other side of the can) can connect to hundreds/thousands/tens of thousands of users simultaneously, and its limit is known and usage is tracked in real time by Netflix. As the usage increases (like during the pandemic or holidays), content providers like Netflix scale up the number of endpoints to make sure they can keep up with the demand in a particular region.

You can think of endpoints like a virtual school library, filled with information (aka videos) that takes the requests from the string-can connections. When the number of connections reaches a particular saturation, the whole library is digitally copied to a new location and can now serve more connections. Rinse and repeat as needed.
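The "copy the library when it gets full" rule boils down to a small calculation. The capacity figure and the 20% headroom below are assumed values for illustration; the 500,000 viewers come from the original question.

```python
import math

def endpoints_needed(concurrent_viewers, capacity_per_endpoint, headroom=0.2):
    """How many endpoints keep each one below capacity with some headroom to spare."""
    usable = capacity_per_endpoint * (1 - headroom)
    return max(1, math.ceil(concurrent_viewers / usable))

# 500,000 viewers, each endpoint assumed to handle 10,000 connections.
print(endpoints_needed(500_000, 10_000))              # 63
# No headroom at all: every endpoint runs at exactly its limit.
print(endpoints_needed(500_000, 10_000, headroom=0))  # 50
```

When demand in a region rises, the result goes up and more copies of the library are brought online.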

The system itself is very complex and quite robust when you step back and look at it from afar. Part of the beauty of building something like that is making the tech transparent to end users- what they see is it just working to expectation.

If I’ve missed something significant, or if anyone wants more details, please feel free to let me know.

Video can be broken down into little bite sized chunks of independent files. So an hour long video is actually made up of maybe 360 chunks of 10 second video files. But your browser/TV is loading a few of these video files at a time so that you have some buffer in case the network has a blip.

A CDN, like other people have said, will “cache” or save a local copy, of some of those chunks. When you go to watch a video, through the magic of the internet, you will be directed to a server near you which also likely has some of these video chunks saved locally (if it’s a popular video). So not only do they not have to load it from the “source of truth”, but the data is close to you.

Depending on how many people are watching the video in your area you can imagine that many servers store these pieces of the video (because the servers have only so much ability to serve viewers). But the viewers are probably spread out across the country or world and one server can serve a lot of different viewers.