ELI5: How does a server receive thousands of requests at a time?

I always wonder how a server, like a Google server, can receive thousands of requests from people at once! I understand the signals are passed through fiber-optic cable, but won’t those signals collide? And how can one small wire handle so many requests at once? In my mind, a wire can only carry one request at a time. I know this happens at close to light speed, but still, it’s very hard to understand.

13 Answers

Anonymous 0 Comments

The wires aren’t handling anything in this scenario. They transfer data, but what that data is doesn’t really matter to the wire.

Companies like Google handle so many requests by having a LOT of bandwidth.

Specifically, Google and a few others even operate their own undersea cables so they have bandwidth available just for their own services.

Anonymous 0 Comments

This is called multiplexing. Basically, multiple different signals (e.g. from different internet users) are combined, sent down the wire together, and then separated at the other end.

Think of it like how multiple TV and radio stations can transmit at the same time, just on different frequencies. Each TV or radio station is a different internet user, and the air is the fiber cable.
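
A toy sketch of the combine-then-separate idea, in Python. It assumes each chunk is tagged with a made-up user name so the far end can sort chunks back into per-user streams. Real links do this in hardware (by time slot, frequency, or wavelength), so treat the function names and data here as illustrative only.

```python
from collections import defaultdict
from itertools import zip_longest


def multiplex(streams):
    """Interleave tagged chunks from several users onto one shared 'wire'."""
    tagged = [[(user, chunk) for chunk in chunks] for user, chunks in streams.items()]
    wire = []
    for round_of_chunks in zip_longest(*tagged):
        # Users with nothing left to send contribute None; skip those.
        wire.extend(item for item in round_of_chunks if item is not None)
    return wire


def demultiplex(wire):
    """Separate the shared 'wire' back into one list of chunks per user."""
    streams = defaultdict(list)
    for user, chunk in wire:
        streams[user].append(chunk)
    return dict(streams)


# Hypothetical users and request fragments.
streams = {"alice": ["GET ", "/search"], "bob": ["GET ", "/maps", "?q=x"]}
wire = multiplex(streams)
recovered = demultiplex(wire)
```

The chunks travel interleaved on `wire`, yet `recovered` matches the original per-user streams because every chunk carries its tag.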

As you go from your house to Google’s data center, the equipment gets more expensive and higher performance: your router at home will be fairly basic, and the wire in your street will probably be quite old and thin. Then, as the connection reaches your ISP, it goes into bigger cables and through much higher-performance switches, etc.

Anonymous 0 Comments

Your word of the day: Load Balancers. 

This is basically a reverse proxy server. 

Your request is made to the load balancer, which knows of not one but (X) servers just like the one you want to talk to, all (X) of them in a farm. Using the guidelines the site operator has given it, such as lowest memory use, network load, or number of active requests, it picks the least busy copy of the website to direct you to.
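
A minimal sketch of the "pick the least busy copy" rule, assuming the only guideline is the number of requests each server is currently handling. The server names and counts are hypothetical, and real load balancers weigh more signals than this.

```python
def pick_least_busy(active_requests):
    """Return the name of the server with the fewest in-flight requests."""
    return min(active_requests, key=active_requests.get)


# Hypothetical farm: server name -> requests currently being handled.
farm = {"web-1": 12, "web-2": 3, "web-3": 7}

chosen = pick_least_busy(farm)
farm[chosen] += 1  # the new request is now in flight on that server
```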

Further help comes from CDNs, or Content Delivery Networks, like Akamai. These take all the images on a website and distribute them around the world to many data centers. When a customer loads a web page, they get the HTML and maybe some scripting from the website itself, but the pictures are already stored much closer to them. Rather than downloading the whole thing from the owner’s servers, they download the images from the nearest CDN server.

Anonymous 0 Comments

Requests are sent in chunks called packets. Each one of these packets has some information about who sent it and where it’s going. The server uses this information to keep requests distinct.

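The fiber optic links used in data centers often run at 100 Gbps or more. That’s 100,000,000,000 bits per second. The maximum size of a packet (an IP packet, but that’s not really important) is 65,535 bytes, or 524,280 bits, so even if every request filled a maximum-size packet, one fiber connection could carry roughly 190,000 requests **per second**. Most requests are far smaller than that, so the real ceiling is much higher.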
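
A back-of-the-envelope check of the packets-per-second arithmetic, assuming a 100 Gbps link and 65,535-byte packets. The 1,500-byte figure is an added assumption standing in for a more typical small request, and the calculation ignores protocol overhead.

```python
LINK_BITS_PER_SECOND = 100 * 10**9   # 100 Gbps
MAX_PACKET_BYTES = 65_535            # largest possible IP packet
SMALL_REQUEST_BYTES = 1_500          # assumed size of a small request

bits_per_max_packet = MAX_PACKET_BYTES * 8
max_packets_per_second = LINK_BITS_PER_SECOND // bits_per_max_packet
small_requests_per_second = LINK_BITS_PER_SECOND // (SMALL_REQUEST_BYTES * 8)

print(max_packets_per_second)     # 190737
print(small_requests_per_second)  # 8333333
```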

Also, the data isn’t sent in a solid, unbroken stream. All the data can be jumbled together with other network traffic, and the network hardware and servers deal with recombining the message streams.
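
A small sketch of how jumbled packets can be sorted back into separate messages, assuming each packet carries a sender address and a sequence number. The addresses, tuple layout, and payloads are made up for illustration; real TCP reassembly is more involved.

```python
from collections import defaultdict


def reassemble(packets):
    """Group packets by sender, then rebuild each message in sequence order."""
    by_sender = defaultdict(list)
    for sender, seq, payload in packets:
        by_sender[sender].append((seq, payload))
    return {
        sender: "".join(payload for _, payload in sorted(parts))
        for sender, parts in by_sender.items()
    }


# Packets from two hypothetical senders, arriving interleaved and out of order.
arrived = [
    ("10.0.0.2:5000", 1, "/maps"),
    ("10.0.0.1:4000", 0, "GET "),
    ("10.0.0.2:5000", 0, "GET "),
    ("10.0.0.1:4000", 1, "/search"),
]
messages = reassemble(arrived)
```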

Anonymous 0 Comments

Slow it all down to a single message and an expected response. If we shared a bunch of homing pigeons, and I had some of yours and you had some of mine, we could exchange messages. The path the birds take is irrelevant, and where they land is somewhat irrelevant, as long as the message is taken off the bird when it lands and handed to you.

The sky can hold billions of birds each with their own messages going to their own destinations.

Anonymous 0 Comments

Each server has a maximum number of data connections it can handle from various source systems. Each network connection has a maximum data rate it can handle. Technology like load balancing and multiple destination IP addresses can redirect traffic from a particular source system to any number of destinations so that the network connections and servers are not overloaded. Properly designed cloud systems can track the capacity currently in use and automatically add servers in other locations so that connections travel over other network links. When those servers are no longer needed, the cloud system can move the connections to other servers and shut down the now unused excess servers.

Anonymous 0 Comments

1 CPU core can do 1 operation at a time. When a server gets a request, it handles the request. That request could mean fetching data from a database on another server, which takes time to cross the network and leaves that core idle until the data comes back. The idle core is available to handle another request while the first one is suspended, and this keeps happening over and over. As a result, a request can take 500ms but consume only 1ms of actual compute time on the core. This is called concurrency, and it is why a single core can handle 1000 requests in 1000ms, or 1 second.
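
A minimal sketch of that idea using Python’s `asyncio`, which runs everything on a single thread. The 0.5 second sleep is a stand-in for waiting on a database on another server, and the handler name is hypothetical.

```python
import asyncio
import time


async def handle_request(request_id):
    # Stand-in for a slow database call. While this request waits,
    # the single thread is free to start work on other requests.
    await asyncio.sleep(0.5)
    return f"response {request_id}"


async def main():
    start = time.perf_counter()
    # Start 1000 requests; their waiting periods overlap.
    responses = await asyncio.gather(*(handle_request(i) for i in range(1000)))
    elapsed = time.perf_counter() - start
    return responses, elapsed


responses, elapsed = asyncio.run(main())
print(len(responses), f"requests finished in {elapsed:.2f}s")
```

Handled one after another, 1000 requests of 0.5 seconds each would take about 500 seconds. Because the waits overlap, the whole batch should finish in well under a few seconds.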

However, at some point you reach a limit, and requests end up queuing in memory until the system eventually crashes with no CPU or memory left. At this point you can either add cores and make the server bigger, or scale out with multiple identical servers. Having multiple servers means you need another layer, called a load balancer, to route traffic to the multiple instances. There are many load-balancing strategies, but the simplest is round robin, where each request is routed to the next server in line. Those servers could also have multiple cores, so you could have 2 servers with 4 cores each or 8 servers with 1 core each. Either way, you now have 8 cores that can handle 8000 requests per second.
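
Round robin is simple enough to sketch in a few lines. The server names are hypothetical, and a real load balancer would also need to skip servers that are down.

```python
from itertools import cycle

# Hypothetical pool of identical servers.
servers = ["server-a", "server-b", "server-c"]
next_server = cycle(servers)


def route(request):
    """Send each request to the next server in line, wrapping around."""
    return next(next_server)


assignments = [route(f"req-{i}") for i in range(6)]
print(assignments)
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b', 'server-c']
```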

There are additional systems that manage load with auto scaling: if average CPU use is high, additional servers or cores are added automatically. This is why retailers don’t get knocked offline on Black Friday, and also why running Twitter is so hard. If a post goes viral, all the servers responsible for that viral content have to scale out quickly, while also scaling back in properly so as not to waste resources and money.
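
A rough sketch of an auto scaling rule, assuming the only signal is average CPU use. The 70% and 30% thresholds and the one-server-at-a-time step are made-up values for illustration, not what any particular cloud provider uses.

```python
def desired_servers(current, avg_cpu_percent, high=70, low=30, minimum=1):
    """Decide how many servers to run based on average CPU use."""
    if avg_cpu_percent > high:
        return current + 1          # scale out: traffic is heavy
    if avg_cpu_percent < low and current > minimum:
        return current - 1          # scale in: stop paying for idle servers
    return current                  # load is in the comfortable range


print(desired_servers(4, 85))  # 5
print(desired_servers(4, 10))  # 3
print(desired_servers(1, 10))  # 1 (never goes below the minimum)
```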

Anonymous 0 Comments

Think of a mail room in a building. Each request is a letter coming in.

The postal service has large vehicles that can bring large numbers of letters in at the same time. The letters are all sealed, so even though there is a huge pile of mail, they don’t get mixed up.

There’s a large number of clerks in this mailroom attacking this mountain of mail. Each clerk processes a single letter at a time. One by one, each letter is opened and read. Some letters might get immediate responses (like “sorry, your letter is unintelligible; please send another”), and those are written, sent back out, and forgotten about. But most require the clerks to make inquiries further into the building to other departments. It’s a huge building, so they have no idea what the other departments do; those departments might ask other departments, and sometimes they might need to write letters and send them to other faraway buildings. So it could take a long time for an inquiry to get an answer.

Do clerks sit and wait for the responses to those requests? No. They put the letters aside and start working on the next ones. Each one might have a pile of letters they’ve started work on but don’t yet have the answers for.

Anonymous 0 Comments

Information isn’t sent directly from the client to the server. There’s a chain of routers in the middle.

If too many requests arrive at once and create a jam somewhere in that chain, the congested router simply drops the packet. With a protocol like TCP, the receiver never acknowledges the dropped packet, so the original sender resends it.

Only the missing data is resent, not the whole request.

This happens millions of times per second; it’s normal.

Anonymous 0 Comments

It uses buffers. Buffers everywhere.

You don’t have an unbroken wire connected to a Google server. If you have fiber internet service, your ONT (Optical Network Terminal) is assigned timeslots in which it is permitted to transmit upstream, and it has to stay quiet the rest of the time to avoid colliding with upstream transmissions from other customers on the network. This strategy is called Time Division Multiple Access (TDMA). All this data is decoded by the ISP’s equipment, where it reaches a router (just a computer, really) that forwards your packets to another router, and so on until they reach the Google data center.
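
A toy sketch of the timeslot idea, assuming slots are simply handed out to terminals in turn. Real fiber networks allocate slots dynamically based on demand, and the terminal names here are hypothetical.

```python
def build_schedule(terminals, slots_per_frame):
    """Assign each upstream timeslot in a frame to one terminal, in turn."""
    return [terminals[slot % len(terminals)] for slot in range(slots_per_frame)]


def may_transmit(terminal, slot, schedule):
    """A terminal may only transmit during a slot assigned to it."""
    return schedule[slot % len(schedule)] == terminal


schedule = build_schedule(["ont-1", "ont-2", "ont-3"], slots_per_frame=6)
print(schedule)
# ['ont-1', 'ont-2', 'ont-3', 'ont-1', 'ont-2', 'ont-3']
print(may_transmit("ont-2", 1, schedule))  # True
print(may_transmit("ont-2", 2, schedule))  # False
```

Since exactly one terminal owns each slot, two terminals never transmit at the same moment.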

The rates of incoming and outgoing traffic at each router do not necessarily match, especially on short timescales. Your ONT has a data buffer. Every router on the path has a buffer, or even multiple buffers, as the data packets are copied and headers modified internally. In fact, queuing delay (the delay incurred by your packets sitting in a buffer, not going anywhere) is a major source of latency on the public internet and in consumer routers, second only to propagation delay (the unavoidable delay of a signal travelling a finite distance over the physical carrier). In contrast, the transmit delay of the ONT waiting for a transmission opportunity is a relatively minor source of latency.

A router has finite buffer size. If those buffers are getting too full, a router will simply drop packets. In this way the internet is said to provide “best-effort” service; there is no guarantee of packet delivery. In fact, it is usually desirable for routers to drop packets _before_ it is strictly necessary in order to minimize latency on the link, and internet protocols like TCP are designed with this knowledge in mind: they use packet loss as a signal of congestion on the link. Packet loss is a _feature_ of the internet, not a bug.
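
A minimal sketch of a finite buffer that drops packets once it is full. This is the simplest policy (often called tail drop). Dropping early, before the buffer is actually full, needs a smarter queue-management scheme that is not shown here. The class and packet names are hypothetical.

```python
from collections import deque


class RouterBuffer:
    """A fixed-size packet queue that drops new arrivals when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet):
        if len(self.queue) >= self.capacity:
            self.dropped += 1   # no room: the packet is silently discarded
            return False
        self.queue.append(packet)
        return True

    def forward(self):
        """Send the oldest queued packet onward, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


buf = RouterBuffer(capacity=3)
results = [buf.enqueue(f"pkt-{i}") for i in range(5)]
print(results)      # [True, True, True, False, False]
print(buf.dropped)  # 2
```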

At the other end, the server has its own socket and application buffers. It will handle requests as fast as it is able, up to some finite limit of outstanding queries, at which point it will also start to refuse requests. Depending on the service, there may be various load balancers along the way that help it make these decisions fairly and quickly at a large scale.

But, essentially, you’re right that multiple transmissions at the same time are problematic. Communication technologies have many ways besides TDMA to divide access to a shared medium. If you use Wi-Fi on your LAN, clients can and do accidentally shout over each other, rendering both transmissions indecipherable. In this case clients are expected to detect the problem themselves, sense when the airwaves are quiet, and wait a random amount of time before trying again, in the hope that they won’t accidentally transmit at the same time again. This strategy is called Carrier-Sense Multiple Access (CSMA).
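
A rough sketch of the "listen, then wait a random time" rule. The doubling of the wait window after each failed attempt is an added assumption (Wi-Fi uses a scheme along these lines), and the function names and slot counts are illustrative rather than taken from the standard.

```python
import random


def backoff_slots(attempt, rng, max_exponent=10):
    """Pick a random wait, in slots, from a window that doubles per retry."""
    window = 2 ** min(attempt, max_exponent)
    return rng.randrange(window)


def try_send(channel_busy, attempt, rng):
    """Transmit only if the air is quiet; otherwise back off and retry later."""
    if channel_busy:
        return ("wait", backoff_slots(attempt, rng))
    return ("send", 0)


rng = random.Random()
print(try_send(channel_busy=False, attempt=0, rng=rng))  # ('send', 0)
action, wait = try_send(channel_busy=True, attempt=3, rng=rng)
print(action, wait)  # 'wait' plus a random number of slots from 0 to 7
```

Because each client picks its own random wait, two clients that collided once are unlikely to retry at exactly the same moment.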

Later Wi-Fi standards (802.11ax / Wi-Fi 6) also use what they call OFDMA (Orthogonal Frequency Division Multiple Access), where the channel bandwidth (typically 80 MHz in total) is divided into ~1000 smaller subcarriers that are assigned to different clients, which enables multiple clients to speak at once without collisions. Notice that a packet from your phone on your home Wi-Fi is transmitted to your wireless access point, which is connected to or built into your home router, which is connected to or built into the ONT, and so on. So, one packet might traverse several physical connections that use various multiplexing strategies.