What are distributed computer systems and why are they relevant?


It is any kind of computing that is distributed over multiple computers. It could be a room full of machines working on the same task in parallel, or it could be a piece of software you download onto your computer that does some work and sends the results to a shared project, with millions of computers around the world also contributing (volunteer computing projects work this way).

They’re relevant because there are classes of problems and research that require so much computing power that the best approach is to split the work into small tasks that individual machines can perform. Protein folding is such a problem.

When you have a big problem to solve, there’s only so much you can get out of one computer. There are huge diminishing returns on processor power, and even if money were no object, the best processor in the world has limits.
So if you want to scale up your computing power, the best option might just be to get more computers on the job.
Whether you do this by putting a bunch of computers into the same server rack, connecting them over your local network, or using internet-connected computers around the world, you’ll need some way of getting all these computers working together on the task at hand.

Distributed computer systems is the study of techniques to do just that.

Using multiple computers on a single task introduces a bunch of challenges that you don’t have when using just one computer: making sure all the computers have the data they need, spreading out the task so computers aren’t waiting around for others to send them necessary data, minimising duplicate work, minimising the need to send data over slow networks, and so on.
In this environment, the way you write software is very different. Distributed systems is a very popular field of study because big businesses today are running into big data problems that are very hard to solve on single machines.
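To make the "split the task up" idea concrete, here is a minimal sketch in Python. It assumes a toy problem (counting primes in a range) and uses threads from the standard library as stand-ins for separate machines; the function names `count_primes`, `split_range`, and `distributed_count` are made up for illustration. A real system would send each chunk over a network instead.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    """Plain trial division, so each chunk involves real work."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(start: int, stop: int) -> int:
    """The unit of work one 'machine' receives: just two numbers, not the data."""
    return sum(1 for n in range(start, stop) if is_prime(n))


def split_range(start: int, stop: int, parts: int) -> list[tuple[int, int]]:
    """Cut [start, stop) into contiguous chunks that do not overlap."""
    step = max(1, -(-(stop - start) // parts))  # ceiling division
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


def distributed_count(start: int, stop: int, workers: int = 4) -> int:
    """Hand one chunk to each worker, then combine the partial answers."""
    chunks = split_range(start, stop, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(lambda chunk: count_primes(*chunk), chunks))
    return sum(partial_counts)
```

Notice that the chunks do not overlap (no duplicate work) and each worker only needs a start and a stop value (very little data to send). Those are the same concerns listed above, just at toy scale.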

Simply put, there is a limit to the resources one computer can have. There are only so many CPU cores, only so much RAM, etc., that can possibly fit inside a PC/computer/server no matter how much money you throw at it. And even where you can add more, these things get expensive *FAST*.

The solution is more computers. However, the way you run software across different computers isn’t the same as the way you would run it on a single computer. The super fast data speeds of things like RAM are designed to work over distances of a few inches at most, and don’t scale out across rooms or buildings. So the idea of keeping all your data in one machine’s RAM is just gone. The software has to be written with this in mind. How do you solve your problems with this new paradigm? How do you split your problem up into pieces so that you minimize how much these computers need to talk to each other and aren’t simply wasting time waiting for a response? What do you do if one of these computers crashes?
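One common answer to the "what if one crashes?" question is to hand the task to a different machine. Here is a small sketch of that idea. Everything in it is hypothetical: the `Worker` class, the `WorkerCrashed` exception, and `run_with_failover` are invented for illustration, and a "crash" is simulated with a flag rather than a real network failure.

```python
class WorkerCrashed(Exception):
    """Raised when a (simulated) worker cannot complete a task."""


class Worker:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    def run(self, task: int) -> int:
        if not self.healthy:
            raise WorkerCrashed(self.name)
        return task * task  # stand-in for real work


def run_with_failover(tasks, workers):
    """Assign tasks round-robin; on a crash, retry on the next worker."""
    results = {}
    log = []  # (task, worker name) pairs showing who finished what
    for i, task in enumerate(tasks):
        for attempt in range(len(workers)):
            worker = workers[(i + attempt) % len(workers)]
            try:
                results[task] = worker.run(task)
                log.append((task, worker.name))
                break
            except WorkerCrashed:
                continue  # try the next worker in line
        else:
            raise RuntimeError(f"no healthy worker left for task {task}")
    return results, log


workers = [Worker("a"), Worker("b", healthy=False), Worker("c")]
results, log = run_with_failover([1, 2, 3, 4], workers)
```

In this run, task 2 is first offered to the broken worker "b" and ends up on "c" instead. Real systems also have to detect crashes (usually with timeouts) and cope with a worker that crashed halfway through, which this sketch ignores.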

A lot of scientific research is based on these kinds of things. If you want to simulate weather effects across the whole earth, or nuclear reactors at the atomic level where there are crazy numbers of atoms, and so on, you need to take this leap. Now, how do you write the software to accomplish it in a reasonable way?
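As a rough illustration of how a simulation can be split, here is a toy sketch: a 1-D grid of values that gets smoothed at each step, cut into two halves as if each lived on its own machine. The only data the halves exchange per step is their single edge cell. The averaging rule and the function names are assumptions chosen for illustration, not how any particular weather or reactor code works.

```python
def step(cells, left, right):
    """One smoothing step; `left` and `right` are the values just outside `cells`."""
    padded = [left] + cells + [right]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3
        for i in range(1, len(padded) - 1)
    ]


def simulate_single(grid, steps):
    """Whole grid on one machine; the outer edges simply mirror themselves."""
    for _ in range(steps):
        grid = step(grid, grid[0], grid[-1])
    return grid


def simulate_split(grid, steps):
    """Same simulation with the grid cut in two; counts values sent between halves."""
    mid = len(grid) // 2
    a, b = grid[:mid], grid[mid:]
    messages = 0
    for _ in range(steps):
        a_edge, b_edge = a[-1], b[0]  # the only values that cross the "network"
        messages += 2
        a, b = step(a, a[0], b_edge), step(b, a_edge, b[-1])
    return a + b, messages


grid = [0.0, 0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0]
split_result, messages = simulate_split(grid, steps=5)
single_result = simulate_single(grid, steps=5)
```

Both versions produce the same grid, but the split version only ever sends two numbers per step no matter how large each half is. Keeping that exchanged boundary small relative to the work done inside each piece is the whole game.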

Single computers have limits:

– A single computer has limited processing speed.
– A single computer has limits to the speed and capacity of its disk, memory, and network connections.
– A single computer is located in a single physical place in a specific building, city, and country.
– A single computer might break or malfunction.

Many businesses, organizations, etc. use multiple computers because they are frustrated with these limits.

“Distributed systems” is a field of computer science. Distributed systems is the study of the theoretical properties and practical engineering problems you see when you try to use multiple computer systems.

You need to be pretty smart to work in the field. You have to basically plan how isolated computers in different locations each make their own decisions, while possibly being unaware of what other computers are doing.

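There’s a famous theoretical limit to distributed systems, called the CAP Theorem. The name stands for Consistency, Availability, and Partition tolerance. The theorem says that a distributed system can guarantee at most two of these three properties at the same time. In practice, network partitions can’t be ruled out, so the real choice is between consistency and availability for as long as a partition lasts.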
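A toy model can make the trade-off easier to see. The sketch below assumes a made-up two-node key-value store, `TwoNodeStore`, with a `partitioned` flag you flip by hand; it is not a real database or a proof of the theorem, just an illustration of the two choices a system has once its nodes can no longer talk to each other.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas of one value; `mode` picks what to give up during a partition."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False  # True means the nodes cannot reach each other

    def write(self, node_index: int, value) -> bool:
        if not self.partitioned:
            for node in self.nodes:  # normal case: update every replica
                node.value = value
            return True
        if self.mode == "CP":
            return False  # stay consistent by refusing the write (not available)
        self.nodes[node_index].value = value  # stay available, replicas now differ
        return True

    def read(self, node_index: int):
        return self.nodes[node_index].value


cp = TwoNodeStore("CP")
cp.write(0, "x")
cp.partitioned = True
cp_accepted = cp.write(0, "y")  # refused

ap = TwoNodeStore("AP")
ap.write(0, "x")
ap.partitioned = True
ap_accepted = ap.write(0, "y")  # accepted, but only node 0 sees it
```

During the partition, the "CP" store keeps both replicas in agreement by turning the write away, while the "AP" store keeps answering but lets the two replicas disagree until they can sync up again.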

If you’re a good distributed systems engineer, you can have your pick of some of the best jobs in the computer industry. Every famous high-paying prestigious tech company, and every red-hot blockchain startup, is always on the lookout for distributed systems people.