I work with super computers on some projects at my job.
In many types of science, engineering, and even movies, there are lots of tasks that can each be processed independently. Let's say you are making a movie and you are trying to simulate a fire on each frame; the scene lasts 30 seconds, and there are 24 frames per second.
For each of the 720 frames (30 seconds × 24 frames per second), the computer has to do the math to make the flame graphics look realistic, and then overlay the fire on the original picture. The more realistic the flames, the longer each frame takes to calculate. You could do this on your computer at home and process each frame in sequence, but it would take a very long time, especially because fire simulations require a lot of math and physics to look good.
The main feature of a super computer is that it has a lot of processors. You can think of a processor as a person who is doing a job. Your home computer could have anywhere from 1 to 16 workers, depending on what kind of computer you have. Let's say your fire special effects take 1 minute to calculate per "job". If you do them all in series, it would take 720 minutes. If you have a better home computer, you could (optimally) assign each of the 16 workers in your computer a frame, and they would each put the fire simulation on their own frame simultaneously. With 16 workers, your scene would (optimally) take 720 ÷ 16 = 45 minutes to finish.
A super computer has thousands upon thousands of workers. If you had access to a personal super computer for your fire scene, you could run the fire simulation on every frame at the same time. This means your scene would (optimally) be ready to watch in 1 minute.
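If it helps to see the idea in code, here's a rough sketch in Python (the `render_frame` function is a made-up stand-in for the real fire simulation, and the numbers are just the ones from the example above): every frame is an independent job, and a pool of workers chews through them in parallel. With `processes=16` you get the better-home-computer case; a super computer is the same idea with thousands of workers spread across many machines.

```python
from multiprocessing import Pool
import time

def render_frame(frame_number):
    """Stand-in for the fire simulation on one frame (pretend this takes ~1 minute)."""
    time.sleep(0.01)  # placeholder for the real physics/math work
    return f"frame {frame_number} done"

if __name__ == "__main__":
    frames = range(720)                 # 30 seconds * 24 frames per second
    with Pool(processes=16) as pool:    # 16 "workers", like the home PC above
        results = pool.map(render_frame, frames)   # each worker grabs frames until none are left
    print(len(results), "frames finished")
```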
In real life, super computers are used for movies, physics, science, engineering, and website applications. If you have a set of jobs that need to be done, and none of them depends on another job being finished first, you can give each job to a worker, and the workers can all run at the same time.
Imagine back in the old days when they would mine gold by hand. If you are running a mining company and only hire one miner, you can only mine as much gold as that one person can mine. However, if you hire 100 gold miners, you can mine gold ~100x faster. A super computer simply takes large, complex tasks, like modeling the air flow over airplanes or how water moves in the ocean, divides them up into smaller jobs, and gives those jobs to its thousands of workers so that your task can finish faster.
Hi Everyone,
Please read [**rule 3**](https://www.reddit.com/r/explainlikeimfive/about/rules) (and the rest really) before participating. This is a pretty strict sub, and we know that. Rule 3 covers 4 main things that are really relevant here:
**No Joke Answers**
**No Anecdotes**
**No Off Topic comments**
**No Links Without a Written Explanation**
This only applies at **top level**: your top level comment needs to be a direct explanation of the question in the title. Child comments (comments that are replies to comments) are fair game, so long as you don't break Rule 1 (Be Nice).
I do hope you guys enjoy the sub and the post otherwise!
If you have questions you can let us know here or in modmail. If you have suggestions for the sub we also have r/IdeasForELI5 as basically our suggestions box.
Happy commenting!
I like the top comment but I thought maybe I’d try another, quite relatable take:
Think of something like a country's election. The most recent US one is still fresh for most people. You had around 155 million votes in that election. Those all need to be counted.
If you have one guy named Steve who has to count them all one at a time, it's going to take a looooong time. So you hire 2 people. Or 3 people. Or 50 people. Or a million people.
Then you spread them around so that the votes are counted closer to their origin. Then you upgrade to machine counting, etc.
In the end you have one overall “tally” with *lots* of different pieces involved in the counting and done *much* faster than if it was just that one guy counting.
A super computer is similar in that it takes one *very* large and *very* complex task and completes it much faster than a single, simpler computer could. Your home PC is Steve. The super computer is the entire system of people counting.
In the end Steve could maybe count a few thousand votes on election night. But the entire system working together can give a pretty damn good picture of who won an election *on election night*. Much faster.
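If you want to see the counting analogy as actual code, here's a toy version in Python (the ballots are obviously made up): the votes get split into stacks, a pool of "Steves" each counts their own stack at the same time, and the partial tallies get added together at the end.

```python
from multiprocessing import Pool
from collections import Counter

def count_stack(ballots):
    """One 'Steve' tallies just their own stack of ballots."""
    return Counter(ballots)

if __name__ == "__main__":
    # Toy election: 12 ballots split into 4 stacks of 3
    stacks = [["A", "B", "A"], ["B", "B", "A"], ["A", "A", "A"], ["B", "A", "B"]]
    with Pool(processes=4) as pool:
        partial_tallies = pool.map(count_stack, stacks)   # everyone counts at once
    total = sum(partial_tallies, Counter())               # combine into one overall tally
    print(total)  # Counter({'A': 7, 'B': 5})
```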
I work on a supercomputer on a daily basis. I’m going to talk about that supercomputer, Cedar, to help you understand.
A “node” is equivalent to a single computer, but they're higher-end than a normal computer. They'll have two processors, blazing fast memory, and many forms of storage that are all faster than what you probably have in your computer. Still, nothing mind-blowing, but hang on…
Cedar has two head nodes and one scheduler node, along with 2470 compute nodes that do the actual computing. When I log into Cedar, everything I do is done on a head node. But when I want to run a program, I enter the following command:
sbatch submit.sh
which tells the scheduler to add my task to the queue. When it’s my turn, the scheduler will do whatever’s in submit.sh. An example of what I might put in that file would be:
#!/bin/bash
# Set up environment variables (paths, libraries, etc.) needed by the job
. ./env_vars.sh
# Launch WRF with 40 MPI processes
mpirun -np 40 ./wrf.exe
which would tell the supercomputer to launch wrf.exe (a weather forecasting program) with 40 processes/threads (which would all be on one node). WRF would, in turn, divvy up my weather forecast for, say, an entire state into a 5 × 8 grid and assign one grid patch to each CPU thread.
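To make that a little more concrete, here's a rough sketch of the "40 processes, one grid patch each" idea using mpi4py in Python. This is not WRF's actual code, just an illustration of how each MPI process can figure out which patch of the 5 × 8 grid it owns; you'd launch it the same way, with `mpirun -np 40`.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # 0..39 when launched with "mpirun -np 40"

ROWS, COLS = 5, 8                 # 5 * 8 = 40 patches, one per process
row, col = divmod(rank, COLS)     # which patch this process owns

# ...each rank would now run the weather physics for its own patch,
# trading boundary values with its neighbours every timestep...
print(f"rank {rank} simulates grid patch ({row}, {col})")
```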
So far, so good. Nothing I can’t do at home. But wait — what if I need to forecast the entire globe? Or run another program that requires resources far beyond what one node has? That’s where supercomputers really shine.
What makes Cedar a supercomputer, in my opinion, is the connection between the nodes. If you had a thousand computers at home, that wouldn't be a supercomputer, because they wouldn't be linked efficiently. You couldn't run a program that uses all of their resources in any efficient way, because the only way the program could communicate between the computers would be through 499,500 ethernet cables (one for every pair of computers), or an overloaded router, which is going to slow your program the fuck down.
Because, you see, the weather in one grid square affects the weather in the adjacent grid square, so WRF needs to pass massive amounts of information between the computers. Your dinky little setup won't be able to handle it. Cedar, on the other hand, has an amazing interconnect that can communicate between nodes with barely any loss in speed, with 100 Gb/s of bandwidth. (Intel Omni-Path, if you're curious and want to google it.)
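For a sense of what that node-to-node chatter looks like, here's another small mpi4py sketch (made-up data, nothing Cedar-specific): every rank ships its edge values to the next rank and receives from the previous one, which is the kind of exchange that happens every timestep and is what really stresses the interconnect.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

edge = np.full(1000, float(rank))   # pretend these are the edge values of my grid patch
halo = np.empty_like(edge)          # buffer for my neighbour's edge values

right = (rank + 1) % size           # neighbour I send to
left = (rank - 1) % size            # neighbour I receive from
comm.Sendrecv(sendbuf=edge, dest=right, recvbuf=halo, source=left)

print(f"rank {rank} got edge data from rank {left}")
```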
The upshot is that I can submit a program that uses thousands of nodes (over 100,000 threads/processes) and hundreds of terabytes of memory. So in my submit.sh, I could ask WRF to divvy up my forecast into 10,000 pieces, say, to forecast the entire globe:
mpirun -np 10000 ./wrf.exe
To answer your question, then, in my opinion, scalability is what makes a supercomputer: being able to run the truly massive tasks requiring many nodes.
Thus far no ELI5….
Humans did math with pen and paper or whatever until calculators were created, which do basic math for us quickly.
A computer is basically a calculator created to do MORE math faster. A regular calculator can do 1 “math” at a time.
A 5 GHz CPU in a desktop computer can do roughly 5 billion "maths" per second.
So a super computer is basically a bunch of regular computers working together.
So if a super computer has the equivalent of 1,000 desktop computers working together, it can do 1,000 × 5 billion = 5 trillion "maths" per second.
Being able to do as much math as 5 trillion calculators every second allows us to analyze incredibly large amounts of data faster. Data analysis is at the core of predicting things such as the weather.
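If you want to check the arithmetic yourself, here's the back-of-the-envelope version in Python (hugely simplified: it ignores multiple cores per CPU and the fact that modern CPUs can do several operations per clock cycle):

```python
maths_per_second_per_pc = 5_000_000_000   # ~5 GHz -> ~5 billion "maths" per second
pcs_in_super_computer = 1_000             # pretend the super computer is 1,000 desktop PCs

total = maths_per_second_per_pc * pcs_in_super_computer
print(f"{total:,} maths per second")      # 5,000,000,000,000 -> 5 trillion
```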
TLDR: A bunch of computers connected to each other can do Math faster than 1 computer alone.
A super computer is just a big powerful computer.
In the '60s and '70s, super computers were single powerful computers like the famous Cray-1. The Cray-1's appearance is quite distinctive and is still used as the symbol for super computers to this day. This was the era of mainframe computers, when a single computer was shared between everyone in a building using terminals. The idea of individual PCs on everyone's desk hadn't been figured out yet.
Today, though, what we call a super computer is in fact a cluster of hundreds of regular servers working together as a single machine. If you walked into a server room with a super computer in it, you probably wouldn't recognize it as such; it just looks like any other server room full of identical rack-mounted servers.
Such a computer uses distributed processing, meaning that a task is broken up into smaller pieces by a command-and-control server and distributed to individual 'nodes'. This way a super computer can spread the workload of a large mathematical problem across the nodes and perform the work much, much faster than an individual machine could.
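As a rough sketch of that split-up-and-combine idea (this is mpi4py in Python, purely illustrative, not what any particular super computer actually runs): one rank plays the command-and-control role, hands every node a piece of the problem, and combines the partial answers at the end.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    numbers = list(range(1_000_000))                    # the big job
    chunks = [numbers[i::size] for i in range(size)]    # break it into smaller pieces
else:
    chunks = None

my_chunk = comm.scatter(chunks, root=0)                 # each node gets its piece
partial = sum(my_chunk)                                 # ...and works on it locally
total = comm.reduce(partial, op=MPI.SUM, root=0)        # combine the partial answers

if rank == 0:
    print("total:", total)                              # 499999500000
```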
Super computers are used to process very large mathematical models, like calculating the trajectories of millions of theoretical stars around a galaxy over a period of millions of years.
Running such models while tweaking the variables gives us insight into the nature of things like gravity.
At a fundamental level though the individual nodes are just powerful servers with the same types of CPUs and memory as your PC. So they could theoretically do anything your PC at home does, just a lot faster.
Super computers are purpose-built to perform computing tasks at a higher level of performance than a commodity, retail, or workstation computer. Your phone today is several orders of magnitude more powerful than the first super computers, such as the famous Cray-1, but that doesn't somehow demote them from having been super computers.
Typically, super computers use processors that are designed for math operations, very much like a video card's GPU. There is a way of encoding "real" numbers into binary bits called "floating point", and these computers perform enormous numbers of floating point arithmetic calculations. A single **FL**oating point **OP**eration is a **FLOP**, and the number of FLOPs a computer can perform per **S**econd is its **FLOPS** rating.
There are, or were, general-purpose super computers that weren't especially focused on floating point operations; those counted every CPU instruction they could perform per second, not just floating point instructions, and measured their performance in **M**illions of **I**nstructions **P**er **S**econd, or MIPS.
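To put numbers on it, here's a hypothetical peak-FLOPS calculation (the machine is completely made up; the point is just how the units multiply out):

```python
nodes = 1_000
cores_per_node = 48
clock_hz = 2.5e9            # 2.5 GHz
flops_per_cycle = 16        # e.g. wide vector units doing fused multiply-adds

peak = nodes * cores_per_node * clock_hz * flops_per_cycle
print(f"peak ≈ {peak:.2e} FLOPS")   # ≈ 1.92e+15, i.e. roughly 1.9 petaFLOPS
```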
What they can do is kind of hard to say beyond the above, without also talking about what they’re used for. Today, most super computers are used for physics simulations. These simulations are most often for weather, but also for nuclear physics, astrophysics, material sciences, chemistry, and engineering.
There is a long history of super computing. The earliest machines were, compared to their contemporaries, analogous to superscalar single-core machines. These early super computers pushed the boundaries of the materials, signaling, and propagation limits of the circuitry of their day. The Cray-1 is regarded as one of, if not THE, first super computers, and while integrated circuits had been invented by then, they were so new that Seymour Cray basically didn't trust them; he built his first machine using discrete components.
Eventually, miniaturization smacked face first into the physical limits of quantum physics, so making individual processors smaller and faster (what we call vertical scaling) was no longer a way to chase down performance. The machines stopped scaling vertically and instead went horizontal: more cores, more nodes.
If you saw a super computer today, you would see racks and racks of nodes in a data center. These are not individual computers in a closed network! These nodes cannot "boot" and operate on their own; they don't each run their own operating system, and they don't each have their own video port or their own disk drives. These nodes are little more than CPUs, and do not operate in a standalone fashion. The network that combines the nodes isn't your typical Ethernet network, either. The sum of these nodes and the network that interconnects them constitutes a single computer with hundreds of thousands of cores. Your computer has many compute cores within it: just look at Task Manager. These computers have many cores within each CPU, and many CPUs, not just within a single node but spread out among hundreds and thousands of nodes.
Then there are compute clusters. These are a distinct class of computing. They're not super computers, though they get called that often enough. These are indeed individual computers: each one can run standalone, and each one is made from off-the-shelf parts. They're just arranged with a private network and special software that treats each node in the compute cluster as a worker node. Work is farmed out in batches to nodes that need work. You can assemble one of these in your own home, and they're popular among businesses and universities on a budget. A classic cluster setup would be a Beowulf cluster, though that's not as relevant in the modern era.
There are grid computers. A grid is similar to, but distinct from, a cluster in that each unit in the grid can be doing different work, whereas a cluster is focused uniformly on a single goal.
Then there is distributed computing. This is where computers all across the internet can download a client that manages the work to be done. SETI@Home and Folding@Home were and are popular examples: since your computer spends most of its time idle, the client uses that idle time to perform work.
And then there are mainframe computers. These are not super computers, and the term does not mean some really old computer of a bygone era. In fact, the business behind mainframe computers is alive and very well, won't go away any time soon, and is even expanding. Mainframe computers do things that none of these other computers can do: they perform transactional computation, an all-or-nothing sort of thing; either the computation is complete, verified, and committed, or it doesn't happen at all. This is *exactly* the sort of computation you want when handling financial transactions, like when you pay with a credit card or transfer money. The other aspect of a mainframe that is unique is its high throughput, which is a very specific metric. Just one of these mainframe computers can perform billions and billions of transactional computations in a single day, non-stop, uninterrupted, for years or decades at a time.

I once got a tour of one of these machines, which takes up several racks in a data center, and the largest ones made today have a couple thousand processors and modules just dedicated to moving and managing data to keep the CPUs fully saturated with work. Nearly the entire financial system worldwide is built upon these things, as are many government systems. When COVID hit and the US unemployment offices got completely slammed, remember when New York Governor Cuomo blamed the decades-old mainframes? Nope, those babies were humming along just fine, underwhelmed. It was the more conventional web frontends that absolutely crumbled under the load. But mainframes are not super computers; they are only good for the type of work that they do and are thus very special-purpose.
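If the "all or nothing" part is unclear, here's a toy illustration in Python using SQLite (which is nothing like a real mainframe transaction processor, but the commit-everything-or-roll-back-everything idea is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

with conn:  # one transaction: both updates happen, or neither does
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
    # if anything raised an exception in this block, both updates would be rolled back

print(conn.execute("SELECT * FROM accounts").fetchall())
# [('alice', 70), ('bob', 30)]
```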
There is cloud computing, which is what my company does: basically, you pay someone else to manage the hardware, and the client rents the use of computing resources. Lots of companies choose cloud platform providers because owning and maintaining hardware can be expensive. Some providers have performance-oriented offerings.
Finally, there's High Performance Computing. This is like a mini super computer. You see Bitcoin miners do stuff like this all the time, where they'll either stuff a computer with as many video cards as possible to act as math processors, or they'll use custom FPGAs or ASICs (basically, purpose-built custom processors).
Supercomputers are special-purpose computers that link together higher-end components from typical computers using specialized hardware to achieve state-of-the-art processing power.
Many supercomputers use hundreds of thousands of individual processors working together to coordinate and solve some problem. But that alone does not make a supercomputer: for example, Google's MapReduce uses large numbers of commodity-class machines to perform distributed and parallel computation. Supercomputers also generally have extremely fast memory, so the thousands of processors can work together on the same problem while sharing (mostly) the same memory. This is facilitated by extremely fast networking hardware called the "interconnect": high-end supercomputers make talking to another machine almost as fast as (or at least much closer to the speed of) local RAM.
In practice, different supercomputers are organized in different ways to optimize for different kinds of workloads they handle. You can read more on wikipedia: [https://en.wikipedia.org/wiki/Supercomputer_architecture](https://en.wikipedia.org/wiki/Supercomputer_architecture)