ELI5: How does computer memory actually work?


So I might be overthinking it, but I understand that machines store information in certain locations on a drive but how exactly does it work? Like how can a usb drive store thousands of songs?

Edit: just to be clear I mean how does it physically work

First: how does it work conceptually?

Data is stored in the form of ones and zeros using binary. Any type of data can be encoded this way, including songs. A USB drive contains space for many, many ones and zeroes and so can store many songs.
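To make "any data can be encoded as ones and zeros" concrete, here is a minimal Python sketch that turns a short piece of text into bits and back again. The helper names `to_bits` and `from_bits` are made up for this example; a song file works the same way, it is just many more bytes.

```python
def to_bits(data: bytes) -> str:
    """Return the bytes as a string of 1s and 0s, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in data)


def from_bits(bits: str) -> bytes:
    """Rebuild the original bytes from a string of 1s and 0s."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "Hi".encode("ascii")           # text -> bytes
bits = to_bits(message)                  # bytes -> ones and zeros
print(bits)                              # 0100100001101001
print(from_bits(bits).decode("ascii"))   # Hi
```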

So how does it do this? USB drives, also called “flash drives”, store data using something called “flash memory” (newer drives use other forms of solid-state storage, but that’s beside the point). Flash memory is slower than the in-your-computer memory, but it doesn’t reset itself when the power is lost. Think of a flash drive like millions and millions of tiny little bottles, each of which can hold a tiny electrical charge. A full bottle represents a 1, an empty bottle represents a 0.

That’s a fair analogy. In reality, this memory is made up not of bottles, but of an array of tiny electrical components–but the premise is the same. Each of those tiny electrical components can store a small electrical charge to represent a 1, or can be uncharged to represent a 0.
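To put rough numbers on "millions and millions of bottles", here is a back-of-the-envelope calculation. The drive size and song size are assumptions picked for illustration, and it assumes one bottle per bit, which real drives do not always follow.

```python
DRIVE_BYTES = 32 * 10**9   # assumed: a 32 GB drive
SONG_BYTES = 4 * 10**6     # assumed: roughly 4 MB per compressed song
BITS_PER_BYTE = 8

bottles = DRIVE_BYTES * BITS_PER_BYTE   # assumed: one bottle per bit
songs = DRIVE_BYTES // SONG_BYTES

print(f"{bottles:,} bottles")   # 256,000,000,000 bottles
print(f"{songs:,} songs")       # 8,000 songs
```

So under these assumptions it is closer to hundreds of billions of bottles than millions.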

The computer can read these charges in a stream of 1s and 0s to get a sequence of data. That sequence can be taken back up through the layers of the computer to become a picture on the screen, or sound coming from the speakers.

First, there is a [memory hierarchy](https://en.wikipedia.org/wiki/Memory_hierarchy); take a look at the pyramid. You want to get data into the working bits of this electric machine we call a computer. The slowest memory would be you – your brain, your observation, your manual input into the machine by toggling buttons… That’s not represented in the pyramid, but data does come out of the physical world, and that only happens because of humans in the first place.

So we’ve got to get it to the working parts of the machine. The working parts of the machine are REALLY SUPER FAST, and typically, your CPU is **STARVING** for data. So that’s where the hierarchy comes into play in the first place. We can make memory fast first and foremost by PHYSICALLY locating it as close as possible to the working parts of a CPU. That’s register memory. These are small storage units of literally just individual bits or maybe “word” sizes, and they are the inputs and outputs of the actual machine. Next up are the CPU caches, which are larger banks of data, but by necessity, they’re further away from the CPU.

Also, you have to consider how memory is physically built. Registers and cache lines are made from flip-flop circuits, from transistors. They’re fast as all hell, but generate a ton of heat and take up a lot of physical space. That means their data storage isn’t very dense for their area, and they’re expensive to operate because of power costs. And because they’re so big, some of this technology starts getting so far away from the CPU core that the speed of light starts to have a really significant effect, and we’re back to starving the CPU for data because you just can’t move signals across copper fast enough. That’s why there’s an L1, L2, and sometimes an L3 cache on a CPU.
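The speed-of-light point is easy to check with arithmetic. This sketch works out how far a signal can travel in one clock cycle; the 3 GHz clock and the "signals move at about half the speed of light in a wire" factor are assumptions for illustration, not measurements.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
CLOCK_HZ = 3_000_000_000   # assumed: a 3 GHz CPU
SIGNAL_FRACTION = 0.5      # assumed: signals in a wire move at about half light speed

cycle_seconds = 1 / CLOCK_HZ
light_cm = SPEED_OF_LIGHT_M_PER_S * cycle_seconds * 100
signal_cm = light_cm * SIGNAL_FRACTION

print(f"light per cycle:  {light_cm:.1f} cm")    # light per cycle:  10.0 cm
print(f"signal per cycle: {signal_cm:.1f} cm")   # signal per cycle: 5.0 cm
```

A few centimeters per cycle is not much when the data also has to make the return trip, which is why distance from the core matters so much.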

System memory – RAM – is much bigger, much denser, much cheaper, much cooler (though modern RAM runs hot enough to burn you if you touch it), and much, MUCH slower. It uses capacitors instead: tiny electronic components that can hold a charge through capacitance, and dissipate it quickly. To read a value you discharge a capacitor (or not) down a signal wire, which will have a feedback loop to recharge that individual cell. This is SO much slower because flip-flops are stable so long as they’re energized – once zero or one, they have a signal wire that is always indicating a zero or a one. But with capacitors, you have to drain the damn things, and that takes time. And then you have to recharge them, which takes time. And then there’s that physical distance, all the way across the motherboard.
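The drain-then-recharge cycle can be sketched as a toy model. `DramCell` is a made-up class, not how any real controller is written, and the `operations` counter is only a stand-in for elapsed time.

```python
class DramCell:
    """Toy model of one DRAM capacitor; not a timing-accurate simulation."""

    def __init__(self) -> None:
        self.charged = False
        self.operations = 0   # counts drain/recharge steps as a stand-in for time

    def write(self, bit: int) -> None:
        self.charged = bool(bit)
        self.operations += 1

    def read(self) -> int:
        # Reading drains the capacitor onto the signal wire...
        bit = int(self.charged)
        self.charged = False
        self.operations += 1
        # ...so the feedback loop has to write the value back afterwards.
        self.write(bit)
        return bit


cell = DramCell()
cell.write(1)
print(cell.read(), cell.read())   # 1 1  (the value survives thanks to the recharge)
print(cell.operations)            # 5
```

Every read costs two steps here (drain, then recharge), whereas a flip-flop would just be looked at.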

System memory is organized in terms of a [geometry](https://en.wikipedia.org/wiki/Memory_geometry). They say computers are byte addressable, and that’s not exactly true. Physical memory is not byte addressable, the smallest unit is based on its geometry. So when you want to read a particular byte from memory, you have to fetch the smallest unit that byte exists in, and then subdivide from there.
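Here is a small sketch of "fetch the smallest unit, then subdivide". It assumes memory hands back 8 bytes at a time and that the lowest byte of each unit has the lowest address; both are assumptions for the example, since the real unit size depends on the geometry.

```python
WORD_BYTES = 8  # assumed: memory hands back 8 bytes at a time

# Hypothetical physical memory: a list of 64-bit units.
words = [0x0706050403020100, 0x0F0E0D0C0B0A0908]


def read_byte(address: int) -> int:
    word = words[address // WORD_BYTES]    # fetch the whole unit containing the byte
    offset = address % WORD_BYTES          # position of the byte inside that unit
    return (word >> (8 * offset)) & 0xFF   # shift and mask to isolate it


print(read_byte(3))    # 3
print(read_byte(10))   # 10
```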

So that’s the physical side, then there’s the virtual side. You see, memory addresses are an abstraction, since no physical part of the circuitry actually functions on a per-byte level. And abstraction is good, as we’ll get into.

Enter virtual addressing. Every process running on the machine believes it’s the only process running on the machine, and that the whole entire memory address space is its sole domain. In truth, the fact that multiple processes are all running is hidden by the virtual address subsystem. So two programs can have data at memory address 7, but there’s a level of indirection where that actually points to different physical addresses in system memory. The processes don’t have to know or care. This also lets the computer reorganize data without the process being aware. That’s what swapping or pagefiles are all about – some program isn’t running that much? Move its data out of RAM and into swap space on the disk to make room in RAM for other processes that have higher demand. When that swapped-out process needs to access its data, you can swap it back in literally anywhere in RAM and update the mapping.
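A rough sketch of that indirection, using plain dictionaries. The process names, addresses, and the `swap_out`/`swap_in` helpers are all invented for illustration; a real OS does this per page, in hardware-assisted tables.

```python
physical_ram = {}   # physical address -> value
swap_space = {}     # (process name, virtual address) -> value parked on disk

# One mapping per process: virtual address -> physical address.
page_tables = {"music_player": {7: 100}, "browser": {7: 200}}
physical_ram[100] = "song data"
physical_ram[200] = "web page"


def read(process: str, virtual_address: int) -> str:
    return physical_ram[page_tables[process][virtual_address]]


def swap_out(process: str, virtual_address: int) -> None:
    physical = page_tables[process].pop(virtual_address)
    swap_space[(process, virtual_address)] = physical_ram.pop(physical)


def swap_in(process: str, virtual_address: int, new_physical: int) -> None:
    physical_ram[new_physical] = swap_space.pop((process, virtual_address))
    page_tables[process][virtual_address] = new_physical


print(read("music_player", 7), "|", read("browser", 7))   # song data | web page
swap_out("browser", 7)
swap_in("browser", 7, new_physical=300)   # comes back somewhere else in RAM
print(read("browser", 7))                 # web page
```

Both programs use address 7 the whole time; only the table behind it changes.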

Every process has a little table of what memory it has allocated. So when you want to read or write, the address you request is mapped through this table – it’s how virtual addressing works. If the address you want isn’t in that table, that’s an access violation. This is how programs don’t just go off and read data owned by other programs. There are ways to do it – you need to access memory as though it were a hardware device.
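The table lookup and the access violation can be sketched like this. The 4096-byte page size is an assumption (common, but not stated above), and `translate` and `AccessViolation` are hypothetical names for this example.

```python
PAGE_SIZE = 4096  # assumed page size; common, but not stated above


class AccessViolation(Exception):
    """Raised when a process touches an address it has not been given."""


def translate(page_table: dict[int, int], virtual_address: int) -> int:
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise AccessViolation(hex(virtual_address))
    return page_table[page] * PAGE_SIZE + offset


table = {0: 5}                # virtual page 0 lives in physical frame 5
print(translate(table, 7))    # 20487
try:
    translate(table, 3 * PAGE_SIZE)
except AccessViolation as err:
    print("access violation at", err)   # access violation at 0x3000
```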

And that’s another interesting thing about virtual addressing – it doesn’t just represent RAM, but really almost anything. Any hardware inputs and outputs can be mapped to memory addresses, so that reading and writing to them routes that data to those devices. Even other programs – it’s how one program talks to another, or shares data between them. This mapping doesn’t even have to touch system memory. That’s what memory-mapped I/O is all about – memory addresses that point right at a device instead of at RAM.

Virtual addresses on modern x86_64 processors are 64 bits wide, but most Intel or AMD processors currently use only 48 of those bits, which gives 2^48 addressable bytes. There are some super computers that can address 2^54. So any process can use at most that many bytes of address space for system memory, for hardware or software mapping, whatever. Alongside each mapping, the page tables keep a boolean flag field – true/false values, such as if that memory is readable, writable, executable, etc. This is why you can’t just write bytes to memory and then execute them – those addresses need to be flagged as executable. Hackers have to play some REALLY CLEVER TRICKS to manage to get their exploit code into executable memory and then called upon.
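Two quick illustrations of the numbers and flags here: how big 2^48 bytes actually is, and how a set of true/false permission flags can be packed into bits and checked. The `Permission` names and bit positions are made up for this sketch and do not match any real processor layout.

```python
from enum import IntFlag


class Permission(IntFlag):
    # Hypothetical flag bits, chosen only for illustration.
    READ = 1
    WRITE = 2
    EXECUTE = 4


addressable_bytes = 2 ** 48
print(addressable_bytes // 2 ** 40, "TiB")   # 256 TiB

data_page = Permission.READ | Permission.WRITE
code_page = Permission.READ | Permission.EXECUTE

print(bool(data_page & Permission.EXECUTE))  # False: writing bytes here won't make them runnable
print(bool(code_page & Permission.EXECUTE))  # True
```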

There are data sets that are much, much, much larger than 2^48 bytes. To handle that, there’s memory mapping, which is simply saying that some bytes in my virtual address space access a range of the file at some offset. When you open a large file in a program, you only need to load a small window of bytes into memory – the bytes you’re actually working on at that moment. This is how it’s done. Some programs don’t do this, as their implementation is naive. `notepad.exe`, for example, definitely has file size limits.
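Python's standard `mmap` module can show the "small window at some offset" idea. This is a minimal sketch: `read_window` is a made-up helper, and the demo file is tiny so it runs anywhere, but the same call works on a file far larger than RAM.

```python
import mmap
import os
import tempfile


def read_window(path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset` without reading the whole file."""
    granularity = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // granularity) * granularity   # mmap offsets must be aligned
    delta = offset - aligned
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as window:
            return window[delta:delta + length]


# Demo on a throwaway file (a stand-in for a genuinely large one).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 100_000 + b"needle" + b"y" * 100_000)

print(read_window(tmp.name, 100_000, 6))   # b'needle'
os.remove(tmp.name)
```

Only the mapped window is exposed to the program; the operating system pulls in the underlying bytes when they are touched.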

Physically speaking, we use [floating gate transistors](https://www.researchgate.net/profile/Nahid_Hossain/publication/280878435/figure/fig1/AS:[email protected]/Schematic-diagram-of-a-floating-gate-transistor.png) to either store electrons in the floating gate (which means a 1) or to not (0).

A transistor is just a digital switch. If you apply a voltage to the control gate, then the switch closes through the creation of a “channel” from the source to the drain. Normally, when you remove that voltage, the channel dissipates, opening the switch, but with a floating gate, you can apply a higher voltage that forces the electrons to effectively teleport (quantum tunnel) into a little space near the control gate, called the floating gate, making it close permanently, even when that voltage you applied to the control gate is removed. You can also apply a negative voltage to the control gate to drain the floating gate and force the electrons out.

By making the switch close permanently, you can try to push current from the source to the drain. If it works, then that means the floating gate is charged (you stored a 1) and if not, then you haven’t. This works even after the machine has lost power. But over time the gate will leak, so it’s not truly permanent storage.
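As a toy model of the cell described above (not a physical simulation; voltages and leakage are ignored, and `FloatingGateCell` is a made-up name), the three operations look like this:

```python
class FloatingGateCell:
    """Toy model of one flash cell; voltages and leakage are ignored."""

    def __init__(self) -> None:
        self.trapped_electrons = False   # state of the floating gate

    def program(self) -> None:
        # High voltage on the control gate: electrons tunnel into the floating gate.
        self.trapped_electrons = True

    def erase(self) -> None:
        # Negative voltage on the control gate: electrons are forced back out.
        self.trapped_electrons = False

    def read(self) -> int:
        # Try to push current from source to drain; following the description
        # above, it flows only if the floating gate is charged.
        return 1 if self.trapped_electrons else 0


cell = FloatingGateCell()
cell.program()
# Nothing in the model needs power to keep `trapped_electrons` set.
print(cell.read())   # 1
cell.erase()
print(cell.read())   # 0
```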

We achieve high data storage by making these transistors *very* small. Current channel widths are around 14 nm, or 0.000000014 m.