eli5: what does (de-)fragmentation even mean?

After reading another post about why defragmentation isn’t as necessary with modern devices, I started wondering what exactly fragmentation even is. How and why does it happen, and doesn’t it screw up your data?

11 Answers

Anonymous

Computer scientists have to get very clever sometimes. I bet fragmentation caused plenty of machines serious trouble before there was a good solution to it. First, what it is and how it happens. Let’s say you’ve got a woodworking shop. You’re working on a piece of furniture. Every night, you go downstairs and take the piece out of its drawer, work on it for a few hours, then put it back. Maybe you’ve got a couple of side projects you work on to take a break every once in a while. They’ve all got their spots in the project cabinet. One day, you assemble two parts of the chair or whatever, but the problem is, it no longer fits in the spot you had set aside for it. With woodworking, your only recourse is to find a completely new location for the project that will accommodate its size. Maybe a smaller project will come along and fit the empty space; otherwise, you’re just making inefficient use of your storage.

Computers have a fancy trick up their sleeve. They keep a lookup table that always sits in a known place (and, in the simplest designs, is always the same size). The lookup table lists the location and size of each of the pieces that make up your project. The pieces can be assembled on the fly when you need to work on the project, then disassembled and placed back in their own drawers when you’re done. If a file gets too big for the location allocated to it and would spill over into other files, the file system splits it apart, finds a new location for the new piece, and adds a new entry in the lookup table so it can find it again later. That splitting is fragmentation. It doesn’t screw up your data, because the table always knows where every piece is.
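If it helps to see the lookup table idea in code, here is a minimal toy sketch. It is not how any real file system is implemented; the 12-block "disk", the `write` function, and the file names are all made up for illustration.

```python
# Toy model (hypothetical): the disk is a row of numbered blocks, and the
# lookup table maps each file name to a list of (start, length) pieces.
DISK_SIZE = 12
disk = [None] * DISK_SIZE   # None means the block is free
table = {}                  # file name -> list of (start, length)

def write(name, n_blocks):
    """Add n_blocks to a file, using the first free blocks found."""
    pieces = table.setdefault(name, [])
    for i in range(DISK_SIZE):
        if n_blocks == 0:
            break
        if disk[i] is None:
            disk[i] = name
            # Extend the last piece if this block directly follows it,
            # otherwise start a new piece (a new fragment).
            if pieces and pieces[-1][0] + pieces[-1][1] == i:
                start, length = pieces[-1]
                pieces[-1] = (start, length + 1)
            else:
                pieces.append((i, 1))
            n_blocks -= 1
    if n_blocks:
        raise RuntimeError("disk full")

write("chair", 3)   # lands in blocks 0-2
write("table", 2)   # lands in blocks 3-4
write("chair", 2)   # chair grew: no room next to it, so it lands in 5-6

print(table["chair"])  # [(0, 3), (5, 2)]
```

The chair file now lives in two pieces, and the table is the only thing that knows they belong together.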

I’m going to switch metaphors for defragmentation. It also allows me to explain why fragmentation is not an issue on solid state devices, the kind that are in most modern computers now. Let’s say your favorite movie theater is Marcus, your favorite local sports team plays at Allianz Field, and there’s a nice little dive bar 8 miles away that you like to go to and listen to the live musicians they host. Meanwhile, Bob Floozy (some guy in town) prefers spending most of his free time at the AMC theater, Target Field, and a small concert hall.

You both spend a fair bit of time on the road driving to your favorite places of leisure. This isn’t a perfect analogy, because many people go to these places, so for the sake of this example, pretend you two are the only people who use the establishments listed. It would be so much more efficient if we moved your favorite establishments into your neighborhood and all of Bob’s favorite places to his neighborhood. You can get to what you love in less time and spend less time waiting. This is defragmenting. Pieces of files are moved around so they’re all in one place. This is important because on a spinning disk hard drive, where the data physically sits matters. If one piece of a file is stored on the left side of the disk, and another piece is stored on the right side, it takes time to spin the disk and move the read/write head to the correct location. Moving the pieces of a file together means less physical movement, less time wasted in travel, and quicker computing.
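Here is a rough sketch of that "less travel" idea. It is a toy, not a real defragmenter: the disk is just a list of block owners, and `head_travel` and `defragment` are hypothetical helpers that assume a file's blocks are read in the order they sit on the disk.

```python
def head_travel(layout, name):
    """Total distance the head moves between the blocks of one file."""
    positions = [i for i, owner in enumerate(layout) if owner == name]
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))

def defragment(layout):
    """Regroup blocks so each file is contiguous; free space goes at the end."""
    names = []
    for owner in layout:
        if owner is not None and owner not in names:
            names.append(owner)
    packed = [owner for name in names for owner in layout if owner == name]
    return packed + [None] * (len(layout) - len(packed))

fragmented = ["you", "bob", None, "you", "bob", "bob", None, "you"]
tidy = defragment(fragmented)

print(head_travel(fragmented, "you"))  # 7
print(tidy)  # ['you', 'you', 'you', 'bob', 'bob', 'bob', None, None]
print(head_travel(tidy, "you"))        # 2
```

Same blocks, same data, just rearranged so the head has less ground to cover.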

Solid state devices are not immune to fragmentation. In fact, there is absolutely no difference in how files need to be stored when you’re finished working on them. The difference is in retrieving files. You ever wonder how a USB stick stores billions of bytes of information, accessible through only 4 pins (2 of which are for power), with no moving parts? Every storage location can be reached electronically. All that needs to be done is switch the appropriate transistors on and off to connect that location to the output at the right time (done by a small processor onboard). This is very fast, and nothing has to physically move. SSDs are also smaller than hard drives, mostly because they don’t need the moving mechanism and the casing around it. This is where my analogy comes back into play. Instead of driving to the theater, stadium, or bar, you can turn on the TV and flip to a movie, sports, or music channel at will without moving. It doesn’t matter that the venues are scattered all around town: at the click of a button, a camera in a stadium sends its data through some antennas and eventually to your screen. No need to defragment, because the distance between you and the source of entertainment doesn’t matter.
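To put the TV-versus-driving point in numbers, here is a tiny made-up cost model. The figures are not real drive timings; `read_time` and its parameters are assumptions chosen only to show the shape of the effect.

```python
def read_time(positions, per_block, per_unit_distance):
    """Made-up cost: a fixed cost per block plus a cost for distance between blocks."""
    travel = sum(abs(b - a) for a, b in zip(positions, positions[1:]))
    return per_block * len(positions) + per_unit_distance * travel

scattered = [0, 300, 700]   # a file in three far-apart pieces
together = [0, 1, 2]        # the same file after defragmenting

# Hypothetical numbers: distance costs something on a hard drive, nothing on an SSD.
hdd = dict(per_block=1, per_unit_distance=1)
ssd = dict(per_block=1, per_unit_distance=0)

print(read_time(scattered, **hdd), read_time(together, **hdd))  # 703 5
print(read_time(scattered, **ssd), read_time(together, **ssd))  # 3 3
```

On the pretend hard drive, tidying up makes a huge difference. On the pretend SSD, the scattered and tidy layouts cost exactly the same, which is why defragmenting buys you nothing there.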