eli5: what does (de-)fragmentation even mean?

After reading another post about why defragmentation isn’t as necessary with modern devices, I started wondering what exactly fragmentation even is. How and why does it happen, and doesn’t it screw up your data?

11 Answers

Anonymous 0 Comments

My dad always told me it was like tidying your room and organising everything alphabetically, but on your C: drive 😂

Anonymous 0 Comments

Old mechanical hard drives, with spinning platters and a read/write head on a moving arm, took longer when the head had to travel to multiple places on the magnetic disk to pull together all the bits of data associated with a program. Defrag consolidates that data into one physical chunk on the disk, so the head doesn’t have to move around as much or travel as far to access it. Modern solid state drives have no moving parts; instead, a controller keeps an address for each piece of data, so there is no meaningful speed decrease from accessing data in multiple places on the drive. The lookup is purely electronic, with no mechanical travel involved, so it doesn’t matter if the data sits at address #1 or #736347: the speed is essentially the same. Defrag is not necessary for an SSD.

Anonymous 0 Comments

Let’s say you have a hard disk with 10 files on it, and file #4 is 8 blocks long. If you delete file #4, there is an 8 block “hole”. If you add a new 12 block file, there might be space for it at the end of the 10 files. At some point, however, there won’t be enough unused space at the end to fit the next file. Eventually, 8 blocks of some file will be put where file #4 used to be and the rest of that file will be someplace else. That file will be in two fragments, one where file #4 used to be and another one someplace else. If you do this enough and the disk has little free space, eventually you’ll only have a bunch of 1 block holes. That means a new 15 block file will be split into 15 fragments, and that can take roughly 15 times as long to read as if the file was all together, because the drive has to reposition for every fragment. SSDs have no mechanical seek time, which essentially eliminates this “longer time to read” effect.
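To make the hole-filling idea concrete, here is a minimal sketch in Python. It assumes a toy disk that is just a list of blocks and an allocator that always grabs the lowest-numbered free blocks; the names `allocate`, `delete`, and `count_fragments` are made up for illustration, and real filesystems are considerably smarter than this.

```python
def allocate(disk, name, size):
    """Write `size` blocks of file `name` into the lowest-numbered free blocks.

    `disk` is a list where None marks a free block. Returns the block
    indices used, or raises if there is not enough free space.
    """
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < size:
        raise ValueError("not enough free blocks")
    used = free[:size]
    for i in used:
        disk[i] = name
    return used


def delete(disk, name):
    """Mark every block owned by `name` as free again."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def count_fragments(blocks):
    """Count runs of consecutive block numbers in a sorted list."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


disk = [None] * 24
allocate(disk, "A", 6)   # blocks 0-5
allocate(disk, "B", 8)   # blocks 6-13, playing the role of file #4
allocate(disk, "C", 6)   # blocks 14-19
delete(disk, "B")        # leaves an 8 block hole

# A 12 block file: 8 blocks land in the hole, the other 4 at the end.
new_blocks = allocate(disk, "D", 12)
print(new_blocks)                   # [6, 7, 8, 9, 10, 11, 12, 13, 20, 21, 22, 23]
print(count_fragments(new_blocks))  # 2
```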

Defragmentation is the process of copying all the files into a pattern where every file only needs one fragment. This involves a lot of copying and work, particularly if your disk has little free space.
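As a rough sketch of what that copying achieves, the Python below rebuilds a toy block map so every file ends up in one piece with the free space gathered at the end. The `defragment` function and the list-of-blocks disk are assumptions for illustration only; a real defragmenter has to shuffle blocks in place, using whatever free space exists as scratch room, which is why a nearly full disk makes the job so slow.

```python
def defragment(disk):
    """Return a new block map where each file is contiguous.

    `disk` is a list of file names, with None for free blocks. Files are
    laid out in the order they first appear; free space goes at the end.
    """
    order = []    # file names in order of first appearance
    counts = {}   # how many blocks each file owns
    for owner in disk:
        if owner is None:
            continue
        if owner not in counts:
            order.append(owner)
            counts[owner] = 0
        counts[owner] += 1

    compacted = []
    for name in order:
        compacted.extend([name] * counts[name])
    compacted.extend([None] * (len(disk) - len(compacted)))
    return compacted


fragmented = ["A", "D", "D", None, "C", "D", None, "A"]
tidy = defragment(fragmented)
print(tidy)  # ['A', 'A', 'D', 'D', 'D', 'C', None, None]
```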

Today, hard drives are big and running them almost completely full is uncommon, so the problem doesn’t occur much. The newest operating systems work to minimize fragmentation, and SSDs are worn down by frequent defragmentation, so it’s not a common thing to do manually anymore.

Anonymous 0 Comments

Imagine you have a book (your hard drive). It has a story, but the data is spread out: some words on page 1, a few more on page 2, a bunch on page 3, and so on. This is done by the computer to fill up any available space. Defrag essentially takes all the words and puts them back together on consecutive pages, which makes the story faster to read and therefore more efficient.

Anonymous 0 Comments

Let’s say you have a library. And you are in the mood to read a book. So you just grab the book and read it, front to back. EZ.

But, let’s say, when you’re trying to put the book back, for whatever reason, there isn’t enough room for it to fit on the shelf. So, instead, you rip the book in half, put half of it on the shelf where it used to be, and the other half in the next available free spot. You also include a note with the first half explaining where to find the second.

The next time you’re in the mood to read that book, you’ll grab the first half, read it, but then you’ll have to stop and find the second half to continue reading.

This is fragmentation. And if it happens a lot, it can really slow down the loading and reading of files.

*De*fragmentation is the process of sitting down, taking all the books off the shelves, putting them back together, and putting them back on the shelves, whole.

It’s not really a thing anymore because the ways in which we store and organize files are better, the ways in which we load and read files are better, and the configuration and speed of newer hardware (e.g. SSDs) make fragmentation largely irrelevant.

Anonymous 0 Comments

When we remove files from the disk, we leave holes where those files were. When the next file comes in, and it’s larger than that hole, we fill in as much of it as we can into that hole, then leave a note at the end saying “rest of it is over in X”, where we put the rest of the file. For larger files (or smaller holes), we can end up splitting the file into several different pieces (fragments – fragmentation). Since going between disk locations is costly (less so now than it was before), every time we read one of those split files, it takes us longer than if the pieces were all neatly put together.

As a side-note, those little notes we leave for “rest of it is over in X” still take space, and given enough splitting, might actually have a noticeable effect on how much of your disk you can use.

De-fragmentation is re-organizing the disk so files are made contiguous again.

Anonymous 0 Comments

Data is arranged on a hard disk in concentric circles. To access another one of these circles, the disk drive has to move the magnetic head. This takes some length of time, which adds up if many such jumps are required.

Initially all files are written to a disk in a sequential fashion, one after another. Some files later get deleted and the space they took up is marked as free. When a new file is written, it may need to occupy several of these smaller gaps and now consists of disjoint segments. These segments are tracked in a table so the file can be reassembled, but reading them all takes extra time.

Anonymous 0 Comments

So imagine a traditional hard drive – you have a spinning platter and a head that moves around on it.

Here’s a simplified example.

When you write a file called “myfile” to disk, you create an entry in a directory table saying “myfile starts on sector 17.” Now maybe your file is 6 sectors long, but sector 21 is used by something else. So you have sectors 17, 18, and 19, and in sector 20 you say “the rest of this file starts at sector 44.”

So your drive head moves over to sector 44, and reads sector 44, 45, and 46 to get the complete file.

Your file is in two fragments.
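Here is that same example as a small Python sketch. The layout is entirely hypothetical: the directory storing a start sector plus a length, and the `"jump"` marker standing in for the “rest of this file starts at…” note, are assumptions made for illustration and do not match any particular real filesystem.

```python
# Directory table: file name -> (first sector, number of data sectors).
directory = {"myfile": (17, 6)}

# Each sector holds either a piece of data or a note saying where the
# file continues.
sectors = {
    17: ("data", "part1"),
    18: ("data", "part2"),
    19: ("data", "part3"),
    20: ("jump", 44),       # "the rest of this file starts at sector 44"
    44: ("data", "part4"),
    45: ("data", "part5"),
    46: ("data", "part6"),
}


def read_file(name):
    """Follow the chain of sectors and return (data pieces, fragment count)."""
    sector, length = directory[name]
    data, jumps = [], 0
    while len(data) < length:
        kind, value = sectors[sector]
        if kind == "jump":
            sector = value      # the drive head has to move here
            jumps += 1
        else:
            data.append(value)
            sector += 1         # next sector is right alongside
    return data, jumps + 1


pieces, fragments = read_file("myfile")
print(pieces)     # ['part1', 'part2', 'part3', 'part4', 'part5', 'part6']
print(fragments)  # 2
```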

Now if you have a lot of big files that are constantly being written and re-written to disk, and especially if the disk is mostly full, the odds are very good that your file will get split up into a LOT of fragments, and every time you have to jump to a new fragment, you are probably going to have to move the head somewhere and then wait for the right portion of the disk to spin under it. This can definitely increase the time to read a file.
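For a feel of how much this can hurt, here is a back-of-the-envelope model in Python. The default numbers (9 ms average seek, 7200 RPM, 150 MB/s transfer) are assumed, ballpark figures for a consumer spinning drive rather than measurements, and the model ignores caching and clever read ordering.

```python
def read_time_ms(size_mb, fragments, seek_ms=9.0, rpm=7200,
                 transfer_mb_per_s=150.0):
    """Estimate the time to read a file split into `fragments` pieces."""
    # On average the right spot is half a revolution away from the head.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    positioning_ms = fragments * (seek_ms + rotational_latency_ms)
    transfer_ms = size_mb / transfer_mb_per_s * 1000
    return positioning_ms + transfer_ms


contiguous = read_time_ms(100, fragments=1)
scattered = read_time_ms(100, fragments=2000)
print(f"{contiguous:.0f} ms")  # roughly 680 ms
print(f"{scattered:.0f} ms")   # roughly 27000 ms
```

Under these assumptions the transfer itself takes the same time in both cases; almost all of the difference is the head repositioning once per fragment.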

Defragmenting does its best to undo this process – it moves data blocks around on the disk to consolidate some free space, and then moves a fragmented file into that free space. Then repeats this many many times.

So, **why isn’t it as important any more?**

Firstly because we have solid-state drives (SSD), which don’t have physical moving parts to shuffle around. Reading data blocks is roughly a constant rate, no matter where they reside on the “disk,” and how fragmented they are. They’re also ridiculously fast compared to spinning drives.

Secondly, because modern filesystems are much better at addressing the problem of fragmentation. For example, they don’t start writing a file at the first spot available on a disk, or even at the first spot big enough for the entire file. Instead, they look for the smallest space that will hold the entire file; and in some cases, will deliberately fragment a file to avoid leaving tiny gaps of free space scattered across the disk. They also may do things like calculating the shortest ‘travel time’ from the end of one fragment to the start of the next.
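The difference between “first spot big enough” and “smallest space that will hold the entire file” is easy to show in code. This is only a sketch of the two textbook strategies, with free space represented as a hypothetical list of `(start, length)` holes; actual filesystems use more elaborate allocators.

```python
def first_fit(holes, size):
    """Return the first hole (start, length) big enough for `size`, or None."""
    for hole in holes:
        if hole[1] >= size:
            return hole
    return None


def best_fit(holes, size):
    """Return the smallest hole (start, length) that still fits, or None."""
    candidates = [hole for hole in holes if hole[1] >= size]
    if not candidates:
        return None
    return min(candidates, key=lambda hole: hole[1])


holes = [(10, 50), (100, 8), (200, 12)]
print(first_fit(holes, 8))  # (10, 50): carves a chunk out of the big hole
print(best_fit(holes, 8))   # (100, 8): fills the small hole exactly
```

Picking the snug hole keeps the 50 block gap intact for a later large file, which is the point of the strategy.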

Finally, spinning disks now have a big chunk of cache on them which will pre-load more data; and operating systems cache read data in memory as well. Less and less of the time we spend waiting for our computer is tied to the hard drive.

Anonymous 0 Comments

So, back in the days of spinning hard drives, data was written in concentric tracks on the surface of platters. There was an arm that had to physically move across the surface of the drive, to wherever the data you wanted was stored. The drive could keep track of where everything was written, and what parts of the drive were free to write on. But waiting on that arm to move was the slowest part of reading data.

As you added data to the drive, it would more or less just add it after the data that was already there. But along the way, you probably would have deleted some files, and eventually, it’s going to run out of fresh disk to write on, so it starts writing into the gaps formed by deleted files. But those gaps might be smaller than the file being written, so it would break up new files into pieces that would fit in the open gaps. It would get the file stored, but that means that now, partway through reading the file, the arm has to move to a different spot on the drive, often many times during the file read, which slows read time considerably. When files are split up like that, they’re said to be “fragmented,” since the file has been broken up into multiple fragments across the drive.

Defragmentation is the process of re-organizing the empty space and files on a drive, so that each file can occupy one continuous block of the drive.

The reasons it’s not necessary anymore in the days of solid state storage are twofold. One: hard drives had more or less infinite re-writes. (They fail because of the moving parts inside. Some of those parts move at close to 100 MPH, so eventually something is going to crash or get stuck, and then it’s just toast, but the magnets don’t really lose their energy.) Solid state drives, however, can only re-write any given cell a finite number of times. (Depending on the type of flash, that can be anywhere from roughly a thousand to a hundred thousand times, but there are a lot of temporary files that come and go and generate a lot of write cycles.) You don’t want to burn up those write cycles just re-arranging which bits hold which data, especially when you consider the second reason: there are no moving parts, no swing arm that has to come to the data. Every bit of a solid state drive can be read at more or less the same speed, whether it’s located right next to the previous bit or not. So the drive can just use whatever space is free (and it keeps track of those write cycles, so it’ll give preference to a spot that’s been used less, to try to wear the drive more evenly).
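That “give preference to the spot that’s been used less” idea can be sketched in a few lines of Python. This is a toy model only: the four-block drive and the `pick_block` helper are invented for illustration, and real SSD controllers do this inside their firmware with far more bookkeeping.

```python
def pick_block(free_blocks, erase_counts):
    """Choose the free block that has been erased the fewest times."""
    return min(free_blocks, key=lambda block: erase_counts[block])


# A tiny pretend drive with four blocks, all unused so far.
erase_counts = {0: 0, 1: 0, 2: 0, 3: 0}

# Simulate 1000 writes of short-lived temporary data: each write picks
# the least-worn block, and the block is free again straight afterwards.
for _ in range(1000):
    block = pick_block(list(erase_counts), erase_counts)
    erase_counts[block] += 1

print(erase_counts)  # {0: 250, 1: 250, 2: 250, 3: 250}
```

Without this preference, the same block could absorb all 1000 writes while the others sat unused, so it would wear out long before the rest of the drive.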

Anonymous 0 Comments

Imagine you have tools and a wall of tool chests. By default, your automated tool sorting system is going to place a tool in the first drawer that has enough empty space for it. However, this often means that tools that go together, like drills and drill bits, are stored in two separate, far-apart places because a drawer will have space for the tiny bits but needs to go further to find space for the big drill. So when you need to use the drill, you waste time going to two separate toolchests for the drill and the bits when it would be more efficient to store them in the same place since they’re always going to be used together.

Defragmentation goes through and sees which tools go together and manually rearranges the tool boxes to put them in the same place so that they’re faster to fetch.