I’m assuming you mean how does *defragmentation* optimise your disk, but let’s start with fragmentation, and why that’s a problem:
Imagine a librarian, with a huge library. 100 rows of floor to ceiling bookshelves on ten floors.
This is a special library. To maximise space, it doesn’t have any book covers or dividers between books. It keeps the pages in big stacks on each shelf. Every shelf can hold 5 stacks of *exactly* 1000 pages each. Each page has a standard font and text size and can hold 1,000 words.
To ensure the librarian can add books to the library as quickly as possible, he gives each page a sequential number in the order he receives it, and then adds it to the pile on the shelf. He never skips any numbers, and never starts a new stack or shelf until the old one is completely full.
As there are no book covers, the librarian builds a super cool index to keep track of where everything is. ‘The Shining’, for example, might be on Floor 1, Shelf 14, Stack 3, Pages 409-733.
Sometimes, a publisher asks for a book to be removed from the library. Going and taking the pages off the shelves would mess up the librarian’s numbering system and make the stacks different heights, so he doesn’t do that. Instead, he just deletes it from his index.
This works well enough, for the most part. When the readers come, the librarian just consults his index and goes and fetches them the pages they need. Equally, since you need the index to find anything in this hellscape of stacked, unlabelled pages, something that has been deleted from the index is as good as gone. (Note though that someone can still search through the whole library manually, page by page, to look for it if they’re determined.)
It’s worth noting that storing and numbering each page instead of each word is slightly less efficient, as not all pages are 100% full. Sometimes, a 10,000 word book might take up 12 pages instead of the theoretical minimum of 10.
This has some advantages, however. Firstly, it increases speed: finding a whole page is easier than finding particular words. Secondly, it gives the librarian a little wiggle room to make small changes. If the publisher notices a typo on page 7, then they can reprint just that one page, and the librarian can switch out the old one for the new one. No big deal.
But what about a bigger update? One day, the publisher decides to add a new foreword to The Shining to celebrate its 45th anniversary. They send over an extra 3 pages and say ‘can you just add this in at the front, please?’
Unfortunately, this is where ‘just adding it in’ isn’t that simple. The librarian has added a LOT of new books since The Shining. The original shelf has been filled for years. There’s no room to add a foreword!
So the librarian comes up with a workaround: he adds the foreword to the end of his latest stack as if it were a new book that just came in. He then updates the index to keep track:
The Shining: Floor 1, shelf 14, stack 3, pages 409-733 with foreword at Floor 8, shelf 2, stack 29, pages 4-6.
*This is fragmentation*. Even though the librarian knows exactly where the book is, fetching it is now a pain. He has to go first to floor 1 to get most of it, then go to floor 8 for the rest. This slows him down and the customer gets annoyed.
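To make the index idea concrete, here is a minimal sketch of what the librarian’s entry for The Shining might look like as data. The `Extent` class, the `index` dictionary and the two helper functions are names made up for this illustration; real file systems keep similar "where does each piece live" records, but not in this exact shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One contiguous run of pages in a single stack (hypothetical model)."""
    floor: int
    shelf: int
    stack: int
    first_page: int
    last_page: int

    @property
    def page_count(self) -> int:
        return self.last_page - self.first_page + 1


# Hypothetical index: book title -> extents, listed in reading order.
index = {
    "The Shining": [
        Extent(floor=8, shelf=2, stack=29, first_page=4, last_page=6),      # foreword
        Extent(floor=1, shelf=14, stack=3, first_page=409, last_page=733),  # original text
    ],
}


def trips_needed(title: str) -> int:
    """Each extent is a separate walk to a different part of the library."""
    return len(index[title])


def total_pages(title: str) -> int:
    """Add up the pages across every extent of the book."""
    return sum(extent.page_count for extent in index[title])


print(trips_needed("The Shining"))  # 2 trips instead of 1
print(total_pages("The Shining"))   # 3 foreword pages + 325 original pages
```

The book itself has barely grown, but the number of trips has doubled, and that trip count is what fragmentation really costs.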
In some circumstances, this gets even worse. If the library is very popular, it may be receiving multiple books at once. Usually, each publisher sends him a page as soon as it’s printed. If multiple books are being sent at once, the librarian might end up receiving and filing 3 pages of The Stand, then 8 pages of The Green Mile, then another page of The Stand, then 10 pages of Pet Sematary, on and on until all three books have been received. The librarian duly numbers and files each page as it comes, but now there are three books all interspersed with each other in this stack! Chaos!
There are a couple of things that the publishers and librarians do to try and help this, but none of them are perfect:
Firstly, the librarian realised that getting lots of books at once is quite common. Instead of numbering each page immediately as it comes, he now allows himself to make a few informal stacks on his desk, so he can group pages of the same book together before numbering and filing them. There’s not much space though, and a limit to how many stacks he can have. This means he’s often still forced to number them and put them on the shelf before they’re fully done.
Secondly, *some* kind publishers will tell him how many pages they will send in total, so he can use blank pages as placeholders and put the ‘whole book’ on the shelf at once. When the ‘real’ page comes in, he just switches it for the blank one. This helps when receiving multiple books at once, but isn’t much help if the books get changed later (or if the publisher doesn’t bother).
The biggest tool that the librarian has, however, is *defragmentation*. This is what he does on his downtime when people aren’t sending him stuff to file or nagging him to go fetch pages for them:
He looks through his chaotic index where one book is spread over eight shelves and three floors and tries to make it nice and simple again. He reorganises and renumbers the three books that were all intermingled with each other from that day people just wouldn’t stop sending him shit. He notices that there’s no longer anything listed in his index for Floor 1, shelf 14, stack 3, pages 400-408: the space immediately before “The Shining”. Clearly *something* was once there, but it can’t be that important if it’s not in his index. So he goes and fetches The Shining’s three-page foreword from floor 8 and replaces pp. 406-408 with it, so the whole book now sits in one run. (Note that pages 400-405 of whatever was there before are still available for anyone who looks through manually.) More intelligent librarians may also keep a list of books that are often checked out together, and rearrange the library so they’re stored together and can be fetched in one trip.
The possibilities for optimisation are endless and depend on the librarian and how the librarian prefers to go fetch books. But essentially it’s all the same goal: take the system built by someone trying to *add* things to the library as fast as possible, and turn it into one more useful for someone trying to *find* things in the library as fast as possible.
So, there are a lot of answers here explaining the basics of data storage, but not really showing it, so here it is (extremely simplified) in the form of tables:
|A|A|A|A|A|A|B|B|B|B|C|C|D|D|-|-|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
Imagine this as one continuous line where each letter signifies a block of data for a file and a dash signifies empty or unassigned space. When the read/write head of the hard drive passes above each segment, it reads that piece of data, but it must ALWAYS travel left or right along the line: it can never skip over blocks and it can never ‘just go there’.
If we delete the file A (again, this is extremely simplified) we get this as our line:
|-|-|-|-|-|-|B|B|B|B|C|C|D|D|-|-|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
Now those blocks for data are freed, but physically the read head still has to move past them to get to the other files. Now, this isn’t a problem here because thankfully the other files are stored contiguously anyway, but what happens if we delete and add new files over a few iterations?
|E|E|E|-|-|-|B|B|B|B|C|C|D|D|-|-|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
When we add a new file, E, its data is placed in the available unassigned space. Again, thankfully there is no need to fragment here with these very simple files, and the same happens when we add F:
|E|E|E|F|F|-|B|B|B|B|C|C|D|D|-|-|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
Seemingly fine. But what happens if we delete E and add file H, which is five blocks in size?
|H|H|H|F|F|H|B|B|B|B|C|C|D|D|H|-|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
It’s starting to get messy.
So as you can see, after adding and deleting a few files we have got to the stage where, to read file H, the read arm has to pass over multiple pieces of data it doesn’t need, which adds to the read time. Multiply this out to massive files spread over many hundreds of thousands of these ‘blocks’ and you can see why it becomes a problem.
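The sequence of tables above can be replayed with a few lines of code. This is only a sketch under one big assumption: new data always goes into the leftmost free blocks ("first fit"), which is what the tables show but is not how every real file system chooses space. The function names `delete`, `write`, `render` and `fragment_count` are made up for the example.

```python
FREE = "-"


def render(disk: list[str]) -> str:
    """Format the disk like the table rows above."""
    return "|" + "|".join(disk) + "|"


def delete(disk: list[str], name: str) -> None:
    """Mark every block of the file as free; nothing is moved."""
    for i, block in enumerate(disk):
        if block == name:
            disk[i] = FREE


def write(disk: list[str], name: str, size: int) -> None:
    """Fill free blocks from left to right, splitting the file if needed."""
    free_slots = [i for i, block in enumerate(disk) if block == FREE]
    if len(free_slots) < size:
        raise ValueError(f"not enough free blocks for file {name}")
    for i in free_slots[:size]:
        disk[i] = name


def fragment_count(disk: list[str], name: str) -> int:
    """Count the separate contiguous runs that make up a file."""
    runs = 0
    previous = None
    for block in disk:
        if block == name and previous != name:
            runs += 1
        previous = block
    return runs


disk = list("AAAAAABBBBCCDD--")
delete(disk, "A")
write(disk, "E", 3)
write(disk, "F", 2)
delete(disk, "E")
write(disk, "H", 5)

print(render(disk))               # |H|H|H|F|F|H|B|B|B|B|C|C|D|D|H|-|
print(fragment_count(disk, "H"))  # 3
```

File H ends up in three separate pieces even though there were six free blocks in total, because no single gap was five blocks long.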
Defragmentation aims to resolve this simply by rearranging these blocks of data so that each file’s blocks sit next to each other, allowing the hard drive to read the necessary pieces of data faster without having to pass over parts that it doesn’t need.
So hopefully we would end up with something more like this in the end, making it easier and faster for the hard drive to read file H:
|H|H|H|H|H|F|F|B|B|B|B|C|C|D|D|-|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
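Here is a minimal sketch of that rearranging step. It assumes the simplest possible strategy: gather each file’s blocks together, keep the files in the order they first appear, and push all the free space to the end. Real defragmenters shuffle blocks around in place, a few at a time, and also have to keep each file’s blocks in the right order, which this toy model ignores because every block of a file carries the same letter. The `defragment` function is a made-up name for the example.

```python
FREE = "-"


def defragment(disk: list[str]) -> list[str]:
    """Return a new layout with each file contiguous and free space at the end."""
    # A dict preserves insertion order, so files stay in order of first appearance.
    sizes: dict[str, int] = {}
    for block in disk:
        if block != FREE:
            sizes[block] = sizes.get(block, 0) + 1

    compacted: list[str] = []
    for name, size in sizes.items():
        compacted.extend([name] * size)

    # Pad with free blocks so the disk keeps its original length.
    compacted.extend([FREE] * (len(disk) - len(compacted)))
    return compacted


fragmented = list("HHHFFHBBBBCCDDH-")
tidy = defragment(fragmented)
print("|" + "|".join(tidy) + "|")  # |H|H|H|H|H|F|F|B|B|B|B|C|C|D|D|-|
```

Nothing is added or lost: the same sixteen blocks are there, just regrouped so that every file can be read in one pass.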
Think of space on your hard drive like an actual hole. You can fill it up by putting things into it (writing) and make more space by taking things out (deleting). Deleting things doesn’t leave a nice large chunk of space. It leaves smaller holes the size of whatever you take out.
Say, for example, you’ve got a Transformer you want to put in your hole, but it’s full. It’s about the same size as six Matchbox cars that are already in there, so you pull those out. Now you’ve got six smaller holes that don’t touch each other. The Transformer still won’t fit.
That’s where defragmentation comes in. That process takes all the toys in your hole and moves them together. All the empty space becomes one chunk where you can fit bigger things. Starscream now sits in there just fine. He can use the combined space from all the cars you took out.
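The toy hole can be sketched the same way. In this made-up layout, `T` is a toy that stays in the hole and `-` is a gap left by one of the six cars; the layout string, the Transformer’s size and the two function names are all assumptions chosen to match the story, nothing more.

```python
from itertools import groupby

FREE = "-"


def largest_free_run(hole: str) -> int:
    """Length of the longest unbroken stretch of free slots."""
    runs = [len(list(group)) for slot, group in groupby(hole) if slot == FREE]
    return max(runs, default=0)


def compact(hole: str) -> str:
    """Slide every toy to the left so the free slots merge on the right."""
    toys = [slot for slot in hole if slot != FREE]
    return "".join(toys) + FREE * (len(hole) - len(toys))


# Hypothetical hole after pulling out six cars that were not next to each other.
hole = "T-TT-T-TT-T-T-TT"
transformer_size = 6

print(hole.count(FREE))                 # 6 free slots in total
print(largest_free_run(hole))           # 1, so the Transformer does not fit
print(largest_free_run(compact(hole)))  # 6, so now it does
```

The total amount of free space never changes; only the size of the biggest single gap does.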
A hard disk is, well, a disk, much like a gramophone. It spins, there are mechanically moving read heads, and each location on the disk holds some data.
So it matters where the data physically is on the disk. If the read head has to physically seek between many different places to read all the parts of a file, it’s going to take much longer than reading it sequentially from a single place.
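A rough way to see why the seeks matter is a toy cost model: every fragment costs one seek, and every block costs a small fixed transfer time. The numbers below (10,000 microseconds per seek, 100 microseconds per block, a 1,000-block file) are purely illustrative assumptions, not measurements of any real drive, and `read_time_us` is a made-up function.

```python
def read_time_us(fragments: int, blocks: int,
                 seek_us: int = 10_000, transfer_us: int = 100) -> int:
    """Toy model: one seek per fragment plus a fixed transfer time per block."""
    return fragments * seek_us + blocks * transfer_us


# The same 1,000-block file, stored in one piece versus 200 pieces.
contiguous = read_time_us(fragments=1, blocks=1000)
scattered = read_time_us(fragments=200, blocks=1000)

print(contiguous)  # 110000
print(scattered)   # 2100000
```

With these assumed numbers the amount of data read is identical, yet almost all of the time for the scattered file goes on moving the head rather than reading.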
Now how and why would a file get written with some parts here and some parts there? If you start with an empty hard drive, it wouldn’t: the drive would just start from the beginning and write however many files you want sequentially. But when the drive gets full, what do you do? You delete some files you don’t need, leaving behind chunks of unused space the size of the files that used to be there. Now you want to write a file, and you do have enough empty space in total for it, but none of the empty chunks is individually large enough to fit it. What to do? Easy: the operating system writes some parts in one empty chunk, some in the next, and so on.
Over time, between deleting old files and writing new ones, it becomes more and more of a mess, until eventually using the hard drive becomes unbearably slow.
The defragmentation procedure exists to combat that. It moves files around to create continuous free regions, and brings files that are split into parts back together into larger chunks.
All this is of course only relevant if you have physically moving media, as in a hard drive. In the case of an SSD, fragmentation doesn’t really matter, and trying to defragment would be counterproductive, as an SSD has a limited lifetime of write cycles.
Are you old enough to remember the Encyclopaedia Britannica?
It came in around 26 volumes, roughly one for each letter.
Look at your bookshelf and put each volume wherever it fits: some on the first shelf, some on the second, some on the third…
Now find the volume for A, then B, then C… Easy, but not fast.
Now clear the shelf, put all the volumes together in order, and repeat. Much faster.