I’m assuming you mean how *defragmentation* optimises your disk, but let’s start with fragmentation and why that’s a problem:
Imagine a librarian with a huge library: 100 rows of floor-to-ceiling bookshelves on ten floors.
This is a special library. To maximise space, it doesn’t have any book covers or dividers between books. It keeps the pages in big stacks on each shelf. Every shelf can hold 5 stacks of *exactly* 1000 pages each. Each page has a standard font and text size and can hold 1,000 words.
To ensure the librarian can add books to the library as quickly as possible, he gives each page a sequential number in the order he receives it, and then adds it to the pile on the shelf. He never skips any numbers, and never starts a new stack or shelf until the old one is completely full.
As there are no book covers, the librarian builds a super cool index to keep track of where everything is. ‘The Shining’, for example, might be on Floor 1, Shelf 14, Stack 3, Pages 409-733.
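If it helps to see the moving parts, here’s a toy Python model of the filing system so far. It’s a sketch under my own simplifications, not how any real file system is written: `Library`, `add_book`, `fetch` and `stack_and_page` are made-up names, floors and shelves are flattened into one long run of numbered pages, and stacks are counted from zero. In real-disk terms the pages are blocks (or clusters), and the index is the file system’s table of where each file’s blocks live.

```python
from dataclasses import dataclass, field

PAGES_PER_STACK = 1000  # from the analogy: every stack holds exactly 1000 pages


@dataclass
class Library:
    """Toy model: one long run of numbered pages plus an index pointing into it."""
    pages: list = field(default_factory=list)  # a page's number is its position here
    index: dict = field(default_factory=dict)  # title -> list of (first, last) page numbers

    def add_book(self, title, contents):
        """File new pages at the very end; numbers are never skipped or reused."""
        first = len(self.pages)
        self.pages.extend(contents)
        self.index.setdefault(title, []).append((first, len(self.pages) - 1))

    def fetch(self, title):
        """Look the book up in the index and collect its pages, range by range."""
        return [page
                for first, last in self.index[title]
                for page in self.pages[first:last + 1]]


def stack_and_page(page_number):
    """Which stack a flat page number lands in, and where inside that stack."""
    return divmod(page_number, PAGES_PER_STACK)


library = Library()
library.add_book("Carrie", ["Carrie p1", "Carrie p2"])
library.add_book("The Shining", ["Shining p1", "Shining p2", "Shining p3"])
```

Because pages only ever go on the end, filing is as fast as it gets: no searching for a gap, just the next free number.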
Sometimes, a publisher asks for a book to be removed from the library. Going and taking the pages off the shelves would mess up the librarian’s numbering system and make the stacks different heights, so he doesn’t do that. Instead, he just deletes it from his index.
This works well enough, for the most part. When the readers come, the librarian just consults his index and goes and fetches them the pages they need. Equally, since you need the index to find anything in this hellscape of stacked, unlabelled pages, something that has been deleted from the index is as good as gone. (Note, though, that a determined reader can still search the whole library page by page to find it.)
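Here’s a minimal Python sketch of that lazy delete, using a hypothetical toy state (the page contents, index and function names are all made up). ‘Deleting’ a book only drops its index entry, which is roughly why deleted files can often be recovered with undelete or forensic tools: the data stays put until something new is written over it.

```python
# Hypothetical toy state: page contents by page number, plus the librarian's index.
pages = ["Carrie p1", "Carrie p2", "Shining p1", "Shining p2", "Shining p3"]
index = {"Carrie": [(0, 1)], "The Shining": [(2, 4)]}


def delete_book(index, title):
    """The lazy delete: forget where the book is, but leave its pages on the shelf."""
    del index[title]


def manual_search(pages, needle):
    """The determined snoop: ignore the index and read every single page."""
    return [number for number, text in enumerate(pages) if needle in text]


delete_book(index, "Carrie")
print("Carrie" in index)               # False: the librarian can't find it any more
print(manual_search(pages, "Carrie"))  # [0, 1]: but the pages are all still there
```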
It’s worth noting that storing and numbering each page instead of each word is slightly less efficient, as not all pages are 100% full. Sometimes, a 10,000-word book might take up 12 pages instead of the theoretical minimum of 10.
This has some advantages, however. Firstly, it increases speed: finding a whole page is easier than finding particular words. Secondly, it gives the librarian a little wiggle room to make small changes. If the publisher notices a typo on page 7, they can reprint just that one page, and the librarian can swap the old page for the new one. No big deal.
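Here’s the arithmetic behind that 12-versus-10 example, plus the one-page swap, as a small Python sketch. The analogy doesn’t say *why* pages end up part-empty, so I’m assuming each chapter starts on a fresh page; the four 2,500-word chapters and the function names are made up for illustration. On a real disk, the unused room at the end of a block is usually called slack space.

```python
import math

WORDS_PER_PAGE = 1000  # from the analogy


def pages_needed(chapter_word_counts):
    """Pages used if every chapter starts on a fresh page (an assumption for illustration)."""
    return sum(math.ceil(words / WORDS_PER_PAGE) for words in chapter_word_counts)


def replace_page(pages, page_number, reprinted_page):
    """Fix a typo by swapping one page in place; no other page moves or renumbers."""
    pages[page_number] = reprinted_page


chapters = [2500, 2500, 2500, 2500]                    # a 10,000-word book
minimum = math.ceil(sum(chapters) / WORDS_PER_PAGE)    # 10 pages if packed perfectly
actual = pages_needed(chapters)                        # 12 pages: 3 per chapter
empty_space = actual * WORDS_PER_PAGE - sum(chapters)  # room for 2,000 words left blank

book = ["Shining p6", "Shining p7 (with typo)", "Shining p8"]
replace_page(book, 1, "Shining p7 (reprinted)")        # only that one slot changes
```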
But what about a bigger update? One day, the publisher decides to add a new foreword to The Shining to celebrate its 45th anniversary. They send over an extra 3 pages and say ‘can you just add this in at the front, please?’
Unfortunately, this is where ‘just adding it in’ isn’t that simple. The librarian has added a LOT of new books since The Shining. The original shelf has been full for years. There’s no room to add a foreword!
So the librarian comes up with a workaround: he adds the foreword to the end of his latest stack as if it were a new book that just came in. He then updates the index to keep track:
The Shining: Floor 1, shelf 14, stack 3, pages 409-733, with foreword at Floor 8, shelf 2, stack 4, pages 4-6.
*This is fragmentation*. Even though the librarian knows exactly where the book is, fetching it is now a pain. He has to go first to floor 1 to get most of it, then go to floor 8 for the rest. This slows him down and the customer gets annoyed.
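One way to put a number on fragmentation: count how many separate trips the librarian needs to read a book front to back. This is a hedged Python sketch; `count_trips` is a made-up helper, the page numbers are flattened into one running count, and the foreword’s flat page number is invented (it just has to be far away). Real file systems keep a similar list of ranges per file (often called extents or runs), and on a spinning hard drive each extra range typically costs a head seek.

```python
def count_trips(extents):
    """How many separate fetches it takes to read a book from start to finish.

    `extents` lists (first_page, last_page) ranges in *reading* order. A new
    trip is needed whenever a range doesn't begin right where the last one ended.
    """
    trips = 0
    next_expected = None
    for first, last in extents:
        if first != next_expected:
            trips += 1
        next_expected = last + 1
    return trips


# Flat page numbers; the foreword's location is made up, it just has to be far away.
before_update = [(409, 733)]
after_update = [(7_002_004, 7_002_006), (409, 733)]  # foreword is read first, stored last
```

Two ranges that happen to sit back to back, like `[(409, 500), (501, 733)]`, still count as one trip, which is exactly what defragmentation aims for.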
In some circumstances, this gets even worse. If the library is very popular, it may be receiving multiple books at once. Usually, each publisher sends him a page as soon as it’s printed. If multiple books are being sent at once, the librarian might end up receiving and filing 3 pages of The Stand, then 8 pages of The Green Mile, then another page of The Stand, then 10 pages of Pet Sematary, and so on until all three books have been received. The librarian duly numbers and files each page as it comes, but now there are three books all interspersed with each other in this stack! Chaos!
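The interleaving is easy to reproduce in a few lines of Python. The batch sizes come straight from the example; the function names are made up, and each page is simply labelled with its book’s title so we can see where it lands. This is roughly what happens on a disk when several programs write files at the same time.

```python
from itertools import groupby

# The arrival order from the example: (title, pages in that batch).
arrivals = [("The Stand", 3), ("The Green Mile", 8), ("The Stand", 1), ("Pet Sematary", 10)]


def file_in_arrival_order(arrivals):
    """Number and shelve every page the moment it arrives; return the title at each page number."""
    shelf = []
    for title, count in arrivals:
        shelf.extend([title] * count)
    return shelf


def ranges_per_book(shelf):
    """Group neighbouring pages of the same title into (first, last) ranges."""
    ranges = {}
    position = 0
    for title, run in groupby(shelf):
        length = len(list(run))
        ranges.setdefault(title, []).append((position, position + length - 1))
        position += length
    return ranges


shelf = file_in_arrival_order(arrivals)
ranges = ranges_per_book(shelf)
# The Stand ends up split: pages 0-2, then page 11 after The Green Mile.
```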
There are a couple of things that the publishers and the librarian do to try to help with this, but none of them are perfect:
Firstly, the librarian realised that getting lots of books at once is quite common. Instead of numbering each page immediately as it comes, he now allows himself to make a few informal stacks on his desk, so he can group pages of the same book together before numbering and filing them. There’s not much space, though, so there’s a limit to how many stacks he can have. This means he’s often still forced to number pages and put them on the shelf before the whole book has arrived.
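Here’s a Python sketch of the desk trick, with a made-up desk size and my own rule for what to do when the desk is full (shelve the oldest pile first); the function names are hypothetical too. File systems do something broadly similar, usually called delayed allocation (ext4 and XFS both use it): hold new data in memory for a while and only choose its spot on disk when it’s written out.

```python
from itertools import groupby


def file_with_desk(arrivals, desk_slots):
    """Shelve pages via a desk that can hold at most `desk_slots` unfinished piles.

    `arrivals` is one title per page, in arrival order. When a page for a new book
    arrives and the desk is full, the oldest pile gets numbered and shelved early
    (that eviction rule is my assumption). Whatever is left is shelved at the end.
    """
    if desk_slots < 1:
        raise ValueError("the desk needs room for at least one pile")
    shelf = []
    desk = {}  # title -> waiting pages; dicts remember insertion order
    for title in arrivals:
        if title not in desk and len(desk) == desk_slots:
            oldest = next(iter(desk))
            shelf.extend(desk.pop(oldest))  # out of room: shelve the oldest pile now
        desk.setdefault(title, []).append(title)
    for pile in desk.values():
        shelf.extend(pile)
    return shelf


def pieces(shelf, title):
    """How many separate runs of pages a book is split into on the shelf."""
    return sum(1 for key, _ in groupby(shelf) if key == title)


arrivals = (["The Stand"] * 3 + ["The Green Mile"] * 8
            + ["The Stand"] + ["Pet Sematary"] * 10)
roomy = file_with_desk(arrivals, desk_slots=3)    # every book lands in one piece
cramped = file_with_desk(arrivals, desk_slots=1)  # The Stand still gets split in two
```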
Secondly, *some* kind publishers will tell him how many pages they will send in total, so he can use blank pages as placeholders and put the ‘whole book’ on the shelf at once. When the ‘real’ page comes in, he just switches it for the blank one. This helps when receiving multiple books at once, but isn’t much help if the books get changed later (or if the publisher doesn’t bother).
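The placeholder trick, sketched in Python. `reserve` and `deliver` are made-up names; the idea is roughly what a program asks for when it preallocates space for a file, for example through POSIX `posix_fallocate`.

```python
BLANK = None  # placeholder page


def reserve(shelf, index, title, page_count):
    """Shelve `page_count` blank placeholders in one unbroken run and index them."""
    first = len(shelf)
    shelf.extend([BLANK] * page_count)
    index[title] = (first, first + page_count - 1)


def deliver(shelf, index, title, page_in_book, text):
    """Swap a real page in for its placeholder; nothing else moves."""
    first, last = index[title]
    if not 0 <= page_in_book <= last - first:
        raise IndexError(f"{title} only reserved {last - first + 1} pages")
    shelf[first + page_in_book] = text


shelf, index = [], {}
reserve(shelf, index, "The Stand", 3)       # the publisher said 3 pages
reserve(shelf, index, "The Green Mile", 2)  # and this one said 2
# Pages turn up interleaved, but each lands in its own reserved slot.
for title, n in [("The Stand", 0), ("The Green Mile", 0), ("The Stand", 1),
                 ("The Green Mile", 1), ("The Stand", 2)]:
    deliver(shelf, index, title, n, f"{title} p{n + 1}")
```

The reservation only helps if the page count was right: a fourth page of The Stand raises an error here, and in the library it would have to be filed somewhere else, which is the fragmentation problem all over again.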
The biggest tool that the librarian has, however, is *defragmentation*. This is what he does on his downtime when people aren’t sending him stuff to file or nagging him to go fetch pages for them:
He looks through his chaotic index, where one book is spread over eight shelves and three floors, and tries to make it nice and simple again. He reorganises and renumbers the three books that were all intermingled with each other from that day people just wouldn’t stop sending him shit. He notices that there’s no longer anything listed in his index for Floor 1, shelf 14, stack 3, pages 400-408: the space immediately before “The Shining”. Clearly *something* was once there, but it can’t be that important if it’s not in his index. So he goes and fetches The Shining’s foreword from floor 8 and puts it over pages 406-408. (Note that pages 400-405 of whatever was there before are still available for anyone who looks through manually.) More intelligent librarians may also keep a list of books that are often checked out together, and rearrange the library so they’re stored together and can be fetched in one trip.
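Here’s a simplified Python sketch of the payoff: a defragmenter that lays every book out in one unbroken run, in reading order, optionally shelving chosen titles side by side. The names and the tiny library are made up. It also cheats by building a fresh copy of the library, whereas a real defragmenter has to shuffle data around inside the free space it already has (like the librarian reusing the gap in front of The Shining) and must not lose anything if it’s interrupted halfway.

```python
def defragment(pages, index, together=()):
    """Rewrite the library so every indexed book sits in one unbroken run.

    `index` maps title -> list of (first, last) ranges in reading order.
    Pages no book points at count as free space and aren't copied.
    `together` optionally lists titles to shelve side by side first
    (say, books that are often borrowed together).
    """
    preferred = [title for title in together if title in index]
    titles = preferred + [title for title in index if title not in preferred]
    new_pages, new_index = [], {}
    for title in titles:
        first = len(new_pages)
        for start, end in index[title]:
            new_pages.extend(pages[start:end + 1])
        new_index[title] = [(first, len(new_pages) - 1)]
    return new_pages, new_index


pages = ["(deleted)", "Shining 1", "Shining 2", "Stand 1", "Foreword", "Stand 2"]
index = {"The Shining": [(4, 4), (1, 2)],  # foreword stored after the main text
         "The Stand": [(3, 3), (5, 5)]}    # split by the foreword
new_pages, new_index = defragment(pages, index)
# Each book is now a single range, so fetching it is one trip.
```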
The possibilities for optimisation are endless and depend on the librarian and how he prefers to fetch books. But essentially it’s all the same goal: take the system built by someone trying to *add* things to the library as fast as possible, and turn it into one more useful for someone trying to *find* things in the library as fast as possible.