In computers, why do some files, even if they are the same size, take longer to be moved or deleted than others?


I noticed this when I deleted a 2 GB video file on my laptop today, which took less than 5 seconds. But when I deleted a separate folder that was smaller than 2 GB, it took way longer to get deleted.


The video file is one single file. So even though it’s “bigger” at 2 GB, it’s just one thing that the computer needs to pick up and drop in the recycle bin.

The folder has *many* files. Even if the folder itself is smaller than 2 GB, the computer has to figure out how many files are in there, how big they are, then basically pick up and drop off each one individually to the recycling bin.

Sort of like if you’re moving into a new place and you’ve got one bed, and then you’ve got ten boxes that don’t have too much stuff in them but they take up the same amount of space as the bed. You’ll probably get that bed into the house in one trip, but it’s going to take you multiple trips to take all those boxes in.

Also, the location on the disk plays a role too. If I send you into the grocery store to get a specific list of items (oatmeal, carrots, milk, dog food, cheese, aspirin, steak, soap, and Dr Pepper) and they are on opposite ends of the store, it will take you longer than if I sent you in for 10 packs of Juicy Fruit gum.

This has to do with the way computers usually handle things like deleting or moving files.

Normally when you delete a file, the computer does not actually delete the data itself. Instead it goes to a sort of index, a table that says where each part of a file is located on the storage device.

You can think of it as going to the table of contents in a book and striking out the entry for a chapter and the page it is on. If you flip through the book you will still find the chapter, but if you only look at the table of contents it is gone.

From that metaphor you can also see why striking out a single chapter that runs for a hundred pages is faster than striking out 100 chapters that are only a page long each.

Something similar happens if you rename a file or move it around on the same drive. The file itself stays put; you only change the entry in the table that says where everything is.

You can move a file that is hundreds of gigabytes from one folder to another in almost no time, but moving a hundred tiny files takes a bit more time.

Most file systems don’t really erase data when a file is deleted. They just “forget” where that data was stored.

File systems use a small part of your disk to keep a kind of ledger of where every file is stored. “You want to load doge_picture_1.jpg? Go to sector 385018502820 and then read the next 8930 sectors.” This ledger also keeps track of which sectors are available to be written to, i.e. any that do not already contain file data.

When you delete a file, the file system just deletes the location information for that file from its ledger. It doesn’t actually go to those memory locations and manipulate the 1s and 0s that are stored there. These 1s and 0s can just hang out (each bit has to be either 0 or 1 anyway, it cannot be “empty”). Then when a new file is written, it might end up being written to some or all of these sectors, because the ledger shows that these sectors are now available.
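The ledger idea above can be sketched in a few lines of Python. This is a toy model, not a real file system; the file names and sector numbers are made up.

```python
# Toy model of a file-system ledger: file name -> (start sector, length).
# The "disk" itself is never touched when a file is deleted.
disk = ["data"] * 100                  # 100 sectors of pretend data
ledger = {
    "doge_picture_1.jpg": (10, 8),     # starts at sector 10, spans 8 sectors
    "big_video.mp4": (20, 60),         # a much bigger file
}

def delete(name):
    """'Delete' a file by forgetting where it lives: one ledger update."""
    ledger.pop(name)

delete("big_video.mp4")                # same cost no matter the file size
print("big_video.mp4" in ledger)      # False: the ledger forgot the file
print(disk[25])                        # "data": the bits are still there
```

The deletion is a single dictionary operation, which is why it costs the same whether the file spans 8 sectors or 60.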

This is why deleting files is pretty quick, and big files are deleted just as quickly as small files.

If the folder you deleted took longer than the video file, this is almost certainly because the folder contained a number of files or other folders inside of it. For each file, the file system had to make a change to its ledger. The more changes to the ledger, the longer it takes to do the whole deletion process. I could make a folder and fill it with a million empty text files, and it would take a long time to delete all of those, even though they take up almost no storage space.

(In addition to these ledger changes, the file system may also be performing additional “household” tasks for every file being deleted, e.g. checking that it’s not currently in use, that you have permission to delete it, etc. This all adds up to a bunch of overhead that has to be done for every file, regardless of its size.)
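To see why the per-file overhead dominates, here is a hypothetical sketch that just counts ledger operations: deleting one huge file is a single update, while deleting a folder of tiny files is one update each. The sizes and counts are illustrative only.

```python
# Hypothetical comparison: ledger updates needed for two deletions.
big_file = {"video.mp4": (0, 2_000_000)}                        # one 2 GB file
tiny_files = {f"note_{i}.txt": (i, 1) for i in range(10_000)}   # 10,000 tiny files

def delete_all(ledger):
    """Delete every entry, counting one ledger update per file."""
    updates = 0
    for name in list(ledger):
        ledger.pop(name)     # real file systems also check permissions,
        updates += 1         # open handles, etc. for every single file
    return updates

print(delete_all(big_file))     # 1 update for 2 GB of data
print(delete_all(tiny_files))   # 10000 updates for far less data
```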

The fact that data isn’t actually being overwritten when you delete files means that it is often possible to retrieve the deleted data, using specialized software, as long as no new files have been written in those sectors. This can be a security risk if you really need to make sure that the data is completely gone. Permanently erasing a file also requires special software, which will overwrite the sectors occupied by that file with meaningless 0s and 1s (e.g. random or repeating sequences). Permanently erasing a 2 GB video like that would take quite a bit longer than erasing, say, a 5 MB picture.
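The "permanently erasing" step can be illustrated with a naive single-pass overwrite in Python. This is only a sketch of the idea: real tools do more (multiple passes, handling of journaling file systems), and on SSDs wear-levelling can leave copies behind, so dedicated software is still the right answer.

```python
import os
import secrets

def overwrite_and_delete(path, chunk_size=1024 * 1024):
    """Naively overwrite a file with random bytes, then delete it.
    Illustration only: journaling file systems and SSD wear-levelling
    mean this is NOT a guaranteed secure erase."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            block = secrets.token_bytes(min(chunk_size, remaining))
            f.write(block)           # this write is the slow part: it scales
            remaining -= len(block)  # with file size, unlike a normal delete
        f.flush()
        os.fsync(f.fileno())         # push the overwrite out to the disk
    os.remove(path)
```

Notice that the overwrite loop scales with the file's size, which is exactly why secure-erasing a 2 GB video takes so much longer than a normal delete.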

The way files are actually saved is more like a hotel.

Each room stores a maximum amount of information. So your 2GB file may take up 20 rooms if each could hold 100 MB. When you want that video, your computer looks up where it is, and gets the data from the correct rooms. To delete that data, you don’t have to actually delete it, your computer just forgets where it put it and remembers that those rooms should be vacant and used for something else.

Your 1 GB folder may contain 1,000 files of 1 MB each. Each file ends up in its own room, and the hotel remembers where each one stays. To delete them, it has to do far more work scrubbing the register that keeps track of everyone.

Imagine your computer’s storage like a book, with the data being the writing and the files being the chapters.

When you erase a file, you’re not actually blanking the pages, you’re just removing the chapter from the index.

That’s why a single 2GB file deletes faster than a folder with two hundred 10MB files. Although both cases have 2GB of data, there’s one index entry in one case and two hundred index entries in the other.

Depends on the drive. HDDs have a physical “head” that reads data off a spinning circular platter, much like a CD. The platters are segmented into clusters of a specific size. When you write a file, it takes up several clusters and often leaves a bit of space in the last one. Imagine a drive that goes from 1 to 100 and is segmented into 10s.

* The first file is 12 large, so it takes up all of the 10 and a bit of the 20.
* The second file is 8 large, but because the 20 has data in it, it starts at the 30.
* The third file is 22 large. It starts at the 40 and fills up the 50 and some of the 60.

Eventually your disk ends up “full”, but it isn’t. That’s when it starts filling the gaps left earlier. But now the head has to move around a lot more: instead of going to one place and reading there, it has to jump from place to place to read the data. The physical movement costs time.
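The 1-to-100 example can be played out in a short Python sketch, assuming a toy drive of ten clusters that each hold ten units. Early files start on a fresh cluster and leave slack; once fresh clusters run out, a new file gets scattered across the leftover gaps.

```python
CLUSTER = 10
used = [0] * 10  # units used in each of 10 clusters (positions 1-100)

def write_contiguous(size):
    """Early writes start at the next untouched cluster, leaving slack."""
    start = next(i for i, u in enumerate(used) if u == 0)
    clusters = []
    while size > 0:
        take = min(CLUSTER, size)
        used[start] = take
        clusters.append(start)
        size -= take
        start += 1
    return clusters

print(write_contiguous(12))  # [0, 1]: all of the 10, a bit of the 20
print(write_contiguous(8))   # [2]: starts at the 30, the 20 is taken
print(write_contiguous(22))  # [3, 4, 5]: the 40, the 50, some of the 60

def write_into_gaps(size):
    """Once the disk looks 'full', new data goes into the slack left
    behind, scattering one file across distant clusters (more seeks)."""
    clusters = []
    for i in range(len(used)):
        gap = CLUSTER - used[i]
        if gap and size > 0:
            take = min(gap, size)
            used[i] += take
            size -= take
            clusters.append(i)
    return clusters

print(write_into_gaps(20))   # [1, 2, 5, 6]: one file, four distant spots
```

The last file ends up in four non-adjacent clusters, so the head would have to seek between them to read it back.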


With SSDs the answer is a lot more complicated and can range from index position to charge stability and drive wear, but others have covered this well enough, so I’ll leave it at that.