Why does deleted data stay on an HDD once written, waiting to be overwritten, instead of being removed when deletion is requested?
It takes a lot more time to go and remove the data, especially on older spinning disks.
Got a 10 GB ripped Blu-ray movie you don’t want any more? The way deletion is set up today, the system clears the file’s entries in the file table and marks the space as unclaimed. That changes a few bytes and can be done in a fraction of a second. If you wanted it to overwrite the whole 10 GB file with 0s instead, a spinning hard drive would need about 2.5 minutes to complete the task, during which time *you can’t get to anything else on the drive* without seriously slowing down both the overwrite and whatever else you wanted to do.
Lots of rules were created to reflect the fact that HDDs are *slow*. A fast disk (a 10,000 RPM Raptor drive) could do about 80 MB/s, but your normal bulk storage was in the 40-60 MB/s range. Modern disks with 4 or 6 ultra-high-density platters can do better, but still rarely more than about 120 MB/s. Having to go through and erase every sector of a deleted file would have been a massive waste of time for no gain; anyone who cared about data destruction had an electromagnet to take care of that.
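To put rough numbers on that, here is a quick back-of-the-envelope sketch using the speeds quoted above. The function name is made up for illustration, and it assumes decimal units (1 GB = 1000 MB) and a perfectly sequential write with no seeks, so real drives would do worse.

```python
# Rough time to zero-fill a file at a sustained sequential write speed.
# Assumes 1 GB = 1000 MB and ignores seeks, fragmentation, and other disk traffic.

def overwrite_seconds(size_gb, speed_mb_per_s):
    """Seconds needed to write size_gb gigabytes at speed_mb_per_s MB/s."""
    return size_gb * 1000 / speed_mb_per_s

for speed in (40, 60, 80, 120):
    seconds = overwrite_seconds(10, speed)
    print(f"{speed:>3} MB/s: {seconds:6.1f} s (~{seconds / 60:.1f} min)")
```

By the same arithmetic, the 2.5 minute figure for a 10 GB file works out to a drive sustaining roughly 67 MB/s.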
An HDD is like a library. The data is written in a huge number of “books” called blocks, but all the blocks are only identified by numbers. To find something, you need to use a big index to know which information is in which book.
Your computer uses the index to keep track of which books are being used at any given time. If you delete the index entry for some information, two things happen. One, the computer no longer has any way to find which books contain that information. Two, the computer now thinks that those books are empty. Of course, if you were just waltzing through the library and picked up one of those books at random, you’d find all that info is still there. At some point, part or all of it may actually get deleted because the computer, thinking that those books are empty, will write some new information there. But there’s really no guarantee on how long it would take to completely overwrite some deleted data through average drive use. To really delete stuff you can use software that will just write empty data to all the books in a file before deleting its index record.
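If it helps, the library analogy can be sketched as a toy model in Python. Everything here (`ToyDisk` and its methods) is invented for illustration and is far simpler than any real filesystem, but it shows why deleting the index entry is quick and why the old contents stick around until something else lands on the same blocks.

```python
# Toy model of the library analogy: numbered blocks ("books") plus an index.
# ToyDisk is a hypothetical illustration, not how a real filesystem is built.

class ToyDisk:
    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks   # the "books"
        self.index = {}                    # file name -> list of block numbers

    def free_blocks(self):
        # A block counts as free if no index entry points at it,
        # regardless of what bytes it still contains.
        used = {n for nums in self.index.values() for n in nums}
        return [n for n in range(len(self.blocks)) if n not in used]

    def write(self, name, chunks):
        free = self.free_blocks()
        if len(chunks) > len(free):
            raise OSError("disk full")
        nums = free[:len(chunks)]
        for n, chunk in zip(nums, chunks):
            self.blocks[n] = chunk         # old contents get replaced only now
        self.index[name] = nums

    def delete(self, name):
        del self.index[name]               # quick: only the index changes

    def secure_delete(self, name):
        for n in self.index[name]:
            self.blocks[n] = b""           # slow: touch every block first
        del self.index[name]


disk = ToyDisk(4)
disk.write("movie", [b"part1", b"part2"])
disk.delete("movie")
print(disk.blocks)   # old data still sits in blocks 0 and 1
disk.write("notes", [b"hello"])
print(disk.blocks)   # block 0 was reused, block 1 still holds b"part2"
```

Note that `delete` does the same amount of work no matter how big the file was, while `secure_delete` has to visit every block the file occupied.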
Unless there’s a security reason to need to wipe the data so none of it can be found, it’s a lot faster to not bother and instead just mark those parts of the disk as being freely available to overwrite with something else rather than being overwritten right now.
It’s like “who cares that these bytes aren’t all zero? You’re not using them anyway.” Later, when you do start using them, you’ll be replacing them with your new content, so you gain nothing by writing to them twice (once to blank them out and again to write something new).
Imagine a wall with a mural.
You mark off a rectangular spot on the wall – a part of the mural you don’t mind clobbering with a new picture. You paint over it with white to blank it out, and then plan to let someone else paint a new picture on it later.
Well, instead of painting it over with white, you could have just marked the rectangle off with a border, and then not bothered to paint it over with white yet – just mark it as available. Let the person who’s going to paint something new there paint over it – don’t bother with the in-between step of painting it blank in the meantime. **This is essentially what disk drives do when you “erase” a file**. They just mark the space as “clobber-able” without actually bothering to clobber it yet, leaving that to happen later when some file needs the space and writes over the top of it.
It takes extra time to write over the top twice – once to blank it and once again later when putting something new there.
That being said, there are some secure filesystems that DO blank it when you erase, for use in places where the extra time is worth it for security.
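For a single file, the blank-it-then-erase idea could look something like the sketch below. The function name `zero_and_delete` is hypothetical, and this is only a rough illustration: on SSDs, and on journaling or copy-on-write filesystems, writing to a file in place is not guaranteed to land on the same physical blocks, so this should not be treated as a reliable secure-erase tool.

```python
import os

def zero_and_delete(path, chunk_size=1024 * 1024):
    """Overwrite a file with zero bytes, then remove it. Returns bytes written."""
    size = os.path.getsize(path)
    zeros = bytes(chunk_size)              # a reusable buffer of zero bytes
    written = 0
    with open(path, "r+b") as f:           # open in place, without truncating
        while written < size:
            n = min(chunk_size, size - written)
            f.write(zeros[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())               # ask the OS to push the writes to the device
    os.remove(path)                        # only now drop the directory entry
    return written
```

The loop is where the extra time goes: it has to write every byte of the file, whereas `os.remove` on its own only touches the filesystem's bookkeeping.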
Thanks all. I now have more of an understanding of how relevant the file allocation table is to a hard drive and the calculation of available size etc. I’m learnding!
Overwriting files can take a lot of time, especially if the data to be overwritten is very large or spread out in many small pieces.
As such, it’s far faster just to erase a file table entry pointing to that data and move on.