if I have a 50GB transfer as a bunch of big files and a 50GB transfer as lots of very small files, why is the first one faster in Windows?

I’m moving a lot of files as part of some drive changes, and I’ve noticed several times that for two data transfers of equal size, the one made up of more, smaller files goes slower. Why does this happen?

For the same reason driving 50 miles continuously on a highway is faster than driving 50 miles through a city: there’s constant starting and stopping with smaller files as the computer switches from one file to the next and makes sure each one is correctly copied and indexed.

Modern computers handle this extremely quickly, but it still adds delays compared to simply copying one larger file of comparable size.

Imagine having to move 1000 liters of water in containers.

One batch is in approximately 1000 drinking bottles of various sizes.
Another batch is in approximately 10 big containers.
Which batch will be faster to move?

Computers work in a similar way. Every single file carries some overhead called metadata (such as file name, size, timestamps) that needs to be recorded, so the transfer time depends on both the number of files and the amount of data.
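
To put rough numbers on that, here is a minimal sketch of the idea as a formula: total time is a fixed cost per file plus the time to move the bytes. The function name, the 2 ms per-file cost and the 100 MB/s throughput are made-up values for illustration, not measurements of any real drive.

```python
def estimated_copy_seconds(num_files, total_bytes,
                           per_file_overhead_s=0.002,    # assumed: 2 ms of bookkeeping per file
                           throughput_bytes_per_s=100e6  # assumed: 100 MB/s sustained
                           ):
    """Toy model: fixed overhead per file + time to move the raw data."""
    return num_files * per_file_overhead_s + total_bytes / throughput_bytes_per_s


total = 50 * 10**9  # 50 GB in both cases

few_big = estimated_copy_seconds(num_files=50, total_bytes=total)
many_small = estimated_copy_seconds(num_files=5_000_000, total_bytes=total)

print(f"50 big files:          {few_big:.1f} s")
print(f"5,000,000 small files: {many_small:.1f} s")
```

With these assumed numbers the data itself takes 500 seconds either way; the difference comes entirely from the per-file term.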

EditLI5: “ca.” (circa) changed to “approximately”

The operating system of a computer needs to know where on the hard drive the data are stored. It uses a kind of index to write this down. The index basically says that file A is stored in row 3, segment 4 (it’s more complicated than that, but let’s keep it simple). Updating the index for only one big file is of course faster than updating it for hundreds of small files that together are as big as the one file.

Storage on disk is a complex business: most files are broken up into smaller chunks (each piece having a link to the next one) and deleting files, which happens often enough, leaves gaps in the overall physical disposition of these chunks.

When you copy a file to that disk the operating system has to decide where to put the new chunks, and tries to be somewhat smart about that (e.g. finding an available gap to fill so as not to waste space).

So when you have a lot of small files it has to make that decision each time anew and there’s a noticeable overhead compared to a single big file processed in one go.

For similar reasons hard disks have to be defragmented (moving the chunks around on the disk platter) to get rid of the gaps, which current versions of Windows generally do automatically in the background on a schedule. SSDs access data differently and don’t need defragmentation, but that doesn’t remove the need to find free space for writing.

Imagine a delivery driver who is delivering items to a house, but they can only deliver one item at a time.

It’s much faster to deliver a single 10 kg item than it is to deliver ten 1 kg items individually.

The driver has to go back to the truck, collect the next item, deliver it to the door, etc.

Your computer has to finish the transfer, verify it copied, and then find the next file and do the same.
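
If you want to see the "back to the truck" cost for yourself, here is a small, hedged experiment: it writes the same number of bytes once as 2000 tiny files and once as a single file, and times both. The helper name `write_files` and the sizes are arbitrary choices for illustration; actual timings depend heavily on the drive, the file system, caching and antivirus software, so no particular result is promised.

```python
import os
import tempfile
import time
from pathlib import Path


def write_files(directory, count, size_each):
    """Write `count` files of `size_each` bytes; return (seconds, bytes_written)."""
    payload = os.urandom(size_each)
    start = time.perf_counter()
    for i in range(count):
        # Each loop iteration pays the per-file cost: create, write, close.
        with open(Path(directory) / f"file_{i}.bin", "wb") as f:
            f.write(payload)
    return time.perf_counter() - start, count * size_each


with tempfile.TemporaryDirectory() as many_dir, tempfile.TemporaryDirectory() as one_dir:
    many_s, many_bytes = write_files(many_dir, count=2000, size_each=512)
    one_s, one_bytes = write_files(one_dir, count=1, size_each=2000 * 512)
    many_count = len(os.listdir(many_dir))
    one_count = len(os.listdir(one_dir))

print(f"{many_count} small files: {many_s:.3f} s for {many_bytes} bytes")
print(f"{one_count} large file:     {one_s:.3f} s for {one_bytes} bytes")
```

Both runs move exactly the same amount of data, so any gap between the two timings comes from the per-file work rather than the bytes.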

If the copy goes over the network, there’s also a small loss of efficiency with lots of smaller files. Data is sent in packets (typically about 1,500 bytes each on Ethernet), and every file needs its own extra requests to create and close it on the other side, so eight 8 KB files cost more round trips and overhead than one 64 KB file.

It might be faster to compress a bunch of smaller files into a zip file, copy it, and then unzip it at the destination.
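
A minimal sketch of that zip-copy-unzip idea, using only Python’s standard `shutil` module. The function name `copy_via_zip`, the archive name `bundle.zip` and the temporary folders in the demo are made up for illustration. Note that zipping and unzipping still touch every small file locally, so this mostly helps when the slow part is the link between the two locations (a network share or a slow external drive).

```python
import shutil
import tempfile
from pathlib import Path


def copy_via_zip(src_dir, dst_dir):
    """Bundle src_dir into one zip, copy that single file, then unpack it in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as staging:
        # 1. Pack the many small files into one archive (local work).
        archive = shutil.make_archive(str(Path(staging) / "bundle"), "zip",
                                      root_dir=str(src_dir))
        # 2. Copy the single archive; this is the step that crosses drives or the network.
        copied = Path(shutil.copy2(archive, str(dst_dir / "bundle.zip")))
    # 3. Unpack at the destination, then remove the archive.
    shutil.unpack_archive(str(copied), extract_dir=str(dst_dir))
    copied.unlink()


# Demo with throwaway folders: 100 small files in, 100 small files out.
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp())
for i in range(100):
    (src / f"note_{i}.txt").write_text(f"file number {i}")

copy_via_zip(src, dst)
copied_names = sorted(p.name for p in dst.iterdir())
print(len(copied_names), "files arrived at the destination")
```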