If I have a 50GB transfer made up of a few big files and a 50GB transfer made up of lots of very small files, why is the first one faster in Windows?


I’m moving a bunch of files for some drive changes, and I’ve noticed several times that for two equally sized data transfers, the smaller and more numerous the files, the slower it goes. Why does this happen?


18 Answers

Anonymous 0 Comments

For the same reason driving 50 miles continuously on a highway is faster than driving 50 miles through a city: with smaller files there’s constant starting and stopping as the computer switches from one file to the next and makes sure each one is correctly written and indexed.

Modern computers handle this extremely quickly, but it still adds delays compared to simply copying one larger file of comparable size.
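
The start-and-stop cost can be put into rough numbers. Here is a minimal sketch, assuming a fixed overhead per file and a steady throughput. Both figures, and the function name `estimated_transfer_seconds`, are made up for illustration and are not measurements of any real drive or of Windows:

```python
def estimated_transfer_seconds(total_bytes, file_count,
                               throughput_bytes_per_s=150e6,  # assumed sustained speed
                               per_file_overhead_s=0.005):    # assumed fixed cost per file
    """Toy model: streaming time plus a fixed cost paid once per file."""
    streaming = total_bytes / throughput_bytes_per_s
    overhead = file_count * per_file_overhead_s
    return streaming + overhead


FIFTY_GB = 50 * 10**9

few_big = estimated_transfer_seconds(FIFTY_GB, file_count=10)
many_small = estimated_transfer_seconds(FIFTY_GB, file_count=1_000_000)

print(f"10 big files:          {few_big:8.1f} s")
print(f"1,000,000 small files: {many_small:8.1f} s")
```

The streaming part is identical in both cases; only the per-file part grows with the number of files, which is the "city driving" in the analogy.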

Anonymous 0 Comments

Imagine having to move 1000 liters of water in containers.

One batch is in approximately 1000 drinking bottles of various sizes.
Another batch is in approximately 10 big containers.
Which batch will be faster to move?

Computers work in a similar way. Every single file carries some overhead called metadata (such as file name, size, timestamps) that needs to be recorded, so the transfer time depends on both the number of files and the amount of data.
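
If you want to see this on your own machine, here is a rough sketch that writes the same number of bytes once as a single file and once as many small files, and times both. The helper `write_files` and the sizes are illustrative assumptions, and the timings will vary a lot with the drive, the file system, and caching:

```python
import os
import tempfile
import time


def write_files(directory, file_count, bytes_per_file):
    """Write file_count files of bytes_per_file bytes each; return elapsed seconds."""
    payload = b"\0" * bytes_per_file
    start = time.perf_counter()
    for i in range(file_count):
        path = os.path.join(directory, f"part_{i:05d}.bin")
        with open(path, "wb") as f:  # every new file needs its own metadata
            f.write(payload)
    return time.perf_counter() - start


def directory_size(directory):
    """Total size in bytes of the files directly inside directory."""
    return sum(os.path.getsize(os.path.join(directory, name))
               for name in os.listdir(directory))


TOTAL_BYTES = 4 * 1024 * 1024  # 4 MiB, kept small so the sketch runs quickly

with tempfile.TemporaryDirectory() as big_dir, \
        tempfile.TemporaryDirectory() as small_dir:
    t_big = write_files(big_dir, 1, TOTAL_BYTES)
    t_small = write_files(small_dir, 1024, TOTAL_BYTES // 1024)
    big_total = directory_size(big_dir)
    small_total = directory_size(small_dir)

print(f"1 file:     {big_total} bytes in {t_big:.4f} s")
print(f"1024 files: {small_total} bytes in {t_small:.4f} s")
```

Both runs store exactly the same amount of data, so any difference in time comes from the per-file work rather than from the data itself.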


Anonymous 0 Comments

The operating system of a computer needs to know where on the hard drive the data is stored. It uses a kind of index to write that down. The index basically says that file A is stored in row 3, segment 4 (it’s more complicated than that, but let’s keep it simple). Every file needs its own entry, so doing this for one big file is of course faster than doing it for hundreds of small files that together are as big as the one file.
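
As a toy illustration of that index (not how any real file system lays out its data), the sketch below places files back to back and counts how many index entries have to be written. The function `store_files` and the file names are hypothetical:

```python
def store_files(file_sizes):
    """Place files back to back and record each one in a toy index.

    Returns (index, index_writes). The index maps a file name to
    (start_offset, length); real file systems keep far more per entry.
    """
    index = {}
    index_writes = 0
    offset = 0
    for name, size in file_sizes.items():
        index[name] = (offset, size)  # one index entry per file
        index_writes += 1
        offset += size
    return index, index_writes


one_big = {"archive.bin": 1000}
many_small = {f"note_{i}.txt": 10 for i in range(100)}

_, writes_big = store_files(one_big)
index_small, writes_small = store_files(many_small)

print(writes_big, writes_small)   # 1 100
print(index_small["note_3.txt"])  # (30, 10)
```

Both sets hold 1000 units of data, but the small files need 100 index updates where the big file needs one.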

Anonymous 0 Comments

Storage on disk is a complex business: files are stored as chunks (on some file systems each chunk has a link to the next one), and deleting files, which happens often, leaves gaps in the physical layout of these chunks.

When you copy a file to that disk, the operating system has to decide where to put the new chunks, and it tries to be somewhat smart about that (e.g. finding an available gap to fill so as not to waste space).

So when you have a lot of small files it has to make that decision each time anew and there’s a noticeable overhead compared to a single big file processed in one go.
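
To make the "decide where to put it" step concrete, here is a minimal sketch of a first-fit search over a list of free gaps. First-fit is just one simple strategy chosen for illustration; it is an assumption, not a description of what Windows or NTFS actually does, and `first_fit` is a hypothetical helper:

```python
def first_fit(free_gaps, size):
    """Return (start, updated_gaps) for the first gap that can hold size units.

    free_gaps is a list of (start, length) tuples. Raises ValueError if no
    single gap is large enough (a real file system would split the file).
    """
    for i, (start, length) in enumerate(free_gaps):
        if length >= size:
            remaining = length - size
            updated = free_gaps[:i]
            if remaining:
                updated.append((start + size, remaining))  # leftover part of the gap
            updated += free_gaps[i + 1:]
            return start, updated
    raise ValueError("no gap large enough")


gaps = [(0, 4), (10, 6), (30, 100)]
placements = []
for size in [5, 3, 50]:  # three files means three separate searches
    start, gaps = first_fit(gaps, size)
    placements.append(start)

print(placements)  # [10, 0, 30]
print(gaps)        # [(3, 1), (15, 1), (80, 50)]
```

Each file triggers its own search through the free space, and the tiny leftover gaps show how fragmentation builds up over time.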

For similar reasons, hard disks have to be defragmented (moving the chunks around on the disk platter to get rid of the gaps), which Windows generally does automatically in the background on a schedule. SSDs access data differently and don’t need defragmentation, but that doesn’t remove the need to find free space for writing.