For the same reason driving 50 miles continuously on a highway is faster than driving 50 miles through a city: there’s constant starting and stopping with smaller files as the computer switches from one to the next and ensures each one is correctly downloaded and indexed.
Modern computers will handle this extremely quickly, but it still adds delays compared to simply downloading one larger file of comparable size.
Imagine having to move 1000 liters of water in containers.
One batch is in approximately 1000 drinking bottles of various sizes.
Another batch is in approximately 10 big containers.
Which batch will be faster to move?
Computers work in a similar way. Every single file carries a bunch of overhead called metadata (such as file name, size, and timestamps) that needs to be recorded, so the transfer time depends on both the number of files and the amount of data.
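A rough sketch of this overhead, timing 1000 small files against one large file of the same total size (the file names and sizes here are arbitrary choices for illustration; actual numbers will vary by machine and file system):

```python
import os
import tempfile
import time

def write_files(directory, count, size):
    """Write `count` files of `size` bytes each; return elapsed seconds."""
    chunk = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        # Each file costs an open, a metadata record, and a close.
        with open(os.path.join(directory, f"file_{i}.bin"), "wb") as f:
            f.write(chunk)
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:
    small = write_files(d, 1000, 1024)        # 1000 files x 1 KiB
with tempfile.TemporaryDirectory() as d:
    big = write_files(d, 1, 1000 * 1024)      # 1 file x ~1 MiB

print(f"1000 small files: {small:.4f}s, one big file: {big:.4f}s")
```

The same amount of data is written both times; the difference is purely the per-file bookkeeping.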
The operating system of a computer needs to know where on the hard drive the data are stored, so it keeps a kind of index to record this. The index basically says that file A is stored in row 3, segment 4 (in reality it's more complicated, but let's keep it simple). Doing so for one big file is of course faster than for hundreds of small files that together are as big as the one file.
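A toy version of such an index, as a sketch only (real file systems use structures like inode tables, and the row/segment scheme here is just the simplification from above): every file costs one index entry no matter how small it is, which is where the per-file overhead comes from.

```python
# Toy file-system index: one entry per file, mapping a name to the
# (row, segment) locations of its pieces. Purely illustrative.
SEGMENT_SIZE = 4096          # bytes per segment (assumed)

index = {}                   # file name -> list of (row, segment)
next_slot = [0]              # next free segment, as a simple counter

def store(name, size):
    """Record where a file's segments live; each file adds one index entry."""
    segments_needed = -(-size // SEGMENT_SIZE)   # ceiling division
    locations = []
    for _ in range(segments_needed):
        slot = next_slot[0]
        locations.append((slot // 100, slot % 100))   # (row, segment)
        next_slot[0] += 1
    index[name] = locations

# One big file: a single index entry covering many segments.
store("big.bin", 1000 * SEGMENT_SIZE)
# A thousand small files: a thousand separate index entries to write.
for i in range(1000):
    store(f"small_{i}.bin", SEGMENT_SIZE)

print(len(index))             # 1001 entries
print(len(index["big.bin"]))  # 1000 segments under one entry
```

The big file and the thousand small files occupy the same number of segments, but the small files need a thousand times as many index entries.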
Storage on disk is a complex business: most files are broken up into smaller chunks (each piece having a link to the next one) and deleting files, which happens often enough, leaves gaps in the overall physical disposition of these chunks.
When you copy a file to that disk, the operating system has to decide where to put the new chunks, and it tries to be somewhat smart about that (e.g., finding an available gap to fill so it doesn't waste space).
So when you have a lot of small files, it has to make that decision anew for each one, and there's a noticeable overhead compared to a single big file processed in one go.
For similar reasons, hard disks have to be defragmented (moving the chunks around on the disk platter to get rid of the gaps), which is generally done continuously in the background while the machine runs. SSDs access data differently and don't need defragmentation, but the operating system still has to find free space for every write.
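The placement decision above can be sketched with a toy first-fit allocator (a deliberate simplification; real file systems use far more sophisticated allocation strategies): deleting a file leaves a gap, and every new file triggers a fresh search for somewhere to put its chunks.

```python
# Toy first-fit allocator: the disk is a list of blocks, None = free.
# Illustrative only, not how any real file system actually allocates.
disk = [None] * 64

def allocate(name, blocks):
    """Scan for the first run of free blocks big enough; return its start."""
    run = 0
    for i, b in enumerate(disk):
        run = run + 1 if b is None else 0
        if run == blocks:
            start = i - blocks + 1
            for j in range(start, i + 1):
                disk[j] = name
            return start
    return None    # no gap big enough

def free(name):
    """Delete a file, leaving a gap of free blocks behind."""
    for i, b in enumerate(disk):
        if b == name:
            disk[i] = None

allocate("a", 8)
allocate("b", 8)
free("a")                     # leaves an 8-block gap at the front
print(allocate("c", 4))       # fills part of that gap -> starts at block 0
print(allocate("d", 16))      # too big for the gap -> starts at block 16
```

Each call to `allocate` is a full scan for a suitable gap; with thousands of small files, those repeated searches add up, which is exactly the overhead the answer describes.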