What is the difference between “Size” and “Size on disk?”

5 Answers

Anonymous 0 Comments

Imagine you have a case of beer. Inside, the case is divided into 16 slots, one for each bottle. The slots are bottle-sized, so putting a bottle in each one is an efficient use of the space. Now imagine you put something smaller in a slot, but you can only put one thing in each slot, never two or more. If you put in something smaller than a beer bottle, like a stick of gum, the object itself is smaller, but the amount of space it takes up in the box, one slot, is the same as the much larger beer bottle. It’s a less efficient use of that space. So the stick of gum is small, but its size ‘in the box’ is larger.

A gigabyte is a billion bytes of data, which is a lot. Each file on a disk uses a certain number of bytes, and its position on the disk and the parts of the disk it is using are all stored in something called a file allocation table, which is a kind of index to the book that is your disk.

Now imagine you tried to keep track of all one billion bytes of space in that file allocation table. The FAT would quickly take up a billion bytes itself if it tracked file usage in that much detail. It would be wasteful and slow.

So instead the disk space is broken up into blocks: larger chunks that are easier to track. The larger the disk, the larger these chunks need to be to keep the file allocation table a healthy size.
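To put rough numbers on that, here is a small sketch that counts how many blocks the table would have to track on a one-gigabyte disk for a few block sizes. The function name and the 4 bytes per table entry are assumptions made up for illustration, not how any particular filesystem lays out its table.

```python
def table_entries(disk_bytes: int, block_bytes: int) -> int:
    """Number of blocks the table has to keep track of (rounded up)."""
    return -(-disk_bytes // block_bytes)  # ceiling division


DISK = 1_000_000_000  # one gigabyte, as in the answer above
ENTRY = 4             # assumed bytes per table entry (illustrative only)

for block in (1, 512, 4096, 32768):
    entries = table_entries(DISK, block)
    print(f"block={block:>6} B  entries={entries:>13,}  table={entries * ENTRY:>13,} B")
```

Tracking every single byte would need a table bigger than the disk itself, while 32 KB blocks bring it down to a few tens of thousands of entries.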

Let’s say that your disk has a block size of 32 kilobytes (32,768 bytes) and you need to store a file that is 5 kilobytes. The smallest amount of space it can use is 32 kilobytes, because that’s the minimum block size, so the file size would be 5k but the size on disk would be 32k.
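The arithmetic is just "round up to the next whole block". A minimal sketch, taking 32 KB to mean 32 × 1024 bytes (the function name is made up for this example):

```python
def size_on_disk(size: int, block: int) -> int:
    """Round a file size up to a whole number of blocks."""
    blocks = -(-size // block)  # ceiling division: partial blocks count as full
    return blocks * block


BLOCK = 32 * 1024  # 32 KB block

print(size_on_disk(5 * 1024, BLOCK))   # 5 KB file -> 32768
print(size_on_disk(33 * 1024, BLOCK))  # just over one block -> 65536
```

Note how a file that is only slightly bigger than one block jumps straight to two blocks on disk.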

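In general, the more blocks the file allocation table can keep track of, the smaller the block size can be for any given disk size. This is why old FAT16 disks were so much more wasteful than FAT32: FAT16 can only number about 65,000 blocks, so a big disk needs big blocks, while FAT32 can number far more of them. exFAT raises the limit again, although large exFAT drives are often formatted with big blocks anyway.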
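A rough sketch of that trade-off: given how many blocks a table can number, find the smallest power-of-two block size that still covers the whole disk. The limits of 2**16 and 2**28 blocks are round-number assumptions for FAT16 and FAT32 (the real limits are slightly lower), and the 512-byte floor is an assumed sector size. Real formatting tools apply extra rules and usually pick larger values.

```python
def smallest_block(disk_bytes: int, max_blocks: int, sector: int = 512) -> int:
    """Smallest power-of-two block size that lets max_blocks cover the disk."""
    block = sector
    while disk_bytes > block * max_blocks:
        block *= 2  # double the block size until the whole disk fits
    return block


GIB = 2**30
FAT16_BLOCKS = 2**16  # rough limit, assumed for illustration
FAT32_BLOCKS = 2**28  # rough limit, assumed for illustration

print(smallest_block(2 * GIB, FAT16_BLOCKS))  # 32768
print(smallest_block(2 * GIB, FAT32_BLOCKS))  # 512
```

On a 2 GiB disk the smaller table is forced up to 32 KB blocks, while the larger one could in principle go all the way down to a single sector.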

Anonymous 0 Comments

this comes from the hard-drive storage paradigm. the disk is not a bucket, it is broken into small sections of equal size, for optimization reasons.

you know your ice tray? that’s how a storage drive works. many sections of the same size. you have a cup of water that will fill 3 and a half ice “units”. the one you only filled halfway can’t be used to store another file, so it will be reported as 4 units (size on disk) even if the space you need is 3.5 units (size).

in filesystems you can’t use half (or a third) of an allocation unit. if the file you want to store has a size that doesn’t fit exactly into a whole number of allocation units, you will get a size on disk that is slightly larger than size.

Anonymous 0 Comments

Think of it like spaghetti sauce. Let’s say you have 10 litres of sauce that you want to store in five containers. But all your containers are 3 litres, so you put 2L of sauce in each of the 5 containers. The “size” of your sauce is still 10L, but the “size on shelf” (or on disk) is 15L.
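The same sum works for a folder full of small files: each file is rounded up on its own, so the wasted space adds up. A small sketch using the sauce numbers (the function name is made up for this example):

```python
def total_sizes(file_sizes, block):
    """Return (total size, total size on disk) for a list of file sizes."""
    size = sum(file_sizes)
    # Each file is rounded up to whole blocks separately.
    on_disk = sum(-(-s // block) * block for s in file_sizes)
    return size, on_disk


# Five 2-litre portions in 3-litre containers.
print(total_sizes([2, 2, 2, 2, 2], block=3))  # (10, 15)
```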

Anonymous 0 Comments

In a really simplified sense: think of a storage device as a book: storage is divided logically into discrete “pages”. Each page can only be assigned to a single file, so if that file doesn’t occupy the entire page, the rest can’t be used by anything else since the table of contents points to that page for the first file.

This is why, especially for a small file, the “size on disk” statistic is often an exact multiple of a power of 2 (like 4 KB, for example).

Anonymous 0 Comments

Take a piece of paper. Write “Hello” on it. Grab another piece of paper. Write “H” on it.

The two pages are the same size. Different amount of writing on each (Size) versus how much paper you took (Size on Disk).

The word “hello” might take up the same disk space as “h”. Yet obviously it is longer. And if you write more, more pages are needed, but you grab further full pages, not partial pages.

(Actually with modern filesystems there’s a bunch of optimisations for trivial examples but the idea is valid).
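If you want to see both numbers for a real file, here is a minimal sketch using Python's `os.stat`. It assumes a Unix-like system, where `st_blocks` counts 512-byte units; that field is not available on Windows, so the helper (a made-up name) returns `None` there. Because of the optimisations mentioned above, the allocated figure for a tiny file can vary by filesystem and may even show 0 right after writing.

```python
import os
import tempfile


def sizes(path):
    """Return (size, allocated) in bytes; allocated is None if unknown."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # missing on Windows
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


# Write the five-byte "Hello" page from the example above to a temp file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"Hello")
    path = f.name

size, allocated = sizes(path)
print("size:", size, "allocated:", allocated)
os.remove(path)
```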