How does a filesystem on a FLASH storage device work?

I was thinking about how to design a filesystem from scratch.

Let’s say every cell in our FLASH chip has 100,000 write cycles before it dies, and let’s imagine a simple flat filesystem.

I imagine the filesystem splits the file’s data across multiple chunks and creates a linked list of these chunks, where every chunk holds a pointer to the next chunk. It would be a copy-on-write filesystem. This would theoretically slow down the wear on the cells? But I’m not really sure how to achieve efficient wear leveling.
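Something like this is what I have in mind, as a rough C sketch (the page size, payload size, and sentinel value are just made-up illustration values):

```c
#include <stdint.h>

#define CHUNK_PAYLOAD 252     /* assumed: header + payload fits one 256-byte flash page */
#define CHUNK_END 0xFFFFFFFFu /* assumed sentinel meaning "no next chunk" */

/* One file chunk as it would sit in a flash page. A copy-on-write update
 * never rewrites these cells in place: it writes a fresh chunk to an
 * unused page and relinks the chain to point at it. */
struct chunk {
    uint32_t next;                /* page index of the next chunk, or CHUNK_END */
    uint8_t  data[CHUNK_PAYLOAD]; /* file payload */
};
```

One wrinkle with a forward-pointing list like this: relocating a chunk changes its address, so the predecessor’s next pointer has to be rewritten too, and that cascades all the way back to the head of the chain. Real flash filesystems tend to avoid this with log structures or indirection tables.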

But I really don’t know how the filesystem table should work. Changing, creating, or deleting files would be easy to do and wouldn’t do too much damage to the FLASH cells holding the data, but I imagine you’d need to update the filesystem table with each of these operations.

My question is: how do you get around this? Do you just rely on the fact that when the cells holding the filesystem table fail, the whole filesystem fails? I guess you could store multiple copies of the filesystem table somehow and compare them against each other at every write operation? And if one of the copies failed, you’d just create a new one?

And you would also need to store the addresses of bad sectors somewhere so you don’t attempt to write to them. But you would need multiple copies of that too?
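To make my redundant-table idea concrete, here’s a hypothetical sketch in C; the copy count, the FNV-1a checksum, and the bad-block list layout are all assumptions I made up for illustration:

```c
#include <stdint.h>
#include <stddef.h>

#define TABLE_COPIES 4    /* assumed redundancy level */
#define MAX_BAD_BLOCKS 64 /* assumed capacity of the bad-block list */

/* One copy of the filesystem table. The bad-block list lives inside it,
 * so it gets the same redundancy for free. */
struct fs_table {
    uint32_t sequence;                   /* bumped on every table rewrite */
    uint32_t bad_block_count;
    uint32_t bad_blocks[MAX_BAD_BLOCKS]; /* physical blocks to never write again */
    /* ... file entries would follow here ... */
    uint32_t crc;                        /* checksum over everything above */
};

/* Toy checksum (FNV-1a) over the table, excluding the crc field itself;
 * a real filesystem would use something like CRC32. */
static uint32_t table_crc(const struct fs_table *t) {
    const uint8_t *p = (const uint8_t *)t;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < offsetof(struct fs_table, crc); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* On mount: scan all copies and keep the newest one whose checksum
 * verifies. A copy whose cells have died simply loses the vote and can
 * be rewritten from a surviving copy. */
static const struct fs_table *pick_table(const struct fs_table copies[TABLE_COPIES]) {
    const struct fs_table *best = NULL;
    for (int i = 0; i < TABLE_COPIES; i++) {
        if (table_crc(&copies[i]) != copies[i].crc)
            continue; /* corrupted or worn-out copy: ignore it */
        if (best == NULL || copies[i].sequence > best->sequence)
            best = &copies[i];
    }
    return best; /* NULL means every copy is gone */
}
```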

4 Answers

Anonymous

File systems don’t need to do this; the SSD handles what you’re asking about transparently.

There is another layer of abstraction handled at the SSD device level. Although it can be designed to cooperate with the host computer, it doesn’t need to in order to be functional.

It’s called the “flash translation layer”.

There is a mapping table maintained by the SSD which maps each logical address to a physical block, and other tables record the used/free status and the age of each block.

Whenever a logical block is modified, the SSD looks for a free physical block, writes the new data to it, then changes the mapping table so the logical address points to this physical block holding the newer copy of the data. The physical block it originally pointed to is marked invalid, its wear count is incremented, and it’s ready to be erased and reused for something else.

In this way, any new data or modification always goes to the blocks with the lowest wear: repeated overwrites of the same logical block NEVER happen on the same physical block, because it’s always the blocks with the lowest write counts that get used.
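A toy sketch of this mechanism in C, purely for illustration: the block count, field names, and the brute-force scan are assumptions, and a real FTL works at page granularity with far more sophisticated bookkeeping.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 1024 /* assumed number of physical blocks on the chip */
#define BLOCK_SIZE 4096 /* assumed block size in bytes */
#define NO_BLOCK 0xFFFFFFFFu

enum block_state { BLOCK_FREE, BLOCK_VALID, BLOCK_INVALID };

struct block_info {
    enum block_state state;
    uint32_t wear; /* the per-block "age" used for wear leveling */
};

struct ftl {
    uint32_t l2p[NUM_BLOCKS];              /* logical block -> physical block */
    struct block_info phys[NUM_BLOCKS];    /* state and age of each physical block */
    uint8_t flash[NUM_BLOCKS][BLOCK_SIZE]; /* stand-in for the raw chip */
};

/* Wear leveling policy: always pick the free block with the lowest wear
 * count, so writes spread evenly across the whole chip. */
static uint32_t pick_free_block(const struct ftl *f) {
    uint32_t best = NO_BLOCK;
    for (uint32_t i = 0; i < NUM_BLOCKS; i++) {
        if (f->phys[i].state != BLOCK_FREE)
            continue;
        if (best == NO_BLOCK || f->phys[i].wear < f->phys[best].wear)
            best = i;
    }
    return best;
}

/* Overwriting a logical block: the new data goes to a fresh physical
 * block, the mapping is redirected, and the old block is merely marked
 * invalid. The same physical cells are never rewritten in place. */
static int ftl_write(struct ftl *f, uint32_t logical, const uint8_t data[BLOCK_SIZE]) {
    uint32_t target = pick_free_block(f);
    if (target == NO_BLOCK)
        return -1; /* no free blocks: a real FTL would garbage-collect here */

    memcpy(f->flash[target], data, BLOCK_SIZE); /* program the new block */
    f->phys[target].state = BLOCK_VALID;

    uint32_t old = f->l2p[logical];
    if (old != NO_BLOCK) {
        f->phys[old].state = BLOCK_INVALID; /* to be erased and reused later */
        f->phys[old].wear++;                /* counted here as a stand-in for the erase */
    }

    f->l2p[logical] = target; /* the logical address now points at the new copy */
    return 0;
}
```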

This transparent behavior is good enough on its own, but it can be made better if the host preemptively tells the SSD that certain logical blocks are already free at the filesystem level as soon as a filesystem-level deletion happens, instead of the SSD only finding out when an overwrite happens.

This is done via “TRIM” commands. Whenever a filesystem-level deletion happens, the host also notifies the SSD that the logical addresses containing the deleted data are free. The SSD can then preemptively invalidate and erase those blocks, so the next time a write happens it can go straight to these prepared blocks. This improves SSD write performance and longevity by giving the SSD more free space to shuffle around cold data that never gets modified, freeing those long-untouched blocks for better wear leveling.
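Building on the toy FTL sketch above, a TRIM handler could be as simple as this (again hypothetical, reusing struct ftl and the constants from the previous snippet):

```c
/* TRIM: the host declares that a range of logical blocks no longer holds
 * live data, so the FTL can invalidate them now instead of waiting for an
 * overwrite. Reuses the toy types from the sketch above. */
static void ftl_trim(struct ftl *f, uint32_t first, uint32_t count) {
    for (uint32_t l = first; l < first + count && l < NUM_BLOCKS; l++) {
        uint32_t old = f->l2p[l];
        if (old == NO_BLOCK)
            continue;                       /* nothing mapped at this address */
        f->phys[old].state = BLOCK_INVALID; /* eligible for early erase */
        f->l2p[l] = NO_BLOCK;               /* this logical block now reads as empty */
    }
}
```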
