Why don’t we write a database file system? Isn’t a file system practically a database already? Isn’t layering an OS between the data and the database application slowing things down?




19 Answers

Anonymous

NTFS is database-backed, as far as I understand. It's Windows's implementation of how files and folders get enumerated that's the dumb part.

Specifically, on NTFS there is a hidden/system file called $MFT, the Master File Table, which stores all of the filenames, folders, attributes (dates, owner, read-only, archive, etc.), ACLs (Access Control Lists: everything shown when you right-click a file, hit Properties, and open the Security tab), and also the offsets of the files (where the content is physically stored on the drive).

The way file enumeration normally works is that you call an API function which returns the list of files in a folder. This call is synchronous, which means you have to wait for it to complete before doing anything else. You can tell it to list every file of a particular type anywhere on the storage, but the call can take several MINUTES to complete because it calls itself recursively to look inside all of the sub-folders. Alternatively, you can tell it not to recurse and then do the recursion yourself, which keeps your program from locking up while it searches. But since there's no way (using the Windows API) to know how many files exist without first running the whole enumeration recursively, there's no way to know how long it will take.
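The recursive pattern the Win32 API pushes you into looks roughly like this. A minimal Python sketch of the same idea, using `os.scandir` in place of `FindFirstFile`/`FindNextFile` (the function name and filter are illustrative, not any real tool's code):

```python
import os

def enumerate_files(root, suffix=".jpg"):
    """Recursively walk `root`, yielding files whose name ends with `suffix`.

    This mirrors the Win32 FindFirstFile/FindNextFile pattern: each folder
    is listed with one call, and each sub-folder is descended into in turn.
    There is no way to know the total file count up front, so there is no
    way to estimate how long the scan will run.
    """
    try:
        entries = list(os.scandir(root))  # one "list this folder" call
    except PermissionError:
        return  # inaccessible folders are silently skipped
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            yield from enumerate_files(entry.path, suffix)  # recurse
        elif entry.name.lower().endswith(suffix):
            yield entry.path
```

Note how the `PermissionError` branch quietly drops whole subtrees; that is exactly where the "Unknown" data in API-based scanners comes from.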

A more proper way to enumerate files is to read $MFT directly for the raw list of files. There's no recursing through folders; you just get a flat list of files and the folders they belong to.
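Conceptually, reading $MFT replaces the tree walk with one linear scan over a flat record table. Here is a toy Python model of that idea; the record layout is invented for illustration (real $MFT records also hold attributes, ACLs, and the physical data runs):

```python
# Toy MFT-style flat file table: every file and folder is one record with
# an id, a parent id, and a name. (Invented layout, not the real format.)
records = {
    5: (5, "<root>"),     # id: (parent_id, name); the root points to itself
    10: (5, "Photos"),    # a folder under the root
    11: (10, "cat.jpg"),  # a file inside Photos
    12: (5, "notes.txt"),
}

def full_path(file_id):
    """Rebuild a full path by chasing parent ids; no directory recursion."""
    parts = []
    while True:
        parent, name = records[file_id]
        if parent == file_id:  # reached the root record
            break
        parts.append(name)
        file_id = parent
    return "\\".join(reversed(parts))

# One linear pass over the table finds every .jpg, no matter how deep:
jpgs = [full_path(i) for i, (_, name) in records.items()
        if name.endswith(".jpg")]
```

The point of the sketch is that the scan cost depends only on the number of records, not on the shape or depth of the folder tree.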

By way of example, here are two competing applications which show you a breakdown of how your HDD/SSD is being used, broken down by folder and by file type (including cumulative totals like "here's how much of your drive is .jpg files"), with a neat 2D graph visually showing relative folder sizes. Both apps work really well, but they differ in how they analyze your filesystem:

* [WinDirStat](https://sourceforge.net/projects/windirstat/):
  * Uses the Win32 API to enumerate files.
  * **Pros:**
    * Non-admin users can run this tool.
    * Works on any filesystem supported by Windows.
    * Fully open-source.
  * **Cons:**
    * ***Slow; can take 10+ minutes to scan a larger filesystem.***
    * Files and folders which the current user cannot access are excluded from analysis and marked as "Unknown". (Some system files cannot be accessed even by an Admin user, so there is always some amount of data marked as "Unknown".)
    * Requires installation.
* [WizTree](https://diskanalyzer.com/):
  * Directly reads $MFT.
  * **Pros:**
    * ***Fast even on huge filesystems (a million+ files/folders).***
    * Includes all files and folders on a filesystem, including those to which the current user does not have access. (It does not grant access to the contents of the files; it only shows file and folder names.) There is never "Unknown" data on an NTFS filesystem if $MFT is being read.
    * Portable (no installation required).
  * **Cons:**
    * Requires admin privileges to run.
    * Only supports locally mounted NTFS filesystems. (On filesystems which are not locally mounted NTFS, it falls back to the Win32 API filesystem calls.)
    * Freeware with a donation request; no source code available.
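However the file list is obtained, the per-type totals both tools display ("how much of the drive is .jpg files") are a simple aggregation over it. A minimal sketch, assuming the input is a list of `(path, size_in_bytes)` pairs produced by either enumeration method:

```python
import os
from collections import defaultdict

def size_by_extension(files):
    """Sum file sizes per extension from (path, size_in_bytes) pairs."""
    totals = defaultdict(int)
    for path, size in files:
        ext = os.path.splitext(path)[1].lower() or "<none>"
        totals[ext] += size
    return dict(totals)
```

The aggregation itself is trivial; the entire performance difference between the two tools lies in how quickly the `(path, size)` list can be produced.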
