This topic comes up very regularly. It is an issue between DB people and OS people.
Jim Gray et al. wrote an extensive monograph called “[Notes on Data Base Operating Systems](https://jimgray.azurewebsites.net/papers/dbos.pdf)”.
For example, Oracle doesn’t *need* a file system from the OS; it can store its data on raw devices.
I think the consensus is that the classic DB abstraction is far stronger than most use cases need. Consider:
* **Concurrency:** Most data transformations are not concurrent. Most of the data on your personal machine is acted upon by a single user, often a single thread.
* **Transience:** A lot of data is throwaway or reproducible… think compiler intermediate files, .o files, .bak files.
* **Patterns of access:** Queues see action at the head and the tail; stacks see action only at one end. Of course, Jim Gray had a paper on that as well (“[Queues are Databases](https://arxiv.org/pdf/cs/0701158)”).
But I think the final answer is money. There’s no money to be made in operating systems (even for Microsoft), but there’s still tons of money to be made in DBs. So the DB folks have let the OS folks spend the money providing the basic machine abstractions and worrying about the variety of disks, network cards, CPU architectures, etc.; the tradeoff is worth it.
One big reason is that file systems came before databases, and the major mainframe and microcomputer operating systems were all built on the concept of the file system as the most fundamental representation of data. When databases came along, they, like all other applications, were built on the abstract building blocks the OS gave them in the file system. Commercial databases were only available on expensive mainframes in the 1960s and 1970s, and many folks “rolled their own” data storage software until the late 70s/early 80s, when SQL-based RDBMS software started becoming more popular.

The first several years of my programming career were spent maintaining custom database software built directly on top of TANDEM’s file system, including converting records from their in-memory representation to blocks that could be written to disk. Things have changed a lot, but fundamentally most applications still want a file system that looks pretty much like it always has.
As others have mentioned, NoSQL and other innovations have come along that are close to what you’re describing, but for most applications it’s not useful to throw out the paradigm we’ve built computer data storage on.
For ACID-compliant databases, the bottleneck on a local system is mostly raw disk write speed rather than the file system, though highly available systems handling massive data loads do use file systems built specifically for that.
Concepts from databases have been borrowed for filesystems currently in use – Hadoop FS for example – but it’s just not worth the extra hassle over the more traditional options in most cases.
NTFS is database-based, as far as I understand. It’s the way Windows enumerates files and folders that is dumb.
Specifically, NTFS has a hidden/system file called $MFT, the Master File Table, which stores all of the filenames, folders, attributes (dates, owner, read-only, archive, etc.), ACLs (Access Control Lists, everything shown when you right-click a file, hit Properties, and open the Security tab), and also all of the offsets of the files (where the content is physically stored on the drive).
The way file enumeration works is that you call an API function which returns the list of files in a folder. The call is synchronous, meaning you have to wait for it to complete before doing anything else. You can ask it to list every file of a particular type anywhere on the drive, but that can take several MINUTES to complete because it calls itself recursively to look inside all of the sub-folders. Alternatively, you can tell it not to recurse and run the recursion yourself, which keeps your program from locking up while it searches; but since there is no way (using the Windows API) to know how many files there are without first running the enumeration recursively, there is no knowing how long it will take to run.
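For a rough picture of the recursive approach, here is a minimal sketch using Python’s os.walk as a stand-in for the Win32 directory-listing calls (FindFirstFile/FindNextFile); the drive letter and extension are just placeholders:

```python
# Minimal sketch of "enumerate by recursing through folders".
# Every directory on the volume has to be opened and listed one by one,
# which is why a whole-drive scan can take minutes.
import os

def find_files_by_extension(root, ext):
    """Walk the directory tree under `root` and yield paths ending in `ext`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(ext):
                yield os.path.join(dirpath, name)

# Example: count all .jpg files under C:\ (slow on a large drive; folders the
# current user cannot read are silently skipped, much like the "Unknown" data
# described below).
count = sum(1 for _ in find_files_by_extension("C:\\", ".jpg"))
print(count)
```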
A more proper way to enumerate files would be to read $MFT for the raw list of files. There’s no recursing through folders; you just get a flat list of files and the folders they belong to.
By way of example, here are two competing applications that show a breakdown of how your HDD/SSD is being utilized, by folder and by file type (including cumulative figures like “here’s how much of your HDD/SSD is .jpg files”), with a neat 2D graph visually showing relative folder sizes. Both apps work really well, but they differ in how they analyze your filesystem:
* [WinDirStat](https://sourceforge.net/projects/windirstat/):
* Uses the Win32 API to enumerate files.
* **Pros:**
* Non-admin users can run this tool
* Works on any filesystem supported by Windows
* Fully open-source
* **Cons:**
* ***Slow, can take 10+ minutes to scan on larger filesystems.***
* Files and folders which the current user cannot access are excluded from analysis and marked as “Unknown”. (Some system files cannot be accessed even by an Admin user, so there is always some amount of data marked as “Unknown”.)
* Requires installation.
* [WizTree](https://diskanalyzer.com/):
* Directly reads $MFT.
* **Pros:**
* ***Fast even on huge file systems (with million+ files/folders)***
* Includes all files and folders on a filesystem, including those to which the current user does not have access. (It does not grant access to the content of the files; it only shows file and folder names.) There is never “Unknown” data on an NTFS file system if $MFT is being read.
* Portable (no installation required)
* **Cons:**
* Requires admin privileges to run.
* Only supports locally mounted NTFS file systems. (On file systems which are not locally mounted NTFS, it falls back to the Win32 API filesystem calls.)
* Freeware with donation request. No source code available.
The other interesting thing aside from what other commenters have said is… the other way round is also completely possible. This is the idea of a “data lake”, essentially files in a filesystem being used as a high-capacity database. On-prem Hadoop clusters were all the rage at companies that needed to store a few PBs of data in a decent-enough database.
We already have a few examples: Apache Spark (the basis of Databricks) does this.
A database, at its simplest, is made up of tables with rows and columns. You can store these as files; a good way to do this is to store each column as a file (this is how Apache Spark works, with Parquet being the file format and those files being stored on a file system such as AWS S3).
It works incredibly well at some things, less well at others.
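To make the columns-as-files idea concrete, here is a minimal sketch using pyarrow; the table contents and file name are made up for illustration, and Spark itself reads and writes the same Parquet format:

```python
# Tiny "table as files" sketch: an in-memory table written out as a columnar
# Parquet file on an ordinary file system.
import pyarrow as pa
import pyarrow.parquet as pq

# An in-memory "table" with two columns.
table = pa.table({
    "user_id": [1, 2, 3],
    "country": ["DE", "US", "JP"],
})

# Write it as a Parquet file. Engines like Spark read and write the same
# format, whether the file system underneath is a local disk or an object
# store such as S3.
pq.write_table(table, "users.parquet")

# Read back only one column -- because the layout is columnar, the bytes of
# the other column never need to be touched.
countries = pq.read_table("users.parquet", columns=["country"])
print(countries.to_pydict())
```

That is essentially a (very small) column store built entirely out of plain files.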