AnswerCult

Question

2.07K views · July 9, 2024 · Engineering, Other


Why don’t we write a database file system? Isn’t a file system practically a database already? Isn’t layering an OS between the data and the database application slowing things down?


19 Answers


Answer 1 · 2024-07-09T18:06:25+00:00

**Why don’t we write a database file system?**

This is a hard one to pin down, because as you’ll see in the answer to your second question, we kind of do. But I suspect that when you say “database” you mean a traditional RDBMS, and that kind of file system we do not have. At least not today.

Microsoft actually developed a filesystem based on a relational database system. It was called WinFS, and it got a fair amount of hype (amongst developers, anyway) because it was planned to be part of Windows Vista. It died rather unceremoniously.

The reasons why are pretty nuanced. As someone who has been in the software development field for more than 20 years, I recognize a few patterns in WinFS’ failure to launch.

In a broad sense, software is a bit like an evolutionary ecosystem. Solutions that are “good enough” often thrive because they are lower cost or have the inertia of widespread adoption, and therefore widespread understanding. The low cost part is easy enough to understand — individuals and companies prefer to spend less — but the fact that WinFS was a brand new paradigm in filesystems had upsides and downsides. The most significant downside is that developers were completely unfamiliar with it, and end-users barely knew it existed.

So you had this situation where end-users just wanted features like good performance, search, and access controls. Developers wanted an API that allowed them to deliver these end-user requirements, but this was all table stakes. You couldn’t sell a new piece of software based solely on file-system features, because most of what WinFS promised was already being delivered in some capacity, just using different technologies.

Undoubtedly, WinFS was a cool technology, but it was so different from what had come before that it failed to pick up enough momentum before it ever truly entered the world.

**Isn’t a file system practically a database already?**

Yes, definitely. All modern file systems are databases containing file metadata and a pointer to the location of the file’s binary data in storage. They’re just not like the relational databases that we’re used to using.
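
To make that a bit more concrete, here is a minimal Python sketch that reads the metadata “record” the file system keeps for each file and treats a directory listing like a small query. The helper names `metadata_record` and `directory_table` are my own, purely for illustration; the only real APIs used are `os.stat` and `os.scandir` from the standard library.

```python
import os
import tempfile


def metadata_record(path):
    """Return the file system's metadata for one path as a plain dict."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "inode": st.st_ino,          # identifier of the record inside the file system
        "size_bytes": st.st_size,
        "mode": oct(st.st_mode),     # file type and permission bits
        "modified": st.st_mtime,     # seconds since the epoch
    }


def directory_table(directory):
    """Treat a directory listing as a query: one record per entry, sorted by name."""
    with os.scandir(directory) as entries:
        return sorted(
            (metadata_record(entry.path) for entry in entries),
            key=lambda row: row["name"],
        )


if __name__ == "__main__":
    # Demo on a throwaway directory so the sketch does not depend on your disk layout.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "notes.txt"), "w") as f:
            f.write("hello")
        for row in directory_table(tmp):
            print(row)
```

Notice what is missing: there are no joins, no ad hoc queries, and no relationships between files beyond the directory hierarchy. That is roughly the gap between “a database” and “a relational database” here.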

**Isn’t layering an OS between the data and the database application slowing things down?**

This is really tough to answer. File systems are complicated enough that you can’t make blanket statements like “a relational database filesystem would be faster than current filesystem technologies”. Relational databases are very well understood, and they are *very* performant… But so are modern file systems.

There is one thing that we can say with 100% confidence though: if you add additional work, you add additional time. Every piece of software that lives in a performance-critical role adopts a strategy of work-avoidance. You cannot avoid the fact that adding the “relational” aspect of an RDBMS to a file system would introduce additional work, at a minimum for tasks that rely on relationships. That means there is a lot of opportunity for things to actually be *slower*.

**Questions you didn’t ask.**

Looking at the file system and operating system landscape today, we see that modern operating systems deliver on many of the promises that RDBMS-based file systems made. Mac OS X Tiger had fast file system search starting in 2005. Not coincidentally, this is around the time that WinFS was cancelled; I don’t recall the exact year. Modern versions of macOS continue to deliver fast file system search without a relational database file system.

What we see applied here is the same thing we see in software development in general. Modern development eschews the idea of gigantic monolithic units of software, and instead prefers to break things up into smaller, more manageable components that we can interact with through APIs. That is how Apple’s Spotlight search works. It is its own service that relies on file system features, but it keeps its own index. Much of that process uses techniques that are common in database systems.

Basically, rather than trying to develop an elaborate file system that stores relationships and complex information about file system objects, developers let the file system concern itself with storing files and file metadata, then build services that query the file system and build indexes of their own, containing only the data relevant to delivering the feature the user wants.
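
As a rough illustration of that split, here is a small Python sketch of a separate “search service” that walks the file system once and keeps its own index in SQLite. To be clear, this is not how Spotlight is actually implemented; the function names `build_index` and `search` and the table layout are assumptions I made up for the example. It only uses `os.walk`, `os.stat`, and the standard `sqlite3` module.

```python
import os
import sqlite3
import tempfile


def build_index(root, conn):
    """Walk `root` and store only the fields the search feature needs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, name TEXT, size INTEGER)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            size = os.stat(full_path).st_size
            conn.execute(
                "INSERT OR REPLACE INTO files (path, name, size) VALUES (?, ?, ?)",
                (full_path, filename, size),
            )
    conn.commit()


def search(conn, fragment):
    """Answer a file name search from the index, without touching the file system."""
    rows = conn.execute(
        "SELECT path FROM files WHERE name LIKE ? ORDER BY path",
        (f"%{fragment}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Demo on a throwaway directory with an in-memory index.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("report.txt", "photo.jpg"):
            with open(os.path.join(tmp, name), "w") as f:
                f.write("x")
        index = sqlite3.connect(":memory:")
        build_index(tmp, index)
        print(search(index, "report"))
```

The file system stays simple, and the relational part lives in a service that can be rebuilt or replaced without touching how files are stored. A real service would also need to watch for changes and update the index incrementally, which this sketch leaves out.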

This approach helps developers manage complexity in ways that allow them to iterate on their designs in smaller steps, avoiding long development cycles for products so complicated that you need months of QA just to ensure you don’t break the user experience.
