Database-based file system

Ethin
Member
Posts: 625
Joined: Sun Jun 23, 2019 5:36 pm
Location: North Dakota, United States

Post by Ethin »

So, not trying to be ~ :) (from what I've seen of him, I unfortunately have a bit of a bad opinion of him now :(). It's an idea I literally just came up with, and I thought I might post it here to get feedback.
The idea is simple: a file system stored in -- say -- a heavily adapted database like SQLite or something else that can easily be embedded without a ridiculous amount of overhead (I'm looking at you, DBMSs). I thought we could take something very simple (say, USTAR) as a model. Each table would be a file, which (as the Unix philosophy goes) could also be a directory. Each table would have attributes (e.g. contents, permission bits, etc.).
Is this just a pipe dream (i.e. sounds good in theory but would never work right in practice) or would this be fast enough that it might actually work on a modern computer system?
I'm not thinking about implementation right now, just the theory. What are your thoughts?
pat
Member
Posts: 36
Joined: Tue Apr 03, 2018 2:44 pm
Libera.chat IRC: patv

Re: Database-based file system

Post by pat »

Why have a table per file? What would the rows for song.mp3, movie.mp4 or soliloquy.txt look like? There's been interest in database-like file systems over the years, but I don't think I've seen that idea before.
Image is my operating system.

"...not because [it is] easy, but because [it is] hard; because that goal will serve to organize and measure the best of [my] energies and skills..."

Post by Ethin »

Something like:
Inode ID, filename, file attributes bitmask, date modified, date created, date accessed, permissions bitmask, file size, contents
Things like that. The attributes bitmask would contain flags for things like read-only, hidden, etc. The permissions bitmask would be things like read/write/execute, the POSIX permissions, that kind of stuff. If we add in users and groups, the UID and GID can be added in there.
iansjack
Member
Posts: 4705
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Post by iansjack »

In what way would this improve on current filesystem implementations?

Do the relationships that exist between tables in a relational database, and the normalisation this permits, exist here? Are the queries that are typically done on relational databases important in this context?
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

Hi,

It is an interesting idea, but not new. Actually many have tried to create database file systems, but it seems that everybody is moving away from the concept.

- the aforementioned DBMS by IBM which used DB2
- HFS originally was a database-like file system, where a single table (the Catalog file) contained all files and directories of a volume in a B-tree
- FILES-11 was special as it could store I/O backed records, the so-called RMS feature
- DBFS from Oracle has a long history too

Btw, sqlite is particularly suitable to be used as a file system, because under the hood it uses a so-called pager, which allocates and writes 4096 bytes at once (equivalent of a logical sector). I think nothing special is needed to create a sqlite database on a partition device under Linux; something like "sqlite3 /dev/sda2" should work, but I haven't tried.
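
The page size the pager works with can be checked rather than assumed. The following is only a sketch using Python's standard sqlite3 module on an ordinary temporary file (not a raw partition); the helper names page_info and demo are made up for illustration. 4096 bytes is the default on recent SQLite builds, but the value is configurable, so the code reads it back instead of hard-coding it.

```python
import os
import sqlite3
import tempfile


def page_info(db_path):
    """Return (page_size, page_count, file_size) for a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        # Create a table so that at least one page is written to disk.
        conn.execute("CREATE TABLE IF NOT EXISTS t(x)")
        conn.commit()
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    finally:
        conn.close()
    return page_size, page_count, os.path.getsize(db_path)


def demo():
    # Hypothetical helper: builds a throwaway database and reports its layout.
    with tempfile.TemporaryDirectory() as tmp:
        return page_info(os.path.join(tmp, "fs.db"))
```

The database file should come out as a whole number of pages, which is what makes the comparison with logical sectors reasonable.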

(Just an interesting fact, FILES-11 had a Master File Directory, kinda overall index, and since the same people designed NTFS as FILES-11, NTFS inherited that MFD).

Cheers,
bzt

Post by Ethin »

@iansjack, yes, the relationships do exist (because I'd probably go with sqlite3), and the various queries and operations available to such a database would also be important, for two reasons:
1. From what I've seen, a database can be much smaller than other files. Compression can also be applied, making files even smaller if they're not already compressed.
2. A single select statement can be incredibly fast. I haven't tested this on a database with, say, a few thousand (or even hundred thousand) rows, but will certainly need to do so to see if this would be viable.
To illustrate the database's structure: in sqlite3, the following statement might create the files table (modelled on the USTAR format), after which relationships to other tables could be created:

Code: Select all

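CREATE TABLE files(
    id INTEGER NOT NULL PRIMARY KEY,
    filename TEXT NOT NULL,
    mode TEXT NOT NULL,             -- permission bits, as in the USTAR header
    uid INTEGER NOT NULL,
    gid INTEGER NOT NULL,
    size INTEGER NOT NULL,          -- length of contents in bytes
    mod_time INTEGER NOT NULL,      -- last modification time
    checksum TEXT NOT NULL,
    type TEXT NOT NULL,             -- USTAR type flag (file, directory, link, ...)
    linked_file INTEGER,            -- link target; NULL for non-links
    owner_username TEXT NOT NULL,
    group_name TEXT NOT NULL,
    dev_major INTEGER,              -- NULL unless this is a device node
    dev_minor INTEGER,
    prefix TEXT NOT NULL,           -- USTAR path prefix
    contents BLOB
);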
The only problem with using SQLite3 is that it doesn't have as rigid a type system as other DBMSs, but that's a sacrifice that can be made, as the user won't be directly managing the database (unless they extract it from the disk, which is possible).
The huge advantage is that users could store all their files in a database that would appear as a single file on other FSes. The other advantage is that pseudo FSes (e.g. /proc) could also be created as in-memory file systems, and a hook could then be made in the file system driver to forward a request to the in-memory file system whenever you attempt to read from a path that starts with /proc, for example.
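
The /proc hook could look roughly like the sketch below. Everything here is hypothetical: DatabaseFS, ProcFS, Router and build_vfs are illustrative names, the files table is cut down to two columns, and a real driver would sit in the kernel's VFS layer rather than in Python. It only shows the prefix-based dispatch.

```python
import sqlite3


class DatabaseFS:
    """Hypothetical database-backed file system with a minimal files table."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE files(filename TEXT PRIMARY KEY, contents BLOB)"
        )

    def write(self, path, data):
        self._db.execute(
            "INSERT OR REPLACE INTO files(filename, contents) VALUES (?, ?)",
            (path, data),
        )

    def read(self, path):
        row = self._db.execute(
            "SELECT contents FROM files WHERE filename = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return row[0]


class ProcFS:
    """Hypothetical pseudo file system whose files are generated on read."""

    def __init__(self, generators):
        self._generators = dict(generators)

    def read(self, path):
        generator = self._generators.get(path)
        if generator is None:
            raise FileNotFoundError(path)
        return generator()


class Router:
    """Forwards a read to the mounted FS with the longest matching prefix."""

    def __init__(self, default_fs):
        self._default = default_fs
        self._mounts = {}

    def mount(self, prefix, fs):
        self._mounts[prefix.rstrip("/")] = fs

    def read(self, path):
        for prefix in sorted(self._mounts, key=len, reverse=True):
            # Match "/proc" and "/proc/...", but not "/process".
            if path == prefix or path.startswith(prefix + "/"):
                return self._mounts[prefix].read(path[len(prefix):])
        return self._default.read(path)


def build_vfs():
    disk = DatabaseFS()
    disk.write("/etc/motd", b"hello\n")
    disk.write("/process", b"on disk\n")
    vfs = Router(disk)
    vfs.mount("/proc", ProcFS({"/uptime": lambda: b"42\n"}))
    return vfs
```

Matching on "prefix + /" rather than a bare startswith keeps a file such as /process from being routed to the pseudo file system by accident.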
There are probably other things I'm missing in regards to advantages/disadvantages, but for right now it's just an idea and I don't know if something will become of it or not.
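
The "single select" claim is easy to try out. Below is a rough sketch with Python's standard sqlite3 module, using a trimmed-down version of the files table and an index on filename. The index, the helper names (build_fs, read_file, lookup_plan, time_lookup) and the generated file names are all assumptions for the experiment, not part of the proposal. No timings are quoted here since they depend entirely on the machine.

```python
import sqlite3
import time


def make_rows(n_files):
    """Yield (filename, size, contents) tuples for n_files dummy files."""
    for i in range(n_files):
        payload = f"contents of file {i}".encode()
        yield (f"/data/file{i}.txt", len(payload), payload)


def build_fs(n_files):
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files("
        " id INTEGER NOT NULL PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " contents BLOB)"
    )
    # Without this index, a lookup by name has to scan the whole table.
    db.execute("CREATE UNIQUE INDEX files_by_name ON files(filename)")
    db.executemany(
        "INSERT INTO files(filename, size, contents) VALUES (?, ?, ?)",
        make_rows(n_files),
    )
    db.commit()
    return db


def read_file(db, filename):
    """Return the contents of one file, or None if it does not exist."""
    row = db.execute(
        "SELECT contents FROM files WHERE filename = ?", (filename,)
    ).fetchone()
    return None if row is None else row[0]


def lookup_plan(db):
    """Return SQLite's description of how it would run the lookup."""
    rows = db.execute(
        "EXPLAIN QUERY PLAN SELECT contents FROM files WHERE filename = ?",
        ("x",),
    ).fetchall()
    # The human-readable detail is the last column of each row.
    return " ".join(str(row[-1]) for row in rows)


def time_lookup(db, filename, repeats=1000):
    """Average wall-clock seconds per lookup over the given repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        read_file(db, filename)
    return (time.perf_counter() - start) / repeats
```

Calling build_fs(100000) and then time_lookup(db, "/data/file99999.txt") gives a first impression; lookup_plan(db) shows whether the index is actually being used.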

Post by iansjack »

What other tables would you envisage apart from the Files table? What relationships do you see between the various tables?

I'm not convinced that a file system based on a relational database would be more efficient than a filesystem specifically designed as a filesystem.

Post by Ethin »

iansjack wrote: What other tables would you envisage apart from the Files table? What relationships do you see between the various tables?

I'm not convinced that a file system based on a relational database would be more efficient than a filesystem specifically designed as a filesystem.
I'm not trying to convince you it's better than a traditional file system. I'm simply throwing around an idea and asking for feedback.
Other than a files table, there could be a devices table, which would hold all devices to ensure that files and devices are separate. (They would show up as files in the file system to the user, and would be accessible files, but would be separate tables in the database.)
Again, this is just an idea, and I don't know if I would ever implement it. It's just an idea I came here to get feedback on.

Post by iansjack »

I'm sorry if you didn't like my feedback. :wink:

Post by Ethin »

I liked your feedback; your post just came across as focused on practicality when I had said that this was only an idea, that's all.
eekee
Member
Posts: 892
Joined: Mon May 22, 2017 5:56 am
Location: Kerbin
Discord: eekee

Post by eekee »

I'm not a good historian, but I think this was normal for mainframe OSs before Unix. The trouble is it requires all sorts of parameters just to create a file. There may also have been flexibility problems back then; it was neither easy nor efficient to put arbitrary-size "blobs" into databases until the '00s. This might not have been too bad because each file was a separate database with separate parameters; you could probably specify a file's records have just 1 1-byte field if the data was that arbitrary. (Or rather, 1-word field; bytes weren't so universal as they are today.) Then again, it's sadly not unlikely that such an 'extreme' 'ridiculous' structure would be blocked by the OS.

If sqlite can store multiple databases (with different record structures) in a single file/partition, and blobs too, I see nothing particularly wrong with using it for a filesystem. I know you're not thinking of practicality, but I can see one huge practical advantage: Filesystems are perhaps the worst environments for subtle bugs causing misery. sqlite is a lot of well-tested, featureful code you could bolt in with just a straightforward wrapper. It almost makes me wish I was writing a more conventional os. :)

Edit: Oh I don't think I read the original idea properly, sorry. I'm confused now. (But that's normal.)
Kaph — a modular OS intended to be easy and fun to administer and code for.
"May wisdom, fun, and the greater good shine forth in all your work." — Leo Brodie

Post by Ethin »

Sqlite3 can, in a way, have multiple databases. You can have any number of DBs open; but, unlike a conventional RDBMS like MySQL, you can't have multiple DBs in the same "database". I wish this was possible, though.
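
For what it's worth, SQLite's ATTACH DATABASE statement gets fairly close: one connection can address several databases, each under its own schema name. A minimal sketch with Python's standard sqlite3 module follows. The schema name devices, the two tables and the sample rows are made up for illustration, and in-memory databases stand in for real files or partitions.

```python
import sqlite3


def open_with_devices():
    db = sqlite3.connect(":memory:")
    # ATTACH adds a second database to this connection as schema "devices".
    db.execute("ATTACH DATABASE ':memory:' AS devices")
    db.execute(
        "CREATE TABLE main.files(filename TEXT PRIMARY KEY, size INTEGER)"
    )
    db.execute(
        "CREATE TABLE devices.nodes("
        " filename TEXT PRIMARY KEY, dev_major INTEGER, dev_minor INTEGER)"
    )
    db.execute("INSERT INTO main.files VALUES ('/etc/motd', 6)")
    db.execute("INSERT INTO devices.nodes VALUES ('/dev/sda', 8, 0)")
    db.commit()
    return db


def list_names(db):
    """List file and device names together, as a user would see them."""
    rows = db.execute(
        "SELECT filename FROM main.files"
        " UNION ALL"
        " SELECT filename FROM devices.nodes"
        " ORDER BY filename"
    ).fetchall()
    return [row[0] for row in rows]
```

Queries can span both schemas, so files and devices could live in separate databases and still show up in one listing.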
Other than Sqlite3, I'm not really sure what database engine is suitable for, or capable of, running with a tiny memory footprint, which is what I think anyone would be aiming for if this were to be a thing (I certainly don't want my FS driver taking up a few hundred MB of RAM, or a few gigabytes, for example, when there aren't that many files in it to begin with).

Post by iansjack »

eekee wrote: Filesystems are perhaps the worst environments for subtle bugs causing misery. sqlite is a lot of well-tested, featureful code you could bolt in with just a straightforward wrapper. It almost makes me wish I was writing a more conventional os. :)
But doesn't sqlite need a filesystem to run on (which rather defeats the purpose)? I thought it needed to create temporary files, so couldn't just use a raw partition.

I would think that a more profitable direction for filesystem design would be that of ZFS and the like, providing extra features such as snapshots, data deduplication, file compression, error correction, etc. I don't really see how an SQL database would be superior. And I have reservations about the idea of storing the whole filesystem in a single "file".

I'm still struggling to see the advantages of a filesystem based on a relational database.

Post by eekee »

The possibility of SQLite needing temporary files didn't cross my mind. I wonder what it needs them for.

ZFS features are lovely, but it's not a simple system! FreeBSD's ZFS support was not fit for serious use for some time after it was released, I think 2 or 3 years. If I were implementing something like that, (I have thought about it,) I would see if it could be broken up into layers which could be implemented separately. Compression of course can; just compress blocks rather than files, & maybe serve a virtual block device. Not that blocks need "serving", it's a very simple interface. I've seen deduplication done in the block layer too, in Plan 9's Venti with 56KB blocks. Snapshots I believe could be implemented fairly simply in a block layer too, but if they're implemented in the filesystem itself you can more easily get control of what files and trees are snapshotted. Error correction I'll admit I know nothing about, but all the others can be implemented separately. I'm happy now!
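
To make the layering idea concrete, here is a toy sketch of a block layer that does deduplication and compression together, loosely in the spirit of a content-addressed store. It is not Venti's actual design: the BlockStore class, the 8192-byte default block size, SHA-256 as the block key and zlib as the compressor are all assumptions chosen because they are in Python's standard library.

```python
import hashlib
import zlib


class BlockStore:
    """Toy content-addressed block store.

    Identical blocks are stored once (deduplication) and every stored
    block is zlib-compressed (compression)."""

    def __init__(self, block_size=8192):
        self.block_size = block_size
        self._blocks = {}  # SHA-256 digest -> compressed block

    def put(self, data):
        """Split data into blocks and return the list of block digests."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in self._blocks:
                self._blocks[digest] = zlib.compress(block)
            digests.append(digest)
        return digests

    def get(self, digests):
        """Reassemble the original data from a list of block digests."""
        return b"".join(zlib.decompress(self._blocks[d]) for d in digests)

    def stored_blocks(self):
        return len(self._blocks)
```

A filesystem on top would only need to keep the digest list per file; snapshots then amount to keeping old digest lists around.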
alecco
Posts: 6
Joined: Sat Jun 08, 2019 4:36 am

Post by alecco »

iansjack wrote:
eekee wrote: Filesystems are perhaps the worst environments for subtle bugs causing misery. sqlite is a lot of well-tested, featureful code you could bolt in with just a straightforward wrapper. It almost makes me wish I was writing a more conventional os. :)
But doesn't sqlite need a filesystem to run on (which rather defeats the purpose)? I thought it needed to create temporary files, so couldn't just use a raw partition.
SQLite has a nice OS portability layer (VFS). It's very easy to plug your own storage in there. SQLite has its own page layer on top of that.

About temporary files, they are necessary for concurrent access. You don't need that in this case. It would be the equivalent of

Code: Select all

PRAGMA journal_mode = MEMORY
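which keeps the rollback journal in memory instead of in a temporary file; database pages are then changed in place, which is fine because the driver has exclusive access.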
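
A quick way to confirm the setting took effect is to read back what the PRAGMA returns. This sketch uses Python's standard sqlite3 module on a temporary file. The helper names are made up, and PRAGMA locking_mode = EXCLUSIVE is an extra assumption here to express the "exclusive access" part; it is not something the post above asked for.

```python
import os
import sqlite3
import tempfile


def open_single_user(db_path):
    """Open a database for one exclusive user with an in-memory journal."""
    db = sqlite3.connect(db_path)
    # Each PRAGMA returns the mode that is in effect after the call.
    journal = db.execute("PRAGMA journal_mode = MEMORY").fetchone()[0]
    locking = db.execute("PRAGMA locking_mode = EXCLUSIVE").fetchone()[0]
    return db, journal, locking


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        db, journal, locking = open_single_user(os.path.join(tmp, "fs.db"))
        try:
            db.execute(
                "CREATE TABLE files(filename TEXT PRIMARY KEY, contents BLOB)"
            )
            db.execute(
                "INSERT INTO files VALUES (?, ?)", ("/etc/motd", b"hello\n")
            )
            db.commit()
        finally:
            db.close()
    return journal, locking
```

Note that with the journal held only in RAM, a crash in the middle of a transaction can leave the database file inconsistent, so this trades safety for not needing a second file.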