Directory free FS and binary configs.

Posted: Mon Oct 17, 2005 1:16 am
by stonedzealot
Hi everybody, long time no see... err... yeah.

So I recently rewrote Viridis (it's amazing how weird you realize your old code looks after neglecting it all summer) with much tighter code and got it back up to the point where I'm messing with DMA and file i/o stuff.

I'm doing fine with the implementation of that, but it brings up the point of a filesystem that's going to have to be chosen soon.

Originally I figured that I'd use ext2, because that's pretty simple and well-documented as well as open source. Then I was thinking of doing something more novel like having a completely abstracted file system. A Database file system.

I've seen it done, particularly in a cool KDE tech demo, but those were all databases built off of the current filesystem. In this the filesystem would be the database. The nicety that this would bring is that the FS would basically be self-organizing.

The downside though is how would I keep this idea from slowing file i/o down to a crawl? I'm not worried about slower procs or anything, I'm just worried about having to search a database every time I'd want to use a file for IPC or logging.

The second thing is an idea I picked up from the enlightenment project (www.enlightenment.org) using binary configuration files that are "compiled" from traditional configs. This allows for sanity and syntax checking long before the config is ever used but does it really save that much speed? Parsing text versus loading up a file and associating a struct to it? Can't imagine it would be too much faster in terms of real-life time.
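A minimal sketch of the "compile the config" idea, just to make the trade-off concrete. The field names (`width`, `height`, `theme`) and the fixed binary layout are made up for illustration and are not how Enlightenment actually does it. The point is that all syntax and sanity checking happens in `compile_config`, while `load_config` is a single unpack:

```python
import struct

# Hypothetical fixed layout: two unsigned 32-bit ints and a 16-byte name.
LAYOUT = struct.Struct("<II16s")
FIELDS = ("width", "height", "theme")


def compile_config(text: str) -> bytes:
    """Parse 'key = value' lines, validate them, and pack them into bytes."""
    values = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        key, value = (part.strip() for part in line.split("=", 1))
        if key not in FIELDS:
            raise ValueError(f"line {lineno}: unknown key {key!r}")
        values[key] = value
    missing = [name for name in FIELDS if name not in values]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # int() raises ValueError on non-numeric input, so bad values fail here too.
    return LAYOUT.pack(int(values["width"]), int(values["height"]),
                       values["theme"].encode("ascii"))


def load_config(blob: bytes) -> dict:
    """Load the compiled form: one unpack, no parsing or checking."""
    width, height, theme = LAYOUT.unpack(blob)
    return {"width": width, "height": height,
            "theme": theme.rstrip(b"\0").decode("ascii")}
```

For a config this small the load-time difference is unlikely to be noticeable; the more tangible gain is that errors surface at compile time rather than when the config is first used.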

So what does anyone think about those ideas? Silly, not worth it? Useful?

Re:Directory free FS and binary configs.

Posted: Mon Oct 17, 2005 2:04 am
by Solar
We evaluated DBFSs back when we were doing Pro-POS. There was much talk in favor of them, but very little that made me confident they could be done in a way that is intuitive, backward-compatible (think removable media and network shares using different FS) and efficient.

I believe the benefits of a DBFS are overestimated. Yes, searching files for content is a frequent operation, but I tend to believe that smart index caching can go a long way towards making such searches quicker, without the expected performance hit and the technical novelty (read, instability) involved in a self-designed DBFS.

Looking at how Microsoft tries to introduce a DBFS (WinFS) for, like, two decades now, and still not shipping it with Vista... I think the whole concept is borked. Something a CS PhD can have wet dreams over, but doesn't pass the field tests.

As for binary config files... if you keep the textual config files, you introduce redundancy (not good). If you throw them away after compiling the binaries, you can only change your configs with "appropriate" tools (instead of e.g. reviving a borked installation with the help of a Knoppix CD and some know-how).

XML with schemas can go a long way towards "verifying" configurations. Plus, there are lots of tools and libraries around to handle XML. Plus, anyone who has written HTML by hand should know how to tinker with XML in a text file.

Re:Directory free FS and binary configs.

Posted: Mon Oct 17, 2005 4:04 am
by JoeKayzA
The downside though is how would I keep this idea from slowing file i/o down to a crawl? I'm not worried about slower procs or anything, I'm just worried about having to search a database every time I'd want to use a file for IPC or logging.
Well, a stream based file system can get fragmented too, AFAIK. Efficient caching of content and query results should mostly eliminate sequential reads.


I gave some thought to a DBFS too, some time ago. The main motivation for me was getting rid of directory structures (especially symbolic links), as well as making the location of objects implicit. The experience for the user should be that he/she has a flat pool of objects which have attributes. The user may search over these objects just like you can 'google' for web content nowadays. When an object is stored on removable media or a remote server and the user has proper permissions, it will just appear along with the other objects. A user can add custom attributes (like keywords) to objects, as well as create custom relations between them. (Cover.jpg has something to do with Song.mp3, so whenever I query for Song.mp3, give me a hint to Cover.jpg, for example.)
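To make the "flat pool of objects with attributes and relations" picture concrete, here is a rough in-memory sketch. The `Pool` and `Obj` names and the attribute keys are invented for illustration; a real implementation would live on disk and need indexes:

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    attrs: dict = field(default_factory=dict)
    related: set = field(default_factory=set)  # names of related objects


class Pool:
    """A flat pool of objects: no directories, only attributes and relations."""

    def __init__(self):
        self.objects = {}

    def add(self, name, **attrs):
        self.objects[name] = Obj(name, dict(attrs))

    def relate(self, a, b):
        # Relations are symmetric: each object hints at the other.
        self.objects[a].related.add(b)
        self.objects[b].related.add(a)

    def search(self, **criteria):
        """Return (name, related names) for every object matching all criteria."""
        hits = [o for o in self.objects.values()
                if all(o.attrs.get(k) == v for k, v in criteria.items())]
        return [(o.name, sorted(o.related))
                for o in sorted(hits, key=lambda o: o.name)]
```

Querying for the song by attribute returns it together with the hint to its cover, which is the behaviour described above.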

The biggest problem to me seems, as Solar stated, compatibility with other, 'traditional' file systems. Queries could be realized: use directory names as keywords, or provide the whole path. Queries for content *could* also be realized in special cases, mainly when the accessed filesystem supports indexing of contents, like HFS+.
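A small sketch of the "directory names as keywords" mapping, assuming POSIX-style paths. The helper names `path_keywords` and `query` are hypothetical, and case-insensitive matching is my own choice here:

```python
from pathlib import PurePosixPath


def path_keywords(path: str) -> set:
    """Treat every directory component of a path as a keyword."""
    parts = PurePosixPath(path).parts
    # Drop the file name itself and the root "/" component.
    return {p.lower() for p in parts[:-1] if p != "/"}


def query(paths, *keywords):
    """Return the paths whose directory keywords include all given keywords."""
    wanted = {k.lower() for k in keywords}
    return sorted(p for p in paths if wanted <= path_keywords(p))
```

This only covers queries by location; content queries would still depend on what the underlying filesystem can index.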

Where it starts to get really problematic is importing and exporting of data to and from traditional FSs, IMHO. They use streams to hold all the file data, while my DBFS uses attributes. So you would have to provide some type of import/export plugin for each file format you want to support, and this procedure will be quite resource intensive too. Btw, I'd also need to develop my own formats for every object type (since there are no mainstream DBFSs, there are no common attribute based formats), as well as suitable apps (that's where the mess continues).

So, for now, I decided to stay away from the concept, go back to dreaming and designing, and maybe pick it up again later.

Regarding configuration:
I agree with Solar that XML and schemas probably offer more advantages. At least I am quite comfortable with it (Apache Tomcat configuration).

cheers Joe

btw, we may continue the discussion on DBFSs in more detail, if you like. I'm interested in ideas.

Re:Directory free FS and binary configs.

Posted: Mon Oct 17, 2005 8:27 am
by stonedzealot
@Solar: Well, just because the great big Microsoft couldn't come up with a system viable for their huge billion user OS doesn't mean that I couldn't come up with one for my one user OS. I imagine that the problem with the DBFS idea in terms of Vista was more that it would break an assload of compatibility with everything. I don't have to worry about such things on a large scale... I can just worry about importing and exporting.

With the binary configs, I hadn't thought about the recovery issue. I know we've all been there ::).

@Joe:
You're right that there would have to be import/export for each supported filesystem, but at the same time, how often are you going to need to extract something from the database and put it on another filesystem? Every once in a while, of course. Then consider with today's monster procs, just how long would it take to transfer, having to extract from the database and inject into ext2 or 3, versus having to just copy? I know that the theoretical time is a huge difference, but what's the reality? I think the export would be a non-issue in terms of real time.

Certainly there could be filesystems you want to share with other operating systems that won't support the DB, but you could just exclude those from the database and index them the traditional method or not at all.

Also in terms of file types and apps, I'm excited about designing the former and I'll probably just end up writing most of the latter so that also seems to be a non-issue.

Re:Directory free FS and binary configs.

Posted: Mon Oct 17, 2005 9:11 am
by Pype.Clicker
My main concern about putting everything in a SQL-like database would be that i wish files to remain offset-able. If your design cannot provide a way to access a user-defined portion of the file (using the common 'fseek', for instance) without requiring that the whole content is retrieved first, then probably you're doing a poor design.

It must be highlighted, too, that storing large things (BLOBs) in database has been a problem for years.

(more comments when i'm no longer in 'time-travelling' mode ... get a look at FLUIDs and my 'ideas' page to get an overview of my usual position about metadata, files, etc.)

Re:Directory free FS and binary configs.

Posted: Mon Oct 17, 2005 11:33 am
by stonedzealot
I don't see how fseek would be impaired. I mean, despite the file's existence within the database, isn't the file itself going to still come down to a list of blocks or chunks on the disk? If that's the case, then fseek would work like normal, hunting for an offset within the blocks.
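Roughly what I mean, as a sketch: if the database entry for a file just holds its block list, a read at an arbitrary offset only has to touch the blocks that cover that range. The 512-byte block size and the `BlockDevice` stand-in are assumptions for the example, not part of any existing design:

```python
BLOCK_SIZE = 512  # assumed block size for this sketch


class BlockDevice:
    """Stand-in for a disk; counts block reads so the cost is visible."""

    def __init__(self, data: bytes):
        self._data = data
        self.reads = 0

    def read_block(self, n: int) -> bytes:
        self.reads += 1
        start = n * BLOCK_SIZE
        return self._data[start:start + BLOCK_SIZE]


def read_at(device, blocks, size, offset, length):
    """Read `length` bytes at `offset` of a file stored in the listed blocks.

    `blocks` is the file's block list as it would come out of the database,
    and `size` is the file size in bytes.
    """
    if offset < 0:
        raise ValueError("negative offset")
    end = min(offset + length, size)  # never read past end of file
    out = bytearray()
    pos = offset
    while pos < end:
        index, within = divmod(pos, BLOCK_SIZE)  # which block, where in it
        chunk = device.read_block(blocks[index])
        take = min(BLOCK_SIZE - within, end - pos)
        out += chunk[within:within + take]
        pos += take
    return bytes(out)
```

A read that straddles a block boundary costs two block reads, no matter how large the file is or how the blocks are scattered.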

As far as I know, or am concerned, the only real difference in a DBFS is how things are referenced and organized, but past that it would be mostly the same old thing. For that reason, I don't see why a BLOB would be a problem, the binary information would still be stored as usual, but the reference material on it (its name and context) would be just like any other file in the database.

Re:Directory free FS and binary configs.

Posted: Tue Oct 18, 2005 3:21 am
by Pype.Clicker
wangpeng wrote: I don't see how fseek would be impaired. I mean, despite the file's existence within the database, isn't the file itself going to still come down to a list of blocks or chunks on the disk? If that's the case, then fseek would work like normal, hunting for an offset within the blocks.
well, if the design is proper, then it will indeed be the case.
For that reason, I don't see why a BLOB would be a problem, the binary information would still be stored as usual, but the reference material on it (its name and context) would be just like any other file in the database.
so basically, your database doesn't contain the data proper, merely the metadata, right ?

About the BLOB stuff, i just had a look at a MySQL administration tool recently and had a hard time figuring out whether my data should be "tiny text", "small text", "text", "large text", ... Storing variable-length stuff in a database has always triggered a list of issues concerning performance, implementation, etc.

Re:Directory free FS and binary configs.

Posted: Tue Oct 18, 2005 4:57 am
by Solar
wangpeng wrote: As far as I know, or am concerned, the only real difference in a DBFS is how things are referenced and organized, but past that it would be mostly the same old thing.
Check out ReiserFS. No, not what you see from Linux user space; the underlying system.

Re:Directory free FS and binary configs.

Posted: Tue Oct 18, 2005 7:15 am
by JoeKayzA
wangpeng wrote: I don't see how fseek would be impaired. I mean, despite the file's existence within the database, isn't the file itself going to still come down to a list of blocks or chunks on the disk? If that's the case, then fseek would work like normal, hunting for an offset within the blocks.
So did you intend to use the database just for organizing the files, but still treat a file's data as a flat data stream? Or did you plan to split the data into attributes (like WinFS plans to do, IIRC)? In the latter case, IMO, there is a bigger difference - and this was also my main concern about importing and exporting data.

The idea behind this is to make the file's content explicit, so that it can be queried by the DB - you want to search for things like "author=Joe" on a text document, or "album=Mesmerize" on an audio file. A filesystem has a very hard time searching for these terms when they are encoded somewhere in the flat data stream, so the idea is that the FS 'knows' about the type of a file, it knows about which pieces of data it consists of, and can directly search for these.
You're right that there would have to be import/export for each supported filesystem,
For the reasons mentioned above, not only for each supported file system - for each supported file type!

cheers Joe

Re:Directory free FS and binary configs.

Posted: Tue Oct 18, 2005 8:03 am
by Solar
wangpeng wrote:...how often are you going to need to extract something from the database and put it on another filesystem? Every once in a while, of course. Then consider with today's monster procs, just how long would it take to transfer, having to extract from the database and inject into ext2 or 3, versus having to just copy? I know that the theoretical time is a huge difference, but what's the reality? I think the export would be a non-issue in terms of real time.
Samba. Apache. E-Mail attachments. Tar. ZIP. Creating an ISO, saving to removable media, USB stick, MP3 player.

How often does a user have to do a search for file contents, provided the OS and available tools provide a sane level of functionality for organizing in directory hierarchies, picture albums and music libraries?

Is a DBFS the best solution to a real problem, or would a better meta-data support in the FS and a couple of tools serve the same purpose, probably on readily available filesystems?

Not trying to discourage you, but making you aware of a couple of questions that need to be asked so you don't waste efforts in vain.

Re:Directory free FS and binary configs.

Posted: Tue Oct 18, 2005 1:39 pm
by DennisCGc
How often does a user have to do a search for file contents, provided the OS and available tools provide a sane level of functionality for organizing in directory hierarchies, picture albums and music libraries?

Is a DBFS the best solution to a real problem, or would a better meta-data support in the FS and a couple of tools serve the same purpose, probably on readily available filesystems?
Well, I quite agree with Solar, BUT sometimes a database file system would be better than a traditional file system. Just think of high-end servers that need their files quickly (with a 'simple' command, like: SELECT content FROM `files` WHERE `filename` LIKE 'foo', you get my point.. ?)

Yes, I'm able to think of a reason why a DBFS would be better. But IMHO having a DBFS on a desktop is overkill. (Sure it would be fun to program a DBFS :D, but overkill)

What I thought up is that you can have a read-only file (not even root can change it, only the OS) containing an XML-like document, which describes each file. It is just an idea, which is still under development (and I think it will be like that for a long time). Underneath that you can have e.g. FAT32 or ext2.

In short: it really depends what your OS is meant to do. And of course how much time you want to spend on your FS.

Just my 0.02$,

DennisCGc.

Re:Directory free FS and binary configs.

Posted: Tue Oct 18, 2005 5:51 pm
by stonedzealot
@Solar1: Checked out Reiser4 a little while ago, pretty neat stuff... definitely worth more than a passing thought

@Joe: It seems a little more intelligent to keep a file as a flat stream all together on the disk than any other way. I'm not sure why you can't have a flat file and information readily available on it inside the database. Using your example, it would keep something like

[info]
author = Joe
title = My Masterpiece
blocks = 43, 44, 45..
mimetype = word doc

or

[info]
album = Mesmerize
song = Radio/Video
mimetype = application/ogg
blocks = 46, 47, 48...

That way the information is in the db and available just like any file in a "normal" operating system....
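As a sketch of how those records could sit in an actual database, here is a version using SQLite from Python. The table and column names are made up for the example, and an in-memory database stands in for whatever on-disk structure the FS would really use. The file data stays in its blocks; only the attributes and the block list live in the tables:

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog: one row per file, one row per attribute."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files ("
               "id INTEGER PRIMARY KEY, mimetype TEXT, blocks TEXT)")
    db.execute("CREATE TABLE attrs ("
               "file_id INTEGER REFERENCES files(id), "
               "attr_name TEXT, attr_value TEXT)")
    # Index so attribute lookups do not scan every record.
    db.execute("CREATE INDEX attrs_by_pair ON attrs (attr_name, attr_value)")
    return db


def add_file(db, mimetype, blocks, **attrs):
    """Register a file: its block list plus arbitrary key/value attributes."""
    cur = db.execute("INSERT INTO files (mimetype, blocks) VALUES (?, ?)",
                     (mimetype, ",".join(str(b) for b in blocks)))
    file_id = cur.lastrowid
    db.executemany("INSERT INTO attrs VALUES (?, ?, ?)",
                   [(file_id, name, value) for name, value in attrs.items()])
    return file_id


def find_blocks(db, name, value):
    """Return (file id, block list) for every file with the given attribute."""
    rows = db.execute(
        "SELECT f.id, f.blocks FROM files f "
        "JOIN attrs a ON a.file_id = f.id "
        "WHERE a.attr_name = ? AND a.attr_value = ? ORDER BY f.id",
        (name, value)).fetchall()
    return [(file_id, [int(b) for b in blocks.split(",")])
            for file_id, blocks in rows]
```

Looking up `album = Mesmerize` yields the block list directly, after which reading the file is ordinary block I/O.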

Also, why on earth would you need an exporter for every format? I mean sure, an intelligent exporter would spit out My\ Masterpiece.doc and Mesmerize - 5 - Radio/Video.ogg, but the file itself would just be like any other file and be accessed just like any other file.

@Solar2:
Again, I don't see why any of these points would require any significant export time, since the files are stored like any other.

Re:Directory free FS and binary configs.

Posted: Tue Oct 18, 2005 10:30 pm
by Solar
DennisCGc wrote: Just think of high-end servers that need their files quickly (with a 'simple' command, like: SELECT content FROM `files` WHERE `filename` LIKE 'foo', you get my point.. ?)
Honestly? No. How is that SELECT "simpler" than fopen( "foo", "r" ); ?

Re:Directory free FS and binary configs.

Posted: Tue Oct 18, 2005 10:33 pm
by Solar
wangpeng wrote: @Joe: It seems a little more intelligent to keep a file as a flat stream all together on the disk than any other way. I'm not sure why you can't have a flat file and information readily available on it inside the database. Using your example, it would keep like

[info]
author = Joe
title=My Masterpiece
blocks = 43, 44, 45..
mimetype = word doc
...
What you are describing is not a DBFS, but a metadata database. The metadata duplicates metadata that might already be available from the file format itself (A Bad Thing (tm)), and has to be generated some way...

Re:Directory free FS and binary configs.

Posted: Wed Oct 19, 2005 6:35 am
by stonedzealot
So it's bad merely because it's redundant? Or because it takes up space and is redundant?