DBFS

knicos

DBFS

Post by knicos »

I've been thinking about implementing a database in my kernel. It would replace the file system, among other things.

It's a huge task, so I wanted to know what you think about it. How could it be done, and what should it be able to do?

The interface to such a database is very important, and I would like to hear some of your ideas on this.

The database must allow very powerful searches, possibly based on something like SQL. It will also need to allow any file format (images, documents, emails) to be stored in the database. It will need to be multi-user and possibly distributed. It may also support XML.

Thanks.
Moose

RE:DBFS

Post by Moose »

Well, technically, a filesystem is a database. A database stores data, just like a filesystem, except that a filesystem uses a table to 'remember' where things are instead of storing one thing after another sequentially.

Imagine a filesystem as a one table relational database that contains a file name, type, permissions and pointers to its locations.
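
As a rough illustration of that one-table view, here is a minimal Python sketch. The `FileRecord` fields and the `lookup` helper are names made up for this example, not the layout of any real filesystem.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileRecord:
    """One row of the hypothetical single 'files' table."""
    name: str
    type: str
    permissions: int                                  # e.g. 0o644, Unix-style mode bits
    blocks: List[int] = field(default_factory=list)   # pointers to data locations


def lookup(table: List[FileRecord], name: str) -> Optional[FileRecord]:
    """Linear scan of the table by file name; returns None if absent."""
    for row in table:
        if row.name == name:
            return row
    return None


# Example table contents (invented for illustration).
table = [
    FileRecord("hello.txt", "text", 0o644, [12, 13]),
    FileRecord("logo.png", "image", 0o600, [40, 41, 42]),
]
```

Every query here is a lookup by name, which is roughly what a plain directory gives you.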

>> The database must allow very powerful searches, possibly based on something like SQL.

A filesystem has to provide the means to support powerful searches.

Moose.
knicos

RE:DBFS

Post by knicos »

I know, but as you said, it only has one table. I would like to expand the database idea to have multiple tables, with files related to each other in ways other than the directory they are in.

I was thinking of possibly moving away from the idea of directories and simply giving files attributes that could be searched for. Several files could be linked by having common attributes.
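
A minimal sketch of the attribute idea, in Python for brevity. The store layout, the attribute names and the `find` helper are all assumptions for illustration; a kernel implementation would look very different.

```python
from typing import Dict, List

# Hypothetical store: record id -> attribute dictionary (no directories).
files: Dict[int, Dict[str, str]] = {
    1: {"kind": "image", "project": "kernel", "author": "knicos"},
    2: {"kind": "document", "project": "kernel", "author": "moose"},
    3: {"kind": "image", "project": "website", "author": "knicos"},
}


def find(store: Dict[int, Dict[str, str]], **wanted: str) -> List[int]:
    """Return ids of records whose attributes match every key=value given."""
    return sorted(
        rid for rid, attrs in store.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )
```

Files 1 and 2 are "linked" only by sharing `project="kernel"`, so `find(files, project="kernel")` returns both without any notion of a directory.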

Microsoft's Longhorn is planning to have something similar to this. It's not an ordinary one-table file system.
Moose

RE:DBFS

Post by Moose »

Now you're talking about B-tree (balanced tree) filesystems.
They are tables whose contents are grouped by a common similarity, which I think is the first letter of the file name. Each table is then linked to another to narrow down the list of files.

So the file hello.txt in the example below would be found via the a-n table -> h-n table, and then a regular file search is performed on the resulting list.
It would be a lot more complicated, mind you, and B-trees are self-balancing; the idea is to keep the tables equal in size and to reduce search times.

   a-n table      o-z table
    /     \        /     \
  a-g     h-n    o-s     t-z
   |       |      |       |
= Files are split into different lists, reducing the number of potential matches a search has to be validated against.

I would think MS Longhorn is probably expanding on the NTFS B-tree system in order to compete better with XFS.

Moose.
common

RE:DBFS

Post by common »

Not really.

A B-Tree is relational data, not a table; each section is not a table, but a 'node' within that collective table.  It is virtually no different in concept from a binary tree or linked list, as they also have nodes.  The difference is organization.  In a B-Tree, each node can have up to x children, where x is greater than 2.  You would also order it directly by the file name, lexically (such as by using strcmp/strcasecmp).
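
To make the node idea concrete, here is a small Python sketch of lookup in a multi-way tree ordered lexically by file name. The `Node` class and the sample tree are invented for illustration, and only searching is shown; insertion and rebalancing, which are what make a B-Tree a B-Tree, are left out.

```python
from bisect import bisect_left
from typing import List, Optional


class Node:
    """A tree node: sorted keys plus, for internal nodes, len(keys) + 1 children."""

    def __init__(self, keys: List[str], children: Optional[List["Node"]] = None):
        self.keys = keys
        self.children = children or []


def search(node: Node, name: str) -> bool:
    """Descend from the root, comparing names lexically at each node."""
    while True:
        i = bisect_left(node.keys, name)      # first key >= name
        if i < len(node.keys) and node.keys[i] == name:
            return True
        if not node.children:                 # reached a leaf without a match
            return False
        node = node.children[i]               # subtree between keys[i-1] and keys[i]


# Hand-built example tree (file names are made up).
root = Node(
    ["hello.txt", "readme.md"],
    [
        Node(["boot.bin", "config.sys"]),
        Node(["kernel.elf", "notes.txt"]),
        Node(["todo.txt", "zeta.dat"]),
    ],
)
```

Python's string comparison plays the role of `strcmp` here; a case-insensitive variant would compare lowercased names instead.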

A table is simply a format for information (i.e. a struct), and all B-Trees share the same table information, or structure (this may also depend on their type).

Also, NTFS does not use B-Trees, it uses B+Trees, which includes key hashing.  The problem with B-Trees is that you must still traverse the nodes in order to locate the node that you are looking for.  With B+Trees, key hashing allows you to rapidly traverse nodes without having to pass through all of them.

Also, on the note of a database as a file system: yes, a file system is a form of database.  However, there are some differences.  File systems do not really 'grow' at a variable rate; they are static in size and take up the entire partition or disk (dynamic disks are no exception, as extended partitions are created and virtually affixed onto the previous drive).  Databases, on the other hand, consume space as they need to, until the drive is full.

However, database systems can manage their own disks.  Microsoft SQL Server, for example, can use a raw disk directly, without the need to store its data files on the actual file system.

Indeed, this is possible.  All you need to do is create a modern file system, and you have a database.  If you want multiple tables, then you could encapsulate the file system within itself.  For example, most file systems have at least one or two types of tables.  You have a general table that defines everything, then a table of every node present (directories, files, etc.).  Then you have pointers to the actual data and, possibly, indirect pointers to additional data that did not fit within the constraints of the former.  Simply add another layer to the equation, and you could have multiple tables.  However, I should also say that multiple tables are more or less a virtualization of what directories do.
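
The direct and indirect pointers mentioned above can be sketched roughly as follows. This is a Python toy under stated assumptions: `DIRECT_SLOTS`, the `Inode` class and the `indirect_blocks` dictionary standing in for the disk are all hypothetical, and real file systems add further levels of indirection.

```python
from typing import Dict, List

DIRECT_SLOTS = 4  # hypothetical number of direct pointers per node

# Hypothetical "disk": indirect block number -> block numbers stored in it.
indirect_blocks: Dict[int, List[int]] = {900: [50, 51, 52]}


class Inode:
    """A node entry: direct pointers plus one optional indirect block."""

    def __init__(self, direct: List[int], indirect: int = 0):
        self.direct = direct        # up to DIRECT_SLOTS data block numbers
        self.indirect = indirect    # block holding further pointers, 0 if none


def block_for(inode: Inode, index: int) -> int:
    """Map a file-relative block index to a disk block number."""
    if index < DIRECT_SLOTS:
        return inode.direct[index]
    if inode.indirect == 0:
        raise IndexError("block index beyond end of file")
    # Indices past the direct slots are looked up in the indirect block.
    return indirect_blocks[inode.indirect][index - DIRECT_SLOTS]


inode = Inode([10, 11, 12, 13], indirect=900)
```

"Adding another layer" in this picture would mean letting an entry in one of these tables point at a further table rather than at file data.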
knicos

RE:DBFS

Post by knicos »

I don't think you understand what I am trying to say. I'm not saying it very well.

Ignore implementation.
Think of a normal relational database, e.g. a customer and order database...

What if you could integrate any database into the OS database, so that everything, in theory, could be linked to anything else? Essentially, a record in a customer database would be the equivalent of an image file or document, and could be searched for in the same way. An image file could then easily be linked to a customer. This almost totally removes the idea of directories, and the location of data becomes irrelevant.
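
A rough Python sketch of that idea, where a customer record, an image and an order all live in one store and are related by explicit links rather than by location. The record fields, the relation names and both helper functions are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical unified store: every object, whatever its type, is a record.
records: Dict[int, Dict[str, str]] = {
    1: {"type": "customer", "name": "Alice"},
    2: {"type": "image", "format": "png"},
    3: {"type": "order", "item": "keyboard"},
}

# Links are (from_id, relation, to_id) triples rather than directory paths.
links: Set[Tuple[int, str, int]] = {
    (2, "photo_of", 1),
    (3, "placed_by", 1),
}


def linked_to(target: int) -> List[int]:
    """Ids of all records that link to the target record, of any type."""
    return sorted(src for src, _, dst in links if dst == target)


def of_type(kind: str) -> List[int]:
    """Ids of all records of the given type."""
    return sorted(rid for rid, rec in records.items() if rec["type"] == kind)
```

Here `linked_to(1)` finds both the image and the order attached to the customer, and nothing in the query depends on where any of the three is stored.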

If you still don't understand, then I am not able to express my idea.
knicos

RE:DBFS

Post by knicos »

You appear to understand.
But I'm not thinking of just a disk file system, but of a virtual file system. This database (file system) will include more than just files.
common

RE:DBFS

Post by common »

Okay,

I get what you are saying.  However, I don't see how you would properly support languages that are standard, such as ISO 9899:1999 E (C).  How would you handle it if someone wanted to open a file?
knicos

RE:DBFS

Post by knicos »

It should be possible to create open, close, read and write functions that operate above the database and hide it, but this would be pointless, as it removes most of the power of the database.

For example, the open function would search for a file based on the string passed as an argument. It would then store the unique record ID as the file handle, and other operations (read, write) would modify the database record. I would only do this to comply with standards; it is not how I would recommend using the database.
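
The compatibility layer described above might look roughly like this. It is a Python sketch with hypothetical names (`Record`, `db`, `db_open`, `db_read`, `db_write`); a real shim would sit behind the C library's `fopen`/`fread`/`fwrite` and would also track a file offset per handle.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Record:
    """A database record that happens to look like a file."""
    name: str
    data: bytearray


# Hypothetical database: unique record id -> record.
db: Dict[int, Record] = {
    7: Record("hello.txt", bytearray(b"hi")),
}


def db_open(name: str) -> int:
    """Search by name and return the record id, which serves as the handle."""
    for rid, rec in db.items():
        if rec.name == name:
            return rid
    raise FileNotFoundError(name)


def db_read(handle: int) -> bytes:
    """Return the whole payload of the record behind the handle."""
    return bytes(db[handle].data)


def db_write(handle: int, payload: bytes) -> None:
    """Append to the record's payload (no offset tracking in this sketch)."""
    db[handle].data += payload
```

The name passed to `db_open` is just one attribute to search on, which is why this layer hides most of what the database could otherwise do.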