DBFS
I've been thinking about implementing a database in my kernel. It would replace the file system, among other things.
It's a huge task, so I wanted to know what you think about it. How could it be done, and what should it be able to do?
The interface to such a database is very important, and I would like to hear some of your ideas on this.
The database must allow very powerful searches, possibly based on something like SQL. It will also need to allow any file format (images, documents, emails) to be stored in the database. It will need to be multi-user and possibly distributed. It may also support XML.
Thanks.
RE:DBFS
Well, technically, a filesystem is a database. A database has storage of data, just like a filesystem, except a filesystem uses a table to 'remember' where stuff is instead of sequentially storing one thing after another.
Imagine a filesystem as a one table relational database that contains a file name, type, permissions and pointers to its locations.
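To make the one-table picture concrete, here is a minimal sketch in C. The names (file_row, find_by_name, MAX_BLOCKS) are made up for illustration and do not come from any real filesystem; it only shows a row with a name, type, permissions and location pointers, plus the simplest possible "query" over it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 4   /* block pointers per row; arbitrary for this sketch */

/* One row of the single hypothetical "files" table. */
struct file_row {
    const char *name;               /* file name                      */
    char        type;               /* 'f' = file, 'd' = directory    */
    uint16_t    perms;              /* permission bits, e.g. 0644     */
    uint32_t    blocks[MAX_BLOCKS]; /* where the data lives on disk   */
};

/* Roughly "SELECT * FROM files WHERE name = ?", done as a linear scan. */
const struct file_row *find_by_name(const struct file_row *table,
                                    size_t rows, const char *name)
{
    for (size_t i = 0; i < rows; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL; /* no such file */
}
```

A real filesystem would index the table rather than scan it, but the shape of the row is the point here.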
>> The database must allow very powerful searches, possibly based on something like SQL.
A filesystem has to provide means to support powerful searches.
Moose.
RE:DBFS
I know, but as you said, it only has one table. I would like to expand the database idea to have multiple tables, and to have files related to each other (other than by the directory they are in).
I was thinking of possibly moving away from the idea of directories and simply giving files attributes, which could be searched for. Several files could be linked by having common attributes...
Microsoft's Longhorn is planning on having something similar to this. It's not an ordinary one-table file system.
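A rough sketch of the attribute idea in C, assuming files carry free-form key/value pairs instead of living in a directory. The types and function names (file_rec, has_attr, find_by_attr) are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 4   /* arbitrary limit for the sketch */

struct attr { const char *key; const char *value; };

struct file_rec {
    int         id;               /* unique record id          */
    struct attr attrs[MAX_ATTRS]; /* searchable attributes     */
    size_t      nattrs;           /* how many attrs are in use */
};

/* Returns 1 if the file carries the attribute key=value, else 0. */
int has_attr(const struct file_rec *f, const char *key, const char *value)
{
    for (size_t i = 0; i < f->nattrs; i++) {
        if (strcmp(f->attrs[i].key, key) == 0 &&
            strcmp(f->attrs[i].value, value) == 0)
            return 1;
    }
    return 0;
}

/* Writes the ids of matching files into out (at most max of them)
   and returns how many were written. */
size_t find_by_attr(const struct file_rec *files, size_t n,
                    const char *key, const char *value,
                    int *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        if (has_attr(&files[i], key, value))
            out[found++] = files[i].id;
    }
    return found;
}
```

Two files that share project=kernel are "linked" simply because the same query returns both of them; no directory is involved.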
RE:DBFS
Now you're talking about B-tree (balanced tree) filesystems.
They are tables whose contents are grouped by a common similarity, which I think is the first letter of the file name. Each table is then linked to another to narrow down the list of files.
So the file hello.txt in the below example would go a-n table -> h-n table, and then a regular file search is performed on the resulting list.
It would be a lot more complicated, mind you, and B-Trees are self-balancing; the idea is to keep the tables equal in size and to reduce search times.
    a-n table         o-z table
     |     |           |     |
    a-g   h-n         o-s   t-z
     |     |           |     |
    = Files in different lists, reducing the number of potential matches a search has to be validated against.
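The narrowing in the diagram could be sketched in C roughly like this. It is only a model of the picture (fixed first-letter ranges, two levels), not a real B-Tree, and every name in it (range_node, narrow, and so on) is made up for the example.

```c
#include <ctype.h>
#include <stddef.h>

struct range_node {
    char lo, hi;                       /* inclusive first-letter range      */
    const struct range_node *child[2]; /* narrower ranges; NULL at the end  */
};

/* Bottom level: each of these would point at a short list of files. */
static const struct range_node a_g = { 'a', 'g', { NULL, NULL } };
static const struct range_node h_n = { 'h', 'n', { NULL, NULL } };
static const struct range_node o_s = { 'o', 's', { NULL, NULL } };
static const struct range_node t_z = { 't', 'z', { NULL, NULL } };
/* Top level: the "a-n table" and the "o-z table". */
static const struct range_node a_n = { 'a', 'n', { &a_g, &h_n } };
static const struct range_node o_z = { 'o', 'z', { &o_s, &t_z } };
static const struct range_node *const roots[2] = { &a_n, &o_z };

static int covers(const struct range_node *n, char c)
{
    return n != NULL && c >= n->lo && c <= n->hi;
}

/* Follows the ranges down for the first letter of name and returns the
   narrowest range that covers it, or NULL if no range does. */
const struct range_node *narrow(const char *name)
{
    char c = (char)tolower((unsigned char)name[0]);
    const struct range_node *n = NULL;

    for (size_t i = 0; i < 2; i++)
        if (covers(roots[i], c))
            n = roots[i];

    while (n != NULL && n->child[0] != NULL) {
        const struct range_node *next = NULL;
        for (size_t i = 0; i < 2; i++)
            if (covers(n->child[i], c))
                next = n->child[i];
        n = next;
    }
    return n;
}
```

For hello.txt this walks a-n and then h-n, after which an ordinary search over the h-n list would finish the job.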
I would think MS Longhorn is probably expanding on their NTFS B-tree system in order to compete better with XFS.
Moose.
RE:DBFS
Not really.
A B-Tree is one set of relational data, not a collection of tables. Each section is not a table but a 'node' within that one collective table. In concept it is virtually no different from a binary tree or a linked list, as they also have nodes. The difference is organization: in a B-Tree, each node can have x children, where x is greater than 2. You would also order it directly by file name, lexically (for example using strcmp/strcasecmp).
A table is simply a format for information (i.e. a struct), and all the nodes of a B-Tree share the same table format, or structure (though this may depend on their type).
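As a rough illustration of nodes that share one struct and are ordered with strcmp, here is a small lookup sketch in C. The node layout and the name btree_find are assumptions made for the example; insertion and rebalancing are left out entirely.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 3  /* arbitrary; real trees size a node to a disk block */

/* Every node uses this same structure. */
struct btree_node {
    size_t             nkeys;               /* keys in use                 */
    const char        *keys[MAX_KEYS];      /* file names, strcmp-sorted   */
    struct btree_node *child[MAX_KEYS + 1]; /* all NULL in a leaf          */
};

/* Returns the node that holds name, or NULL if the name is absent. */
const struct btree_node *btree_find(const struct btree_node *node,
                                    const char *name)
{
    while (node != NULL) {
        size_t i = 0;

        /* Skip every key that sorts before name. */
        while (i < node->nkeys && strcmp(name, node->keys[i]) > 0)
            i++;

        if (i < node->nkeys && strcmp(name, node->keys[i]) == 0)
            return node;      /* found in this node */

        node = node->child[i]; /* descend between keys i-1 and i */
    }
    return NULL;
}
```

Only one node per level is visited, which is where the reduced search time comes from.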
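Also, NTFS is generally described as using B+Trees rather than plain B-Trees. In a plain B-Tree, entries can sit in any node, interior or leaf. In a B+Tree, the interior nodes hold only keys for steering the search and all entries live in the leaves, so more keys fit into each interior node and the tree stays shallow. You still traverse from the root to a leaf to locate the entry you are looking for, but you only touch a handful of nodes instead of having to pass through all of them.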
Also, on the note of a database as a file system: yes, a file system is a form of database. However, there are some differences. File systems do not really 'grow' at a variable rate. They are static in size and take up the entire partition or disk (dynamic disks are no exception, as extended partitions are created and virtually affixed onto the previous drive). A database, on the other hand, consumes space as it needs to, until the drive is full.
However, database systems can manage their own disks. Microsoft SQL Server, for example, can use a raw disk directly, without the need to store its data files on an actual file system.
Indeed, this is possible. All you need to do is create a modern file system, and you have a database. If you want multiple tables, you could encapsulate the file system within itself. For example, most file systems have at least one or two types of tables. You have a general table that defines everything, then your node table, with an entry for each node present (directories, files, etc.). Then you have pointers to the actual data and, possibly, indirect pointers to additional data that could not be described within the constraints of the direct ones. Simply add another layer to the equation, and you could have multiple tables. However, I should also say that multiple tables are more or less a virtualization of what directories do.
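The layering described above might look something like this in C. It is a sketch with invented names and tiny sizes (superblock, node, NDIRECT, PTRS_PER_BLOCK, block_for), loosely in the style of classic Unix filesystems, and not the layout of any particular one.

```c
#include <stddef.h>
#include <stdint.h>

#define NDIRECT        4  /* direct pointers per node (tiny, for the sketch) */
#define PTRS_PER_BLOCK 8  /* pointers held by one indirect block             */

/* The "general table" that defines everything else. */
struct superblock {
    uint32_t block_size;
    uint32_t node_table_start; /* first block of the node table */
    uint32_t node_count;
};

/* One entry in the node table: a file or a directory. */
struct node {
    char     type;             /* 'f' = file, 'd' = directory           */
    uint32_t size;             /* size in bytes                         */
    uint32_t direct[NDIRECT];  /* pointers to the actual data           */
    uint32_t indirect;         /* block holding more pointers; 0 = none */
};

/* Maps a logical block index within a file to a disk block number.
   indirect_block is the already loaded contents of n->indirect (or NULL).
   Returns 0 when the index is out of range. */
uint32_t block_for(const struct node *n, const uint32_t *indirect_block,
                   uint32_t index)
{
    if (index < NDIRECT)
        return n->direct[index];

    index -= NDIRECT;
    if (n->indirect != 0 && indirect_block != NULL && index < PTRS_PER_BLOCK)
        return indirect_block[index];

    return 0;
}
```

Adding "another layer" would mean a further table that points at several node tables, each of which then behaves like its own database table.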
RE:DBFS
I don't think you understand what I am trying to say. I'm not saying it very well.
Ignore implementation.
Think of a normal relational database, e.g. a customer and order database...
What if you could integrate any database into the OS database, so that everything, in theory, could be linked to anything else? Essentially, a record in a customer database would be the equivalent of an image file or document, and could be searched for in the same way. An image file could then easily be linked to a customer... This almost totally removes the idea of directories, and the location of data becomes irrelevant.
If you still don't understand, then I am not able to express my idea.
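One way to picture the linking idea, sketched in C: customers, orders, images and documents are all just records in one store, and a separate link table connects any record to any other. All of the names here (rec_kind, record, link, linked_of_kind) are hypothetical.

```c
#include <stddef.h>

enum rec_kind { REC_CUSTOMER, REC_ORDER, REC_IMAGE, REC_DOCUMENT };

/* Every item in the OS database is a record, whatever it holds. */
struct record { int id; enum rec_kind kind; const char *label; };

/* A link joins two records; direction does not matter in this sketch. */
struct link { int a; int b; };

static const struct record *by_id(const struct record *recs, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i].id == id)
            return &recs[i];
    return NULL;
}

/* Collects the ids of all records of the given kind that are linked to
   record id. Writes at most max ids to out and returns the count. */
size_t linked_of_kind(const struct record *recs, size_t nrecs,
                      const struct link *links, size_t nlinks,
                      int id, enum rec_kind kind, int *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < nlinks && found < max; i++) {
        int other;
        if (links[i].a == id)
            other = links[i].b;
        else if (links[i].b == id)
            other = links[i].a;
        else
            continue;

        const struct record *r = by_id(recs, nrecs, other);
        if (r != NULL && r->kind == kind)
            out[found++] = r->id;
    }
    return found;
}
```

Asking for "all images linked to customer 1" never mentions a path, which is the sense in which the location of the data stops mattering.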
RE:DBFS
It should be possible to create open, close, read and write functions that operate on top of the database and hide it, but this would be pointless. It removes most of the power of the database.
For example, the open function would search for a file based on the string passed as an argument. It would then store the unique record id as the file handle, and other operations (read, write) would modify the database record. I would only do this to comply with standards; it is not how I would recommend using the database.
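A minimal sketch of such a compatibility layer in C, assuming an in-memory array stands in for the database. The db_ names, the fixed sizes, and the extra db_create helper are all invented for the example; a write here simply replaces the record's contents.

```c
#include <stddef.h>
#include <string.h>

#define MAX_RECORDS 8
#define MAX_DATA    64

struct db_record {
    int    used;
    char   name[32];
    char   data[MAX_DATA];
    size_t len;
};

static struct db_record db[MAX_RECORDS]; /* stand-in for the real database */

static int valid(int h) { return h >= 0 && h < MAX_RECORDS && db[h].used; }

/* Adds an empty record and returns its id, or -1 if the store is full. */
int db_create(const char *name)
{
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (!db[i].used) {
            db[i].used = 1;
            strncpy(db[i].name, name, sizeof db[i].name - 1);
            db[i].name[sizeof db[i].name - 1] = '\0';
            db[i].len = 0;
            return i;
        }
    }
    return -1;
}

/* "open": search by name; the record id doubles as the file handle. */
int db_open(const char *name)
{
    for (int i = 0; i < MAX_RECORDS; i++)
        if (db[i].used && strcmp(db[i].name, name) == 0)
            return i;
    return -1;
}

/* "write": replaces the record's contents; returns bytes stored or -1. */
long db_write(int h, const void *buf, size_t n)
{
    if (!valid(h)) return -1;
    if (n > MAX_DATA) n = MAX_DATA;
    memcpy(db[h].data, buf, n);
    db[h].len = n;
    return (long)n;
}

/* "read": copies up to n bytes of the record; returns bytes copied or -1. */
long db_read(int h, void *buf, size_t n)
{
    if (!valid(h)) return -1;
    if (n > db[h].len) n = db[h].len;
    memcpy(buf, db[h].data, n);
    return (long)n;
}
```

Note how the only "query" the caller can express is an exact name match, which is the loss of power mentioned above.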