A Categorical File System

Stevo14 · Post by **Stevo14** » Wed May 21, 2008 11:54 pm

Most (if not all) modern file systems use the directory/folder approach when it comes to organizing data on a hard disk. Would it be practical, instead, to use a file system based on categories?

For instance, each category would simply have a table of pointers to files on the disk. The same file could then appear in multiple categories, and a file could be re-categorized by simply adding or removing its pointer in a category's table. Only one copy of a file is needed on the disk, even if it appears in multiple categories.
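The pointer-table idea can be sketched in a few lines of Python. This is only an in-memory illustration, not an on-disk format: the names `files`, `categories`, `categorize` and `uncategorize` are made up for the example, and integer IDs stand in for pointers.

```python
# Hypothetical in-memory model: each category holds a set of file IDs
# ("pointers"); the file data itself is stored once, keyed by ID.
from collections import defaultdict

files = {1: b"...song bytes...", 2: b"...report bytes..."}  # file ID -> data
categories = defaultdict(set)                                # category name -> file IDs


def categorize(category, file_id):
    """Add a pointer to the file in the category's table."""
    categories[category].add(file_id)


def uncategorize(category, file_id):
    """Remove the pointer; the file data is not touched."""
    categories[category].discard(file_id)


categorize("Music", 1)
categorize("Classical-Music", 1)  # same file, second category, no copy made
categorize("Documents", 2)

uncategorize("Music", 1)          # re-categorizing only edits the table
```

After these calls file 1 is still stored exactly once and is reachable through "Classical-Music" only.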

I think that a categorical file system would be most advantageous when it comes to indexing and searching. If the user types the word "music" into the search bar, the OS now only has to search the names of each category instead of each file. The user could then browse the categories just as if they were folders. In this case it might bring up several categories, including "Music", "Classical-Music", "System-Music", "Deleted-Music", etc.

So, is it a good idea? Is it practical?

Ready4Dis · Post by **Ready4Dis** » Thu May 22, 2008 1:34 am

But then how do you figure out which categories to use? What if I want to install a very large program with tons of data files: do they go into a data files directory? Now what if I want to un-install, how do I know which files to remove (especially if some were created/modified after the program runs)? If the program is in a single directory, it's easy to see which files are associated with it. Now, you could have a directory structure and also a listing by category; this would be similar to indexing (but not really). Also, what if you remove the file from the category folder: does it stay in the program folder, or is it removed from there too? There are tons of design issues to figure out before we can say if it's viable or not, or even useful. Some file systems already support having multiple copies of a single file, and even support partial file sharing (say you have a 10 MB file and copy it, then slightly modify it; the FS is smart enough to store only the changed portion separately, while the parts that are the same are saved once).
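One way the partial file sharing mentioned above can work is block-level deduplication: split each file into blocks and store every distinct block once. The sketch below is an assumption-laden toy, not a description of any particular file system. The names `store`, `write_file` and `read_file`, and the 4-byte block size, are invented for the example.

```python
import hashlib

BLOCK = 4   # tiny block size so the example is easy to follow

store = {}  # block hash -> block bytes; each distinct block is stored once


def write_file(data):
    """Split data into blocks, store unseen ones, return the list of block hashes."""
    recipe = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        key = hashlib.sha256(block).hexdigest()
        store.setdefault(key, block)  # an already-known block is not stored again
        recipe.append(key)
    return recipe


def read_file(recipe):
    """Reassemble a file from its list of block hashes."""
    return b"".join(store[key] for key in recipe)


original = write_file(b"AAAABBBBCCCC")
modified = write_file(b"AAAAXXXXCCCC")  # a copy with only the middle block changed
```

The two files together reference six blocks, but only four distinct blocks end up in `store`, because the unchanged first and last blocks are shared.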

esa · Post by **esa** » Thu May 22, 2008 3:47 am

@Stevo14

I don't see why you couldn't have both.

You mention tables and searching so the whole thing starts to sound like a database. You could simply make a tag system which is composed of a database with a few tables and a kernel/filesystem plugin which extends the virtual filesystem. Make a special /tag/ directory (like /proc/ and /dev/) which queries a database (perhaps in a special file stored at the root of each volume that supports this "tagging") which stores associations between tags and files. Each tag would then be a directory under /tag/ and the "files" in those directories would be dynamically generated (from the database files) symbolic/hard links to the actual files. I guess it might be possible to implement this for Linux as a normal extension to the system.

Code: Select all

>> ls -l /tag/metal

lrwxrwxrwx 1 luser luser 0 2008-05-22 00:00:00 metalmerchants.mp3 -> /home/luser/music/metalmerchants.mp3
lrwxrwxrwx 1 luser luser 0 2008-05-22 00:00:00 deadtonight.mp3 -> /home/luser/music/deadtonight.mp3
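The database side of such a tag system could look roughly like the following, using Python's built-in `sqlite3` module. The table name `tag_map`, the helper `list_tag` and the extra "jazz" row are assumptions made for illustration. A real plugin would also have to hook into the virtual filesystem, which is not shown here.

```python
import sqlite3

# Hypothetical schema: one row per (tag, path) association.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tag_map (tag TEXT NOT NULL, path TEXT NOT NULL, UNIQUE (tag, path))"
)
db.executemany(
    "INSERT INTO tag_map (tag, path) VALUES (?, ?)",
    [
        ("metal", "/home/luser/music/metalmerchants.mp3"),
        ("metal", "/home/luser/music/deadtonight.mp3"),
        ("jazz", "/home/luser/music/bluetrain.mp3"),  # made-up extra entry
    ],
)


def list_tag(tag):
    """Return the paths that a listing of /tag/<tag> would link to."""
    rows = db.execute(
        "SELECT path FROM tag_map WHERE tag = ? ORDER BY path", (tag,)
    )
    return [path for (path,) in rows]


metal = list_tag("metal")
```

Here `list_tag("metal")` returns the two paths shown in the listing above, and an unknown tag simply yields an empty list.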

Stevo14 · Post by **Stevo14** » Thu May 22, 2008 5:53 am

Ready4Dis wrote:But, then how do you figure out which categories to use? What if I want to install a very large program with tons of data files, do they go into a data files directory?

Field testing the system would probably yield information about which categories are best. Like you said, a "data" category would be far too broad to be practical. Let's say you are installing Firefox. It creates a "Firefox" category and then all of the files that it needs are first written to the disk and then categorized as "Firefox". That would be the bare minimum needed for Firefox to run. Then the OS (or the user) would decide if some of the files need to have more categories added in order to make them easier to find. The file "Firefox.exe", for example, could be in both the "Firefox" category and the "internet" category.

Ready4Dis wrote: Now what if I want to un-install, how do I know which files to remove (especially if some were created/modified after the program runs)? If the program is in a single directory, it's easy to see which files are associated with it. Now, you could have a directory structure and also a listing by category; this would be similar to indexing (but not really). Also, what if you remove the file from the category folder: does it stay in the program folder, or is it removed from there too? There are tons of design issues to figure out before we can say if it's viable or not, or even useful.

The OS would have to differentiate between deleting a file and "un-categorizing" a file. In the Firefox example that I gave above, the uninstaller would tell the file system to delete anything categorized as "Firefox". It could also specify not to remove anything categorized as "firefox-tmp" or "firefox-favorites" if the user wants to save those. Un-categorizing a file only removes it from that one category; deleting a file removes it from all categories.
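The difference between the two operations, and the uninstall rule, might be sketched like this. Everything here is hypothetical: the `file_categories` table, the function names, and all file names other than "Firefox.exe" are invented for the example.

```python
# Hypothetical model: file name -> set of categories it belongs to.
file_categories = {
    "Firefox.exe": {"Firefox", "internet"},
    "libxul.dll": {"Firefox"},
    "bookmarks.html": {"Firefox", "firefox-favorites"},
    "cache.tmp": {"Firefox", "firefox-tmp"},
    "notes.txt": {"Documents"},
}


def uncategorize(name, category):
    """Remove one category from a file; the file itself stays on disk."""
    file_categories[name].discard(category)


def delete(name):
    """Remove the file, and with it every category membership."""
    del file_categories[name]


def uninstall(category, keep=()):
    """Delete every file in `category` unless it is also in a `keep` category."""
    keep = set(keep)
    doomed = [
        name
        for name, cats in file_categories.items()
        if category in cats and not (cats & keep)
    ]
    for name in doomed:
        delete(name)
    return sorted(doomed)


removed = uninstall("Firefox", keep=["firefox-tmp", "firefox-favorites"])
uncategorize("bookmarks.html", "Firefox")  # kept file, now only a favorite
```

One open design question this makes visible: the files that survive the uninstall still carry the "Firefox" category until something un-categorizes them.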

@esa:
That is basically what I'm thinking of except instead of building it on top of an existing file system, it would be implemented at the disk level.

Korona · Post by **Korona** » Thu May 22, 2008 6:02 am

Stevo14 wrote:That is basically what I'm thinking of except instead of building it on top of an existing file system, it would be implemented at the disk level.

What are the advantages of doing that? Your idea can be easily implemented using symbolic links or hard links. There is no need for a more complex file system structure.
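As a rough illustration of the link-based approach, the following Python sketch builds tag directories out of symbolic links on an ordinary file system. It assumes a POSIX-like system where `os.symlink` is permitted. The directory layout and the helper `tag_file` are invented for the example.

```python
import os
import tempfile

# Scratch area standing in for a real volume.
root = tempfile.mkdtemp()
music = os.path.join(root, "music")
tags = os.path.join(root, "tag")
os.makedirs(music)

song = os.path.join(music, "deadtonight.mp3")
with open(song, "wb") as f:
    f.write(b"fake audio data")


def tag_file(path, tag):
    """Place a symbolic link to `path` in the directory for `tag`."""
    tag_dir = os.path.join(tags, tag)
    os.makedirs(tag_dir, exist_ok=True)
    link = os.path.join(tag_dir, os.path.basename(path))
    os.symlink(path, link)
    return link


metal_link = tag_file(song, "metal")
live_link = tag_file(song, "live")  # second "category", still one copy of the data
```

Two limits of this sketch: two files with the same base name cannot share a tag directory, and a symbolic link is left dangling if its target is deleted.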

Stevo14 · Post by **Stevo14** » Thu May 22, 2008 6:40 am

Korona wrote:
Stevo14 wrote:That is basically what I'm thinking of except instead of building it on top of an existing file system, it would be implemented at the disk level.
What are the advantages of doing that? Your idea can be easily implemented using symbolic links or hard links. There is no need for a more complex file system structure.

The idea is that the file system structure would be less complex if it is implemented at the disk level instead of on top of an existing file system structure.

To implement this categorical file system at the disk level you would need 3 basic structures.

-A table of categories, each with a table of pointers to files in that category.
-A table of files.
-A space for the files' data.

As far as file systems go, I think that is pretty simple.
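The three structures could be modeled as below. This is a toy layout under stated assumptions: files are appended contiguously to a single data area, nothing is ever freed, and the names `FileEntry`, `Volume`, `add_file` and `read` are made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    name: str
    offset: int  # where the file's bytes start in the data area
    length: int  # how many bytes belong to the file


@dataclass
class Volume:
    # 1. Table of categories: category name -> list of file-table indexes.
    categories: dict = field(default_factory=dict)
    # 2. Table of files: index -> FileEntry.
    file_table: list = field(default_factory=list)
    # 3. Space for the files' data.
    data: bytearray = field(default_factory=bytearray)

    def add_file(self, name, payload, category):
        """Append the payload to the data area and register it in one category."""
        entry = FileEntry(name, len(self.data), len(payload))
        self.data += payload
        self.file_table.append(entry)
        index = len(self.file_table) - 1
        self.categories.setdefault(category, []).append(index)
        return index

    def read(self, index):
        """Return the bytes of the file at the given file-table index."""
        entry = self.file_table[index]
        return bytes(self.data[entry.offset:entry.offset + entry.length])


vol = Volume()
exe = vol.add_file("Firefox.exe", b"MZ...", "Firefox")
vol.categories.setdefault("internet", []).append(exe)  # second category, same file
```

The sketch leaves out everything that makes a real file system hard, such as free-space management, fragmentation and crash consistency.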

Colonel Kernel · Post by **Colonel Kernel** » Thu May 22, 2008 9:40 am

File system disk structures should be designed for correctness, then for performance (i.e. -- low disk access latency). Simplicity is nice, but secondary to these goals.

What you're describing, at least from the end user's point of view, can be achieved with metadata indexing, which has been done before. It works really well too.

Korona · Post by **Korona** » Thu May 22, 2008 12:20 pm

Stevo14 wrote:To implement this categorical file system at the disk level you would need 3 basic structures.

-A table of categories, each with a table of pointers to files in that category.
-A table of files.
-A space for the files' data.

As far as file systems go, I think that is pretty simple.

I don't think that design is much simpler than existing file systems. The categories must be stored in B-trees or a similar structure to achieve high performance. That will make your system nearly as complex as a modern file system. Your design is just a file system without subdirectories. I don't think that the implementation of subdirectories is very complex; it probably occupies less than 200 source lines of code.
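To show why an ordered index matters for lookups, here is a small stand-in that uses a sorted Python list with binary search from the standard `bisect` module. It is not a B-tree, only the simplest structure with the same ordered-lookup property. The category names and helper functions are assumptions for the example.

```python
import bisect

# Kept sorted at all times so lookups can use binary search.
category_names = sorted(
    ["Classical-Music", "Deleted-Music", "Firefox", "Music", "System-Music", "internet"]
)


def add_category(name):
    """Insert a name at its sorted position if it is not already present."""
    i = bisect.bisect_left(category_names, name)
    if i == len(category_names) or category_names[i] != name:
        category_names.insert(i, name)


def has_category(name):
    """Exact lookup in O(log n) comparisons."""
    i = bisect.bisect_left(category_names, name)
    return i < len(category_names) and category_names[i] == name


def with_prefix(prefix):
    """All names starting with `prefix`, found by one binary search plus a scan."""
    i = bisect.bisect_left(category_names, prefix)
    found = []
    while i < len(category_names) and category_names[i].startswith(prefix):
        found.append(category_names[i])
        i += 1
    return found


add_category("Documents")
```

An on-disk B-tree gives the same ordered lookups while also keeping insertions cheap and reads block-aligned, which is where most of the extra complexity comes from.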

Ready4Dis · Post by **Ready4Dis** » Thu May 22, 2008 4:03 pm

I agree with the others: this seems more like something to put on top of a file system rather than a file system itself. Storing links to files is great, but how do you store said files? What structure are your links/directories? As I said, this is something that can be (and has been) done over file systems, which gives it the benefit of being usable with more than one file system type (so you can use it on ext2fs, fat32, etc).

edfed · Post by **edfed** » Thu May 22, 2008 4:20 pm

maybe you want to design something using "SQL like" data structures, but as a file system?

am i wrong?

Stevo14 · Post by **Stevo14** » Fri May 23, 2008 1:49 am

Colonel Kernel wrote:File system disk structures should be designed for correctness, then for performance (i.e. -- low disk access latency). Simplicity is nice, but secondary to these goals.

What do you mean exactly by "designed for correctness"? Do you mean something like "designed to work"? (that seems rather obvious...)

@everyone_else:
So apparently the consensus is that this should be implemented on top of a file system, not at the disk level... oh well. It was just an idea anyway.

jal · Post by **jal** » Fri May 23, 2008 4:20 am

Stevo14 wrote:So, is it a good idea? Is it practical?

I think it is intermediate between a directory-based file system and a fully searchable, WinFS-type database system. What you describe looks a lot like a filesystem I once designed, which had files as objects on disk, and file lists as a (logical, but user-made) grouping on top of that. However, I think an approach that categorizes on metadata (i.e. WinFS in some form or another) is better, as it gives the user more freedom to search etc.

JAL

Colonel Kernel · Post by **Colonel Kernel** » Fri May 23, 2008 8:14 am

Stevo14 wrote:What do you mean exactly by "designed for correctness"? Do you mean something like "designed to work"? (that seems rather obvious...)

It does seem obvious, but you'd be surprised how many developers start by designing something fast but broken.

01000101 · Post by **01000101** » Sat May 24, 2008 7:24 am

edfed wrote:maybe you want to design something using "SQL like" data structures, but as a file system?

am i wrong?

SQL like structures... would be just like directory browsing in 'list' view. nothing new there.

xqterry · Post by **xqterry** » Thu Aug 14, 2008 11:47 pm

I think it is not a bad idea. Apple separates catalog info and file data in HFS, and stores the catalog info as a file.
