A Categorical File System

Stevo14
Member
Posts: 179
Joined: Fri Mar 07, 2008 3:40 am
Location: Arad, Romania

Post by Stevo14 »

Most (if not all) modern file systems use the directory/folder approach when it comes to organizing data on a hard disk. Would it be practical, instead, to use a file system based on categories?

For instance, each category would simply have a table of pointers to files on the disk. The same file could then be in multiple categories, and a file could be re-categorized by simply removing/adding the pointer to it in a category's table. Only one copy of a file is needed on the disk, even if it appears in multiple categories.
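To make the pointer-table idea concrete, here is a minimal in-memory sketch. It is only an illustration under assumptions: the class name `CategoryStore`, its methods, and the use of integer file ids as "pointers" are all hypothetical and not part of the original proposal.

```python
from collections import defaultdict


class CategoryStore:
    """Toy model: each file is stored once; categories only hold pointers (ids)."""

    def __init__(self):
        self.files = {}                     # file_id -> file contents (single copy)
        self.categories = defaultdict(set)  # category name -> set of file ids
        self._next_id = 0

    def add_file(self, data, *categories):
        file_id = self._next_id
        self._next_id += 1
        self.files[file_id] = data
        for name in categories:
            self.categories[name].add(file_id)
        return file_id

    def categorize(self, file_id, name):
        self.categories[name].add(file_id)

    def uncategorize(self, file_id, name):
        self.categories[name].discard(file_id)


store = CategoryStore()
song = store.add_file(b"...audio bytes...", "Music")
store.categorize(song, "Classical-Music")  # second pointer, no second copy
store.uncategorize(song, "Music")          # re-categorize by dropping a pointer
```

The file data never moves; only the small per-category sets change.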

I think that a categorical file system would be most advantageous when it comes to indexing and searching. If the user types the word "music" into the search bar, the OS now only has to search the names of the categories instead of the name of every file. The user could then browse the categories just as if they were folders. In this case it might bring up several categories, including "Music", "Classical-Music", "System-Music", "Deleted-Music", etc.
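The search described above could be sketched as a scan over category names only, so the cost grows with the number of categories rather than the number of files. The function name `search_categories` and the extra category names "Firefox" and "Photos" are assumptions made for the example.

```python
def search_categories(category_names, query):
    """Return the category names that contain the query, ignoring case."""
    needle = query.lower()
    return sorted(name for name in category_names if needle in name.lower())


names = ["Music", "Classical-Music", "System-Music", "Deleted-Music", "Firefox", "Photos"]
matches = search_categories(names, "music")
```

A simple substring match is used here; a real system would probably want something smarter, but the point is that no file names are touched.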

So, is it a good idea? Is it practical?
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

But then how do you figure out which categories to use? What if I want to install a very large program with tons of data files; do they go into a data files directory? Now what if I want to un-install: how do I know which files to remove (especially if some were created/modified after the program runs)? If the program is in a single directory, it's easy to tell which files are associated with it. Now, you could have a directory structure and also a listing by category; this would be similar to indexing (but not really). Also, if you remove the file from the category folder, does it stay in the program folder or is it removed from there too? There are tons of design issues to figure out before we can say whether it's viable, or even useful. Some file systems already support having multiple copies of a single file, and even support partial file sharing (i.e., say you have a 10 MB file and copy it, then slightly modify it; the FS is smart enough to store only the changed portion separately, while the parts that are the same are saved once).
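The partial file sharing mentioned at the end can be illustrated with a toy block store that keeps each distinct block once. This is only one possible way to get that behaviour (content-addressed fixed-size blocks) and is not a description of how any particular file system does it; `BlockStore` and the 4-byte block size are assumptions chosen to keep the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the blocks are easy to see


class BlockStore:
    def __init__(self):
        self.blocks = {}  # digest -> block bytes; each distinct block stored once
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # reuse the block if already present
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])


bs = BlockStore()
bs.write("original", b"AAAABBBBCCCC")
bs.write("modified", b"AAAAXXXXCCCC")  # copy with one block changed
```

The two files together reference six blocks, but only four are stored, because the unchanged first and last blocks are shared.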
esa
Posts: 19
Joined: Tue May 20, 2008 1:25 pm
Location: Finland

Re: A Categorical File System

Post by esa »

@Stevo14

I don't see why you couldn't have both.

You mention tables and searching so the whole thing starts to sound like a database. You could simply make a tag system which is composed of a database with a few tables and a kernel/filesystem plugin which extends the virtual filesystem. Make a special /tag/ directory (like /proc/ and /dev/) which queries a database (perhaps in a special file stored at the root of each volume that supports this "tagging") which stores associations between tags and files. Each tag would then be a directory under /tag/ and the "files" in those directories would be dynamically generated (from the database files) symbolic/hard links to the actual files. I guess it might be possible to implement this for Linux as a normal extension to the system.

Code:

$ ls -l /tag/metal
lrwxrwxrwx 1 luser luser 0 2008-05-22 00:00 metalmerchants.mp3 -> /home/luser/music/metalmerchants.mp3
lrwxrwxrwx 1 luser luser 0 2008-05-22 00:00 deadtonight.mp3 -> /home/luser/music/deadtonight.mp3
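As a rough user-space approximation of the /tag/ idea, the sketch below keeps tag associations in an SQLite table and materializes them as symbolic links in an ordinary directory. A real implementation would generate the entries on the fly inside a virtual file system plugin; here a temporary directory stands in for /tag/, and the table name `file_tags` and the function `build_tag_view` are assumptions. It expects a POSIX-like system where `os.symlink` is permitted.

```python
import os
import sqlite3
import tempfile


def build_tag_view(db, tag_root):
    """Create tag_root/<tag>/<basename> symlinks for every (tag, path) row."""
    rows = db.execute("SELECT tag, path FROM file_tags ORDER BY tag, path")
    for tag, path in rows:
        tag_dir = os.path.join(tag_root, tag)
        os.makedirs(tag_dir, exist_ok=True)
        link = os.path.join(tag_dir, os.path.basename(path))
        if not os.path.lexists(link):
            os.symlink(path, link)  # the target does not need to exist


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE file_tags (tag TEXT, path TEXT, PRIMARY KEY (tag, path))")
db.executemany(
    "INSERT INTO file_tags VALUES (?, ?)",
    [
        ("metal", "/home/luser/music/metalmerchants.mp3"),
        ("metal", "/home/luser/music/deadtonight.mp3"),
    ],
)

tag_root = tempfile.mkdtemp(prefix="tag-")  # stand-in for the /tag/ mount point
build_tag_view(db, tag_root)
metal_links = sorted(os.listdir(os.path.join(tag_root, "metal")))
```

Two files with the same base name under one tag would collide in this sketch; a real design would need a naming rule for that case.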
If debugging is the process of removing bugs, then programming must be the process of putting them in.
- Edsger W. Dijkstra
Stevo14
Member
Posts: 179
Joined: Fri Mar 07, 2008 3:40 am
Location: Arad, Romania

Post by Stevo14 »

Ready4Dis wrote:But, then how do you figure out which categories to use? What if I want to install a very large program with tons of data files, do they go into a data files directory?

Field testing the system would probably yield information about which categories are best. Like you said, a "data" category would be far too broad to be practical. Let's say you are installing Firefox. It creates a "Firefox" category and then all of the files that it needs are first written to the disk and then categorized as "Firefox". That would be the bare minimum needed for Firefox to run. Then the OS (or the user) would decide if some of the files need to have more categories added in order to make them easier to find. The file "Firefox.exe", for example, could be in both the "Firefox" category and the "internet" category.
Ready4Dis wrote: Now what if I want to un-install, how do I know which files to remove (especially if some were created/modified after the program runs)? If the program is in a single directory, it's easy to realize what files are associated with it. Now, you could have a directory structure and also a listing by category, this would be similar to indexing (but not really). Also, what if you remove the file from the category folder, does it stay in the program folder or is it removed with it? Tons of design issues to figure out before we can say if it's viable or not, or even useful.
The OS would have to differentiate between deleting a file and "un-categorizing" a file. In the Firefox example that I gave above, the uninstaller would tell the file system to delete anything categorized as "Firefox". It could also specify not to remove anything categorized as "firefox-tmp" or "firefox-favorites" if the user wants to save those. Uncategorizing a file only removes it from that category; deleting a file would remove it from all categories.
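The difference between un-categorizing and deleting, and an uninstall that spares some categories, might look like the sketch below. The function names, the `keep` parameter, and the file names other than "Firefox.exe" are assumptions made for illustration.

```python
def uncategorize(categories, name, file_id):
    """Drop one pointer; the file and its other categories are untouched."""
    categories.get(name, set()).discard(file_id)


def delete_file(files, categories, file_id):
    """Remove the file's data and every category pointer to it."""
    files.pop(file_id, None)
    for members in categories.values():
        members.discard(file_id)


def uninstall(files, categories, name, keep=()):
    """Delete everything in category `name`, except files in a `keep` category."""
    protected = set()
    for kept in keep:
        protected |= categories.get(kept, set())
    for file_id in list(categories.get(name, set()) - protected):
        delete_file(files, categories, file_id)


files = {1: "Firefox.exe", 2: "cache.tmp", 3: "bookmarks.html"}
categories = {
    "Firefox": {1, 2, 3},
    "internet": {1},
    "firefox-favorites": {3},
}

uninstall(files, categories, "Firefox", keep=["firefox-favorites"])
uncategorize(categories, "Firefox", 3)  # bookmarks survive, just not under "Firefox"
```

After the uninstall only the bookmarks file remains on disk, and it is still reachable through "firefox-favorites".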

@esa:
That is basically what I'm thinking of except instead of building it on top of an existing file system, it would be implemented at the disk level.
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Post by Korona »

Stevo14 wrote:That is basically what I'm thinking of except instead of building it on top of an existing file system, it would be implemented at the disk level.
What are the advantages of doing that? Your idea can be easily implemented using symbolic links or hard links. There is no need for a more complex file system structure.
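A small sketch of the hard-link approach on an existing file system, with directories playing the role of categories. It assumes a POSIX-like system whose file system supports hard links; the directory and file names are made up, and a temporary directory is used so nothing real is touched.

```python
import os
import tempfile

root = tempfile.mkdtemp(prefix="catfs-")
os.makedirs(os.path.join(root, "Music"))
os.makedirs(os.path.join(root, "Classical-Music"))

original = os.path.join(root, "Music", "sonata.mp3")
with open(original, "wb") as f:
    f.write(b"audio data")

alias = os.path.join(root, "Classical-Music", "sonata.mp3")
os.link(original, alias)  # second directory entry for the same inode, no data copied

same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
link_count = os.stat(original).st_nlink

os.remove(original)  # "un-categorize" from Music; the data stays
with open(alias, "rb") as f:
    still_there = f.read()
```

With hard links the data is freed only when the last name is removed, which is close to the un-categorize behaviour discussed earlier; deleting a file from every category at once would still need some extra bookkeeping.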
Stevo14
Member
Posts: 179
Joined: Fri Mar 07, 2008 3:40 am
Location: Arad, Romania

Post by Stevo14 »

Korona wrote:
Stevo14 wrote:That is basically what I'm thinking of except instead of building it on top of an existing file system, it would be implemented at the disk level.
What are the advantages of doing that? Your idea can be easily implemented using symbolic links or hard links. There is no need for a more complex file system structure.
The idea is that the file system structure would be less complex if it is implemented at the disk level instead of on top of an existing file system structure.

To implement this categorical file system at the disk level you would need 3 basic structures.

-A table of categories, each with a table of pointers to files in that category.
-A table of files.
-A space for the files' data.

As far as file systems go, I think that is pretty simple.
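The three structures listed above could be laid out in a flat disk image roughly as follows. Everything concrete here is an assumption made for the sketch: the 32-byte names, the fixed four file slots per category, the little-endian encoding, and the small header holding the two table sizes (which is an addition to the three structures so that a reader can find them).

```python
import struct

FILE_ENTRY = struct.Struct("<32sII")  # file name, offset into data area, length
CAT_ENTRY = struct.Struct("<32s4i")   # category name, 4 file-table indices (-1 = unused)
HEADER = struct.Struct("<II")         # number of categories, number of files


def build_image(files, categories):
    """files: list of (name, bytes); categories: dict name -> list of file indices."""
    data = b""
    file_table = b""
    for name, content in files:
        file_table += FILE_ENTRY.pack(name.encode(), len(data), len(content))
        data += content
    cat_table = b""
    for name, members in categories.items():
        slots = (list(members) + [-1] * 4)[:4]
        cat_table += CAT_ENTRY.pack(name.encode(), *slots)
    return HEADER.pack(len(categories), len(files)) + cat_table + file_table + data


def read_category(image, wanted):
    """Return {file name: contents} for every file in the wanted category."""
    n_cats, n_files = HEADER.unpack_from(image, 0)
    cat_base = HEADER.size
    file_base = cat_base + n_cats * CAT_ENTRY.size
    data_base = file_base + n_files * FILE_ENTRY.size
    for i in range(n_cats):
        raw_name, *slots = CAT_ENTRY.unpack_from(image, cat_base + i * CAT_ENTRY.size)
        if raw_name.rstrip(b"\0").decode() != wanted:
            continue
        result = {}
        for index in slots:
            if index < 0:
                continue
            raw, offset, length = FILE_ENTRY.unpack_from(
                image, file_base + index * FILE_ENTRY.size)
            start = data_base + offset
            result[raw.rstrip(b"\0").decode()] = image[start:start + length]
        return result
    return {}


image = build_image(
    files=[("song.mp3", b"la la la"), ("notes.txt", b"buy milk")],
    categories={"Music": [0], "Everything": [0, 1]},
)
everything = read_category(image, "Everything")
```

Note that the category lookup is a linear scan and the tables cannot grow in place, which are exactly the kinds of limits a real on-disk format has to address.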
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Post by Colonel Kernel »

File system disk structures should be designed for correctness, then for performance (i.e. -- low disk access latency). Simplicity is nice, but secondary to these goals.

What you're describing, at least from the end user's point of view, can be achieved with metadata indexing, which has been done before. It works really well too.
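For readers unfamiliar with metadata indexing, the basic idea can be sketched as an inverted index from (attribute, value) pairs to file paths. This is a generic illustration, not a description of any specific product; the attribute names and the helper functions `build_index` and `query` are assumptions.

```python
from collections import defaultdict


def build_index(metadata):
    """metadata: path -> {attribute: value}. Returns (attribute, value) -> set of paths."""
    index = defaultdict(set)
    for path, attrs in metadata.items():
        for key, value in attrs.items():
            index[(key, value)].add(path)
    return index


def query(index, **criteria):
    """Return the paths that match every attribute=value pair given."""
    sets = [index.get(item, set()) for item in criteria.items()]
    return set.intersection(*sets) if sets else set()


metadata = {
    "/home/luser/music/metalmerchants.mp3": {"type": "audio", "genre": "metal"},
    "/home/luser/music/deadtonight.mp3": {"type": "audio", "genre": "metal"},
    "/home/luser/docs/notes.txt": {"type": "text"},
}
index = build_index(metadata)
metal = query(index, type="audio", genre="metal")
```

The files stay wherever the directory tree puts them; the index is a separate structure that can be rebuilt or dropped without touching the data.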
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Post by Korona »

Stevo14 wrote:To implement this categorical file system at the disk level you would need 3 basic structures.

-A table of categories, each with a table of pointers to files in that category.
-A table of files.
-A space for the files' data.

As far as file systems go, I think that is pretty simple.
I don't think that design is much simpler than existing file systems. The categories must be stored in B-trees or a similar structure to achieve high performance. That will make your system nearly as complex as a modern file system. Your design is just a file system without subdirectories. I don't think that the implementation of subdirectories is very complex; it probably occupies less than 200 source lines of code.
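To show why an ordered structure helps, here is a deliberately simplified stand-in: a sorted in-memory list with binary search, not an actual B-tree. It gives logarithmic lookups and cheap prefix scans, but insertion into a Python list is linear and nothing here deals with disk blocks, which is where the real complexity of a B-tree comes from. The class name `SortedCategoryIndex` is an assumption.

```python
import bisect


class SortedCategoryIndex:
    def __init__(self):
        self._names = []   # category names, kept in sorted order
        self._tables = []  # _tables[i] holds the file ids for _names[i]

    def _find(self, name):
        i = bisect.bisect_left(self._names, name)
        found = i < len(self._names) and self._names[i] == name
        return i, found

    def insert(self, name, file_id):
        i, found = self._find(name)
        if not found:
            self._names.insert(i, name)
            self._tables.insert(i, [])
        self._tables[i].append(file_id)

    def lookup(self, name):
        i, found = self._find(name)
        return list(self._tables[i]) if found else []

    def with_prefix(self, prefix):
        """All category names starting with prefix, found by one ordered scan."""
        i, _ = self._find(prefix)
        matches = []
        while i < len(self._names) and self._names[i].startswith(prefix):
            matches.append(self._names[i])
            i += 1
        return matches


idx = SortedCategoryIndex()
idx.insert("firefox", 1)
idx.insert("music", 3)
idx.insert("firefox-tmp", 2)
idx.insert("firefox", 4)
```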
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

I agree with the others: this seems more like something to put on top of a file system rather than a file system itself. Storing links to files is great, but how do you store said files? What structure do your links/directories have? As I said, this is something that can be (and has been) done over file systems, which gives it the benefit of working with more than one file system type (so you can use it on ext2fs, FAT32, etc.).
edfed
Member
Posts: 42
Joined: Wed Apr 09, 2008 5:44 pm
Location: Mars

Post by edfed »

maybe you want to design something using "SQL like" data structures, but as a file system?

am i wrong?
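One way to picture "SQL like" structures for this is a many-to-many schema between files and categories, queried with a join. The sketch uses an in-memory SQLite database; the table names, column names, sample rows, and the helper `files_in_all` are all assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE file_categories (
        file_id INTEGER REFERENCES files(id),
        category_id INTEGER REFERENCES categories(id),
        PRIMARY KEY (file_id, category_id)
    );
    INSERT INTO files VALUES (1, 'Firefox.exe'), (2, 'sonata.mp3'), (3, 'anthem.mp3');
    INSERT INTO categories VALUES
        (1, 'Firefox'), (2, 'internet'), (3, 'Music'), (4, 'Classical-Music');
    INSERT INTO file_categories VALUES (1, 1), (1, 2), (2, 3), (2, 4), (3, 3);
""")


def files_in_all(db, *names):
    """Names of files that belong to every one of the given categories."""
    placeholders = ", ".join("?" for _ in names)
    rows = db.execute(
        f"""
        SELECT f.name
        FROM files f
        JOIN file_categories fc ON fc.file_id = f.id
        JOIN categories c ON c.id = fc.category_id
        WHERE c.name IN ({placeholders})
        GROUP BY f.id
        HAVING COUNT(DISTINCT c.id) = ?
        ORDER BY f.name
        """,
        (*names, len(names)),
    )
    return [name for (name,) in rows]


classical = files_in_all(db, "Music", "Classical-Music")
```

A query such as "files that are in both categories" is awkward to express with a plain directory tree, which is where a relational layout differs from ordinary folder browsing.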
welcome in my dream.
Stevo14
Member
Posts: 179
Joined: Fri Mar 07, 2008 3:40 am
Location: Arad, Romania

Post by Stevo14 »

Colonel Kernel wrote:File system disk structures should be designed for correctness, then for performance (i.e. -- low disk access latency). Simplicity is nice, but secondary to these goals.
What do you mean exactly by "designed for correctness"? Do you mean something like "designed to work"? (that seems rather obvious...)

@everyone_else:
So apparently the consensus is that this should be implemented on top of a file system, not at the disk level... oh well. It was just an idea anyway.
:wink:
jal
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: A Categorical File System

Post by jal »

Stevo14 wrote:So, is it a good idea? Is it practical?
I think it is intermediate between a directory-based file system and a fully searchable, WinFS-type database system. What you describe looks a lot like a file system I once designed, which had files as objects on disk, and file lists as a (logical, but user-made) grouping on top of that. However, I think an approach that categorizes on metadata (i.e. WinFS in some form or another) is better, as it gives the user more freedom to search, etc.


JAL
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Post by Colonel Kernel »

Stevo14 wrote:What do you mean exactly by "designed for correctness"? Do you mean something like "designed to work"? (that seems rather obvious...)
It does seem obvious, but you'd be surprised how many developers start by designing something fast but broken.
01000101
Member
Posts: 1599
Joined: Fri Jun 22, 2007 12:47 pm

Post by 01000101 »

edfed wrote:maybe you want to design something using "SQL like" data structures, but as a file system?

am i wrong?
SQL like structures... would be just like directory browsing in 'list' view. nothing new there.
xqterry
Posts: 5
Joined: Sat May 10, 2008 12:18 pm

Re: A Categorical File System

Post by xqterry »

I think it is not a bad idea. Apple separates category info and file data in HFS, and stores the category info as a file.