A Categorical File System
Most (if not all) modern file systems use the directory/folder approach when it comes to organizing data on a hard disk. Would it be practical, instead, to use a file system based on categories?
For instance, each category would simply have a table of pointers to files on the disk. The same file could then be in multiple categories, and a file could be re-categorized simply by adding or removing the pointer to it in a category's table. Only one copy of a file is needed on the disk, even if it appears in multiple categories.
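A minimal in-memory sketch of the pointer-table idea, in Python. The class and method names (`CategoryFS`, `create`, `categorize`, `uncategorize`) are hypothetical, and a Python set stands in for each category's on-disk pointer table. It only shows that re-categorizing touches pointers, never the file data.

```python
class CategoryFS:
    """Toy in-memory model of the proposal (hypothetical names throughout)."""

    def __init__(self):
        self.files = {}       # file id -> file data, stored exactly once
        self.categories = {}  # category name -> set of file ids ("pointers")
        self._next_id = 0

    def create(self, data, *cats):
        """Store the data once, then point each named category at it."""
        fid = self._next_id
        self._next_id += 1
        self.files[fid] = data
        for cat in cats:
            self.categorize(fid, cat)
        return fid

    def categorize(self, fid, cat):
        self.categories.setdefault(cat, set()).add(fid)

    def uncategorize(self, fid, cat):
        # Only the pointer goes away; the file data is untouched.
        self.categories.get(cat, set()).discard(fid)


fs = CategoryFS()
song = fs.create(b"ID3...", "Music", "Classical-Music")
fs.uncategorize(song, "Classical-Music")   # re-categorize without copying
fs.categorize(song, "Favorites")
print(len(fs.files))                       # 1
print(sorted(fs.categories["Favorites"]))  # [0]
```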
I think that a categorical file system would be most advantageous when it comes to indexing and searching. If the user types the word "music" into the search bar, the OS only has to search the category names instead of every file. The user could then browse the categories just as if they were folders. In this case it might bring up several categories, including "Music", "Classical-Music", "System-Music", "Deleted-Music", etc.
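A sketch of the search step, assuming the category table is available as a plain dictionary (a hypothetical stand-in for the on-disk table). The work grows with the number of categories rather than the number of files:

```python
def search_categories(categories, term):
    """Return the category names containing `term`, ignoring case."""
    term = term.lower()
    # Only the category names are scanned; the files themselves are never touched.
    return sorted(name for name in categories if term in name.lower())


categories = {
    "Music": {1, 2, 3},
    "Classical-Music": {2},
    "System-Music": {7},
    "Deleted-Music": {9},
    "Photos": {4, 5},
}
print(search_categories(categories, "music"))
# ['Classical-Music', 'Deleted-Music', 'Music', 'System-Music']
```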
So, is it a good idea? Is it practical?
But then how do you figure out which categories to use? What if I want to install a very large program with tons of data files; do they go into a data files directory? Now what if I want to uninstall it: how do I know which files to remove (especially if some were created or modified after the program ran)? If the program is in a single directory, it's easy to see which files are associated with it. Now, you could have a directory structure and also a listing by category; this would be similar to indexing (but not quite). Also, if you remove a file from the category folder, does it stay in the program folder or is it removed with it? There are tons of design issues to figure out before we can say whether it's viable, or even useful. Some file systems already support multiple references to a single stored file, and even partial file sharing (say you have a 10 MB file, copy it, then slightly modify the copy; the FS is smart enough to store only the changed portion separately, while the parts that are the same are stored once).
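The partial sharing described above can be sketched as a toy content-addressed block store. Hashing blocks is only one way to do it (it is what deduplicating systems do); copy-on-write file systems can also share blocks through reference-counted pointers without hashing. `BlockStore` and the 4096-byte block size are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size


class BlockStore:
    """Toy content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # sha256 hex digest -> block bytes
        self.files = {}   # file name -> list of block digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            chunk = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # store only unseen blocks
            digests.append(digest)
        self.files[name] = digests

    def copy(self, src, dst):
        # A copy is just a second list of references to the same blocks.
        self.files[dst] = list(self.files[src])

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])


store = BlockStore()
original = bytes(i % 251 for i in range(10 * BLOCK_SIZE))  # 10 distinct blocks
store.write("a.bin", original)
store.copy("a.bin", "b.bin")
print(len(store.blocks))  # 10

modified = bytearray(store.read("b.bin"))
modified[0] ^= 0xFF                     # change one byte in the first block
store.write("b.bin", bytes(modified))
print(len(store.blocks))  # 11
```

After editing one byte of the copy, only one new block is stored; the other nine are still shared with the original.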
@Stevo14
I don't see why you couldn't have both.
You mention tables and searching so the whole thing starts to sound like a database. You could simply make a tag system which is composed of a database with a few tables and a kernel/filesystem plugin which extends the virtual filesystem. Make a special /tag/ directory (like /proc/ and /dev/) which queries a database (perhaps in a special file stored at the root of each volume that supports this "tagging") which stores associations between tags and files. Each tag would then be a directory under /tag/ and the "files" in those directories would be dynamically generated (from the database files) symbolic/hard links to the actual files. I guess it might be possible to implement this for Linux as a normal extension to the system.
```
>> ls -l /tag/metal
lrwxrwxrwx 1 luser luser 33 2008-05-22 00:00 deadtonight.mp3 -> /home/luser/music/deadtonight.mp3
lrwxrwxrwx 1 luser luser 36 2008-05-22 00:00 metalmerchants.mp3 -> /home/luser/music/metalmerchants.mp3
```
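A user-space approximation of esa's /tag/ idea, assuming a POSIX system with symlink support. Unlike the proposed kernel/VFS plugin, which would generate the entries on the fly, this sketch writes real symlinks from a SQLite table; the table layout, the `.tags.db` file name and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_tag_db(path):
    """Open (or create) the per-volume tag database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        " tag TEXT NOT NULL, file TEXT NOT NULL, PRIMARY KEY (tag, file))"
    )
    return db


def add_tag(db, tag, file_path):
    """Record one tag -> file association (a file may carry many tags)."""
    db.execute("INSERT OR IGNORE INTO tags VALUES (?, ?)",
               (tag, os.path.abspath(file_path)))
    db.commit()


def materialize(db, tag_root):
    """Write <tag_root>/<tag>/<basename> symlinks pointing at the real files."""
    for tag, file_path in db.execute("SELECT tag, file FROM tags"):
        tag_dir = os.path.join(tag_root, tag)
        os.makedirs(tag_dir, exist_ok=True)
        link = os.path.join(tag_dir, os.path.basename(file_path))
        if not os.path.lexists(link):
            os.symlink(file_path, link)


# Demo in a throwaway directory instead of the real /home and /tag.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "music"))
song = os.path.join(root, "music", "deadtonight.mp3")
open(song, "wb").close()

db = open_tag_db(os.path.join(root, ".tags.db"))
add_tag(db, "metal", song)
materialize(db, os.path.join(root, "tag"))
print(os.readlink(os.path.join(root, "tag", "metal", "deadtonight.mp3")) == song)  # True
```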
If debugging is the process of removing bugs, then programming must be the process of putting them in.
- Edsger W. Dijkstra
Ready4Dis wrote: But, then how do you figure out which categories to use? What if I want to install a very large program with tons of data files, do they go into a data files directory?
Field testing the system would probably show which categories work best. Like you said, a "data" category would be far too broad to be practical. Let's say you are installing Firefox. The installer creates a "Firefox" category, writes all of the files Firefox needs to the disk and then categorizes them as "Firefox". That would be the bare minimum needed for Firefox to run. Then the OS (or the user) would decide whether some of the files need additional categories to make them easier to find. The file "Firefox.exe", for example, could be in both the "Firefox" category and the "internet" category.
Ready4Dis wrote: Now what if I want to un-install, how do I know which files to remove (especially if some were created/modified after the program runs)? If the program is in a single directory, it's easy to realize what files are associated with it. Now, you could have a directory structure and also a listing by category, this would be similar to indexing (but not really). Also, what if you remove the file from the category folder, does it stay in the program folder or removed with it? Tons of design issues to figure out before we can say if it's viable or not, or even useful.
The OS would have to differentiate between deleting a file and "un-categorizing" a file. In the Firefox example above, the uninstaller would tell the file system to delete anything categorized as "Firefox". It could also specify not to remove anything categorized as "firefox-tmp" or "firefox-favorites" if the user wants to keep those. Un-categorizing a file only removes it from that category; deleting a file removes it from all categories.
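The delete vs. un-categorize distinction and the selective uninstall could look like this, assuming a toy model of a dict of file data plus a dict of category sets. All names are hypothetical. Files protected by a `keep` category are only dropped from the application's category, which is one possible design choice among several.

```python
def uncategorize(categories, fid, cat):
    """Remove one pointer; the file survives in its other categories."""
    categories[cat].discard(fid)


def delete(files, categories, fid):
    """Remove the file's data and every pointer to it."""
    del files[fid]
    for members in categories.values():
        members.discard(fid)


def uninstall(files, categories, app, keep=()):
    """Delete everything in `app` unless it is also in a `keep` category."""
    protected = set().union(*(categories.get(k, set()) for k in keep))
    for fid in list(categories.get(app, set())):  # iterate over a snapshot
        if fid in protected:
            uncategorize(categories, fid, app)
        else:
            delete(files, categories, fid)


files = {1: b"firefox.exe", 2: b"bookmarks", 3: b"cache"}
categories = {
    "Firefox": {1, 2, 3},
    "internet": {1},
    "firefox-favorites": {2},
    "firefox-tmp": {3},
}
uninstall(files, categories, "Firefox", keep=["firefox-favorites"])
print(sorted(files))                                # [2]
print(sorted(categories["firefox-favorites"]))      # [2]
```

Because `delete` clears every category, Firefox.exe also disappears from "internet", while un-categorizing leaves the bookmarks file in "firefox-favorites".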
@esa:
That is basically what I'm thinking of except instead of building it on top of an existing file system, it would be implemented at the disk level.
Stevo14 wrote: That is basically what I'm thinking of except instead of building it on top of an existing file system, it would be implemented at the disk level.
What are the advantages of doing that? Your idea can be easily implemented using symbolic links or hard links. There is no need for a more complex file system structure.
Korona wrote: What are the advantages of doing that? Your idea can be easily implemented using symbolic links or hard links. There is no need for a more complex file system structure.
The idea is that the file system structure would be less complex if it is implemented at the disk level instead of on top of an existing file system structure.
To implement this categorical file system at the disk level, you would need three basic structures:
- A table of categories, each with a table of pointers to the files in that category.
- A table of files.
- A space for the files' data.
As far as file systems go, I think that is pretty simple.
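One way the three structures could be laid out as fixed-size binary records, sketched with Python's `struct` module. The field widths, the 8-entry member limit and the function names are assumptions for illustration; a real design would need growable member lists and free-space tracking for the data area.

```python
import struct

# Hypothetical fixed-size on-disk records (little-endian, no padding):
#   file entry:     32-byte name, uint32 first data block, uint32 length in bytes
#   category entry: 32-byte name, uint32 member count, MAX_MEMBERS uint32 file indices
FILE_ENTRY = struct.Struct("<32sII")
MAX_MEMBERS = 8
CATEGORY_ENTRY = struct.Struct(f"<32sI{MAX_MEMBERS}I")


def pack_file(name, first_block, length):
    return FILE_ENTRY.pack(name.encode(), first_block, length)


def pack_category(name, members):
    """Members are indices into the file table (the 'pointers')."""
    if len(members) > MAX_MEMBERS:
        raise ValueError("category table full")
    padded = list(members) + [0] * (MAX_MEMBERS - len(members))
    return CATEGORY_ENTRY.pack(name.encode(), len(members), *padded)


def unpack_category(raw):
    name, count, *members = CATEGORY_ENTRY.unpack(raw)
    return name.rstrip(b"\0").decode(), members[:count]


# File table: two files, each stored once in the data area.
file_table = (pack_file("deadtonight.mp3", 100, 5_000_000)
              + pack_file("metalmerchants.mp3", 1321, 4_200_000))
# Category table: both categories point at the same two file-table entries.
cat_table = pack_category("metal", [0, 1]) + pack_category("Music", [0, 1])

print(unpack_category(cat_table[:CATEGORY_ENTRY.size]))  # ('metal', [0, 1])
```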
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
File system disk structures should be designed for correctness first, then for performance (i.e., low disk-access latency). Simplicity is nice, but secondary to these goals.
What you're describing, at least from the end user's point of view, can be achieved with metadata indexing, which has been done before. It works really well too.
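Metadata indexing, at its simplest, is an inverted index from attribute values to paths, built alongside an ordinary directory tree. A sketch assuming the metadata has already been extracted from the files; the attribute names, paths and function names are made up for illustration:

```python
from collections import defaultdict


def build_index(metadata):
    """Map each (attribute, value) pair to the set of paths that have it."""
    index = defaultdict(set)
    for path, attrs in metadata.items():
        for key, value in attrs.items():
            index[(key, value.lower())].add(path)
    return index


def query(index, **criteria):
    """Return the paths matching every attribute=value pair (an AND query)."""
    results = [index.get((k, v.lower()), set()) for k, v in criteria.items()]
    return set.intersection(*results) if results else set()


metadata = {
    "/home/luser/music/deadtonight.mp3": {"genre": "Metal", "kind": "audio"},
    "/home/luser/music/metalmerchants.mp3": {"genre": "Metal", "kind": "audio"},
    "/home/luser/docs/report.txt": {"kind": "text"},
}
index = build_index(metadata)
print(sorted(query(index, genre="metal", kind="audio")))
# ['/home/luser/music/deadtonight.mp3', '/home/luser/music/metalmerchants.mp3']
```

The files never leave their directories; the index is a separate structure that can be rebuilt from them.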
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
Stevo14 wrote: To implement this categorical file system at the disk level you would need 3 basic structures.
- A table of categories, each with a table of pointers to files in that category.
- A table of files.
- A space for the files' data.
As far as file systems go, I think that is pretty simple.
I don't think that design is much simpler than existing file systems. The categories must be stored in B-trees or a similar structure to achieve high performance, which will make your system nearly as complex as a modern file system. Your design is just a file system without subdirectories, and I don't think implementing subdirectories is very complex; it probably takes fewer than 200 source lines of code.
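To illustrate the point about ordered structures: the sketch below keeps category names sorted and uses binary search (`bisect`) as a stand-in for a B-tree. Lookup and prefix search cost O(log n) plus the matches, but Python list insertion is O(n); a B-tree keeps inserts logarithmic too and maps onto disk blocks. `CategoryIndex` and its methods are hypothetical.

```python
import bisect


class CategoryIndex:
    """Sorted category names with parallel member sets (B-tree stand-in)."""

    def __init__(self):
        self.names = []    # kept sorted
        self.members = []  # members[i] is the set of file ids for names[i]

    def add(self, name, fid):
        i = bisect.bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            self.names.insert(i, name)  # O(n) here; O(log n) in a real B-tree
            self.members.insert(i, set())
        self.members[i].add(fid)

    def lookup(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.members[i]
        return set()

    def prefix(self, p):
        """All category names starting with p, found without a full scan."""
        i = bisect.bisect_left(self.names, p)
        out = []
        while i < len(self.names) and self.names[i].startswith(p):
            out.append(self.names[i])
            i += 1
        return out


idx = CategoryIndex()
for name, fid in [("Music", 1), ("Classical-Music", 2), ("Music", 2),
                  ("Photos", 3), ("Music-Videos", 4)]:
    idx.add(name, fid)
print(sorted(idx.lookup("Music")))  # [1, 2]
print(idx.prefix("Music"))          # ['Music', 'Music-Videos']
```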
I agree with the others: this seems more like something to put on top of a file system than a file system itself. Storing links to files is great, but how do you store said files? What structure do your links/directories use? As I said, this is something that can be (and has been) done on top of file systems, which has the benefit of working with more than one file system type (so you can use it on ext2fs, FAT32, etc.).
Colonel Kernel wrote: File system disk structures should be designed for correctness, then for performance (i.e., low disk-access latency). Simplicity is nice, but secondary to these goals.
What do you mean exactly by "designed for correctness"? Do you mean something like "designed to work"? (That seems rather obvious...)
@everyone_else:
So apparently the consensus is that this should be implemented on top of a file system, not at the disk level... oh well. It was just an idea anyway.
Stevo14 wrote: So, is it a good idea? Is it practical?
I think it is intermediate between a directory-based file system and a fully searchable, WinFS-type database system. What you describe looks a lot like a file system I once designed, which had files as objects on disk and file lists as a (logical, but user-made) grouping on top of that. However, I think categorizing on metadata (i.e., WinFS in some form or another) is better, as it gives the user more freedom to search, etc.
JAL
- Colonel Kernel
Stevo14 wrote: What do you mean exactly by "designed for correctness"? Do you mean something like "designed to work"? (That seems rather obvious...)
It does seem obvious, but you'd be surprised how many developers start by designing something fast but broken.
edfed wrote: Maybe you want to design something using "SQL like" data structures, but as a file system? Am I wrong?
SQL-like structures... would be just like directory browsing in 'list' view. Nothing new there.
Website: https://joscor.com
I think it is not a bad idea. Apple separates catalog info and file data in HFS, and stores the catalog info as a file.