Re:Filesystem design
Posted: Wed Jan 12, 2005 3:32 am
Okay. I've worked a bit more on the "extensible attributes" framework to get something readable. Here it is:
If you consider a file (e.g. a file entry in a directory or an inode), you can classify the information stored about it as follows:
* information that tells where the data blocks of the file actually are.
* information that tells how the file should be used, who made it, when, etc. and that makes sense globally throughout the system -- which I'll call attributes.
* information that tells more about the file (its class, its original source, its "importance" for the user, when it was originally created if created locally, etc) but which is of no use to the system itself, which I'll call metadata.
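To make the three categories concrete, here is a minimal sketch of a file entry split along those lines. The class name `FileEntry`, the field names and all the sample values are made up for illustration; nothing here is an actual *FS layout.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    # 1. Where the data blocks actually are (hypothetical block numbers).
    block_map: list = field(default_factory=list)
    # 2. Attributes: make sense globally, the system itself relies on them.
    attributes: dict = field(default_factory=dict)
    # 3. Metadata: tells more about the file, of no use to the system itself.
    metadata: dict = field(default_factory=dict)


# Illustrative values only.
song = FileEntry(
    block_map=[120, 121, 122],
    attributes={"owner": 1000, "mode": 0o644},
    metadata={"class": "music", "source": "ripped from CD", "importance": 3},
)
```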
Some of the metadata for a file can be made very small ... Even the "mp3 author" can become compact if you use an external table to map an "authorID" to its Unicode string. Other metadata are relatively large (like the comments and the original source). What I'd advocate (and implement as part of *FS if we come to an agreement on that with Candy) is the ability for the system to handle both attributes and metadata transparently.
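The external table could look roughly like this. It is only a sketch: `AuthorTable`, `intern` and `name_of` are hypothetical names, and the table lives in memory here rather than on disk.

```python
class AuthorTable:
    """Maps small integer author IDs to (possibly long) Unicode names."""

    def __init__(self):
        self._names = []   # author ID -> name
        self._ids = {}     # name -> author ID

    def intern(self, name):
        """Return the ID for name, assigning a new one on first use."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name_of(self, author_id):
        """Return the Unicode name stored for author_id."""
        return self._names[author_id]


authors = AuthorTable()
first = authors.intern("Some Artist")
second = authors.intern("Ünother Ärtist")
```

Each file then only has to carry the small integer, and the string is stored once however many songs share it.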
Transparently means that the song's author remains the same if you move or rename the file. Most frameworks nowadays fail to offer this (just using the media library of newer winamp proves you shouldn't move your MP3s ever ... RealOne is even worse).
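A toy model of why this works when the metadata hangs off the inode and breaks when an application keys it by path. `TinyFS` and its methods are invented for this example, and `path_library` merely stands in for an application-level media library.

```python
class TinyFS:
    """Toy model: directory entries map paths to inode numbers."""

    def __init__(self):
        self.directory = {}        # path -> inode number
        self.inode_metadata = {}   # inode number -> metadata dict
        self._next_inode = 1

    def create(self, path, **metadata):
        ino = self._next_inode
        self._next_inode += 1
        self.directory[path] = ino
        self.inode_metadata[ino] = dict(metadata)
        return ino

    def rename(self, old, new):
        # Only the directory entry changes; the inode is untouched.
        self.directory[new] = self.directory.pop(old)

    def metadata(self, path):
        return self.inode_metadata[self.directory[path]]


fs = TinyFS()
fs.create("/music/track01.mp3", author_id=7)

# An application-level library keyed by path, kept outside the file system.
path_library = {"/music/track01.mp3": {"author_id": 7}}

fs.rename("/music/track01.mp3", "/archive/song.mp3")
```

After the rename, `fs.metadata("/archive/song.mp3")` still returns the author, while `path_library` only knows the old path.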
Now let's say I want a selection that automatically shows "all the songs from author X" based on a directory that contains "all the songs". If "song author" is part of the real data (such as an ID3 tag), I have to crawl the directory, open every file (descending the indirect index blocks?), check the ID3 tag, perform a string compare, and close the file. Rather slow & pathetic.
Storing "author" as a field in an additional "metadata stream" about the file will not really help either.
Now what about storing the author's ID (just an int) within the inode itself? For a start, checking the inode alone is enough to tell whether we keep the file for the current selection.
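A rough sketch of what that selection becomes once the author's ID sits in the inode: a plain integer compare per inode, with no file opened. The `Inode` class and its `author_id` field are assumptions made for the example, not a real inode layout.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    number: int
    author_id: int   # hypothetical small field stored inline in the inode


def select_by_author(inodes, author_id):
    """Return the inode numbers whose inline author_id matches.

    Only the inodes are inspected; no file data or ID3 tag is read.
    """
    return [ino.number for ino in inodes if ino.author_id == author_id]


all_songs = [Inode(10, 7), Inode(11, 3), Inode(12, 7)]
matches = select_by_author(all_songs, 7)
```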
Questions still pending:
* how do we tell what should be kept in the inode and what shouldn't?
* could the inode just be a "cache" for the most recently used metadata & attributes?
* could we just keep the latest "accept" or "deny" rules for the file in the inode and the more complex (?) decision rule in the "metadata stream"?
* how much overhead is involved in "owner" being a key instead of a hard-coded location? The hardest part was reading the data off the disk anyway ... once it's in cache ...
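To have something concrete to argue about for the "inode as a cache" question, here is one possible shape, purely as a sketch. `CachedInode`, the slot count of 2 and the dict standing in for the on-disk metadata stream are all assumptions. Every lookup goes through a key, "owner" included, and `stream_reads` counts the slow path.

```python
from collections import OrderedDict


class CachedInode:
    """Inode with a few inline key/value slots backed by a larger stream."""

    def __init__(self, stream, slots=2):
        self.stream = stream          # stand-in for the on-disk metadata stream
        self.slots = slots            # how many pairs fit inline (assumed)
        self.inline = OrderedDict()   # recently used keys, newest last
        self.stream_reads = 0

    def get(self, key):
        if key in self.inline:
            self.inline.move_to_end(key)      # refresh recency
            return self.inline[key]
        self.stream_reads += 1                # slow path: consult the stream
        value = self.stream[key]
        self.inline[key] = value
        if len(self.inline) > self.slots:
            self.inline.popitem(last=False)   # evict the least recently used
        return value


inode = CachedInode({"owner": 1000, "author_id": 7, "comment": "live version"})
inode.get("owner")
inode.get("author_id")
inode.get("owner")     # served from the inline slots
inode.get("comment")   # pushes out "author_id", the least recently used key
```

Once a key is inline, the lookup is a small in-memory search, which fits the hunch that the expensive part is the disk read and not the key indirection. It does not settle whether eviction should be automatic or driven by explicit rules.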