BrainStorming: what would you put in a FileSystem?

Posted: Thu Apr 24, 2003 2:57 am
by Pype.Clicker
I'm a bit disappointed that there are so many of us OS developers and yet we talk so little about design issues ... I know very few of us are planning to implement a new FS, but if you had to design one, what kind of new features would you introduce?

I generally feel that using files has become more and more complicated in a modern OS (who hasn't wasted precious seconds looking for a file lost somewhere?)... I would like a system where I just have to mention I'm looking for "Intel Manuals" and the system finds it regardless of where it has been stored and under which name (did I keep the cryptic serial documentation number from Intel? well, the system remembers the content of the <a> ... </a> tags I downloaded it from, and that can help me find the file again ... or did it just read the first pages of the PDF document to extract the title?)

Is the document no longer on my disk? The system keeps track of where I archived it (CD-ROM title and burning date :) or where I got it from (oh yeah, we had that file, it came from developer.intel.com, but now we don't have it anymore ...)

This can certainly be implemented on top of EXT3 or NTFS, but I think a few extra features at the FS level could help:
  • use of strict (I mean unmodifiable by the user and correctly guessed by the system) and hierarchical file classes: a PDF file is an extension of Document (and therefore it has a title and an author), and it implements the TableOfContent and Printable interfaces ...
  • store extra class-specific information alongside the file. No need to read the file content to know the title of a document (or only once, when the file is imported into the system), or to know the list of files #included by a C source or the images requested by an HTML document.
  • implement both index-accessed (get bytes from position 0x1234 to 0x5678) and key-accessed (get bytes for the record that has key "cyborg jeff") files ...
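
To make the three points above a bit more concrete, here is a rough Python sketch. Everything in it (the class names beyond Document and Printable, the print_out method, the RecordFile API) is made up for illustration; it only shows the shape of the idea, not a proposed on-disk design.

```python
class Document:
    """Base file class: every document carries a title and an author."""
    def __init__(self, title, author):
        self.title = title
        self.author = author


class Printable:
    """Interface-style mixin; print_out is a hypothetical method name."""
    def print_out(self):
        return f"printing {self.title}"


class PdfFile(Document, Printable):
    """A PDF is a Document that also implements Printable."""


class RecordFile:
    """Hypothetical file that can be read by byte range or by record key."""
    def __init__(self):
        self._data = bytearray()
        self._index = {}  # key -> (offset, length), kept beside the data

    def append_record(self, key, payload):
        self._index[key] = (len(self._data), len(payload))
        self._data += payload

    def read_range(self, start, end):
        # index access: raw bytes from start (inclusive) to end (exclusive)
        return bytes(self._data[start:end])

    def read_key(self, key):
        # key access: look the record up, then fall back to a range read
        offset, length = self._index[key]
        return self.read_range(offset, offset + length)


manual = PdfFile("Intel Manuals", "Intel")
print(manual.title, "|", manual.print_out())

records = RecordFile()
records.append_record("pype", b"clicker")
records.append_record("cyborg jeff", b"tracker tunes")
print(records.read_key("cyborg jeff"), records.read_range(0, 4))
```

The title is available without opening the file content, and the same bytes can be reached either by position or by key.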

Re:BrainStorming: what would you put in a FileSystem?

Posted: Thu Apr 24, 2003 9:03 am
by Curufir
Hmm, I'd disagree with some of those suggestions. From my point of view the filesystem exists to store and maintain data in a known manner, not to manipulate that data.

Searching for data inside files, caching filenames etc. are all just internal business of the fileserver, IMHO.

I'd much prefer to see a filesystem that can access files of any size quickly, recover from system crashes without data integrity loss (Which isn't even in your poll and rates very highly for me), and not take up half my disk with system information (Hint: I can't even be bothered to fill out meta-information for my MP3s, there's just no way I'm going to fill it out for anything else, and auto collection of that information will be flawed at best).

Re:BrainStorming: what would you put in a FileSystem?

Posted: Thu Apr 24, 2003 10:27 am
by pskyboy
Hey

I'm planning to implement my own FS. I was thinking very much along the same lines as you, Pype: appending metadata to all my files and having files found by metadata rather than by filename.

I have noticed a lot of people have been thinking along the same lines as me in terms of what I want to do in my OS, and I am wondering if it's time a few of us got together and developed a full OS with these new ideas. I mean, ideally I would like to develop it all myself, but there is just too much to do, and if I ever managed to finish it someone else would probably have beaten me to implementing the new ideas. I have got to the point now where I just want to see my ideas implemented, as I think they could make a big difference to the way PCs are used.

Peter

Re:BrainStorming: what would you put in a FileSystem?

Posted: Thu Apr 24, 2003 5:47 pm
by _mark
Sounds like a maintenance nightmare to me. What happens when you come across a file you do not recognize? Or perhaps a very, very large tar file that is just a backup of someone's hard drive: do you re-index the whole thing? There are many, many "what happens when" questions lurking in this one.

_mark()

Re:BrainStorming: what would you put in a FileSystem?

Posted: Fri Apr 25, 2003 1:46 am
by Pype.Clicker
_mark() is right in saying that it raises a lot of questions, which finally come down to the user ... "Will I keep a separate copy of the images for every webpage, or will I ask my computer to find which ones are the same and just use links to them? If I use links, do I want them created immediately, or can I delay the comparison? ..."

I think this kind of decision can be abstracted into a POLICY for the HTML file class (just as indexing incoming tar files can be a policy as well). The policy is a piece of software that can be tuned by the user (using some dialog boxes, whatever): a kind of plugin executed when files are imported.
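
A minimal sketch of what such a per-class policy could look like, in Python. The names ImportPolicy and ImportManager, the option values and the action strings are all invented here; the only point is that the import path consults a user-tunable policy keyed by file class.

```python
from dataclasses import dataclass


@dataclass
class ImportPolicy:
    """User-tunable settings for one file class (all fields are assumptions)."""
    index_contents: bool = False
    dedupe_images: str = "never"  # "never", "immediately" or "deferred"


class ImportManager:
    def __init__(self):
        self.policies = {}  # file class name -> ImportPolicy

    def set_policy(self, file_class, policy):
        self.policies[file_class] = policy

    def import_file(self, name, file_class):
        """Return the list of actions the policy asks for on import."""
        policy = self.policies.get(file_class, ImportPolicy())
        actions = []
        if policy.index_contents:
            actions.append("index")
        if policy.dedupe_images == "immediately":
            actions.append("dedupe-now")
        elif policy.dedupe_images == "deferred":
            actions.append("dedupe-later")
        return actions


manager = ImportManager()
manager.set_policy("HTML", ImportPolicy(dedupe_images="deferred"))
manager.set_policy("TAR", ImportPolicy(index_contents=True))
print(manager.import_file("index.html", "HTML"))
print(manager.import_file("backup.tar", "TAR"))
print(manager.import_file("mystery.bin", "UNKNOWN"))
```

A file of an unrecognized class simply gets the default policy, which does nothing.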

Now I have to admit that it may become as frustrating as MS Word's auto-translation of your text. But the automated operations could be limited to a subset of your hard disk (for instance, no author/title indexing will be done until you move a music file into the MusicCollection ...)

btw, remember this is a brainstorm. Feel free to come up with your OWN ideas: you're not limited to discussing mine :)

Re:BrainStorming: what would you put in a FileSystem?

Posted: Fri Apr 25, 2003 5:58 am
by damonbrinkley
I believe security is a big deal in this day and age. People will always want speed, so you'll need to find a happy medium between security and speed. Journaling or something similar is definitely a must for those times when your OS bombs out and you want a quick recovery.

I would start with a base of the major features in filesystems today and then add onto it from there.

Re:BrainStorming: what would you put in a FileSystem?

Posted: Fri Apr 25, 2003 11:57 am
by jamescox3k
I like the idea of metadata. I know I wouldn't use it for any old file, but for the ones that are important it could greatly increase search speeds. Also, I think some kind of extensibility (I think that's the word) would be good, where you can easily add new features to your system without making older formatted devices function incorrectly. For example, instead of FAT12, FAT16 and FAT32 you'd just have some kind of extra data that could be ignored by an OS that only understands FAT12. Obviously this would make the design a bit more complicated.
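
One common way to get "extra data that an older OS can ignore" is to store fields as tag/length/value records, so that a reader can skip any tag it does not know. The Python sketch below assumes a made-up layout (16-bit tag, 16-bit length, little-endian) and made-up tag numbers; it is not taken from FAT or any real filesystem.

```python
import struct

HEADER = "<HH"  # assumed layout: u16 tag, u16 payload length, little-endian
HEADER_SIZE = struct.calcsize(HEADER)


def pack_fields(fields):
    """Serialize (tag, payload) pairs as consecutive tag/length/value records."""
    out = bytearray()
    for tag, payload in fields:
        out += struct.pack(HEADER, tag, len(payload)) + payload
    return bytes(out)


def read_known(blob, known_tags):
    """Return only the fields this reader understands; skip everything else."""
    result = {}
    pos = 0
    while pos < len(blob):
        tag, length = struct.unpack_from(HEADER, blob, pos)
        pos += HEADER_SIZE
        if tag in known_tags:
            result[tag] = blob[pos:pos + length]
        pos += length  # the length lets an old reader step over unknown tags
    return result


TAG_NAME, TAG_AUTHOR = 1, 2  # hypothetical tag numbers
entry = pack_fields([(TAG_NAME, b"README.TXT"), (TAG_AUTHOR, b"jamescox3k")])
print(read_known(entry, {TAG_NAME}))              # an "old" reader
print(read_known(entry, {TAG_NAME, TAG_AUTHOR}))  # a "new" reader
```

The cost is the one mentioned above: every reader has to walk the records instead of using fixed offsets.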

Re:BrainStorming: what would you put in a FileSystem?

Posted: Fri Apr 25, 2003 5:27 pm
by soap
I'm planning to take the meta-data/document type hierarchy idea to what seems to me to be its logical conclusion and implement something similar to the unix VFS, except everything is an object rather than a file.

Every object (file, device, program, etc) in the system would belong to a strictly defined type hierarchy and export:

* a list of properties/attributes, making it easy to search for or index meta-data.
* a list of methods callable by other objects in the system.
* a list of any other objects contained within or provided by the object. This should make it easy to implement directories/mount points/foreign file systems/Hurd style translators/multiple data streams/bundles/whatever.
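
As a rough illustration of the three exports listed above, here is a small Python sketch. The FsObject class, the do_ naming convention for callable methods and the find helper are assumptions made for the example, not part of the design described in this post.

```python
class FsObject:
    """Hypothetical system object: properties, methods and contained objects."""
    def __init__(self, name, **properties):
        self.name = name
        self.properties = dict(properties)
        self.children = {}  # contained objects, keyed by name

    def add(self, child):
        self.children[child.name] = child
        return child

    def methods(self):
        # by convention (assumed here), exported methods are named do_<verb>
        return [n[3:] for n in dir(self) if n.startswith("do_")]

    def find(self, **wanted):
        """Recursively collect objects whose properties match all of wanted."""
        matches = []
        if all(self.properties.get(k) == v for k, v in wanted.items()):
            matches.append(self)
        for child in self.children.values():
            matches.extend(child.find(**wanted))
        return matches


class Mp3(FsObject):
    def do_play(self):
        return f"playing {self.name}"


root = FsObject("/", kind="directory")
music = root.add(FsObject("music", kind="directory"))
song = music.add(Mp3("track01.mp3", kind="audio", artist="cyborg jeff"))
print([o.name for o in root.find(artist="cyborg jeff")], song.methods())
```

A directory, a mount point or a bundle is then just an object whose children list is non-empty.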

Can anyone see a down side to this besides the various tricky security/permissions issues?

Re:BrainStorming: what would you put in a FileSystem?

Posted: Sat Apr 26, 2003 3:38 am
by pskyboy
To expand on my ideas, I plan to build a layered file system. I am designing my OS to work over networks, and hence my file system will follow the same design strategy. I plan to have an NFL (Network File Layer), on top of which lies the file system in use, which will be a versioning, journaled file system.

Peter

DF-FS v2.. or v3...

Posted: Sun Apr 27, 2003 2:41 am
by df
After doing a first version of my FS a while ago (the lovely named df-fs...), i came to several conclusions from my implementation...

1 - i want to be able to do dynamic cluster sizing, from 64 bytes up to the high range. after doing stats on my windows FS, i'm thinking mostly of 64 bytes, 512 bytes, 2kb, 8kb, 16kb and 4mb.
(so a file in directory A might be using 64byte clusters, but another file in the same directory might be using 4mb clusters.. and another file may be using a combination of all of them!)

2 - extents are good. extents are your friend. use extents!

3 - fine grained security (acl list per file/etc) can be a real pain to manage and implement.. so much so.. it's better to implement that as a standard file in the FS relating to the OS, rather than embedding it into the FS itself.

4 - inodes and bitmapping consume an enormous amount of space if you let them.
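
On point 2: an extent is generally understood as a contiguous run of blocks recorded as a (start, length) pair, instead of one entry per block. The Python sketch below only illustrates that idea; the function names are made up and it says nothing about how df-fs actually stores extents.

```python
def to_extents(blocks):
    """Collapse a list of physical block numbers into (start, length) runs."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the last run
        else:
            extents.append((block, 1))         # block starts a new run
    return extents


def physical_block(extents, logical):
    """Map a logical block index within the file to its physical block."""
    for start, length in extents:
        if logical < length:
            return start + logical
        logical -= length
    raise IndexError("logical block beyond end of file")


file_blocks = [100, 101, 102, 103, 500, 501, 7]
extents = to_extents(file_blocks)
print(extents)
print(physical_block(extents, 4))
```

Seven block numbers shrink to three pairs here; a well-laid-out large file can be described by a handful of extents no matter how long it is.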

i'm going to implement df-fs-2 on top of a DB style system. the DB layer will handle physical read/write. directories become libraries. files become tables. data can be written as true DB entries (create a file from a schema), or raw data via an IFS.

keeping transparency to the user, so they never know. two manipulation methods will be used: fallback (aka the old standard, "this looks like a file system to me" mode) and db-style.

how many files do you have that are purely raw info? no headers, no meta-data, etc? look at mp3, they are all 'frames' and a bunch of headers, and meta-data at the end.

zip files, tar files, apache.conf files, mbox email files, etc. Most files supported by applications are DB style files in some way. Even executable files are loaded with a header, split into 4kb pages, etc. Most files use some kind of header/records/meta storage system. even if it's just text files (ini files), etc.

using a DB layer at the base also allows me to dynamically size things. disk-space-free becomes fluid. you're not limited to seeing just a 'partition', it kinda becomes an LVM. add another partition to it, and you dont have to resize or anything.

getting right off track ;) i'm looking at having a physical disk IO layer, DB layer, FS layer, VFS layer.

apps that dont know about the DB talk
vfs->fs->db->io

apps that talk DB
db->io

by having the DB layer, I get all the benefits of the DB: transactions where required, indexing + search capabilities, dynamic meta-data + schemas, etc, etc.
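
A toy version of the fs->db part of that stack, in Python, with the standard sqlite3 module standing in for the DB layer. This is an assumption-heavy sketch: the table layout, the class names and the 4-byte chunk size are invented, and a real DB layer would sit on raw disk IO rather than on SQLite.

```python
import sqlite3


class DbLayer:
    """Stand-in DB layer: each file is a set of ordered records."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE records (file TEXT, seq INTEGER, data BLOB, "
            "PRIMARY KEY (file, seq))"
        )

    def write_records(self, file, chunks):
        # 'with conn' commits on success and rolls back if an exception escapes
        with self.conn:
            self.conn.execute("DELETE FROM records WHERE file = ?", (file,))
            self.conn.executemany(
                "INSERT INTO records VALUES (?, ?, ?)",
                [(file, i, chunk) for i, chunk in enumerate(chunks)],
            )

    def read_records(self, file):
        rows = self.conn.execute(
            "SELECT data FROM records WHERE file = ? ORDER BY seq", (file,)
        )
        return [row[0] for row in rows]


class FsLayer:
    """Fallback view: apps see plain byte streams, the DB sees records."""
    CHUNK = 4  # tiny on purpose, so the record split is visible

    def __init__(self, db):
        self.db = db

    def write(self, path, data):
        chunks = [data[i:i + self.CHUNK] for i in range(0, len(data), self.CHUNK)]
        self.db.write_records(path, chunks)

    def read(self, path):
        return b"".join(self.db.read_records(path))


db = DbLayer()
fs = FsLayer(db)
fs.write("/etc/motd", b"hello world")
print(fs.read("/etc/motd"))          # app that doesn't know about the DB
print(db.read_records("/etc/motd"))  # app that talks DB directly
```

The rewrite of a file happens inside one transaction, which is the "transactions where required" benefit in miniature.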

i might throw in some NUDA (?? non-unified disk architecture?? hmmm) stuff and insert a VDB layer above the DB.

you could then specify, say, order of use.
so if you were using local hard disks, an old external 8bit scsi disk and a compact flash card for storage, they all have different speeds. in theory there's nothing stopping you from using them all as storage, but naturally you dont want to use the slowest one (CF card) first, etc...
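
The "order of use" idea could be sketched like this in Python. The Device fields, the speed ranking numbers and the free-space figures are all hypothetical; the sketch just picks the fastest device that still has room.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    speed_rank: int  # lower means faster; the ranks are assumed values
    free_bytes: int


def pick_device(devices, size):
    """Allocate size bytes on the fastest device with enough free space."""
    for dev in sorted(devices, key=lambda d: d.speed_rank):
        if dev.free_bytes >= size:
            dev.free_bytes -= size
            return dev.name
    raise OSError("no device has enough free space")


pool = [
    Device("cf-card", speed_rank=3, free_bytes=1000),
    Device("local-hd", speed_rank=1, free_bytes=100),
    Device("scsi-8bit", speed_rank=2, free_bytes=500),
]
print(pick_device(pool, 80))   # fits on the fastest disk
print(pick_device(pool, 80))   # fastest disk is nearly full now
print(pick_device(pool, 450))  # only the slowest device has room
```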

the vdb layer could also intercept calls where you are using local hard disks on a machine that's on your lan.. but then other layers come into play (transport, etc)...

mm i'm getting carried away now...

Multiple file roots

Posted: Sun Apr 27, 2003 3:04 am
by Nice
Hi folks,

this is slightly off topic: less about the binary of the fs, but about how applications access it. Sorry, but I am itching to explain my idea! ;-)

wouldn't it be great for:

FILE *f = fopen("ftp://www.myos.org/mission.html","rwb");
to work?

In dos/windows, there are multiple file roots: each representing a physical media. You can have logical media too, subst etc, but anyway.

In unix etc, you have a single unified VFS. All media, physical or logical, is just a branch.

It is my observation that, in this internet-connected world, URIs are pervasive. A user isn't scared of an http:// address. Nor ftp:// nor file:// nor anything:// else.

I propose that there are several virtual file systems, each distinguished by its protocol (file, http, ftp etc). Applications need not actually know what is local and what isn't. Extra queryable attributes on a FILE object such as 'bool remote()' and 'int cost()' might help applications adapt sensibly, but if they don't bother they'll still work. The idea of a cost metric for file access is useful across local file systems too (imagine mounting an archive file locally).

The actual top-level-protocols available would be loaded by plugins to the VFS (ok. this is one unified VFS, the very root node just isn't navigable). Each is still a VFS, and so can mount/submount under it (e.g. mounting a zip file transparently to browse into it).
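
Here is a small Python sketch of that arrangement: one VFS whose top-level roots are protocol plugins, with remote and cost exposed as attributes. The handler classes are fakes backed by dictionaries (no network access), and every name apart from remote and cost is an assumption made for the example.

```python
from urllib.parse import urlsplit


class Handler:
    """Base class for a protocol root plugin."""
    remote = False
    cost = 1  # relative access cost, arbitrary units

    def read(self, path):
        raise NotImplementedError


class DictRoot(Handler):
    """Fake root backed by a dict; stands in for a real protocol driver."""
    def __init__(self, files, remote=False, cost=1):
        self.files = files
        self.remote = remote
        self.cost = cost

    def read(self, path):
        return self.files[path]


class Vfs:
    def __init__(self):
        self.roots = {}  # scheme -> handler plugin

    def register(self, scheme, handler):
        self.roots[scheme] = handler

    def open(self, uri):
        parts = urlsplit(uri)
        handler = self.roots[parts.scheme]  # KeyError if no plugin is loaded
        return handler, handler.read(parts.netloc + parts.path)


vfs = Vfs()
vfs.register("file", DictRoot({"/home/nice/notes.txt": b"local notes"}))
vfs.register("ftp", DictRoot({"www.myos.org/mission.html": b"<html>"},
                             remote=True, cost=100))

handler, data = vfs.open("ftp://www.myos.org/mission.html")
print(data, handler.remote, handler.cost)
```

An application that ignores remote and cost still gets its bytes; one that checks them can, for example, avoid re-reading an expensive file.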

This idea of virtual file system isn't particularly new. KDE for example has it. There is even libferris. I am sure other examples can be plucked. But I am advocating an OS that does this for you, so apps don't need to use special libraries, and so that support is across the board instead of on an app-by-app basis.

Imagine installing a new "sftp" root plugin on your box. Suddenly you can fire up your favourite text editor and tell it to open an sftp:// document, and it just works, even though the text editor has no idea how it got that sftp document.

Authentication would be handled out-of-band by the VFS. Normally this would mean a system dialog prompt if credentials haven't been cached.

Remote filesystems might benefit from caching. This is placing your cache into the system rather than into the browser. Ever been annoyed that you browse a webpage, then tell your browser to 'view source' or something and the system then downloads the page _again_ into a file and starts the source viewer on that file? Messy and slow and ...
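
A bare-bones Python sketch of a system-level cache, to show why the second access (the 'view source' case) would not need another download. The CachingRoot class and the fake fetch function are assumptions; a real cache would also need expiry and validation, which are left out here.

```python
class CachingRoot:
    """Wraps a fetch function and remembers what it has already fetched."""
    def __init__(self, fetch):
        self.fetch = fetch
        self.cache = {}
        self.fetches = 0  # how many times we really went to the network

    def read(self, uri):
        if uri not in self.cache:
            self.fetches += 1
            self.cache[uri] = self.fetch(uri)
        return self.cache[uri]


def fake_download(uri):
    # stand-in for a real network transfer
    return f"<html>contents of {uri}</html>".encode()


http = CachingRoot(fake_download)
page = http.read("http://www.myos.org/mission.html")    # the browser
source = http.read("http://www.myos.org/mission.html")  # the source viewer
print(page == source, http.fetches)
```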

There is still room for apps that specialise in a particular protocol to do their thing with their custom libraries, rather than using the generic lowest-common-denominator abstraction provided by the VFSes. But 99% of apps will benefit from the setup I've described above.

Re:DF-FS v2.. or v3...

Posted: Wed Apr 30, 2003 3:08 am
by Pype.Clicker
df wrote: After doing a first version of my FS a while ago (the lovely named df-fs...), i made several conclusions doing my implementation...

1 - i want to be able to do dynamic cluster sizing, from 64bytes to high range. i thought mostly, after doing stats on my windows FS, 64 bytes, 512, 2kb, 8kb, 16kb and 4mb.
(so file in directory A might be using 64byte clusters, but another file in same directory might be using 4mb clusters.. and another file may be using a combination of all of them!)
hmm ... so you might have several clusters in a single sector? This could be nice for very small files indeed, but if your small file is a bit larger - say 120 bytes - how will you tell which cluster is next (I mean, won't you need too many cluster ids)?
Are you thinking of pre-splitting the disk, with N MB reserved for 64-byte clusters (with its own free marking / cluster chaining scheme)?
Could a file span several clustering schemes? (I mean, if my file is actually 1MB+12 bytes, will it consume 257 4KB clusters, or can I leave the last 12 bytes in a 64-byte cluster and use larger clusters for the rest of the file?)
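
Working the 1MB+12 bytes example through, as a rough Python sketch. It assumes 1MB = 2^20 bytes, uses the cluster sizes df listed, and picks clusters greedily (largest first, tail rounded up to one 64-byte cluster); the greedy rule is only an assumption for the arithmetic, not df's stated allocation strategy.

```python
CLUSTER_SIZES = [4 * 2**20, 16 * 2**10, 8 * 2**10, 2 * 2**10, 512, 64]


def mixed_allocation(size, sizes=CLUSTER_SIZES):
    """Greedy plan: list of (cluster_size, count), sizes sorted largest first."""
    plan = []
    remaining = size
    for cluster in sizes[:-1]:
        count, remaining = divmod(remaining, cluster)
        if count:
            plan.append((cluster, count))
    smallest = sizes[-1]
    tail = -(-remaining // smallest)  # ceiling division for the leftover bytes
    if tail:
        plan.append((smallest, tail))
    return plan


def fixed_allocation(size, cluster):
    """Plan using a single cluster size, rounded up."""
    return [(cluster, -(-size // cluster))]


def slack(size, plan):
    """Bytes allocated but not used by the file."""
    return sum(cluster * count for cluster, count in plan) - size


size = 2**20 + 12
for plan in (fixed_allocation(size, 4096), mixed_allocation(size)):
    print(plan, "slack:", slack(size, plan))
```

With fixed 4KB clusters the file takes 257 clusters and wastes 4084 bytes; the mixed plan takes 64 clusters of 16KB plus one 64-byte cluster and wastes 52 bytes. The price is that the file's block map must record a size for each run.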
2 - extents are good. extents are your friend. use extents!
what do you mean by "extents" (sorry for my ignorance, here ...) ?
3 - fine grained security (acl list per file/etc) can be a real pain to manage and implement.. so much so.. it's better to implement that as a standard file in the FS relating to the OS, rather than embedding it into the FS itself.
thus having a kind of permissions database file for a given domain (or am i completely wrong here ?)
4 - inodes and bitmapping consume an enormous amount of space if you let them.
Do they? I mean, I admit that inodes consume blocks, but usually it's just one block per file, and files are usually larger than a few blocks. If you don't have i-nodes (that is, information blocks telling the file size, first cluster, etc.), how can you have consistent hard links? (or maybe you think hard links aren't worth the trouble and have a single directory pointing to your file ...)

Also, bitmapping (I suppose you mean keeping one bit to tell whether a given cluster is used or free) takes a fixed amount of space, but what else would you suggest to make sure there is always enough room to store the usage information (never ending up in the situation where a sector cannot be used anymore because there is no place to mark it as used, which could potentially occur with lists, etc)?
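
For a feel of the numbers, a quick Python calculation of the bitmap size at one bit per cluster. The 40 GiB disk is just a hypothetical example; the two cluster sizes are the 64-byte minimum df mentioned and an ordinary 4KB cluster.

```python
def bitmap_bytes(disk_bytes, cluster_bytes):
    """Size of a one-bit-per-cluster allocation bitmap, rounded up to bytes."""
    clusters = disk_bytes // cluster_bytes
    return (clusters + 7) // 8


DISK = 40 * 2**30  # hypothetical 40 GiB disk
for cluster in (64, 4096):
    size = bitmap_bytes(DISK, cluster)
    print(f"{cluster:>5}-byte clusters: bitmap = {size / 2**20:.2f} MiB "
          f"({100 * size / DISK:.4f}% of the disk)")
```

So the bitmap is a fixed 1.25 MiB with 4KB clusters, but grows to 80 MiB (about 0.2% of the disk) if the whole disk is tracked in 64-byte units.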

Re:BrainStorming: what would you put in a FileSystem?

Posted: Wed Apr 30, 2003 3:53 pm
by smurf975
I agree with the others.

Filesystems should just file the data as needed and keep the data's integrity.

However, you can add plugins to the filesystem, like an indexing/search engine to search for that Intel manual.

<The Windows 2000 indexing service can do this to some extent. And no one is stopping third parties from doing it better>

Otherwise, by the same logic, should memory management care what kind of data you store in RAM? Should it tag it as a Word document or an mp3?

I'm for: keep it simple and stupid, and let users decide with plugins what extra functionality they need.

Smurf

Re:BrainStorming: what would you put in a FileSystem?

Posted: Wed Apr 30, 2003 5:02 pm
by Pype.Clicker
alright, of course things like automatic indexing, presenting ZIP file as folders, etc. must be plug-ins to the file system ...

But if you knew you'd need a lot of key->record files to handle such plugins (just think about it: what is a directory? a key->record file!), and if you knew that there are low-level techniques (I mean techniques that operate at the same level as the FAT, directly manipulating pointers to sectors) that may improve searching a directory dramatically (like making it O(log N) instead of O(N), so that searching a 1M-entry file takes about 20 operations instead of 1 000 000), would you still say "keep it simple and stupid"?
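
The O(N) versus O(log N) gap is easy to check with a small Python sketch that counts comparisons. It uses a sorted in-memory list and binary search as a stand-in for whatever on-disk structure (a B-tree, for instance) a real FS would use; the file names are generated just for the test.

```python
def linear_lookup(names, target):
    """Scan an unsorted-style directory; return (index, comparisons made)."""
    steps = 0
    for i, name in enumerate(names):
        steps += 1
        if name == target:
            return i, steps
    return None, steps


def binary_lookup(names, target):
    """Binary search over a sorted directory; return (index, comparisons made)."""
    lo, hi, steps = 0, len(names), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo if lo < len(names) and names[lo] == target else None
    return found, steps


# zero-padded names so that list order equals sorted order
directory = [f"file{i:07d}" for i in range(1_000_000)]
target = directory[-1]
print("linear:", linear_lookup(directory, target)[1], "comparisons")
print("binary:", binary_lookup(directory, target)[1], "comparisons")
```

For the worst case (the last entry) the linear scan needs one million comparisons, while the binary search never needs more than 20 for a million entries, since 2^20 is just over a million.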

Re:BrainStorming: what would you put in a FileSystem?

Posted: Wed Apr 30, 2003 5:18 pm
by smurf975
Well, if you are heading that way, why not put everything in kernel space? Like MS is doing with its IIS.

I mean to say, yes, you can get speed improvements by embedding it, but that also brings costs with it.

But this all depends on what you want to accomplish. Are you using it as a multi-user database server, or are you doing a simple desktop OS?

Simple desktop users can just wait. They have time, and they can't be bothered with overly complex mechanisms that will also add new bugs.

But given what you explained in your last post, I can't say I'm against it; I'd just say watch out.

Me