Random thoughts about filesystems

Jezze · Post by **Jezze** » Tue Nov 15, 2011 5:31 pm

I've been pondering the design of my VFS for a couple of weeks now. It works pretty much like any other operating system's, in the sense that it has files and the files are grouped in directories. My thoughts here are not about implementation details but more about why everyone seems to be using this sort of arrangement for resources. Something in the back of my head keeps telling me that this cannot be the best way to manage resources, but at the same time I feel I would be an idiot to think I could come up with something better.

I've had to go back to basics in a little thought experiment just to come to terms with what the underlying problem is. What is it I'm actually trying to achieve?

1. Need a way to say: This part of memory is a resource of some kind.
2. Need a way to reference it by labeling it in a way that both the computer and I (the user) can understand.
3. Need a way to group resources that belong together.
4. Need a way to find a resource as fast as possible.

I can see how files and directories achieve all of this. A file is a piece of memory, it is labeled by an inode and a name, and it is grouped using directories. Files can also be found easily by traversing the tree.

Here are my thoughts on the subject, though, and this is where it might get laughable.

About 1. Not much to say. You need to say this part of memory is a resource. This I can't do without.

About 2. I think from a program's perspective it would be enough to have a unique number for a resource, like an inode, but do I really need to reference resources by name for the user? I wonder why it is so important that resources have a name. If I had a lot of metadata about a resource, I should be able to know it is the file I'm looking for without explicitly looking at its name. As long as two resources are not allowed to have the same metadata, there can never be any ambiguity either.

About 3. Grouping in this case would also just be metadata so I wouldn't need directories.
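To make points 2 and 3 concrete, here is a minimal C sketch, assuming a resource is nothing but a numeric id (like an inode) plus a set of string tags. There is no name and no parent directory; a group is just another tag. The rule "two resources may not have the same metadata" is enforced when a resource is added. All names and limits here (`struct resource`, `add_resource`, `MAX_TAGS`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical limits for this sketch. */
#define MAX_TAGS 8
#define MAX_RESOURCES 64

/* A resource is an id (like an inode) plus a set of metadata tags.
 * There is no name and no directory: grouping is just another tag. */
struct resource {
    unsigned id;
    const char *tags[MAX_TAGS];
    int ntags;
};

static struct resource table[MAX_RESOURCES];
static int nresources;

/* True if resource r carries tag t. */
static bool has_tag(const struct resource *r, const char *t)
{
    for (int i = 0; i < r->ntags; i++)
        if (strcmp(r->tags[i], t) == 0)
            return true;
    return false;
}

/* True if a and b carry the same set of tags, in any order
 * (tags within one resource are assumed to be distinct). */
static bool same_tags(const struct resource *a, const struct resource *b)
{
    if (a->ntags != b->ntags)
        return false;
    for (int i = 0; i < a->ntags; i++)
        if (!has_tag(b, a->tags[i]))
            return false;
    return true;
}

/* Register a resource, refusing it if another resource already has the
 * identical metadata, so a full description can never be ambiguous.
 * Returns the new id, or 0 on failure. */
unsigned add_resource(const char **tags, int ntags)
{
    assert(ntags <= MAX_TAGS);
    if (nresources == MAX_RESOURCES)
        return 0;
    struct resource r = { .id = (unsigned)nresources + 1, .ntags = ntags };
    for (int i = 0; i < ntags; i++)
        r.tags[i] = tags[i];
    for (int i = 0; i < nresources; i++)
        if (same_tags(&table[i], &r))
            return 0;
    table[nresources++] = r;
    return r.id;
}
```

With this rule, adding {music, draft} succeeds, but adding {draft, music} afterwards is rejected because the tag order does not matter.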

About 4. I imagine finding a file using metadata could potentially be a lot faster than traversing a directory structure, but referencing a certain file would be a nightmare. How would I know how much metadata I need to supply to be sure I get the right file back during a lookup? My solution here would be to filter resources before executing the search. What I thought about was creating something similar to a view, which only shows me the files in the filesystem that fulfill a criterion, i.e. having a certain type of metadata. In this view only a subset of resources is available to you. If you think about it, for most of your tasks you are only working on a small set of files at a time. You are most likely not interested in the entire filesystem all at once. If you switch to working on something else, you might be interested in another set of files. Is it actually necessary to have the entire tree of directories available to you the whole time?
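A view as described above could be as simple as a list of required tags. This is a sketch under that assumption (the structures and `list_view` are hypothetical, not an existing API): a resource is visible in a view if it carries every required tag, and listing a view is just filtering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A resource: an id plus a NULL-terminated list of metadata tags. */
struct resource {
    unsigned id;
    const char *const *tags;
};

/* A view: a name plus a NULL-terminated list of tags that every
 * visible resource must carry (the view's criteria). */
struct view {
    const char *name;
    const char *const *required;
};

static bool has_tag(const struct resource *r, const char *t)
{
    for (const char *const *p = r->tags; *p; p++)
        if (strcmp(*p, t) == 0)
            return true;
    return false;
}

/* A resource is visible in a view if it carries every required tag. */
bool in_view(const struct resource *r, const struct view *v)
{
    for (const char *const *p = v->required; *p; p++)
        if (!has_tag(r, *p))
            return false;
    return true;
}

/* The "ls" of a view: copy the ids of visible resources into out
 * (at most max of them) and return how many ids were written. */
size_t list_view(const struct resource *all, size_t n, const struct view *v,
                 unsigned *out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (in_view(&all[i], v) && count < max)
            out[count++] = all[i].id;
    return count;
}
```

Only the resources matching the current view are ever considered, which is the "small set of files at a time" idea; a real implementation would presumably index tags instead of scanning everything.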

I am gonna start doing some experiments on implementing views instead of directories. It will be interesting to see how having different views of the filesystem works and what problems I run into. I would need a way to define a view (what metadata is part of it), a way to list views and a way to switch between views. I also need some function other than fopen() to open files by metadata instead of by pathname. Finally, I need to evaluate whether this actually works better.

Commence laughing.

Jezze · Post by **Jezze** » Tue Nov 15, 2011 6:34 pm

That gave me a bit more clarity on areas I hadn't thought about. I especially like the use of git's hash sum as the key, and also that you could stretch the context thinking to cover more than just the filesystem. It would be awesome to have the whole kernel context-aware, so that depending on what kind of job you are doing, the kernel gives those resources its full attention and perhaps even changes things like scheduling or power management.

Great stuff!

Owen · Post by **Owen** » Tue Nov 15, 2011 9:13 pm

I've been building a design around a file system which is built upon "compound document" concepts. Therefore, rather than building around files and directories, my file system design is built around type-tagged objects (...and is therefore more like an object database(*)!)

Every object has the following children:

Child objects
Attributes
Streams

Streams are just bytewise chunks of data, owned by the object and private to it. Child objects are, well, other objects. Attributes are also other objects, but they serve a different purpose: while children are designed to encapsulate components of the object, attributes are designed to encapsulate object metadata (which may be completely unknown to any component that deals with an object type). You would traverse an object's children by their index (probably using information contained in one of the object's streams), while you would traverse its attributes by their defined data type.
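The object layout described above might look roughly like this in C. This is only a sketch: the 128-bit `type_id` (in the spirit of a COM GUID) and the function names are assumptions, not Owen's actual design. It shows the key difference in access: children by index, attributes by data type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit data type ID, similar in spirit to a COM GUID. */
typedef struct { uint64_t hi, lo; } type_id;

/* A stream: a private chunk of bytes owned by exactly one object. */
struct stream {
    unsigned char *data;
    size_t len;
};

/* Every object carries a type tag plus children, attributes and streams. */
struct object {
    type_id type;
    struct object **children;   /* components, traversed by index */
    size_t nchildren;
    struct object **attributes; /* metadata objects, looked up by type */
    size_t nattributes;
    struct stream *streams;
    size_t nstreams;
};

static int same_type(type_id a, type_id b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

/* Children are addressed by position. */
struct object *child_at(const struct object *o, size_t i)
{
    assert(i < o->nchildren);
    return o->children[i];
}

/* Attributes are addressed by their data type, so code that knows nothing
 * else about o can still find, say, an "author" attribute. */
struct object *attribute_of_type(const struct object *o, type_id t)
{
    for (size_t i = 0; i < o->nattributes; i++)
        if (same_type(o->attributes[i]->type, t))
            return o->attributes[i];
    return NULL;
}
```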

The system uses a COM-style interface and component system, so a "data type ID" would be associated with an object which can open it.

From this, one can hopefully see that one could emulate a traditional file system on top of this structure, by defining "File System Directory" and "Generic data" objects implementing their respective interfaces (which store their directory and body in a stream, respectively). However, one can expand this concept - for example, by implementing a "File system directory" object for, say, zip files. The system will also come with a library for transporting such compound object graphs through non native environments (e.g. a network protocol, foreign file systems) whenever the system is not automatically handling this.

(*) With support for an object having one - and only one - owner, though incorporating both "identity weak reference" (always targets that specific object) and "spatial weak reference" (always tracks object at that specific location, i.e. like a symlink) support.
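The difference between the two kinds of weak reference can be sketched as follows, assuming a flat table where each object has a stable id, exactly one owner (`parent_id`) and a name within that owner. An identity reference stores the id; a spatial reference stores a location and re-resolves it every time, like a symlink. All structures and functions here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object table entry: stable id, single owner, local name. */
struct obj {
    unsigned id;
    unsigned parent_id;   /* 0 for the root */
    const char *name;
    bool alive;
};

/* Identity weak reference: always targets that specific object. */
struct identity_ref { unsigned id; };

/* Spatial weak reference: whatever sits at this location (like a symlink).
 * The path is a list of names starting below the root. */
struct spatial_ref { const char *const *path; size_t depth; };

static struct obj *find_by_id(struct obj *t, size_t n, unsigned id)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].alive && t[i].id == id)
            return &t[i];
    return NULL;   /* target is gone: the weak reference dangles */
}

static struct obj *find_child(struct obj *t, size_t n, unsigned parent,
                              const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].alive && t[i].parent_id == parent &&
            strcmp(t[i].name, name) == 0)
            return &t[i];
    return NULL;
}

struct obj *resolve_identity(struct obj *t, size_t n, struct identity_ref r)
{
    return find_by_id(t, n, r.id);
}

/* Walk the stored location name by name, starting at root_id. */
struct obj *resolve_spatial(struct obj *t, size_t n, unsigned root_id,
                            struct spatial_ref r)
{
    assert(r.depth > 0);
    unsigned cur = root_id;
    struct obj *o = NULL;
    for (size_t i = 0; i < r.depth; i++) {
        o = find_child(t, n, cur, r.path[i]);
        if (!o)
            return NULL;
        cur = o->id;
    }
    return o;
}
```

If an object is renamed and a new object takes its old name, the identity reference still follows the original object while the spatial reference now finds the newcomer.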

Jezze · Post by **Jezze** » Sun Nov 20, 2011 4:30 pm

Ok, I've written a small prototype just to try it out. I took a lot of inspiration from Git really.

Instead of directories I now have something I call views, which I can change using the cd command just like directories. It changes the view to whatever I choose, so a "cd bin" followed by "ls" would show me only my user binaries, for example. It is actually quite nice that you never have to think about where you are: "cd bin" will always get you to the same place. It feels very similar to doing a "git checkout mybranch".

To translate a normal directory structure to my kind of structure, I turn something like /usr/bin/ into two different views: one called usr and one called bin.

I also had to change my fopen(char *path) to something like fopen(char *view, char *name). It's cool, since you never have to worry about where a program might execute from. A view, in contrast to a directory, is always available from anywhere.
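The two steps described above (turning a path into views and opening by view plus name) might be sketched like this. It assumes each blob stores its name and the list of views it belongs to; `blob_from_path` and `vfs_open` are hypothetical names, and `vfs_open` is used instead of `fopen` to avoid clashing with the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VIEWS 4

/* Hypothetical blob record: a name plus the views it belongs to. */
struct blob {
    char name[32];
    char views[MAX_VIEWS][16];
    int nviews;
};

/* Translate a traditional path: every directory component becomes a view
 * and the last component becomes the name, so /usr/bin/ls turns into
 * name "ls" in views "usr" and "bin". Returns 0, or -1 if it won't fit. */
int blob_from_path(const char *path, struct blob *b)
{
    char buf[128];
    if (strlen(path) >= sizeof buf)
        return -1;
    strcpy(buf, path);
    b->nviews = 0;
    b->name[0] = '\0';
    char *last = NULL;
    for (char *tok = strtok(buf, "/"); tok; tok = strtok(NULL, "/")) {
        if (last) {   /* the previous component was a directory */
            if (b->nviews == MAX_VIEWS || strlen(last) >= sizeof b->views[0])
                return -1;
            strcpy(b->views[b->nviews++], last);
        }
        last = tok;
    }
    if (!last || strlen(last) >= sizeof b->name)
        return -1;
    strcpy(b->name, last);
    return 0;
}

/* Two-argument open: the caller's current location plays no part,
 * only the view and the name do. Returns NULL if nothing matches. */
const struct blob *vfs_open(const struct blob *all, size_t n,
                            const char *view, const char *name)
{
    assert(view && name);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(all[i].name, name) != 0)
            continue;
        for (int v = 0; v < all[i].nviews; v++)
            if (strcmp(all[i].views[v], view) == 0)
                return &all[i];
    }
    return NULL;
}
```

Note that with this translation /bin/sh and /usr/bin/ls both end up in the view bin, while only ls is in usr.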

Even though this is only a prototype, I find the results promising. There are tons of work left before this becomes more than a prototype, but I'm encouraged by the results so far.

turdus · Post by **turdus** » Mon Nov 21, 2011 2:51 am

berkus wrote:... which focus the user on something, and at the same time focus the OS on the same thing, so that using the system in a certain modality is actually easy and rewarding (when you compose music you most probably don't need your sunday picnic pictures getting in the way).

Does this mean your OS is single-user only? How do you handle situations where multiple users are logged in and each is running a different task, for example one is composing music and another is sorting picnic pictures? Is it efficient in that scenario?

turdus · Post by **turdus** » Mon Nov 21, 2011 7:24 am

berkus wrote:Do you mean a scenario, when two users are sharing a single keyboard and a single monitor to do two different tasks at the same time? In that sense any currently existing OS is single user.

No, I mean remote access. I suppose you don't want to redesign your kernel after networking is finally implemented, do you?
And you are mistaken about how currently existing OSes handle this: what you wrote only applies to DOS and Windows (win*), but not to Windows Server. You have probably heard of Remote Desktop and the like. Linux and all the other Unices were designed to be multi-user from the start; that's why I asked. But you've answered my question already: your OS is going to be single-user only.

Jezze · Post by **Jezze** » Mon Nov 21, 2011 11:13 am

Thanks for the input again Berkus.

I like the idea that all metadata is of equal importance in describing a blob. To describe the blob in the example you gave earlier I would need to supply the attributes usr, bin and ls.

If I supply only usr and ls, would I, according to your example, get the same blob or would I get nothing?

The reason I ask is that if I get nothing, I would always need to supply all the attributes when describing the blob I want, which would be no different from supplying an absolute path in a directory structure, except that I could write the individual directories in any order.

On the other hand, if I do get the blob, how can I know it is the same blob every time? There might be a blob that has the attributes usr, ls and sbin, for instance. Which blob would I actually get?
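The two possible semantics in this question can be compared directly. The following is a small sketch, assuming a blob is just a NULL-terminated list of distinct attributes: "subset" matching returns every blob that carries all queried attributes, while "exact" matching also requires that the blob carries nothing else. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A blob described only by a NULL-terminated list of attributes. */
struct blob {
    const char *const *attrs;
};

static size_t count(const char *const *list)
{
    size_t n = 0;
    while (list[n])
        n++;
    return n;
}

static bool has(const char *const *list, const char *a)
{
    for (; *list; list++)
        if (strcmp(*list, a) == 0)
            return true;
    return false;
}

/* Subset semantics: the blob carries every queried attribute.
 * Exact semantics: additionally, it carries no other attributes.
 * Attributes within one list are assumed to be distinct. */
bool matches(const struct blob *b, const char *const *query, bool exact)
{
    for (const char *const *q = query; *q; q++)
        if (!has(b->attrs, *q))
            return false;
    return !exact || count(b->attrs) == count(query);
}

/* Number of blobs a query selects: 0 means "nothing", more than 1 means
 * the lookup is ambiguous. */
size_t lookup_count(const struct blob *all, size_t n,
                    const char *const *query, bool exact)
{
    assert(query != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (matches(&all[i], query, exact))
            hits++;
    return hits;
}
```

With blobs {usr, bin, ls} and {usr, sbin, ls}, the query {usr, ls} selects both under subset matching and neither under exact matching, while exact matching on {ls, bin, usr} finds exactly one blob regardless of attribute order.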

Jezze · Post by **Jezze** » Mon Nov 21, 2011 2:42 pm

I can sort of imagine what you are describing here. Personally, I don't feel confident enough at this point to build something like that, because it might mean biting off more than I can chew. My intention is to keep things very simple for as long as possible, so I don't get stuck early on with something that takes too long to complete. So I'll stick with what I have now, with one minor change: fopen will take only one argument, a single string in which you give both the view and the blob name.
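The post doesn't say how the view and the blob name are combined in that one string. As a sketch, assuming the first '/' separates them (e.g. "bin/ls"), the single-argument open would start by splitting the string like this; `split_spec` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Split "view/name" at the first '/' (the separator is an assumption).
 * Returns 0 on success, or -1 if the separator is missing, either part
 * is empty, or a part does not fit its buffer. */
int split_spec(const char *spec, char *view, size_t vsize,
               char *name, size_t nsize)
{
    assert(spec && view && name);
    const char *sep = strchr(spec, '/');
    if (!sep)
        return -1;
    size_t vlen = (size_t)(sep - spec);
    size_t nlen = strlen(sep + 1);
    if (vlen == 0 || nlen == 0 || vlen >= vsize || nlen >= nsize)
        return -1;
    memcpy(view, spec, vlen);
    view[vlen] = '\0';
    memcpy(name, sep + 1, nlen + 1);   /* includes the terminating NUL */
    return 0;
}
```

Splitting at the first separator means everything after it is treated as the blob name, so a name could itself contain further '/' characters if that were ever wanted.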

turdus · Post by **turdus** » Mon Nov 21, 2011 2:43 pm

berkus wrote:Well, your assumption is wrong. Are you able to figure out why?

No, because you wrote

berkus wrote:Do you mean a scenario, when two users are sharing a single keyboard and a single monitor to do two different tasks at the same time? In that sense any currently existing OS is single user.

Which is quite clear about your thoughts on what multi-user means, and you state that your OS (like any currently existing OS) is single-user. No assumption on my side. Furthermore, after I explained what multi-user is, you wrote

berkus wrote:it's relatively easy to support multiple users scenario - just as there are multiple user logins, there are multiple focuses that OS maintains, one per user

which does not answer my original question at all: is having multiple modalities (aka contexts) on the entire directory tree (which is compressed, by the way) efficient? I mean that more modalities introduce race conditions, which are not relatively easy to support. As a matter of fact, avoiding them is one of the most difficult tasks an OS faces. Having mutual exclusion on the entire tree is NOT efficient. Checking which modalities can coexist at a time without interfering is not an easy task either.

But never mind, I'm not interested anymore. I'm getting the feeling that you're just brainstorming; you do not have any working implementation of assocfs, so you cannot answer my question on technical details. No offense, there's nothing wrong with the planning stage, all great software had one once. Just work a little bit more on it, that's all.

Cheers.

OSDev.org