I developed and implemented this idea for my OS a while ago, but I'm going back over it to try and formalize things. I consider it one of the key novel parts of my OS' architecture, and it sort of captures a lot of my OS' design. The basic idea is to have a string format that can identify a resource on the system, in a way that sort of combines the idea of a path, a PID, and a GUID in a human-readable form. This format is tentatively called a resource address. I just want to bounce this concept off the forums to see if people have issues, suggestions, or comments. For any formal language stuff, I'm going to use some informal mix of regular expressions and BNF. I've never had any formal training or real experience in using either, so don't shoot me if something's wrong.
First, you need to understand the concept of a "resource" in my system. A resource is simply something that can have a message sent to it. This means processes, files, directories, links, windows, terminals, block devices, etc. are all resources. This is a pretty standard definition. Processes are collections of one or more resources, each with a resource index number that is unique within that process (basically an inode number). Most processes only contain a resource with index 0, which represents that process. Driver processes contain more resources, corresponding to the files, directories, and links they manage. When a PID and resource index are combined, they create a "resource pointer" (in C, a 64-bit integer type) that uniquely identifies that resource on the system. Because of the integer format, a PID is a resource pointer to the process it identifies. Resource pointers have a string format that is bijective with their integer representation, and I will mean the string representation when I say "resource pointer" from here on.
The string representation of a resource pointer is defined as follows:
<resource_pointer> = '@' <pid> | '@' <pid> ':' <index>
<pid> = <uint>
<index> = <uint>
<uint> = "[0-9]+"
There exists a null resource pointer that represents no resource.
So basically, "@42:1234" refers to PID 42, index 1234; "@12" refers to PID 12, index 0.
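Here is a minimal Python sketch of converting between the string and integer forms. The bit layout is an assumption, not something stated above: it puts the PID in the low 32 bits and the index in the high 32 bits, which is one layout where a bare PID is numerically the same as the pointer with index 0. The names parse_pointer and format_pointer are made up for illustration.

```python
import re

# Assumed layout (hypothetical): PID in the low 32 bits, resource index in
# the high 32 bits, so a bare PID equals the pointer with index 0.
PID_BITS = 32
PID_MASK = (1 << PID_BITS) - 1

_POINTER_RE = re.compile(r"@([0-9]+)(?::([0-9]+))?")

def parse_pointer(text):
    """Parse '@pid' or '@pid:index' into a packed 64-bit integer."""
    match = _POINTER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a resource pointer: {text!r}")
    pid = int(match.group(1))
    index = int(match.group(2) or 0)  # a missing index means index 0
    if pid > PID_MASK or index > PID_MASK:
        raise ValueError("pid or index does not fit in 32 bits")
    return (index << PID_BITS) | pid

def format_pointer(pointer):
    """Format a packed integer in the shortest form: '@pid' when index is 0."""
    pid = pointer & PID_MASK
    index = pointer >> PID_BITS
    return f"@{pid}" if index == 0 else f"@{pid}:{index}"
```

Note that the grammar accepts several spellings of the same pointer ("@12", "@12:0", "@012"), so the mapping is only bijective if one spelling is picked as canonical. This sketch always emits the shortest one.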
Paths are pretty much the same as in UNIX. They are sequences of letters delimited by slashes that indicate a sequence of directory entries to follow when looking up a file. I'm going to limit things to absolute paths for now. For the sake of precision, I'm going to give them a grammar too:
<path> = '/' <dirent> | '/' <dirent> <path>
<dirent> = "[^/]+"
The problem with this definition of paths is that in my system, it is not possible to reliably determine the parent of a directory from just that directory (this is because directories can be hardlinked, and because symbolic links make this very tricky.) So, paths like "/bin/../sbin/halt" are not usable in this form. "/bin/./ls" is also tricky, so let's kill two birds with one stone and define a simplified path as having no "." or ".." dirents:
<simplified_path> = '/' <clean_dirent> | '/' <clean_dirent> <simplified_path>
<clean_dirent> = "([^/.][^/]*)|(\.[^/.][^/]*)|(\.\.[^/]+)"
It is pretty easy to see that there is an algorithm for transforming a (valid) path into a simplified path, at least once the path is turned into an absolute one. For example, if PWD="/home/nick" and we type "cd ../.", it means "cd /home/nick/../.", and this gets transformed to "cd /home". Note that this transformation is purely textual: it never looks at the filesystem. Let P be the set of valid paths and S the set of simplified paths, and let us define a function s:P -> S that turns valid paths into simplified paths.
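One way the function s could look, as a rough Python sketch. It works on the text of the path only and never consults the filesystem. Two behaviours here are assumptions of the sketch rather than part of the definition above: ".." at the root stays at the root, and empty dirents (from "//" or a trailing slash) are silently dropped even though the grammar does not allow them.

```python
def simplify(path):
    """Lexically reduce an absolute path so it has no '.' or '..' dirents."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled here")
    stack = []
    for dirent in path.split("/"):
        if dirent in ("", "."):
            continue          # empty dirents and '.' contribute nothing
        if dirent == "..":
            if stack:
                stack.pop()   # cancel the previous dirent
            continue          # assumption: '..' at the root is ignored
        stack.append(dirent)
    return "/" + "/".join(stack)
```

With this, simplify("/home/nick/../.") gives "/home" and simplify("/bin/../sbin/halt") gives "/sbin/halt". A path that collapses all the way down returns "/", which the simplified path grammar does not cover, so a real version would need to decide what that means.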
Resource addresses are a combination of resource pointers and paths, and are a superset of resource pointers. The concept here is that we can represent the VFS as a function that takes resource addresses to resource addresses, ultimately reaching a resource pointer that can then be directly used to reference the resource. Let's define them now:
<resource_address> = <resource_pointer> | <resource_pointer> <simplified_path>
Let R be the set of resource pointers and A the set of resource addresses. Clearly, R is a subset of A. The VFS can be thought of as a function f:A -> A that eventually converges to a resource pointer (or does not find the file) by traversing its directory graph. For example, if @1:1 is a directory and @1:2 is a directory at entry "bin" in @1:1 and @1:3 is a file at entry "ls" in @1:2, then f("@1:1/bin/ls") = "@1:2/ls". Let us define f*:A -> R as the repeated evaluation of f until it returns a resource pointer. So, f*("@1:1/bin/ls") = "@1:3".
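To make the example concrete, here is a small Python sketch of f and f* over a hard-coded directory table. The DIRECTORIES table, the use of None for "not found", and treating a bare pointer as a fixed point of f are all assumptions of the sketch. It also does not check that dirents are clean.

```python
import re

# Pointer, followed by zero or more '/dirent' components.
_ADDRESS_RE = re.compile(r"(@[0-9]+(?::[0-9]+)?)((?:/[^/]+)*)")

# Toy directory table for the example: pointer -> {entry name: pointer}.
DIRECTORIES = {
    "@1:1": {"bin": "@1:2"},
    "@1:2": {"ls": "@1:3"},
}

def f(address):
    """Perform one lookup step; return None if the entry is not found."""
    match = _ADDRESS_RE.fullmatch(address)
    if match is None:
        raise ValueError(f"not a resource address: {address!r}")
    pointer, path = match.groups()
    if path == "":
        return address  # already a resource pointer, nothing left to do
    dirent, _, rest = path[1:].partition("/")
    entries = DIRECTORIES.get(pointer)
    if entries is None or dirent not in entries:
        return None
    return entries[dirent] + ("/" + rest if rest else "")

def f_star(address):
    """Apply f until only a resource pointer remains (or lookup fails)."""
    while address is not None and "/" in address:
        address = f(address)
    return address
```

Each call to f consumes exactly one dirent, so f_star terminates after as many steps as the path has components.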
It may be easier to think of f(a) where a is a resource address with pointer r and path p as being a DFA with transition function f, current state r, and input queue p. Basically, f is the table of the DFA and a is the execution state of the DFA. When you add symlinks, this makes the VFS into a sort of giant distributed parallel extended DFA thing, which I think is pretty cool.
Thinking of the VFS as this simpler function f instead of f* has some advantages for me. In my system, there is no central VFS server: drivers each manage their own VFS trees. All resources (most importantly directories; for files it is a no-op) implement a "find" function (analogous to f) that performs a single step toward returning the resource pointer. This makes it very simple to implement directories and symbolic links, and would also make it easy to implement something like a tag-based filesystem. The distributedness of the VFS also means that it is safe to allow user processes to present a VFS interface, which means an application can present its own resources as files.
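A rough Python sketch of the "every resource implements find" idea, with two pretend driver processes that each own their own directory entries. Everything named here (File, Directory, PROCESSES, send_find, resolve) is hypothetical, and a direct method call stands in for what would really be a message sent to the owning process. How a file should respond when there is still path left over is not specified above; this sketch assumes it reports "not found".

```python
class File:
    def __init__(self, pointer):
        self.pointer = pointer

    def find(self, path):
        # Assumption: a file resolves to itself and has no children.
        return self.pointer if path == "" else None

class Directory:
    def __init__(self, pointer, entries):
        self.pointer = pointer
        self.entries = entries  # entry name -> resource pointer string

    def find(self, path):
        """Consume one dirent and return the next address, or None."""
        if path == "":
            return self.pointer
        dirent, _, rest = path[1:].partition("/")
        target = self.entries.get(dirent)
        if target is None:
            return None
        return target + ("/" + rest if rest else "")

# Each "process" owns its resources by index; no table spans processes.
PROCESSES = {
    1: {1: Directory("@1:1", {"dev": "@2:1"})},
    2: {1: Directory("@2:1", {"tty0": "@2:2"}), 2: File("@2:2")},
}

def send_find(address):
    """Stand-in for sending a 'find' message to the addressed resource."""
    pointer, slash, rest = address.partition("/")
    pid, _, index = pointer[1:].partition(":")
    resource = PROCESSES.get(int(pid), {}).get(int(index or 0))
    return None if resource is None else resource.find(slash + rest)

def resolve(address):
    """Keep forwarding the address until only a pointer is left."""
    while address is not None and "/" in address:
        address = send_find(address)
    return address
```

Here resolve("@1:1/dev/tty0") crosses from PID 1 into PID 2 partway through the lookup, and neither process needs to know anything about the other's directory contents.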
So, what's the point here? There are some pretty cool things I've figured out how to do with this flexible format so far. First, if you allow symbolic links to be redirects to resource addresses instead of just paths, then you can implement mountpoints using symbolic links. Second, references to specific files can be passed around the system without worry that the VFS layout might change or that other processes may be using different roots. Third, it is possible for the user to (safely) construct a pointer to a resource that is not mounted in the filesystem, which is a neat trick for emergency situations and as a C programmer makes me feel all warm and fuzzy inside.
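For the first point, here is a sketch of how a symbolic link that redirects to a resource address can act as a mountpoint. The tables, the PID numbers, the step limit, and the choice to follow a link even when it is the last component are all assumptions made for the example.

```python
# Toy state: directories map entry names to pointers, symlinks map a
# pointer to the resource address it redirects to.
DIRECTORIES = {
    "@1:1": {"mnt": "@1:2"},
    "@7:1": {"readme": "@7:2"},
}
SYMLINKS = {
    "@1:2": "@7:1",  # "mountpoint": redirects into a driver with PID 7
}

def split_address(address):
    """Split an address into (pointer, path), where path may be empty."""
    pointer, slash, rest = address.partition("/")
    return pointer, slash + rest

def find(address):
    """One lookup step, with symlinks redirecting to their target address."""
    pointer, path = split_address(address)
    if pointer in SYMLINKS:
        return SYMLINKS[pointer] + path  # keep whatever path is left
    if path == "":
        return address
    dirent, _, rest = path[1:].partition("/")
    entries = DIRECTORIES.get(pointer)
    if entries is None or dirent not in entries:
        return None
    return entries[dirent] + ("/" + rest if rest else "")

def resolve(address, max_steps=64):
    """Repeat find until a non-symlink pointer remains; cap steps for loops."""
    for _ in range(max_steps):
        if address is None:
            return None
        pointer, path = split_address(address)
        if path == "" and pointer not in SYMLINKS:
            return address
        address = find(address)
    raise RuntimeError("too many lookup steps (possible symlink loop)")
```

So resolve("@1:1/mnt/readme") goes "@1:2/readme", then "@7:1/readme", then "@7:2". The link at "mnt" behaves like a mount of PID 7's root directory, and "unmounting" is just removing the link.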
If you were like tl;dr, don't worry. I mostly wrote this to get my ideas down: explaining an idea to someone is one of the best ways to clarify and refine it.