designing my VFS: where to resolve relative paths and ..

Hellbender
Member
Posts: 63
Joined: Fri May 01, 2015 2:23 am
Libera.chat IRC: Hellbender

Post by Hellbender »

Hi.

I'm designing my first virtual file system API, and have been thinking about the best way to handle relative paths and "." / ".." traversal.

I'd like to do all of this on the libc side, as in this pseudo-code:

```
fopen("../foo/./bar.txt")
=> library expands into "$(CWD)/../foo/./bar.txt"
=> library normalises into "/home/hellbender/foo/bar.txt"
=> library calls VFS_resolve("/home/hellbender/foo/bar.txt");
```
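A minimal C sketch of the library-side steps in that pseudo-code. It assumes the library keeps the working directory as an absolute string and that a joined path fits in a 4096-byte scratch buffer; `normalize_path` is a hypothetical helper, not an existing libc function. The normalisation is purely lexical: nothing is looked up in any file system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical library helper: join `cwd` (must be absolute) with `path`
 * and collapse "//", "." and ".." purely on the text.
 * Returns false if the joined path or the result does not fit. */
bool normalize_path(const char *cwd, const char *path, char *out, size_t outsz)
{
    assert(cwd[0] == '/' && outsz >= 2);
    char buf[4096];                          /* assumed size limit */
    int n = (path[0] == '/')
        ? snprintf(buf, sizeof buf, "%s", path)
        : snprintf(buf, sizeof buf, "%s/%s", cwd, path);
    if (n < 0 || (size_t)n >= sizeof buf)
        return false;

    size_t len = 1;                          /* out is "/" or "/a/b", no trailing '/' */
    out[0] = '/';
    for (const char *p = buf; *p; ) {
        while (*p == '/') p++;               /* skip separators, including "//" */
        const char *comp = p;
        while (*p && *p != '/') p++;
        size_t clen = (size_t)(p - comp);

        if (clen == 0 || (clen == 1 && comp[0] == '.'))
            continue;                        /* empty or "." component */
        if (clen == 2 && comp[0] == '.' && comp[1] == '.') {
            while (len > 1 && out[len - 1] != '/') len--;  /* drop last name */
            if (len > 1) len--;                            /* and its '/' */
            continue;                        /* ".." at the root stays at "/" */
        }
        if (len + (len > 1) + clen + 1 > outsz)
            return false;                    /* no room for '/', name and NUL */
        if (len > 1) out[len++] = '/';
        memcpy(out + len, comp, clen);
        len += clen;
    }
    out[len] = '\0';
    return true;
}
```

With a working directory of "/home/hellbender/somedirectory", "../foo/./bar.txt" becomes "/home/hellbender/foo/bar.txt". Because nothing is looked up, "./doesnotexist/.." from "/home/hellbender" also comes out as "/home/hellbender" whether or not "doesnotexist" exists, and a ".." after a symbolic link is applied to the link's name rather than its target.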
Is there some huge downside to doing it like this, instead of, for example, letting the VFS service actually traverse the whole thing: "/" => "home" => "hellbender" => "somedirectory" => ".." => "foo" => "." => "bar.txt"?

Do you know how this is handled in, e.g., Linux?
Hellbender OS at github.
ExeTwezz
Member
Posts: 104
Joined: Sun Sep 21, 2014 7:16 am
Libera.chat IRC: exetwezz

Re: designing my VFS: where to resolve relative paths and ..

Post by ExeTwezz »

This is how pathname resolution is done in Linux: http://man7.org/linux/man-pages/man7/pa ... ion.7.html.
Hellbender
Member
Posts: 63
Joined: Fri May 01, 2015 2:23 am
Libera.chat IRC: Hellbender

Re: designing my VFS: where to resolve relative paths and ..

Post by Hellbender »

Thanks.

Reading that, I already see one problem: my method would resolve things like "./doesnotexist/.." to the current directory, even when "doesnotexist" doesn't exist.
Roman
Member
Posts: 568
Joined: Thu Mar 27, 2014 3:57 am
Location: Moscow, Russia

Re: designing my VFS: where to resolve relative paths and ..

Post by Roman »

I have a small VFS implementation. I'll be able to send it tomorrow.
"If you don't fail at least 90 percent of the time, you're not aiming high enough."
- Alan Kay
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: designing my VFS: where to resolve relative paths and ..

Post by Brendan »

Hi,
Hellbender wrote: I'm designing my first virtual file system API, and was thinking about the best way to handle relative paths and ".", ".." traversing.
My theory is that things like "current working directory" and relative paths are none of the VFS's problem and should be handled by the process itself (e.g. by the part of the C standard library that the process uses). This means the VFS only ever has to care about absolute paths; and different processes (written in different languages with different libraries, for different semantics and/or compatibility requirements) can potentially use very different ways to convert relative paths into absolute paths (or none at all if the process doesn't use relative paths to start with).

More specifically, I think the VFS should only ever care about "absolute, canonical paths". What I mean here is that "/foo/../bar" would be illegal and the process/library would need to convert it to the canonical "/bar" instead; and "/foo/mySymbolicLink" refers to the symbolic link itself and never to whatever the symbolic link points to.
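A sketch of the check such a VFS could apply to every path it receives, assuming "canonical" means: starts with '/', contains no empty, "." or ".." components, and has no trailing '/' except for the root itself. `is_canonical` is a hypothetical name. The check is purely syntactic, so "/foo/mySymbolicLink" passes and names the link itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VFS entry check: accept only absolute, canonical paths. */
bool is_canonical(const char *path)
{
    assert(path != NULL);
    if (path[0] != '/')
        return false;                   /* relative: the library's job */
    if (path[1] == '\0')
        return true;                    /* the root itself */
    const char *p = path + 1;
    for (;;) {
        const char *comp = p;
        while (*p && *p != '/') p++;
        size_t clen = (size_t)(p - comp);
        if (clen == 0)
            return false;               /* "//" or a trailing '/' */
        if (clen == 1 && comp[0] == '.')
            return false;               /* "." component */
        if (clen == 2 && comp[0] == '.' && comp[1] == '.')
            return false;               /* ".." component */
        if (*p == '\0')
            return true;                /* last component was fine */
        p++;                            /* step over the '/' */
    }
}
```

A VFS built this way would simply return an error for "/foo/../bar" and leave it to the caller to submit "/bar".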

Note that this avoids a significant amount of unnecessary complexity in the VFS (e.g. consider the situation where there are 5 symbolic links pointing to the current directory and you do "cd ..": to handle this properly you need to keep track of which of the 5 parents is the right parent); and the VFS has more than enough complexity that can't be avoided (e.g. keeping its caches synchronised, combined with needing very low overhead, combined with managing mount points, combined with providing a usable "fully asynchronous" interface, combined with...).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
piranha
Member
Posts: 1391
Joined: Thu Dec 21, 2006 7:42 pm
Location: Unknown. Momentum is pretty certain, however.

Re: designing my VFS: where to resolve relative paths and ..

Post by piranha »

Brendan wrote: More specifically, I think VFS should only ever care about "absolute, canonical paths". What I mean here is that "/foo/../bar" would be illegal and the process/library would need to convert it to the canonical "/bar" instead
Curious: What if foo/ didn't exist? I would consider that an error, but if the userspace were to automatically resolve .. path components, then it would have to check to see if each element of the path exists in order to maintain that error reporting. This could potentially involve a lot of system calls (and therefore overhead) that would be superfluous. I don't think the additional complexity of doing path resolution in kernel-space is that bad.
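A sketch of how a library could keep that error while still resolving ".." itself, assuming POSIX `stat()` is available and the input is already absolute (for example, after prepending the working directory). `normalize_checked` and its `stat_calls` counter are hypothetical; the counter only makes the extra system calls visible. Only the directory being left by a ".." is checked, and because `stat()` follows symbolic links while the ".." is still applied to the path text, this restores the "does it exist" error but not full POSIX resolution semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical library helper: like a lexical normaliser, but before ".."
 * removes a component, stat() confirms that directory really exists.
 * Returns false on overflow or if that directory is missing. */
bool normalize_checked(const char *abs, char *out, size_t outsz, int *stat_calls)
{
    assert(abs[0] == '/' && outsz >= 2);
    size_t len = 1;                      /* out is "/" or "/a/b" */
    out[0] = '/';
    out[1] = '\0';
    *stat_calls = 0;
    for (const char *p = abs; *p; ) {
        while (*p == '/') p++;
        const char *comp = p;
        while (*p && *p != '/') p++;
        size_t clen = (size_t)(p - comp);
        if (clen == 0 || (clen == 1 && comp[0] == '.'))
            continue;
        if (clen == 2 && comp[0] == '.' && comp[1] == '.') {
            struct stat st;
            out[len] = '\0';             /* terminate the prefix for stat() */
            ++*stat_calls;               /* one system call per ".." */
            if (stat(out, &st) != 0 || !S_ISDIR(st.st_mode))
                return false;            /* e.g. "/doesnotexist/.." */
            while (len > 1 && out[len - 1] != '/') len--;
            if (len > 1) len--;
            continue;
        }
        if (len + (len > 1) + clen + 1 > outsz)
            return false;
        if (len > 1) out[len++] = '/';
        memcpy(out + len, comp, clen);
        len += clen;
    }
    out[len] = '\0';
    return true;
}
```

A path without ".." costs no extra system calls here; each ".." costs one.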
SeaOS: Adding VT-x, networking, and ARM support
dbittman on IRC, @danielbittman on twitter
https://dbittman.github.io
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: designing my VFS: where to resolve relative paths and ..

Post by Brendan »

Hi,
piranha wrote:
More specifically, I think VFS should only ever care about "absolute, canonical paths". What I mean here is that "/foo/../bar" would be illegal and the process/library would need to convert it to the canonical "/bar" instead
Curious: What if foo/ didn't exist? I would consider that an error, but if the userspace were to automatically resolve .. path components, then it would have to check to see if each element of the path exists in order to maintain that error reporting. This could potentially involve a lot of system calls (and therefore overhead) that would be superfluous. I don't think the additional complexity of doing path resolution in kernel-space is that bad.
Yes and no.

For software that expects the idiotic semantics imposed by C's standard library; the C library the process uses would need to check if "/foo" exists while converting (e.g.) "/foo/../bar" into "/bar". However, (for well designed software) this is rare and/or not performance critical anyway (e.g. far less likely than the "already canonical" case); and the overhead of the system call is likely less than the overhead of determining if "/foo" exists (e.g. loading "unlikely to matter" directories from disk or network into VFS cache and all the cache misses involved if the directory is already in cache), so the performance difference between "user space resolves paths" and "VFS resolves paths" is likely to be minor (e.g. a little more than the cost of SYSCALL/SYSRET instructions - maybe 50 cycles added to something that's going to cost 500 to 50000 cycles).

More importantly; for native software, the idiotic semantics imposed by C's standard library can be ignored and there's no need to check if "/foo" exists, so that "500 to 50000 cycles of stupidity" can be avoided, which makes it leaner and faster than doing the checking in the VFS.

Finally; it's also possible for different processes to use something completely different to convert relative paths into canonical absolute paths. For example, if a process is trying to emulate the behaviour of VMS (where "[-]" is used for parent directory and not ".."), or Classic Mac OS (where "::" is used for parent directory) or RISC OS (where "^" is used for parent directory) or AmigaOS (where "foo//bar" is used instead of "foo/../bar") or MS-DOS (where '/' is not a directory separator and you've got a different working directory for each disk drive); where ".." is a legal directory name in some of these cases (no different to "zz"); then the VFS can support all of these without any problem at all because it doesn't have *nix baked into it.
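A sketch of the idea that only the library knows the path syntax, assuming a hypothetical `to_canonical` helper and two simplified dialects: Unix-style paths and RISC OS-style paths (where '.' separates components, '$' is the root and '^' is the parent). Filing-system and disc prefixes, and the other systems listed above, are left out to keep it short. Both dialects end up as the same canonical string, which is all the VFS would see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Two hypothetical dialects a user-space library might emulate. */
enum dialect { DIALECT_UNIX, DIALECT_RISCOS };

struct syntax {
    char sep;            /* component separator */
    const char *root;    /* prefix naming the root directory */
    const char *parent;  /* component meaning "parent directory" */
    const char *self;    /* component meaning "this directory", or NULL */
};

static const struct syntax syntaxes[] = {
    [DIALECT_UNIX]   = { '/', "/", "..", "." },
    [DIALECT_RISCOS] = { '.', "$", "^",  NULL },   /* e.g. "$.foo.^.bar" */
};

/* Convert an absolute path in dialect `d` into a canonical "/a/b" path. */
bool to_canonical(enum dialect d, const char *path, char *out, size_t outsz)
{
    const struct syntax *s = &syntaxes[d];
    size_t rlen = strlen(s->root);
    assert(outsz >= 2);
    if (strncmp(path, s->root, rlen) != 0)
        return false;                    /* sketch handles absolute paths only */

    size_t len = 1;
    out[0] = '/';
    for (const char *p = path + rlen; *p; ) {
        while (*p == s->sep) p++;
        const char *comp = p;
        while (*p && *p != s->sep) p++;
        size_t clen = (size_t)(p - comp);
        if (clen == 0)
            continue;
        if (s->self && clen == strlen(s->self) && strncmp(comp, s->self, clen) == 0)
            continue;
        if (clen == strlen(s->parent) && strncmp(comp, s->parent, clen) == 0) {
            while (len > 1 && out[len - 1] != '/') len--;
            if (len > 1) len--;
            continue;
        }
        if (memchr(comp, '/', clen) != NULL)
            return false;                /* a '/' inside a name has no canonical spelling here */
        if (len + (len > 1) + clen + 1 > outsz)
            return false;
        if (len > 1) out[len++] = '/';
        memcpy(out + len, comp, clen);
        len += clen;
    }
    out[len] = '\0';
    return true;
}
```

Here `to_canonical(DIALECT_UNIX, "/foo/../bar", ...)` and `to_canonical(DIALECT_RISCOS, "$.foo.^.bar", ...)` both produce "/bar"; adding another dialect only means adding another row to the table.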


Cheers,

Brendan
piranha
Member
Posts: 1391
Joined: Thu Dec 21, 2006 7:42 pm
Location: Unknown. Momentum is pretty certain, however.

Re: designing my VFS: where to resolve relative paths and ..

Post by piranha »

Interesting. I agree, it does depend on whether you consider such a path valid or not. Interestingly, gcc and ld do that kind of path screwery all the time. In your system, then, how would you handle roots (again, somewhat assuming Unix semantics)? If a process is executing in a chroot'd environment and your VFS handles no relative paths at all, wouldn't every path that process passes to the VFS have to include the path to its root? And if the "current directory" and "root" locations are stored in userspace, they could be overwritten, and the process could break out.

Another performance issue would be resolving really long paths. If I access 'foo/file' from my current directory, and the kernel handles relative paths, I only need to traverse the current directory for foo and foo for file (getting inodes along the way). If it doesn't, I prepend my current directory and give the VFS an absolute path, forcing it to resolve the entire thing.

Of course, if you're writing totally native software it might not matter (chroot might not even be a thing!), I certainly agree with that. And the versatility and simplicity is definitely good.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: designing my VFS: where to resolve relative paths and ..

Post by Brendan »

Hi,
piranha wrote: Interesting. I agree, it does depend on if you consider such a path valid or not. Interestingly, gcc and ld do that kind of path screwery all the time. In your system, then, how would you handle roots? (again, somewhat assuming unix semantics) if you chroot and your VFS handles absolutely no relative path, wouldn't that include a path to the root if a process is executing in a chroot'd environment? If it handles storing the "current directory" and "root" locations in userspace, that could be overwritten, and the process could break out.
I don't support 'chroot' (for my OS design).

If processes use (what they think are) canonical absolute paths, there's no real reason the VFS couldn't support 'chroot' with simple concatenation (e.g. if the root is "/foo" and the process wants "/bar", the result is "/foo/bar"). Simple concatenation isn't possible otherwise (e.g. if the root is "/foo" and the process wants "/..", you'd end up with "/foo/.." and break out of the root), so more complex/slower checking is required.
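A sketch of the concatenation approach, assuming the VFS stores each process's root as a canonical path and never follows symbolic links (as proposed earlier in the thread). `chroot_join` is a hypothetical helper; it refuses "." and ".." components instead of trying to interpret them, since those are exactly what would let "/foo" + "/.." climb out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if any component of `path` is "." or "..". */
static bool has_dot_component(const char *path)
{
    for (const char *p = path; *p; ) {
        while (*p == '/') p++;
        const char *comp = p;
        while (*p && *p != '/') p++;
        size_t clen = (size_t)(p - comp);
        if ((clen == 1 && comp[0] == '.') ||
            (clen == 2 && comp[0] == '.' && comp[1] == '.'))
            return true;
    }
    return false;
}

/* Hypothetical VFS helper: map a process's path into its root by plain
 * concatenation. Returns false for non-absolute or dotted paths, or if
 * the result does not fit in `out`. */
bool chroot_join(const char *root, const char *path, char *out, size_t outsz)
{
    assert(root[0] == '/');                  /* roots are kept canonical */
    if (path[0] != '/' || has_dot_component(path))
        return false;
    int n;
    if (strcmp(root, "/") == 0)
        n = snprintf(out, outsz, "%s", path);        /* no chroot in effect */
    else if (strcmp(path, "/") == 0)
        n = snprintf(out, outsz, "%s", root);        /* "/" is the root itself */
    else
        n = snprintf(out, outsz, "%s%s", root, path);
    return n >= 0 && (size_t)n < outsz;
}
```

With root "/foo", "/bar" maps to "/foo/bar" and "/" maps to "/foo", while "/.." is rejected rather than reaching "/".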
piranha wrote: Another performance issue would be resolving really long paths. If I access 'foo/file' from my current directory, and the kernel handles relative paths, I only need to traverse the current directory for foo and foo for file (getting inodes along the way). If it doesn't, I prepend my current directory and give the VFS an absolute path, forcing it to resolve the entire thing.
There are 2 separate issues here.

First; if the kernel handles relative paths and you access 'foo/file' from your current directory, then you ask VFS for "foo/file" and the VFS prepends your working directory to it to get "/workDir/foo/file" and then has to check for and handle/remove any "//" and "./" and "../" (even if there are none it still must check); and after all that the VFS has an absolute path it can work with.

If the kernel only handles absolute paths and you access 'foo/file' from your current directory, then you ask the C library for "foo/file" and the C library prepends your working directory to it to get "/workDir/foo/file" and then has to check for and handle/remove any "//" and "./" and "../" (even if there are none it still must check), then the C library asks the VFS for "/workDir/foo/file" and the VFS has an absolute path it can work with. This is almost entirely identical, except that relative is converted to absolute in user-space and not by VFS.

Of course a native application (that doesn't bother having a current directory in the first place) would just ask for the file "/workDir/foo/file" and avoid the unnecessary overhead of converting relative to absolute completely. This is where the majority of the performance advantage is.

The second issue is whether or not directories along the path need to be checked. If the kernel handles absolute paths, then (e.g.) "/this/is/a/really/long/path/hello.txt" can be treated as a path string and a file name string. The path string can be converted to a hash and the kernel can find the directory with a single hash table lookup. When the directory info for the file is in the VFS cache, there's no need to find the "/this" directory, then find the "/this/is" directory, then find "/this/is/a" directory, then... (and no need to pound the living daylights out of the CPU's cache).
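A sketch of the single-lookup idea, assuming a small chained hash table keyed by the canonical directory path and FNV-1a as the hash (any string hash would do; FNV-1a is just a convenient choice). `struct dir_entry`, `cache_insert` and `lookup_parent` are hypothetical names, and `dir_id` stands in for whatever the VFS actually caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64                 /* tiny table, for the sketch only */

struct dir_entry {                  /* hypothetical cached directory info */
    const char *path;               /* canonical directory path */
    int dir_id;                     /* stand-in for the cached data */
    struct dir_entry *next;         /* chaining within a bucket */
};

static struct dir_entry *buckets[NBUCKETS];

/* 64-bit FNV-1a over the first `len` bytes of `s`. */
static uint64_t fnv1a(const char *s, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static void cache_insert(struct dir_entry *e)
{
    size_t b = fnv1a(e->path, strlen(e->path)) % NBUCKETS;
    e->next = buckets[b];
    buckets[b] = e;
}

/* Split a canonical path at its last '/', then find the parent directory
 * with one hash lookup. `*name` receives the final component.
 * Returns NULL on a cache miss. */
static struct dir_entry *lookup_parent(const char *path, const char **name)
{
    const char *slash = strrchr(path, '/');
    assert(slash != NULL);                          /* canonical paths start with '/' */
    size_t dlen = (slash == path) ? 1 : (size_t)(slash - path);  /* "/x" -> "/" */
    *name = slash + 1;
    size_t b = fnv1a(path, dlen) % NBUCKETS;
    for (struct dir_entry *e = buckets[b]; e; e = e->next)
        if (strlen(e->path) == dlen && memcmp(e->path, path, dlen) == 0)
            return e;
    return NULL;
}
```

The directory part is hashed straight out of the caller's string, so no intermediate directories ("/this", "/this/is", ...) are visited on a hit.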

If the directory info for the file isn't in the VFS cache, then it's going to involve disk IO (asking the file system for it, waiting for the file system to fetch the directory) and that is going to be the bottleneck; but even in this case the VFS can work backwards (e.g. check if "/this/is/a/really/long/" is in the cache, and if it's not, try "/this/is/a/really/", and so on) until it finds something that is in the cache and knows which file system to ask for the "currently needed next" directory info. Of course things closer to "/" are more likely to be in the cache, so it's likely that (e.g.) the VFS finds "/this/is" and doesn't have to find "/this".
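A sketch of working backwards, assuming a hypothetical `cache_has` lookup (here a tiny fixed list standing in for the VFS's real cache) and that the root is always cached. `deepest_cached` returns the length of the longest directory prefix that is already known, which tells the VFS where to start asking a file system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Made-up cache contents standing in for the VFS directory cache;
 * the root "/" is assumed to be cached at all times. */
static const char *cached[] = { "/this", "/this/is" };

static bool cache_has(const char *dir, size_t len)
{
    for (size_t i = 0; i < sizeof cached / sizeof cached[0]; i++)
        if (strlen(cached[i]) == len && memcmp(cached[i], dir, len) == 0)
            return true;
    return false;
}

/* Start from the directory part of canonical `path` and strip one trailing
 * component at a time until a cached ancestor is found. Returns the
 * length of that prefix; 1 means only the root "/" was found. */
size_t deepest_cached(const char *path)
{
    const char *slash = strrchr(path, '/');
    assert(path[0] == '/' && slash != NULL);
    size_t len = (size_t)(slash - path);      /* "/a/b/file" -> "/a/b" */
    while (len > 0) {
        if (cache_has(path, len))
            return len;
        do {
            len--;                            /* back up to the previous '/' */
        } while (len > 0 && path[len] != '/');
    }
    return 1;                                 /* fell back to the root */
}
```

For "/this/is/a/really/long/path/hello.txt" this tries four prefixes and stops at "/this/is" without ever checking "/this", matching the order described above.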

However, for "non-canonical" paths each of those pieces may be a symbolic link or something, so it's extremely hard to do this efficiently and it's probably better to find the "/this" directory (and check if it's a symbolic link or not), then find the "/this/is" directory (and check if it's a symbolic link or not), then....

Also note that the main reason I'm suggesting "VFS only sees canonical absolute paths" is that it reduces complexity in the VFS; and I really do think you're grossly underestimating just how insane the complexity really is otherwise. For example, imagine that "/foo/mylink" is a symbolic link to "/bar", and the first process changes its current/working directory to "/foo/mylink/stuff" and starts reading the directory "things" (which is actually "/foo/mylink/stuff/things", which is actually "/bar/stuff/things"). Then a second process unlinks/"deletes" the symbolic link. After that, the first process sees the file "hello.txt" in the directory it's still reading and tries to open the file "things/hello.txt" (which is actually "/foo/mylink/stuff/things/hello.txt", which is actually "/bar/stuff/things/hello.txt"). The programmer who wrote the first process expects the 'open()' to succeed (they've ensured the file exists, and the file actually does still exist). Does it work?


Cheers,

Brendan
piranha
Member
Posts: 1391
Joined: Thu Dec 21, 2006 7:42 pm
Location: Unknown. Momentum is pretty certain, however.

Re: designing my VFS: where to resolve relative paths and ..

Post by piranha »

Brendan wrote: Also note that the main reason I'm suggesting "VFS only sees canonical absolute paths" is that it reduces complexity in the VFS; and I really do think you're grossly underestimating just how insane the complexity really is otherwise. For an example; imagine that "foo/mylink" is a symbolic link to "/bar"; and the first process changes its current/working directory to "/foo/mylink/stuff" and starts reading the directory "things" (which is actually "/foo/mylink/stuff/things", which is actually "/bar/stuff/things"). Then a second process unlinks/"deletes" the symbolic link. After that the first process sees the file "hello.txt" in the directory it's still reading and tries to open the file "things/hello.txt" (which is actually "/foo/mylink/stuff/things/hello.txt", which is actually "/bar/stuff/things/hello.txt"). The programmer who wrote the first process expects the 'open()' to succeed (they've ensured the file exists and the file actually does still exist). Does it work?
There's a reason I didn't talk about symbolic links; I know how complicated they make things. ;)
Brendan wrote: First; if the kernel handles relative paths and you access 'foo/file' from your current directory, then you ask VFS for "foo/file" and the VFS prepends your working directory to it to get "/workDir/foo/file" and then has to check for and handle/remove any "//" and "./" and "../" (even if there are none it still must check); and after all that the VFS has an absolute path it can work with.
Another implementation could simply store a directory entry as the current working directory. Then the two would not be identical: you would get a head start over the userspace version because you get to start resolving the path halfway through, and you don't need to re-resolve the working directory each time.
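A toy sketch of that alternative, assuming the VFS keeps an in-memory tree of directory entries with parent pointers (`struct node`, `add_child` and `resolve` are hypothetical). Resolution starts from the node stored as the working directory, so only the components of the relative path are looked up; intermediate entries are not checked for being directories, and symbolic links are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy in-memory directory entry. */
struct node {
    const char *name;
    struct node *parent;     /* the root's parent is the root itself */
    struct node *child;      /* first entry in this directory */
    struct node *sibling;    /* next entry in the same directory */
};

static void add_child(struct node *dir, struct node *n)
{
    n->parent = dir;
    n->sibling = dir->child;
    dir->child = n;
}

/* Resolve `rel` one component at a time, starting from the node `cwd`
 * (absolute paths start from `root`). Returns NULL if a name is missing. */
struct node *resolve(struct node *root, struct node *cwd, const char *rel)
{
    assert(root != NULL && cwd != NULL);
    struct node *cur = (rel[0] == '/') ? root : cwd;
    for (const char *p = rel; *p; ) {
        while (*p == '/') p++;
        const char *comp = p;
        while (*p && *p != '/') p++;
        size_t clen = (size_t)(p - comp);
        if (clen == 0 || (clen == 1 && comp[0] == '.'))
            continue;
        if (clen == 2 && comp[0] == '.' && comp[1] == '.') {
            cur = cur->parent;               /* the entry's actual parent */
            continue;
        }
        struct node *c = cur->child;
        while (c && !(strlen(c->name) == clen && memcmp(c->name, comp, clen) == 0))
            c = c->sibling;
        if (c == NULL)
            return NULL;                     /* missing name, even if ".." follows */
        cur = c;
    }
    return cur;
}
```

Because ".." follows the parent pointer, it always means the entry's actual parent regardless of which symbolic link was used to reach it, and "./doesnotexist/.." fails because "doesnotexist" is looked up before the ".." is applied.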
Brendan wrote: The second issue is whether or not directories along the path need to be checked . . .
Yes, I agree. Again, this depends on the overall design (and of course, they're all valid!). I think it works well in the basic cases (looking up, creating, deleting, all trivial). But it does add complexity to cases like symbolic links changing (how would your design handle symbolic links?), mount points (if supported), chrooting...

I agree that there is a huge amount of additional complexity caused by the VFS handling relative paths as well, but I also believe that handling only absolute paths introduces different complexities if you want to support things like mount points and chroot - I don't think it's a magic bullet. I'm not grossly underestimating the complexity; I do agree that in a lot of cases it is much simpler to do it your way. I'm just saying that there are things / problems to consider for both designs.