In my current implementation, the kernel accepts only absolute, canonical paths (i.e., “.” and “..” folded out, no extraneous separators, etc.), and handling of relative paths is done entirely in the libc. This also means that the kernel does not have any concept of a current working directory for a process; that is also tracked by libc. Instead, the kernel only tracks the root node (directory) for each process. This approach offers a clean separation of concerns, and seems to work well, since relative paths are easily converted to absolute paths in libc by combining with the current working directory, which is stored in a global variable. Any occurrence of “..” in a path is handled entirely lexically. There is also a system call (fdpath) that provides access to the absolute path that was used to obtain a given file descriptor. This is used for implementing functions like fstatat(), fchdir(), etc.
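To make the libc side concrete, here is a minimal sketch of the purely lexical conversion described above. The name `canon_join` and its signature are hypothetical and not taken from my actual libc; the CWD is passed in as a parameter rather than read from a global, only to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lexically combine `cwd` (assumed absolute and
 * canonical) with `path` and write an absolute, canonical path to `out`.
 * Returns 0 on success, -1 if `out` is too small. No symlinks are
 * consulted; ".." simply drops the previous component. */
int canon_join(const char *cwd, const char *path, char *out, size_t outsz)
{
    size_t len = 0;

    if (outsz < 2)
        return -1;

    /* Relative paths start from the CWD, absolute ones from the root. */
    if (path[0] != '/') {
        len = strlen(cwd);
        if (len + 1 > outsz)
            return -1;
        memcpy(out, cwd, len);
        if (len == 1)
            len = 0;            /* cwd is "/": treat it as an empty prefix */
    }
    out[len] = '\0';

    const char *p = path;
    while (*p) {
        while (*p == '/')
            p++;                /* skip extraneous separators */
        const char *start = p;
        while (*p && *p != '/')
            p++;
        size_t n = (size_t)(p - start);
        if (n == 0)
            break;              /* only trailing separators were left */
        if (n == 1 && start[0] == '.')
            continue;           /* "." changes nothing */
        if (n == 2 && start[0] == '.' && start[1] == '.') {
            /* Drop the last component; ".." at the root stays at the root. */
            while (len > 0 && out[len - 1] != '/')
                len--;
            if (len > 0)
                len--;          /* also drop the separator before it */
            out[len] = '\0';
            continue;
        }
        if (len + 1 + n + 1 > outsz)
            return -1;
        out[len++] = '/';
        memcpy(out + len, start, n);
        len += n;
        out[len] = '\0';
    }

    if (len == 0) {             /* everything folded away: result is "/" */
        out[0] = '/';
        out[1] = '\0';
    }
    return 0;
}
```

With a CWD of `/a/b`, the relative path `../c/./d` comes out as `/a/c/d`, and that string is what would be handed to the kernel.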
My understanding is that things work differently in most unix implementations. The kernel usually tracks the current working directory (in addition to the root) for each process, not as a path, but as a vnode. And relative paths are resolved starting at that node. System calls like fstatat() are handled exactly in this way, using the node associated with the passed-in descriptor as the starting point. This means that handling “..” requires the kernel to track each directory’s parent node, which in turn means that there must be a unique parent node for each directory. Presumably, this is one of the reasons why multiple hard links to directories are problematic for most unix implementations. This also means that looking up the current working directory path from the kernel is an involved process, which requires searching each parent directory for the relevant node. The kernel path handling can also lead to confusing results in the presence of symbolic links. This is why most shells implement their own path handling layer to provide more interactive user-friendly behavior.
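For comparison, this is roughly what the conventional model looks like from user space: a lookup that starts at the node behind a directory descriptor instead of at a path string. It is a small sketch using standard POSIX calls; the function name `dirfd_lookup_matches` and the `/tmp` scratch directory are assumptions made for the demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if fstatat() relative to a directory
 * descriptor and stat() on the absolute path reach the same inode,
 * 0 if they differ, -1 on error. */
int dirfd_lookup_matches(void)
{
    char dir[] = "/tmp/dirfd-demo-XXXXXX";
    char file[128];
    struct stat via_fd, via_path;
    int result = -1;

    if (!mkdtemp(dir))
        return -1;
    snprintf(file, sizeof file, "%s/entry", dir);

    int fd = open(file, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    close(fd);

    /* The descriptor names a directory node, not a path string. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd >= 0) {
        /* Resolution of "entry" starts at the node behind dfd. */
        if (fstatat(dfd, "entry", &via_fd, 0) == 0 &&
            stat(file, &via_path) == 0)
            result = via_fd.st_dev == via_path.st_dev &&
                     via_fd.st_ino == via_path.st_ino;
        close(dfd);
    }

    unlink(file);
    rmdir(dir);
    return result;
}
```

In my design the same call would instead go through fdpath to recover the path behind the descriptor, followed by a lexical join in libc.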
On balance, so far I prefer my approach, since it seems to be simpler and more flexible (though perhaps slower), but I am interested in hearing people’s thoughts. Quite possibly, there are issues I have not considered or encountered yet.
By the way, recently, I came across a paper by Rob Pike about handling of “..” in Plan 9, which describes a lot of the relevant issues (highly recommended reading). Interestingly, their approach has some similarities to mine (they have an fd2path system call), but they still handle relative and non-canonical paths in the kernel.
Any thoughts/comments on the above are welcome. If you have tried either of these approaches, or any others, I am interested in hearing about any lessons learned.
Pathname resolution, “..”, fstatat(), etc.
Re: Pathname resolution, “..”, fstatat(), etc.
yr wrote: In my current implementation, the kernel accepts only absolute, canonical paths (i.e., “.” and “..” folded out, no extraneous separators, etc.), and handling of relative paths is done entirely in the libc.
This concept fails the moment symlinks containing relative path names enter the mix. Unless you have libc run a userspace version of realpath(3) on every single path.
yr wrote: The kernel usually tracks the current working directory (in addition to the root) for each process, not as a path, but as a vnode.
That is correct. Note that this also allows the CWD to be inherited without user space cooperation, which is quite the important behavior for many shell utilities. "rm -rf ." means something very different whether it is executed in some deep subdirectory or in the root.
yr wrote: This means that handling “..” requires the kernel to track each directory’s parent node, which in turn means that there must be a unique parent node for each directory. Presumably, this is one of the reasons why multiple hard links to directories are problematic for most unix implementations.
Well, one reason. Symlinks are enough of a headache as it is, since they turn the file tree into a directed graph. Hard links on directories would only add more edges to that graph, and ones that are not specially marked.
yr wrote: This is why most shells implement their own path handling layer to provide more interactive user-friendly behavior.
Oh boy, yes. The problem is that after following a symlink, when you go back with "cd .." you will be in a different directory, and most people will be terribly confused by that. So the shell acts as if the symlink was a directory, which again might confuse people, so now there are also options to "cd" you can use to tell the shell which behavior to use.
yr wrote: On the balance, so far I prefer my approach, since it seems to be simpler and more flexible (though perhaps slower), but am interested in hearing people’s thoughts. Quite possibly, there are issues I have not considered or encountered yet.
Speed you already mentioned. If I am deep in a directory structure, and I want to open .., then your kernel will look up all the directories leading to my current one except the lowest one, whereas most other UNIX implementations will just look up the parent of the CWD, which is a single lookup.
Flexible? Not sure what you mean there. The structures we are talking about are quite rigid, and nobody wants to have flexibility in how path names are interpreted. BTW, if /symlink is a symlink to /a/b, and somebody opens /symlink/.., do they get /a or /? Because most would expect /a, and indeed most Unices deliver /a.
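The /symlink/.. question can be checked with a small experiment. This is a sketch under assumptions: it builds a scratch tree under `/tmp` (a stand-in for the root), uses a relative symlink target, and the function name `dotdot_through_symlink` is made up for the demo. If the claim above holds on your system, it should return 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: builds  <root>/a/b  and  <root>/symlink -> a/b,
 * then resolves "<root>/symlink/..".
 * Returns 1 if that lands in <root>/a (".." applied after following the
 * link), 0 if it lands in <root> (".." applied lexically), -1 on error. */
int dotdot_through_symlink(void)
{
    char root[] = "/tmp/dotdot-demo-XXXXXX";
    char a[128], b[128], link[128], probe[128];
    struct stat st_a, st_root, st_probe;
    int result = -1;

    if (!mkdtemp(root))
        return -1;
    snprintf(a, sizeof a, "%s/a", root);
    snprintf(b, sizeof b, "%s/a/b", root);
    snprintf(link, sizeof link, "%s/symlink", root);
    snprintf(probe, sizeof probe, "%s/symlink/..", root);

    /* The link target is relative, so it is interpreted relative to the
     * directory that contains the link. */
    if (mkdir(a, 0700) == 0 && mkdir(b, 0700) == 0 &&
        symlink("a/b", link) == 0 &&
        stat(a, &st_a) == 0 && stat(root, &st_root) == 0 &&
        stat(probe, &st_probe) == 0) {
        if (st_probe.st_dev == st_a.st_dev && st_probe.st_ino == st_a.st_ino)
            result = 1;
        else if (st_probe.st_dev == st_root.st_dev &&
                 st_probe.st_ino == st_root.st_ino)
            result = 0;
    }

    unlink(link);
    rmdir(b);
    rmdir(a);
    rmdir(root);
    return result;
}
```

A purely lexical fold of the same string would produce the scratch root instead, which is the 0 case.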
Another issue is long path names. Linux (and probably most other Unices, but I haven't read their source codes) places a hard limit of PATH_MAX (4096) on the length of a path given to a system call, but files can have a longer absolute name. You just can't refer to them with the absolute name in one go, but you can chdir() to the midway point or something. But granted, this is highly esoteric. Most people get bored of typing a path name after 100 characters or so.
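As an illustration of the "midway point" trick without touching the CWD, a long absolute path can be walked one component at a time with openat(), so that no single system call ever receives more than one name. This is only a sketch; `open_dir_by_components` is a hypothetical helper, and it assumes every component is a directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open the directory named by an absolute path of
 * arbitrary length by passing one component per openat() call.
 * Returns a directory descriptor, or -1 on error. */
int open_dir_by_components(const char *abs_path)
{
    char name[NAME_MAX + 1];
    int fd = open("/", O_RDONLY | O_DIRECTORY);
    const char *p = abs_path;

    while (fd >= 0 && *p) {
        while (*p == '/')
            p++;                          /* skip separators */
        size_t n = strcspn(p, "/");       /* length of the next component */
        if (n == 0)
            break;                        /* nothing left but separators */
        if (n > NAME_MAX) {
            close(fd);
            return -1;
        }
        memcpy(name, p, n);
        name[n] = '\0';

        /* Each step is a lookup relative to the directory reached so far. */
        int next = openat(fd, name, O_RDONLY | O_DIRECTORY);
        close(fd);
        fd = next;
        p += n;
    }
    return fd;
}
```

The resulting descriptor can then be used with the *at() calls to reach files whose absolute name would not fit in one PATH_MAX-sized argument.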
Symlinks with relative paths I already mentioned. Their semantics are such that it is not acceptable to transform the relative path into an absolute one, since they must retain their relative target even if the directory is moved or mounted elsewhere.
yr wrote: Any thoughts/comments on the above are welcome. If you have tried either of these approaches, or any others, am interested in hearing about any lessons learned.
I am going to handle relative paths in kernel; there's just no way around it. An approach I am still undecided on is to turn the CWD into a normal FD. And then just have four default FDs that are inherited from process to process. The reason I am hesitant is that I fear that shell scripts using FDs directly might overwrite FD 3, and then the working directory is gone. Maybe having a vnode that cannot be closed is a good thing. But on the other hand, that approach and all the *at() system calls would immediately rid me of all the special handling for the CWD in path name lookup. But I will need special handling for the root, because that is not supposed to be easily changed. So if I have special handling anyway, what is one more case?
Good thing I won't be at that point for a while longer.
Carpe diem!
Re: Pathname resolution, “..”, fstatat(), etc.
nullplan wrote: This concept fails the moment symlinks containing relative path names enter the mix. Unless you have libc run a userspace version of realpath(3) on every single path.
I've actually been thinking about this very issue for the last couple of days. As you say, it necessitates running the equivalent of realpath on every path. Since I've just started adding support for symbolic links, that's what I'll do for now. But the overhead might turn out to be unacceptable, and that would indeed mean switching to the more conventional approach. Correctness should not be an issue though.
nullplan wrote: That is correct. Note that this also allows the CWD to be inherited without user space cooperation, which is quite the important behavior for many shell utilities. "rm -rf ." means something very different whether it is executed in some deep subdirectory or in the root.
Another fair point. I had not thought about the inheritance aspect, though as you mention, this can be handled via cooperation between kernel and user space.
nullplan wrote: Flexible? Not sure what you mean there. The structures we are talking about are quite rigid, and nobody wants to have flexibility in how path names are interpreted. BTW, if /symlink is a symlink to /a/b, and somebody opens /symlink/.., do they get /a or /? Because most would expect /a, and indeed most Unices deliver /a.
It's more flexible in that it can handle multiple parents for a directory vnode without any issues. This could allow for more potential to customize the namespace for a child process, but I suspect symlinks would mess that up as well, since they store paths.
nullplan wrote: Another issue is long path names. Linux (and probably most other Unices, but I haven't read their source codes) places a hard limit of PATH_MAX (4096) on the length of a path given to a system call, but files can have a longer absolute name. You just can't refer to them with the absolute name in one go, but you can chdir() to the midway point or something. But granted, this is highly esoteric. Most people get bored of typing a path name after 100 characters or so.
That's an interesting one. As you said, esoteric, but worth being aware of. It seems like a natural consequence of handling relative paths in the kernel.
nullplan wrote: I am going to handle relative paths in kernel; there's just no way around it.
I'm starting to come around to that perspective as well, for some of the reasons you mentioned, and especially because of the complexity symlinks inject. I'll continue with my current approach for now, just to see how far it can go, but will quite possibly change it in the future.
nullplan wrote: An approach I am still undecided on is to turn the CWD into a normal FD. And then just have four default FDs that are inherited from process to process. The reason I am hesitant is that I fear that shell scripts using FDs directly might overwrite FD 3, and then the working directory is gone. Maybe having a vnode that cannot be closed is a good thing. But on the other hand, that approach and all the *at() system calls would immediately rid me of all the special handling for the CWD in path name lookup. But I will need special handling for the root, because that is not supposed to be easily changed. So if I have special handling anyway, what is one more case?
Do shell scripts often assume FD numbers? That seems to be asking for trouble. As far as I know, apart from stdin, stdout, and stderr, there is no guarantee in POSIX around any particular FD number.
Re: Pathname resolution, “..”, fstatat(), etc.
yr wrote: Do shell scripts often assume FD numbers?
They have little choice. Shell syntax allows you to open a path on a specific FD, but I am not aware of anything that allows you to open a path on a variable FD. For example, you can run
Code:
exec 3>logfile
Code:
command >&3 2>&3
Code:
exec 3>&-
Then again, very few shell scripts need to open a file and keep it open. So maybe it will be alright in the long run.
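To spell out the failure mode I am worried about: a redirection onto a fixed descriptor number boils down to an open followed by dup2(), and dup2() silently replaces whatever was in that slot. The sketch below is a hypothetical demo (the name `clobber_cwd_fd` is made up), and it uses whatever number open() hands back as the stand-in "CWD descriptor" rather than literally FD 3, so that it does not disturb the calling process.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the slot that held a directory
 * descriptor refers to a regular file after a dup2() onto it,
 * 0 if it still refers to something else, -1 on error. */
int clobber_cwd_fd(void)
{
    char path[] = "/tmp/fd-demo-XXXXXX";
    struct stat st;
    int result = -1;

    /* Stand-in for an inherited "CWD as a normal FD" descriptor. */
    int cwd_fd = open(".", O_RDONLY | O_DIRECTORY);
    /* Stand-in for the log file a script opens. */
    int log_fd = mkstemp(path);

    if (cwd_fd >= 0 && log_fd >= 0) {
        /* Redirecting onto a fixed number: the old occupant of the slot
         * is closed without any warning. */
        if (dup2(log_fd, cwd_fd) == cwd_fd && fstat(cwd_fd, &st) == 0)
            result = S_ISREG(st.st_mode) ? 1 : 0;
    }

    if (log_fd >= 0) {
        close(log_fd);
        unlink(path);
    }
    if (cwd_fd >= 0)
        close(cwd_fd);
    return result;
}
```

After the dup2(), any *at() call that still treats that number as the working directory would be operating on the log file instead.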
yr wrote: As far as I know, apart from stdin, stdout, and stderr, there is no guarantee in POSIX around any particular FD number.
And yet, here we are. POSIX also does not prescribe signal numbers, yet the kill command has a numeric argument with prescribed meanings. It also does not prescribe mode constants, yet the chmod command has an octal argument with prescribed meanings. Many things have grown historically and just fit better if you make certain arrangements and don't rock the boat. Then again, rocking the boat is what writing your own OS is about, at least partly.
Re: Pathname resolution, “..”, fstatat(), etc.
CWD typically does not change (as in, does not refer to a different inode) even if a component of the path is moved. That kind of behavior cannot be emulated by manipulating paths, you'd have to store something like a dirfd (as nullplan suggested).
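A quick way to see this on a system with a node-based CWD is the sketch below. The function name `cwd_survives_rename` and the `/tmp` scratch directory are assumptions made for the demo; it temporarily changes the CWD of the calling process and restores it before returning.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: chdir() into a directory, rename that directory,
 * and check what "." refers to afterwards.
 * Returns 1 if "." is still the same inode while the old path string no
 * longer resolves, 0 otherwise, -1 on error. */
int cwd_survives_rename(void)
{
    char base[] = "/tmp/cwd-demo-XXXXXX";
    char oldp[128], newp[128];
    struct stat before, after, gone;
    int result = -1;

    int saved = open(".", O_RDONLY | O_DIRECTORY); /* to restore the CWD */
    if (saved < 0)
        return -1;
    if (!mkdtemp(base)) {
        close(saved);
        return -1;
    }
    snprintf(oldp, sizeof oldp, "%s/old", base);
    snprintf(newp, sizeof newp, "%s/new", base);

    if (mkdir(oldp, 0700) == 0 && chdir(oldp) == 0 &&
        stat(".", &before) == 0 &&
        rename(oldp, newp) == 0 &&
        stat(".", &after) == 0)
        result = before.st_dev == after.st_dev &&
                 before.st_ino == after.st_ino &&
                 stat(oldp, &gone) != 0;  /* a stored path would be stale */

    if (fchdir(saved) != 0)
        result = -1;
    close(saved);
    rmdir(newp);   /* whichever name the directory ended up with */
    rmdir(oldp);
    rmdir(base);
    return result;
}
```

A CWD kept only as a string in libc would still hold the old name after the rename, so the next relative lookup would fail or, worse, land in a different directory that has since taken that name.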
Another aspect: on Linux, the CWD of a process can be discovered using /proc/<pid>/cwd. Handling CWD entirely in user space would make this impossible.