Pathname resolution, “..”, fstatat(), etc.
Posted: Tue Oct 19, 2021 3:57 pm
In my current implementation, the kernel accepts only absolute, canonical paths (i.e., “.” and “..” folded out, no extraneous separators, etc.), and relative paths are handled entirely in libc. This also means that the kernel has no concept of a current working directory for a process; that is tracked by libc as well. The kernel tracks only the root node (directory) for each process. This approach offers a clean separation of concerns and seems to work well: libc converts a relative path to an absolute one by combining it with the current working directory, which is stored in a global variable. Any occurrence of “..” in a path is handled purely lexically. There is also a system call (fdpath) that returns the absolute path that was used to obtain a given file descriptor. It is used to implement functions like fstatat(), fchdir(), etc.
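To make the lexical handling concrete, here is a minimal sketch in C of how a libc could join a relative path with the tracked working directory and fold out “.”, “..” and repeated separators. The name canon_path and its signature are hypothetical and not taken from the implementation described in this post; symbolic links are deliberately ignored, since the folding is purely textual.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append the components of src to out (current length *len), folding
 * "." and ".." lexically. out holds "/a/b" style text with no trailing
 * slash; *len == 0 stands for the root. Returns -1 if out is too small. */
static int fold_into(const char *src, char *out, size_t *len, size_t outsz)
{
    while (*src != '\0') {
        size_t n;

        while (*src == '/')                 /* skip extra separators */
            src++;
        n = strcspn(src, "/");              /* length of this component */
        if (n == 0)
            break;
        if (n == 1 && src[0] == '.') {
            /* "." names the directory itself: nothing to do */
        } else if (n == 2 && src[0] == '.' && src[1] == '.') {
            /* ".." drops the last component; at the root it is a no-op */
            while (*len > 0) {
                (*len)--;
                if (out[*len] == '/')
                    break;
            }
        } else {
            if (*len + 1 + n + 1 > outsz)   /* '/' + name + NUL */
                return -1;
            out[(*len)++] = '/';
            memcpy(out + *len, src, n);
            *len += n;
        }
        src += n;
    }
    return 0;
}

/* Hypothetical libc helper: turn path (absolute or relative to cwd)
 * into an absolute, canonical path. cwd must itself be absolute. */
int canon_path(const char *cwd, const char *path, char *out, size_t outsz)
{
    size_t len = 0;

    assert(cwd[0] == '/');
    if (outsz < 2)
        return -1;
    if (path[0] != '/' && fold_into(cwd, out, &len, outsz) != 0)
        return -1;
    if (fold_into(path, out, &len, outsz) != 0)
        return -1;
    if (len == 0)                           /* everything folded away */
        out[len++] = '/';
    out[len] = '\0';
    return 0;
}
```

With a working directory of /home/u, the path ../x/./y//z would come out as /home/x/y/z, and the result is what would be handed to the kernel. An fstatat()-style wrapper could do the same thing with the path returned by fdpath in place of the working directory.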
My understanding is that things work differently in most Unix implementations. The kernel usually tracks the current working directory (in addition to the root) for each process, not as a path but as a vnode, and relative paths are resolved starting at that node. System calls like fstatat() are handled in exactly this way, using the node associated with the passed-in descriptor as the starting point. This means that handling “..” requires the kernel to track each directory’s parent node, which in turn means that each directory must have a unique parent node. Presumably, this is one of the reasons why multiple hard links to directories are problematic for most Unix implementations. It also means that looking up the path of the current working directory from the kernel is an involved process, which requires searching each parent directory for the relevant node. The kernel’s path handling can also lead to confusing results in the presence of symbolic links, since “..” follows the physical parent rather than the path the user typed. This is why most shells implement their own path handling layer to provide more user-friendly interactive behavior.
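As an illustration of why recovering the working directory path is involved when only a node is tracked, here is a sketch of the traditional user-space walk on a POSIX system: repeatedly open “..”, search it for the entry whose device and inode numbers match the directory just left, and prepend that name. The function names walk_cwd and name_in_parent are hypothetical, the buffer sizes are arbitrary, and this is not a description of how any particular kernel does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Find the entry in directory parent_fd whose (device, inode) pair
 * equals *target and copy its name out. Returns 0 on success. */
static int name_in_parent(int parent_fd, const struct stat *target,
                          char *name, size_t namesz)
{
    int rc = -1;
    int dfd = dup(parent_fd);           /* closedir() closes this copy */
    DIR *d = (dfd >= 0) ? fdopendir(dfd) : NULL;
    struct dirent *e;

    if (d == NULL) {
        if (dfd >= 0)
            close(dfd);
        return -1;
    }
    while ((e = readdir(d)) != NULL) {
        struct stat st;

        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (fstatat(parent_fd, e->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        if (st.st_dev == target->st_dev && st.st_ino == target->st_ino) {
            if (strlen(e->d_name) < namesz) {
                strcpy(name, e->d_name);
                rc = 0;
            }
            break;
        }
    }
    closedir(d);
    return rc;
}

/* Hypothetical helper: rebuild the working directory path by walking
 * ".." up to the root. Returns 0 on success, -1 on any failure. */
int walk_cwd(char *out, size_t outsz)
{
    char tmp[4096], name[256];          /* arbitrary sizes for the sketch */
    size_t pos = sizeof tmp - 1;        /* path is built right to left */
    struct stat cur, par;
    int fd, up = -1;

    assert(out != NULL && outsz > 0);
    tmp[pos] = '\0';
    fd = open(".", O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &cur) != 0)
        goto fail;

    for (;;) {
        size_t n;

        up = openat(fd, "..", O_RDONLY | O_DIRECTORY);
        if (up < 0 || fstat(up, &par) != 0)
            goto fail;
        if (par.st_dev == cur.st_dev && par.st_ino == cur.st_ino)
            break;                      /* ".." of the root is the root */
        if (name_in_parent(up, &cur, name, sizeof name) != 0)
            goto fail;
        n = strlen(name);
        if (pos < n + 1)
            goto fail;
        pos -= n;
        memcpy(tmp + pos, name, n);
        tmp[--pos] = '/';
        close(fd);
        fd = up;
        cur = par;
    }
    close(up);
    close(fd);
    if (tmp[pos] == '\0')               /* already at the root */
        tmp[--pos] = '/';
    if (strlen(tmp + pos) >= outsz)
        return -1;
    strcpy(out, tmp + pos);
    return 0;
fail:
    if (up >= 0)
        close(up);
    close(fd);
    return -1;
}
```

The walk needs read permission on every ancestor directory and one directory scan per level, and it always produces the physical path, so a directory entered through a symbolic link is reported under its real location. A design that keeps the path as a string avoids all of this at the cost of trusting the lexical form.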
On balance, I prefer my approach so far, since it seems to be simpler and more flexible (though perhaps slower), but I am interested in hearing people’s thoughts. Quite possibly, there are issues I have not considered or encountered yet.
By the way, I recently came across a paper by Rob Pike about the handling of “..” in Plan 9, which describes many of the relevant issues (highly recommended reading). Interestingly, their approach has some similarities to mine (they have an fd2path system call), but they still handle relative and non-canonical paths in the kernel.
Any thoughts/comments on the above are welcome. If you have tried either of these approaches, or any others, I am interested in hearing about any lessons learned.