Octocontrabass wrote:Couldn't it be the application's job instead?
Could be, but is it worth it? It seems very inconvenient. The last time I saw something like that was with OS/390 and VMS, both of which _optionally_ supported fixed-length records in files (the record length was set per file though, so it could differ from the underlying sector size). Also, let's admit those were designed more than 40 years ago. Amiga, DOS, BeOS, MacOS (mach), Win, Linux, SCO UNIX and all the BSDs allow any byte position and any buffer size, so it seems to me like a must-have feature these days.
If you limit applications to reading/writing aligned 4kB blocks, you can optimize for the extremely common case where the filesystem uses blocks in some multiple of 4kB and cut out a lot of unnecessary work in the VFS.
True, but 1) see above on limiting the app, and 2) unfortunately you can't guarantee that every file system is 4k aligned. Imagine for example that the cluster size is not a multiple of 4096 on a FAT partition, or that the root directory starts on a sector number which is not a multiple of 8. I think the best we can do is to use a "shortcut" solution when the fs consists of 4k blocks and the buffer is also aligned, and slower but general code when not. That way we can have the performance gain you mentioned without limiting the OS to certain specially formatted file systems.
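A minimal sketch of how that "shortcut or general path" decision could look. The struct fs_info layout and the function name are made up for illustration, they are not taken from any real kernel, and the check assumes 512-byte-sector style byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical per-mount description, filled in when the fs is mounted. */
struct fs_info {
    uint32_t block_size;  /* fs block (cluster) size in bytes */
    uint64_t data_start;  /* byte offset of the first data block on the device */
};

/*
 * Returns 1 if a request may take the aligned 4k shortcut, 0 if it has to
 * go through the slower general code. Every condition must hold: the fs
 * layout, the file offset, the length and the buffer address all have to
 * be multiples of 4096.
 */
int can_use_fast_path(const struct fs_info *fs, uint64_t offset,
                      const void *buf, size_t len)
{
    assert(fs != NULL);
    return fs->block_size % PAGE_SIZE == 0 &&
           fs->data_start % PAGE_SIZE == 0 &&
           offset % PAGE_SIZE == 0 &&
           len % PAGE_SIZE == 0 &&
           (uintptr_t)buf % PAGE_SIZE == 0;
}
```

The fs-related half of the test only depends on the mount, so it could be evaluated once at mount time and cached, leaving three cheap modulo checks per request.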
Actually, it's pretty common to do this in language standard libraries, so most applications would see no difference at all.
Yes, that's a possibility. You can do that in the standard library if you send all the fs and storage characteristics to the process. I haven't examined this solution, because my standard library does not store file offsets and knows nothing about filesystems, let alone storage devices (therefore it does not know the sector size).
Has anybody implemented this? Would you mind sharing your experience with us, pros and cons of this solution? Sounds like an interesting idea, but I have questions about locking (see below).
Korona wrote:The FS (and not the VFS) can easily handle caching and maintain the file offset.
Okay, but where exactly would that FS be in a micro-kernel? I think in the same process as the VFS. Other than that, you are right: read and write do not have to strictly go through the VFS, they only have to access the same memory as open/close (like the file offset, which could be in the process' address space too). But I think positioning/reading past the end of the file could be problematic if you bypass the FS/VFS for reads/writes (though not impossible to solve).
I'm not sure how to implement locking if the file offset is handled by the standard library in the process' memory though (for example when one process is writing the first 64k of the file, and several others are reading the same file, and only the readers with file offset + read size < 64k should be blocked, I mean the F_SETLK fcntl command).
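To make the locking question concrete, here is a rough sketch of the conflict test that whoever owns the lock table would have to run. It assumes POSIX-like record lock semantics, where read locks are shared, write locks are exclusive and any overlap of the byte ranges counts. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum lock_type { LOCK_READ, LOCK_WRITE };

/* Hypothetical byte-range lock record; len is assumed to be > 0 here. */
struct range_lock {
    uint64_t start;        /* first byte covered */
    uint64_t len;          /* number of bytes covered */
    enum lock_type type;
    int owner_pid;
};

/* Half-open ranges [start, start+len) overlap if each begins before the other ends. */
int ranges_overlap(uint64_t a_start, uint64_t a_len,
                   uint64_t b_start, uint64_t b_len)
{
    return a_start < b_start + b_len && b_start < a_start + a_len;
}

/* Returns 1 if the requested lock must wait for (or fail against) the held one. */
int lock_conflicts(const struct range_lock *held, const struct range_lock *req)
{
    assert(held->len > 0 && req->len > 0);
    if (held->owner_pid == req->owner_pid)
        return 0;          /* a process never conflicts with itself */
    if (held->type == LOCK_READ && req->type == LOCK_READ)
        return 0;          /* readers share */
    return ranges_overlap(held->start, held->len, req->start, req->len);
}
```

The point is that this check needs the ranges of all processes in one place, which is easy inside the FS process and awkward if offsets and ranges only live in each process' own memory.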
This is how I've implemented this: I have an FS process, which includes the VFS functions and the disk cache too. It has a VFS node for every open file and directory (vnode, containing device reference, partition position etc.), and a table for every opened file (openfiles, with a file offset field and referencing the file's vnode and the opener's pid). On the standard library side I only have a simple integer table, which basically contains global openfiles indices, nothing more. When a process calls read() with fd 3 for example, the standard library translates fd 3 with that table (to let's say 123) and uses that for the syscall. That way my VFS can use the openfiles index 123 received from the syscall, and it does not know that to the process that's fd 3. Hope this makes sense.
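Roughly, in code. This is a simplified sketch and not the actual source: the field names, the table sizes and the fs_read() stand-in for the syscall are illustrative assumptions, and the real transfer of data is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FDS       16
#define MAX_OPENFILES 256
#define FD_UNUSED     (-1)

/* FS process side: one entry per open() call, shared by nobody else. */
struct openfile {
    int      in_use;
    int      vnode_idx;   /* which vnode (device, partition position etc.) */
    int      pid;         /* the opener */
    uint64_t offset;      /* current file offset lives here, not in libc */
};
struct openfile openfiles[MAX_OPENFILES];

/* Standard library side: plain integers, local fd -> global openfiles index. */
int fd_table[MAX_FDS];

void fd_table_init(void)
{
    for (int i = 0; i < MAX_FDS; i++)
        fd_table[i] = FD_UNUSED;
}

/* Record the global index returned by open() under a local fd. */
void fd_table_bind(int fd, int of_idx)
{
    assert(fd >= 0 && fd < MAX_FDS);
    fd_table[fd] = of_idx;
}

/* Stand-in for the read syscall: the FS only ever sees the global index. */
long fs_read(int of_idx, int caller_pid, uint64_t len)
{
    if (of_idx < 0 || of_idx >= MAX_OPENFILES)
        return -1;
    struct openfile *of = &openfiles[of_idx];
    if (!of->in_use || of->pid != caller_pid)
        return -1;        /* not open, or opened by another process */
    of->offset += len;    /* pretend len bytes were transferred */
    return (long)len;
}

/* What read() does in the library: translate, then issue the syscall. */
long lib_read(int fd, int pid, uint64_t len)
{
    if (fd < 0 || fd >= MAX_FDS || fd_table[fd] == FD_UNUSED)
        return -1;
    return fs_read(fd_table[fd], pid, len);
}
```

Checking the pid on the FS side matters in this layout, because the index comes from the process' own memory and could be anything.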
Cheers,
bzt