
interesting I/O architecture

Posted: Tue Jun 09, 2009 6:57 pm
by NickJohnson
I've hit a somewhat gray area in my OS design relating to I/O. I have a sort of hybrid architecture, but for effectively all I/O aspects, it's a microkernel. My design, as you will see, is quite unorthodox, but I think I have filled in the major holes - if you think there is something totally wrong, please tell me, but do not assume I haven't weighed my decisions :wink: .

Here's my original plan. Drivers are privileged processes that may be communicated with via a simple (optionally) synchronous message passing system. User processes *are* allowed to send messages directly to drivers, but the driver knows the sender's PID, so it can filter requests, and certain important message types are also restricted.

The VFS server exists only as a sort of directory system: a user process may request the driver PID and file ID of, let's say, "/home/nick/important_homework.txt" from the VFS, and then request a file handle from the given driver using that information. This means that the drivers are responsible for keeping track of files opened by a specific process, as well as handling permissions to those files. Because user processes are actually responsible for their own self-destruction (i.e. even the kernel may not directly kill a process), they are able to make sure all file handles are closed before they call exit(). Even if the handles are not freed, the driver can see if the holding process is still alive when another tries to open that descriptor.

Btw, this "killing architecture" is actually secure and reasonable because the kernel may give control of the user process to a piece of read-only, trusted, but unprivileged code mapped in all address spaces (called the "libsys") at any time, so as long as the libsys is written correctly, everything will work smoothly. When fork()ing, the child process may reopen all file handles before fork() itself returns. I'm just planning to have a table of file descriptors set up by either the C library or the libsys that is used for read() and write() calls which are just wrappers for messages.
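
To give an idea of what I mean, here's roughly what a read() wrapper would look like (the fd table layout, message format, and msg_send_recv() call are all placeholders, not my final API):

Code:
/* Sketch only - the fd table layout, message format and msg_send_recv()
 * syscall are placeholders, not the final API. */
#include <stddef.h>
#include <stdint.h>

struct fd_entry {
    uint32_t driver_pid;   /* which driver owns this file */
    uint32_t file_id;      /* driver-local file identifier */
    uint64_t offset;       /* current position, kept by the process */
};

/* per-process table, set up by the C library / libsys */
static struct fd_entry fd_table[256];

#define MSG_READ 0x01      /* placeholder message type */

struct read_req {
    uint32_t file_id;
    uint64_t offset;
    uint32_t count;
};

/* hypothetical synchronous send: returns the number of bytes the driver
 * copied into 'reply', or a negative error code */
extern long msg_send_recv(uint32_t pid, uint32_t type,
                          const void *req, size_t req_len,
                          void *reply, size_t reply_len);

long my_read(int fd, void *buf, size_t count)
{
    struct fd_entry *e = &fd_table[fd];
    struct read_req req = { e->file_id, e->offset, (uint32_t)count };

    /* the driver sees our PID on the other end, so it can check that we
     * really hold this file_id before honouring the request */
    long got = msg_send_recv(e->driver_pid, MSG_READ,
                             &req, sizeof(req), buf, count);
    if (got > 0)
        e->offset += (uint64_t)got;
    return got;
}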

I know that everyone always seems to say "don't let users send messages to drivers!" Is there some reasoning behind this that means it still is a bad design even if the drivers can do easy filtering? My messages are preemptible as well, and somewhat DoS-proof by design, so sending too many messages, even asynchronously, won't be a problem.

Is there a major problem in giving the job of handling file handles to the drivers themselves? I thought it would be much more flexible: you could make files act however you want if you're a driver writer, so you could even make things that are not files seem like them (a la Plan 9.) The drivers have *plenty* of address space to do this stuff, but the kernel does not, which is one of the many reasons I'm pushing so many things into userspace.

Are there any gaping holes or naive misinterpretations of the hardware in my design?

P.S. Please, if you can help it, don't try and argue that the "libsys"/voluntary exit() concept won't work - I've made many special design decisions that really do make things secure, I'm sure it works, and without intimate knowledge of my design, nobody else will understand why it works.

Re: interesting I/O architecture

Posted: Tue Jun 09, 2009 10:25 pm
by Brendan
Hi,

Let me see if I understand this right...

There's CPL=3 code (libsys) that has write access to a list of open file handles, etc (that is used to do clean-up when the process crashes), and for some reason CPL=3 code (the process itself) can't accidentally trash this list of open file handles, etc?

If libsys can send messages directly to a device driver, then what prevents a process from sending messages directly to device drivers and bypassing libsys?

If one process is a service that handles something that libsys doesn't know about, and a process that uses this service crashes, then who tells the service that the process crashed? For example, imagine if a developer writes their own font engine (the service), and then writes 3 different games that send requests to this font engine, and one of these games asks the font engine to load some fonts and then crashes without telling the font engine that it's finished. In this case, does the font engine end up with 20 MiB of font data that's never freed?

To improve performance, most OS's use the VFS to cache file data. For your OS, will every file system need to maintain its own cache? If the kernel is running out of RAM, will it need to send a message to every file system to ask it to free up some RAM (reduce the amount of data in its cache); and how will each file system know if the data it's caching is more important (or less important) than the data that other file systems are caching (to make sure that the most important data is still cached and only the least important data is freed)?

Most OS's also have something called "notifications"; where a process can ask the VFS to tell it if a certain file is changed. For example, imagine if you're using a text editor to edit "foo.txt", and something changes the text editor's configuration file, and then something changes the file "foo.txt". With notifications, the VFS tells the text editor that its configuration file changed and the text editor can reload its configuration (so the user doesn't need to close the program and restart it again), and the VFS tells the text editor that the file "foo.txt" changed and the text editor can warn the user (e.g. a dialog box saying "foo.txt was changed, do you want to reload the new version, or..."). For your OS, would every file system need to handle "notifications" itself?

For some OSs, files that haven't been used for a long time can be automatically compressed, and automatically decompressed if the file actually is needed. In this case would every file system need to include compression and decompression code?

How do you plan to support mount points? For example, if an application wants to read the file "/home/bcos/mycd/foo/bar.txt", then would it need to ask the VFS for the correct file system and the mount point for that file system, then truncate the name to "foo/bar.txt", and then ask the correct file system for the file "foo/bar.txt"? What if "/home/bcos/mycd" is a symbolic link to "/mnt/cdrom" - how does the VFS know that there's a symbolic link in the path? Would the VFS need to ask the file system mounted at "/" if "/home" is a symbolic link, then ask if "/home/bcos" is a symbolic link, then ask if "/home/bcos/mycd" is a symbolic link, then ask if "/home/bcos/mycd/foo" is a symbolic link, before it can figure out what the correct file system for "/home/bcos/mycd/foo/bar.txt" actually is?


Cheers,

Brendan

Re: interesting I/O architecture

Posted: Wed Jun 10, 2009 8:09 am
by NickJohnson
Brendan wrote:Hi,

Let me see if I understand this right...

There's CPL=3 code (libsys) that has write access to a list of open file handles, etc (that is used to do clean-up when the process crashes), and for some reason CPL=3 code (the process itself) can't accidentally trash this list of open file handles, etc?

If libsys can send messages directly to a device driver, then what prevents a process from sending messages directly to device drivers and bypassing libsys?
Yes, both of those things are true. The user process can trash its own file handle table, but the important information relating to the file handles is stored within the respective drivers, safe from the user. And the user process can also send messages directly to the drivers, but the driver is notified of the caller's PID, so it can quickly filter requests, and certain messages are restricted to the kernel and drivers. The reason I said my message passing system is mostly DoS-proof is that messages (which are really more like *nix signals) can preempt each other, and sending a message ends a process' timeslice, so only so many messages can be sent at once.
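
To make the filtering concrete, the driver side would be something along these lines (the message layout and helper functions are just placeholders):

Code:
/* Sketch of the driver-side filter - message layout and helpers are placeholders. */
#include <stdbool.h>
#include <stdint.h>

struct message {
    uint32_t sender_pid;   /* filled in by the kernel, not by the sender */
    uint32_t type;
    uint32_t handle;       /* e.g. the file handle the request refers to */
    /* ... payload ... */
};

#define MSG_READ        0x01
#define MSG_WRITE       0x02
#define MSG_PRIVILEGED  0x80   /* types above this are kernel/driver only */

/* hypothetical helpers maintained by the driver itself */
extern bool pid_is_privileged(uint32_t pid);
extern bool pid_owns_handle(uint32_t pid, uint32_t handle);

bool accept_request(const struct message *m)
{
    /* restricted message types never get through from normal processes */
    if (m->type >= MSG_PRIVILEGED && !pid_is_privileged(m->sender_pid))
        return false;

    /* ordinary requests are checked against per-PID state: does this
     * process actually hold the handle it is talking about? */
    if ((m->type == MSG_READ || m->type == MSG_WRITE) &&
        !pid_owns_handle(m->sender_pid, m->handle))
        return false;

    return true;
}
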
Brendan wrote:If one process is a service that handles something that libsys doesn't know about, and a process that uses this service crashes, then who tells the service that the process crashed? For example, imagine if a developer writes their own font engine (the service), and then writes 3 different games that send requests to this font engine, and one of these games asks the font engine to load some fonts and then crashes without telling the font engine that it's finished. In this case, does the font engine end up with 20 MiB of font data that's never freed?
I think this is a real problem though. I guess the best solution would be to have a table in the process structure (in the kernel) that keeps track of which processes to notify when a client exits. Drivers could register themselves to be notified, but not regular processes.
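
Roughly, the kernel-side table could look like this (all of the names are provisional):

Code:
/* Sketch of the kernel-side exit-notification table - all names are provisional. */
#include <stdint.h>

#define MAX_WATCHERS 8
#define MSG_CLIENT_EXITED 0x20   /* placeholder message type */

struct process {
    uint32_t pid;
    uint32_t watchers[MAX_WATCHERS];  /* drivers to notify on exit */
    uint32_t nwatchers;
    /* ... the rest of the process structure ... */
};

extern void msg_send(uint32_t to_pid, uint32_t type, uint32_t arg);
extern int  pid_is_driver(uint32_t pid);

/* syscall: a driver asks to be told when 'client' goes away */
int watch_process(struct process *client, uint32_t caller_pid)
{
    if (!pid_is_driver(caller_pid))       /* regular processes can't register */
        return -1;
    if (client->nwatchers >= MAX_WATCHERS)
        return -1;
    client->watchers[client->nwatchers++] = caller_pid;
    return 0;
}

/* called when 'client' exits (voluntarily or not) */
void notify_watchers(const struct process *client)
{
    for (uint32_t i = 0; i < client->nwatchers; i++)
        msg_send(client->watchers[i], MSG_CLIENT_EXITED, client->pid);
}
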
Brendan wrote:To improve performance, most OS's use the VFS to cache file data. For your OS, will every file system need to maintain its own cache? If the kernel is running out of RAM, will it need to send a message to every file system to ask it to free up some RAM (reduce the amount of data in its cache); and how will each file system know if the data it's caching is more important (or less important) than the data that other file systems are caching (to make sure that the most important data is still cached and only the least important data is freed)?

Most OS's also have something called "notifications"; where a process can ask the VFS to tell it if a certain file is changed. For example, imagine if you're using a text editor to edit "foo.txt", and something changes the text editor's configuration file, and then something changes the file "foo.txt". With notifications, the VFS tells the text editor that its configuration file changed and the text editor can reload its configuration (so the user doesn't need to close the program and restart it again), and the VFS tells the text editor that the file "foo.txt" changed and the text editor can warn the user (e.g. a dialog box saying "foo.txt was changed, do you want to reload the new version, or..."). For your OS, would every file system need to handle "notifications" itself?

For some OSs, files that haven't been used for a long time can be automatically compressed, and automatically decompressed if the file actually is needed. In this case would every file system need to include compression and decompression code?

How do you plan to support mount points? For example, if an application wants to read the file "/home/bcos/mycd/foo/bar.txt", then would it need to ask the VFS for the correct file system and the mount point for that file system, then truncate the name to "foo/bar.txt", and then ask the correct file system for the file "foo/bar.txt"? What if "/home/bcos/mycd" is a symbolic link to "/mnt/cdrom" - how does the VFS know that there's a symbolic link in the path? Would the VFS need to ask the file system mounted at "/" if "/home" is a symbolic link, then ask if "/home/bcos" is a symbolic link, then ask if "/home/bcos/mycd" is a symbolic link, then ask if "/home/bcos/mycd/foo" is a symbolic link, before it can figure out what the correct file system for "/home/bcos/mycd/foo/bar.txt" actually is?
My intention is to allow drivers to handle a lot of policy on this individually. I could make a system call that reads the current memory usage, so drivers could regulate their cache sizes. Because of my system's purely event driven architecture, it would be easy to implement notifications within each driver as well. The VFS would have to synchronize with the state of the real filesystems, so it keeps track of mountpoints and symlinks internally (isn't that what the VFS does anyway?). I think the more interesting part is that you could have two or more VFSs set up with different mountpoints. E.g. VFS 1 could be *nix style (/mnt/device), but VFS 2 could be MS-DOS/Windows style (device:\). Giving this much responsibility to the drivers shouldn't be too much of a burden: because the functionality is all similar, I could provide a shared library that has all of the core file handle/cache logic, but can still be overridden if the driver needs a unique setup.

Re: interesting I/O architecture

Posted: Wed Jun 10, 2009 11:24 pm
by skyking
I use a similar architecture and don't think there's any problem with this approach. For the case where a process terminates abnormally, any microkernel has to take care of this situation; having a VFS layer between the process and the driver won't help if the VFS layer can't efficiently detect that a process has terminated. I think there's an advantage in sending messages to the driver directly, since it avoids one or two context switches.

The cache problem would be solved somewhat by putting the caching in the block device driver. Different block devices have different caching needs, and filesystem-specific caching is by nature specific to each filesystem; handling this centrally may be no better. The duplication of code may be reduced by using dynamically linked libraries for caching functionality. For memory handling, one could use special system calls to allow memory used for caching purposes to be reclaimed if another process needs to allocate the memory.

Mounting can be handled by forwarding the request to another FS driver. The request for "/home/bcos/mycd/foo/bar.txt" to the root FS is forwarded as a request for "/foo/bar.txt" to the mounted FS. If there's a symlink, either the driver for the FS where the symlink is located will forward the request for the link's target instead (this requires a TTL counter or similar in the request to solve the problem of circular/long chains of symlinks), or it will return a symlink object that can be followed by the caller. The difference from having a VFS layer in between, again, is that it's more like a recursive RPC instead of an iterative sequence of RPCs to the underlying FS drivers.
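
As a rough sketch of what the forwarding could look like (the structures and names here are made up):

Code:
/* Rough sketch of forwarding with a TTL - structures and names are made up. */
#include <stdint.h>
#include <string.h>

#define FS_PATH_MAX 256
#define FS_ELOOP    40           /* too many symlink/mount hops */

struct fs_request {
    char     path[FS_PATH_MAX];  /* path relative to the receiving FS's root */
    uint32_t reply_to;           /* PID of the original caller */
    uint8_t  ttl;                /* decremented on every forward */
};

extern void fs_send_request(uint32_t fs_pid, const struct fs_request *req);
extern void fs_reply_error(uint32_t caller_pid, int err);

/* called by an FS driver when a lookup hits a mount point or a symlink:
 * re-root the remaining path and hand the request to the next FS driver */
void forward_request(const struct fs_request *req, uint32_t target_fs,
                     const char *remaining_path)
{
    if (req->ttl == 0) {                     /* circular/too-long chain */
        fs_reply_error(req->reply_to, FS_ELOOP);
        return;
    }

    struct fs_request fwd = *req;            /* keeps the original caller */
    fwd.ttl = req->ttl - 1;
    strncpy(fwd.path, remaining_path, FS_PATH_MAX - 1);
    fwd.path[FS_PATH_MAX - 1] = '\0';
    fs_send_request(target_fs, &fwd);        /* final FS replies to the caller */
}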

Notifications need to be handled by the FS driver anyway, unless the VFS layer keeps track of the relevant FS calls and actually can do this. How do you solve this if a file can change by means other than the FS calls that the VFS can interpret? On an NFS mount, a file can change without the VFS layer being able to anticipate it (since the change was not due to a write request from this computer). Again, the duplicated code needed to perform this in each driver can be shared via libraries.

Re: interesting I/O architecture

Posted: Thu Jun 11, 2009 12:03 am
by Brendan
Hi,
NickJohnson wrote:The user process can trash its own file handle table, but the important information relating to the file handles is stored within the respective drivers, safe from the user.
In this case, why does the user process need to keep its own file handle table, etc? For example, why can't a process just "exit()" without closing any file handles (or doing any other clean-up) if the device drivers need to be capable of doing the clean-up anyway?
NickJohnson wrote:And the user process can also send messages directly to the drivers, but the driver is notified of the caller's PID, so it can quickly filter requests, and certain messages are restricted to the kernel and drivers. The reason I said my message passing system is mostly DoS-proof is that messages (which are really more like *nix signals) can preempt each other, and sending a message ends a process' timeslice, so only so many messages can be sent at once.
If messages can preempt each other, then you might end up with a re-entrancy nightmare. For example, a device driver receives a message and starts modifying a critical data structure, but is then pre-empted by a second message. To handle the second message the driver needs to use this critical data structure, but the critical data structure is in an inconsistent state because the driver hasn't finished modifying it. In this case, if you use a re-entrancy lock, then you'll just get a dead-lock because after the driver is preempted it won't be able to acquire the lock a second time. There's no easy way to solve this problem.

One way to solve the problem would be to allow a device driver to prevent pre-emption while the driver is modifying the critical data structure; but then the kernel would need to wait (for an unknown length of time) before it can send a message to the driver.

Another way would be to use a special type of re-entrancy lock, where if the lock is already acquired you switch to the pre-empted code until it releases the lock, and then switch back. In this case you'd need to be careful of misuse (e.g. drivers that decide to switch to other tasks for other reasons).
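
As a sketch of that first option (the syscall names are made up, just to show the idea):

Code:
/* Sketch of the first option - the syscall names are made up, not a real API. */
extern void sys_defer_messages(void);   /* queue incoming messages for now */
extern void sys_resume_messages(void);  /* deliver whatever was queued */

static struct { int pending; /* ... the critical data ... */ } request_table;

void update_request_table(int value)
{
    sys_defer_messages();       /* nothing can preempt us in here... */
    request_table.pending = value;
    sys_resume_messages();      /* ...but the kernel (and everyone else) had
                                   to wait an unknown length of time */
}
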
NickJohnson wrote:
Brendan wrote:If one process is a service that handles something that libsys doesn't know about, and a process that uses this service crashes, then who tells the service that the process crashed? For example, imagine if a developer writes their own font engine (the service), and then writes 3 different games that send requests to this font engine, and one of these games asks the font engine to load some fonts and then crashes without telling the font engine that it's finished. In this case, does the font engine end up with 20 MiB of font data that's never freed?
I think this is a real problem though. I guess the best solution would be to have a table in the process structure (in the kernel) that keeps track of which processes to notify when a client exits. Drivers could register themselves to be notified, but not regular processes.
That's how I do it - I call this "obituaries" (if any task is terminated for any reason, any other task that has requested it is told about it).

If a device driver crashes, then how do regular processes know that any pending requests won't be honored?

For my OS design, if a device driver (or file system) crashes then the VFS gets the obituary, and because the VFS manages all file I/O it knows which requests are still pending and can return errors for any affected requests. For example, an application asks to read 1024 bytes from a file handle, the VFS checks its cache and if the data isn't in the cache the VFS asks the file system driver to fetch the data, and if the file system driver crashes the VFS can return an error to the application. However, I also plan to use redundant file systems, where the VFS might be able to successfully complete the application's request by asking a different (mirrored) file system to fetch the data from the file.

Of course in my case, if the VFS crashes the OS is entirely screwed (but that's likely to be the case for all OSs, regardless of what you do).

I guess that in your case, you could allow regular processes (libsys) to register for obituaries and then have obituary handling code in libsys...
NickJohnson wrote:
Brendan wrote:To improve performance, most OS's use the VFS to cache file data. For your OS, will every file system need to maintain its own cache? If the kernel is running out of RAM, will it need to send a message to every file system to ask it to free up some RAM (reduce the amount of data in its cache); and how will each file system know if the data it's caching is more important (or less important) than the data that other file systems are caching (to make sure that the most important data is still cached and only the least important data is freed)?
My intention is to allow drivers to handle a lot of policy on this individually. I could make a system call that reads the current memory usage, so drivers could regulate their cache sizes.
If one file system has 5 MiB of cached data that hasn't been used for ages and another file system has 5 MiB of cached data that is used very often, which 5 MiB of data will be freed?
NickJohnson wrote:
Brendan wrote:Most OS's also have something called "notifications"; where a process can ask the VFS to tell it if a certain file is changed.
Because of my system's purely event driven architecture, it would be easy to implement notifications within each driver as well. The VFS would have to synchronize with the state of the real filesystems, so it keeps track of mountpoints and symlinks internally (isn't that what the VFS does anyway?). I think the more interesting part is that you could have two or more VFSs set up with different mountpoints. E.g. VFS 1 could be *nix style (/mnt/device), but VFS 2 could be MS-DOS/Windows style (device:\).
Normally the VFS keeps track of everything, so that (for e.g.) if a process wants to read the file "/home/bcos/foo/bar/hello/goodbye.txt" the VFS knows exactly where this file is (including mountpoints and symbolic links) and knows if the process has permission to access the file, and if the data is already cached it can send the data to the process immediately (without the file system itself being involved).

If there's 2 different VFSs being used, and I create a symbolic link from "/home/bcos/mycd" to "/mnt/cdrom", then would the file system tell one VFS that there's a new symbolic link from "/home/bcos/mycd" to "/mnt/cdrom" and tell the other VFS that there's a new symbolic link from "c:\home\bcos\mycd" to "D:\"? Does this mean that all file systems need to know about all mount points for all VFSs?

I'd also point out that IMHO choice is good (e.g. allowing an arbitrary number of different VFSs), but standardization is a lot better (e.g. having one standard VFS that all users, administrators, technicians and programmers can become familiar with). If you've ever wondered why there's so little commercial software for Linux, just think about being a help desk operator for "XYZ accounting package" when a customer rings up because they've got problems installing it... ;)


Cheers,

Brendan

Re: interesting I/O architecture

Posted: Thu Jun 11, 2009 12:48 am
by NickJohnson
Brendan wrote:
NickJohnson wrote:The user process can trash its own file handle table, but the important information relating to the file handles is stored within the respective drivers, safe from the user.
In this case, why does the user process need to keep its own file handle table, etc? For example, why can't a process just "exit()" without closing any file handles (or doing any other clean-up) if the device drivers need to be capable of doing the clean-up anyway?
Well, the user process still needs to keep track somehow of which driver to ask for file data and which files it has opened, even if there's nothing special about the table.
Brendan wrote:
NickJohnson wrote:And the user process can also send messages directly to the drivers, but the driver is notified of the caller's PID, so it can quickly filter requests, and certain messages are restricted to the kernel and drivers. The reason I said my message passing system is mostly DoS-proof is that messages (which are really more like *nix signals) can preempt each other, and sending a message ends a process' timeslice, so only so many messages can be sent at once.
If messages can preempt each other, then you might end up with a re-entrancy nightmare. For example, a device driver receives a message and starts modifying a critical data structure, but is then pre-empted by a second message. To handle the second message the driver needs to use this critical data structure, but the critical data structure is in an inconsistent state because the driver hasn't finished modifying it. In this case, if you use a re-entrancy lock, then you'll just get a dead-lock because after the driver is preempted it won't be able to acquire the lock a second time. There's no easy way to solve this problem.

One way to solve the problem would be to allow a device driver to prevent pre-emption while the driver is modifying the critical data structure; but then the kernel would need to wait (for an unknown length of time) before it can send a message to the driver.

Another way would be to use a special type of re-entrancy lock, where if the lock is already acquired you switch to the pre-empted code until it releases the lock, and then switch back. In this case you'd need to be careful of misuse (e.g. drivers that decide to switch to other tasks for other reasons).
I was thinking of having an "overflow" queue of requests, which is only used when a request has been preempted (which doesn't happen often - handling is quite fast), and then read from in a more polling-like fashion when the driver gets its real timeslice. The queue would need deadlock protection, but I think I may have discovered a linked list variation that is actually deadlock proof, which I could use.

Here's a quick explanation; I may not explain it well the first time. First, I'll tell you how I got the idea. At my school, we have a bunch of carts that are filled with laptops - each laptop is numbered, and has a numbered slot. Of course, many of the students don't put the laptops back in the right order, or bother to sort things out before putting their laptop in a random open slot. So what you get is most of the laptops in their correct slots, but many out of order, and every slot filled.

Here's the key: if you choose any slot with the wrong laptop in it, and then look at the slot that laptop is supposed to be in, that slot will also have a mismatched laptop. Pretty obvious once you think about it. But now the logical leap: if you interpret the number on the laptop as a pointer, and the slot as an address, the out-of-order laptops will *always* form at least one ring-shaped linked list. If you are careful to screw things up properly, they will form one ring. To add a node to the list/loop, simply swap a misplaced laptop with a correctly placed one: the correctly placed laptop becomes part of the list.

The cool part about this is that you can always find which nodes are in the list and which aren't, in constant time - if the laptop and slot don't match, it's part of the list. That means even if all hell breaks loose, you can always get back to somewhere in the list. Inserting at any point will not break the loop, even if the next node changes while you try and insert. Therefore, *no locking is ever needed*. 8)
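
In code, the structure itself comes out to something like this (names made up; this is just the data structure, the overflow queue would sit on top of it):

Code:
/* Purely illustrative sketch of the "mismatched slot" ring - names made up.
 * slot[i] == i  means slot i is not in the list (laptop in its own slot).
 * slot[i] != i  means slot i is in the list, and slot[i] is the next member. */
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 64
static uint32_t slot[NSLOTS];

void ring_init(void)
{
    for (uint32_t i = 0; i < NSLOTS; i++)
        slot[i] = i;            /* every laptop in its own slot: empty list */
}

bool ring_contains(uint32_t i)
{
    return slot[i] != i;        /* constant-time membership test */
}

uint32_t ring_next(uint32_t i)
{
    return slot[i];             /* follow the cycle */
}

/* Add free slot j to the list by swapping its contents with slot k, where k
 * is already a member (or another free slot, which starts a new 2-cycle). */
void ring_insert(uint32_t k, uint32_t j)
{
    uint32_t tmp = slot[k];
    slot[k] = slot[j];          /* k now points at j (slot[j] was j) */
    slot[j] = tmp;              /* j takes over k's old successor */
}
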
Brendan wrote:
NickJohnson wrote:
Brendan wrote:To improve performance, most OS's use the VFS to cache file data. For your OS, will every file system need to maintain its own cache? If the kernel is running out of RAM, will it need to send a message to every file system to ask it to free up some RAM (reduce the amount of data in its cache); and how will each file system know if the data it's caching is more important (or less important) than the data that other file systems are caching (to make sure that the most important data is still cached and only the least important data is freed)?
My intention is to allow drivers to handle a lot of policy on this individually. I could make a system call that reads the current memory usage, so drivers could regulate their cache sizes.
If one file system has 5 MiB of cached data that hasn't been used for ages and another file system has 5 MiB of cached data that is used very often, which 5 MiB of data will be freed?
It would be easy to have the VFS or some other central server keep a list of which drivers are most important. When a driver wants to know if it should free cache, it just consults the amount of free RAM and its placement on that list. I'm assuming the drivers will play nicely - my reason for making a (sort of) microkernel is not reliability and security but instead modularity and flexibility.
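
In a driver, the policy could be as simple as this (the syscalls and numbers are placeholders):

Code:
/* Sketch of a driver trimming its cache - syscalls and numbers are placeholders. */
#include <stddef.h>
#include <stdint.h>

extern size_t   sys_free_ram(void);            /* hypothetical: bytes of free RAM */
extern uint32_t vfs_cache_rank(void);          /* hypothetical: our place on the
                                                  central list, 0 = most important */
extern void     cache_evict_lru(size_t bytes); /* driver's own eviction code */

#define LOW_RAM_BYTES (16u * 1024 * 1024)
#define TRIM_CHUNK    ( 1u * 1024 * 1024)

/* called periodically, or whenever the driver grows its cache */
void maybe_trim_cache(void)
{
    if (sys_free_ram() >= LOW_RAM_BYTES)
        return;                                /* plenty of RAM, keep everything */

    /* the less important the central list says we are, the more we give back */
    cache_evict_lru((size_t)TRIM_CHUNK * (vfs_cache_rank() + 1));
}
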
Brendan wrote:
NickJohnson wrote:
Brendan wrote:Most OS's also have something called "notifications"; where a process can ask the VFS to tell it if a certain file is changed.
Because of my system's purely event driven architecture, it would be easy to implement notifications within each driver as well. The VFS would have to synchronize with the state of the real filesystems, so it keeps track of mountpoints and symlinks internally (isn't that what the VFS does anyway?). I think the more interesting part is that you could have two or more VFSs set up with different mountpoints. E.g. VFS 1 could be *nix style (/mnt/device), but VFS 2 could be MS-DOS/Windows style (device:\).
Normally the VFS keeps track of everything, so that (for e.g.) if a process wants to read the file "/home/bcos/foo/bar/hello/goodbye.txt" the VFS knows exactly where this file is (including mountpoints and symbolic links) and knows if the process has permission to access the file, and if the data is already cached it can send the data to the process immediately (without the file system itself being involved).

If there's 2 different VFSs being used, and I create a symbolic link from "/home/bcos/mycd" to "/mnt/cdrom", then would the file system tell one VFS that there's a new symbolic link from "/home/bcos/mycd" to "/mnt/cdrom" and tell the other VFS that there's a new symbolic link from "c:\home\bcos\mycd" to "D:\"? Does this mean that all file systems need to know about all mount points for all VFSs?

I'd also point out that IMHO choice is good (e.g. allowing an arbitrary number of different VFSs), but standardization is a lot better (e.g. having one standard VFS that all users, administrators, technicians and programmers can become familiar with). If you've ever wondered why there's so little commercial software for Linux, just think about being a help desk operator for "XYZ accounting package" when a customer rings up because they've got problems installing it... ;)
The reason it should work is that all security and general management of files beyond simple directory structure is done by the drivers themselves. However, all directory structure is done exclusively by the VFS: drivers only update the VFS structure when the directory structure on disk changes. Mountpoints are local to a specific VFS, but symbolic links, being stored on disk, are global. Just because I support multiple concurrent VFS implementations doesn't mean I'll use that feature much - a standard VFS will be provided with the base system. The concurrency might be useful for making more flexible jails, but the use of varying implementations is really more for future-proofing. I would rather build in flexibility now than rewrite everything later when I realize that I needed some new resource to finish things.

Re: interesting I/O architecture

Posted: Thu Jun 11, 2009 12:49 am
by Brendan
Hi,
skyking wrote:The cache problem would be solved somewhat by putting the caching in the block device driver. Different block devices have different caching needs, and filesystem-specific caching is by nature specific to each filesystem; handling this centrally may be no better.
At the highest level there's directories and files, and at the lowest level there's devices, bytes and sectors. In the middle there's a complex mess that's used to figure out which directories and files correspond to which devices, bytes and sectors. Caching at the highest level avoids the complex mess in the middle. Caching at the lowest level would seriously increase the overhead of something like "ls -R *" (assuming "warm cache" conditions), and anything else that involves directory lookups (e.g. "open()").
skyking wrote:Mounting can be handled by forwarding the request to another FS driver. The request for "/home/bcos/mycd/foo/bar.txt" to the root FS is forwarded as a request for "/foo/bar.txt" to the mounted FS. If there's a symlink, either the driver for the FS where the symlink is located will forward the request for the link's target instead (this requires a TTL counter or similar in the request to solve the problem of circular/long chains of symlinks), or it will return a symlink object that can be followed by the caller. The difference from having a VFS layer in between, again, is that it's more like a recursive RPC instead of an iterative sequence of RPCs to the underlying FS drivers.
Many messages going everywhere (including task switches); rather than one piece of code that creates a hash and looks up the information in a table (where parts of the table are more likely to still be in the CPU's cache).
skyking wrote:Notifications need to be handled by the FS driver anyway, unless the VFS layer keeps track of the relevant FS calls and actually can do this. How do you solve this if a file can change by means other than the FS calls that the VFS can interpret? On an NFS mount, a file can change without the VFS layer being able to anticipate it (since the change was not due to a write request from this computer).
The NFS would need to tell the VFS when something has been changed by a remote computer (so that the VFS can invalidate any cached data that was affected by the change), and the VFS can issue notifications (if necessary) at this time. This is one of the reasons I like "versioning" (where existing files never change and new versions of the file are created instead), but that's a different story.


Cheers,

Brendan

Re: interesting I/O architecture

Posted: Thu Jun 11, 2009 1:31 am
by skyking
Brendan wrote:Hi,
skyking wrote:The cache problem would be solved somewhat by putting the caching in the block device driver. Different block devices have different caching needs, and filesystem-specific caching is by nature specific to each filesystem; handling this centrally may be no better.
At the highest level there's directories and files, and at the lowest level there's devices, bytes and sectors. In the middle there's a complex mess that's used to figure out which directories and files correspond to which devices, bytes and sectors. Caching at the highest level avoids the complex mess in the middle. Caching at the lowest level would seriously increase the overhead of something like "ls -R *" (assuming "warm cache" conditions), and anything else that involves directory lookups (e.g. "open()").
No, "ls -R *" leads to requests to FS drivers. These can cache structures that is read from the block device, the only possible overhead may be that the block device may cache information as well (but that could be solved if the client can request uncached reads). OTOH if caching is done on the highest level the VFS must know the nature of the devices it's caching, also you will miss such things as the FAT fs driver might really like to cache the FAT(s).
skyking wrote:Mounting can be handled by forwarding the request to another FS driver. The request for "/home/bcos/mycd/foo/bar.txt" to the root FS is forwarded as a request for "/foo/bar.txt" for the mounted FS. If there's a symlink either the driver for the FS where the symlink is located will forward the request for the link name instead (this requires that there is a TTL counter or simiar in the request to solve the problem with circular/long chains of symlinks), or return a symlink object that can be folowed by the caller. The difference again to using a VFS layer between is that it's more like a recursive RPC instead of an iterative sequence of RPC's to the underlying FS drivers.
Many messages going everywhere (including task switches); rather than one piece of code that creates a hash and looks up the information in a table (where parts of the table are more likely to still be in the CPU's cache).
That situation is also the case when you write to a device: if you don't get to send the request straight to the driver, you will have one or two more context switches.

You have a point here though; the question is which overhead will be the biggest.

Re: interesting I/O architecture

Posted: Thu Jun 11, 2009 5:46 am
by gravaera
NickJohnson wrote: Here's my original plan. Drivers are privileged processes that may be communicated with via a simple (optionally) synchronous message passing system. User processes *are* allowed to send messages directly to drivers, but the driver knows the sender's PID, so it can filter requests, and certain important message types are also restricted.
Nothing wrong there. That seems logical, so far.
The VFS server exists only as a sort of directory system: a user process may request the driver PID and file ID of, let's say, "/home/nick/important_homework.txt" from the VFS, and then request a file handle from the given driver using that information. This means that the drivers are responsible for keeping track of files opened by a specific process, as well as handling permissions to those files. Because user processes are actually responsible for their own self-destruction (i.e. even the kernel may not directly kill a process), they are able to make sure all file handles are closed before they call exit(). Even if the handles are not freed, the driver can see if the holding process is still alive when another tries to open that descriptor. Btw, this "killing architecture" is actually secure and reasonable because the kernel may give control of the user process to a piece of read-only, trusted, but unprivileged code mapped in all address spaces (called the "libsys") at any time, so as long as the libsys is written correctly, everything will work smoothly. When fork()ing, the child process may reopen all file handles before fork() itself returns. I'm just planning to have a table of file descriptors set up by either the C library or the libsys that is used for read() and write() calls which are just wrappers for messages.
What I didn't understand was this. From what I can reason out, you want to have drivers do more than just provide an interface to the hardware, even going so far as to require them to keep track of open resource streams having to do with their device, and even to close off unused streams.

This is a big switch from the idea of having a centralized, kernel-driven resource management environment, and only using drivers for the routines of opening and receiving hardware resources.

It may also be inefficient, seeing as each driver (this is how I interpreted your post) will have its own mini-management system. In other words: the NIC, Smart Card, Modem, HDD, and everyone else will be running separate code in memory for managing the scraps that applications leave behind. This is... CPU intensive and RAM guzzling, to say the very least.

I may be wrong about how I interpreted the idea, but if I was right, then maybe you should have a central monitoring system, with only one set of management routines, so that you don't have several independent hardware-level managing programs in memory at once.

Apart from that, there's nothing wrong with your idea. Having messages sent to drivers isn't going to be that bad, as long as it's done properly.
Is there a major problem in giving the job of handling file handles to the drivers themselves? I thought it would be much more flexible: you could make files act however you want if you're a driver writer, so you could even make things that are not files seem like them (a la Plan 9.) The drivers have *plenty* of address space to do this stuff, but the kernel does not, which is one of the many reasons I'm pushing so many things into userspace.
This is what I was saying above.

Apart from that, good luck, and I wish you well.

Re: interesting I/O architecture

Posted: Thu Jun 11, 2009 7:22 pm
by NickJohnson
holypanl wrote:
The VFS server exists only as a sort of directory system: a user process may request the driver PID and file ID of, let's say, "/home/nick/important_homework.txt" from the VFS, and then request a file handle from the given driver using that information. This means that the drivers are responsible for keeping track of files opened by a specific process, as well as handling permissions to those files. Because user processes are actually responsible for their own self-destruction (i.e. even the kernel may not directly kill a process), they are able to make sure all file handles are closed before they call exit(). Even if the handles are not freed, the driver can see if the holding process is still alive when another tries to open that descriptor. Btw, this "killing architecture" is actually secure and reasonable because the kernel may give control of the user process to a piece of read-only, trusted, but unprivileged code mapped in all address spaces (called the "libsys") at any time, so as long as the libsys is written correctly, everything will work smoothly. When fork()ing, the child process may reopen all file handles before fork() itself returns. I'm just planning to have a table of file descriptors set up by either the C library or the libsys that is used for read() and write() calls which are just wrappers for messages.
What I didn't understand was this. From what I can reason out, you want to have drivers do more than just provide an interface to the hardware, even going so far as to require them to keep track of open resource streams having to do with their device, and even to close off unused streams.

This is a big switch from the idea of having a centralized, kernel-driven resource management environment, and only using drivers for the routines of opening and receiving hardware resources.

It may also be inefficient, seeing as each driver (this is how I interpreted your post) will have its own mini-management system. In other words: the NIC, Smart Card, Modem, HDD, and everyone else will be running separate code in memory for managing the scraps that applications leave behind. This is... CPU intensive and RAM guzzling, to say the very least.
I would consider it to be just a different, but still basic, interface to the hardware that can be used directly by the user. In a microkernel, there is a lot of complexity associated with redirecting things from one server to another, so I think it may actually simplify things a bit to have the direct connection. The code needed to manage file handles would be trivial, at least in comparison to the driver and block cache. Having the permission system in the drivers would also be easier, because different filesystems etc. have different permission systems internally, and less conversion would be needed if everything is in one place. In fact, it would be even simpler than a monolithic setup: drivers don't have to comply with a VFS, yet they still have a well-defined interface, because they use the normal IPC methods.