So, I think you have some misconceptions about what an exokernel is. In the MIT papers, they implement a UNIX interface on top of their exokernel and then simply re-link existing applications. These applications keep the same safety guarantees and often get better performance. They then implement some applications that bypass the traditional all-things-to-all-people abstractions, including the Cheetah webserver, which improves performance by a factor of 8.
Brendan wrote:I'd just send a message to all threads (that have indicated that they want it) when memory is getting low (it's mostly required for micro-kernels when VFS/disk caches are run in user-space). It doesn't require an exo-kernel approach; is probably done by quite a few micro-kernels already, and is also easily done by a monolithic kernel. The only real problem is that it's "non POSIX" (regardless of kernel type).
That is the exokernel approach. That is one way in which those kernels are "exokernely," much like DRM in Linux is "exokernely," and much like Wayland's client-allocated buffers are "exokernely." The idea is closely related to "mechanism, not policy": the trusted, non-replaceable module (in this case, the kernel) securely multiplexes hardware resources, while everything else is relegated to libraries. This way, low-level "/dev/sda" access can still be secure, different apps can safely run different libraries at the same time, and abstractions can be fully or partially bypassed when they're inconvenient.
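To make that split concrete, here's a minimal sketch in C. The two kernel calls (exo_alloc_page, exo_map_page) are names I'm inventing for illustration, not real Xok/Aegis syscalls: the kernel only grants and maps physical pages, and the allocation policy lives entirely in the application's library.

Code:
#include <stddef.h>

/* Hypothetical kernel mechanism: grant a physical page and map it at a
 * caller-chosen virtual address. Names invented for this sketch. */
extern int exo_alloc_page(void);                      /* physical page number, or -1 */
extern int exo_map_page(int ppn, void *va, int prot); /* caller picks the virtual address */

#define PAGE_SIZE 4096
#define HEAP_BASE ((char *)0x40000000)  /* arbitrary user address for the example */

static char *heap_cur = HEAP_BASE;
static char *heap_end = HEAP_BASE;

/* Library policy: a trivial bump allocator. Another application linked
 * against the same two kernel calls could use a completely different
 * allocator, a garbage collector, a pool per subsystem, whatever. */
void *lib_malloc(size_t n)
{
    while (heap_cur + n > heap_end) {
        int ppn = exo_alloc_page();
        if (ppn < 0)
            return NULL;
        if (exo_map_page(ppn, heap_end, /* read|write */ 3) < 0)
            return NULL;
        heap_end += PAGE_SIZE;
    }
    void *p = heap_cur;
    heap_cur += n;
    return p;
}

The only thing the kernel has to get right is ownership and protection of pages; everything above that is replaceable per application.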
Brendan wrote:Effective scheduling requires global knowledge (e.g. "is thread 1 in process 1 more important than both thread 2 in process 2"), changes rapidly (e.g. if you need to task switch to decide which task to switch to then you've got a performance disaster) and is better served by "hints" (e.g. threads that tell you their desired scheduling policy and priority).
These issues are exactly the same in an exokernel as anywhere else! Exokernels don't have the microkernel dogma of pushing everything out of the kernel; they just provide a lower-level interface that multiplexes hardware resources rather than building abstractions on top of them. An exokernel scheduler would thus schedule simpler, lighter-weight entities than processes (say, scheduler activations?), with many traditional features securely implemented in libOSes.
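As a very rough sketch of what that could look like (the upcall hook and its name are my own assumptions, not the actual Aegis/Xok interface): the kernel only hands out and revokes CPU time slices, and the library decides which of its own threads runs within each slice.

Code:
#include <stddef.h>

/* User-level threads: the kernel never sees these. */
struct uthread {
    int runnable;
    int priority;
    /* saved registers, stack pointer, ... */
};

#define NTHREADS 8
static struct uthread threads[NTHREADS];

/* Assumed interface for this sketch only: an upcall delivered at the
 * start of each CPU time slice the process owns, plus a user-level
 * context switch implemented inside the library. */
extern void exo_set_slice_handler(void (*fn)(void));
extern void uthread_switch_to(struct uthread *t);

/* Runs in user space at every slice boundary: plain priority scheduling
 * here, but it could just as well be EDF, gang scheduling, coroutines... */
static void on_slice(void)
{
    struct uthread *best = NULL;
    for (int i = 0; i < NTHREADS; i++)
        if (threads[i].runnable &&
            (best == NULL || threads[i].priority > best->priority))
            best = &threads[i];
    if (best != NULL)
        uthread_switch_to(best);
}

void libsched_init(void)
{
    exo_set_slice_handler(on_slice);
}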
Brendan wrote:Most decent OSs support IO priorities (were you can issue a very low priority "read" to prefetch) and IO cancellation (where you can cancel a low priority read and issue a higher priority read if the data wasn't prefetched before you need it). Of course most of the time read requests only get as far as the VFS and are satisfied from the VFS's global file cache (there's no sane reason for file systems and disk drivers to be involved in a "VFS cache hit"); and not having a global file cache would suck (e.g. per process file cache with "cold cache" every time a process starts and the same data cached by multiple processes; or no file cache at all and only sector caches where you fail to avoid the overhead of file system layer for the "cache hit" case).
Again, there is no reason an exokernel couldn't have IO priorities. There is also no reason for an exokernel not to have a global (unified!) disk block cache; Xok and Aegis do exactly this. The difference is that the kernel only keeps the mapping of disk blocks to pages, while applications are the ones that decide when and where to load those pages. What makes it an exokernel is that the libOS directly asks the kernel for a block rather than reading from a file (it figured out which block by reading the filesystem metadata, which it also requested on its own).
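Roughly, the read path could look like this (call names are invented for the sketch; the real Xok interface differs in detail): the libFS walks its own on-disk metadata to find the block number, then asks the kernel either for the cached page holding that block or for the block to be read in.

Code:
#include <stdint.h>
#include <stddef.h>

/* Hypothetical kernel calls; not the literal Xok API. The kernel owns
 * one global block-to-page mapping (the unified cache) and has no idea
 * which file, if any, a given block belongs to. */
extern void *exo_lookup_block(uint32_t blkno);                /* NULL if not cached */
extern int   exo_read_block(uint32_t blkno, void *dest_page); /* read + insert into cache */
extern void *exo_alloc_cache_page(void);                      /* page to read into */

/* libFS side: it already parsed the inode, so it knows the block number. */
void *libfs_get_block(uint32_t blkno)
{
    void *page = exo_lookup_block(blkno);   /* hit in the global, shared cache? */
    if (page != NULL)
        return page;

    page = exo_alloc_cache_page();          /* the libFS decides when/where to load */
    if (page == NULL || exo_read_block(blkno, page) < 0)
        return NULL;
    return page;
}

Because the cache is keyed by disk block rather than by file, two processes running completely different libFSes still share cached blocks instead of each keeping a cold private copy.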
Brendan wrote:If you find that the exo-kernel is inefficient (e.g. you want to bypass the "meta-file system in kernel" thing in Xok) then you're still stuck with doing kernel patches and/or using lower level "/dev/sda" style access. Also note that patching a (monolithic) kernel is more likely to benefit other processes (even processes that were written 10 years ago and nobody has touched since) than to cause design conflicts and/or slower kernels.
No, if you want to bypass the file system, you just allocate yourself some disk blocks and don't put them in any file system. You'd need an on-disk structure to track that allocation, and the "meta-file system" is what lets the kernel understand it, but after that you've got no more overhead than straight "/dev/sda" (this could still work with a "raw disk" fs kmod). Yes, performance patches to monolithic kernels benefit everyone, but so do performance patches to libOSes (thanks to dynamic linking). Feature patches, however, push kernel abstractions further toward the all-things-to-all-people problem. With libOSes, on the other hand, an application using those features doesn't have to go through a one-kernel-to-rule-them-all that is also trying to handle every feature that application isn't using.
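A sketch of that no-file-system path, again with invented call names: grab some raw blocks from the kernel's allocator, record them in your own tiny on-disk structure (the thing the meta-file-system machinery would be taught to understand), and from then on do plain block I/O with nothing else in the way.

Code:
#include <stdint.h>

/* Invented kernel calls for illustration: the kernel only tracks which
 * principal owns each disk block. */
extern int exo_alloc_block(uint32_t placement_hint);          /* block number, or -1 */
extern int exo_write_block(uint32_t blkno, const void *buf);  /* whole-block write */

#define NBLOCKS 127   /* sized so the table fills one 512-byte block */

/* Our own on-disk bookkeeping: one block listing the blocks we own. */
struct extent_table {
    uint32_t count;
    uint32_t blocks[NBLOCKS];
};

int setup_raw_area(uint32_t table_blkno)
{
    struct extent_table tbl = { 0 };

    for (int i = 0; i < NBLOCKS; i++) {
        int b = exo_alloc_block(/* placement_hint = */ 0);
        if (b < 0)
            break;
        tbl.blocks[tbl.count++] = (uint32_t)b;
    }
    /* Persist the allocation record; after this, using the blocks is just
     * "read/write block N" with no file-system layer involved at all. */
    return exo_write_block(table_blkno, &tbl);
}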
Brendan wrote:Most of the file system's code is maintaining file/directory information and figuring out which blocks correspond to which files/directories. It doesn't matter if your kernel is only using part of the file/directory information (e.g. the permissions and not the other metadata) you're still doing a large amount of the work that a file system has to do.
File system logic is not duplicated; it is moved. The libFS does everything it can until it needs to ask the kernel for permission, and then the kernel uses libFS-provided, deterministic bytecode functions to enforce access control. To create a file in a directory, for example, the libFS chooses a free disk block (using a kernel-managed free list) in an ideal location and asks the kernel to modify the directory metadata block accordingly. The kernel then runs the deterministic function to check that the requested modification does indeed allocate the requested block, and performs the modification. The libFS then maps in the new block, writes to it, and eventually decides when to write back both the data block and the metadata block. This is all described straightforwardly in the Xok paper...
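Sketched out, with invented names (the real Xok interface and its deterministic-function machinery differ in detail), that create path might look like this: the untrusted libFS proposes the exact metadata edit, and the kernel only commits it after the FS-supplied check approves it.

Code:
#include <stdint.h>

/* Invented interface for this sketch; not the literal Xok API. */

/* The libFS's proposal: "apply this edit to metadata block meta_blkno,
 * which should end up allocating new_blkno to me". */
struct meta_edit {
    uint32_t meta_blkno;   /* directory metadata block to modify         */
    uint32_t new_blkno;    /* free block the libFS picked for the file   */
    uint32_t offset;       /* where in the metadata block the edit lands */
    uint32_t len;
    uint8_t  bytes[64];    /* the new contents (a directory entry, etc.) */
};

/* Hypothetical kernel calls. exo_apply_meta_edit runs the libFS-supplied
 * deterministic function over the proposed edit before committing it. */
extern int exo_pick_free_block(uint32_t placement_hint);
extern int exo_apply_meta_edit(const struct meta_edit *e);

/* libFS-internal helper (assumed): encode a directory entry into buf. */
extern uint32_t build_dirent(uint8_t *buf, const char *name, uint32_t blkno);

int libfs_create(uint32_t dir_meta_blkno, const char *name)
{
    struct meta_edit e = { 0 };

    /* 1. libFS picks a free block, ideally near the directory. */
    int blk = exo_pick_free_block(dir_meta_blkno);
    if (blk < 0)
        return -1;

    /* 2. libFS builds the exact directory modification it wants. */
    e.meta_blkno = dir_meta_blkno;
    e.new_blkno  = (uint32_t)blk;
    e.offset     = 0;   /* free slot the libFS found; hardcoded here */
    e.len        = build_dirent(e.bytes, name, (uint32_t)blk);

    /* 3. The kernel checks, via the deterministic function, that the edit
     * really does allocate new_blkno and nothing it shouldn't, then applies it. */
    if (exo_apply_meta_edit(&e) < 0)
        return -1;

    /* 4. From here the libFS maps the new block, writes the file data, and
     * decides on its own when both blocks get written back to disk. */
    return 0;
}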
Brendan wrote:Agreed. Of course "trusted loadable modules" is what most micro-kernels and monolithic kernels use to support file systems. I'd say the single largest problem here is the design of old interfaces - e.g. processes not being able to use "standard hints" to effect things like disk block placement, IO priorities, etc. For a simple example, to open a file for writing you should probably have to provide an "expected resulting file size", plus some flags (if you expect the file to be appended to later, if the file will be read often, if the file is unlikely to be modified after its created, etc); so that the file system code can make better decisions about things like disk placement.
The difference is that rather than supporting a "crusty old" all-things-to-all-people API, or having to create a new all-things-to-all-people API with hints for all possible eventualities, the exokernel just lets the libFS make all those decisions. Then when you see a need to change the interface, you just use a different version of the libFS, rather than making a breaking change to the kernel/userspace API. With proper library versioning, you could have applications linked against different versions of different libFSes, with no overhead from the kernel supporting multiple interfaces, no "minimum kernel version" requirements, and no need to trust new programs that use these new APIs.
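For example, nothing stops one libFS version from exposing exactly the kind of hinted interface Brendan describes, as a pure library API the kernel never sees; a hypothetical sketch:

Code:
#include <stdint.h>

/* Hypothetical libFS-level API: because it's only a library interface, a
 * new version can add or change hints without touching the kernel. */
struct open_hints {
    uint64_t expected_size;    /* for initial block placement          */
    unsigned will_append : 1;  /* leave slack after the last extent    */
    unsigned read_mostly : 1;  /* place near other hot data            */
    unsigned write_once  : 1;  /* allow tight packing                  */
};

struct libfs_file;  /* opaque handle managed by the libFS */

/* v2 of the interface takes hints... */
struct libfs_file *libfs_create2(const char *path, const struct open_hints *h);

/* ...while an application linked against v1 keeps calling this wrapper,
 * which just supplies defaults. Both versions coexist as libraries; the
 * kernel only ever sees block allocations and protected metadata edits. */
static inline struct libfs_file *libfs_create1(const char *path)
{
    static const struct open_hints defaults = { 0 };
    return libfs_create2(path, &defaults);
}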
Brendan wrote:Actually; that's probably the single largest benefit of exo-kernels - avoiding problems caused by "crusty old APIs".
Exactly this. You avoid APIs that try to be all things to all people, supporting every possible use case from high-performance servers to games to productivity software, while keeping all the safety and isolation guarantees of a regular monolithic kernel.