Brendan wrote:What actually determines who gets CPU time when (each individual process or the kernel)?
The kernel provides CPU time, because the exokernel's job is to "securely multiplex hardware."
Brendan wrote:What actually determines when to read data, whether it be a high priority read or a prefetch (each individual process or the kernel)?
Individual processes issue read schedules (for regular and prefetch reads) that are merged by the kernel.
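To make that concrete, here is a purely hypothetical sketch of what "issuing a read schedule" could look like from the application's side; the struct and the call are illustrations I made up, not Xok's actual interface:
[code]
/* Hypothetical illustration only: neither this struct nor the call below is
 * Xok's real interface; it just sketches what "issuing a read schedule"
 * might mean from the application's point of view. */
#include <stddef.h>
#include <stdint.h>

enum read_kind { READ_DEMAND, READ_PREFETCH };  /* urgent vs. speculative */

struct read_request {
    uint64_t       block;   /* disk block to read                        */
    void          *dest;    /* application-owned destination buffer      */
    enum read_kind kind;    /* lets the kernel order and merge requests  */
};

/* Hypothetical downcall: hand the whole schedule to the kernel, which merges
 * it with every other application's schedule before touching the disk. */
int submit_read_schedule(const struct read_request *reqs, size_t count);
[/code]
The point is only that the application says what it wants read and how urgently, while the kernel keeps the right to merge, reorder, and throttle everyone's schedules.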
Brendan wrote:What actually determines when a CPU should enter a power management state (each individual process or the kernel)?
The kernel "securely multiplexes hardware." It's not really to the advantage of any individual application to care about PM.
Brendan wrote:What actually determines when load should be migrated between CPUs (each individual process or the kernel)?
Individual applications can request specific CPUs; the kernel, "securely multiplexing hardware," hands out CPU time and notifies the applications, so they can decide what to do with the particular CPU they're currently running on.
Let's look at some more questions:
- Who decides which disk blocks to read, write, or allocate, and when and where? Can you do a zero-touch, all-files-at-once, asynchronous-I/O-based cp? (There's a sketch of the asynchronous-I/O part after this list.)
- Who decides which physical pages to swap out, discard, or otherwise free (and how to do so)? Can you use an algorithm other than LRU to decide, depending on your application?
- Who decides how to handle application-caused interrupts/exceptions? (Signal handlers cost too much, performance-wise.)
- Who decides the internals of the network stack? Is it the kernel, possibly providing zero-copy interfaces like sendfile/splice, or the applications, which can implement sendfile-like operations even for non-file data? Is it the kernel, providing lots of ioctls and flags for things like delayed TCP ACK, or the application, which can just do whatever is optimal (e.g. also combining the FIN packet with the last data packet)? Is it the kernel, which keeps a separate TCP retransmission buffer, or the application, which can retransmit directly from its own buffers?
- Who pulls the data out of, and puts the data into, network packets? (Linux packet mmap is a very exokernel-like interface; there's a sketch of it after this list.) Can you implement precomputed checksums for zero-touch disk cache -> network packet transmission? Can you discard rather than copy packets, skip checksum verification, etc., for stress-testing tools? Can you implement a zero-overhead TCP/IP forwarder for, e.g., a network load balancer?
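On the asynchronous-I/O cp question above, here is a minimal sketch using plain POSIX aio on a conventional kernel; the buffer size is a placeholder and error handling is minimal. It only handles one file and still copies through user memory, so it's not the zero-touch, all-files-at-once version an exokernel library could implement, but it shows the application, not the kernel, choosing when reads are issued:
[code]
/* Minimal sketch (POSIX aio; on Linux/glibc link with -lrt): copy one file
 * with reads queued asynchronously.  A library-OS cp could queue every
 * file's blocks in one schedule and skip the copy through user memory. */
#include <aio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int copy_file_aio(const char *src, const char *dst)
{
    int in  = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0)
        return -1;

    static char buf[1 << 20];                 /* placeholder buffer size */
    struct aiocb cb;
    off_t off = 0;

    for (;;) {
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = in;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof buf;
        cb.aio_offset = off;

        if (aio_read(&cb) != 0)               /* queue the read; don't block */
            return -1;
        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);           /* wait for completion */

        ssize_t n = aio_return(&cb);
        if (n <= 0)
            break;                            /* EOF or error */
        if (pwrite(out, buf, (size_t)n, off) != n)
            return -1;
        off += n;
    }
    close(in);
    close(out);
    return 0;
}
[/code]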
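And for packet mmap itself, here is a minimal sketch of the Linux PACKET_RX_RING interface (ring sizes are placeholders, error handling is omitted, and it needs root/CAP_NET_RAW). The kernel writes raw frames into a ring that is mapped straight into the application's address space, so the application can parse, forward, checksum, or drop them in place, with no per-packet copy to user space and no per-packet system call:
[code]
/* Minimal sketch of Linux packet mmap (TPACKET_V1, PACKET_RX_RING).
 * Error handling omitted; needs CAP_NET_RAW or root. */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <poll.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    struct tpacket_req req = {
        .tp_block_size = 4096,      /* placeholder sizes...               */
        .tp_block_nr   = 64,
        .tp_frame_size = 2048,
        .tp_frame_nr   = 128,       /* ...block_size*block_nr/frame_size  */
    };
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req);

    size_t ring_len = (size_t)req.tp_block_size * req.tp_block_nr;
    unsigned char *ring = mmap(NULL, ring_len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);

    for (unsigned handled = 0, i = 0; handled < 1024;
         handled++, i = (i + 1) % req.tp_frame_nr) {
        struct tpacket_hdr *hdr =
            (struct tpacket_hdr *)(ring + (size_t)i * req.tp_frame_size);

        while (!(hdr->tp_status & TP_STATUS_USER)) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            poll(&pfd, 1, -1);      /* block until the kernel fills a frame */
        }

        /* The raw frame sits at hdr->tp_mac inside the shared ring; parse,
         * forward, or discard it right here, in place. */
        unsigned char *frame = (unsigned char *)hdr + hdr->tp_mac;
        (void)frame;

        hdr->tp_status = TP_STATUS_KERNEL;  /* hand the slot back to the kernel */
    }

    munmap(ring, ring_len);
    close(fd);
    return 0;
}
[/code]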
Of course you could put hints and options for these in the kernel, but patching the kernel, especially to change interfaces, is a lot more work than just doing it yourself in userspace. It's also a lot less future-proof.
The Xok benchmarks may be missing some features (although I'm still not sure which; the Cheetah vs. Harvest comparison certainly isn't), but it cuts both ways: Cheetah implements several features not present in OpenBSD. There's also a slightly newer paper from 2002, http://pdos.csail.mit.edu/papers/exo:tocs.pdf, which has a few more examples of exokernel-specialized applications. They also add IIS/NT to their Cheetah benchmark, which does use zero-copy, and which even their baseline Socket/Xok server outperforms on slower hardware.
Brendan wrote:For another example, the kernel API might provide extremely low level abstractions - e.g. it might only expose raw blocks and pages, where the library has to implement "open()" and "fopen()", and "sbrk()" and "malloc()" on top of that. None of this changes the fact that it's still a monolithic kernel - it's just a sliding scale from "very high level kernel API abstractions" to "very low level kernel API abstractions". In fact nothing really prevents a monolithic kernel's API from providing multiple levels of abstraction at the same time - e.g. it could provide very high level abstractions in addition to providing very low level abstractions.
When you provide those "very low level kernel API abstractions" and still protect applications from each other, you are an exokernel. At that point, you may as well move the high level abstractions out of the kernel to 1) make them easier to modify and debug, 2) reduce system call overhead, and 3) reduce the attack surface the kernel exposes.
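To give a feel for how little the kernel has to provide for the memory side of this, here is a minimal sketch of a library-level malloc() built on a raw page primitive. mmap() stands in for whatever low-level "give me pages" call the kernel exposes, and the allocator is a trivial bump allocator (no free(), no reuse), not anything production-quality:
[code]
/* Minimal sketch: a library-level allocator on top of a raw page primitive.
 * mmap() stands in for the kernel's low-level page-allocation call; the
 * policy above it lives entirely in the application's library. */
#include <stddef.h>
#include <sys/mman.h>

#define ARENA_CHUNK (64 * 1024)   /* placeholder: grab 64 KiB of pages at a time */

static unsigned char *arena_cur, *arena_end;

void *lib_malloc(size_t size)
{
    size = (size + 15) & ~(size_t)15;              /* keep 16-byte alignment */

    if (arena_cur == NULL || (size_t)(arena_end - arena_cur) < size) {
        /* Out of pages: ask the kernel for more (the old tail is simply
         * abandoned; a real allocator would track it). */
        size_t chunk = size > ARENA_CHUNK ? size : ARENA_CHUNK;
        void *p = mmap(NULL, chunk, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        arena_cur = p;
        arena_end = arena_cur + chunk;
    }
    void *out = arena_cur;
    arena_cur += size;
    return out;
}
[/code]
Everything above the page call (the bump pointer, the alignment, the policy for when to ask for more pages) lives in the application's library, so it can be modified and debugged without touching the kernel, which is exactly point 1 above.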
Brendan wrote:if it made sense to provide very low level abstractions then all of the existing monolithic kernels would be providing those low level abstractions already (possibly in addition to their existing higher level abstractions, for the sake of compatibility); or at a minimum they'd be evolving towards lower level abstractions; so that they can give the "massive number of people" that write their own library (and don't just use the system's standard C/C++/POSIX library) a lot more flexibility.
They are. Linux graphics is one example: the kernel exposes low-level DRM interfaces and the OpenGL implementation lives in user-space Mesa drivers. Obviously, as you mention, compatibility is a concern. I would add momentum: nobody wants to overhaul the Linux file systems and network stack to run in user space; they simply weren't designed for that.
Brendan wrote:Of course I've been a little sarcastic - some people actually do write their own library instead of just using the system's standard C/C++/POSIX library. Examples of this are the Wine project (implementing the Win32 API on *nix systems) and Microsoft's POSIX subsystem (implementing a standard C/C++/POSIX library on Windows). It's not like you need very low level abstractions for this.
So what are the benefits of having very low level abstractions? The main benefit is, if you're a researcher and not writing a "real world OS" and don't need to care about malicious code, or compatibility, or features that people have grown to expect (e.g. like your web server being able to serve files stored on an NFS file system); then you can make it seem like low level abstractions provide better performance in some cases without acknowledging the fact that the sacrifices you've made to gain that extra performance are unlikely to be practical in a real OS.
Of course you don't need low-level abstraction to do that, but without low-level abstraction, you have to translate between e.g. Win32 and POSIX, which is slower (see also Cygwin). That also doesn't mean you have to go all the way to the bottom just because that's what the kernel exposes. For example, Cheetah could still load disk blocks from local or NFS file systems using a VFS library, which would have no effect on optimizations like TCP retransmission, packet merging, checksum precomputation, etc.