bzt wrote:Yes, absolutely. Calling the kernel to pass a message was as simple as calling a shared library function these days. And because there was only one address space, it only needed to pass pointers around, which is the fastest possible way. No monolithic kernel that implement address spaces has that luxury.
On current hardware, at least. If the hardware could automatically transition between protection domains on a control transfer between two programs running at the lowest privilege level, you could have the best of both worlds (assuming the functionality didn't impair the hardware's performance so much that it became unviable, as happened with the Intel TSS mechanism). On hardware that allowed programs to load supplementary address spaces, and that had some way of restricting when a given address space could be accessed, you could have a message-passing system that looked like this (a rough C sketch follows the list):
1) Your thread starts out with various address spaces loaded (such as the one containing the code the thread is currently executing, the one containing the thread's stack and thread-local storage, the one containing the heap for whatever job the thread is doing, maybe a few memory-mapped files, etc.), plus a few tables, something like an x86 LDT, specifying which address spaces may currently be accessed (such as one table listing the libraries that may be called from the current code address space, another listing the data address spaces that may be loaded while the current stack/thread-local-storage address space is loaded, another listing address spaces with memory mappings of open file descriptors, etc.).
2) The thread wants to pass a message to some microkernel component to request some service or other.
3) The thread marks any loaded address spaces that will not be needed for the service request as inaccessible on the next control transfer to a different code address space. It also loads any address spaces that will be needed but are not yet loaded (though it is probably making the service request in order to operate on data it has recently manipulated, so those address spaces are likely to be loaded already).
4) The thread constructs a stack frame, including any pointers into the address spaces that will be needed for the service request.
5) The thread executes a far call to a function in the address space for the microkernel component it is requesting service from. This automatically causes the register pointing at the address space descriptor table for the current code space to be loaded with a new value, pointing at the descriptor table listing the address spaces callable by the microkernel component.
6) The function call and stack frame in 4) and 5) serve as the message from the program to the microkernel component. The microkernel component is able to use any pointers passed directly. It queues up the work requested (or, if it can be done quickly, or the request is blocking, does the work immediately).
7) The microkernel component returns execution to the code that called it, and the address spaces that were marked inaccessible on the original control transfer are re-marked as valid. For non-blocking requests, it calls a callback in the application code, which functions as a "request completed" message (if this is needed for the request in question). For blocking requests, the "request completed" message is the return itself.
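To make the flow concrete, here's a rough C simulation of the steps above. Nothing in it is a real hardware interface: the aspace table, mask_for_call, and the fs_read component are all invented names standing in for the imagined mechanism, and the "far call" is just an ordinary function call plus the table bookkeeping the hardware would do.

Code:
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

struct aspace { const char *name; bool loaded, accessible; };

/* Step 1: the thread's currently loaded address spaces. */
static struct aspace view[] = {
    { "app.text",  true, true },
    { "app.stack", true, true },
    { "job.heap",  true, true },
};
enum { NVIEW = sizeof view / sizeof view[0] };

/* Step 3: mark everything the callee won't need as inaccessible
 * across the next transfer to a different code address space. */
static void mask_for_call(const char *needed[], int n)
{
    for (int i = 0; i < NVIEW; i++) {
        bool keep = false;
        for (int j = 0; j < n; j++)
            if (strcmp(view[i].name, needed[j]) == 0)
                keep = true;
        view[i].accessible = view[i].loaded && keep;
    }
}

/* Step 7: on return, the masked address spaces become valid again. */
static void unmask_all(void)
{
    for (int i = 0; i < NVIEW; i++)
        view[i].accessible = view[i].loaded;
}

/* Step 4: the stack frame *is* the message; pointers pass unchanged. */
struct fs_read_req { const char *path; void *buf; size_t len; };

/* Steps 5/6: the "far call" target inside a filesystem component. On
 * the imagined hardware, entering this code would also swap the active
 * descriptor table for the component's own list of address spaces. */
static int fs_read(struct fs_read_req *req)
{
    /* req->buf points into "job.heap", which the caller left
     * accessible, so the component uses the pointer directly. */
    printf("fs: read %zu bytes of %s\n", req->len, req->path);
    return 0;
}

int main(void)
{
    char buf[64];
    struct fs_read_req req = { "/etc/motd", buf, sizeof buf };

    const char *needed[] = { "app.stack", "job.heap" };
    mask_for_call(needed, 2);  /* step 3 */
    int rc = fs_read(&req);    /* steps 5/6: the call is the message */
    unmask_all();              /* step 7: the return is the reply */
    return rc;
}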
Note that there's no real requirement for a core kernel to implement a message-passing service: the message is sent by a direct function call from the application to the microkernel component in question, just as would be possible on a system with no protection whatsoever.
The interesting thing about such a system is that you retain the concepts of executables, libraries, and threads, but it becomes hard to say what a "process" is. I think the idea of a "process" is largely a product of being unable, on traditional architectures, to make a protection domain transition without leaving the lowest privilege level. It's essentially some code, one or more threads working on that code, and a common heap for those threads to use, all bundled together in one protection domain.
The Intel TSS mechanism basically implemented this, but it used a reload-the-world approach that made it prohibitively expensive. They almost managed to implement a more workable system with segmentation, but they structured it wrong. Imagine a system where, instead of referencing the GDT for segment selectors with bit 2 clear and the LDT for those with bit 2 set, you have two more segment registers (for a total of eight) and an LDT for each segment register (basically folding the LDTR into the segment registers), with the LDT referenced by a given selector determined by bits 2-4 of the selector (I'd actually use the high-order bits, but for the purpose of illustration I'm making as few changes to the protected-mode Intel segmentation architecture as I can while still ending up with a system with the desired properties). So now an eighth of your segment selectors reference the LDT for the CS register, an eighth reference the LDT for the DS register, and so on. The next change is that the LDT entry format is no longer the same as the GDT format: it's just an index into the GDT, plus some metadata. Finally, you double the size of each GDT descriptor and add the base and limit of an LDT to the descriptor format.
When a program loads a segment register, the CPU selects an LDT based on bits 2-4 of the selector, then uses the rest of the selector to index into that LDT. The LDT entry is then used to index into the GDT, and the contents of the GDT descriptor are loaded into the segment register. Because the new GDT descriptor format contains a base and limit for an LDT, and the LDTR has been folded into the segment registers, this causes an LDT switch (for one of the eight LDTs) any time a segment register is loaded. This allows every segment to carry a list of the other segments that may be loaded while it is loaded.
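Here's a rough C sketch of what the modified descriptor formats and load sequence might look like; the field widths and names are my own illustrative assumptions, not any real Intel layout:

Code:
#include <stdint.h>

/* Hypothetical doubled GDT descriptor: the usual base/limit for the
 * segment itself, plus the base/limit of the LDT that becomes active
 * when this segment is loaded (the LDTR folded into the register). */
struct gdt_desc {
    uint32_t seg_base, seg_limit;
    uint32_t ldt_base, ldt_limit;   /* the new fields */
    uint32_t attrs;
};

/* LDT entries are no longer full descriptors: just a GDT index plus
 * some metadata (access rights, etc.). */
struct ldt_entry {
    uint16_t gdt_index;
    uint16_t meta;
};

/* Each of the eight segment registers caches a full gdt_desc, and so
 * carries its own LDT around with it. */
struct seg_reg { struct gdt_desc cached; };

/* Bits 0-1 stay RPL; bits 2-4 pick which register's LDT to search;
 * the remaining bits index into that LDT. */
static unsigned sel_ldt(uint16_t sel)   { return (sel >> 2) & 7; }
static unsigned sel_index(uint16_t sel) { return sel >> 5; }

/* Simulated segment-register load: selector -> LDT -> GDT -> register.
 * Because the loaded descriptor carries ldt_base/ldt_limit, every load
 * is also an LDT switch for the target register. */
static void load_seg(struct seg_reg regs[8], int target, uint16_t sel,
                     const struct gdt_desc *gdt)
{
    const struct gdt_desc *via = &regs[sel_ldt(sel)].cached;
    const struct ldt_entry *ldt =
        (const struct ldt_entry *)(uintptr_t)via->ldt_base;
    unsigned idx = sel_index(sel);
    /* A real CPU would fault here if idx were beyond via->ldt_limit
     * or the entry's metadata forbade the load. */
    regs[target].cached = gdt[ldt[idx].gdt_index];
}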
The tricky bit is figuring out how to handle returns from intersegment calls. The code segment containing an executable might have all of the libraries it uses referenced in its LDT, so it can call those libraries, but the libraries probably won't have all the executables that call them referenced in their LDTs, and even if they did, figuring out which selector to push to the stack so that the right segment is returned to would be tricky. One could push a GDT selector, but that would defeat the system: a program could access an arbitrary segment if it knew that segment's GDT selector, simply by writing the selector onto the stack and then popping it into a segment register. One solution I've considered is a separate "segment escrow stack", which cannot be directly addressed by programs, but onto which segment registers can be pushed and from which they can be popped.
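A minimal sketch of the escrow idea, assuming a small fixed-depth per-thread stack that only the far-call and far-return operations can touch (escrow_push and escrow_pop are invented names for what would be hardware behavior, not instructions a program could aim at arbitrary memory):

Code:
#include <stdint.h>

#define ESCROW_DEPTH 32

/* Per-thread stack of return selectors; not mapped into the thread's
 * address space, so a program can't read or forge its contents. */
struct escrow {
    uint16_t sel[ESCROW_DEPTH];
    int top;
};

/* On a far call, the hardware saves the caller's CS selector here
 * rather than on the ordinary stack. */
static int escrow_push(struct escrow *e, uint16_t cs)
{
    if (e->top == ESCROW_DEPTH)
        return -1;               /* a real CPU would fault */
    e->sel[e->top++] = cs;
    return 0;
}

/* On a far return, CS can only come from the escrow stack, so writing
 * an arbitrary GDT selector into memory buys the program nothing. */
static int escrow_pop(struct escrow *e, uint16_t *cs)
{
    if (e->top == 0)
        return -1;               /* a real CPU would fault */
    *cs = e->sel[--e->top];
    return 0;
}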
Such a system would have been, I think, both much more powerful and much more performant than the TSS mechanism. If I were introducing a new architecture, I wouldn't base the mechanism for transitioning between protection domains on base:offset segmentation, but it shows how close Intel was to something much more useful than what they ended up designing.