I feel I could mention something here that a colleague of mine suggested some time ago. It may be obvious, but is still worth noting in my opinion.
Basically, as it stands right now, VM solutions are another, we may say easier approach to advance the system architecture. Most of the things that a VM offers can be done more efficiently by the OS, but are hard to do properly in practice. Containers, user mode drivers, application consistent snapshoting, etc., are evolving, but not to the extent that VMs can be replaced. In the meantime, VMs introduce agents, paravirtualization, etc., to get help from the OS and perform their surrogate function more efficiently. OS vendors, on the other hand, are becoming more focused on making competitive virtualization offering and work on the paravirtualization front, non-blocking synchronization, etc. The result is a merger between the OS and virtualization technology. OSes subsume virtualization and it becomes indivisible aspect of their offering.
The point I'm trying to make is that we can use virtualization as example of practical microkernel architecture right now. At least, to some extent. This also applies to other kernel features that "could have been" and have migrated to VM features instead.
Whenever there's some form of isolation between pieces you need some form of IPC to "punch through" the isolation, regardless of whether that IPC is some form of messaging, or something like RPC (Remote Procedure Call).
This is how virtualization implements isolation. The guest VM and device driver VM communicate messages. The driver stack suffers communication overhead, no matter what the kernel architecture. Basically, for that commonplace "enterprise" setup, savings from monolithic kernel design are somewhat debatable. A counterpoint can be made. File system drivers, network protocols, encryption drivers, etc., live in the guest VM and usually forward requests without queuing. Low-level hardware drivers that live in the driver VM benefit queuing to optimize the request schedule, utilize parallelism, and enforce QoS policies. So, with VMs, the two types of drivers communicate messages on a natural boundary, whereas a general microkernel architecture I assume penalizes all driver interactions with message based communication. Unless the protocol drivers can be loaded as stateless shared code in the process of the device drivers?
Not so. When you have "small address spaces" (or isolation with segmentation), you break the isolation by long jumps, and you access external data by accessing new address spaces or by loading new selectors.
In all honesty, I am starved on the x86 protection mechanisms. Will such approach require trap into the kernel before the long jump? I mean, how is the long jump restricted to a proper entry point? I ask, because I have wondered if approach like this can be used for IPC. Especially now, that the address space is large enough to accommodate some applications many times over. I thought, multiple applications could load as a group in the same address space and communicate through traps of sorts. The problem is how to enforce the entry points with sufficiently low overhead to make this useful. Also, it is also not suitable when you map files TBs in size, so the code would probably be given some restrictions. Since the segmentation has been nerfed in x64, this entire premise is somewhat lacking perspective there.