A (maybe) New Approach to Microkernels

JohnnyTheDon · Post by **JohnnyTheDon** » Sun Dec 14, 2008 7:53 pm

After taking a look at the various kernel architectures (monolithic, microkernel, exokernel), I chose microkernel because of its benefits in security and stability. Microkernels tend to have performance issues because of message passing, which is much less efficient than normal system calls. I almost switched to a monolithic design because this performance hit can be quite severe (GNU Hurd runs at about 50% of the speed of most other Unix-based OSes). However, I think I have found a solution. I'm not sure whether it has been tried before, but I haven't seen it in any of my searches for OS dev related info.

Instead of using message passing, I use page table permissions (this is very x86_64 specific, because x86 doesn't have enough address space to implement this type of system) to isolate the different parts of the OS. The format of the address space is like this:

| Driver n |
...
| Driver 1 |
| Server n |
...
| Server 2 |
| Server 1 |
| Kernel |
| User Process |

The kernel, each server, and each driver occupy a 1GB block aligned on page directory boundaries. When a user process is executing normally (not using anything from the OS's API) the kernel, servers, and drivers are all set as supervisor mode pages at the page directory level.
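A minimal sketch of the address arithmetic behind this layout, assuming 4-level x86_64 paging with 4 KiB pages, where each PDPT entry selects one page directory and therefore one 1 GiB slot. `OS_REGION_BASE`, the slot numbering, and all function names are hypothetical; the post does not say where the kernel slot starts.

```c
#include <assert.h>
#include <stdint.h>

#define GIB            (1ULL << 30)           /* one slot = one page directory = 1 GiB */
#define OS_REGION_BASE 0x0000400000000000ULL  /* hypothetical start of the kernel slot */

/* Base virtual address of a slot (0 = kernel, then servers, then drivers). */
static uint64_t slot_base(unsigned slot) {
    assert(slot < 512);  /* this sketch keeps every slot inside a single PDPT */
    return OS_REGION_BASE + (uint64_t)slot * GIB;
}

/* PML4 index: bits 39-47 of the virtual address. */
static unsigned pml4_index(uint64_t va) {
    return (unsigned)((va >> 39) & 0x1FF);
}

/* PDPT index: bits 30-38. It selects the page directory, i.e. the slot. */
static unsigned pdpt_index(uint64_t va) {
    return (unsigned)((va >> 30) & 0x1FF);
}

/* Slot that owns an address, or -1 if it lies below the OS region
 * (that is, inside the user process). */
static int slot_of(uint64_t va) {
    if (va < OS_REGION_BASE) {
        return -1;
    }
    return (int)((va - OS_REGION_BASE) / GIB);
}
```

Because every slot is exactly one page directory, changing who may touch a slot comes down to editing the single entry that points at that directory.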

When the user process wants to make a system call, it executes a SYSENTER, SYSCALL, or interrupt, which gives the kernel control. The kernel interprets what system call was requested and which server is responsible. It then sets that server's page directory to user mode. Then it uses a SYSEXIT, SYSRET, or call gate to enter the server in user mode. The server has full access to any buffers in the user process.

If the server needs to call into another server or a driver, it goes through the same process the user process does (call into kernel, kernel changes permissions, etc.). These calls are allowed to stack up if necessary, and the kernel manages this.
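The kernel-side bookkeeping for these stacked calls could be sketched as below. This is a user-space simulation rather than real kernel code: `slot_entry` stands in for the paging entries that map each 1 GiB slot, and `NUM_SLOTS`, `MAX_DEPTH`, `enter_slot`, and `leave_slot` are hypothetical names. It also assumes that a caller's slot stays user-accessible while the callee runs, which the post does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_USER  (1ULL << 2)  /* U/S bit in an x86_64 paging entry */
#define NUM_SLOTS 8            /* hypothetical: slot 0 = kernel, rest = servers/drivers */
#define MAX_DEPTH 16           /* hypothetical limit on nested calls */

static uint64_t slot_entry[NUM_SLOTS];  /* stand-in for the entry mapping each slot */
static unsigned call_stack[MAX_DEPTH];  /* slots entered and not yet left */
static unsigned depth;

/* Number of active frames that belong to a given slot. */
static unsigned frames_for(unsigned slot) {
    unsigned count = 0;
    assert(depth <= MAX_DEPTH);
    for (unsigned i = 0; i < depth; i++) {
        if (call_stack[i] == slot) {
            count++;
        }
    }
    return count;
}

/* Kernel side of a call into a server or driver: expose the slot to
 * user mode and remember it. Returns 0 on success, -1 on error. */
static int enter_slot(unsigned slot) {
    if (slot == 0 || slot >= NUM_SLOTS || depth == MAX_DEPTH) {
        return -1;  /* the kernel slot is never exposed */
    }
    slot_entry[slot] |= PTE_USER;
    call_stack[depth++] = slot;
    return 0;
}

/* Kernel side of a return: hide the slot again once no frame uses it.
 * A real kernel would also have to invalidate stale TLB entries here. */
static int leave_slot(void) {
    if (depth == 0) {
        return -1;
    }
    unsigned slot = call_stack[--depth];
    if (frames_for(slot) == 0) {
        slot_entry[slot] &= ~PTE_USER;
    }
    return 0;
}
```

Counting frames per slot keeps a server visible if it is re-entered further up the stack, so a nested return does not hide code that an outer call still needs.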

Is this a good system? Do you think it would work? Constructive criticism is welcome.

Colonel Kernel · Post by **Colonel Kernel** » Sun Dec 14, 2008 11:28 pm

It sounds a little bit like the architecture of the Windows CE kernel, although CE is cramming everything into only a 32-bit address space.

In principle, it sounds like it could work, but giving the servers access to the user process' address space removes one of the key benefits of a microkernel -- isolation of components. I'm not sure this could really be described as a microkernel, also because it essentially uses thread migration instead of message passing.

It sounds interesting enough to try out in an experiment though, if you're doing this for research purposes.

Brendan · Post by **Brendan** » Sun Dec 14, 2008 11:44 pm

Hi,

JohnnyTheDon wrote: Is this a good system? Do you think it would work? Constructive criticism is welcome.

I can't think of anything wrong with this design - no major performance problems or other limitations that would make it a bad system.

I think it'd make a small improvement in performance, but maybe not as much difference as you're hoping for. Mostly because when you change the permissions for a page directory you need to invalidate all of the TLB entries that may have been affected (and it may be faster to reload CR3 and flush too much than to have a loop that does INVLPG up to 262144 times). When a page directory is changed from supervisor mode to user mode you could use "lazy TLB invalidation" to avoid this (but that doesn't work when the page directory is changed from user mode to supervisor mode).
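The figure 262144 is the number of 4 KiB pages in a 1 GiB slot (2^30 / 2^12 = 2^18). The trade-off described above can be sketched as a small decision function. `INVLPG_THRESHOLD` is a made-up tuning value and the enum and function names are hypothetical; the real break-even point depends on the CPU.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE        4096ULL
#define SLOT_SIZE        (1ULL << 30)
#define INVLPG_THRESHOLD 64  /* hypothetical: above this, reloading CR3 is assumed cheaper */

enum tlb_action {
    TLB_NOTHING,      /* permissions did not change */
    TLB_LAZY,         /* leave stale entries, fix them up in the page fault handler */
    TLB_INVLPG_LOOP,  /* invalidate each possibly cached page individually */
    TLB_RELOAD_CR3    /* flush everything that is not global */
};

/* Worst-case number of TLB entries one slot can occupy. */
static uint64_t pages_per_slot(void) {
    return SLOT_SIZE / PAGE_SIZE;
}

/* Pick an invalidation strategy for a permission change on one slot. */
static enum tlb_action choose_tlb_action(int was_user, int now_user,
                                         uint64_t touched_pages) {
    assert(touched_pages <= pages_per_slot());
    if (was_user == now_user) {
        return TLB_NOTHING;
    }
    if (!was_user && now_user) {
        /* Stale entries are too strict: the worst case is a spurious fault. */
        return TLB_LAZY;
    }
    /* user -> supervisor: stale entries would still grant access,
     * so they have to go before returning to user mode. */
    return touched_pages <= INVLPG_THRESHOLD ? TLB_INVLPG_LOOP : TLB_RELOAD_CR3;
}
```

The asymmetry is the important part: a stale entry that is too restrictive only costs a fault, while a stale entry that is too permissive would break the isolation the design relies on.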

Note: I'm not too sure what the biggest performance problem/s with GNU Hurd are. IMHO for most message passing systems the problem isn't the overhead of passing a message, but it's how often messages are passed...

Cheers,

Brendan

Love4Boobies · Post by **Love4Boobies** » Mon Dec 15, 2008 2:29 pm

Maybe a bit OT, but can't IPC be improved by using shared pages instead of message passing? I am aware of the race conditions that need handling, yet if the right mechanisms were implemented through some IPC server's API, the way communication is handled would be up to the clients. Perhaps the use of transactional memory would help here. Correct me if I'm wrong.

JohnnyTheDon · Post by **JohnnyTheDon** » Mon Dec 15, 2008 3:40 pm

Brendan wrote: I think it'd make a small improvement in performance, but maybe not as much difference as you're hoping for. Mostly because when you change the permissions for a page directory you need to invalidate all of the TLB entries that may have been affected.

Wouldn't a message passing microkernel (not using transactional memory) have to change page tables anyway? If the servers and drivers have their own address space a CR3 change and full TLB flush is necessary to switch to the new address space. Even if transactional memory is used, there will always be a change to the page tables so that the server or driver can be mapped in. This change would be delayed until the server or driver is scheduled, but from the caller's perspective this causes longer response times.

Love4Boobies wrote: Maybe a bit OT, but can't IPC be improved by using shared pages instead of message passing? I am aware of the race conditions that need handling, yet if the right mechanisms were implemented through some IPC server's API, the way communication is handled would be up to the clients. Perhaps the use of transactional memory would help here. Correct me if I'm wrong.

Yeah, that does seem like a good alternative. The only issue I can think of is who gets access to the transactional memory. If we have multiple processes sharing one block of transactional memory (more than the two that are communicating) requests could be easily forged. And if every two processes that need IPC get their own transactional memory, a server making a call to a driver with data from a user process would need to copy the transactional memory, which with large blocks of data could become an issue.

Colonel Kernel · Post by **Colonel Kernel** » Tue Dec 16, 2008 12:50 am

AFAIK, transactional memory is completely orthogonal to the implementation of message passing. It is a way of implementing atomicity without locks. How is it supposed to help...?

Regarding sending messages by sharing pages, I think BCOS does that already (Brendan?). I know that Mach used to.

Love4Boobies · Post by **Love4Boobies** » Tue Dec 16, 2008 9:54 am

Colonel Kernel wrote: AFAIK, transactional memory is completely orthogonal to the implementation of message passing. It is a way of implementing atomicity without locks. How is it supposed to help...?

Yep, that's right. What I meant was using transactional memory to avoid race conditions - the message passing mechanism would still be shared memory.

Venkatesh · Post by **Venkatesh** » Tue Dec 16, 2008 11:23 pm

On x86 (and newer x86_64), segments can help you do better; servers need not be supervisor-level.

http://i30www.ira.uka.de/research/docum ... -spaces.ps
(Improved Address Space Switching on Pentium Processors by Transparently Multiplexing User Address Spaces)

EROS and Coyotos have a similar mechanism, called 'small spaces'.

JohnnyTheDon · Post by **JohnnyTheDon** » Wed Dec 17, 2008 12:02 pm

My servers are not supervisor level. They run at user level. The whole point of calling to the kernel and then altering page tables is to allow user level code to access the servers in a controlled manner.

And bases and limits of segments are ignored on x86_64.

pillow · Post by **pillow** » Tue Jan 27, 2009 4:26 pm

One possible problem is that a bug in the server could bring down the calling user application as well. With a traditional microkernel design it may be possible to have the user app call the server, which then crashes and is restarted by the kernel, and then processes the request successfully and returns to the user app as if nothing bad had happened. In theory.

I'm not sure how feasible it is to implement something like this in practice anyway, though.

A buggy server could also bring down other functioning servers or drivers if it depends on them (causing their memory space to be flipped to PL3) and then subsequently overwrites them with bad data.

iammisc · Post by **iammisc** » Tue Jan 27, 2009 9:49 pm

I haven't read the whole post so forgive me if this has been mentioned before. I think that this idea is good but it does reduce one of the key microkernel benefits: isolation (I know someone said this before).

Anyway, my solution to this would be to flip the page privilege level to supervisor mode for all the other pages except the currently executing server's pages. So any attempt by the server to access the userspace program would result in an error. This way, IPC can be implemented cleanly in a microkernel and still be fairly fast.
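A minimal sketch of this stricter variant, again as a user-space simulation. The region numbering (`REGION_USER_PROCESS`, `REGION_KERNEL`) and the function names are hypothetical, and the entries array only stands in for the real paging entries.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_USER            (1ULL << 2)  /* U/S bit in an x86_64 paging entry */
#define NUM_REGIONS         6            /* hypothetical: user process, kernel, 4 servers */
#define REGION_USER_PROCESS 0
#define REGION_KERNEL       1

/* Make exactly one region reachable from user mode and mark every
 * other region, including the calling user process, supervisor-only. */
static void isolate_to(uint64_t entries[NUM_REGIONS], unsigned running) {
    assert(running < NUM_REGIONS && running != REGION_KERNEL);
    for (unsigned i = 0; i < NUM_REGIONS; i++) {
        if (i == running) {
            entries[i] |= PTE_USER;
        } else {
            entries[i] &= ~PTE_USER;
        }
    }
}

/* Would an access from user mode to this region pass the U/S check? */
static int user_mode_can_access(const uint64_t entries[NUM_REGIONS],
                                unsigned region) {
    assert(region < NUM_REGIONS);
    return (entries[region] & PTE_USER) != 0;
}
```

With this rule the server can no longer read the caller's buffers directly, so request data would have to reach it some other way, for example through a region the kernel maps for both sides.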

In fact, just a couple of days ago, I started implementing a fairly similar system. In my system, servers run as separate processes. However, a part of the server can be compiled as position independent and with one message to my main system server (sysd), the server can send portions of itself to processes that request it. I call this system quickrpc (yeah, I know, it's very original). For example, my vfs server tells sysd that it is quickrpc capable and specifies the start and end addresses of the quickrpc executable code (I use linker script magic to accomplish the embedding of both static code and relocatable code in one elf file). Anyway, a process uses a userspace library which handles the mapping in of quickrpc components. For example, a process calls my rpc library with a message to the vfs server. The rpc library sees that this process wants to use quickrpc and it passes the message on to the quickrpc library (if the process doesn't want to use quickrpc, then the message is sent through normal rpc). Then the quickrpc library checks to see if the requested server is mapped in. If it is, the message is sent (by invoking a special system call which changes permissions, etc.). If the server is not mapped in, the process can call the sysd module to map it into the current address space.
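The dispatch path described above might look roughly like this. Every name here is hypothetical, the mapping step is reduced to setting a flag, and falling back to normal rpc for a server that is not quickrpc capable is an assumption; the actual quickrpc code is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>

enum route {
    ROUTE_NORMAL_RPC,         /* ordinary message to the server process */
    ROUTE_QUICKRPC,           /* server component already mapped in */
    ROUTE_QUICKRPC_AFTER_MAP  /* sysd had to map the component first */
};

/* Per-process view of one server (hypothetical bookkeeping). */
struct server_state {
    int quickrpc_capable;  /* server registered a quickrpc component with sysd */
    int mapped_in;         /* component is mapped into this address space */
};

/* Decide how the rpc library delivers a message to a server. */
static enum route route_message(int process_wants_quickrpc,
                                struct server_state *srv) {
    assert(srv != NULL);
    if (!process_wants_quickrpc || !srv->quickrpc_capable) {
        return ROUTE_NORMAL_RPC;
    }
    if (srv->mapped_in) {
        return ROUTE_QUICKRPC;
    }
    /* Stand-in for asking sysd to map the component into this process. */
    srv->mapped_in = 1;
    return ROUTE_QUICKRPC_AFTER_MAP;
}
```

Only the first quickrpc call to a given server pays for the mapping; later calls go straight to the special system call.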

Each quickrpc module has both system wide memory and per-process memory. By using the quickrpc library in server mode, it can allocate and use memory in these two separate memory spaces. The actual server process can also access the same memory that the quickrpc server component can handle (it can even access the per-process memory).

This system, IMHO, is pretty cool and I think it works reasonably well, although I still haven't implemented it fully and there are still some bugs to work out.

OSDev.org
