microkernel development

rdos · Post by **rdos** » Wed Sep 28, 2016 1:45 am

Brendan wrote: Whenever there's some form of isolation between pieces you need some form of IPC to "punch through" the isolation, regardless of whether that IPC is some form of messaging, or something like RPC (Remote Procedure Call).

Not so. When you have "small address spaces" (or isolation with segmentation), you break the isolation by long jumps, and you access external data by accessing new address spaces or by loading new selectors. This is similar to the messaging mechanism, but much faster. The small address spaces or segmentation isolation is not perfect, but then you can do things wrong with IPC as well.

abhoriel · Post by **abhoriel** » Wed Sep 28, 2016 1:37 pm

IPC overhead can be reduced with segmentation tricks (this was done successfully on L4, IIRC), and I appreciate the addition of this interesting point to the discussion, but I do not favour this approach for practical reasons. Segmentation is only really supported on x86 in legacy mode, so this technique massively sacrifices portability.

Brendan · Post by **Brendan** » Thu Sep 29, 2016 2:18 am

Hi,

rdos wrote:
Brendan wrote: Whenever there's some form of isolation between pieces you need some form of IPC to "punch through" the isolation, regardless of whether that IPC is some form of messaging, or something like RPC (Remote Procedure Call).
Not so. When you have "small address spaces" (or isolation with segmentation), you break the isolation by long jumps, and you access external data by accessing new address spaces or by loading new selectors. This is similar to the messaging mechanism, but much faster. The small address spaces or segmentation isolation is not perfect, but then you can do things wrong with IPC as well.

First you say "not so", then you say "this is similar to the messaging mechanism". I hear schizophrenia can be a serious mental issue...

Cheers,

Brendan

simeonz · Post by **simeonz** » Thu Sep 29, 2016 2:03 pm

I feel I could mention something here that a colleague of mine suggested some time ago. It may be obvious, but is still worth noting in my opinion.

Basically, as it stands right now, VM solutions are another, arguably easier, approach to advancing the system architecture. Most of the things that a VM offers can be done more efficiently by the OS, but are hard to do properly in practice. Containers, user-mode drivers, application-consistent snapshotting, etc., are evolving, but not to the extent that VMs can be replaced. In the meantime, VMs introduce agents, paravirtualization, etc., to get help from the OS and perform their surrogate function more efficiently. OS vendors, on the other hand, are becoming more focused on making competitive virtualization offerings and work on the paravirtualization front, non-blocking synchronization, etc. The result is a merger between the OS and virtualization technology. OSes subsume virtualization and it becomes an indivisible aspect of their offering.

The point I'm trying to make is that we can use virtualization as an example of a practical microkernel architecture right now. At least, to some extent. This also applies to other kernel features that "could have been" and have migrated to VM features instead.

Whenever there's some form of isolation between pieces you need some form of IPC to "punch through" the isolation, regardless of whether that IPC is some form of messaging, or something like RPC (Remote Procedure Call).

This is how virtualization implements isolation. The guest VM and the device driver VM exchange messages. The driver stack suffers communication overhead, no matter what the kernel architecture. Basically, for that commonplace "enterprise" setup, the savings from a monolithic kernel design are somewhat debatable. A counterpoint can be made. File system drivers, network protocols, encryption drivers, etc., live in the guest VM and usually forward requests without queuing. Low-level hardware drivers that live in the driver VM benefit from queuing to optimize the request schedule, utilize parallelism, and enforce QoS policies. So, with VMs, the two types of drivers exchange messages on a natural boundary, whereas a general microkernel architecture, I assume, penalizes all driver interactions with message-based communication. Unless the protocol drivers can be loaded as stateless shared code in the process of the device drivers?

Not so. When you have "small address spaces" (or isolation with segmentation), you break the isolation by long jumps, and you access external data by accessing new address spaces or by loading new selectors.

In all honesty, I know little about the x86 protection mechanisms. Would such an approach require a trap into the kernel before the long jump? I mean, how is the long jump restricted to a proper entry point? I ask because I have wondered if an approach like this can be used for IPC. Especially now that the address space is large enough to accommodate some applications many times over. I thought multiple applications could load as a group in the same address space and communicate through traps of sorts. The problem is how to enforce the entry points with sufficiently low overhead to make this useful. It is also not suitable when you map files that are terabytes in size, so the code would probably be given some restrictions. Since segmentation has been nerfed in x64, this entire premise has limited prospects there.

rdos · Post by **rdos** » Fri Oct 14, 2016 2:12 am

simeonz wrote: In all honesty, I know little about the x86 protection mechanisms. Would such an approach require a trap into the kernel before the long jump? I mean, how is the long jump restricted to a proper entry point? I ask because I have wondered if an approach like this can be used for IPC. Especially now that the address space is large enough to accommodate some applications many times over. I thought multiple applications could load as a group in the same address space and communicate through traps of sorts.

My main concern was with the OS and drivers: making sure they are not linked into one huge file and loaded close to each other, where they can accidentally corrupt each other.

simeonz wrote: The problem is how to enforce the entry points with sufficiently low overhead to make this useful. It is also not suitable when you map files that are terabytes in size, so the code would probably be given some restrictions. Since segmentation has been nerfed in x64, this entire premise has limited prospects there.

x64 has its own form of segmentation. If you use RIP-relative addressing, you cannot access more than a 4GB space without loading fixed 64-bit addresses. That works as a primitive form of segmentation, and the address space can be partitioned into 65536 distinct areas, which is a lot more than the 8192 GDT selectors in x86.
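To make the arithmetic explicit, here is a small sketch in C. It assumes 48 implemented virtual-address bits (an assumption, not something stated in the post) and models the 4GB area as the ±2 GiB reach of a signed 32-bit RIP-relative displacement. The function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: 48 implemented virtual-address bits, 4 GiB (2^32) windows. */
enum { VIRT_ADDR_BITS = 48, WINDOW_BITS = 32 };

/* Number of distinct 4 GiB windows in the virtual address space. */
uint64_t window_count(void)
{
    return UINT64_C(1) << (VIRT_ADDR_BITS - WINDOW_BITS);
}

/* The GDT limit field is 16 bits (at most 65536 bytes) and each
   descriptor takes 8 bytes, which caps the table at 8192 entries. */
uint32_t gdt_max_descriptors(void)
{
    return 65536u / 8u;
}

/* Can a RIP-relative operand reach `target`? `rip_next` is the address
   of the following instruction, which is what the displacement is
   measured from. The displacement is a signed 32-bit value. */
int rip_relative_reachable(uint64_t rip_next, uint64_t target)
{
    int64_t delta = (int64_t)(target - rip_next);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

`window_count()` evaluates to 65536 and `gdt_max_descriptors()` to 8192, matching the figures quoted above. Nothing in this check is enforced by the hardware; it only describes what the addressing mode can reach.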

Octocontrabass · Post by **Octocontrabass** » Fri Oct 14, 2016 3:33 am

rdos wrote: x64 has its own form of segmentation. If you use RIP-relative addressing, you cannot access more than a 4GB space without loading fixed 64-bit addresses. That works as a primitive form of segmentation, and the address space can be partitioned into 65536 distinct areas, which is a lot more than the 8192 GDT selectors in x86.

But, unlike real segmentation, there is no protection between the different address spaces.

rdos · Post by **rdos** » Fri Oct 14, 2016 8:44 am

Octocontrabass wrote:
rdos wrote: x64 has its own form of segmentation. If you use RIP-relative addressing, you cannot access more than a 4GB space without loading fixed 64-bit addresses. That works as a primitive form of segmentation, and the address space can be partitioned into 65536 distinct areas, which is a lot more than the 8192 GDT selectors in x86.
But, unlike real segmentation, there is no protection between the different address spaces.

In kernel space, there is nothing stopping you from loading any GDT selector, but if you load CS and DS with unique values per driver, the driver will normally use only those selectors. It's similar to an x64 driver being confined to its own 4GB address space. In both cases, you can load other selectors / fixed 64-bit addresses, but it's only those operations that allow the driver to use data outside its own address space. You can see those operations as similar to IPC in a microkernel.
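As a rough illustration of the x86 side of this, the sketch below decodes a segment selector into its architectural fields and checks whether a selector is one of those assigned to a driver. The `driver_selectors` record and `is_own_selector` are hypothetical names, not taken from any real kernel, and the check is purely advisory, as described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* The three fields packed into a 16-bit x86 segment selector. */
typedef struct {
    uint16_t index;  /* bits 15..3: descriptor index, 0..8191 */
    bool     local;  /* bit 2: false = GDT, true = LDT */
    uint8_t  rpl;    /* bits 1..0: requested privilege level */
} selector_fields;

selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.local = ((sel >> 2) & 1u) != 0;
    f.rpl   = (uint8_t)(sel & 3u);
    return f;
}

/* Hypothetical per-driver record: the selectors the loader assigned. */
typedef struct {
    uint16_t code_sel;
    uint16_t data_sel;
} driver_selectors;

/* True if `sel` is one of the driver's own selectors, i.e. loading it
   does not cross the (advisory) boundary between drivers. */
bool is_own_selector(const driver_selectors *d, uint16_t sel)
{
    return sel == d->code_sel || sel == d->data_sel;
}
```

The 13-bit index field is where the limit of 8192 descriptors per table comes from.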

Brendan · Post by **Brendan** » Fri Oct 14, 2016 9:55 am

Hi,

rdos wrote: In kernel space, there is nothing stopping you from loading any GDT selector, but if you load CS and DS with unique values per driver, the driver will normally use only those selectors. It's similar to an x64 driver being confined to its own 4GB address space. In both cases, you can load other selectors / fixed 64-bit addresses, but it's only those operations that allow the driver to use data outside its own address space. You can see those operations as similar to IPC in a microkernel.

This is like drawing a picture of a lock on your door with pink crayon, to keep thieves out.

Cheers,

Brendan

simeonz · Post by **simeonz** » Fri Oct 14, 2016 10:32 am

My main concern was with the OS and drivers: making sure they are not linked into one huge file and loaded close to each other, where they can accidentally corrupt each other.

I suspected the use case was different. It will work for controlled software environments, like embedded use. Slightly better malware resistance may be possible with microkernels based on address spaces, so some selling points may be lost this way.

x64 has its own form of segmentation. If you use RIP-relative addressing, you cannot access more than a 4GB space without loading fixed 64-bit addresses.

The driver may want to call a 64-bit address through a pointer (e.g. interface pointers, global callbacks, pointer-based switching). Again, speaking in the context of cooperative/advisory safety strategies, you could try to remove that entire programming gimmick from the programmer's vocabulary, or constrain it to a small number of use cases, using some kind of helper APIs. Otherwise, with the right kind of buffer overflow, you may end up changing a bit or two in a function pointer and calling somewhere else.
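One way such a helper API could look, as a sketch only: every indirect call goes through a wrapper that refuses targets outside the driver's own window. The names `driver_window`, `in_window`, and `checked_call` are hypothetical. The check is advisory; it catches accidental corruption of a function pointer, not code that deliberately bypasses the wrapper.

```c
#include <stdint.h>

typedef int (*driver_fn)(int);

/* Hypothetical description of the address range a driver may call into. */
typedef struct {
    uintptr_t base;  /* first address of the driver's window */
    uintptr_t size;  /* window length in bytes */
} driver_window;

int in_window(const driver_window *w, uintptr_t addr)
{
    /* Written this way so that base + size cannot overflow. */
    return addr >= w->base && addr - w->base < w->size;
}

/* Calls fn(arg) only if fn points into the window.
   Returns 0 and stores the result on success, -1 if the call is refused. */
int checked_call(const driver_window *w, driver_fn fn, int arg, int *result)
{
    if (!in_window(w, (uintptr_t)fn))
        return -1;
    *result = fn(arg);
    return 0;
}

/* Example callback used to exercise the helper. */
int double_it(int x)
{
    return 2 * x;
}
```

With a window of `{0x10000, 0x10000}`, the address `0x10040` passes the check, while the same address with one extra bit flipped (`0x110040`) is rejected.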

Overall, I see the point though - to improve the safety, not to create uncompromising isolation. In the end, it is better than the monolithic kernels today and comes almost for free (especially in terms of latency.)

For the user-space case that I discussed, the entry points could be enforced with "call gates" in x86, but this technique is also fruitless in x64, due to lack of proper segmentation.
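For readers unfamiliar with call gates: a 32-bit call gate is an 8-byte descriptor that fixes both the target code selector and the entry offset, so the caller cannot choose where it lands. The sketch below packs one following the descriptor layout in the Intel manuals; the function name `make_call_gate32` is made up for illustration, and installing the descriptor into a GDT or LDT is left out.

```c
#include <stdint.h>

/* Pack a present 32-bit call-gate descriptor into a 64-bit value.
   selector:    code segment the gate transfers control to
   offset:      fixed entry point within that segment
   dpl:         least privileged level allowed to use the gate (0..3)
   param_count: dwords copied to the new stack on a privilege change (0..31) */
uint64_t make_call_gate32(uint16_t selector, uint32_t offset,
                          uint8_t dpl, uint8_t param_count)
{
    uint64_t d = 0;
    d |= (uint64_t)(offset & 0xFFFFu);           /* bits  0..15: offset 15:0  */
    d |= (uint64_t)selector << 16;               /* bits 16..31: selector     */
    d |= (uint64_t)(param_count & 0x1Fu) << 32;  /* bits 32..36: param count  */
    d |= (uint64_t)0xC << 40;                    /* bits 40..43: type 1100b   */
    d |= (uint64_t)(dpl & 3u) << 45;             /* bits 45..46: DPL          */
    d |= (uint64_t)1 << 47;                      /* bit  47:     present      */
    d |= (uint64_t)(offset >> 16) << 48;         /* bits 48..63: offset 31:16 */
    return d;
}
```

For example, `make_call_gate32(0x0008, 0x12345678, 3, 0)` yields `0x1234EC0000085678`: a gate usable from ring 3 that always enters selector `0x0008` at offset `0x12345678`.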

rdos · Post by **rdos** » Fri Oct 14, 2016 1:36 pm

simeonz wrote: Overall, I see the point though - to improve the safety, not to create uncompromising isolation. In the end, it is better than the monolithic kernels today and comes almost for free (especially in terms of latency.)

At least at a much lower cost than IPC. The x86 solution performs slightly slower than a flat memory model, but still much faster than isolation with address spaces. With the x64 solution there will be more TLB misses, but that is still minor compared to full address space switches.

simeonz wrote: For the user-space case that I discussed, the entry points could be enforced with "call gates" in x86, but this technique is also fruitless in x64, due to lack of proper segmentation.

Yes. I do that when running in protected mode. In long mode, call gates are no longer supported, so I'll have to revert to sysenter there. Still, this works seamlessly by patching the request at run-time to either a call gate or an x64-style syscall.
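The run-time patching idea can be approximated in portable C with a function-pointer slot that starts out pointing at a resolver and is overwritten on first use. This is only a simulation under stated assumptions: the post describes patching the request code itself, whereas the sketch patches a pointer, and every name here (`os_request`, `entry_slot`, `cpu_in_long_mode`, the two `enter_via_*` stand-ins) is hypothetical.

```c
#include <stdbool.h>

typedef long (*os_entry)(int request, long arg);

enum entry_path { PATH_NONE, PATH_CALL_GATE, PATH_SYSCALL };

enum entry_path last_path = PATH_NONE;  /* records which stand-in ran */
bool cpu_in_long_mode = false;          /* assumed to be set once at boot */

/* Stand-in for a far call through a call gate (protected mode). */
long enter_via_call_gate(int request, long arg)
{
    last_path = PATH_CALL_GATE;
    return request + arg;
}

/* Stand-in for a syscall-style entry (long mode). */
long enter_via_syscall(int request, long arg)
{
    last_path = PATH_SYSCALL;
    return request + arg;
}

long resolve_and_patch(int request, long arg);

/* The patchable slot: starts at the resolver, later holds the real entry. */
os_entry entry_slot = resolve_and_patch;

/* First call lands here: pick the mechanism, patch the slot, forward. */
long resolve_and_patch(int request, long arg)
{
    entry_slot = cpu_in_long_mode ? enter_via_syscall : enter_via_call_gate;
    return entry_slot(request, arg);
}

/* What callers use; after the first call it goes straight to the entry. */
long os_request(int request, long arg)
{
    return entry_slot(request, arg);
}
```

The decision is paid for once; every later `os_request` call is a single indirect call with no mode check.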
