Help me choose a kernel style

simeonz · Post by **simeonz** » Tue Aug 29, 2017 9:16 am

Korona wrote:In general the answers to your questions heavily depend on the kernel/driver design. The answers vary depending on which microkernel you consider.

I am not familiar with actual microkernel designs; otherwise I would have cited some for sure. I am not sure which microkernel would be considered mature and evolved.

Korona wrote:Carefully designed microkernels should be able to recover from driver crashes.

Do you mean that the system services themselves will recover, or that the applications will continue uninterrupted as well? The point I was trying to emphasize in a few of my questions is that applications can be isolated from each other more effectively, not simply because they live in separate address spaces, but because there is restricted data flow between them. What I was speculating is that when a crashed component, be it driver or application, has data consumers, those consumers can be affected. Or is there some kind of transactional infrastructure in microkernels that manages to avoid this?

Korona wrote:However a microkernel alone is not sufficient to contain malicious drivers. You need an IOMMU (e.g. Intel VT-d) to do that. Devices like XHCI are capable of DMA from/to arbitrary memory addresses and can thus corrupt even read-only kernel code.

You mean exploits through the devices themselves, like directing the device's DMA at arbitrary physical memory. But what about exploits that do not involve devices? Imagine that my SMB driver has been exploited and I keep sensitive data on my SMB server. The user application makes a request to my driver - "here is the data; please make sure to encrypt it". The driver does, but not before sending the plaintext to a hundred unsolicited locations. What if my block device driver got exploited, or my network driver? Essentially, my theory is that if the user data enters the driver stack unencrypted, unsolicited use by exploited and malicious driver code cannot be prevented.

Korona wrote:Single-address space operating systems cannot be made safe without running trusted code (e.g. compiled from a managed language) only. At least this cannot be done without hardware assistance (e.g. something like segmentation + instrumentation, look up how Google's NaCl works).

Thanks. I'll take a look.

OSwhatever · Post by **OSwhatever** » Tue Aug 29, 2017 12:36 pm

dozniak wrote:
OSwhatever wrote:I don't really understand the claim that monolithic kernels are "simpler" to write.
It's easy - fewer moving parts to design, and it's easier to redo them in one compilation unit if need be, rather than change full interface specifications for 25 to 50 separate components.

Designing interface specifications is a huge task in operating system development. If you don't do that, what on earth will you end up with? Designing interfaces and understanding how different systems should interact with each other, as well as understanding which resources should be handled and how, *IS* operating system development. Yes, it is a lot of hard work designing all of this from scratch but I think it is really interesting and rewarding at the same time.

Korona · Post by **Korona** » Tue Aug 29, 2017 1:38 pm

Microkernels need significantly more glue code than monolithic kernels. Consider this 1k file that just implements the client side (aka libc side) of a few POSIX file operations. It contains absolutely no logic at all. It does not even contain the protocol serialization and deserialization itself (which is automatically generated from a definition file).

Much of that work is not even designing how the protocol should work but figuring out how it is possible to get acceptable performance despite the stronger separation.

OSwhatever · Post by **OSwhatever** » Tue Aug 29, 2017 2:46 pm

Korona wrote:Microkernels need significantly more glue code than monolithic kernels. Consider this 1k file that just implements the client side (aka libc side) of a few POSIX file operations. It contains absolutely no logic at all. It does not even contain the protocol serialization and deserialization itself (which is automatically generated from a definition file).

Much of that work is not even designing how the protocol should work but figuring out how it is possible to get acceptable performance despite the stronger separation.

This is a conversion between POSIX calls and some IPC message format. Of course, if your kernel accepts many POSIX calls directly you will not need this conversion, but you seem to implicitly assume that a monolithic kernel must implement POSIX system calls. If your monolithic kernel had another interface, the conversion would have to be done anyway, and it would need "glue code" as well.

Now IPC serialization/deserialization is an overhead of course but when you implement system calls instead you need to do a lot of checking as well. If buffers are coming from user space you must deal with that and copy them/check them, check every parameter, ensure there is no overflow. System calls aren't exactly free either.
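To make the "check every parameter, ensure there is no overflow" point concrete, here is a minimal sketch of the range check a system call handler has to perform on a user-supplied buffer. It models 64-bit address arithmetic in Python; `USER_TOP` and `user_range_ok` are made-up names for illustration and are not taken from any particular kernel.

```python
# Hypothetical sketch: validating a user-supplied (address, length) pair
# before the kernel touches the buffer.
MASK64 = (1 << 64) - 1
USER_TOP = 0x0000_8000_0000_0000  # assumed end of the user half of the address space


def user_range_ok(addr: int, length: int) -> bool:
    """Return True if [addr, addr + length) lies entirely in user space."""
    if not (0 <= addr <= MASK64 and 0 <= length <= MASK64):
        return False
    end = (addr + length) & MASK64  # emulate 64-bit wraparound
    if end < addr:                  # the addition overflowed
        return False
    return end <= USER_TOP


print(user_range_ok(0x1000, 0x2000))                 # ordinary buffer: True
print(user_range_ok(0xFFFF_FFFF_FFFF_F000, 0x2000))  # wraps around: False
print(user_range_ok(USER_TOP - 0x1000, 0x2000))      # crosses into the kernel half: False
```

A real handler would also have to cope with the pages being unmapped or changing underneath it, which this sketch ignores.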

Korona · Post by **Korona** » Wed Aug 30, 2017 12:58 am

OSwhatever wrote:This is a conversion between POSIX calls and some IPC message format. Of course, if your kernel accepts many POSIX calls directly you will not need this conversion, but you seem to implicitly assume that a monolithic kernel must implement POSIX system calls. If your monolithic kernel had another interface, the conversion would have to be done anyway, and it would need "glue code" as well.

That is not really true. The glue code is needed to translate POSIX calls into an IPC protocol that is explicitly crafted to carry POSIX calls. Microkernels by definition do not have a native API to do file I/O. Thus this glue code is needed for any microkernel, even if the OS does not provide a POSIX layer. In this sense, the glue code is the native file I/O API.

Let me elaborate on what I said about performance. Consider the situation where a program does an epoll_wait()-read() loop. Assume that read() sends an IPC message to a driver sitting in another process. How do you implement epoll()? The first thing one probably thinks of is sending a "signal me when the file has (no) pending data" request to the driver. But that does not actually work, as it violates causality! If read() consumes all available data but the "file has no pending data anymore" signal does not arrive in time, epoll_wait() will return a bad result. So in addition to that signal you now need an additional "peek at the current state" message in each epoll_wait() (of course, other solutions like strong ordering guarantees also work). But that introduces an unacceptable performance penalty and effectively turns epoll() into poll().

I solve this problem by mapping a shared "per-file status page" into each program that has an open file descriptor (effectively making the peek "message" nearly free). The point is, however, that microkernels introduce challenges of a different nature than the ones that monolithic kernels face.
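The race and the status-page idea can be modelled in a few lines. This is only a toy simulation under stated assumptions, not the actual implementation described in the post: a plain dict stands in for the shared page, a deque stands in for notifications that are still in flight, and all class and method names are invented for illustration.

```python
from collections import deque


class Driver:
    """Hypothetical driver process that owns one file's data."""

    def __init__(self):
        self.pending = deque()                  # data waiting to be read
        self.in_flight = deque()                # notifications not yet delivered
        self.status_page = {"readable": False}  # stands in for a shared memory page

    def _publish(self):
        readable = bool(self.pending)
        self.status_page["readable"] = readable  # visible to the client immediately
        self.in_flight.append(readable)          # reaches the client some time later

    def push(self, data):
        self.pending.append(data)
        self._publish()

    def read_all(self):
        data = list(self.pending)
        self.pending.clear()
        self._publish()
        return data


class Client:
    def __init__(self, driver):
        self.driver = driver
        self.cached_readable = False  # last state learned from a notification

    def deliver_one_notification(self):
        if self.driver.in_flight:
            self.cached_readable = self.driver.in_flight.popleft()

    def poll_by_notification(self):
        return self.cached_readable

    def poll_by_status_page(self):
        return self.driver.status_page["readable"]


driver = Driver()
client = Client(driver)
driver.push(b"hello")
client.deliver_one_notification()  # the "readable" signal arrives
driver.read_all()                  # read() drains the file; "not readable" is still in flight

print(client.poll_by_notification())  # True: stale, epoll_wait() would wrongly report data
print(client.poll_by_status_page())   # False: the shared state is already current
```

In the simulation the notification-based answer only becomes correct after the pending signal is delivered, while the status-page answer is correct without any extra message.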

dozniak · Post by **dozniak** » Wed Aug 30, 2017 6:27 am

OSwhatever wrote:
dozniak wrote:
OSwhatever wrote:I don't really understand the claim that monolithic kernels are "simpler" to write.
It's easy - fewer moving parts to design, and it's easier to redo them in one compilation unit if need be, rather than change full interface specifications for 25 to 50 separate components.
Designing interface specifications is a huge task in operating system development. If you don't do that, what on earth will you end up with? Designing interfaces and understanding how different systems should interact with each other, as well as understanding which resources should be handled and how, *IS* operating system development. Yes, it is a lot of hard work designing all of this from scratch but I think it is really interesting and rewarding at the same time.

Well, thanks for not reading what I write. I'll answer with the same and state: (RE-)designing interfaces in a single unit is much simpler than across 25 to 50 separate components which all depend on each other and each other's interfaces.

Now move your eyes up and find the word "simpler" above, then think what you were answering again.

OSwhatever · Post by **OSwhatever** » Wed Aug 30, 2017 1:10 pm

Korona wrote:Let me elaborate on what I said about performance. Consider the situation where a program does an epoll_wait()-read() loop. Assume that read() sends an IPC message to a driver sitting in another process. How do you implement epoll()? The first thing one probably thinks of is sending a "signal me when the file has (no) pending data" request to the driver. But that does not actually work, as it violates causality! If read() consumes all available data but the "file has no pending data anymore" signal does not arrive in time, epoll_wait() will return a bad result. So in addition to that signal you now need an additional "peek at the current state" message in each epoll_wait() (of course, other solutions like strong ordering guarantees also work). But that introduces an unacceptable performance penalty and effectively turns epoll() into poll().

epoll is a Linux-specific system call and I don't understand why you would bring that up. There is no law that says you must implement epoll just because Linux has it. A bit OT now as this thread shouldn't be a micro vs monolithic kernel thread. The short answer is that you don't implement an epoll call; instead you use a message-based system that sends a notification message as soon as a resource is ready. Select and epoll are not representative of a microkernel or any modern operating system that is designed for asynchronous IO.

One thing that is obvious from this discussion is that monolithic kernel advocates live in the Linux way of thinking while microkernel advocates prefer a message/event-driven system. epoll and select, for me, are like: why even bother? Why recreate Linux?

Korona · Post by **Korona** » Wed Aug 30, 2017 2:16 pm

OSwhatever wrote:epoll is a Linux-specific system call and I don't understand why you would bring that up. There is no law that says you must implement epoll just because Linux has it. A bit OT now as this thread shouldn't be a micro vs monolithic kernel thread. The short answer is that you don't implement an epoll call; instead you use a message-based system that sends a notification message as soon as a resource is ready. Select and epoll are not representative of a microkernel or any modern operating system that is designed for asynchronous IO.

I brought up epoll because it is well-known and widely used.

Your argument is a non-sequitur. You first claimed that microkernels and monolithic kernels require the same level of engineering. I replied that microkernels have a much harder time supporting well-established functionality. Your response, that microkernels should not support said functionality and should do everything differently from 100% of the OSes that are found in the wild, does not counter my argument.

Schol-R-LEA · Post by **Schol-R-LEA** » Wed Aug 30, 2017 2:38 pm

gungomanj wrote: I like it hard.

...

Naah, too easy.

(OK, OK, so I do like it hard. And soft. And everything in between. What's your point?)

LtG · Post by **LtG** » Wed Sep 06, 2017 3:55 pm

Korona wrote:Your list does not contain the simplest choice: A monolithic kernel.

Everything other than a monolithic kernel will require some glue code between different components. Designing a good microkernel is much harder than designing a good monolithic kernel. Exo- and nanokernels are just extensions of the microkernel concept.

I disagree about the difference in difficulty between monolithic and micro; I think they're pretty much the same.

With micro I think it's more important to stick to your own "rules" than with monolithic, and with monolithic it can be a bit easier to shoot yourself in the foot. Overall, of the choices given I'd go with micro (I don't consider nano to exist), or possibly exo if that floats your boat.

LtG · Post by **LtG** » Wed Sep 06, 2017 3:59 pm

Korona wrote: One point that should be considered is that performance decreases with increasing abstraction. In terms of performance monolithic > micro > exo, nano. If your kernel needs a context switch to invoke the scheduler or some memory manager, you'll need twice as many context switches as a traditional kernel.

Abstraction by itself doesn't have to decrease performance, and in fact many times it doesn't.

As an example with languages (e.g. C vs asm): sure, you can achieve the same performance with asm code, but in practice you rarely do. If the abstract language is bad, then it will have bad performance; if it's good and you have a decent (or better) compiler, then you should be able to get as good or better performance _in practice_.

I don't think there are any real-world examples where good practical performance comparisons can be made between micro and mono, as no good microkernels exist. I believe micro can have similar performance to mono, but I don't have any proof of that.

I mainly object to the general "abstraction decreases performance" statement.

LtG · Post by **LtG** » Wed Sep 06, 2017 4:21 pm

I realize my three replies (including this) have been with Korona, but these aren't personal attacks =)

Korona wrote: Carefully designed microkernels should be able to recover from driver crashes. However a microkernel alone is not sufficient to contain malicious drivers. You need an IOMMU (e.g. Intel VT-d) to do that. Devices like XHCI are capable of DMA from/to arbitrary memory addresses and can thus corrupt even read-only kernel code.

First, you can use a "poor man's" IOMMU for all 32-bit devices: only use the range above 4 GiB for the kernel and processes, and let drivers use the first 4 GiB. It's not a very good solution, though.
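A rough sketch of that split, assuming a machine with more than 4 GiB of RAM and devices that really can only emit 32-bit addresses. The allocator class and its method names are invented for illustration, and the model ignores the MMIO holes and firmware regions that occupy parts of the low 4 GiB on real hardware.

```python
FOUR_GIB = 1 << 32
PAGE = 4096


class ZonedFrameAllocator:
    """Hypothetical physical frame allocator with a DMA zone and a protected zone."""

    def __init__(self, ram_bytes):
        if ram_bytes <= FOUR_GIB:
            raise ValueError("the scheme needs RAM above 4 GiB")
        self.ram_bytes = ram_bytes
        self.next_low = 0           # frames handed to drivers for DMA buffers
        self.next_high = FOUR_GIB   # frames for the kernel and ordinary processes

    def alloc_dma_frame(self):
        if self.next_low + PAGE > FOUR_GIB:
            raise MemoryError("DMA zone exhausted")
        frame, self.next_low = self.next_low, self.next_low + PAGE
        return frame

    def alloc_protected_frame(self):
        if self.next_high + PAGE > self.ram_bytes:
            raise MemoryError("protected zone exhausted")
        frame, self.next_high = self.next_high, self.next_high + PAGE
        return frame


def reachable_by_32bit_dma(phys_addr):
    """A device with 32-bit address registers cannot name addresses >= 4 GiB."""
    return phys_addr < FOUR_GIB


alloc = ZonedFrameAllocator(ram_bytes=8 << 30)
kernel_frame = alloc.alloc_protected_frame()
dma_frame = alloc.alloc_dma_frame()
print(reachable_by_32bit_dma(kernel_frame))  # False: out of the device's reach
print(reachable_by_32bit_dma(dma_frame))     # True: drivers still share this zone
```

The second result shows why this is only a partial measure: drivers can still scribble over each other's buffers in the low zone, and any device capable of 64-bit DMA is not constrained at all.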

Alternatively you could make the "kernel" (a DMA process, or something) a bit more aware and trusted, and have all drivers route their DMA requests through it; not sure how far you can get with that.

In practice IOMMU would be by far the best solution.

Korona wrote: Single-address space operating systems cannot be made safe without running trusted code (e.g. compiled from a managed language) only. At least this cannot be done without hardware assistance (e.g. something like segmentation + instrumentation, look up how Google's NaCl works).

If it can be done with hardware, it can be done with software, though for the general case the performance would be poor.

I think the main issue with "generic" languages isn't that they're unmanaged, it's pointers. So if you create a language without pointers it should be safe, and have good performance. You may need some unsafe parts (where pointers are allowed), and that part needs to be inspected or JIT'd, but JIT'ing isn't a requirement and neither is managed.

I would say that single-address space is by far the most different in general and I'm not sure if any real-world implementations have ever really existed (outside research, and excluding MS-DOS and the like). I'm of course making the assumption here that it's _secure_, not MS-DOS-like =)

LtG · Post by **LtG** » Wed Sep 06, 2017 4:23 pm

simeonz wrote:I should probably start another thread, but the more I think about the options that the OP has listed, the more confused I become. Could someone kindly shed light on some of the following questions. It would make the technical context more clear I think.

Can a microkernel perform unclean restart (after state corruption) of a driver, such as a filesystem, or even a network one, and not risk corruption of the state of user applications? Or is it meant to enable clean restarts only (i.e. deadlocked/livelocked/aged driver processes)?

Can a microkernel guarantee confidentiality of the data sent to different processes or drivers from the driver stack, without encrypting the data blocks and storing the keys outside of the normal storage, using some kind of TPM?

Can a microkernel guarantee the authenticity of the data without signing each data block that travels through its messaging system and storing the keys in a TPM?

Is the microkernel persistent security metadata for files implemented inside the filesystem driver, or inside some designated security module?

Can a microkernel schedule eagerly to decrease the servicing latency without risking to increase the number of context switches/TLB invalidations?

How similar or different are paravirtualization and microkernels that implement containers/namespaces/jails? Namely when considering that paravirtualization places higher-level drivers in separate address spaces for each guest, yet the physical space remains protected by the hypervisor, isn't that converging to microkernel architecture anyway, and what are the tradeoffs. I know that lower-level drivers used by virtualization are usually hosted in a single kernel, but I am not sure that this is fundamentally necessary and they cannot be distributed in different guests. In terms of feature set, assuming containers, filesystem snapshots, and microkernel driver processes, what additional options does paravirtualization (specifically) provide?

Can an exokernel provide simultaneously security and shared named resources on partitioned devices (memory, storage) without the use of filesystem driver or equivalent facility that is implemented in separate memory space or running in a different privilege level?

How similar or different are exokernels to virtualization using device passthrough? With device passthrough (such as SR-IOV), each guest is responsible for its own drivers that directly interact with the virtual device functions in the hardware, while the physical memory is protected by MMU and IOMMU. So, each kernel is similar to an exokernel process in a sense.

Can a single address space operating system be implemented securely without JIT-compiled/VM-running managed code? How does it compare performance-wise to microkernel designs? Can managed code be realtime without significant overhead - i.e. can garbage collection be efficient without being of the stop-the-world variety?

In general, none of the things you list are micro specific. So some microkernels might allow driver restarts while others won't. Micro really only means that you push everything out to user space (into processes).

LtG · Post by **LtG** » Wed Sep 06, 2017 4:28 pm

Korona wrote:
OSwhatever wrote:epoll is a Linux-specific system call and I don't understand why you would bring that up. There is no law that says you must implement epoll just because Linux has it. A bit OT now as this thread shouldn't be a micro vs monolithic kernel thread. The short answer is that you don't implement an epoll call; instead you use a message-based system that sends a notification message as soon as a resource is ready. Select and epoll are not representative of a microkernel or any modern operating system that is designed for asynchronous IO.
I brought up epoll because it is well-known and widely used.

Your argument is a non-sequitur. You first claimed that microkernels and monolithic kernels require the same level of engineering. I replied that microkernels have a much harder time supporting well-established functionality. Your response, that microkernels should not support said functionality and should do everything differently from 100% of the OSes that are found in the wild, does not counter my argument.

I think the point here is that a monolithic kernel (Linux/POSIX) will want epoll, while a microkernel would never have epoll, so the two can't be compared fairly.

For instance, for a microkernel you might only implement memory-mapped files, so a single message is sent to the VFS to map fileA into procA's address space and then you just read memory. If the data is not yet in memory when it is accessed, then #PF -> read it into memory.
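As a toy model of that flow: one explicit "map" message to the VFS, after which reads are plain memory accesses and missing pages are filled in on a simulated page fault. Everything here (`Vfs`, `Mapping`, the counters) is hypothetical and only meant to show where the messages and faults occur.

```python
PAGE = 4096


class Vfs:
    """Hypothetical VFS server; counts the explicit messages it receives."""

    def __init__(self, files):
        self.files = files
        self.messages = 0

    def map_file(self, name):
        self.messages += 1  # the single "map fileA into procA" request
        return Mapping(self, name, len(self.files[name]))

    def load_page(self, name, page_no):
        # Reached from the page-fault path, not from an explicit client request.
        start = page_no * PAGE
        return self.files[name][start:start + PAGE]


class Mapping:
    def __init__(self, vfs, name, size):
        self.vfs, self.name, self.size = vfs, name, size
        self.resident = {}  # page number -> bytes already in memory
        self.faults = 0

    def read_byte(self, offset):
        if not 0 <= offset < self.size:
            raise IndexError(offset)
        page_no, within = divmod(offset, PAGE)
        if page_no not in self.resident:  # stands in for #PF
            self.faults += 1
            self.resident[page_no] = self.vfs.load_page(self.name, page_no)
        return self.resident[page_no][within]


vfs = Vfs({"fileA": bytes(range(256)) * 64})  # 16 KiB, i.e. four pages
mapping = vfs.map_file("fileA")
values = [mapping.read_byte(0), mapping.read_byte(1), mapping.read_byte(5000)]
print(values)          # [0, 1, 136]
print(vfs.messages)    # 1: only the map request was an explicit message
print(mapping.faults)  # 2: pages 0 and 1 were loaded on first touch
```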

If you try to implement POSIX on top of messaging then it's probably not going to be as efficient.

The important thing here is that you need to ditch all the commonly existing stuff because the common stuff is monolithic. So you need to redesign the wheel to some extent.

Brendan, for instance, plans to reduce the messaging overhead by pooling many messages and sending them together. My plan is different, but still avoids issues created by POSIX. Thus epoll doesn't really apply.
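The general idea of pooling messages can be sketched as follows. This is not Brendan's actual design, just a generic batching queue with invented names, where the counter shows how many kernel transitions the batching saves.

```python
class BatchingSender:
    """Hypothetical message queue that enters the kernel once per batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.queue = []
        self.kernel_entries = 0
        self.delivered = []

    def send(self, msg):
        self.queue.append(msg)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.queue:
            return
        self.kernel_entries += 1  # one transition for the whole batch
        self.delivered.extend(self.queue)
        self.queue.clear()


sender = BatchingSender(batch_size=8)
for i in range(20):
    sender.send(i)
sender.flush()                # push out the partial last batch

print(sender.kernel_entries)  # 3 transitions instead of 20
print(len(sender.delivered))  # 20
```

The price is latency: a message may sit in the queue until the batch fills or someone flushes it, so a real design needs a policy for when to flush.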

Instead we should compare two potential applications (both accomplishing the same goal), where we can make some guesstimate as to their performance, one using mono and the other micro approach.

simeonz · Post by **simeonz** » Thu Sep 07, 2017 4:23 am

LtG wrote:I think the main issue with "generic" languages isn't that they're unmanaged, it's pointers. So if you create a language without pointers it should be safe, and have good performance. You may need some unsafe parts (where pointers are allowed), and that part needs to be inspected or JIT'd, but JIT'ing isn't a requirement and neither is managed.

There are quite a few options, come to think of it now. But the effect is the same - performance of a managed language. On the other hand, I agree that manually managing the resource may be more appropriate, considering that this is the only way to guarantee immediate resource reclaiming. So, you are basically right.

LtG wrote:In general, none of the things you list are micro specific. So some microkernels might allow driver restarts while others won't. Micro really only means that you push everything out to user space (into processes).

You mean that micro-kernels can target convenient development and unified APIs, or something else entirely? Because, the killer features for a micro kernel I believe are uptime and security. Let me put it this way - we don't use forks because we like the taste of metal.

OSDev.org
