the suitability of microkernels for POSIX
Posted: Thu Sep 29, 2016 1:58 pm
by abhoriel
There's a lot of talk about microkernels being unsuitable for creating POSIX-based operating systems. This is because the POSIX API (and POSIX programs in general) relies on synchronous system calls, such that in many cases each system call forces at least two context switches (e.g. app -> VFS -> app).
Is it therefore possible to create a useful microkernel-based general-purpose operating system using the POSIX API? What about QNX? Do newer features (at least on x86...) such as TLB tagging help mitigate the context switch overhead?
Could a microkernel architecture potentially scale better (on increasingly parallel SMP systems) by reducing TLB shootdowns? Surely any changes to a monolithic kernel's memory mapping (e.g. from adjusting the kernel heap size) must cause a TLB shootdown on all cores, as the monolithic kernel's memory is mapped into every application's address space. If (as in a microkernel) the operating system is divided up into several separate address spaces, TLB shootdown should be much less problematic: e.g. if the ext2 driver wants to allocate memory on the heap, causing a change to its page tables, shootdown will only need to occur on the cores which are currently using the ext2 driver's address space. Are there other instances in which the microkernel architecture could be nicely scalable?
Re: the suitability of microkernels for POSIX
Posted: Thu Sep 29, 2016 4:08 pm
by Brendan
Hi,
abhoriel wrote:There's a lot of talk about microkernels being unsuitable for creating POSIX-based operating systems. This is because the POSIX API (and POSIX programs in general) relies on synchronous system calls, such that in many cases each system call forces at least two context switches (e.g. app -> VFS -> app).
Is it therefore possible to create a useful microkernel-based general-purpose operating system using the POSIX API? What about QNX? Do newer features (at least on x86...) such as TLB tagging help mitigate the context switch overhead?
It's entirely possible (and has been done multiple times before). The problem is that you can't mitigate all the overhead; and it's easy for people to benchmark the disadvantages for end users (performance) and hard to benchmark the advantages for end users; and this makes it hard to convince end users that the advantages outweigh the disadvantages.
abhoriel wrote:Could a microkernel architecture potentially scale better (on increasingly parallel SMP systems) by reducing TLB shootdowns? Surely any changes to a monolithic kernel's memory mapping (e.g. from adjusting the kernel heap size) must cause a TLB shootdown on all cores, as the monolithic kernel's memory is mapped into every application's address space. If (as in a microkernel) the operating system is divided up into several separate address spaces, TLB shootdown should be much less problematic: e.g. if the ext2 driver wants to allocate memory on the heap, causing a change to its page tables, shootdown will only need to occur on the cores which are currently using the ext2 driver's address space. Are there other instances in which the microkernel architecture could be nicely scalable?
It is possible for a micro-kernel architecture to potentially scale better, but...
In general, there are 2 very different types of message passing (synchronous and asynchronous) that lead to micro-kernels with very different characteristics.
For synchronous message passing, when you send a request you must block waiting for a reply, which means that you can't avoid doing a task switch every time you send a request. Because task switches can't be avoided, mitigating the overhead of message passing means trying to make task switches fast. Scalability in this case is poor - when the sender and receiver are on different CPUs the overhead doubles (you do a task switch on one CPU because the sending task had to block waiting for a reply, and you have to do a task switch on another CPU because the receiving task unblocked when it received the message, so it's 2 task switches instead of one task switch from sender to receiver).
Asynchronous message passing avoids this - task switching is de-coupled from message passing (e.g. you can send 20 messages without blocking, then do one task switch). For asynchronous message passing you need buffers/message queues (typically in kernel space), and you have the overhead of managing those queues. Scalability can be very good (e.g. 2 tasks running on different CPUs can communicate without any task switching happening at all). However, you also need software designed specifically for asynchronous message passing (e.g. synchronous system calls, like POSIX, mean you can't avoid or amortise the task switches and just end up with the extra overhead of buffers/queues).
Cheers,
Brendan
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 12:27 am
by Boris
POSIX does have async IO methods.
They are not widely used because:
- People are not yet used to asynchronous programming (just look at this forum, where people are told to replace their polling drivers with interrupt-driven ones).
- You don't feel the need to use them, and you can always avoid them using things like select() (which, again, polls the OS instead of giving it a callback).
- People may think that async IO brings overhead. It certainly brings code overhead, and it may even bring runtime overhead when the data can be returned immediately by a monolithic kernel (e.g. fread() on a file already in cache).
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 12:48 am
by Brendan
Hi,
Boris wrote:POSIX does have async IO methods.
POSIX and the standard C library have async IO alternatives for "read()" and "write()", which adds up to 2 out of about 100 functions...
For a simple example, try reading a directory's contents and see how many synchronous system calls you can't avoid; then print the directory contents to stdout to see how painful it is to avoid "synchronous write to stdout".
Cheers,
Brendan
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 1:08 am
by Kevin
Don't forget that while having AIO for everything is very nice in theory, in practice using it is a PITA and even just because of the complexity most people will avoid it if there is no really compelling reason to use it.
The other point is that in many cases there is no reason at all. If you're going to read the directory contents and then print it to stdout, you just have to read the data before you can output it. There is no parallelism in there, even with AIO you'd send the read and then wait for it to complete without doing anything else in parallel. (In theory you could split reading the directory into multiple requests and then do the output in parallel, but unless you have huge directories, that will make the requests too small and hurt performance rather than helping it.) I would guess that most I/O is actually of this type where even with an AIO interface you would wait for completion and effectively do synchronous I/O.
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 1:41 am
by Brendan
Hi,
Kevin wrote:Don't forget that while having AIO for everything is very nice in theory, in practice using it is a PITA and even just because of the complexity most people will avoid it if there is no really compelling reason to use it.
For "everything asynchronous", most of the time you end up with a kind of finite state machine (with "switch(message.type)" in a loop), and once you get used to it it's not harder at all. The problem is that you do need to get used to it first.
Kevin wrote:The other point is that in many cases there is no reason at all.
In many cases you can choose between very poor performance and poor scalability, and dealing with a large number of threads and all the locking/concurrency issues to get "almost as good" (but still worse).
Kevin wrote:If you're going to read the directory contents and then print it to stdout, you just have to read the data before you can output it. There is no parallelism in there, even with AIO you'd send the read and then wait for it to complete without doing anything else in parallel.
For this trivial example; there is parallelism - you can process one piece and output it while you're waiting for the next piece to arrive (without the inconvenience of setting up a "worker thread"). More importantly, for "everything asynchronous" it's easier to do it this way than it is to do it the slower way.
Kevin wrote:(In theory you could split reading the directory into multiple requests and then do the output in parallel, but unless you have huge directories, that will make the requests too small and hurt performance rather than helping it.) I would guess that most I/O is actually of this type where even with an AIO interface you would wait for completion and effectively do synchronous I/O.
For POSIX where "asynchronous" is too painful for anyone to bother, most IO ends up being synchronous. For "everything asynchronous" where "synchronous" is too painful for anyone to bother, most IO ends up being asynchronous.
Cheers,
Brendan
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 1:55 am
by Sik
I'd imagine that parsing file formats is where it gets at its worst, though. Most of the time you simply can't move forward until you've parsed the previous chunk of data, which requires waiting for it to arrive. That's not even a case of the code being designed as synchronous; it's a case of the data being so. Even those formats designed to let you hop around still require you to read the structures first. And of course when writing the files you need to write everything in sequence; you can't pass the data in a random order. Food for thought.
That said, wouldn't an entirely asynchronous system be purely event-driven, with no concept of a main loop at all? (Possibly with multiple events being fired literally simultaneously into their own threads.)
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 2:17 am
by Brendan
Hi,
Sik wrote:I'd imagine that parsing file formats is where it gets at its worst, though. Most of the time you simply can't move forward until you've parsed the previous chunk of data, which requires waiting for it to arrive. That's not even a case of the code being designed as synchronous; it's a case of the data being so. Even those formats designed to let you hop around still require you to read the structures first. And of course when writing the files you need to write everything in sequence; you can't pass the data in a random order. Food for thought.
Even for "worst case file formats", you can parse one piece while waiting for the next piece, and/or pre-fetch the file while you're doing other stuff and/or be parsing many files at the same time.
Sik wrote:That said, wouldn't an entirely asynchronous system be purely event-driven, with no concept of a main loop at all? (Possibly with multiple events being fired literally simultaneously into their own threads.)
The simplest case involves one "main" loop, like this:
Code: Select all
for(ever) {
    getMessage();
    switch(message.type) {
        case A:
            handleA();
            break;
        case B:
            handleB();
            break;
    }
}
For more complex cases you can have multiple loops (e.g. several that are only used during initialisation, plus several more for running). This is like breaking a large finite state machine into a nested "finite state machine of finite state machines".
There are also ways (e.g. callbacks) to hide the loop(s).
Cheers,
Brendan
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 3:47 am
by Kevin
Brendan wrote:For "everything asynchronous", most of the time you end up with a kind of finite state machine (with "switch(message.type)" in a loop), and once you get used to it it's not harder at all. The problem is that you do need to get used to it first.
Yes, it is harder and results in less readable code. It's the difference between structured programming and gotos all over the place. And then you have some boilerplate code for state transitions between each asynchronous operation, you don't keep your stack contents across async ops, etc.
Such event loops are great if your program is doing rather trivial things. It stops being fun when your code becomes a little more complex.
For this trivial example; there is parallelism - you can process one piece and output it while you're waiting for the next piece to arrive (without the inconvenience of setting up a "worker thread"). More importantly, for "everything asynchronous" it's easier to do it this way than it is to do it the slower way.
My point (which you ignored by tactical quote splitting) was that with common directory sizes, you would always send one request and get one response, and splitting them into smaller pieces in order to allow things to actually run in parallel would hurt performance rather than improving it.
In other words, while there may be some programs for which this actually matters, using AIO would be premature optimisation for at least 95% of the cases where the directory entries are read and would come with costs, but no benefits.
For POSIX where "asynchronous" is too painful for anyone to bother, most IO ends up being synchronous. For "everything asynchronous" where "synchronous" is too painful for anyone to bother, most IO ends up being asynchronous.
Please show me your API that makes emulating synchronous requests hard while making async requests easy. Unless you're rather inventive with artificial restrictions in the programming language, this seems rather unlikely to happen.
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 4:26 am
by Brendan
Hi,
Kevin wrote:Brendan wrote:For "everything asynchronous", most of the time you end up with a kind of finite state machine (with "switch(message.type)" in a loop), and once you get used to it it's not harder at all. The problem is that you do need to get used to it first.
Yes, it is harder and results in less readable code. It's the difference between structured programming and gotos all over the place. And then you have some boilerplate code for state transitions between each asynchronous operation, you don't keep your stack contents across async ops, etc.
It's more readable because you're only focusing on one "event" at a time, boilerplate code exists regardless, and the stack contents are irrelevant ("state" becomes part of the finite state machine).
Kevin wrote:Such event loops are great if your program is doing rather trivial things. It stops being fun when your code becomes a little more complex.
This is true for everything that involves programming.
Kevin wrote:For this trivial example; there is parallelism - you can process one piece and output it while you're waiting for the next piece to arrive (without the inconvenience of setting up a "worker thread"). More importantly, for "everything asynchronous" it's easier to do it this way than it is to do it the slower way.
My point (which you ignored by tactical quote splitting) was that with common directory sizes, you would always send one request and get one response, and splitting them into smaller pieces in order to allow things to actually run in parallel would hurt performance rather than improving it.
You're saying that a deliberately trivial example was trivial while failing to extrapolate to anything that isn't.
Kevin wrote:In other words, while there may be some programs for which this actually matters, using AIO would be premature optimisation for at least 95% of the cases where the directory entries are read and would come with costs, but no benefits.
And now the deliberately trivial example that you've failed to extrapolate from has become 95% of all cases. Awesome.
Kevin wrote:For POSIX where "asynchronous" is too painful for anyone to bother, most IO ends up being synchronous. For "everything asynchronous" where "synchronous" is too painful for anyone to bother, most IO ends up being asynchronous.
Please show me your API that makes emulating synchronous requests hard while making async requests easy. Unless you're rather inventive with artificial restrictions in the programming language, this seems rather unlikely to happen.
You really do have no idea what you're talking about.
For mine; you can receive a message from any sender at any time. You send an "open file request", you might get a "user pressed the cursor key" message, then a "kernel is running low on memory" message, then a "process you started has terminated" message, then 1234 more unrelated messages, then the "open file reply". There is no guaranteed order (e.g. X sends a message to Y and Z at the same time; when Y receives the message from X it sends a message to Z; Z can receive the later message from Y before it receives the earlier message from X). There is also no guaranteed delivery (successfully sent does not imply it will be successfully received).
For the API, there is only "sendMessage()", "getMessage()" and "getMessageWithTimeout()".
Cheers,
Brendan
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 7:00 am
by Kevin
Brendan wrote:It's more readable because you're only focusing on one "event" at a time, boilerplate code exists regardless, and the stack contents are irrelevant ("state" becomes part of the finite state machine).
It's less readable because in an "everything asynchronous" model a single event contains almost no code, so focussing on it doesn't give you a lot of information. What's missing is the big picture, the connection between the states. It's all there, but it's less visible.
You're saying that a deliberately trivial example was trivial while failing to extrapolate to anything that isn't.
[...]
And now the deliberately trivial example that you've failed to extrapolate from has become 95% of all cases. Awesome.
Maybe what I'm saying is that a huge part of real-world applications are what you would call trivial.
For mine; you can receive a message from any sender at any time. You send an "open file request", you might get a "user pressed the cursor key" message, then a "kernel is running low on memory" message, then a "process you started has terminated" message, then 1234 more unrelated messages, then the "open file reply". There is no guaranteed order (e.g. X sends a message to Y and Z at the same time; when Y receives the message from X it sends a message to Z; Z can receive the later message from Y before it receives the earlier message from X). There is also no guaranteed delivery (successfully sent does not imply it will be successfully received).
Great. Now how is waiting for AIO completion before you do something else (i.e. emulating synchronous operations) so hard in this model that nobody would bother doing that? It seems fairly trivial to me.
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 10:23 am
by Rusky
Async event loops are a pain because the logical flow of a single conceptual process is split up across a bunch of switch cases. This is not a huge deal with straight-line code because you can still just write it in order with "case MessageN:" sprinkled in where you would ordinarily block. On the other hand, once you have conditional or looping control flow you've thrown out the ability to use the language's native control structures and that's where the pain comes in.
This is exactly the saga the node.js community is going through:
- Much like what Brendan wants, everything is asynchronous (modulo some irrelevant edge cases), and the programming model prevents simulated synchronous I/O - your code always runs in the context of a pre-existing event loop, with no way to get a new event other than returning and getting re-called.
- Plain callbacks made this hellish (unbounded nesting of functions) so they switched to a promise-based API, to remove the nesting:
Code: Select all
do_IO_1().then(function (result1) {
    return do_IO_2(result1);
}).then(function (result2) {
    return do_IO_3(result2);
}).then(function (result3) { ... });
- Promises with .then(callback) methods still prevent you from using the language's control structures, so now they're adding async/await to Javascript:
Code: Select all
let result1 = await do_IO_1();
let result2 = await do_IO_2(result1);
let result3 = await do_IO_3(result2);
So we have some empirical evidence that even in situations where people can't fall back on synchronous IO or simulate it, they still won't just "get used to it," because it is an objectively less-clear way to write a program (I'm not saying nobody will do it; for example, Nginx is written this way because C leaves no way around it). Writing event loops manually is the async equivalent of writing a program exclusively with goto statements.
Thus, async/await is a compiler feature analogous to if/while/etc structures. The source code can use typical conditional and looping control flow, but the generated code is equivalent to the hand-written switch statement. Further, the promise values that are awaited make it easy to parallelize IO by combining them with functions like whenAny() or whenAll().
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 1:27 pm
by Kevin
Thanks, Rusky.
Rusky wrote:So we have some empirical evidence that even in situations where people can't fall back on synchronous IO or simulate it, they still won't just "get used to it," because it is an objectively less-clear way to write a program (I'm not saying nobody will do it; for example, Nginx is written this way because C leaves no way around it). Writing event loops manually is the async equivalent of writing a program exclusively with goto statements.
qemu uses coroutines in order to get around the callback maze (or a single huge switch) in C, and that is a fairly nice interface to use. But of course it adds the context switch back, so I doubt Brendan will accept this one.
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 2:25 pm
by Brendan
Hi,
Kevin wrote:Brendan wrote:It's more readable because you're only focusing on one "event" at a time, boilerplate code exists regardless, and the stack contents are irrelevant ("state" becomes part of the finite state machine).
It's less readable because in an "everything asynchronous" model a single event contains almost no code, so focussing on it doesn't give you a lot of information. What's missing is the big picture, the connection between the states. It's all there, but it's less visible.
A back-end process for a spreadsheet receives a "change cell request" and has to modify that cell, re-evaluate all cells and send a list of which cells changed values back to the front-end process, using almost no code?
An arbitrary precision maths service receives a message requesting that a complex expression is evaluated, and handling this message (evaluating a complex expression with arbitrary precision rational numbers) involves almost no code?
A spell checker service receives a message requesting that a paragraph of Spanish be checked, and handling this message (trying to find all the words in a dictionary and finding "closest match" for those that are misspelt) involves almost no code?
A video driver receives nothing (a timeout expires) and it knows it needs to render the next frame, so it processes thousands of polygons to generate millions of pixels using almost no code?
The big picture is that as long as each individual message and/or timeout is handled correctly, nothing else matters.
Kevin wrote:You're saying that a deliberately trivial example was trivial while failing to extrapolate to anything that isn't.
[...]
And now the deliberately trivial example that you've failed to extrapolate from has become 95% of all cases. Awesome.
Maybe what I'm saying is that a huge part of real-world applications are what you would call trivial.
You're saying "everything is so hard that it's unreadable/unmaintainable" (above) while simultaneously saying "everything is so trivial there's no benefits" (here). Is it hard, or trivial? Pick one.
Kevin wrote:For mine; you can receive a message from any sender at any time. You send an "open file request", you might get a "user pressed the cursor key" message, then a "kernel is running low on memory" message, then a "process you started has terminated" message, then 1234 more unrelated messages, then the "open file reply". There is no guaranteed order (e.g. X sends a message to Y and Z at the same time; when Y receives the message from X it sends a message to Z; Z can receive the later message from Y before it receives the earlier message from X). There is also no guaranteed delivery (successfully sent does not imply it will be successfully received).
Great. Now how is waiting for AIO completion before you do something else (i.e. emulating synchronous operations) so hard in this model that nobody would bother doing that? It seems fairly trivial to me.
It's impossible. The closest you can get is to handle messages asynchronously as they arrive (while waiting forever for a completion that may never arrive) by copying them into your own queue for later (until your 32-bit process runs out of virtual address space after storing 1000 or more messages that are up to 2 MiB each); while everything else that is trying to communicate with you decides you're not responding, even when one of those processes that you're failing to respond to is trying to tell you to cancel the request that caused you to need the IO operation that you're hoping might complete.
Cheers,
Brendan
Re: the suitability of microkernels for POSIX
Posted: Fri Sep 30, 2016 2:49 pm
by Brendan
Hi,
Rusky wrote:Async event loops are a pain because the logical flow of a single conceptual process is split up across a bunch of switch cases.
It makes you think about software differently and takes a little getting used to.
Rusky wrote:So we have some empirical evidence that even in situations where people can't fall back on synchronous IO or simulate it, they still won't just "get used to it," because it is an objectively less-clear way to write a program (I'm not saying nobody will do it- for example, Nginx is written this way because C leaves no way around it).
We have empirical evidence that suggests that, as a huge additional bonus, another benefit of async event loops is that they may keep a "vocal minority" of web developers (a group of people that I've always had a very low opinion of) who failed to get used to Node.js away from my OS.
Sadly, we also have empirical evidence that suggests that a huge number of web developers were able to work with Node.js (which has been considerably successful because it's designed for large numbers of concurrent/async requests) or one of the many alternatives that exist because it works so well; which implies that (unfortunately) async message handling alone won't be enough to keep most "web developers" away from my OS.
Cheers,
Brendan