Fully asynchronous OS design

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Post Reply
jal
Member
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Fully asynchronous OS design

Post by jal »

Ok, in various other threads these things (i.e. principles of asynchronous message passing) have been discussed to a certain extent, and I've been thinking about designing a fully asynchronous micro kernel. A few design principles more or less agreed upon:
1) Code is wrapped in objects, objects have one or more public interfaces, sending a message means sending a set of parameters to a method of an interface.
2) No explicit threads. When passing a message to another object, a new thread is started, dealing with that message.
3) Fully asynchronous message passing, when sending a message, the sender continous directly. The answer to this message, if any, is sent to a method of the calling object (automatically creating another thread).
4) The only non-message 'call' that can be made is the 'send message' (and perhaps 'reply message') call to the kernel. All other communication between objects (including kernel objects) is done via message passing.

Ok, so far so good. Keep in mind, this is purely a research OS, I know it's gonna be slow as hell etc. (especially since the kernel needs to send a message to the memory manager upon receipt of a send message request from an object, to allocate some memory for the receiving proces before copying or paging in the message).

One of the question that have arisen is this: the kernel (or better, the kernel objects) provide services like synchronisation primitives (semaphores, critical sections, etc.). To request e.g. a criticial sections, a message is sent to the kernel. This is, like any other message asynchronous. But this means that before I actually get the cs, I have to wait for the answer of the kernel object. But that'll be sent to me asynchronously, in a seperate thread. So what do I do in the mean time? Spinlocking until the receiving thread has set some variable to indicate we have the cs? Seems too cumbersome. However, making requesting a cs synchronous instead of asynchronous also seems not the best solution, since I don't like some synchronicity kreeping in an otherwise asynchronous system. Your ideas?


JAL
jvff
Member
Member
Posts: 46
Joined: Sun Oct 24, 2004 11:00 pm

Post by jvff »

Hello,

I'm working in a similar design for an application framework, although with different goals. My solution to synchronization is inspired by contracts, state-machines and continuations. When you ask something that is a blocker, you also send the destination (continuation, which in general is another state of your program, designed to specifically respond to that message, and this state can be dynamically generated by code), and then let the sending state finish and die. When the kernel is done processing, it sends the result to your other state, and you continue processing. As you can see, your program becomes a state machine, in a way similar to the actor's model. It also resembles the "design-by-contracts" present in IIRC Sing# programming language from Singularity. As for speed, I plan to use some partial evaluation schemes to prevent excessive messaging (this is for the future as current plan is to make something work). Hope this helps,

JVFF
jal
Member
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Post by jal »

Just your lame bump, since I know there are some people out here that could possibly have some answers for me.


JAL
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Fully asynchronous OS design

Post by Brendan »

Hi,

Random notes, all "IMHO" and all not guaranteed to be correct or sane...
jal wrote:2) No explicit threads. When passing a message to another object, a new thread is started, dealing with that message.
How expensive is sending a message to an existing thread, and how much overhead would spawning an entirely new thread add to this?

How do you maintain state? For example, if threadA sends "set mode FOO" to some sort of server process and threadB sends "set mode BAR" to the same server process, then how would the server process know which mode subsequent messages from each thread are meant to be using?
jal wrote:3) Fully asynchronous message passing, when sending a message, the sender continous directly. The answer to this message, if any, is sent to a method of the calling object (automatically creating another thread).
The only good reason to use asynchronous messaging is to allow a single thread to control many things at the same time. If the reply message/s spawn an entirely new thread, then you've lost the main reason to use asynchronous messaging and you're left with no reason for asynchronous messaging at all.

For example, consider opening a file. If the "asynchronous fopen()" reply spawns a new thread then it'll be very similar (similar overhead and characteristics) to spawning a new thread and then doing a synchronous "fopen()" with the new thread. The only difference is the new thread is spawned before the "synchronous fopen()" instead of after the "asynchronous fopen()".
jal wrote:4) The only non-message 'call' that can be made is the 'send message' (and perhaps 'reply message') call to the kernel. All other communication between objects (including kernel objects) is done via message passing.
That always sounds nice in theory, but in practice it's a massive performance black hole that's continually sucking CPU cycles for "No Good Reason(tm)"... ;)

From your original message ("I know it's gonna be slow as hell", what you said about synchronisation primitives, etc) it seems obvious that you already know the design has problems. As a purely research OS, what do you hope to achieve? Do you want to spend years developing this thing and then say to yourself "I was right - it's a slow piece of crud!"?...


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
jal
Member
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: Fully asynchronous OS design

Post by jal »

Brendan wrote:
jal wrote:2) No explicit threads. When passing a message to another object, a new thread is started, dealing with that message.
How expensive is sending a message to an existing thread, and how much overhead would spawning an entirely new thread add to this?
I'm not looking at it for being expensive or not, but solely as an abstraction. A thread is started as the result of receiving a message, when sending a message to an object, a thread is created so the object can handle the message. When finished handling, the thread is killed again. You cannot send a message to a thread, only to an object.
How do you maintain state? For example, if threadA sends "set mode FOO" to some sort of server process and threadB sends "set mode BAR" to the same server process, then how would the server process know which mode subsequent messages from each thread are meant to be using?
I don't see any problem here that would not also exist when having a single thread accepting messages sequentially. I would expect any server proces to have a state or environment per requesting object (or perhaps per proces).
jal wrote:3) Fully asynchronous message passing, when sending a message, the sender continous directly. The answer to this message, if any, is sent to a method of the calling object (automatically creating another thread).
The only good reason to use asynchronous messaging is to allow a single thread to control many things at the same time. If the reply message/s spawn an entirely new thread, then you've lost the main reason to use asynchronous messaging and you're left with no reason for asynchronous messaging at all.
I'm not sure I agree. Say I want to query a remote database, or several different remote databases, with independent queries, and assemble something after that. Then I can asynchronously do all the queries, then wait until I got all the answers. Whether waiting for the answers involves a WaitForMultipleObjects (or whatever Windows calls it) or waiting in a different way (some semaphore set by the reply-receiving threads), that doesn't matter.
For example, consider opening a file. If the "asynchronous fopen()" reply spawns a new thread then it'll be very similar (similar overhead and characteristics) to spawning a new thread and then doing a synchronous "fopen()" with the new thread. The only difference is the new thread is spawned before the "synchronous fopen()" instead of after the "asynchronous fopen()".
Yes, but as I said, I cannot spawn a thread directly. Threads are spawned as a by-result of sending a message. If I need a worker thread, I send a message to myself.
jal wrote:4) The only non-message 'call' that can be made is the 'send message' (and perhaps 'reply message') call to the kernel. All other communication between objects (including kernel objects) is done via message passing.
That always sounds nice in theory, but in practice it's a massive performance black hole that's continually sucking CPU cycles for "No Good Reason(tm)"... ;)
I am aware that it sucks CPU cycles that aren't consumed in a more direct 'kernel calling' approach. But I'm talking about a micro/nano kernel anyway, so there's little functionality in the kernel itself. And the Good Reason(tm) is the cleanness of design.
From your original message ("I know it's gonna be slow as hell", what you said about synchronisation primitives, etc) it seems obvious that you already know the design has problems.
Only from a performance point of view.
As a purely research OS, what do you hope to achieve? Do you want to spend years developing this thing and then say to yourself "I was right - it's a slow piece of crud!"?...
I don't care about speed, really. Responsiveness is more important, and I think that's easily achievable (look at e.g. BeOS with it's massive threading, although approached in a different way). What I hope to achieve is an OS that has a design (regardless of implementation details) that conforms as much as possible to what I (and my partner in crime) consider elagant. That's not to say that what I described above is the final design, but just something that we are currently leaning towards.


JAL
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Fully asynchronous OS design

Post by Brendan »

Hi,
jal wrote:
jal wrote:3) Fully asynchronous message passing, when sending a message, the sender continous directly. The answer to this message, if any, is sent to a method of the calling object (automatically creating another thread).
The only good reason to use asynchronous messaging is to allow a single thread to control many things at the same time. If the reply message/s spawn an entirely new thread, then you've lost the main reason to use asynchronous messaging and you're left with no reason for asynchronous messaging at all.
I'm not sure I agree. Say I want to query a remote database, or several different remote databases, with independent queries, and assemble something after that. Then I can asynchronously do all the queries, then wait until I got all the answers.
You can do something very similar with synchronous messaging, by spawning many threads that each send one synchronous request and receive one synchronous reply. It's similar in that you end up with one thread per reply in both cases, and the original thread continues directly in both cases. Even though the synchronous messaging is entirely different to what you're proposing, it has extremely similar behaviour in practice.

Also (for your system and for the alternative synchronous messaging system) usually you can't "assemble" these replies without some sort of lock (to ensure the data being assembled isn't corrupted instead). This implies that only one of the many threads can really do much useful work, because only one of the many threads can have the lock at any given time. For e.g. imagine 100 threads where 99 threads are blocked on a semaphore, where the scheduler needs to do 99 thread switches (and the lock needs to be acquired and freed 100 times) to get everything done. In this case, using a single thread would be just as "un-scalable" (but much more efficient due to less thread switches, etc).

Worse, you haven't considered temporal problems yet (nothing guarantees that messages received are handled in order). For example, if your application is recieving keypresses via. messaging, the user might type "abcdefg" and the buffer might end up containing "dabecgf" instead because the order of the keypresses depends on the order that the new threads acquire the keypress buffer's lock, and doesn't depend on the order that the "keypress messages" were sent.
jal wrote:That's not to say that what I described above is the final design, but just something that we are currently leaning towards.
Before you start calling it "clean design" and "elegant", show me some pseudo-code that copies 4 files (e.g. "foo1.txt" copied to "bar1.txt","foo2.txt" copied to "bar2.txt", etc). This pseudo-code should handle files of any size, and should use your asynchronous "spawn lots of threads for no reason" messaging for file I/O instead of standard C functions like "fopen()", "fclose()", "fread()", "fwrite()", "fseek()", etc.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
jal
Member
Member
Posts: 1385
Joined: Wed Oct 31, 2007 9:09 am

Re: Fully asynchronous OS design

Post by jal »

Brendan wrote:You can do something very similar with synchronous messaging, by spawning many threads that each send one synchronous request and receive one synchronous reply. It's similar in that you end up with one thread per reply in both cases, and the original thread continues directly in both cases. Even though the synchronous messaging is entirely different to what you're proposing, it has extremely similar behaviour in practice.
I agree, but that is basically what all current OSes do. Also, you'd need to manually spawn the threads. I'd like to avoid that.
Also (for your system and for the alternative synchronous messaging system) usually you can't "assemble" these replies without some sort of lock (to ensure the data being assembled isn't corrupted instead).
I'm sure there's a way around it, and even in cases where there isn't, usually the waiting for a reply takes far more time then the actual assembling. So if all threads wait in parallel for their respective replies, when they start coming in, it's really not that much of a problem to have to wait to assemble.
Worse, you haven't considered temporal problems yet (nothing guarantees that messages received are handled in order). For example, if your application is recieving keypresses via. messaging, the user might type "abcdefg" and the buffer might end up containing "dabecgf" instead because the order of the keypresses depends on the order that the new threads acquire the keypress buffer's lock, and doesn't depend on the order that the "keypress messages" were sent.
Indeed. I hadn't considered it yet (though no doubt it would've come up, we're just in the very early stages of designing), and it needs further study. Of the top of my head I'd say that 1) the keyboard is handled by a driver, which is synchronous at least for its ISR, 2) the driver can thus cache the keystrokes in synchronous order and 3) the user program can therefore receive them in synchronous order. This does mean that the driver doesn't blindly fires any keystroke at the user program: the keystroke must be handled before a new one can be send.
Before you start calling it "clean design" and "elegant", show me some pseudo-code that copies 4 files (e.g. "foo1.txt" copied to "bar1.txt","foo2.txt" copied to "bar2.txt", etc).
I said we're aspiring a clean and elegant design, not that what we have come up with in this very early stage of design is already clean and elagant. Your reply is exactly the kind we hoped for, to point out potential weaknesses and see if we can take care of this without abandoning cleanness and elegantness. If we cannot, we have to design it another way.

You see, the reason for making an OS is not that we want to make 'just another OS' - I have made a kind of primitive Windows 3.1 on top of DOS in my days (before I got a full-time job, got married, got kids, etc.) writing a scheduler, device drivers for keyboard, mouse, VGA, etc., and the guy I'm designing this OS with has actually already written an OS before, for his graduation thesis from university (together with his cousin). So for us, it's not the sport of getting that first 'Hello World' kernel, but to have something running that's not even close to mainstream (for example, his thesis OS was rediculously object oriented, with objects having code and data streams, interfaces that could be inherited, no objects with only data, fully transparent remote method invocation etc. - highly impracticle for a usable OS, but very nice as a research one).


JAL
mrvn
Member
Member
Posts: 43
Joined: Tue Mar 11, 2008 6:56 am

Post by mrvn »

Hi,

I have similar ideas for my own kernel but I don't think fully asynchronous message passing is a good thing for everything.

The obvious problem will be memory allocation. To allocate memory you send a message. Wait, where do you put that message? Lets allocate some memory for it first.

Ok, simple messages can be passed in registers. So the sender doesn't need any memory.
But the reciever (or the kernel for the reciever) does need some memory for the task/thread structure. Again you deadlock in a vicious cycle.

Other problems have been mentioned here too, like loosing state between the sending of a message and recieving of the reply.


So my design is somewhat different. First I have some special messages that are always synchronous. First among them is memory management. Basically everything the kernel has to do is special. Only the kernel has access to the hadware and can create threads, allocate physical memory, handle IRQs and such. So malloc(), free(), fork(), send(), recieve(), create_message_box() are always synchronous.

For other things the type of message box used decides what happens. So far I have 2 types planed: queued_message_box and thread_message_box. A queued message box will just buffer incomming messages which the process can then recieve(). A process can use such a message box to send a synchronous message, which means an implicit recieve() for the reply.
A thread_message_box on the other hand will start a new thread for every message it recieves and give that message as argument to the thread. Maybe I will throw in a combination of the two. Buffer the messages up to a limit and then start another thread.

All system services like filesystem or networking I plan to implement with thread_message_boxes. So opening a file sends an 'open(file)' message to the filesystem service, a new thread start, validates the open request, opens the file and replies with a file descriptor. The thread could die then and let a new thread handle subsequent read/write requests. But I think I will keep the thread a live and have the file descriptor carry a private message box to that thread. That way the filesystem can contain the state of the open file in the thread.


As for speed I don't see thread creation as a big problem. A thread would share the address space of the recieving application so you don't have to create new page tabels and such for it. You do need a stack for it so some memory allocation is involved. It might be prudent to allocate stack in bulk and recycle stacks from dying threads. Create an array of pointers of maybe size 500 (some header + 500 64bit pointer == 1 4k page). If the array is empty allocate new stacks up to half the array size, otherwise use a stack from there to create a thread. When a thread dies add the stack to the array. If it is full free half the stacks in there.

Actualy don't do this for stacks but for the full thread structure. So in most cases you will just have to zero out the structure, copy the message into it and start the thread. Quick and easy.

And instead of an array use a SLAB/SLUB/SLOB you might already have to manage a set of equal sized structures.

MfG
Goswin
Life - Don't talk to me about LIFE!
So long and thanks for all the fish.
Post Reply