Is asynchronous IPC preventing us from using parallelism?

OSwhatever · Post by **OSwhatever** » Sat Jun 02, 2012 3:57 am

Let's take an example first, we want to read from two files at the same time assuming the HW and underlying SW can handle it. In the case you have synchronous IPC you create two threads, open files, read them using synchronous IPC and the exit the thread when it has been done. In the case we have asynchronous IPS we have the possibility to do this in one thread, we can open and start read requests in the same thread. Here we must a have some kind of message loop and also we must have some mechanism to distinguish which message belongs to which file. Of course with asynchronous IPC starting two threads is possible as well however asynchronous IPC often do mind tricks with us so that we get trap in the serialization way of thinking not exploiting the possibility of starting new threads. With only synchronous IPC we would have to start new threads.

Another example, we send data to some kind of interface, it is slow like RS232 for example. Here we have use of the benefits of asynchronous IPC, serialization and non blocking send. However, since the interface is slow we can send much faster than the interface can send on its end. Now we have two possibilities, to discard data or stall until the interface is ready to accept more data. I think most of the time we don't want to lose any data so we must wait. In practice on the interface driver end, you probably have implemented this with two threads if we only have synchronous IPC. One sending thread and one thread accepting data which puts the data in a buffer. Using two threads gives us almost non blocking send (still we have task switch but the blocking is short). When the buffer is full, the buffer thread simply don't go into receive state preventing the rendevouz. Now we have implemented flow control in simple way. This is of course possible with asynchronous IPC as well, either by waiting for a reply from the interface thread or having some kind of blocking mechanism in the kernel when the message pool is full. Blocking when the message pool is full is much less trivial than the rendevouz block.

So in practice asynchronous IPC doesn't help us in that much in real scenarios and might rather discourage us from using the parallelism provided by modern CPUs. You often fall into the trap of serialization using asynchronous IPC. Also asynchronous IPC require more involvement from the kernel when in comes to message pool management and states it must support. The benefit of asynchronous IPC is less task switching but as you could see with the second example, we often don't get rid of it anyway.

Having asynchronous IPC adds to the complexity a lot and also we have to support more different scenarios that can happen with asynchronous IPC, the complexity just increases for very little benefit. My kernel supports both asynchronous and synchronous IPC but I'm starting to wonder if supporting asynchronous IPC is really worth the effort. It requires much more code and involvement from the kernel and adding security to it just makes even more complex. Mach kernel is a good example of this where the IPC was slow and complex and not received very well. People want simplicity. This is perhaps deja vu for many of you and it also reflects the development from Mach to L4 kernels which only supports synchronous IPC. What do you think (question is in the title)?

Combuster · Post by **Combuster** » Sat Jun 02, 2012 4:28 am

The error you are making here is that asynchronous IPC doesn't require you to wait for a reply. Try plotting pixels on a graphics device: you do not want to switch tasks for each individual primitive drawn because the receiver needs to acknowledge them.

You are also mixing up asynchronous IPC with threading: you don't need sender or receiver threads for performance and can handle most of your work from one event loop, which means less chance of code errors caused by improper thread synchronisation. It is much easier to synchronize asynchronous messaging, than to make synchronous messaging asynchronous - the latter can only be done via kludgy solutions that eat resources from elsewhere.

turdus · Post by **turdus** » Sat Jun 02, 2012 6:44 am

OSwhatever wrote:Now we have two possibilities, to discard data or stall until the interface is ready to accept more data. I think most of the time we don't want to lose any data so we must wait.

There are other possibilities as well: most microkernel use a kind of message queue. This gave you the freedom of passing the message to the destination thread and move on, and only block if the queue is full (happens not so often). Queues solve the problem of serialization quite well. Also, queues can be implemented lock-free reasonably easily (which outperforms any other lock based implementation).

Colonel Kernel · Post by **Colonel Kernel** » Sat Jun 02, 2012 8:11 am

OSwhatever wrote:Let's take an example first, we want to read from two files at the same time assuming the HW and underlying SW can handle it. In the case you have synchronous IPC you create two threads, open files, read them using synchronous IPC and the exit the thread when it has been done. In the case we have asynchronous IPS we have the possibility to do this in one thread, we can open and start read requests in the same thread. Here we must a have some kind of message loop and also we must have some mechanism to distinguish which message belongs to which file. Of course with asynchronous IPC starting two threads is possible as well however asynchronous IPC often do mind tricks with us so that we get trap in the serialization way of thinking not exploiting the possibility of starting new threads. With only synchronous IPC we would have to start new threads.

This example is conflating two separate ideas: Asynchronous IPC, and asynchronous I/O in general. In the case of reading from two files, you are not actually getting less parallelism from having only one thread, as long as you can issue the I/O read requests as fast as possible. In a server scenario, using synchronous/blocking I/O like in this example will not scale since every blocked thread is wasting a lot of memory (there is a whole stack just sitting there doing nothing until the I/O completes). Imagine having to handle tens or hundreds of thousands of requests per second... that's a lot of thread stacks wasting space. Async/non-blocking I/O is way more scalable in this case.

Of course, you can implement async I/O in your OS using either synchronous or asynchronous IPC, so they are mostly separate issues, but at some point the same scalability principles are going to apply.

OSwhatever · Post by **OSwhatever** » Sat Jun 02, 2012 1:41 pm

turdus wrote:
OSwhatever wrote:Now we have two possibilities, to discard data or stall until the interface is ready to accept more data. I think most of the time we don't want to lose any data so we must wait.
There are other possibilities as well: most microkernel use a kind of message queue. This gave you the freedom of passing the message to the destination thread and move on, and only block if the queue is full (happens not so often). Queues solve the problem of serialization quite well. Also, queues can be implemented lock-free reasonably easily (which outperforms any other lock based implementation).

The problem with asynchronous message queues is that if you are going mix messages that is coming in like any asynchronous system is that you have to have way to distinguish all message. Possibilities are channels, message queues all associated with channels (also called ports), these has to be instantiated for every client each time, a lot of channels. Also, then you can only receive from one channel at one time, so you lose the concurrency now. Other ways are global message id but this is mostly appropriate for embedded systems where you can at compile time assign message id for the entire system.

One message queue becomes difficult because you have no idea what comes in and can easily confuse the program. Synchronous IPC solves this as you have a determined from where you are going to receive the message from the beginning, other thread must wait.

QNX is based on synchronous messages, L4 and successors use synchronous messages discarding the asynchronous messages in Mach kernel. Why did they do this?

Brendan · Post by **Brendan** » Sat Jun 02, 2012 10:37 pm

Hi,

OSwhatever wrote:The problem with asynchronous message queues is that if you are going mix messages that is coming in like any asynchronous system is that you have to have way to distinguish all message.

Yes, normally you'd have something to identify the message. Normally I have a "sender ID" (set by the kernel and guaranteed to be "unforgeable" so that it can be used for security purposes), a "message type", a "message length" and the message data.

Software tends to have one (or more) message handling loops, which call "getMessage()" and then have a "switch(messageType)". Often there's a state machine involved too, where different cases in the "switch()" change the current state. For example:

Code: Select all

    int state = 0;
    do {
        getMessage();
        switch(message->type) {
        case FOO:
            handle_foo();
            state = 1;
            break;
        case BAR:
            handle_bar();
            state = ;2
            break;
        case TERMINATE:
            exit();
            break;
        default:
            /* Unknown or unexpected message type */
        }
    } while(state != 2);

OSwhatever wrote:Possibilities are channels, message queues all associated with channels (also called ports), these has to be instantiated for every client each time, a lot of channels. Also, then you can only receive from one channel at one time, so you lose the concurrency now. Other ways are global message id but this is mostly appropriate for embedded systems where you can at compile time assign message id for the entire system.

I have one message queue per thread, where threads just send messages to other threads (using a "thread ID"). A thread can only handle messages one at a time, but that thread can send an unlimited number of "requests" and be waiting for an unlimited number of "replies" and be managing an unlimited amount of work. For an example, here's pseudo-code for loading 1000 files at the same time:

Code: Select all

int fileStates[1000];
int filesCompleted = 0;

int initialise(void) {
    int handle;

    // Send requests to open 1000 files

    for(handle = 0; handle < 1000; handle++) {
        sendOpenRequest(handle);
        fileStates[handle] = WAITING_FOR_OPEN_REPLY;
    }

    // Message handling loop

    do {
        getMessage(messageBuffer);
        switch(messageBuffer->messageType) {
        case OPEN_REPLY:
            status = handleOpenReply(handle);
            if(status == OK) {
                sendReadRequest(handle);
            } else {
                fileStates[handle] = FAILED_TO_OPEN;
                filesCompleted++;
            }
            break;
        case READ_REPLY:
            status = handleReadReply(handle);
            if(status == OK) {
                sendReadRequest(handle);   // Keep reading until EOF
            } else if(status == EOF) {
                fileStates[handle] = WAITING_FOR_CLOSE_REPLY;
                sendCloseRequest(handle);
            } else {
                fileStates[handle] = FAILED_TO_READ;
                filesCompleted++;
            }
            break;
        case CLOSE_REPLY:
            status = handleCloseReply(handle);
            if(status == OK) {
                fileStates[handle] = COMPLETED_OK;
                filesCompleted++;
            } else {
                fileStates[handle] = FAILED_TO_CLOSE;
                filesCompleted++;
            }
            break;
        }
    } while(filesCompleted < 1000);

    // Check results

    status = 0;
    for(handle = 0; handle < 1000; handle++) {
        if(fileStates[handle] != COMPLETED_OK) {
            status++;
        }
    }
    if(status != 0) {
        printf("Failed to read %d files!\n". status);
        exit();
    }
}

For this example, the thread can only handle one message at a time, but it can be in the middle of loading 1000 files at the same time. If the VFS is multi-threaded and is talking to 10 different file systems spread across 4 different physical disks, then a lot of things would be happening (in different processes) in parallel. Of course if the VFS can keep up (e.g. file data is in the VFS's cache/s) then the VFS could be running on other CPUs and this thread may not block at the "getMessage()" - basically, under ideal circumstances, you might get zero thread switches.

To do the same with synchronous messaging you'd have to use 1000 threads, which wastes resources (1000 "thread control blocks", 1000 thread stacks, etc) and causes a lot of thread switching overhead in the scheduler.

Of course there's no reason that (for asynchronous messaging) a process can't have several threads doing this. For example, on a system with 8 CPUs you might have 4 threads loading 250 files each (rather than 1 thread loading 1000 files). In this case the VFS (and file systems, disk drivers, etc) might be running on the other 4 CPUs; and under ideal circumstances you might still get zero thread switches.

To do the same with synchronous messaging you'd have to use 1000 threads that are constantly blocking waiting for replies. If each thread does "OPEN" then blocks (thread switch to VFS then thread switch back), then does one "READ" then blocks (thread switch to VFS then thread switch back), then does "CLOSE" then blocks (thread switch to VFS then thread switch back); then under ideal circumstances you're looking at a 6 thread switches per thread/file, or a total of 6000 thread switches.

For asynchronous; if there is only 1 CPU (VFS can't be running on a different CPU so "zero thread switches" isn't possible), the thread could send 1000 "OPEN" requests, then you can do a thread switch and the VFS can send 1000 replies, then you can do a thread switch back and handle those 1000 replies, etc. In this case, under ideal circumstances, you might still only need 6 thread switches to load 1000 files.

OSwhatever wrote:QNX is based on synchronous messages, L4 and successors use synchronous messages discarding the asynchronous messages in Mach kernel. Why did they do this?

For synchronous messaging you can create wrappers that make it all look like a normal function calls; which is easier for programmers to deal with (you don't need the "message handling loop with state machine" stuff which can make it harder to see program flow) and this makes it a lot easier to support legacy crud (like the standard C/POSIX library). Mach was slow (mostly due to poor design), and people tested it on single-CPU systems (where the scalability advantages of asynchronous messaging got ignored), and they were trying to do the "legacy crud" thing (e.g. applications using the standard C/POSIX library rather than applications that take advantage of asynchronous messaging).

Cheers,

Brendan

OSwhatever · Post by **OSwhatever** » Sun Jun 03, 2012 2:20 pm

Based on you example Brendan, handleOpenReply and those function usually send messages themselves and have to wait for a reply. In order to do so with asynchronous messages, first you have to cherry pick the correct response message for your action in the queue, for example openFile to some filesystem because while you're waiting for the openFile response while other requests will be raining in. So now you have lost the concurrency because you have to wait for the response. You easily end up with send and wait for reply, even without being aware of it. In order to pick the correct reply message you have to use a specific channel or filter based message content which is another complexity to deal with.

Easiest way to solve this is of course by synchronously wait for the reply. To asynchronously handle the replies as well, the message loop has to be expanded, you have to store the requests so that you pair the reply message with each requests in order to know what to do with it. The programming convenience with asynchronous messages is easily disturbed and and gets complicated quickly.

Could we have middle way solution to this? Using synchronous messages but only having a bunch of threads waiting , let's say 8 instead of 1000 in your example. When 8 is reached the we have to stall the new requester threads in order to wait for a thread to finish. This is not maximum concurrency but a middle way of programming friendliness and concurrency.

Brendan · Post by **Brendan** » Mon Jun 04, 2012 1:47 am

Hi,

OSwhatever wrote:Based on you example Brendan, handleOpenReply and those function usually send messages themselves and have to wait for a reply.

No - only the functions "sendOpenRequest()", "sendReadRequest()" and "sendCloseRequest()" send messages; and none of these wait for a reply (they only send). The "handleOpenReply()" function doesn't send messages or receive messages (it only checks to see the file was opened or if there was an error). The "getMessage();" function at the start of the message handling loop is the only place where you wait for any messages.

Also note that messages may be received in any order. You might send a "READ_REQUEST" message for one file, then receive an "OPEN_REPLY" message from a previous "OPEN_REQUEST" (for a different file), then receive a "MOUSE_MOVED" message from the GUI, then receive the "READ_REPLY". This is why the state machine ("int fileStates[1000];" and "int filesCompleted = 0;" in the example) is needed.

Think of it like people using email. You might be arranging a party, and send 20 emails to different people inviting them to the party, one email to a caterer to arrange food, one email to hire some chairs and one email to purchase decorations. Then, after you've sent all these emails, you might wait for the replies. The replies might arrive in any order. You handle them one by one as they arrive; until you've finished arranging the party (all guests have replied, catering has been arranged, chairs have been hired, decorations have been purchased). This is the message handling loop. If you receive a reply from the caterer asking if you want pizza or hotdogs then when you're handling that email you might send another email to the caterer saying you want pizza, and then continue handling other emails from other people - eventually the caterer might send another email saying "pizza confirmed" (but you didn't stop everything just to wait for that).

The alternative would be to send one email and wait for a reply, then send the next email and wait for a reply, etc. That would be completely stupid and take far too long (but would be easier to keep track of). That would be "synchronous".

OSwhatever wrote:In order to do so with asynchronous messages, first you have to cherry pick the correct response message for your action in the queue, for example openFile to some filesystem because while you're waiting for the openFile response while other requests will be raining in. So now you have lost the concurrency because you have to wait for the response. You easily end up with send and wait for reply, even without being aware of it. In order to pick the correct reply message you have to use a specific channel or filter based message content which is another complexity to deal with.

Easiest way to solve this is of course by synchronously wait for the reply. To asynchronously handle the replies as well, the message loop has to be expanded, you have to store the requests so that you pair the reply message with each requests in order to know what to do with it. The programming convenience with asynchronous messages is easily disturbed and and gets complicated quickly.

No. You can handle messages as they arrive, regardless of which order they arrive. There's no need for cherry picking responses, or waiting for a specific response, or storing replies and pairing them with requests.

Imagine the "loading 1000 files" example, but with extra code to setup a dialog box and handle messages from the GUI ("WINDOW_MOVED", "CANCEL_BUTTON_CLICKED", etc). Every time you get a "READ_REPLY" from the VFS you might send a message to the GUI telling it to update a progress bar. Now you've got one thread that is loading 1000 files and handling a dialog box; all at the same time. Now let's add more code - when a file is finished loading, you want to calculate a checksum for that file and send a packet over the network saying the file name and its checksum. Now you've got one thread that is loading 1000 files and handling a dialog box and calculating checksums and sending data over the network; all at the same time.

Now imagine this is a service (not an application) that accepts request from applications that contain a list of file names, a "GUI reference" and an IP address. When it receives a request like this from an application, it tells the specified GUI to display a dialog box and progress bar and keeps it up to date, and reads all the files in the supplied list, calculates checksums and sends results to the desired IP address; then sends a reply back to the application saying how everything went. Of course it'd just add it to the list of things it's currently doing (it would handle many requests from many applications at the same time, and wouldn't handle requests from applications one at a time). Now you've got a single thread that can be loading many different groups of thousands of files, handling many dialog boxes on many different GUIs and sending data over the network to many different remote computers; all at the same time.

Now let's say we want to have a window containing statistics about this thread; that shows how many groups of files it's currently working on, the current progress of each group of files, plus maybe current disk throughput and current network throughput (KiB per second), etc. The thread could ask the kernel to send it a message every 100 ms, and when it receives this message it could update its statistics window. Now you've got a single thread that can be loading many different groups of thousands of files, handling many dialog boxes on many different GUIs and sending data over the network to many different remote computers, and handling its own "statistics window"; all at the same time.

Now let's say that the kernel sends a message to our thread whenever an application terminates (saying which application terminated). When an application terminates, the thread checks if it is processing a list of files for that application, and if it is it cancels that list of files (sends "CLOSE_REQUEST" for any files that are in the middle of being read, etc). Now you've got a single thread that can be loading many different groups of thousands of files, handling many dialog boxes on many different GUIs and sending data over the network to many different remote computers, and handling its own "statistics window", and monitoring application terminations; all at the same time.

Now let's say that software can ask the VFS to send a notification if a file is changed. Each time our busy little thread calculates a checksum for a file it could store the file name and checksum in some sort of cache, and ask the VFS to let it know if/when the file changes. That way if several applications want a checksum for the same file our thread can check its cache and avoid reading the file and recalculating the checksum. Now you've got a single thread that can be maintaining it's own cache and monitoring millions of files, and loading many different groups of thousands of files, handling many dialog boxes on many different GUIs and sending data over the network to many different remote computers, and handling its own "statistics window", and monitoring application terminations; all at the same time.

Our single thread can do all of this with one message loop, where it only ever waits for received messages in one place (at the start of the message loop). All we've done is add more "case" statements to the "switch(messageType)".

Cheers,

Brendan

rdos · Post by **rdos** » Mon Jun 04, 2012 3:14 am

Brendan, your solution would make sense for things like a server, but it seems like an awful lot of overhead and extra code for an application that just opens a file, reads some bytes and closes it again. You would need a wrapper so such usage of your API is simple for people that don't like message loops for simple file operations. This wrapper also needs to run at similar speed as typical non-parallel implementations, otherwise people might think your implementation is slow (even if it is highly efficient when used with many files).

Solar · Post by **Solar** » Mon Jun 04, 2012 3:37 am

rdos wrote:Brendan, your solution would make sense for things like a server, but it seems like an awful lot of overhead and extra code for an application that just opens a file, reads some bytes and closes it again.

I would expect such an application using the fopen() / fread() / fclose() API, not an async one.

Brendan · Post by **Brendan** » Mon Jun 04, 2012 3:43 am

Hi,

rdos wrote:Brendan, your solution would make sense for things like a server, but it seems like an awful lot of overhead and extra code for an application that just opens a file, reads some bytes and closes it again. You would need a wrapper so such usage of your API is simple for people that don't like message loops for simple file operations. This wrapper also needs to run at similar speed as typical non-parallel implementations, otherwise people might think your implementation is slow (even if it is highly efficient when used with many files).

It's impossible to create wrappers around "fully asynchronous services" that don't destroy all the advantages. If you don't like message loops then I refuse to allow you to destroy the advantages, and you should go away and stink up some cruddy old *nix clone instead.

Brendan wrote:Mach was slow (mostly due to poor design), and people tested it on single-CPU systems (where the scalability advantages of asynchronous messaging got ignored), and they were trying to do the "legacy crud" thing (e.g. applications using the standard C/POSIX library rather than applications that take advantage of asynchronous messaging).

Cheers,

Brendan

gerryg400 · Post by **gerryg400** » Mon Jun 04, 2012 4:13 am

The problem with asynchronous messaging in clients like this is that it might be considered an optimisation. It's pretty clear that it can provide improvement in speed and utilisation but unless the client requires improvements in those areas it may be considered not worth the effort.

For server processes on the other hand, this type of programming has long been considered a requirement.

I think improvements in languages and tools will help popularise asynchronous messaging systems.

rdos · Post by **rdos** » Mon Jun 04, 2012 5:44 am

Brendan wrote:It's impossible to create wrappers around "fully asynchronous services" that don't destroy all the advantages. If you don't like message loops then I refuse to allow you to destroy the advantages, and you should go away and stink up some cruddy old *nix clone instead.

I'm afraid most potential users will do just that.

Combuster · Post by **Combuster** » Mon Jun 04, 2012 5:55 am

Most potential users are already using a message queue. It's the accepted standard for doing GUIs.

OSwhatever · Post by **OSwhatever** » Mon Jun 04, 2012 6:03 am

rdos wrote:I'm afraid most potential users will do just that.

What I've seen working with operating systems that use asynchronous messages as the only IPC mechanism is that people tend to create send reply RPC kind of calls. Often they are wrapped in some function call does the message send and receive calls. Often you have some kind of response message, like status of the call for example. Also the calls are often chained so one request creates another requests deeper in the layer. Problem with this kind of programming is that it doesn't scale well and performance tend to be a bottleneck.

OSDev.org

Is asynchronous IPC preventing us from using parallelism?

Is asynchronous IPC preventing us from using parallelism?

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism

Re: Is asynchronous IPC preventing us from using parallelism