Msg Passing Architecture, Suggestions Welcome

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
elfenix
Member
Posts: 50
Joined: Sun Dec 02, 2007 1:24 pm
Libera.chat IRC: elfenix
Location: United States

Msg Passing Architecture, Suggestions Welcome

Post by elfenix »

So, this is the current client-side API and notes from my microkernel. I have a working prototype, but I'm in the process of making it 'solid', or at least solid enough to code against. I suspect I'll rewrite it again after that. It's very similar to other architectures, but with my current pet feature: kernel-side closures.

Code: Select all

error_t service_open(Connection *id, const char *URL);
Open a service with the given URL. This is handled in a near-identical way to a standard open call: the URL is translated to a device node, which has an associated registered service. The resulting connection is stored in id; the error, if any, is returned. All of these functions return an error code and store their results in pointer parameters.

Code: Select all

error_t service_msg_lookup(MessageType *msg, Namespace namespace, const char *method_string); 
Look up a message type that is not predefined, within the given namespace. Services can register namespaces, or register a name in the global namespace.

Code: Select all

error_t service_msg_send(Message *msgid, Connection fd, MessageType msg, ...); 
Send a message to the service associated with the given connection, and store a 'message status indicator' in msgid. The message type (somewhat like a method) is specified by msg. The remaining arguments follow the definition of the message type. This may return an error if the queue is full, but it will never block.
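
Since service_msg_send() never blocks, a caller has to decide what to do when the send queue is full. Below is a minimal sketch of one way to handle that; the header name and the ERR_QUEUE_FULL error value are my assumptions, not part of the API above.

Code: Select all

#include "service.h"   /* hypothetical header declaring the API in this post */

/* Send without blocking; if the send queue is full, drain one completed
   response to free a slot and retry.  ERR_QUEUE_FULL is a placeholder name. */
error_t send_with_drain(Connection fd, MessageType type, int arg)
{
    Message msg, done;
    error_t err = service_msg_send(&msg, fd, type, arg);
    while (err == ERR_QUEUE_FULL) {
        service_msg_wait(&done, fd);      /* block until something completes */
        service_msg_clear(fd, done);      /* free its queue slot */
        err = service_msg_send(&msg, fd, type, arg);
    }
    return err;
}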

Code: Select all

error_t service_msg_wait(Message *msgid, Connection fd); 
Wait for the given service to mark any message on its queue as complete. Stores the topmost ready message ID in msgid.

Code: Select all

error_t service_msg_wait_sel(Connection *cid, Message *msgid, Connection fd[]); 
Wait on a collection of connection descriptors until any of them returns a response. Stores the active connection in cid and its topmost ready message ID in msgid.
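
This is the single-threaded multiplexing primitive, so here is a rough sketch of a drain loop built on it. The post does not say how the fd[] array is terminated, so the CONNECTION_NONE sentinel (and the header name) below are assumptions.

Code: Select all

#include "service.h"   /* hypothetical header declaring the API in this post */

/* Service completed messages from several connections in one loop.
   CONNECTION_NONE is an assumed sentinel terminating the fds[] array. */
void drain_all(Connection fds[])
{
    Connection active;
    Message msgid;
    int result;

    for (;;) {
        /* Blocks until any connection has a completed message. */
        service_msg_wait_sel(&active, &msgid, fds);
        /* Fetch the result and free the queue slot in one step. */
        service_msg_resp_clear(active, msgid, &result);
        /* ... dispatch on `active` and `result` here ... */
    }
}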

Code: Select all

error_t service_msg_wait_specific(Connection fd, Message msgid); 
Wait for a specific message to receive a response. Other messages may be received in the meantime.

Code: Select all

error_t service_msg_perform_closure(Message *msg, Connection fd); 
Run any ready closures without risking a prolonged wait for incoming messages.

Code: Select all

error_t service_msg_resp(Connection fd, Message msg, ...); 
Obtain any return results from a message without clearing it. The remaining arguments depend on the message type.

Code: Select all

error_t service_msg_resp_clear(Connection fd, Message msgid, ...); 
Obtain any return results from a message and clear the message immediately afterward. The remaining arguments depend on the message type.

Code: Select all

error_t service_msg_clear(Connection fd, Message msgid); 
Clear a received message response without retrieving the return data. Clearing a message that has not yet completed results in it never being sent.

Usage notes:

Every service runs in user space.
Connections have associated send and receive queues for the client.

Before sending a message, the client must obtain a selector for the desired op.
In the global namespace, predefined selectors exist for most operations (read, write, sync, etc...)
Selectors are reusable for the lifetime of a service connection, or for global selectors, always.

Messages are placed on the send queue.
When a service handles a message, it places that message and any return on the receive queue.
Messages may be of a variable, but strictly limited, size.
Messages involving large amounts of data may use "message blocks", a built-in shared memory type.

Services may define kernel-side closures (a hypothetical registration sketch follows these notes).
The closure is defined by giving the kernel a block of (currently) LISP code, plus data types and bound-variable info.
The LISP code is executed in kernel space, in the context of the process sending the message, when a wait state is allowed (i.e., on a call to service_msg_wait_*).
Closures cannot enter wait states.
Clients may request that any active (non-waited) closure run, by using service_msg_perform_closure().
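
The service-side registration call isn't part of the client API above, so the sketch below invents one (service_closure_register(), ClosureLayout, and the field string are all hypothetical names) purely to illustrate the shape of the idea: the kernel receives an object layout plus a LISP body it can later JIT and run on the client's behalf.

Code: Select all

#include "service.h"   /* hypothetical header; nothing below is part of the posted API */

struct cursor_pos { int x, y; };

/* Hypothetical LISP body for the "setxy" closure used in the example below. */
static const char *setxy_body =
    "(lambda (cursor x y)"
    "  (set-field cursor 'x x)"
    "  (set-field cursor 'y y))";

/* Hypothetical service-side registration: hand the kernel the object layout,
   the bound-variable/type info, and the code to JIT. */
error_t register_setxy(Connection fd)
{
    ClosureLayout layout = {
        .object_size = sizeof(struct cursor_pos),
        .fields      = "x:int,y:int",
    };
    return service_closure_register(fd, "setxy", setxy_body, &layout);
}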

This design specifically allows clients to send multiple messages without needing to worry about blocking. However, the send and receive queue sizes are limited, so a client must keep clearing responses or risk being blocked from sending new messages or receiving responses. A client will not be allowed to send a message if the maximum response size for that message is not available in the receive queue (the message will stay in the send queue but will not be visible to the service).

There are no guarantees on the ordering of message responses; strictly ordered messaging would have to be done by waiting for each response between sends.
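
To make the pipelining point concrete, here is a rough sketch (no error checking; the header name is an assumption, and MSG_WRITE's argument shape is taken from the example below) of issuing several requests before collecting any responses.

Code: Select all

#include <stddef.h>
#include "service.h"   /* hypothetical header declaring the API in this post */

/* Issue several writes back to back, then collect the responses in whatever
   order they complete.  Assumes n <= 16 and does no error checking. */
void pipelined_writes(Connection fd, char *bufs[], size_t lens[], int n)
{
    Message msgs[16], done;
    int written, i;

    for (i = 0; i < n; i++)
        service_msg_send(&msgs[i], fd, MSG_WRITE, bufs[i], lens[i]);

    for (i = 0; i < n; i++) {
        /* Completion order is not guaranteed, so take whichever is ready. */
        service_msg_wait(&done, fd);
        service_msg_resp_clear(fd, done, &written);
    }
}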

Example interaction; this code has some issues in its queue management (it makes invalid assumptions, has no error checking, etc.):

Code: Select all

Connection fd;
MessageType setxy;
Message setxy_msg, write_msg;
char *buffer = "Hello World";
size_t buf_size = strlen(buffer);
int bytes_written;

/* setxy is a closure, execute immediately */
service_open(&fd, "^tty://console");
service_msg_lookup( &setxy, service_namespace(fd), "setxy");
service_msg_send( &setxy_msg, fd, setxy, 15, 15 );
service_msg_perform_closure (&setxy_msg, fd); 
service_msg_clear (fd, setxy_msg);

/* Perform blocking write */
service_msg_send( &write_msg, fd, MSG_WRITE, buffer, buf_size); 
service_msg_wait_specific (fd, write_msg);
service_msg_resp (fd, write_msg, &bytes_written); 
service_msg_clear (fd, write_msg);
dr_evil
Member
Posts: 34
Joined: Mon Dec 20, 2004 12:00 am

Re: Msg Passing Architecture, Suggestions Welcome

Post by dr_evil »

Queued IPC designs have proven to be slow. Compare QNX (or even L4) and Mach, for example. Asynchronous IPC needs lots of memory and copying. And it isn't useful most of the time, because all of us are used to doing sequential programming. My advice: do synchronous, non-queued IPC.
"Messages may be of a variable, but strictly limited, size."
Uhm, huh? Is the size limited or not? A variable limit per queue?

Why "kernel side closures"?


I believe the performance of this design is not too good, because for a call you (or the RTL) usually do a service_open, then a service_msg_lookup, then a service_msg_send, and a service_msg_clear at the end. That's quite expensive...
elfenix
Member
Posts: 50
Joined: Sun Dec 02, 2007 1:24 pm
Libera.chat IRC: elfenix
Location: United States

Re: Msg Passing Architecture, Suggestions Welcome

Post by elfenix »

The philosophy of the design here is to improve utilization of parallel computing in GUI, network, and some desktop apps. The Mach research showed that the majority of the overhead cost in a messaging system is the extra context switching between apps. My operating theory is that in a parallel environment, message passing becomes more effective because context switches can be reduced. Further, "sequential design" often ends up simply limiting processes to a single core (and then inflicts locking issues on internal kernel structures across cores). The goal here is to utilize 4 or 8 cores at a time and show that asynchronous designs can completely outclass synchronous ones in that scenario.

So, the design here is to allow you to send a message to an app running on another processor, continue performing whatever task you desire, and then handle the response as needed. Each message is of variable packed size, up to a defined maximum (currently 256 bytes). The maximum message size is defined so that I know the worst-case latency of packing a message. Obviously, on a uniprocessor machine, this will all be simulated using context switching...
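
Purely as an illustration of what "variable packed size, up to 256 bytes" could look like on the queue, here is one possible slot layout. This is my own guess, not the poster's actual format.

Code: Select all

#include <stdint.h>

/* Illustrative only: a fixed 256-byte queue slot carrying a variable-length
   packed payload.  Field names and sizes are assumptions. */
#define MSG_MAX_SIZE 256

struct msg_slot {
    uint32_t selector;                   /* message type / method selector      */
    uint16_t length;                     /* packed payload bytes actually used  */
    uint16_t flags;
    uint8_t  payload[MSG_MAX_SIZE - 8];  /* packed args or a message-block handle */
};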

While working on the Linux kernel, I found that most system calls fell into one of two categories: 1) those that forced wait states and context switches, and 2) those that performed data manipulation for a task and returned immediately. Of interest to me was that the second category includes a great many things like "what is the PID of this process?", grabbing a piece of information, modifying an existing piece of data, and so on. These were strictly data-manipulation operations that could be handled by some closure mechanism. L4 capitalized on this via concepts such as registers and a better shared memory implementation.

With kernel closures, a given module can give the kernel an object layout and a set of operations to perform on those objects. The kernel takes an operation (currently LISP code), compiles it to native machine code (JIT), and then lets that closure operate on those pieces of in-kernel memory. All operations are atomic on individual kernel objects, so the objects are always in a known state. The payoff is that a great many 'data ops' can be done nearly synchronously, with only the overhead of an additional system call instead of two full-fledged context switches.

"service_open" must only happen once during the entire communication to the service, and is near identical to an "open" Unix command. (There is also a service_close, similar to the unix "close" command.)

"service_msg_lookup" is only require if the service implements it's own namespace. That said, I do agree about the overhead there, and think I might need to figure out some way of preloading a large number of requested messages (reduce this to pure startup time), or similar.

Which means that, in general, you'll only need to issue a send and a resp command. I agree that clear was overkill, and I'm eliminating it from the design now.

I'm also considering placing the kernel-side closures at the start of the send command, so that they can piggyback extra messages...
dr_evil
Member
Posts: 34
Joined: Mon Dec 20, 2004 12:00 am

Re: Msg Passing Architecture, Suggestions Welcome

Post by dr_evil »

Do you think closures will give you performance? Are those "data calls" the performance-limiting code paths? Which developer will ever embed LISP code in his application? I guess there are only two or three specialized cases where this will be used. Why not support those specialized cases directly with a kernel function?

And most of those calls can be implemented completely in user space anyway, yielding even better performance. gettimeofday() would be one such example...



Parallel programming is very hard. That's why most apps and programmers stick with one thread. (On the other hand, in order to use many-core computers you need many threads anyway. Why not make threads lightweight and send messages synchronously?) I started with an async IPC design too, for the very same reasons you described. But after a while I found out that much of the IPC is done in the run-time library, and none of the RTL functions benefited from the async design. Can you come up with a practical, everyday scenario where async is useful?

QNX 6 does both things: sync IPC for "normal" messages, and queued, async IPC for what they call "pulses". Pulses are always 4 bytes in size (or something similar) and delivery is not guaranteed. They are used mostly to implement GUI event notifications.
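
For reference, a tiny sketch of what that split looks like on the QNX Neutrino side (written from memory, so treat the exact signatures as approximate and check the QNX docs): a blocking MsgSend() for the request/reply case, and a fire-and-forget pulse for notifications.

Code: Select all

#include <sys/neutrino.h>

/* coid is a connection id obtained earlier (e.g. via ConnectAttach()). */
void qnx_style_ipc(int coid)
{
    struct { int op; int arg; } req = { 1, 42 };
    int reply;

    /* Synchronous, non-queued: blocks until the server replies. */
    MsgSend(coid, &req, sizeof req, &reply, sizeof reply);

    /* Queued, asynchronous "pulse": tiny payload, no reply, no delivery
       guarantee -- typically used for event notifications. */
    MsgSendPulse(coid, -1 /* inherit priority */, 1 /* code */, 42 /* value */);
}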
mystran
Member
Posts: 670
Joined: Thu Mar 08, 2007 11:08 am

Re: Msg Passing Architecture, Suggestions Welcome

Post by mystran »

This is a somewhat old topic, but since I only read the forum once in a while now, I'll reply late (rather than not at all).

The ultra-simple messaging system in good old L4 proved two things: synchronous IPC can be really fast if you look at the cost of a single message pass, but building a system on top of such primitives forces a huge number of IPC calls and, what's worse, forces you to do any IPC multiplexing via multiple userspace threads.

In other words, L4 aimed at ultimate IPC performance without concern for the big picture, and failed on system performance and simplicity by moving the complexity out of the kernel.

I'm a supporter of queued asynchronous IPC, after trying synchronous IPC in practice. An asynchronous system means you can continue to do useful work while you have requests pending (say, while you're waiting for data from disk). Multiplexing is trivial even in the single-threaded case (a rather nice thing if you ever want to service multiple clients, as in a webserver; there's a reason UNIX has select() and non-blocking sockets). And a good event-driven architecture is possibly the easiest architecture to multi-thread: as long as you can divide the events into mutually independent domains (which can be designed for), you can process them concurrently with a thread pool matching your hardware CPU count.
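
To illustrate that last point (this is a generic sketch, not tied to any particular kernel's API): if every event carries a domain tag and domains share no state, each worker thread can drain its own domains with plain sequential handlers and no cross-domain locking.

Code: Select all

#include <pthread.h>
#include <stddef.h>

#define NDOMAINS 8

struct event { int domain; int type; void *payload; };

/* Per-domain queues; queue_pop() blocks until an event arrives.
   (The queue implementation is omitted from this sketch.) */
extern struct event *queue_pop(int domain);
extern void handle_event(struct event *ev);   /* ordinary sequential handler */

static void *worker(void *arg)
{
    int domain = (int)(size_t)arg;
    for (;;)
        handle_event(queue_pop(domain));      /* events within a domain stay ordered */
    return NULL;
}

void start_workers(void)
{
    /* One worker per domain for clarity; a real pool would map several
       domains onto a thread count matching the CPU count. */
    pthread_t tid;
    for (int d = 0; d < NDOMAINS; d++)
        pthread_create(&tid, NULL, worker, (void *)(size_t)d);
}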

Yes, the queued design does complicate IPC in the local sense of making individual IPC calls more costly, and it increases kernel complexity somewhat. But it can also greatly simplify the userspace logic, as long as you move away from sequential UNIX-style programming and do event-based (you could say Windows-style) programming instead. Not only is event-based programming relatively simple once you learn it properly, it also lends itself well to theoretical tools like state machines. Granted, it's not as easy to "hack together ad-hoc stuff", since you have to think first, but...
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.