OSDev.org

Posted: **Fri Nov 26, 2010 1:27 pm**

I've been looking at how operating systems handle message allocation for message-passing IPC. Ideally you don't want to involve the kernel in allocating messages, but then the question is how safe it becomes.

Some operating systems have a dedicated message pool from which messages are allocated, with the kernel solely responsible for allocating these chunks. Others, QNX for example, seem to use any memory within user space. I'm not sure whether QNX supports asynchronous messages, though.

The usual problem with not involving the kernel in message allocation is that a failing program can deallocate the buffer before the other process has received it. This might leave the message buffer unmapped, which would later cause an exception in the receiver process.

The goal here is the best possible security while keeping the kernel overhead as small as possible.

I've found these advantages and disadvantages of using a message pool versus not having one.

Kernel allocated messages in a special pool:

+When you make the send message call, the kernel can easily tell whether the memory is valid.
+A message that has been sent cannot be deallocated, by design.
+Zero-copy messaging by only remapping pages can be done transparently.
+Setting up a message memory window for copying between processes is trivial.
+"Ownership of memory" is transferred for real, meaning that a send normally deallocates on the sender side and allocates on the receiver side automatically.
+The size of a message doesn't need to be known in advance since it is allocated on demand. Otherwise the receiving buffer must be large enough to receive the largest message, unless an arbitrary allocator is provided to the run-time library in user space.

-The kernel must be involved in message allocation.
-Most messages are either very small or very large. Very small messages are "double copied": first to a kernel buffer, then to the target process buffer. The extra message allocation overhead in user space is therefore unnecessary. Must the sender's message buffer be validated before copying in its own process despite running in kernel mode?
-Message data that is stored for a long time must be copied from the message buffer to normal user memory in order to keep the message pool free.
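
A minimal sketch of the "ownership of memory" idea from the list above, assuming a fixed number of fixed-size slots. All names here (`msg_pool`, `pool_send`, and so on) are hypothetical and not taken from any particular kernel. It only shows that once a slot has been sent, the sender can no longer free it.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 8
#define SLOT_SIZE  64
#define NO_OWNER   (-1)

/* One fixed-size message slot; owner is the pid allowed to touch it. */
struct msg_slot {
    int owner;                      /* NO_OWNER when the slot is free */
    unsigned char data[SLOT_SIZE];
};

struct msg_pool {
    struct msg_slot slots[POOL_SLOTS];
};

void pool_init(struct msg_pool *p)
{
    assert(p != NULL);
    for (int i = 0; i < POOL_SLOTS; i++)
        p->slots[i].owner = NO_OWNER;
}

/* Allocate a slot for pid; returns the slot index, or -1 if the pool is full. */
int pool_alloc(struct msg_pool *p, int pid)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (p->slots[i].owner == NO_OWNER) {
            p->slots[i].owner = pid;
            return i;
        }
    }
    return -1;
}

static int slot_owned_by(const struct msg_pool *p, int slot, int pid)
{
    return slot >= 0 && slot < POOL_SLOTS && p->slots[slot].owner == pid;
}

/* Send: ownership moves from sender to receiver. Returns 0, or -1 on error. */
int pool_send(struct msg_pool *p, int slot, int sender, int receiver)
{
    if (!slot_owned_by(p, slot, sender))
        return -1;
    p->slots[slot].owner = receiver;
    return 0;
}

/* Free succeeds only for the current owner of the slot. */
int pool_free(struct msg_pool *p, int slot, int pid)
{
    if (!slot_owned_by(p, slot, pid))
        return -1;
    p->slots[slot].owner = NO_OWNER;
    return 0;
}
```

After `pool_send()` a `pool_free()` by the old owner fails, which is the "cannot be deallocated by design" property. A real pool would also need locking and per-process mappings, which are left out here.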

User space allocated messages:

+No kernel involvement in message allocation. Any type of allocation algorithm can be used.
+Direct user-to-user memory copying, no extra copying from a message buffer needed.
+Constant memory can be used directly.

-The message buffer must be validated: the kernel has to check that the memory where the message resides is mapped and has the right access rights.
-The message might be deallocated before it has been copied to the receiver buffer. What action is then supposed to be taken? Discard the message and take the next one?
-The message must fit in the message buffer on the receiver side.
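
The validation step for user-space buffers could look roughly like this. It is only a sketch over a toy address space of 16 pages with made-up flag bits (`PG_PRESENT`, `PG_USER`, `PG_WRITE`). A real kernel would walk its own page tables instead of the hypothetical `addr_space` array used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define NUM_PAGES  16u      /* toy address space: 16 pages */
#define PG_PRESENT 0x1u
#define PG_USER    0x2u
#define PG_WRITE   0x4u

/* Hypothetical per-process page flags, indexed by virtual page number. */
struct addr_space {
    uint8_t flags[NUM_PAGES];
};

/* Returns 1 if every page in [addr, addr + len) has all bits in `need` set. */
int user_range_ok(const struct addr_space *as, uintptr_t addr, size_t len,
                  uint8_t need)
{
    assert(as != NULL);
    if (len == 0)
        return 1;

    uintptr_t limit = (uintptr_t)NUM_PAGES * PAGE_SIZE;
    if (addr >= limit || len > limit - addr)    /* overflow-safe bounds check */
        return 0;

    uintptr_t first = addr / PAGE_SIZE;
    uintptr_t last = (addr + len - 1) / PAGE_SIZE;
    for (uintptr_t pg = first; pg <= last; pg++) {
        if ((as->flags[pg] & need) != need)
            return 0;
    }
    return 1;
}
```

A send would check the source with `PG_PRESENT | PG_USER` and a receive would also require `PG_WRITE`. Passing this check at send time says nothing about whether the buffer is still mapped when the copy happens later, which is exactly the deallocation problem in the list above.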

With large messages you basically want the data written directly to its final place in memory, without copying from a message buffer. How often do you use messages for this kind of "memory fill", or do you instead just grant the pages to another process through a special interface similar to the map and grant API in the L4 kernel?

The basic question here is: would you use a special message pool or just use user memory directly?

Posted: **Fri Nov 26, 2010 2:21 pm**

It seems that you're only considering message passing schemes where the message is copied: if you're sending messages from userspace to userspace, you can use paging to transfer large amounts of data with no copying overhead (and no validation, if only the frame addresses are copied.) Of course, this makes allocating and managing buffers harder, because of alignment issues, and requires some sort of scheme to map the messages on the receiving end.

Posted: **Fri Nov 26, 2010 2:41 pm**

NickJohnson wrote:It seems that you're only considering message passing schemes where the message is copied: if you're sending messages from userspace to userspace, you can use paging to transfer large amounts of data with no copying overhead (and no validation, if only the frame addresses are copied.) Of course, this makes allocating and managing buffers harder, because of alignment issues, and requires some sort of scheme to map the messages on the receiving end.

Not at all. When messages are big enough, remapping pages into another process's address space makes sense.

+Zero copying messages by only remap pages can be done transparent.

That bullet covered this: one benefit of having a message pool is that you can remap the page without user space even caring which transfer method is used. However, the data then still ends up in the message pool and has to be copied out of the pool into a user space buffer if it needs to stay around for longer.

Only remapping pages doesn't make much sense, since most messages are small and using whole pages for small messages wastes a lot of memory. Also, "double copying" small messages into kernel memory first might give better cache usage.
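
To put a number on the waste, here is a small sketch assuming 4 KiB pages. The threshold policy in `choose_xfer()` is a hypothetical example and not something any of the kernels mentioned here is known to use.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

enum xfer_method { XFER_COPY, XFER_REMAP };

/* Bytes left unused when a message of `len` bytes occupies whole pages. */
size_t remap_waste(size_t len)
{
    size_t rem = len % PAGE_SIZE;
    return rem == 0 ? 0 : PAGE_SIZE - rem;
}

/* Hypothetical policy: copy small messages, remap once len reaches the
 * threshold. The threshold must be at least one page. */
enum xfer_method choose_xfer(size_t len, size_t remap_threshold)
{
    assert(remap_threshold >= PAGE_SIZE);
    return len >= remap_threshold ? XFER_REMAP : XFER_COPY;
}
```

A 16-byte message sent by remapping would leave 4080 bytes of its page unused, so copying is the obvious choice at that size. Where the break-even point actually lies depends on the cost of the remap on the target hardware and would have to be measured.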

Posted: **Fri Nov 26, 2010 3:00 pm**

OSwhatever wrote:I've been looking at how operating systems seem to solve messaging IPC message allocation. You basically don't want to involve the kernel in message allocating but in that case the question is how safe it becomes.

My approach, having studied kernels like QNX and Minix, is to keep it really simple, fast and elegant. Well, that's my hope anyway. In summary: asynchronous messages are queued in a pool in the kernel's memory space, synchronous messages are held in a thread's user memory space.

Asynchronous messages are small: two words supplied by the sender and then stored in a queued pool with the sender's process ID and thread ID. Threads call the non-blocking syscall msg_send_signal() to send them and call msg_recv() to empty their process's kernel-held pool of queued async messages.

Synchronous messages are arbitrary in size: a thread is blocked until its message is delivered (i.e. copied from its user space into the receiver's user space) and replied to by another thread. While awaiting delivery, the message is still held in the sender's user space, but a small block of two words (the sender's PID and TID) is queued in the target process's kernel-held pool. When the receiving thread next calls msg_recv(), the queued pool is checked and delivery occurs if the pool is non-empty.

Checks are put in place to stop receiver's buffers from being overrun and so on. The queued pool allocation, deallocation and next-in-queue-search is O(1).
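
One way to get O(1) allocation, deallocation and next-in-queue lookup for such a pool is a free list plus a FIFO threaded through the same array of nodes. The following is only a sketch under that assumption. The `sigq_*` names are hypothetical stand-ins, not the actual implementation behind msg_send_signal() and msg_recv().

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_NODES 4
#define NIL (-1)

struct sig_msg {
    int pid, tid;           /* filled in by the kernel, not by the sender */
    unsigned long w0, w1;   /* the two words supplied by the sender */
    int next;               /* index of the next node, or NIL */
};

struct sig_queue {
    struct sig_msg nodes[QUEUE_NODES];
    int free_head;          /* singly linked list of unused nodes */
    int head, tail;         /* FIFO of queued messages */
};

void sigq_init(struct sig_queue *q)
{
    assert(q != NULL);
    for (int i = 0; i < QUEUE_NODES; i++)
        q->nodes[i].next = (i + 1 < QUEUE_NODES) ? i + 1 : NIL;
    q->free_head = 0;
    q->head = q->tail = NIL;
}

/* Non-blocking send, O(1). Returns -1 when the pool is exhausted. */
int sigq_send(struct sig_queue *q, int pid, int tid,
              unsigned long w0, unsigned long w1)
{
    int n = q->free_head;
    if (n == NIL)
        return -1;
    q->free_head = q->nodes[n].next;            /* pop from the free list */
    q->nodes[n] = (struct sig_msg){ pid, tid, w0, w1, NIL };
    if (q->tail == NIL)
        q->head = n;
    else
        q->nodes[q->tail].next = n;
    q->tail = n;                                /* append to the FIFO */
    return 0;
}

/* Receive the oldest message, O(1). Returns -1 if the queue is empty. */
int sigq_recv(struct sig_queue *q, struct sig_msg *out)
{
    int n = q->head;
    if (n == NIL)
        return -1;
    *out = q->nodes[n];
    out->next = NIL;                            /* do not leak the link */
    q->head = q->nodes[n].next;
    if (q->head == NIL)
        q->tail = NIL;
    q->nodes[n].next = q->free_head;            /* push onto the free list */
    q->free_head = n;
    return 0;
}
```

The same node layout would work for the two-word (PID, TID) blocks queued for synchronous messages. A real kernel would need to protect the queue against concurrent access, which is omitted here.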

FWIW, memory sharing requests and acknowledgements are also done through the IPC mechanism, allowing processes to set up their own shared memory for transferring large chunks of data without bothering the kernel with page flipping stuff.

OSwhatever wrote:The usual problem with not involving the kernel in the message allocation is that a failing program can deallocate the buffer before it has been received in the other process. This might leave the message buffer unmapped which would later cause an exception in the receiver process.

You can hint to the page fault handler that, when copying the message, a read fault is the sender's problem and a write fault is the receiver's problem, and then resolve from there.
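
As a sketch of that hint, the copy routine could publish a small record that the page fault handler consults. Everything here (`copy_hint`, `blame_for_fault`, the enum names) is hypothetical, and the fault itself is passed in as a parameter rather than coming from real hardware.

```c
#include <assert.h>
#include <stddef.h>

enum fault_access { FAULT_READ, FAULT_WRITE };
enum fault_blame  { BLAME_NONE, BLAME_SENDER, BLAME_RECEIVER };

/* State the IPC copy routine publishes for the fault handler. */
struct copy_hint {
    int in_ipc_copy;    /* nonzero while the kernel is copying a message */
    int sender_pid;
    int receiver_pid;
};

/* During a message copy, reads touch the sender's buffer and writes touch
 * the receiver's buffer, so the access type tells us who is at fault. */
enum fault_blame blame_for_fault(const struct copy_hint *h, enum fault_access a)
{
    assert(h != NULL);
    if (!h->in_ipc_copy)
        return BLAME_NONE;      /* ordinary fault, handle as usual */
    return a == FAULT_READ ? BLAME_SENDER : BLAME_RECEIVER;
}

/* Returns the pid whose send or receive should fail, or -1 if neither. */
int pid_to_fail(const struct copy_hint *h, enum fault_access a)
{
    switch (blame_for_fault(h, a)) {
    case BLAME_SENDER:   return h->sender_pid;
    case BLAME_RECEIVER: return h->receiver_pid;
    default:             return -1;
    }
}
```

What "resolve from there" means in practice (returning an error, killing the process, retrying) is a policy decision this sketch does not make.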

C.

Posted: **Mon Nov 29, 2010 6:48 am**

Another IPC method is the send/reply method. The receiver will allocate some resource, and a receiver thread will loop trying to retrieve messages. A sender will send some message, and then be blocked until the receiver has processed the message and sent an answer. This kind of IPC can be used to synchronize processes as well. The message box can never become full, which makes the implementation easier. It is also possible to optimize copying of messages since the sender will always be blocked until a reply is sent. It is also possible to extend this to work over networks.
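
The reason the message box can never become full can be sketched like this: each blocked sender has at most one outstanding message, so the queue can be linked through the sender threads themselves and needs no storage of its own. The thread states and `mb_*` functions below are hypothetical, and real blocking and scheduling are replaced by a state field.

```c
#include <assert.h>
#include <stddef.h>

enum tstate { T_READY, T_SEND_BLOCKED, T_REPLY_BLOCKED };

struct thread {
    enum tstate state;
    const void *msg;        /* stays in the sender's memory while blocked */
    int reply;              /* answer filled in by the receiver */
    struct thread *next;    /* link used while queued on a mailbox */
};

struct mailbox {
    struct thread *head, *tail;
};

/* Sender: queue itself on the mailbox and block until replied to. */
void mb_send(struct mailbox *mb, struct thread *t, const void *msg)
{
    assert(mb != NULL && t != NULL && t->state == T_READY);
    t->msg = msg;
    t->next = NULL;
    t->state = T_SEND_BLOCKED;
    if (mb->tail)
        mb->tail->next = t;
    else
        mb->head = t;
    mb->tail = t;
}

/* Receiver: take the oldest waiting sender, or NULL if nobody is waiting. */
struct thread *mb_receive(struct mailbox *mb)
{
    struct thread *t = mb->head;
    if (!t)
        return NULL;
    mb->head = t->next;
    if (!mb->head)
        mb->tail = NULL;
    t->state = T_REPLY_BLOCKED;     /* message taken, sender awaits reply */
    return t;
}

/* Receiver: answer the sender and make it runnable again. */
void mb_reply(struct thread *t, int reply)
{
    assert(t != NULL && t->state == T_REPLY_BLOCKED);
    t->reply = reply;
    t->state = T_READY;
}
```

Because the sender stays blocked between `mb_send()` and `mb_reply()`, its buffer cannot be freed in the meantime, which is what makes the copy optimizations mentioned above possible.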

Posted: **Mon Nov 29, 2010 7:16 am**

FWIW, memory sharing requests and acknowledgements are also done through the IPC mechanism, allowing processes to set up their own shared memory for transferring large chunks of data without bothering the kernel with page flipping stuff.

I don't like page flipping either. The problem, it seems to me, is that a process that sends a lot of data will leak away all its memory to the process that's receiving the messages. What's the solution to that?

Posted: **Tue Nov 30, 2010 11:16 am**

rdos wrote:Another IPC method is the send/reply method. The receiver will allocate some resource, and a receiver thread will loop trying to retrieve messages. A sender will send some message, and then be blocked until the receiver has processed the message and sent an answer. This kind of IPC can be used to synchronize processes as well. The message box can never become full, which makes the implementation easier. It is also possible to optimize copying of messages since the sender will always be blocked until a reply is sent. It is also possible to extend this to work over networks.

My requirement is asynchronous messages. Synchronous messages can be done as well, but they can be emulated with asynchronous messages or provided through a separate API. A synchronous interface can be done with its associated optimizations, but that's lower on the priority list right now. However, blocking send/reply methods are in general far more common than asynchronous messages, so that's something worth looking into.

The solution I'm leaning towards now is a per-process message pool that is allocated by the kernel. The sender must always use kernel-allocated messages. This is a compromise that assumes send buffers are usually short-lived allocations. The receiver, however, can choose to receive either in its own message pool or directly into a user space buffer. This is easy to support since the actual copying to the receiver is done while in the receiver context: if you get a page fault, you just kill the process.

It is more difficult to kill another process that is not currently running. An exception usually means that you either fix the problem and come back to re-execute the failing instruction, or don't return to it at all. If the fault is in another process and you're going to continue in the receiver process, you have to adjust the return address to something appropriate. It can be done, but it is a bit tricky, and what you're supposed to do differs from case to case. In this case, if a copy fails you have to "jump over the copying" and return to the process, which requires some careful assembly code, unwinding the stack and so on.

My solution so far is:
Kernel-allocated messages on the sender side. Sender-side messages are always "safe".
The receiver can choose to receive in its message pool or in a user buffer. If the message is larger than the user space buffer, the message is discarded.
Message transfer across process boundaries is always done in the receiver context.
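
The receiver-side choice in this scheme could be sketched as follows. The `pool_msg` layout, the 256-byte limit and the `deliver()` function are assumptions made for illustration only. Passing a NULL user buffer stands for "receive in the message pool".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum deliver_result { DELIVERED_COPY, DELIVERED_POOL, DISCARDED };

/* Hypothetical kernel-allocated message as it sits in the sender's pool. */
struct pool_msg {
    size_t len;
    unsigned char data[256];
};

/* Runs in the receiver's context. If user_buf is NULL the message is handed
 * over in the pool via *pool_out; otherwise it is copied into user_buf, or
 * discarded when it does not fit. */
enum deliver_result deliver(const struct pool_msg *m, void *user_buf,
                            size_t user_cap, const struct pool_msg **pool_out)
{
    assert(m != NULL && m->len <= sizeof m->data);
    if (user_buf == NULL) {
        assert(pool_out != NULL);
        *pool_out = m;              /* receive in the message pool */
        return DELIVERED_POOL;
    }
    if (m->len > user_cap)
        return DISCARDED;           /* larger than the user space buffer */
    memcpy(user_buf, m->data, m->len);
    return DELIVERED_COPY;
}
```

In a real kernel the `memcpy` into the user buffer is where a receiver-side page fault could occur. Since the code runs in the receiver's context, that fault is attributed to the receiver.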

The problem I see with this solution is that the sending process's pool can become full if other processes lock up and never free the memory for the sending process. This can lead to a system-wide stalemate. Because of this, copying and freeing the sender's message buffer on the receiver side might be a worse option.

Posted: **Tue Nov 30, 2010 12:51 pm**

OSwhatever wrote:My requirement is asynchronous messages. Synchronous messages can be done as well but can be emulated by asynchronous messages or even a separate API.

I think the latter is best. Synchronous messaging has some specific optimizations that cannot be implemented well on top of asynchronous messaging. I have no asynchronous messaging. Instead I have a memmap interface that can be used for sharing information between processes, and that can be complemented with synchronization primitives.

In the kernel, I don't use complex messaging at all. Everything in the kernel is implemented with critical sections and signals. A signal is a simple message from one thread to another. The state of a thread can be either signalled or non-signalled, and there is one function for setting the signal (and potentially waking up the thread) that can be called from ISRs, and another function that blocks a thread until it is signalled. More complex multi-wait primitives are built on top of this API. I'm sure it would be possible to build an asynchronous messaging API on top of it as well.
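
A rough model of such a signal primitive, assuming a single signalled flag per thread and leaving out the scheduler. The `kthread` struct and function names are hypothetical and not taken from RDOS. Real code would need to run these with interrupts disabled or with atomic operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread signal state. */
struct kthread {
    int signalled;  /* a signal is pending */
    int blocked;    /* the thread is waiting for a signal */
};

/* Set the signal; may be called from an ISR. Returns 1 if the thread was
 * woken up (the wake-up consumes the signal), 0 if the signal was latched. */
int signal_set(struct kthread *t)
{
    assert(t != NULL);
    if (t->blocked) {
        t->blocked = 0;
        return 1;
    }
    t->signalled = 1;   /* signals do not accumulate */
    return 0;
}

/* Wait for the signal. Returns 1 if a pending signal was consumed and the
 * caller may continue, 0 if the thread has to block. */
int signal_wait(struct kthread *t)
{
    assert(t != NULL);
    if (t->signalled) {
        t->signalled = 0;
        return 1;
    }
    t->blocked = 1;
    return 0;
}
```

Because the state is a single flag, two signals set before a wait are seen as one, so anything that needs counting or payloads has to be layered on top.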

OSwhatever wrote:A synchronous interface can be done with belonging optimizations but that's something that is lower on the priority list right now. However, blocking send/reply methods are in general far more common than asynchronous messages so that's something worth looking into.

Yes. I've even registered a network protocol on top of IP that is called "Simple Messaging Protocol" with protocol number 0x79 that is fully implemented in RDOS for IPC over networks. I didn't want to use TCP as transport as it is not socket-oriented, and using UDP would give too much overhead. The only problem (or maybe feature) is that this protocol will not usually work on the Internet as routers mostly block it.

Posted: **Tue Nov 30, 2010 2:01 pm**

In this case if a copy fails you have to "jump over the copying" and return to the process which requires some careful assembly code, unwinding the stack and so on.

It's not really that difficult. Because the thread is blocked, its state is well known. You can do a setjmp just before beginning the copy and longjmp back there from the exception handler if there is a fault.
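
The pattern can be sketched in user-space C with the standard `setjmp`/`longjmp`. The page fault is simulated here by a `src_mapped` flag and an explicit call to a stand-in handler. In a kernel the handler would be entered by the CPU, and the recovery buffer would be per thread rather than a single static as assumed below.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

static jmp_buf copy_recover;    /* recovery point for the copy in progress */
static int copy_active;         /* nonzero while a guarded copy is running */

/* Stand-in for the page fault handler: abandon the copy in progress. */
void fake_page_fault(void)
{
    assert(copy_active);
    longjmp(copy_recover, 1);
}

/* Copies len bytes and returns 0, or returns -1 if the copy "faulted".
 * src_mapped == 0 simulates a source buffer that has been unmapped. */
int guarded_copy(void *dst, const void *src, size_t len, int src_mapped)
{
    if (setjmp(copy_recover) != 0) {
        copy_active = 0;        /* we arrive here from the fault handler */
        return -1;
    }
    copy_active = 1;
    if (!src_mapped)
        fake_page_fault();      /* real hardware would fault inside memcpy */
    memcpy(dst, src, len);
    copy_active = 0;
    return 0;
}
```

For example, `guarded_copy(dst, src, n, 0)` returns -1 and leaves `dst` untouched in this model. A real fault could hit partway through the copy, so the destination may then be partially written.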

Posted: **Tue Nov 30, 2010 4:29 pm**

gerryg400 wrote:The problem to me seems to be that a process that is sending a lot of data will leak away all its memory to a process that's receiving messages. What's the solution to that ?

The solution would be to send a message and block. The thread is woken up if delivery fails or the receiver replies. Using this as a synchronisation tool, you can avoid burning off lots of address space.

C.

Posted: **Tue Nov 30, 2010 5:08 pm**

The solution would be to send a message and block. The thread is woken up if delivery fails or the receiver replies. Using this as a synchronisation tool, you can avoid burning off lots of address space.

Yeah, I know. I agree with you and prefer a synchronous system with the kernel doing a process-to-process copy. I was asking the other guys about page flipping, though. If the sender keeps sending big messages to a receiver by page flipping, how does it get its memory back?

Message IPC w/wo dedicated message pool