Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 12:29 pm
by giszo
I am in the process of designing and implementing the IPC mechanism of my hobby microkernel project, and I need some help, mostly regarding design considerations.

I faced the first problem when I tried to implement the communication between a typical userspace process and the VFS server, because I need to pass a lot of data between them when the process wants to read or write a file, for example. At first I implemented my IPC messages with a fixed size (6 * 8 bytes) to avoid queueing big message data inside the kernel. I thought the data transfer problem could be solved by creating a shared memory region per file descriptor (to stick with the previous example). This region could hold the data to be written, and vice versa for reads. I started to worry about my design when I considered what happens when multiple threads try to access the same file at the same time; a single shared memory region will not be enough. :)
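For illustration, a fixed-size message of that shape could look like the sketch below (the field names are made up, not the layout I actually use):

Code:

#include <stdint.h>

/* Hypothetical fixed-size IPC message: 6 * 8 = 48 bytes, as described above. */
struct ipc_message {
    uint64_t sender;   /* sending thread/port id            */
    uint64_t type;     /* request code (read, write, ...)   */
    uint64_t arg[4];   /* request-specific arguments        */
};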

I have some ideas for how to solve the issue, but before trying to implement another method I would like to ask for your opinions.

Idea #1: One shared memory region could be used per thread in the user application. This way it would not be a problem if multiple threads wanted to write the same file at the same time. However, this method requires a lot of support in the C (or whatever) library to maintain this information. Another bottleneck is that it still requires a copy when the data is moved from the given user buffer into the shared region.

Idea #2: Extend the IPC messages so they can contain any kind of data, which is then buffered by the kernel until the other end of the IPC link reads it. This seems to be the worst solution to me: it requires allocating dynamic memory regions inside the kernel, which is something I have tried to avoid.

Idea #3: Pass a pointer and a size to the ipc_send() function, describing the memory region associated with the message in the sender process (e.g. the buffer to be written to a file). Before the message is received at the other end of the link, the kernel maps the selected memory region into the address space of the other process. For now this seems to be the best solution I can think of, but it still has a problem: the buffers passed to read() and write() are not page aligned in 99% of the cases, which means the receiver process could access some of the sender's memory that is not even involved in the current operation.
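To make the over-exposure concrete, here is a minimal sketch of the page rounding involved (PAGE_SIZE, struct page_span and span_for_buffer() are purely illustrative helpers, not part of my kernel):

Code:

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

struct page_span {
    uintptr_t start;   /* page-aligned start of the mapping          */
    size_t    length;  /* whole pages covering the buffer            */
    size_t    offset;  /* where the real data begins inside the span */
};

static struct page_span span_for_buffer(const void *buf, size_t size)
{
    uintptr_t addr  = (uintptr_t)buf;
    uintptr_t first = addr & ~(PAGE_SIZE - 1);                           /* round down */
    uintptr_t last  = (addr + size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);  /* round up   */

    struct page_span s = {
        .start  = first,
        .length = (size_t)(last - first),
        .offset = (size_t)(addr - first),
    };
    /* s.length - size bytes of unrelated sender memory get mapped too. */
    return s;
}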

What other options do I have to solve this problem? :?:

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 1:38 pm
by OSwhatever
There are many ways to solve this.

Synchronous messages work fine here as long as both sender and receiver know beforehand how much data will be coming. The receive buffer can reside anywhere in user space and does not require any particular alignment, and synchronous messages do not need any kind of buffering in the kernel. This method is, however, slow compared to handing over physical pages directly. For real speed you should look at the L4 map and grant mechanism, which basically hands a page from one process to another.
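For the synchronous copy-through style, the kernel-facing API could be as small as the following sketch (names and signatures are illustrative only, not from any particular kernel):

Code:

#include <stddef.h>
#include <sys/types.h>

typedef int ipc_port_t;

/* Blocks until a receiver is waiting on 'port', then the kernel copies
 * 'size' bytes straight from 'buf' into the receiver's buffer.  Fails if
 * the receiver's buffer is smaller than 'size'. */
ssize_t ipc_send(ipc_port_t port, const void *buf, size_t size);

/* Blocks until a sender arrives, copies at most 'size' bytes into 'buf',
 * and returns the number of bytes actually transferred.  Because both
 * sides are blocked during the copy, no kernel-side queue is needed. */
ssize_t ipc_receive(ipc_port_t port, void *buf, size_t size);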

I don't recommend special shared memory areas, because they are just an unnecessary complication and you will need to copy the data to its final location anyway.

For memory-mapped files: use the map/grant approach.
For reading a file into anywhere in user memory: use synchronous messages of arbitrary length.

In the file system hierarchy, you use map/grant almost exclusively except in the last step to the user process that reads the file.

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 1:49 pm
by bwat
I worked with a message passing kernel once that had a mechanism for calling the code in another process space in your own context. This mechanism was used whenever sending your data as signal payload was too time consuming. The company that implemented the kernel knew that sending messages wasn't always going to be the best solution. I would say don't get locked into a design that is uniform just for uniformity's sake.

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 2:08 pm
by giszo
OSwhatever wrote:Synchronous messages work fine here as long as both sender and receiver know beforehand how much data will be coming. The receive buffer can reside anywhere in user space and does not require any particular alignment, and synchronous messages do not need any kind of buffering in the kernel. This method is, however, slow compared to handing over physical pages directly. For real speed you should look at the L4 map and grant mechanism, which basically hands a page from one process to another.
Two questions here...

On the receiver side I need to somehow know the size of the arbitrary data before receiving the message, so that I can give a properly sized buffer to the kernel. Unfortunately I cannot know it without asking for the size of the message first, which would require one more system call besides the receive one. Another option would be to define an absolute maximum size for these messages (let's say 1 MB) and require that the sender never sends more than this at a time. This way the receiver can keep a buffer that is suitable for receiving any message. Is there any other way to solve this issue?

Let's say at first I only implement synchronous IPC calls for communication between processes in the system. Using this synchronous IPC for all of the app<->VFS and VFS<->FS communication leads to a problem if there is only one thread handling requests inside the VFS: while it is serving app1's request and waiting for the underlying FS driver to complete the operation, a concurrent request from app2 cannot be executed. How could I solve this? Use multiple threads to process requests in a server? ... but how many?
OSwhatever wrote:For memory-mapped files: use the map/grant approach.
For reading a file into anywhere in user memory: use synchronous messages of arbitrary length.

In the file system hierarchy, you use map/grant almost exclusively except in the last step to the user process that reads the file.
Thank you for this tip, I will look into the details of it.

Am I right in thinking that this could be used with asynchronous communication between the VFS, FS drivers, etc., and that it would also solve the second issue I mentioned before?

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 2:41 pm
by bwat
giszo wrote: On the receiver side I need to somehow know the size of the arbitrary data before receiving the message, so that I can give a properly sized buffer to the kernel. Unfortunately I cannot know it without asking for the size of the message first, which would require one more system call besides the receive one. Another option would be to define an absolute maximum size for these messages (let's say 1 MB) and require that the sender never sends more than this at a time. This way the receiver can keep a buffer that is suitable for receiving any message. Is there any other way to solve this issue?

Code:

/* A size-prefixed signal: the receiver reads 'size' from the header to
 * learn how much payload follows.  'payload' is a one-element array used
 * as a variable-length tail (the classic "struct hack"). */
#include <stdlib.h>

void log_error(void);   /* assumed project-specific error reporting */

struct OS_signal
{
    unsigned int size;
    unsigned int signal_number;
    char payload[1];
};

struct OS_signal *
signal_alloc(unsigned int size, unsigned int signal_number)
{
    struct OS_signal *siggie;

    /* One allocation holds header and payload; -1 accounts for payload[1]. */
    siggie = malloc(sizeof(struct OS_signal) + size - 1);
    if(!siggie)
    {
        log_error();
        return NULL;
    }

    siggie->signal_number = signal_number;
    siggie->size = size;
    return siggie;
}
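As a usage sketch, a caller could package a write request like this (send_signal() and SIG_VFS_WRITE are assumptions for illustration, not part of any real kernel):

Code:

#include <string.h>

#define SIG_VFS_WRITE 42                             /* hypothetical signal number */
void send_signal(int port, struct OS_signal *sig);   /* assumed kernel call; ownership
                                                        of 'sig' passes on send      */

/* The 'size' field tells the receiver how much payload follows, which
 * answers the "how big a buffer do I need?" question. */
void send_write_request(int port, const void *buf, unsigned int len)
{
    struct OS_signal *sig = signal_alloc(len, SIG_VFS_WRITE);
    if (!sig)
        return;
    memcpy(sig->payload, buf, len);
    send_signal(port, sig);
}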
giszo wrote: Let's say at first I only implement synchronous IPC calls for communication between processes in the system. Using this synchronous IPC for all of the app<->VFS and VFS<->FS communication leads to a problem if there is only one thread handling requests inside the VFS: while it is serving app1's request and waiting for the underlying FS driver to complete the operation, a concurrent request from app2 cannot be executed. How could I solve this? Use multiple threads to process requests in a server? ... but how many?
Use a non-blocking send. Use a signal queue. Block senders when the queue is full. Provide a blocking receive and a receive_with_timeout.
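A minimal userspace sketch of that queue discipline, reusing the OS_signal structure above (the sigq_* names are made up, and a kernel would use its own sleep/wake primitives rather than pthreads):

Code:

#include <pthread.h>
#include <time.h>
#include <errno.h>

#define SIGQ_CAPACITY 64

struct sigq {
    struct OS_signal *slots[SIGQ_CAPACITY];
    unsigned head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
};

/* Send: never waits for a receiver, only for space when the queue is full. */
void sigq_send(struct sigq *q, struct OS_signal *sig)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == SIGQ_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail] = sig;
    q->tail = (q->tail + 1) % SIGQ_CAPACITY;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocking receive. */
struct OS_signal *sigq_receive(struct sigq *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    struct OS_signal *sig = q->slots[q->head];
    q->head = (q->head + 1) % SIGQ_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return sig;
}

/* Receive with timeout: returns NULL and sets errno to ETIMEDOUT on expiry. */
struct OS_signal *sigq_receive_timed(struct sigq *q, const struct timespec *abstime)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0) {
        if (pthread_cond_timedwait(&q->not_empty, &q->lock, abstime) == ETIMEDOUT) {
            pthread_mutex_unlock(&q->lock);
            errno = ETIMEDOUT;
            return NULL;
        }
    }
    struct OS_signal *sig = q->slots[q->head];
    q->head = (q->head + 1) % SIGQ_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return sig;
}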

EDIT: For your signalling you want to create diagrams like those you see in this document: ITU-T Interworking for SS7

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 2:41 pm
by OSwhatever
giszo wrote:Two questions here...

On the receiver side I need to somehow know the size of the arbitrary data before receiving the message, so that I can give a properly sized buffer to the kernel. Unfortunately I cannot know it without asking for the size of the message first, which would require one more system call besides the receive one. Another option would be to define an absolute maximum size for these messages (let's say 1 MB) and require that the sender never sends more than this at a time. This way the receiver can keep a buffer that is suitable for receiving any message. Is there any other way to solve this issue?

Let's say at first I only implement synchronous IPC calls for communication between processes in the system. Using this synchronous IPC for all of the app<->VFS and VFS<->FS communication leads to a problem if there is only one thread handling requests inside the VFS: while it is serving app1's request and waiting for the underlying FS driver to complete the operation, a concurrent request from app2 cannot be executed. How could I solve this? Use multiple threads to process requests in a server? ... but how many?
Usually, each file read will give you a length equal to or less than the read length you passed as a parameter, and in that way you can prevent any overflow. My kernel passes along the size of the message that arrives, so the user program knows it too.

Yes, if you have only one thread serving user applications with file system operations over synchronous messages, it will only be serving one client at a time. The solution is often to use multi-threaded file services. You can use a single thread if you want, but it would likely turn into an implementation mess with a horrible state machine. My VFS creates a new thread for every file system request, for example.
giszo wrote:Thank you for this tip, I will look into the details of it.

Am I right in thinking that this could be used with asynchronous communication between the VFS, FS drivers, etc., and that it would also solve the second issue I mentioned before?
There is nothing wrong with asynchronous messages, but they are usually bad for sending large amounts of data. Often asynchronous messages must end up in a special message pool, which later requires you to copy the data to its final destination, although there are many ways to implement asynchronous messaging. Asynchronous messaging does not necessarily solve your concurrency problems either.

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 3:00 pm
by giszo
OSwhatever wrote:Usually, each file read will give you a length equal to or less than the read length you passed as a parameter, and in that way you can prevent any overflow. My kernel passes along the size of the message that arrives, so the user program knows it too.
That is fine; my problem is the other way around, when the user process wants to write to a file. It passes a buffer and its size to the appropriate IPC method to send the request. The receiver (the VFS) should provide a buffer large enough to store the sent data, but it does not really have any idea about the size of the sent message.

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 3:40 pm
by OSwhatever
giszo wrote:That is fine; my problem is the other way around, when the user process wants to write to a file. It passes a buffer and its size to the appropriate IPC method to send the request. The receiver (the VFS) should provide a buffer large enough to store the sent data, but it does not really have any idea about the size of the sent message.
Then you can split the data into several IPC transfers. The VFS server and its client-side interface both know the maximum receive buffer size in the VFS and can split up the transfer accordingly. You pay the penalty of several IPC messages and task switches, but if the buffer is large enough it shouldn't be too bad. The file system itself usually has some kind of buffer caching, which also determines a maximum size you can receive at a time.
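A minimal client-side sketch of that chunking (vfs_write_chunk() and VFS_MAX_TRANSFER are hypothetical names standing in for whatever the VFS interface actually provides):

Code:

#include <stddef.h>
#include <sys/types.h>

#define VFS_MAX_TRANSFER (64 * 1024)   /* assumed VFS receive buffer size */

ssize_t vfs_write_chunk(int fd, const void *buf, size_t size); /* one IPC round trip */

ssize_t vfs_write(int fd, const void *buf, size_t size)
{
    const char *p = buf;
    size_t done = 0;

    while (done < size) {
        size_t chunk = size - done;
        if (chunk > VFS_MAX_TRANSFER)
            chunk = VFS_MAX_TRANSFER;

        ssize_t n = vfs_write_chunk(fd, p + done, chunk);
        if (n < 0)
            return done ? (ssize_t)done : n;  /* report partial write or error */
        if (n == 0)
            break;                            /* nothing accepted; avoid spinning */
        done += (size_t)n;
    }
    return (ssize_t)done;
}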

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Tue Dec 03, 2013 4:48 pm
by NickJohnson
The way I got around this in Rhombus was to allow IPC messages to reference a sequence of pages for bulk data. When a message was sent, those pages were unmapped from the sending process. When the message was received by the other process, it was granted the ability to map those pages into its own address space. This gives you the performance of shared memory (minus the overhead of the additional mapping call) while enforcing message-passing synchronization, and without filling up kernel space with tons of copied data.
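A hypothetical message header for this kind of page-granting scheme might look like the following (illustrative only, not Rhombus's actual structures):

Code:

#include <stdint.h>

/* Fixed-size header delivered through the normal IPC path; the bulk data
 * travels as page grants rather than as copied bytes. */
struct ipc_msg {
    uint32_t type;        /* request code, e.g. read/write            */
    uint32_t flags;
    uint64_t data_length; /* valid bytes inside the granted pages     */
    uint32_t page_count;  /* number of entries in 'pages'             */
    uint64_t pages[8];    /* page frames being handed over            */
};

/* On send: the kernel unmaps pages[] from the sender.
 * On receive: the receiver may map pages[] into its own address space. */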

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Wed Dec 04, 2013 4:23 am
by giszo
I have one more question that is not strongly connected to the original subject of the thread, but since we already touched on it before, I'm asking it here. What would be a good way to implement request handling in system services like the VFS server?

If I want to use the same synchronous IPC method that I am going to use between apps and the VFS, it would require one thread per request in the VFS server. OSwhatever mentioned before that he avoided this problem by starting a new thread for each request. Is this really a good approach?

The other way I can think of is to use asynchronous IPC combined with some page-passing method like the one NickJohnson pointed out. This way I could avoid storing message data inside the kernel while messages are queued at an IPC port, because the data is already held by the user process. However, I am still worried that this method requires a lot of support in the servers to keep track of pending requests.

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Wed Dec 04, 2013 12:24 pm
by OSwhatever
giszo wrote:I have one more question that is not strongly connected to the original subject of the thread, but since we already touched on it before, I'm asking it here. What would be a good way to implement request handling in system services like the VFS server?

If I want to use the same synchronous IPC method that I am going to use between apps and the VFS, it would require one thread per request in the VFS server. OSwhatever mentioned before that he avoided this problem by starting a new thread for each request. Is this really a good approach?

The other way I can think of is to use asynchronous IPC combined with some page-passing method like the one NickJohnson pointed out. This way I could avoid storing message data inside the kernel while messages are queued at an IPC port, because the data is already held by the user process. However, I am still worried that this method requires a lot of support in the servers to keep track of pending requests.
Storing message data for large messages inside the kernel can be avoided with any kind of message system: the kernel can copy message data directly from one process to another. There are optimizations where messages are temporarily stored in the kernel, because mapping parts of another process into kernel space pollutes the TLB; this is mostly done for small messages, and is known as the "double copy" in QNX.

When I mentioned that I start a thread for each file system operation, that was a simplification. If there are 200 file system requests, 200 threads are of course not created. A bounded set of threads is created (how many depends on the number of CPUs and on what is optimal for that particular workload) and the remaining pending file system requests are queued. This is usually called a thread pool.
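A minimal worker/thread-pool sketch using POSIX threads (the request structure and handle_request() are placeholders for whatever the VFS actually does; only the pool mechanics are shown):

Code:

#include <pthread.h>
#include <stdlib.h>

struct request {
    struct request *next;
    int fd;                 /* placeholder payload */
};

static struct request *queue_head, *queue_tail;
static pthread_mutex_t  queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t   queue_cond = PTHREAD_COND_INITIALIZER;

static void handle_request(struct request *req) { (void)req; /* do the FS work */ }

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (!queue_head)
            pthread_cond_wait(&queue_cond, &queue_lock);
        struct request *req = queue_head;
        queue_head = req->next;
        if (!queue_head)
            queue_tail = NULL;
        pthread_mutex_unlock(&queue_lock);

        handle_request(req);
        free(req);
    }
    return NULL;
}

/* Called by the IPC receive loop: enqueue the request and wake one worker. */
void submit_request(struct request *req)
{
    req->next = NULL;
    pthread_mutex_lock(&queue_lock);
    if (queue_tail)
        queue_tail->next = req;
    else
        queue_head = req;
    queue_tail = req;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

void pool_start(int nthreads)   /* e.g. a small multiple of the CPU count */
{
    for (int i = 0; i < nthreads; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
        pthread_detach(tid);
    }
}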

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Wed Dec 04, 2013 1:18 pm
by giszo
OSwhatever wrote:When I mentioned that I start a thread for each file system operation, that was a simplification. If there are 200 file system requests, 200 threads are of course not created. A bounded set of threads is created (how many depends on the number of CPUs and on what is optimal for that particular workload) and the remaining pending file system requests are queued. This is usually called a thread pool.
I was worried about creating a thread per request because it would mean a lot of allocation inside the kernel for every request. Using a thread pool for this purpose makes sense, but it is still a good question how you decide the optimal size of the pool.

Do you use synchronous IPC everywhere in your project?

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Wed Dec 04, 2013 5:08 pm
by OSwhatever
giszo wrote:I was worried about creating a thread per request because it would mean a lot of allocation inside the kernel for every request. Using a thread pool for this purpose makes sense, but it is still a good question how you decide the optimal size of the pool.

Do you use synchronous IPC everywhere in your project?
I use a mix of synchronous and asynchronous messages as I see fit; the two variants are good for different things. If you look at QNX and L4, they only support synchronous messages in the kernel, and you can come a long way with synchronous alone. You can actually emulate asynchronous messages on top of synchronous ones.

Asynchronous: good for buffering and queuing requests and data. Doesn't block the sender at all.
Synchronous: good for sending large amounts of data to the location where you need it. If you look at most interfaces, you will notice that they are of the synchronous send-and-reply type, meaning the sender must wait for the response.

It's really up to you what kind of programming model you want to use and which type is the most beneficial for you.

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Thu Dec 05, 2013 2:43 am
by Owen
giszo wrote:I am in the process of designing and implementing the IPC mechanism of my hobby microkernel project, and I need some help, mostly regarding design considerations.

I faced the first problem when I tried to implement the communication between a typical userspace process and the VFS server, because I need to pass a lot of data between them when the process wants to read or write a file, for example. At first I implemented my IPC messages with a fixed size (6 * 8 bytes) to avoid queueing big message data inside the kernel. I thought the data transfer problem could be solved by creating a shared memory region per file descriptor (to stick with the previous example). This region could hold the data to be written, and vice versa for reads. I started to worry about my design when I considered what happens when multiple threads try to access the same file at the same time; a single shared memory region will not be enough. :)
If two threads are trying to access the same file descriptor at the same time, lock the region with a mutex...

Don't try to invent "One grand unified solution" if it doesn't make sense. As I'd do it:
  • Small IPCs might go through a memory mapped ringbuffer shared between the ends of the communications channel (With the kernel providing the ability to "prod" or "wake up" the other end)
  • Large IPCs happen by some form of per-IPC page mapping
Now, for the special case of actual files, you might decide to go for the latter; in particular, have all file I/O be handled, in the background at least, by memory mapping.
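For the small-IPC ring buffer mentioned above, here is a sketch of the producer side of a single-producer/single-consumer buffer that could live in a page shared between the two ends of a channel (field names and sizes are illustrative; a real implementation also needs the kernel "prod"/wake-up primitive and careful attention to memory ordering):

Code:

#include <stdint.h>
#include <stdbool.h>
#include <stdatomic.h>

#define RING_SIZE 4096   /* power of two; fits in one shared page with the header */

struct ring {
    _Atomic uint32_t head;           /* advanced by the consumer */
    _Atomic uint32_t tail;           /* advanced by the producer */
    uint8_t          data[RING_SIZE];
};

/* Producer side only: writes a length-prefixed message, or reports "full"
 * so the sender can fall back to a blocking/kernel path. */
static bool ring_put(struct ring *r, const void *msg, uint32_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

    if (RING_SIZE - (tail - head) < len + sizeof(uint32_t))
        return false;                /* not enough free space */

    /* Length prefix, then payload, byte by byte modulo RING_SIZE. */
    for (uint32_t i = 0; i < sizeof(uint32_t); i++)
        r->data[(tail + i) % RING_SIZE] = ((const uint8_t *)&len)[i];
    for (uint32_t i = 0; i < len; i++)
        r->data[(tail + sizeof(uint32_t) + i) % RING_SIZE] = ((const uint8_t *)msg)[i];

    atomic_store_explicit(&r->tail, tail + len + (uint32_t)sizeof(uint32_t),
                          memory_order_release);
    return true;                     /* then "prod" the other side via the kernel */
}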

If you look at L4's IPC:
  • The receiver must explicitly allocate the buffers it is to receive into. If the sender attempts to "string copy" more data, or map more pages than is allowed, the IPC is denied
  • The sender and receiver always get the option of not waiting for an IPC
This leads to an interesting dynamic. In general, L4 IPCs take very much a procedure-call style: for most services, it is mandatory that the caller be waiting for the response at exactly the moment it is sent (else it is lost or an error occurs). You might notice a race condition here between the calls to IPC Send and IPC Receive; this is why L4 contains an L4_SendAndReceive system call. This turns out to be a very useful symmetric system call: it functions both as "IPC_Call" and as "IPC_ReturnAndGetNextCall".

Where you might want to differ from L4 (certainly I take a lot of inspiration from L4; this is one area where I differ) is message addressing: L4 addresses messages to thread IDs. You might want to add some notion of "IPC ports" or similar.

Re: Microkernel IPC with arbitrary amount of transferred data

Posted: Thu Dec 05, 2013 2:28 pm
by OSwhatever
Owen wrote:Don't try to invent "One grand unified solution" if it doesn't make sense. As I'd do it:
  • Small IPCs might go through a memory mapped ringbuffer shared between the ends of the communications channel (With the kernel providing the ability to "prod" or "wake up" the other end)
The shared memory ring buffer is great, as it is a zero-copy solution and allocation can in some cases be done lock-free, but it requires that you map at least one page for each open channel. The VFS, for example, is likely to have many clients, in the hundreds even, which consumes a lot of virtual and physical address space. Each program is also likely to use several other services besides the VFS, so the number of mapped channel pages must be quite large. Or am I exaggerating the problem? Do you see this as a problem, or is the extra memory worth it?