RPC message size, handling oversized messages
If you have implemented RPC or Message Passing in your OS, how have you dealt with this problem?
Or how are you planning to?
Here's what I think:
Suppose we have a client sending messages (requests) to a server over a channel, and getting back messages (responses). Messages represent method calls of a particular interface.
The receiver doesn't know how much data will arrive.
It can malloc a receiving buffer large enough for most cases, but that may happen to be not enough - short of allocating the whole address space, and maybe not even then. (Try GetHardDiscContents.)
The larger the buffer, the less likely that is, but it is also more wasteful on average, and never completely guaranteed. (Or maybe a reasonable upper bound can be chosen? How?)
We can deliver a partial message, making use of whatever buffer is available, and indicating that "there is more where that came from", and how much.
We can respond with a "not enough buffer, need XXXX" error, so that the sender can retry or abort. (That won't work for Very Large Messages which can't be delivered in one piece)
For a partial response, it would probably be a good idea to have a fixed-size "table of contents" (things like fixed-size fields, plus the sizes of the non-fixed ones), so that the receiver is guaranteed to receive it in the first round, can examine what it already got, and can make some decisions - maybe selectively receiving some sections of the message (OK, now this is getting messy).
Maybe IDL or encoding can be structured so that a useful, fixed- or bounded-size portion of the message comes first, and the minimum (maybe also maximum) buffer size can be known.
Some things can be limited, either in IDL or at runtime (like max filename size), so that the encoding routines can determine a message size.
Very Large or even Unbounded Things (such as HardDiscContents) can be passed as an object handle (a Stream), fixed-size, to be consumed separately in whatever chunks the receiver prefers.
So, I'm starting to think that determining a limit based on the nature of a particular message is the way to do it.
When working with interface messages (as in COM):
When the Client makes a call, it is calling a specific method, and it knows what kind of data can come back.
When the Server is waiting for a call, it doesn't know which method will get called, so it can allocate a buffer of the maximum size among all methods - enough to receive any of them.
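The client/server asymmetry above can be sketched in C (the method table and sizes are invented for illustration; a real IDL compiler would generate them):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-method worst-case message sizes, as an IDL
 * compiler might emit for one interface (all numbers invented). */
static const size_t method_max_size[] = {
    64,   /* e.g. Open(name) with a bounded filename */
    16,   /* e.g. Close(handle)                      */
    4096, /* e.g. a bounded Read reply chunk         */
};
#define NMETHODS (sizeof method_max_size / sizeof method_max_size[0])

/* Client side: it knows which method it called, so it can size the
 * reply buffer for that method alone. */
static size_t client_buffer_size(unsigned method)
{
    return method_max_size[method];
}

/* Server side: it doesn't know which method will arrive, so it
 * sizes its buffer for the worst case across the interface. */
static size_t server_buffer_size(void)
{
    size_t max = 0;
    for (size_t i = 0; i < NMETHODS; i++)
        if (method_max_size[i] > max)
            max = method_max_size[i];
    return max;
}
```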
What do you think?
Re:RPC message size, handling oversized messages
Hi!
Even though my designs about userspace ipc are not yet finished, I tend to go with fixed size messages, which can contain 'indirect strings'.
This way, small messages can be sent in one operation (just by filling the fixed-size space), and larger amounts of data are retrieved in a second operation. The length of an indirect string is entered in the fixed size part, so a buffer of appropriate size can be allocated. Then, the receiver calls the kernel again to copy the string into this buffer.
To keep this mechanism efficient, however, the fixed-size part must not be too large - probably not more than 8 integers. After the values of this area are extracted (i.e. copied to some other data structures, or used to retrieve a string), the fixed-size buffer can be reused for the next message.
Note that 'string' does not only refer to char arrays, but to any sequence of raw data (just to make sure). And yes, this is somewhat inspired by L4's ipc system.
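A user-space sketch of the fixed-size-plus-indirect-string idea described above (the struct layout and names are invented; the second kernel call is simulated with a plain memcpy):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_WORDS 8 /* "not more than 8 integers" */

/* Fixed-size part: inline words plus the length of one indirect
 * string. Only this part is transferred in the first operation. */
struct msg {
    uint32_t inline_data[INLINE_WORDS];
    uint32_t string_len; /* length of the indirect payload */
};

/* Second step: having read string_len from the fixed-size part,
 * the receiver allocates an exact-size buffer and fetches the
 * payload (the memcpy stands in for the second kernel call). */
static void *fetch_indirect(const struct msg *m, const void *payload)
{
    void *buf = malloc(m->string_len);
    memcpy(buf, payload, m->string_len);
    return buf;
}
```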
cheers Joe
Re:RPC message size, handling oversized messages
My planned system is quite simple. Messages are fixed size (most are two 32-bit values; some exceptions apply) and they can only contain a straight value - no messing around is involved. Passing large amounts of data is accomplished via shared memory (simply mapped into both applications' address spaces). System services get a much larger message-passing ability (both to and from - this is the exception noted above), where particular values are treated as addresses: the area they point to is mapped into the receiving application's address space, and then the address is fixed up to point to the right place before the receiving app sees it.
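The "fixed up to point to the right place" step might look like this rebasing (the function and parameter names are my own; the post doesn't give them):

```c
#include <assert.h>
#include <stdint.h>

/* After the kernel maps the sender's region into the receiver's
 * address space, an address-valued field in the message must be
 * rebased so it points at the same data in the new mapping. */
static uint32_t fixup_address(uint32_t sender_addr,
                              uint32_t sender_region_base,
                              uint32_t receiver_region_base)
{
    /* keep the offset within the region, swap the base */
    return receiver_region_base + (sender_addr - sender_region_base);
}
```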
Re:RPC message size, handling oversized messages
[edit] Sorry, just noticed you were only talking about message-based RPC. Post deleted.
Re:RPC message size, handling oversized messages
I also will use a fixed size msg system.
When you want to send a msg which is larger than the fixed size, you send a msg whose size is a multiple of the fixed size. The receiver receives a fixed-size msg, sees there that the full msg is larger, and calls a second time with the number of msgs it wants to get.
So you can receive the fixed-size msgs in one syscall, and if there is a larger msg you make a second call.
The call would be:
Code: Select all
int receive(uint number, msg_ptr *msg)
int send(uint number, msg_ptr *msg)
To save memory, I will store such a large msg as two msgs: one of the fixed size, and one that is a multiple of the fixed size.
You will only receive as much as you want to receive, but the msg will be deleted once you have received it - even if you don't get all of it!
Another way would be to have fixed-size msgs where you can receive as much as you want, and the msg is only deleted when you have received the whole msg. That would save the memory for the two msgs of my first solution.
I hope you can understand my system!
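A sketch of the size bookkeeping this scheme implies (MSG_SIZE is an invented value; a large msg occupies a whole number of fixed-size slots):

```c
#include <assert.h>

#define MSG_SIZE 64 /* illustrative fixed message size in bytes */

/* Number of fixed-size msgs a payload of 'len' bytes occupies -
 * this is the count the receiver passes to its second receive()
 * call after seeing, in the first msg, that more data follows. */
static unsigned msgs_needed(unsigned len)
{
    return (len + MSG_SIZE - 1) / MSG_SIZE;
}
```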
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
Re:RPC message size, handling oversized messages
at one point i was thinking of doing something like "jumbo messages", but i'm now backing off. There are several difficulties about the semantics you can expect on the data (e.g. how long are the strings contained in a message valid? etc.), so i'd rather go for a stream-based RPC mechanism. Messages are still there and available, and they can be used to exchange addresses of pages to be mapped, the amount of data sent over a stream, and a stream identifier, but there will not be a 'transparent, zero-copy, any-sized message mapping', because all i could come up with that fits the definition is actually ... a mess.
For more on my attempt at defining such a message passing scheme, see http://clicker.sourceforge.net/wiclicke ... erMessages.
Re:RPC message size, handling oversized messages
I noticed a few days ago that when you have a messaging system with variable-sized messages, and you completely let the receiver decide how many bytes it wants to receive (which is necessary when it has to allocate the buffer itself, AFAICS), then you'll end up with a stream-based IPC system. (And this is also what Pype has already mentioned some time ago, IIRC.)
This is giving me headaches....;D
cheers Joe
- Pype.Clicker
Re:RPC message size, handling oversized messages
well, on the basis, the L4 messaging system is interesting and appealing, indeed. It nicely helps, for instance, to pass along a message that contains a filename to be resolved or the like. But:
- if you have zero copies (e.g. actually do temporary mapping), how do you decide "how temporary" the mapping could be? and how can you ensure privacy of data? if you're copying it first into an "exportable" area before using it for a message, that's not really zero-copy.
- if you have single-copy at sender side, it actually looks much like stream-writing. Single-copy at receiver side has to be done at kernel level, because you cannot trust the receiver not to try to inspect more than what it's entitled to, and that will probably put much more complexity in your microkernel than you're willing to support.
So finally, the solution is (imho) to use streams as soon as more complex protocols (and especially RPCs) come into play.
Re:RPC message size, handling oversized messages
Maybe one of you can explain what a stream as you mean it is and how it works!?
Re:RPC message size, handling oversized messages
FlashBurn wrote: Maybe one of you can explain what a stream as you mean it is and how it works!?
I see the most basic stream system as the one Linux uses in its low-level IO layer (and I hope this is also what Pype meant - otherwise tell me):
There is a 'stream' which connects two processes (or domains or threads - whatever). There is always a writer and a reader.
The writer issues a system call and passes a pointer to the data area it wants to write, as well as the number of bytes it wants to write. The reader also does a system call, it passes a pointer to a buffer area as well as the length of this area.
So when there is data offered on the stream (from the writer side), and there is someone who wants to consume data from the stream (the reader), then the kernel simply takes the writer's data and copies it to the reader's buffer. This is easy when both sides expect the very same amount of data.
When the amount of data differs (for some reason), for example the writer offers more data than the reader wants to consume, the kernel will only copy as much data as will fit into the reader's buffer. To make this work, the kernel tells both sides how many bytes it has copied, so the writer can see that the reader has not yet got all data. Note that this can also be the other way round (the reader expects more data than the writer offers).
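The kernel's rendezvous copy described above boils down to taking the minimum of the two lengths and reporting the count to both sides (a user-space sketch; the real thing copies across address spaces):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy min(writer_len, reader_len) bytes from the writer's data to
 * the reader's buffer; the returned count is reported to both
 * sides, so whichever side has bytes left over can tell. */
static size_t stream_copy(void *reader_buf, size_t reader_len,
                          const void *writer_data, size_t writer_len)
{
    size_t n = reader_len < writer_len ? reader_len : writer_len;
    memcpy(reader_buf, writer_data, n);
    return n;
}
```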
I hope this was not too screwed up - but this is my version of how streams work.
cheers Joe
Re:RPC message size, handling oversized messages
Pype.Clicker wrote: if you have zero copies (e.g. actually do temporary mapping), how do you decide "how temporary" the mapping could be? and how can you ensure privacy of data?
That's one reason why I'm not an advocate of 'zero copy', since arbitrary data will hardly reside in page-aligned areas, imo. I think you are better off anyway with single-copy.
Talking of temporary mapping: I think the only sensible way is to map it copy-on-write in this case, then you won't have problems. However, mapping buffers directly into user space is only possible with page aligned data/buffers which are a multiple of PAGE_SIZE, and then this starts to look like shared memory rather than a messaging system, IMO.
cheers Joe
Re:RPC message size, handling oversized messages
About shared memory - I agree with what has been said above.
In addition, I don't think it will be so much more efficient overall - a constant-factor improvement (and a small one, at that).
Consider the "writing 1 MB of data to disk" example from the OSFAQ: that would be an improvement if the data just magically happened to be there by itself, with no other processing - but what if it didn't? What if it came over the network, or from, say, another disk - maybe even a floppy? In any case, the data will have to be processed at least once, so eliminating one pass of processing offers at best a 0.5-factor improvement - and that's the best case! If there is any non-trivial processing to be done, the factor becomes negligible.
That's ignoring disk writing speed which will certainly be there, big time.
Add to that the overhead and complexity of dealing with the page mappings, in the OS, sender and receiver, syscalls needed for notifications, etc.
Think big-Oh notation.
(Correct me if I'm wrong)
For smaller, typical cases, it will be nothing but trouble.
Of course, there are special cases such as video buffer, where the situation is different - there are large amounts of data, possibly modified in-place and frequently submitted. In such cases, shared memory may be the solution for data transfer, with messages for notifications.
- Pype.Clicker
Re:RPC message size, handling oversized messages
@zloba: i'm not 100% sure i followed what you said. Whether some data should be made available through shared memory, whether it should come through a 'stream' or whether it should be part of a message is usually part of the design.
Re:RPC message size, handling oversized messages
@Pype:
Pype.Clicker wrote: i'm not 100% sure i followed what you said
I was trying to say that using shared memory to eliminate the cost of copying data may not be the best optimization in the general case.
From the Wiki on "Message Passing":
If copying a 4-word message forth and back doesn't imply excessive processing cost, it will be very different for the 1MB of data you send to a disk server. In that case, it is suggested to toy with paging in order to map the real data from the emitter to the receiver's address space.
This seems to imply that the problem with large messages is the cost of copying them, and the solution is shared memory - ignoring all other aspects of said transfer of 1 MB.
Pype.Clicker wrote: Whether some data should be made available through shared memory, whether it should come through a 'stream' or whether it should be part of a message is usually part of the design.
I agree (I was trying to say that in the last paragraph; that was part of my point). Messages and shared memory can work together, with notifications delivered via messaging and data via shared memory.
Re:RPC message size, handling oversized messages
Hi,
zloba wrote: This seems to imply that the problem with large messages is the cost of copying them, and the solution is shared memory - ignoring all other aspects of said transfer of 1 MB.
The "toy with paging" doesn't always imply shared memory.
For my OS, messages vary in size from 8 bytes to 32 MB. For each thread there's a 32 MB area of the address space reserved for the thread's message buffer. When a message is sent it is moved (not copied) to the receiver's message queue, which exists in kernel space. When a thread asks for its next message, the next message is moved from its message queue in kernel space into its message buffer.
For small messages (1 KB or less) I copy the data and then free any pages in the sender's message buffer. For larger messages (between 1 KB and 2 MB) I move page table entries from the message buffer to the message queue, and then (later) from the message queue to the receiver's message buffer. For huge messages I move page directory entries.
This means, for the largest message possible, I actually move 32 bytes from/to the page directory. In this case invalidating TLB entries is where the overhead is.
It's not the most efficient IPC system, but it's clean and consistent - software does exactly the same thing regardless of how much data is being sent/received, there are no security problems with extra data being transferred (the message buffers are empty after a message is sent), software never needs to allocate any buffers, and everyone knows how much can be sent in advance, so messaging protocols can be designed to work within the 32 MB limit.
The 32 MB limit was chosen to allow an entire screen full of video data to be sent/received in one transaction. For my OS, I can't use shared memory because it's intended to be a distributed OS, where the sender and receiver may be running on different computers without their knowledge (without the programmer needing to care or allow for it).
Cheers,
Brendan
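The three size regimes Brendan describes can be summarized as a dispatch on message size (the thresholds are the ones from the post; the enum and function names are mine):

```c
#include <assert.h>

enum xfer {
    COPY_DATA,               /* <= 1 KB: copy, then free sender pages */
    MOVE_PAGE_TABLE_ENTRIES, /* 1 KB .. 2 MB                          */
    MOVE_PAGE_DIR_ENTRIES    /* > 2 MB, up to the 32 MB limit         */
};

#define KB 1024u
#define MB (1024u * KB)

/* Pick the transfer strategy for a message of 'bytes' bytes. */
static enum xfer strategy(unsigned bytes)
{
    if (bytes <= 1 * KB) return COPY_DATA;
    if (bytes <= 2 * MB) return MOVE_PAGE_TABLE_ENTRIES;
    return MOVE_PAGE_DIR_ENTRIES;
}
```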
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.