
Neon IPC and why it was ... replaced.

Posted: Thu Jun 30, 2005 9:48 am
by mystran
The Neon IPC system used to be a fairly complicated single-function send/receive synchronous rendezvous thingie.

Anyway, there were four practical difficulties:
- correct, not to mention efficient, use of the facility would have been hard. I mean, hard as in "hard even if you designed it".
- implementing such a system efficiently (and correctly) is tricky, especially if things like realtime concerns and user pagers become part of the mess
- any kind of asynchronous operation whatsoever would require an additional helper process
- the whole mess was almost impossible to use from kernel space

So what happened was that sometime yesterday, I thought: "ok, who cares if I queue messages in kernel space". So what I have now is possibly the simplest message passing system that could possibly work and still be reasonably flexible and secure:

1. There are "objects", which are really <dispatchFn,userdata> pairs.
2. There are also "handles", which are like remote pointers to objects. You can only use handles for sending messages to the object. Within a single process, one can use object pointers directly in place of handles.
3. Each message is a 4-tuple <object, op, w1, w2>, where op is a hashed operation code (generated with 'nifgen'). w1 and w2 are two parameters. The first is (currently) always an integer, while the second can hold (currently) either an integer, a pointer to a message data structure, or a handle. (See the sketch after this list.)
4. Each thread has a message queue. Each object is owned by some thread. Any messages to an object are stored in its owner's message queue.
5. One can either get any kind of message, or filter by specific object, or by specific message code. Messages are received in FIFO order. When filtering is used, non-matching messages are kept in the queue for the future.
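
In rough C terms, the pieces in points 1-3 might be declared something like this. This is only an illustrative sketch: apart from the names that appear elsewhere in this post (Object, HNDL, MWORD, dispatchFn), the field names and the Message struct layout are guesses, not the actual Neon declarations.

Code: Select all

/* illustrative sketch only, not the real Neon headers */

typedef unsigned long MWORD;        /* machine-word parameter            */

typedef struct Object Object;
typedef Object *HNDL;               /* a handle is (locally) an Object*  */

/* 1. an "object" is a <dispatchFn, userdata> pair */
typedef int (*dispatchFn)(Object *self, MWORD op, MWORD w1, MWORD w2);

struct Object {
    dispatchFn dispatch;            /* called when a message is handled  */
    void      *userdata;            /* private state for the object      */
};

/* 3. a message is the 4-tuple <object, op, w1, w2> */
typedef struct {
    Object *object;                 /* destination object                */
    MWORD   op;                     /* hashed operation code (nifgen)    */
    MWORD   w1;                     /* integer parameter                 */
    MWORD   w2;                     /* integer, data pointer, or handle  */
} Message;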

The whole system is actually a bit like a cross-breed between Windows PostMessage/GetMessage-stuff, and COM like architecture.

On user level, the main interface looks like this:

Code: Select all

// Send a message
int SendMessage(HNDL hTarget, MWORD op, MWORD w1, MWORD w2);

// dispatcher function
typedef int (*dispatchFn)(Object *, MWORD, MWORD, MWORD);
The op-code is a combination of a hash and a few bits of type information, generated by 'nifgen', such that the system automatically knows what type of information w2 contains.
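
For illustration, since (as mentioned later in this thread) the two least significant bits of the op-code carry the type of the last parameter, decoding it might look roughly like this. The specific mask and values are assumptions, not nifgen's actual encoding.

Code: Select all

/* sketch only; MWORD as declared earlier */

#define OP_TYPE_MASK   0x3u         /* two low bits carry w2's type     */
#define OP_W2_INT      0x0u         /* w2 is a plain integer            */
#define OP_W2_DATA     0x1u         /* w2 points to a message structure */
#define OP_W2_HANDLE   0x2u         /* w2 is a handle                   */

int op_w2_type(MWORD op)
{
    return (int)(op & OP_TYPE_MASK);
}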

The most interesting part is that lots of stuff in the kernel that was previously special-cased is now trivially implemented in terms of the message system. For example, the "sleep" function really just installs a timer and waits for the timer completion message.
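
A purely hypothetical sketch of how such a "sleep" could sit on top of the messaging primitives; install_timer, WaitMessage and OP_TIMER_DONE are made-up names standing in for whatever the real kernel interface is, and Object/MWORD are as in the sketch above.

Code: Select all

/* hypothetical interfaces, not the real kernel API */
extern void install_timer(unsigned ms, Object *notify);
extern int  WaitMessage(Object *obj, MWORD op, MWORD *w1, MWORD *w2);

#define OP_TIMER_DONE 0x0u          /* placeholder op code */

static int ignore_msg(Object *self, MWORD op, MWORD w1, MWORD w2)
{
    (void)self; (void)op; (void)w1; (void)w2;
    return 0;
}

void sleep_ms(unsigned ms)
{
    MWORD  w1, w2;
    Object timer_done = { ignore_msg, 0 };   /* local notification object */

    install_timer(ms, &timer_done);          /* ask for a completion msg  */
    WaitMessage(&timer_done, OP_TIMER_DONE, &w1, &w2);  /* block until it arrives */
}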

The system is more or less implemented, and more or less seems to work. From a theoretical standpoint it is nowhere near as pure as the old one, but at least it's not totally awful either. And it sure makes things quite easy.

Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 12:25 am
by distantvoices
It makes things really simple *gg* - having store-and-forward message passing inside the kernel. You can have threads send messages and either wait for a reply or stroll off and do something else.

is this handle you mention a pointer to a message box?

Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 5:17 am
by Pype.Clicker
Yep. I replaced my own messaging system a few months ago (i could point you to my wiki, but it is doomed once again. Damn') ...

Now, unfortunately, i'm facing quite a nasty issue: exchanging data of variable size through those messages, e.g. the name of something, etc. I somehow wanted it "copyless", but it turned into a nightmare (e.g. i only have partially working stuff, which is usually worse than having nothing at all)

I suppose i'll just go for implementing "streams" and see if i can come up with a system where streams contain information and a process can wait for either a message or data on a stream (possibly with a kernel-generated message saying "hi there, you have data in stream X" ;)

Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 5:46 am
by distantvoices
the copyless stuff would include shared memory, wouldn't it?

I daresay that would also require some kind of user library to use alongside this shared memory - and semaphores or so.

What would give me more of a thrill would be: take a page, write your message to it and send that page (page array) to the requesting process - for large data. That would be a real nice brain teaser.

But as an intermediate solution, you could implement something like "user_copy", which transfers data of any size from one address space to another.
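
A rough sketch of what such a "user_copy" could look like, assuming a hypothetical helper that temporarily maps a page of a foreign address space into the kernel. map_foreign_page and unmap_temp are invented names, not calls from either OS discussed here.

Code: Select all

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* hypothetical helpers: map_foreign_page() returns a kernel-visible
   alias for the byte at 'uaddr' in process 'pid', valid up to the end
   of that page; unmap_temp() releases the temporary mapping.          */
extern void *map_foreign_page(int pid, const void *uaddr);
extern void  unmap_temp(void *kaddr);

#define PAGE_SIZE 4096u

int user_copy(int dst_pid, char *dst, int src_pid, const char *src, size_t len)
{
    while (len > 0) {
        /* stay inside one page of both the source and the destination */
        size_t s_room = PAGE_SIZE - ((uintptr_t)src & (PAGE_SIZE - 1));
        size_t d_room = PAGE_SIZE - ((uintptr_t)dst & (PAGE_SIZE - 1));
        size_t chunk  = s_room < d_room ? s_room : d_room;
        if (chunk > len)
            chunk = len;

        void *s = map_foreign_page(src_pid, src);
        if (!s)
            return -1;
        void *d = map_foreign_page(dst_pid, dst);
        if (!d) {
            unmap_temp(s);
            return -1;
        }

        memcpy(d, s, chunk);        /* the actual cross-space copy */
        unmap_temp(d);
        unmap_temp(s);

        src += chunk;
        dst += chunk;
        len -= chunk;
    }
    return 0;
}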

as for the streams: I've abstracted them to connection points/files. If you want to read from stdin, open a "console file" and read from it - pretty much the Unix way. As the fs service takes care of "one after the other" due to the message passing, there isn't any synchronisation problem yet.

What I'm unsure about is the "hi, there is data for you" idea. It might not be read if the thread strolls off into other regions of the address space. Therefore I would only send data if data has been explicitly requested (sync or async), but that's just my opinion.

Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 9:16 am
by Pype.Clicker
beyond infinity wrote: the copyless stuff would include shared memory, wouldn't it?

I daresay that would also require some kind of user library to use alongside this shared memory - and semaphores or so.

What would give me more of a thrill would be: take a page, write your message to it and send that page (page array) to the requesting process - for large data. That would be a real nice brain teaser.
Yeah, i had indeed something like this: when you request a lookup of "system.device.vga0.vendor", the page containing "nVidia Corporation" is resolved locally, made available for export, its "exportable address" (e.g. physical address of the holding table: offset in table + offset in page + number of pages) is passed as a regular message and then "imported" by the requesting process.

I even made it so that if you request the same page several times (e.g. you don't only need ".vendor", but also ".model", ".modes.*", etc.) it will not need to be mapped several times; instead you reuse the previous mapping (and the nightmares start if you want to know when the mapping can go away).

However:
- for the current cases (swapping names, mainly), i fear that the cost of hashing, lookups in local caches, mapping, etc. of the page outweighs the plain memory copy. (e.g. is it *really* interesting to have "vendor" strings passed as a page reference? i'm not quite sure)
- if you imported the string "800x600x24bpp" and you want to pass it back, you might not be able to tell it's already in the address space you're communicating with: you have to export what you have imported and have it imported again.





Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 11:03 am
by Brendan
Hi,

Just thought I'd throw my 2 cents into the pot :)...

For my OS I use message buffers at fixed addresses, and FIFO message queues. To send a message a thread builds the message in its message buffer, then calls the kernel function to send the message (sort of "unsigned int send_message(receiverID)"). The kernel function transfers the message (more on this transferring later) to the receiver's message queue.

When the receiving thread calls the kernel's "get message" ("unsigned int get_message(void)") or "check for message" ("unsigned int check_message(void)") function, the kernel checks the thread's message queue and returns the first message that's there. This involves moving the message from the queue into the thread's message buffer.

For simplicity, every thread always has a single message buffer at the top of its part of the address space. This implies that a thread's "message exchange ID" or "message port ID" is always exactly the same as its "thread ID".

When a message is transferred, the kernel looks at the message size and may "compress" the message. For small messages (less than 1024 bytes) the kernel will use "rep movsd" to copy the message as is. For medium sized messages (less than 2 MB) the kernel copies page table entries instead, and for large messages (up to the artificial 32 MB limit) the kernel copies page directory entries. This means that for the maximum sized message (32 MB) the kernel actually copies 8 dwords from the page directory to the message queue, and later copies those 8 dwords from the message queue into the receiver's page directory. The main cost here is invalidating the TLBs.
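
As a sketch (not Brendan's actual code), that size-tiered transfer could be structured like this, with the helper functions standing in for the real data copy, PTE copy and PDE copy.

Code: Select all

/* sketch only; the helpers are hypothetical names for the three cases */

#define SMALL_LIMIT   1024u                /* below this: plain copy       */
#define MEDIUM_LIMIT  (2u * 1024 * 1024)   /* below this: copy PTEs        */
#define LARGE_LIMIT   (32u * 1024 * 1024)  /* up to this: copy PDEs        */

extern void copy_bytes(void *queue_slot, const void *buffer, unsigned size);
extern void move_ptes (void *queue_slot, const void *buffer, unsigned size);
extern void move_pdes (void *queue_slot, const void *buffer, unsigned size);

int transfer_message(void *queue_slot, const void *msg_buffer, unsigned size)
{
    if (size < SMALL_LIMIT)
        copy_bytes(queue_slot, msg_buffer, size);   /* "rep movsd" case      */
    else if (size < MEDIUM_LIMIT)
        move_ptes(queue_slot, msg_buffer, size);    /* move page table entries */
    else if (size <= LARGE_LIMIT)
        move_pdes(queue_slot, msg_buffer, size);    /* move page dir entries   */
    else
        return -1;                                  /* over the 32 MB limit    */
    return 0;
}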

Because every thread's message buffer is always at the same fixed linear address (due to my "thread spaces"), the message passing is quite efficient - the addresses to copy to/from are hard coded, and the part of the linear address space reserved for the message buffer is treated differently by the linear memory manager (allocate on demand, no swapping, no memory mapped files, no DMA, no memory mapped IO, etc.) to remove the need to check anything (e.g. transferring a page that's being used for DMA would be a very bad idea!).

There is no memory sharing in my OS - it's a distributed OS and physical pages can't be shared adequately across separate computers without serious performance implications. Instead, when a thread sends a message its message buffer is cleared, and when a thread receives a message the new message overwrites everything that may have been in its message buffer. From the thread's perspective, messages are always moved and not copied.

[continued next post]

Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 11:11 am
by Brendan
[continued from previous post]

This has an additional benefit - when a thread calls the kernel's "get message" or "check for message" function, the kernel knows that the thread expects its message buffer to be wiped by a received message, and the kernel can safely free any memory that may be allocated to this area. Most of the time most threads are waiting for a message to be received, so this automatic memory freeing reduces the amount of memory in use at any given time. Combined with the "allocation on demand" (which allocates zeroed pages only), this contributes to the illusion that the entire 32 MB message buffer is moved to the receiver's address space.

In no case should a message contain any pointers (data only). The reason for this is that it makes a mess of the memory manager (shared pages, the problems mentioned by Pype), creates re-entrancy locking problems (any data that can be accessed by more than one thread must be locked to prevent it from being modified while another thread is reading it), and it violates my "a thread is an OOP object where private data is private" ideals.

As for naming the "message exchange" or "message port", most of them aren't named and have no need to be found by other software. Some threads provide a public service and must be named - in this case the thread creates a small (4 byte) file containing its "thread/exchange/port ID" so that other software can "fopen(name, "r")" the relevant file and obtain the "public" thread ID. The VFS code handles just about all of this (preventing 2 threads with the same name, allowing directory searches for "/sys/port/myService???", etc.) and adds the ability to use the file's access permissions to restrict access to the message exchange/port. The name is only used to obtain the static thread ID (or message exchange/port ID), so it'd only be looked up once (rather than doing a lookup every time a message is sent).
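
A minimal sketch of that naming scheme, assuming an ordinary stdio-style interface to the VFS and a 32-bit thread ID; the example path is made up.

Code: Select all

#include <stdio.h>
#include <stdint.h>

/* service side: publish our thread ID under a well-known name */
int publish_service(const char *path, uint32_t my_thread_id)
{
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    size_t ok = fwrite(&my_thread_id, sizeof my_thread_id, 1, f);
    fclose(f);
    return ok == 1 ? 0 : -1;
}

/* client side: look the ID up once, then message it directly */
int lookup_service(const char *path, uint32_t *thread_id)
{
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    size_t ok = fread(thread_id, sizeof *thread_id, 1, f);
    fclose(f);
    return ok == 1 ? 0 : -1;
}

/* e.g. lookup_service("/sys/port/myService", &id); */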

I refuse to implement streams, "stdin", "stdout", etc. The problem with it all is that the intermediate software (e.g. the kernel) has no idea where one "transaction" begins and another ends. This leads to inappropriate buffering, and (in the worst possible case) can cause 2 thread switches for each byte sent/received. It's probably fine for ASCII data where buffering can be controlled via the linefeed character, but I'm transferring all sorts of different types of data. To illustrate, consider the following:

Code: Select all

Thread A:
   for(j = 0; j < length; j++) fputc(fp, output[j]);
   fclose(fp);

Thread B:
   while( (c = fgetc(fp)) != EOF) input[i++] = c;
   fclose(fp);
I expect an experienced programmer (e.g. Pype.Clicker) could find several glaring problems with this code, yet I see similar code often (usually because I'm the worst offender ::)).


Cheers,

Brendan

Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 4:01 pm
by mystran
beyond infinity wrote: is this handle you mention a pointer to a message box?
Actually, a "handle" in the new Neon design can be one of several things. First of all, HNDL is typedef'd as an Object *, which means any local Object* is automatically a valid HNDL.

What exactly Object* is depends on whether you are in kernel or userspace, but basically it is a dispatch function and some user data. For userspace objects that's more or less all. For kernelspace objects there is also some additional book-keeping. And if you send a local HNDL to another process, then a "shadow object" is created in the kernel, which is simply a kernel object that forwards data to the userspace object in the relevant process.

Now, while every valid Object* is a valid HNDL, not every HNDL is a valid Object*. There is a requirement for objects to be 4-byte aligned, so that the two lower bits are free for dynamic typing. Any "remote handles" have the lowest bit set, and the other bit is reserved for now. When you send a message to a remote object, a system call is made, and the kernel takes care of the message transfer. Sends to local objects (anything that looks like a pointer) can be done completely in userspace.
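
So a userland SendMessage along these lines is conceivable. This is a sketch only: sys_send_remote is a made-up name for the system call, and the dispatch field name comes from the illustrative Object sketch in the first post, not from the real Neon headers.

Code: Select all

/* sketch; HNDL, MWORD, Object as in the earlier sketch */

#define HNDL_REMOTE_BIT 0x1ul

extern int sys_send_remote(HNDL h, MWORD op, MWORD w1, MWORD w2);  /* hypothetical */

int SendMessage(HNDL hTarget, MWORD op, MWORD w1, MWORD w2)
{
    if (((unsigned long)hTarget & HNDL_REMOTE_BIT) == 0) {
        /* looks like a pointer: local object, call its dispatcher directly */
        Object *obj = (Object *)hTarget;
        return obj->dispatch(obj, op, w1, w2);
    }
    /* remote handle: let the kernel look it up and queue the message */
    return sys_send_remote(hTarget, op, w1, w2);
}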

Remote HNDLs are not really pointers at all, but contain a code which can be used to look up the real "Handle". The real Handle can then tell which process owns it, and what Object it points to.

This also means that when a HNDL is part of a message, the kernel can look at it and determine whether it's a local or a remote HNDL. If it's remote, then it's looked up and copied for the target process. If it's local, then a new "shadow object" is created, and a handle to that is passed.

As for the original question: no. There is no message queue associated with either handles or objects. Each object is owned by some thread, and each thread receives messages for its objects into its own message queue. There is exactly one message queue per thread.

Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 4:35 pm
by mystran
Pype.Clicker wrote: Yep. I replaced my own messaging system a few months ago (i could point you to my wiki, but it is doomed once again. d*mn') ...
Unless you've come up with something new after the 'events' stuff then I think I more or less know what you have. I've been reading other people's websites every once in a while after all; it's nice to see how others progress.
Now, unfortunately, i'm facing quite a nasty issue: exchanging data of variable size through those messages, e.g. the name of something, etc. I somehow wanted it "copyless", but it turned into a nightmare (e.g. i only have partially working stuff, which is usually worse than having nothing at all)
Well, I've been thinking about this too, and I think it's pretty painful to do nicely on an asynchronous system. My ideas so far are basically:

1. have a separate message area for each thread (or process) which is totally kernel managed. Copy incoming messages to this area. If the receiver wants to keep a message, it can make a copy of it into a manually managed region. This needs 1 copy, plus possibly an extra copy, but it makes the kernel a lot more complicated.

2. have some maximum message size, and have the receiving thread provide a buffer of this size when it requests a new message. This needs 1 (to kernel) + 1 (to receiver) = 2 copies. You need quite big buffers, but I think most messages aren't kept after receiving (so the buffer can be reused), and shrinking a big buffer after receiving a message doesn't need a copy if the malloc design takes this into account. This is easy to implement, so I'm probably going with it first (rough sketch after this list).

3. an alternative to the previous is to put incoming messages onto the stack, and adjust stack frames suitably. If the message is immediately dispatched, it only needs to be copied if it needs to be kept after the created stack frame returns. Still needs 2 copies though, unless the messages are delivered asynchronously (like interrupts). In the latter case userland still needs to copy the message again if it wants a message loop, but at least it can check the message's validity and/or forward it automatically without the extra copy.

4. Finally, as long as the sending thread is waiting in the kernel, there is no need to copy message data into kernelspace. So if the receiving thread is ready to receive the message, and is of higher priority than the sender, we have the final buffer before the sender needs to return to userspace. In this case we can do a direct copy. This is also the case if a send/wait is combined into a single system call, but then we can possibly save a copy for the reply as well. The idea with this is more or less "provide queueing, but if the messaging happens to look like rendezvous, then take advantage of that and do a single copy".
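
Here is the rough sketch promised in option 2. sys_recv, MsgBuffer and MAX_MSG_SIZE are made-up names and values, not the real Neon interface; Object/HNDL/MWORD are as in the earlier sketch.

Code: Select all

/* sketch of option 2: receiver always provides a maximum-size buffer */

#define MAX_MSG_SIZE 4096u          /* arbitrary value for the example */

typedef struct {
    Object *object;                 /* which of my objects it was sent to */
    MWORD   op, w1, w2;             /* the 4-tuple header                 */
    char    data[MAX_MSG_SIZE];     /* inline payload, if any             */
} MsgBuffer;

extern int sys_recv(MsgBuffer *buf);   /* hypothetical: copy the next queued
                                          message into the caller's buffer */

void message_loop(void)
{
    MsgBuffer buf;                     /* reused for every message */
    while (sys_recv(&buf) == 0) {
        /* dispatch; copy data out of buf only if the message must outlive
           this iteration, otherwise the buffer is simply reused          */
        buf.object->dispatch(buf.object, buf.op, buf.w1, buf.w2);
    }
}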

I think it's a good idea to assume that most messages are small to medium size. For large messages it's a better idea to use "shared" memory; it need not really be shared, since a shared memory object could be unmapped in the sender before the message containing it is delivered, but in any case we are sending mapping rights to a set of pages.
I suppose i'll just go for implementing "streams" and see if i can come up with a system where streams contain information and a process can wait for either a message or data on a stream (possibly with a kernel-generated message saying "hi there, you have data in stream X" ;)
I'm not going to. This is trivially solved as a combination of short messages and a shared memory object. Just send a message "new data here" or "this can be overwritten" to keep the two parties aware of each other's actions.
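
As a sketch of that combination (all the names here - OP_DATA_READY, OP_SPACE_FREE, produce, consume - are made up), the payload stays in the shared object while the messages just carry offsets and lengths:

Code: Select all

/* sketch; SendMessage, HNDL, MWORD as above */

/* hypothetical op codes (these would normally come from nifgen) */
#define OP_DATA_READY  0x100u
#define OP_SPACE_FREE  0x104u

/* producer side: shared_buf is a memory object both sides have mapped */
void produce(HNDL consumer, char *shared_buf, MWORD offset, MWORD len)
{
    /* ... fill shared_buf[offset .. offset+len) with the payload ... */
    SendMessage(consumer, OP_DATA_READY, offset, len);   /* "new data here" */
}

/* consumer side, when it receives OP_DATA_READY(offset, len): */
void consume(HNDL producer, const char *shared_buf, MWORD offset, MWORD len)
{
    /* ... read shared_buf[offset .. offset+len) ... */
    SendMessage(producer, OP_SPACE_FREE, offset, len);   /* "overwrite away" */
}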

Re:Neon IPC and why it was ... replaced.

Posted: Fri Jul 01, 2005 4:52 pm
by mystran
Brendan wrote:

Code: Select all

Thread A:
   for(j = 0; j < length; j++) fputc(fp, output[j]);
   fclose(fp);

Thread B:
   while( (c = fgetc(fp)) != EOF) input[i++] = c;
   fclose(fp);
Ok, let's see. If the stream is buffered, then this will only cause a thread switch when the buffer needs to be flushed as it fills. Since we are using the stdio interface, the stream is likely to be buffered. On Unix it will also be flushed after each newline character, unless that was turned off, but even then, unless every other character is a newline, we are sending more than one byte at a time. For random 8-bit data, on average 1/256 of the characters is a newline. Even with ~4 flush-triggering characters, that's 4/256 = 1/64, i.e. a flush only once every 64 characters on average.

Second, the above was only the internal stdio buffering. In addition, on a typical system the OS also does some buffering. So unless we have a pre-emptive scheduler and the receiver runs at higher priority, we only switch when the OS buffer becomes full or empty, or the sender runs out of its time quantum.

If you disable buffering, the above of course makes no sense. But the buffering is there because reading/writing a character at a time can make programming easier.

Oh, and you have a potential buffer overflow in there because you don't keep track of when you are going to overflow your input buffer.

But really: if you live in the Unix world with buffering at both the OS and libc level, there is absolutely nothing wrong with reading/writing one character at a time. This is why stdio is usually a Good Thing(tm). Using read()/write() one character at a time is a much more stupid thing to do (although the OS buffering is still there).

Re:Neon IPC and why it was ... replaced.

Posted: Sat Jul 02, 2005 2:34 am
by Brendan
Hi,
mystran wrote:Ok, let's see. If the stream is buffered, then this will only cause a thread switch when the buffer needs to be flushed as it fills. Since we are using the stdio interface, the stream is likely to be buffered. On Unix it will also be flushed after each newline character, unless that was turned off, but even then, unless every other character is a newline, we are sending more than one byte at a time. For random 8-bit data, on average 1/256 of the characters is a newline. Even with ~4 flush-triggering characters, that's 4/256 = 1/64, i.e. a flush only once every 64 characters on average.
It's more a problem of not knowing when the transaction ends. As an example, consider a thread that sends icon data that's 32 * 32 bytes to a receiver that is meant to display the icons on the screen. If the sender sends 1024 bytes without any terminator (the most efficient method), then the sender must explicitly flush to ensure that all the data is actually sent (rather than having half of it remain in the output buffer while the receiver is waiting for it). It also means the receiver would need to cache received data, counting bytes until the entire icon has been received and is ready to display.

Now add to this that something may go wrong and the sender may be terminated after the first half is sent (i.e. the sender's buffer was too small to hold all the data, or the data is too large for the underlying network protocol - e.g. 2 MB of data over ethernet), leaving the receiver waiting for a second half that never comes. For my OS this problem could even be caused by a dial-up connection where the ISP has a 3 hour time limit, or someone turning off a computer between the sender and the receiver.

Using discrete transactions (messages), all layers (the sender, kernel/s, networking protocols, network device drivers and receiver) can ensure that either nothing is received or the entire message is received - there is no possibility of "partially received". This means the receiver is always able to immediately process any received data without waiting for any remaining data.
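
For example, in the spirit of the interface described two posts up, sending the 32*32-byte icon as one discrete message could look like this; the fixed buffer address and the receiverID type are assumptions for the example, only send_message() comes from the description above.

Code: Select all

#include <string.h>
#include <stdint.h>

#define MESSAGE_BUFFER ((void *)0xBFC00000u)   /* fixed per-thread buffer
                                                  address (made-up value) */

extern unsigned int send_message(unsigned int receiverID);  /* as described */

int send_icon(unsigned int displayThread, const uint8_t icon[32 * 32])
{
    /* build the entire message in the thread's message buffer ...        */
    memcpy(MESSAGE_BUFFER, icon, 32 * 32);
    /* ... then hand it over in one shot; no flushing, no byte counting,
       and the receiver gets either the whole icon or nothing at all      */
    return (int)send_message(displayThread);
}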

The other problem is that it's impossible to use a single stream to receive data from multiple senders. In this case you're left with handling multiple streams and the "select()" function, which complicates things further.

By carefully adjusting buffer sizes and doing explicit flushing, most of the problems with streams can be avoided, but it's hardly ideal (messaging is far simpler). I'd expect this is the reason why there are also the "sendto" and "recvfrom" functions (datagrams) for sending and receiving discrete packets of data over sockets. IMHO these functions seem like a hack, designed to avoid the problems caused by pretending that IPC is the same as file access when clearly there are differences.


Cheers,

Brendan

Re:Neon IPC and why it was ... replaced.

Posted: Sat Jul 02, 2005 3:54 am
by mystran
I wasn't trying to say that the Unix mindset is necessarily the right one. I was simply trying to say that the specific code is not necessarily that bad if you are running in a Unix environment.

But ok, the point about transaction semantics is a good one. I agree with that. On Unix there is little you can do, other than switch to (datagram?) sockets. Then again, the type of code that actually does character-at-a-time processing will usually not care about transaction semantics at all.

That said, sending a message on Neon will transfer that message, or not transfer that message. Whether the message happens to be part of a protocol that implements a stream is totally irrelevant to the message transfer mechanism.

Like I said, every message is just inserted into some thread's message queue. For all practical purposes the message transfer itself is totally connectionless. Something that might be worth mentioning is that Neon provides no mechanism for the receiver to identify the sender (beyond "someone with a handle"). This isn't exactly a problem though, because handles don't grow on trees (no EnumAllObjects-style functions like on Windows.. :)). So basically, sending a message is pretty much like calling a method that returns void. The object just happens to be remote. The other party can reply by sending another message, provided it received a suitable handle.

Re:Neon IPC and why it was ... replaced.

Posted: Sat Jul 02, 2005 7:26 am
by Brendan
Hi,
mystran wrote:Like I said, every message is just inserted into some thread's message queue. For all practical purposes the message transfer itself is totally connectionless. Something that might be worth mentioning is that Neon provides no mechanism for the receiver to identify the sender (beyond "someone with a handle"). This isn't exactly a problem though, because handles don't grow on trees (no EnumAllObjects-style functions like on Windows.. :)). So basically, sending a message is pretty much like calling a method that returns void. The object just happens to be remote. The other party can reply by sending another message, provided it received a suitable handle.
How about if I tried something like:

Code: Select all

for(;;) {
    for(handle = 0; handle < MAX_HANDLE; handle++) {
        a = rand();
        b = rand();
        c = rand();
        SendMessage(handle, a, b, c);
    }
}
Would it fail to send any messages? :)


Cheers,

Brendan

Re:Neon IPC and why it was ... replaced.

Posted: Sat Jul 02, 2005 12:14 pm
by mystran
Brendan wrote:How about if I tried something like:

Code: Select all

for(;;) {
    for(handle = 0; handle < MAX_HANDLE; handle++) {
        a = rand();
        b = rand();
        c = rand();
        SendMessage(handle, a, b, c);
    }
}
Would it fail to send any messages? :)
First of all, you start with "handle = 0", which is the NULL handle, which is not going to work (by design). Second, the value of MAX_HANDLE (if defined) would have to be 0xffffffff. Third, any handle with the least significant bit clear will be interpreted as a local pointer. Finally, since the remote handles get more or less random numbers, you'll be hitting an awful lot of invalid handles before you see any real ones.

Oh, and using a totally random method code will cause problems, because the two least significant bits of the method code (ATM anyway, this may change) are used to identify the type of the last parameter; if you're sending a random value, then you probably want it sent as an immediate.

Re:Neon IPC and why it was ... replaced.

Posted: Sat Jul 02, 2005 2:19 pm
by Pype.Clicker
mystran wrote:
Pype.Clicker wrote: Yep. I replaced my own messaging system a few months ago (i could point you to my wiki, but it is doomed once again. d*mn') ...
Unless you've come up with something new after the 'events' stuff then I think I more or less know what you have. I've been reading other people's websites every once in a while after all; it's nice to see how others progress.
Yep. That's basically it. However, i've been extending it so that it somehow supports "jumbo messages" ... despite working almost copylessly, it raised an important number of issues about which attachments should actually be kept. Depending on the situation, the "sender" might wish to get rid of its copy of the data once it's sent, or to keep it as well. Same for the receiver: it could "receive and consume" or "receive and store".

I suppose i'll try to make that page-management stuff more explicit in the "streams" approach.

The things i want are:
- i would like to avoid the need for a "malloc" prior to 'message' reception
- i would like to keep the kernel simple and buffers small
- i would like to keep control over what's been exported by a process and what should be kept "private".

The stack (for instance) is the wrong place to prepare a message, since it'll be used by plenty of other things. Probably what i'll try to have is a sort of "region" where pages are explicitly meant to contain messages (a sort of outbox) and where the process writes data to be sent (so that we know there's nothing else but messages there); once the data are ready, they're swapped into the destination address space. The emitting process will automatically receive "fresh" pages instead (if there are still some in the stream's pool of pages).

When a process receives data from a stream, they're initially mapped "somewhere they won't last", but you could then require a portion of a receive stream to be stored (e.g. move it to some virtual addresses that are part of the "heap" and have unused portions freed if needed).

Probably my error #1 was to try to make memory sharing appear as a message exchange. It actually makes little sense: if both parties are willing to keep a local copy of a dataset, then it's not a message, it's shared memory. Period.

The reason why i'm coming up with streams is probably because i also hope they will make a more natural place to store information about temporary mappings, "flying pages" and things alike ... managing it with only ports makes things go wild, because you can hardly say that a medium-sized message (a deferred page) is bound to a "message port"...