Unifying async IPC and I/O?
My design is basically a micro-kernel. Right now I have user space working and I am using L4 / QNX style synchronous IPC. This works well. I do not have I/O or a file system yet, which is what I want to tackle soon. But first, I want to move to asynchronous IPC, because synchronous IPC doesn't scale well on SMP. While I am looking at that, I want to see if it makes sense to unify IPC and async I/O.
- I want async I/O. I am a fan of Windows IOCP and the similar Solaris event ports.
- I like registered I/O buffers - the idea being that you share a memory block between your app and the I/O driver to get zero-copy I/O.
- I am not sure I like Linux's io_uring. I see the point, but I don't think this would work in a micro-kernel (you'd need to set up ring buffers between each app and each service, which doesn't really scale).
Everything I read about IOCP and ports in general seems to suggest that they were added after the fact. They work well because they can unify unrelated things (files, sockets, timers, signals, etc.) through a common interface. That works because all these unrelated things live in the same context (the kernel), which is not true in a micro-kernel. You could get it to work by putting the ports themselves in the kernel or in some other service, but that is going to create a lot of IPCs that should be avoidable with a better design.
Any thoughts on this? How would you unify async IPC and async I/O in a micro-kernel? I am looking for ideas... What is your experience?
Re: Unifying async IPC and I/O?
How completion ports work (Windows IOCP / Solaris event ports):
1) You create a port (which is basically a set of handles that you can wait on, handles for files, sockets, timers, etc).
2) You associate handles with the port
3) You use async calls with the handles to do I/O (read/write)
4) You call a function on the port to wait for completion of any I/O
5) Completion (success or error) will be reported to the port and unblock any waiting thread
The main idea here is that you are not polling for I/O readiness; instead you are issuing commands and later checking on their completion.
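To make that concrete, here is a minimal Win32 sketch of steps 1-5 (error handling omitted; the file name and the completion key value are placeholders):
Code:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* 1) Create a completion port, not yet associated with any handle. */
    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

    /* 2) Open a file for overlapped I/O and associate it with the port,
       using an arbitrary completion key to identify it later. */
    HANDLE file = CreateFileA("test.txt", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    CreateIoCompletionPort(file, port, 42 /* completion key */, 0);

    /* 3) Issue an async read; this returns immediately (ERROR_IO_PENDING). */
    static char buffer[4096];
    OVERLAPPED ov = {0};
    ReadFile(file, buffer, sizeof(buffer), NULL, &ov);

    /* 4) Block until any I/O associated with the port completes...
       5) ...and pick up the completion (success or error) posted to it. */
    DWORD transferred;
    ULONG_PTR key;
    OVERLAPPED* pov;
    GetQueuedCompletionStatus(port, &transferred, &key, &pov, INFINITE);
    printf("key %lu: %lu bytes\n", (unsigned long)key, (unsigned long)transferred);

    CloseHandle(file);
    CloseHandle(port);
    return 0;
}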
Some of the primitives needed look very similar to what you would want for an async IPC API:
1) Create port --> IPC open channel / find service
2) Associate handles/events with ports --> no real equivalent?
3) Async I/O on handles --> IPC send()
4) Check on completion --> IPC wait()
And you can imagine that the I/O manager or driver would process I/O requests and reply by posting something to the completion port. In IPC terms, you would have something like an IPC reply() primitive.
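Spelled out as a sketch (every name and signature here is hypothetical), the correspondence could look like:
Code:
/* Hypothetical async IPC API mirroring the completion-port flow above.
   All names and signatures are invented for illustration. */

typedef int ipc_endpoint_t;  /* like a port: a waitable queue of messages */

ipc_endpoint_t ipc_open(const char* service);             /* 1) create port             */
int ipc_send(ipc_endpoint_t ep, const void* msg, int n);  /* 3) async "I/O" call        */
int ipc_wait(ipc_endpoint_t ep, void* buf, int n);        /* 4) wait for a completion   */
int ipc_reply(ipc_endpoint_t ep, const void* msg, int n); /* 5) server posts completion */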
So a lot of the above seems to map to the IPC API I already have (which is synchronous, but I want to change it to be asynchronous):
1) IPC_Send(payload) --> send a message, do not block, not expecting a reply
2) IPC_Call(payload, receive buffer) --> send a message, block and wait for a reply
3) IPC_Receive(receive buffer) --> block and wait for a message
4) IPC_ReplyAndWait(payload, receive buffer) --> reply to a caller and wait for a message
I am thinking that in a micro-kernel, I/O Read/Write commands would just be issued as normal IPC messages. You still need file handles to identify files. But there must be a way to get away with not having an explicit IOCP construct in the kernel or anywhere else. Or maybe it is built into every thread as an array of pending/completed IPC round-trips. I am not sure.
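As a sketch of that idea (the message layout and command constant are invented; IPC_Call is the existing synchronous primitive from the list above), a file read would just be a message:
Code:
/* Sketch only: layout and constants invented for illustration. */
enum { CMD_READ = 1 };

struct ReadRequest
{
    int command;    /* CMD_READ                        */
    int file;       /* file handle known to the server */
    int lenBuffer;  /* how much to read                */
};

void read_file(int file)
{
    char buffer[4096];
    struct ReadRequest req = { CMD_READ, file, sizeof(buffer) };

    /* Synchronous round-trip to the file server; this is the call
       that async IPC would split into a send plus a later completion. */
    IPC_Call(&req, buffer);
}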
Re: Unifying async IPC and I/O?
I've spent a few hours on this tonight and came to the conclusion that IPC and completion ports can't be combined. It doesn't make sense to try to merge the concepts of IPC endpoints and completion ports. Even though the APIs might look similar, they are orthogonal concepts.
So what I think needs to happen here is basic IPC primitives:
- ipc_create_endpoint()
- ipc_send(endpoint, message, lenMessage) - async call
- ipc_wait(endpoint, buffer, lenBuffer, timeout) - blocking call, open wait
- ipc_receive(endpoint, message, lenMessage, timeout) - blocking call, closed wait
And then I/O port primitives:
- port_create()
- port_wait(port, completion_info*, timeout) - blocking call waiting for completion(s) on the port
- port_post(port, status, completion key) - post completion status to the port
There is no need to associate file (or other) handles with completion ports: where to post the completion status can just be part of the payload of the "read file command".
For example:
Code:
struct FileReadCommand
{
    int command;
    int file;
    int lenBuffer;
    void* buffer;
    int completionPort;
    void* completionKey;
};
- The client would send the FileReadCommand to the file server using ipc_send().
- The file server would receive the request using ipc_wait().
- The file server would read data from the file into the specified buffer.
- The file server would then post the result to the completion port (status + completion key) using port_post().
- The client can retrieve the completion status at any time using port_wait().
- If the client doesn't care about the completion status, it could just set completionPort to 0.
Anything missing here and/or possible simplifications?
Right now I would need 8 registers for the system call (6 for the structure members + 1 for the endpoint + 1 for the syscall number). It would be nice if this could be reduced, as the x86_64 ABI only allows 6 parameters in registers. Perhaps "buffer + lenBuffer" could be replaced with a handle to a registered I/O buffer. Maybe other things could be done as well.
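Putting the pieces together, the client side of that flow might look like this (the completion_info layout, the timeout convention, and the handle/key values are all invented):
Code:
/* Client-side sketch using the primitives proposed above.
   completion_info layout, timeout convention and values invented. */
struct completion_info
{
    int   status;         /* result posted by the server      */
    void* completionKey;  /* echoed back to match the request */
};

void client_read(int fileServer, int file, int port, void* buffer, int len)
{
    struct FileReadCommand cmd =
    {
        .command        = 1,          /* hypothetical CMD_READ        */
        .file           = file,
        .lenBuffer      = len,
        .buffer         = buffer,
        .completionPort = port,
        .completionKey  = (void*)42,  /* opaque tag echoed back to us */
    };

    ipc_send(fileServer, &cmd, sizeof(cmd));   /* does not block */

    /* ... do other useful work here ... */

    struct completion_info info;
    port_wait(port, &info, -1);   /* block until the server port_post()s */
}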
Re: Unifying async IPC and I/O?
Mmm, maybe I don't need a completion port. Maybe I can just have the file server send an IPC message back to the client's endpoint to indicate completion:
- Client creates an endpoint for completion events
- Client calls the server to read from a file and says it wants replies on the endpoint created above
- Server reads the file, then posts the result to the client's endpoint.
- Effectively, the IPC endpoint can be used as a completion port.
What seems important is that the IPC endpoint is separate from threads (i.e. there is no 1:1 relation between them). The same IPC endpoint can be used by multiple threads.
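A sketch of that shape, reusing the primitives from the previous post (the completion layout is invented, and -1 stands in for "no timeout"):
Code:
/* Several worker threads can block on the same completion endpoint;
   whichever thread happens to be waiting picks up the next completion. */
struct Completion
{
    int   status;
    void* key;   /* identifies the original request */
};

void completion_worker(int completionEndpoint)
{
    struct Completion c;
    for (;;)
    {
        /* Open wait on the shared endpoint; -1 = no timeout. */
        ipc_wait(completionEndpoint, &c, sizeof(c), -1);
        /* ... dispatch on c.key ... */
    }
}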
Please help
Re: Unifying async IPC and I/O?
This needs to be managed by the kernel, so that a client can trust that a completion key is valid and actually corresponds to a request it sent to the server that sent the reply.
In Mach, you have these "port sets", where you can add any receive port to a port set and then specify the port set when receiving, which will receive messages on any port that was added to the port set.
In Windows it's a bit more complicated. An application can associate a completion port with an ALPC port, specifying a completion key. After receiving a completion notification it still needs to do a second call to retrieve the incoming message. However, a server can allocate a user buffer where messages will be delivered as they arrive. Either way, each outstanding message is identified by an ID, and can have a context associated with it by both the client and the server. When replying, one specifies the message by port, message ID and (optionally) sequence number. The receiving party then immediately knows what the message is about by examining the context field. This would correspond to a "message completion key".
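For reference, the Mach port-set pattern looks roughly like this (error handling omitted; the receive-buffer size is arbitrary):
Code:
#include <mach/mach.h>

/* Collect several receive ports into one port set and block on the set:
   mach_msg() returns whichever message arrives on any member port. */
void wait_on_port_set(mach_port_t portA, mach_port_t portB)
{
    mach_port_t pset;
    mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_PORT_SET, &pset);
    mach_port_move_member(mach_task_self(), portA, pset);
    mach_port_move_member(mach_task_self(), portB, pset);

    struct { mach_msg_header_t header; char body[256]; } msg;
    mach_msg(&msg.header, MACH_RCV_MSG, 0, sizeof(msg),
             pset, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
    /* msg.header.msgh_local_port identifies the member port it arrived on. */
}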
Re: Unifying async IPC and I/O?
I've been thinking about using ring buffers between clients and servers, not unlike Linux's io_uring. Basically one would create a communication channel with a server using a system call that sets up two ring buffers: one for sending commands (IPCs) to the server (the command buffer) and a second one for storing completions (the completion buffer).
It makes sense to me to have one command buffer per client for sending commands to the server... But I am not sure how to handle priorities on the server side: it would have to look at possibly a lot of command buffers and know something about client priorities. Having a single command buffer shared by all clients seems like a very bad idea (security) and would involve locking, which I want to avoid for performance reasons (locking on IPCs could defeat the performance gains of using shared memory).
On the client side though, I would rather not have to deal with multiple completion buffers. A client might want to do a blocking wait for completions of commands pending at different servers. I am specifically thinking about completion-port style functionality here... But maybe I am overthinking it and should just ignore what I know of IOCP for now.
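For the per-client command buffer, the shared layout could be as simple as a single-producer/single-consumer ring that needs no lock (the sizes, command layout, and full/empty convention are invented for illustration; this sketch ignores the priority question):
Code:
/* One shared SPSC ring per client/server pair: the client is the only
   writer of 'head', the server the only writer of 'tail', so no lock
   is needed. Free-running indices; RING_SLOTS is a power of two. */
#include <stdatomic.h>

#define RING_SLOTS 64

struct Command { int opcode; int args[7]; };

struct CommandRing
{
    _Atomic unsigned head;              /* written by client */
    _Atomic unsigned tail;              /* written by server */
    struct Command   slots[RING_SLOTS];
};

int ring_push(struct CommandRing* r, const struct Command* c)
{
    unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_SLOTS)
        return 0;                       /* full: fall back to a syscall? */
    r->slots[h & (RING_SLOTS - 1)] = *c;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return 1;
}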