Since I am tinkering with an idea that is related to this topic, I want to contribute a few of my thoughts to this thread. In my operating system, rather than the traditional approach of passing arguments through registers and issuing a system call to switch to the kernel, I am experimenting with a command buffer mapped into userspace to which multiple system calls can be written. For clarity, and as both mechanisms are used in this idea, I will distinguish them by calling the former direct system calls and the latter indirect system calls.
The conventional system calls become the indirect system calls that are written to the command buffer. The simplest approach would be to write the system call number followed by its arguments in sequential order to the command buffer. Once all the indirect system calls have been scheduled, the operating system has to process the command buffer and schedule the next task. To indicate that we are done writing to the command buffer (either because the buffer is full or because there simply are no more indirect system calls to schedule), we issue a direct system call that tells the operating system that we want to give up our time slice, and optionally mark our task as idle until one of the indirect system calls has completed. Let's say these system calls are modelled a bit like the UNIX system calls, where you have open(), read(), write() and close(); each of these may return a result (e.g. a file descriptor or a status code). To handle that, the kernel maps in an additional area, next to the command buffer, that tells us which system calls have completed and with what result.
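To make that a bit more concrete, here is a rough sketch in C of what the two shared mappings could look like; the names, sizes and field layouts are purely illustrative and not necessarily what I will end up with:

Code: Select all
#include <stdint.h>

/* Rough sketch of the two shared mappings; names and sizes are illustrative. */
struct cmd_buffer {
    uint32_t head;         /* next free slot, advanced by userspace          */
    uint32_t tail;         /* next slot to process, advanced by the kernel   */
    uint64_t slots[1022];  /* system call number followed by its arguments   */
};

struct completion_area {
    uint64_t completed;    /* how many indirect calls have finished so far   */
    uint64_t results[511]; /* one result (fd, status code, ...) per call     */
};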
Unfortunately, if we want to open() a file and then read() data from it, we end up in the same situation as with synchronous system calls: we first have to issue open() and wait for it to complete, so that we have a file descriptor, before we can actually read() anything from it. That's because the model is too simplistic: there is currently no way to describe dependencies between the system calls. To solve this, I want to introduce a set of virtual registers so that the result of each system call can be stored in a register. Each entry in the command buffer then becomes: the register that receives the result, the system call number and the arguments (each of which can be either a register or a constant). Now the aforementioned situation can be written in a single pass as follows:
Code: Select all
%0 = open("foo.txt");
%1 = read(%0, buf, 1024);
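One way such an entry could be encoded, purely as an illustration of the register/constant distinction (none of these names are final):

Code: Select all
#include <stdint.h>

/* Illustrative encoding of a single indirect system call. */
enum operand_kind { OPERAND_CONSTANT, OPERAND_REGISTER };

struct operand {
    uint8_t  kind;             /* OPERAND_CONSTANT or OPERAND_REGISTER      */
    uint64_t value;            /* immediate value, or virtual register id   */
};

struct syscall_entry {
    uint16_t       result_reg; /* virtual register that receives the result */
    uint16_t       number;     /* system call number                        */
    uint8_t        argc;       /* number of arguments that follow           */
    struct operand args[6];    /* each one either a register or a constant  */
};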
That solves one of the problems, but what if there is a dependency between two system calls where the result of one isn't used as an argument to the other? For instance, if we want to open a file, read some data from it and then close it afterwards, the close() must not run before the read() has finished, even though it doesn't use the read()'s result. The solution would be to extend the command buffer format a little bit so that we can express the dependencies of each system call explicitly, as follows:
Code: Select all
%0 = open("foo.txt");
%1 = read(%0, buf, 1024);
%2 = close(%0) waits on %1;
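Extending the illustrative entry from above, the dependency information could simply be an extra list of registers whose producing calls have to complete first; again, just a sketch:

Code: Select all
/* Same illustrative entry as before, extended with explicit dependencies. */
struct syscall_entry {
    uint16_t       result_reg;
    uint16_t       number;
    uint8_t        argc;        /* number of arguments in args[]            */
    uint8_t        depc;        /* number of registers in waits_on[]        */
    struct operand args[6];
    uint16_t       waits_on[4]; /* calls producing these must finish first  */
};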
Since I am still playing around with the idea, I don't know how well it will work in practice and what kind of problems remain to be solved. However, I can already see a few major benefits. One of them is that if you care about POSIX compatibility, it is fairly easy to introduce a compatibility layer that simply issues a single system call at a time and waits for it to complete, which allows you to run a lot of existing applications at the cost of performance. Functionality such as epoll() or asynchronous I/O is also easy to implement, and as multiple system calls can be issued and managed in userspace, such an implementation would end up with fewer context switches. Another benefit is that this interface tends to be a lot more portable, in the sense that the command buffer can be formatted in a portable fashion using e.g. variable-length encoding, rather than having a different ABI per architecture, which easily ends up being a mess (e.g. ptrace() on Linux for 32-bit/64-bit SPARC systems).
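For instance, a POSIX-style read() in such a compatibility layer could be little more than the following; cmdbuf_push(), yield_and_wait() and result_of() are hypothetical helpers, and the syscall number is made up, just to sketch the idea:

Code: Select all
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define SYS_READ 3                          /* hypothetical syscall number  */

uint16_t cmdbuf_push(uint16_t number, ...); /* append one indirect call     */
void     yield_and_wait(uint16_t reg);      /* direct call: block on result */
uint64_t result_of(uint16_t reg);           /* fetch the completed result   */

/* POSIX-style read(): submit a single indirect call and wait for it. */
ssize_t compat_read(int fd, void *buf, size_t count)
{
    uint16_t reg = cmdbuf_push(SYS_READ, fd, buf, count);
    yield_and_wait(reg);
    return (ssize_t)result_of(reg);
}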
But to me one of the major benefits seems to be that this way of scheduling system calls easily allows for green threads as used by programming languages like Erlang, where you essentially end up with a programming model that feels more synchronous and thus more natural to some people, as calls may block within the scope of such a thread. The userspace scheduler is simply a co-operative scheduler that gets called whenever a thread performs a blocking operation, so that it can handle the next completed operation by switching to the appropriate thread (or yield, if all threads are idling). Furthermore, a context switch in userspace is much cheaper: you essentially push the registers onto the stack, switch stacks and pop the registers again.
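In pseudo-C, the core of that userspace scheduler could look like the sketch below; next_completed(), switch_to() and yield_and_wait_any() are hypothetical helpers:

Code: Select all
struct green_thread;

struct green_thread *next_completed(void); /* thread whose pending call finished */
void switch_to(struct green_thread *t);    /* push regs, swap stacks, pop regs   */
void yield_and_wait_any(void);             /* direct call: all threads are idle  */

/* Invoked whenever a green thread performs a blocking operation. */
void schedule(void)
{
    for (;;) {
        struct green_thread *next = next_completed();
        if (next != NULL) {
            switch_to(next);      /* resume the thread whose call completed */
            return;
        }
        yield_and_wait_any();     /* nothing to run: give up the time slice */
    }
}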
Nevertheless, while the idea does sound promising, I do believe that there is no free lunch: the interface is obviously not as straightforward (it only is in terms of portability) and may end up consuming a lot more resources than synchronous system calls, but I do think that there are many cases where asynchronous system calls shine. Also, on the topic of ptrace(), the idea needs to be worked out a lot more; supporting something like system call tracing is not as straightforward with such an interface as it is with synchronous system calls.
So to answer the OP: yes, I think it can be worthwhile to support something like POSIX through a compatibility layer offered, at the cost of performance, to the applications that require it, so that you can at least use an existing userspace on your system. But do keep in mind that you probably don't want your native applications to be POSIX-compliant, as that would mean that your microkernel design strictly depends on POSIX and the restrictions and complications it brings with it. To me that would mean there aren't a whole lot of benefits to using your operating system over any other POSIX-compliant operating system, at least not performance-wise.
Yours sincerely,
Stephan.