An idea for implementing system calls and IPC

Gnome · Post by **Gnome** » Sun Aug 22, 2004 5:55 pm

I had an idea when I was writing the VM system for my OS, and that's to use page faults as an IPC and system call mechanism. Here's a paper I wrote up describing the system. I'm curious what other people think of it.

http://flyswatter.dyndns.org/~michael/ipc-pagehack.pdf

I would attach it, but it's not one of the allowed file types, and it's too big.

Pype.Clicker · Post by **Pype.Clicker** » Mon Aug 23, 2004 12:48 am

you might prefer the latex "article" documentclass for such kind of papers. But i'll get a look at it asap.

Candy · Post by **Candy** » Mon Aug 23, 2004 12:56 am

Gnome wrote: I had an idea when I was writing the VM system for my OS, and that's to use page faults as an IPC and system call mechanism. Here's a paper I wrote up describing the system. I'm curious what other people think of it.

http://flyswatter.dyndns.org/~michael/ipc-pagehack.pdf

I would attach it, but it's not one of the allowed file types, and it's too big.

Ehm... you talk about adjusting EIP as being a normal thing to do. It's something very complex and requires good coverage of X86 basic concepts, and is hard to get right, especially counting the additions of codes you don't even know about. It doesn't scale to future processors without work.

Pushing the result(s) to the stack of the calling process is awkward, since you'd need a special wrapper for that kind of access, you're messing with the stack where the C compiler doesn't expect it.

How do you pass arguments to a page fault? I can only think of pushing them on the stack, but that's just as hackish as trying to pop the result off the stack.

Your conveats are caveats.

You clobber the address space (as you say yourself) where you can define your own with for example syscall, using eax as the function number. This allows you 4G functions without clobbering a byte of address space.

The performance hit is more than you seem to assume. It is switching to a full ring0 environment (about 10 cycles I assume), pushing all the registers (you don't know whether it's a syscall or a page fault at that time), differing between page faults and syscalls, handling the syscall (taking an amount of time it would for the syscall), calculating the EIP offset (significant), and returning, again popping a load of registers off the stack. You almost remind me of some people who have a patent on perfect compression, the ability to reduce two bits to one

. They say that you apply a function F to them, after which you end up with 1 bit and some neglible overhead. The overhead of course is at least one bit

.

Getting the exports truly persistent requires a lot of memory, and is likely to be clobbered with old exports when you remove a program. Also, what use is sharing functions between all programs (think namespace collisions) when they don't even want to use each others functions? Say, a shell and a game?

You might also want to check out Pype's (clicker.sf.net) notes on his module system, which should allow the same functionality of dynamic linking. I also have some preliminary thoughts written down in my rants, at http://www.atlantisos.com/index.php?id= ... =2&pagid=4 and http://www.atlantisos.com/index.php?id= ... =2&pagid=3

As a concluding thing, the asynchronous syscalls is a good idea I think. You might want to write a paper on that, since it's a thing I haven't heard about up to now. It is very close to my unreleased tcall API, although tcall is intended for user-level no-wait functions.

Brendan · Post by **Brendan** » Mon Aug 23, 2004 2:12 am

Hi,

I took a look and there's a few things I thought I'd mention..

When the call completes the kernel has to adjust the return address, which would involve checking which instruction was used to cause the fault and figuring out how long that instruction is - "mov byte [eax],0", "mov [0x1234567],ebx", "mov [0x1000000+eax*4+ebx],ebp" or "fild qword [0x1234567]" all have different lengths. Finding the correct length of the instruction would be almost as much work as writing a disassembler, and would add to the time it takes to handle the IPC.

Then there's situations where the software is buggy, messes up a pointer or something and accidentally accesses the wrong address.

To minimize both of the problem above you could restrict the allowable instructions, so that (for e.g.) only "mov byte [0x1234567],0" is allowed to work, and anything else is considered a page fault.

Despite this IMHO it'd be more effecient to use a normal software interrupt instead, where the function to route the call to is transferred in a register (e.g. EAX) instead of using CR2. In this case it'd be almost the same, except the overhead involved with determining if a page fault was a page fault or not (and calculating the size of the instruction used to trigger the page fault) would be gone. Also you could have over 4 billion functions.

Another thing that worries me is whether the called function will always be able to handle the call immediately. If the called function could be called at the wrong time (while it's already in the middle of running) it may need to have re-entrancy locks. In this case the function may already be locked when it's called.

To avoid the re-entrancy issues you could put the call into a buffer or queue, so that the called function does one call at a time. When it completes one call it'd try to get the next call from the buffer/queue, and it could block until another call is made when no calls are in the buffer/queue.

If you did use a software interrupt instead of the page fault handler, and if the kernel buffered/queued the calls to avoid re-entrancy problems, then you'd have a form of synchronious messaging (not unlike other OS's), but you could still have the advantages (export persistance, lazy function binding, etc).

Cheers,

Brendan

Pype.Clicker · Post by **Pype.Clicker** » Mon Aug 23, 2004 2:59 am

Hmm ... sounds nice at first reading.
@candy, concerning the 'advancing to the next operation': it could be easier to deal with considering that there should be only *calls* here (and thus presumabely only a few opcodes like CALL <near, absolute>). Any other opcode can be handled like a 'real' fault as they fail to observe the protocol for IPC invokation.

@gnome: despite the mechanism is interresting, you might wish to reconsider the concept of "exporting functions" to the world. Try to model the use of IPC: you may have process that expose "objects" to the world (like a window, a network connection, a file handle or things alike), but you usually want to avoid a remote process to tell you where in your own memory things stands, which is the great difference between RPC and LPC ...

Solar · Post by **Solar** » Mon Aug 23, 2004 3:45 am

OT warning:

Pype.Clicker wrote: you might prefer the latex "article" documentclass for such kind of papers.

Pype, if that is meant to encourage posting LaTeX source, I'd like to disagree. It might not be "free", but PDF is a very much accepted format for distributing documents, and I very much prefer papers posted in a format that doesn't require further preprocessing to look at. (Which also rules out Postscript source.)

Candy · Post by **Candy** » Mon Aug 23, 2004 3:58 am

Pype.Clicker wrote: @candy, concerning the 'advancing to the next operation': it could be easier to deal with considering that there should be only *calls* here (and thus presumabely only a few opcodes like CALL <near, absolute>). Any other opcode can be handled like a 'real' fault as they fail to observe the protocol for IPC invokation.

hm... that makes this proposition that more interesting... Even though the speed is not the maximum, it's very easy to implement. If you disallow other opcodes, I think you're looking at a simple way to do it.

Pype.Clicker · Post by **Pype.Clicker** » Mon Aug 23, 2004 7:40 am

Solar wrote: ... if that is meant to encourage posting LaTeX source ...

not at all. Posting PDF was a nice move. Using the "book" document class (thus introducing 'chapter X' on a new page while a single \section would have been enough makes the document larger than it needs to be ...

Gnome · Post by **Gnome** » Mon Aug 23, 2004 9:59 am

Wow, I go to sleep for the night and there are 7 replies when I wake up

Here we go:

@candy, concerning the 'advancing to the next operation': it could be easier to deal with considering that there should be only *calls* here (and thus presumabely only a few opcodes like CALL <near, absolute>). Any other opcode can be handled like a 'real' fault as they fail to observe the protocol for IPC invokation.

I was figuring that this would only be valid for CALL instructions, and that data would not be shared between processes. This is for style reasons (violating encapsulation, like exposing a class's data members in an OO language), and for simplicity, because the MOV instruction and all its variants would be nearly impossible to support (as Brendan said)

You clobber the address space (as you say yourself) where you can define your own with for example syscall, using eax as the function number. This allows you 4G functions without clobbering a byte of address space.

You're right, this is a disadvantage of this technique. But, I figure that most programs would only have a few hundred to a few thousand (at most) imports, so this would only take up one or two pages. Thus, it shouldn't be much of a problem.

Using the "book" document class (thus introducing 'chapter X' on a new page while a single \section would have been enough makes the document larger than it needs to be ...

Actually, it's the "report" class, but yeah, I'll switch to "article" in future versions of the document.

Pushing the result(s) to the stack of the calling process is awkward, since you'd need a special wrapper for that kind of access, you're messing with the stack where the C compiler doesn't expect it.

How do you pass arguments to a page fault? I can only think of pushing them on the stack, but that's just as hackish as trying to pop the result off the stack.

How about this: In an export, the function lists how many bytes the arguments take up. This could be calculated from the function definitions by the same tool that writes the imports headers and whatnot. When it's called, the kernel initializes the process' stack, copying that data into the top of the stack. Upon return, the kernel copies the data from the start of the stack up to the callee's %esp (the return value), back to the caller's stack(overwriting the arguments) and increments the caller's %esp accordingly. Hackish, yes, but it should work just fine.

The issue of calling conventions arises here, but as long as the two ends can agree on a convention (or maybe, specify that all functions must use one), and the arguments are passed on the stack (no registers!), this shouldn't a problem.

@gnome: despite the mechanism is interresting, you might wish to reconsider the concept of "exporting functions" to the world. Try to model the use of IPC: you may have process that expose "objects" to the world (like a window, a network connection, a file handle or things alike), but you usually want to avoid a remote process to tell you where in your own memory things stands, which is the great difference between RPC and LPC ...

Hmm... that's a valid point about exporting objects vs. exporting functions.

About the remote process telling you what's in your own memory, I don't follow. The address that you call for any specific function is determined by the offset of that function in the process' import table. In the export table, a process lists the actual address of the function in its own address space, but that is only exposed to the kernel.

Could you clarify?

As a concluding thing, the asynchronous syscalls is a good idea I think. You might want to write a paper on that, since it's a thing I haven't heard about up to now. It is very close to my unreleased tcall API, although tcall is intended for user-level no-wait functions.

;D

Although it's a few years away, I'll have the option of writing a thesis in the 4th year of my CS program, and I might do it on this whole system. Failing that, the asynchronous calls might work too. We'll see what happens.

To avoid the re-entrancy issues you could put the call into a buffer or queue, so that the called function does one call at a time. When it completes one call it'd try to get the next call from the buffer/queue, and it could block until another call is made when no calls are in the buffer/queue.

Good idea. I suppose I could add another flag in the export list to control this behaviour. If a process wants to allow reentrancy, it should be allowed to. By default though, they're non-reentrant.

Thanks for the input everyone

Pype.Clicker · Post by **Pype.Clicker** » Mon Aug 23, 2004 10:36 am

well, let's say you have a process that offers an object abstraction (a window, for instance, so that you can .import window_create, window_close and window_redraw functions)

since the process creates many window, it needs an internal mechanism to know which window is targetted in which function call. The cleanest way you can achieve this is by passing a *handle*, but still it forces your 'server' to check the correct client called with the correct handle (so that you cannot close other app's windows)

see the point ?

concerning arguments, you're facing the same problem as every IPC, and indeed the function name (C++ mangling ?) could encode how arguments are expected (at least if they are strings, structures, atoms or whatever). From there, a clever decoder can copy useful bits into a 'message-passing buffer' of the target space.

Colonel Kernel · Post by **Colonel Kernel** » Mon Aug 23, 2004 12:40 pm

Maybe I'm missing something here, but what problem does this proposed technique solve? I mean, besides being yet another mechanism for system calls and dynamic linking, what does it have that existing practices don't?

Gnome · Post by **Gnome** » Mon Aug 23, 2004 1:58 pm

Pype.Clicker:
Yes, I see what you mean now...

I just thought of a possible solution. I was thinking that the callee needs a way to associate data with a specific caller. Querying some sort of identification (the caller's PID, for example) would be awkward. Instead, if the kernel provided a mechanism for storing arbitrary caller-specific data, such as a list of windows owned by that process... This would likely be referenced by name, number, etc. to get a pointer to the data, possibly in an idiom something like this:

Code: Select all

Handle *callerHandles = getCallerVar("Caller Handles");

or, since string comparisons are slow, something like this:

Code: Select all

const int CALLER_HANDLES = 0;
...
Handle *callerHandles = getCallerVar(CALLER_HANDLES);

getCallerVar() would be a system call exported by the kernel, which would return NULL if no variable has been allocated with that name/number. There would also be a function to allocate and free, caller variables.

After that snippet, you would then search for the handle in the callerHandles array to verify that it actually belongs to the caller.

The pointers returned come from the callee's address space, so the pointers could be written into data structures and be valid even when called by someone else. So, you could put that value, for example, into a queue of events to process later.

Colonel Kernel:
This is simply another way of doing system calls and RPC. The advantage it has is that the calls appear to be normal function calls on both ends. It's not a replacement for dynamic linking in most cases. Possibly, it could make RPC calls a more appealing substitute for dynamic linking in some cases (especially in the case of asynchronous calls), but not replace it.

Pype.Clicker · Post by **Pype.Clicker** » Tue Aug 24, 2004 1:09 am

@colonel kernel: the main difference i can see is that all stub/proxy code is at kernel level. You don't either *need* to know that the function is not local.

Common alternatives consist of building a shared library that will contain stub code for the object (see RPC, COM and even CORBA iirc)

mystran · Post by **mystran** » Tue Aug 24, 2004 1:52 am

Pype.Clicker wrote: Common alternatives consist of building a shared library that will contain stub code for the object (see RPC, COM and even CORBA iirc)

Yep, CORBA needs no kernel support whatsoever beyond TCP/IP if you want to go beyond process boundaries.

I agree that it's better idea to give user-level libraries something general like pipes on top of which they can build whatever they need..

Legend · Post by **Legend** » Tue Aug 24, 2004 1:58 am

And for performance, a CORBA ORB may "speak" other "protocols" next to IIOP, of course a compliant ORB has to be able to use IIOP. You might get some % more speed with other ways then going through a TCP/IP loopback device

I guess using this page fault syscall interface in a micro kernel (where you would do more RPC then in a monolithic environment) would however be some sort of contradicition, as one purpose of a micro kernel would be to push more code to the user level, and stub code should remain then there, too, in my humble opinion!

OSDev.org

An idea for implementing system calls and IPC

An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC

Re:An idea for implementing system calls and IPC