Kernel requests via page faults

NickJohnson · Post by **NickJohnson** » Tue Sep 14, 2010 2:52 pm

I just had an interesting idea and wondered how practical it would be. What if, instead of using software interrupts directly, system calls were made by causing page faults at certain addresses? It would be almost like memory-mapped I/O to the kernel: instead of calling a function, data could be read from and written to the kernel just like indexing an array. For simple system calls that access indexed information, such as calls for memory management or process metadata access, this would be a very nice interface. If the type of system call were encoded in the frame address bits, and these pseudo-frames were themselves subject to memory management, it would even be possible to pass them around like capabilities, granting other processes the ability to make system calls as if they were the granting process. Going further, the kernel could route page faults from shared pseudo-frames back to the original process, allowing any process to provide a memory-mapped interface to itself. I'm sure there are quite a few interesting things that could be derived from this mechanism.
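
As a rough illustration of "encoding the type of system call in the frame address bits", here is a minimal sketch. It assumes 4 KiB pages and a 4 MiB window of pseudo-frames at a made-up base address; `PSEUDO_BASE`, `PSEUDO_SIZE`, `struct pseudo_request` and `decode_fault_address` are hypothetical names, not anything proposed in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: a 4 MiB window holding 1024 pseudo-frames. */
#define PAGE_SHIFT   12u
#define PSEUDO_BASE  0xE0000000u
#define PSEUDO_SIZE  0x00400000u

struct pseudo_request {
    uint32_t call;   /* pseudo-frame number, used as the "system call" type */
    uint32_t index;  /* 32-bit slot inside that frame, used as the array index */
};

/* Returns true and fills *req if addr lies inside the pseudo-frame window. */
static bool decode_fault_address(uint32_t addr, struct pseudo_request *req)
{
    uint32_t offset = addr - PSEUDO_BASE;   /* wraps to a huge value if addr < base */

    if (offset >= PSEUDO_SIZE)
        return false;

    req->call  = offset >> PAGE_SHIFT;
    req->index = (offset & ((1u << PAGE_SHIFT) - 1u)) / (uint32_t)sizeof(uint32_t);
    return true;
}
```

With this layout a read of `0xE0003010` would mean "call type 3, slot 4". Because the call type is just a frame number, remapping or sharing the frame is what would move the capability around.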

Anyway, the major hurdle would obviously be deciding which register or memory location the argument for the "system call" should be loaded from, or the result stored to. The pseudo-memory access has to be transparent, so that it can be done from ordinary code. Unless I'm mistaken, the faulting instruction would have to be decoded to figure this information out. How many different instructions does the x86 have to read/write memory?
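
To give a feel for the decoding problem, here is a minimal sketch that handles only the two plain 32-bit `MOV` forms with a ModRM byte (`8B /r` for loads, `89 /r` for stores), without prefixes and with 32-bit addressing. The names `struct mov_info` and `decode_simple_mov` are hypothetical. A real decoder would need to cover far more instructions, which is exactly the hurdle described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mov_info {
    bool    is_load;  /* true: memory -> register, false: register -> memory */
    uint8_t reg;      /* 0=EAX 1=ECX 2=EDX 3=EBX 4=ESP 5=EBP 6=ESI 7=EDI */
    size_t  length;   /* instruction length, needed to step EIP past it */
};

/* Decodes "MOV r32, r/m32" (8B /r) and "MOV r/m32, r32" (89 /r) only. */
static bool decode_simple_mov(const uint8_t *insn, struct mov_info *out)
{
    uint8_t opcode = insn[0];
    if (opcode != 0x8B && opcode != 0x89)
        return false;                 /* anything else is not handled here */

    uint8_t modrm = insn[1];
    uint8_t mod = modrm >> 6;
    uint8_t rm  = modrm & 7;
    if (mod == 3)
        return false;                 /* register operand, no memory access */

    size_t len = 2;                   /* opcode + ModRM */
    if (rm == 4) {                    /* a SIB byte follows */
        len += 1;
        if (mod == 0 && (insn[2] & 7) == 5)
            len += 4;                 /* SIB with disp32 and no base register */
    } else if (mod == 0 && rm == 5) {
        len += 4;                     /* absolute disp32 */
    }
    if (mod == 1)
        len += 1;                     /* disp8 */
    if (mod == 2)
        len += 4;                     /* disp32 */

    out->is_load = (opcode == 0x8B);
    out->reg     = (modrm >> 3) & 7;
    out->length  = len;
    return true;
}
```

For the bytes `8B 0D 00 00 00 E0` (`mov ecx, [0xE0000000]`) this reports a load into register 1 (ECX) with length 6. Note that a load into EAX from an absolute address is commonly encoded with the short `A1` opcode instead, which this sketch deliberately rejects.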

Are there any other issues with this (other than the fact that it would be easy to accidentally make a series of a hundred system calls)?

NickJohnson · Post by **NickJohnson** » Tue Sep 14, 2010 4:39 pm

Well, I can't imagine it would be much slower than a normal system call, since it's basically the same mechanism (excluding the decoding of the faulting instruction). It would only be useful if you only had to send or receive one register of data at a time, which would still be fine for most system calls about memory and process management; anything more complex would warrant a normal software interrupt.

Brendan · Post by **Brendan** » Tue Sep 14, 2010 9:14 pm

Hi,

NickJohnson wrote:Unless I'm mistaken, the faulting instruction would have to be decoded to figure this information out. How many different instructions does the x86 have to read/write memory?

Surely you'd use CR2 to determine which address was used to cause the fault?

NickJohnson wrote:Well, I can't imagine it would be much slower than a normal system call, since it's basically the same mechanism (excluding the decoding of the faulting instruction).

Compared to SYSCALL and SYSENTER, a software interrupt is slow. An exception would be slower than software interrupts, because you need to figure out whether the cause of the exception was a programming error or was deliberate (e.g. several potential branch mispredictions before you can know it was a kernel API call) and you have to assume it was caused by a programming error until you know it wasn't (e.g. use an interrupt gate rather than a trap gate, store CR2 somewhere in case a second page fault occurs and trashes CR2, etc).

You forgot to say what you thought the advantage might be. Are there any?

NickJohnson wrote:It would only be useful if you only had to send or receive one register of data at a time, which would still be fine for most system calls about memory and process management; anything more complex would warrant a normal software interrupt.

Why? You'd be able to use all registers to pass data to the "syscall" and use all registers to return data from the "syscall".

Cheers,

Brendan

NickJohnson · Post by **NickJohnson** » Tue Sep 14, 2010 9:42 pm

What I meant was not to have a fault at a specific address simply be a way of numbering system calls (just one software interrupt is enough for that), but instead to be a way of transparently giving data to and taking data from the program, just like memory-mapped I/O. For example, if you tried to load from one of these special addresses into register ECX, the kernel would not only realize that a fault at that address meant a system call, but would also store the result of that system call in ECX without modifying the other registers. That way you could write C code that seems to be simply dereferencing pointers but is actually performing system calls behind the scenes.

I suppose it really doesn't do much other than allow "renumbering" of system calls by moving around pages. However, I still don't think it would be significantly slower than software interrupts (SYSENTER/SYSRET obviously beat it, though), because decisions about which sort of fault it is could be made with a single branch on the page fault error code. I guess it's not so great either way.
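
A minimal sketch of what that classification might look like, assuming the pseudo-frames are simply left not-present and live in a fixed window. The error code bit positions are the architectural ones; `PSEUDO_BASE`, `PSEUDO_SIZE`, `enum fault_kind` and `classify_fault` are hypothetical. In this sketch the error code test is one branch, but an address range check on CR2 is still needed afterwards.

```c
#include <stdint.h>

/* x86 page fault error code bits */
#define PF_PRESENT  0x01u  /* 0: page not present, 1: protection violation */
#define PF_WRITE    0x02u  /* 0: read access, 1: write access */
#define PF_USER     0x04u  /* 1: fault happened in user mode */
#define PF_IFETCH   0x10u  /* 1: fault on an instruction fetch */

/* Hypothetical window of pseudo-frames, left not-present on purpose */
#define PSEUDO_BASE 0xE0000000u
#define PSEUDO_SIZE 0x00400000u

enum fault_kind { FAULT_ORDINARY, FAULT_KERNEL_READ, FAULT_KERNEL_WRITE };

static enum fault_kind classify_fault(uint32_t error_code, uint32_t cr2)
{
    /* A request must be a user-mode data access to a not-present page. */
    if ((error_code & (PF_PRESENT | PF_USER | PF_IFETCH)) != PF_USER)
        return FAULT_ORDINARY;

    /* Unsigned subtraction wraps, so addresses below the base also fail. */
    if (cr2 - PSEUDO_BASE >= PSEUDO_SIZE)
        return FAULT_ORDINARY;

    return (error_code & PF_WRITE) ? FAULT_KERNEL_WRITE : FAULT_KERNEL_READ;
}
```

Everything classified as `FAULT_ORDINARY` would fall through to the normal page fault path (demand paging, copy-on-write, killing the process and so on).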

(Edit: typo in last sentence)

Candy · Post by **Candy** » Tue Sep 14, 2010 11:23 pm

That allows you to hide syscalls behind memcpy, memset... For non-assembly programmers, though, how do you pass arguments to a "syscall"?

qw · Post by qw » Wed Sep 15, 2010 2:09 am

Interesting idea. I figure it would be something like this:

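#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>

//  Hypothetical header declaring ARG1, ARG2, ARG3, CALL_OPEN and ERRNO as volatile pointers

#include <special_memory_locations>

int open(const char *name, int flags, ...)
{
    mode_t mode = 0;
    int fd;

//  The mode argument is only passed when the file may be created

    if (flags & O_CREAT) {
        va_list ap;

        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

//  Special memory locations serve as arguments for system "call"

    *ARG1 = name;
    *ARG2 = flags;
    *ARG3 = mode;

//  Extra special memory location causes fault. Handler makes sure that a new file descriptor is returned

    fd = *CALL_OPEN;

//  In case of error, handler sets another special memory location

    if (fd < 0)
        errno = *ERRNO;

    return fd;
}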

NickJohnson · Post by **NickJohnson** » Wed Sep 15, 2010 4:18 am

@Hobbes: Yeah, that is the gist of it, but I originally envisioned it for something that needs no arguments or return values other than that single register (so, not as much for open(), but your implementation still makes sense).

I was thinking something more like this: you reserve a 4MB area of user memory that pretends to be all of the current process' page table contents laid out sequentially. If the process tries to write to it, the kernel catches it and performs a safe write to the real page tables (a special value could be used to request a frame to be allocated), and if the process tries to read it, only the non-sensitive information would be returned unless the process has permissions for the physical address, etc. This would be the whole interface to the memory manager. A similar thing could work for things like process structures or thread structures, allowing the process to think it is modifying them directly, while still keeping things secure.
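
Here is a minimal sketch of the filtering such a window might do, assuming 32-bit non-PAE paging (4-byte entries, so 4 MB of entries covers the whole 4 GB address space). `WINDOW_BASE`, `REQ_ALLOC` and the three functions are hypothetical, and the policy shown (a process may only change the present and writable flags of an entry that already has a frame) is just one possible choice.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT   0x001u
#define PTE_WRITABLE  0x002u
#define PTE_USER      0x004u
#define PTE_FRAME     0xFFFFF000u

#define PTE_FLAGS_OK  (PTE_PRESENT | PTE_WRITABLE)  /* flags a process may set */
#define REQ_ALLOC     0xFFFFFFFFu  /* hypothetical "allocate a frame here" value */
#define WINDOW_BASE   0xE0000000u  /* hypothetical start of the 4 MB window */

/* 4 MB / 4 bytes = 2^20 entries, one per 4 KB page of the address space. */
static uint32_t window_index(uint32_t fault_addr)
{
    return (fault_addr - WINDOW_BASE) / (uint32_t)sizeof(uint32_t);
}

/* Value the process sees when it reads an entry through the window. */
static uint32_t filter_read(uint32_t real_pte, bool may_see_phys)
{
    return may_see_phys ? real_pte : (real_pte & ~PTE_FRAME);
}

/*
 * Value the kernel stores in the real page table when the process writes
 * `requested`. fresh_frame is a frame the caller set aside for REQ_ALLOC.
 */
static uint32_t filter_write(uint32_t old_pte, uint32_t requested,
                             uint32_t fresh_frame)
{
    uint32_t frame = old_pte & PTE_FRAME;
    uint32_t flags = requested & PTE_FLAGS_OK;

    if (requested == REQ_ALLOC) {
        frame = fresh_frame & PTE_FRAME;
        flags = PTE_PRESENT | PTE_WRITABLE;
    } else if (frame == 0) {
        return 0;                    /* nothing mapped: ignore flag changes */
    }
    /* The frame bits written by the process are never trusted. */
    return frame | flags | PTE_USER;
}
```

Under this policy, writing `REQ_ALLOC` to an empty slot maps a fresh frame, while writing an arbitrary physical address into an existing entry only changes its flags.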

It could also be used to turn the x86 I/O space into what would appear to be memory mapped ports, although that may be too slow: every port access would cause an interrupt.

Solar · Post by **Solar** » Wed Sep 15, 2010 4:43 am

Possible, yes.

Desirable?

You'd be leaving any abstraction of memory access, thread handling etc. to the application / user space.

Unless you're considering something along an exokernel, that is usually not what you want. But that's just me.

qw · Post by qw » Wed Sep 15, 2010 6:20 am

I agree with Solar. Though I applaud you for thinking outside of the box so thoroughly, you'll lose all abstraction. What if you want to change the mentioned structures in later versions?

Combuster · Post by **Combuster** » Wed Sep 15, 2010 8:41 am

And besides, it's generally faster to use a syscall than to handle an exception. (You can also batch many changes into one privilege change, instead of having a privilege change for each change in a series of consecutive accesses...)

qw · Post by qw » Wed Sep 15, 2010 9:22 am

Nevertheless, you could implement it as a "proof of concept" and see how it works out in practice. My admiration guaranteed.

Roel

Owen · Post by **Owen** » Wed Sep 15, 2010 1:42 pm

NickJohnson wrote:I suppose it really doesn't do much other than allow "renumbering" of system calls by moving around pages. However, I still don't think it would be significantly slower than software interrupts (SYSENTER/SYSRET obviously beat it, though), because decisions about which sort of fault it is could be made with a single branch on the page fault error code. I guess it's not so great either way.

The processor detects exceptions much later in instruction processing than it does in the processing of software interrupts, SYSCALL and SYSENTER. Call gates are probably slightly slower these days (because they're somewhat less predictable).

rdos · Post by **rdos** » Fri Sep 24, 2010 1:58 pm

I think the fastest way to do syscalls on x86 is to allocate a callgate with every entrypoint. This will leave all CPU-registers available (no need to use & copy the stack in most (all) cases). It doesn't need to setup function numbers on entry, and it doesn't need decoding functions in the kernel, and eventually to do a call / jmp to the real entrypoint. The only drawback is that GDT selectors are a limited resource.
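
For reference, a minimal sketch of building one 32-bit call gate descriptor per entry point. The bit layout is the architectural one for a 32-bit call gate; `make_call_gate` and `GDT_MAX_ENTRIES` are hypothetical names, and loading the descriptor into a GDT slot is left out.

```c
#include <stdint.h>

/* A descriptor table has a 16-bit limit, so at most 65536 / 8 descriptors. */
#define GDT_MAX_ENTRIES (65536u / 8u)

/* Packs a 32-bit call gate descriptor into one 64-bit value. */
static uint64_t make_call_gate(uint16_t code_selector, uint32_t entry,
                               uint8_t dpl, uint8_t param_count)
{
    uint64_t d = 0;

    d |= (uint64_t)(entry & 0xFFFFu);             /* entry point offset 15:0 */
    d |= (uint64_t)code_selector << 16;           /* target code segment */
    d |= (uint64_t)(param_count & 0x1Fu) << 32;   /* dwords copied from caller stack */
    d |= (uint64_t)0xCu << 40;                    /* type: 32-bit call gate */
    d |= (uint64_t)(dpl & 3u) << 45;              /* lowest privilege allowed to call */
    d |= (uint64_t)1 << 47;                       /* present */
    d |= (uint64_t)(entry >> 16) << 48;           /* entry point offset 31:16 */
    return d;
}
```

With a parameter count of 0 nothing is copied between stacks, which matches passing everything in registers. The 8192-descriptor ceiling (shared with every other segment, TSS and LDT entry in the table) is the "limited resource" mentioned above.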

Owen · Post by **Owen** » Fri Sep 24, 2010 2:53 pm

rdos wrote:I think the fastest way to do syscalls on x86 is to allocate a callgate with every entrypoint. This will leave all CPU-registers available (no need to use & copy the stack in most (all) cases). It doesn't need to setup function numbers on entry, and it doesn't need decoding functions in the kernel, and eventually to do a call / jmp to the real entrypoint. The only drawback is that GDT selectors are a limited resource.

By the time you have even passed through a call gate on a modern processor, you could pretty much have been through syscall/sysret twice.

lemonyii · Post by **lemonyii** » Sun Sep 26, 2010 3:29 am

However, nice idea.
It's good for non-assembly programming, and easy to implement, but I didn't consider the speed.
Anyway, it's just an entrance, and the best entrance varies between platforms.
My opinion is: keep the central part of the code unchanged, and choose the most practical (fastest, easiest decoding, fewest exceptions...) entrance on each platform.
And of course, we may have many entrances, but I don't think we need them.
