Safe kernel user memory copy

OSwhatever · Post by **OSwhatever** » Tue Oct 25, 2011 9:00 am

My kernel calls aren't safe in any sense. If you pass a structure to the kernel, there is no check that the kernel can read or write that memory. The question is which way of handling this is most convenient. I've seen a few approaches:
The Windows way: copy the structure to a known place that the kernel has mapped and that the user has no control over.
Copy inside the kernel; if you get a page fault, you know it was the user process's fault and you kill it. To do this, the copy must happen at a point where you can abort without leaving locks held or operations incomplete inside the kernel.

How did you solve this and what do you think is the most convenient?

rdos · Post by **rdos** » Tue Oct 25, 2011 9:12 am

OSwhatever wrote:My kernel calls aren't safe in any sense. If you pass a structure to the kernel, there is no check that the kernel can read or write that memory. The question is which way of handling this is most convenient. I've seen a few approaches:
The Windows way: copy the structure to a known place that the kernel has mapped and that the user has no control over.
Copy inside the kernel; if you get a page fault, you know it was the user process's fault and you kill it. To do this, the copy must happen at a point where you can abort without leaving locks held or operations incomplete inside the kernel.

How did you solve this and what do you think is the most convenient?

The first problem is that applications could forge kernel pointers, and thus overwrite kernel data. I solve this by requiring applications to load all pointers into (segment) registers, and the kernel then treats pointers as 48-bit pointers (even if they come from a flat-mode application). This way an application can never gain access to kernel data areas. Regarding read/write attributes, I do not check those, but rather rely on protection / access validation when application-supplied data is accessed in the kernel. Exceptions in the kernel are not a problem in my design. They are like exceptions in applications. In the retail distribution, any unhandled exception, in an application or in the kernel, leads to an automatic reboot. In the debug distribution, exceptions can be inspected in the kernel debugger.

OSwhatever · Post by **OSwhatever** » Tue Oct 25, 2011 9:52 am

rdos wrote:The first problem is that applications could forge kernel pointers, and thus overwrite kernel data. I solve this by requiring applications to load all pointers into (segment) registers, and the kernel then treats pointers as 48-bit pointers (even if they come from a flat-mode application). This way an application can never gain access to kernel data areas. Regarding read/write attributes, I do not check those, but rather rely on protection / access validation when application-supplied data is accessed in the kernel. Exceptions in the kernel are not a problem in my design. They are like exceptions in applications. In the retail distribution, any unhandled exception, in an application or in the kernel, leads to an automatic reboot. In the debug distribution, exceptions can be inspected in the kernel debugger.

Can't you do that just by checking that the pointers are within the user part of virtual memory? That would solve the problem of bad pointers pointing into kernel memory.

Brendan · Post by **Brendan** » Tue Oct 25, 2011 10:20 am

Hi,

OSwhatever wrote:How did you solve this and what do you think is the most convenient?

I'd have a "validate_user_buffer(start_address, size)" function that does 3 things:

1. A simple bounds check - check that "start_address + size" doesn't overflow, and that both "start_address" and "start_address + size" are within user space.
2. An initial scan of the paging structures to make sure that all pages can be accessed by the kernel, which also locks pages of RAM (to prevent the OS from sending them to swap after the check but before the kernel has used them). This initial scan would also determine if a secondary scan is needed.
3. A secondary scan of the paging structures to fetch any pages from disk (swap space or memory mapped files) into RAM and lock them too.

I'd also have a "release_user_buffer(start_address, size)" function which unlocks the pages again after the kernel is finished with them (if they weren't locked before "validate_user_buffer()" was called) so that the OS can send them to swap again afterwards.
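A minimal sketch of the first of those three steps (the bounds check) might look like the following. The names `user_range_ok`, `USER_BASE` and `USER_LIMIT` and the address values are assumptions for illustration, not anything from a real kernel; the page-table scans and page locking depend on the kernel's own paging structures and are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: user space occupies [USER_BASE, USER_LIMIT). */
#define USER_BASE  ((uintptr_t)0x00001000u)
#define USER_LIMIT ((uintptr_t)0x80000000u)

/* Bounds check only: reject ranges that wrap around or leave user space. */
static bool user_range_ok(uintptr_t start, size_t size)
{
    if (start < USER_BASE || start > USER_LIMIT)
        return false;
    /* Comparing size with the remaining room avoids computing
       start + size, which could overflow. */
    return size <= USER_LIMIT - start;
}
```

Writing the test as `size <= USER_LIMIT - start` rather than `start + size <= USER_LIMIT` is what makes the overflow case fall out naturally: a huge `size` simply fails the comparison.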

Cheers,

Brendan

egos · Post by **egos** » Tue Oct 25, 2011 10:29 am

I use a separate entry point for user-space calls. Applications can only use that one. When this entry point is used the kernel keeps this state locally, i.e. saves and restores it when recursive calls occur. If this state is active, the service routines make additional checks to reject kernel pointers and read-only regions (if data will be written). Besides that, they check for and reject "holes" in the buffer, and lock the buffer.

OSwhatever · Post by **OSwhatever** » Tue Oct 25, 2011 10:34 am

Brendan wrote:Hi,

I'd have a "validate_user_buffer(start_address, size)" function that does 3 things:

1. A simple bounds check - check that "start_address + size" doesn't overflow, and that both "start_address" and "start_address + size" are within user space.
2. An initial scan of the paging structures to make sure that all pages can be accessed by the kernel, which also locks pages of RAM (to prevent the OS from sending them to swap after the check but before the kernel has used them). This initial scan would also determine if a secondary scan is needed.
3. A secondary scan of the paging structures to fetch any pages from disk (swap space or memory mapped files) into RAM and lock them too.

I'd also have a "release_user_buffer(start_address, size)" function which unlocks the pages again after the kernel is finished with them (if they weren't locked before "validate_user_buffer()" was called) so that the OS can send them to swap again afterwards.

Cheers,

Brendan

What if you have a zero-terminated string as a parameter? Then you don't know the size without accessing the actual data.

gerryg400 · Post by **gerryg400** » Tue Oct 25, 2011 10:41 am

OSwhatever wrote:What if you have a zero-terminated string as a parameter? Then you don't know the size without accessing the actual data.

In that case force the application to pass the string length as a parameter. The strlen will be done in user space.

Also have you considered using some type of fault handling to avoid the need to walk the page tables manually? There are a couple of ways to do this.

1. When a kernel call begins you can set a PARAMETER_CHECK flag. If a trap or fault occurs while this flag is set you can potentially return an error immediately to the application to indicate that the system call failed. If it's a page fault on, for example, an uncommitted piece of stack you can commit some pages and try to continue. Once all parameters are verified you can clear the PARAMETER_CHECK flag.

2. Use something like setjmp at the beginning of the system call and then longjmp to recover from the fault. The good thing about this is that you might be able to figure out whether you faulted while holding locks or refcounts on kernel objects, and drop them before returning an error code to user space.

Like this:

Code:

    lock(object);
    if (setjmp(jmpbuf) != 0) {
        /* Reached via longjmp() from the fault handler. */
        unlock(object);
        return ERR;
    }
    copy_from_user();    /* may fault on a bad user pointer */
    unlock(object);
    return OK;

And in the fault handler you would have a longjmp.
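The pattern above can be tried out in ordinary user-space C. The sketch below is a simulation, not kernel code: a NULL source pointer stands in for an unmapped user address, `fault_handler` is called directly where a real kernel would take a page fault, and `object_locked`, `lock_object`, `unlock_object` and `sys_example` are made-up names for illustration.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

enum { OK = 0, ERR = -1 };

static jmp_buf fault_jmpbuf;    /* single recovery point */
static bool object_locked;      /* stand-in for a real kernel lock */

static void lock_object(void)   { object_locked = true; }
static void unlock_object(void) { object_locked = false; }

/* Stand-in for the page-fault handler: abandon the copy, resume at setjmp. */
static void fault_handler(void)
{
    longjmp(fault_jmpbuf, 1);
}

/* Simulated user copy: a NULL source plays the role of an unmapped address. */
static void copy_from_user(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (src == NULL)
            fault_handler();    /* in a kernel the MMU would raise this */
        dst[i] = src[i];
    }
}

static int sys_example(char *dst, const char *src, size_t len)
{
    lock_object();
    if (setjmp(fault_jmpbuf) != 0) {
        unlock_object();        /* reached only via longjmp */
        return ERR;
    }
    copy_from_user(dst, src, len);
    unlock_object();
    return OK;
}
```

Calling `sys_example` with a valid source returns `OK`; calling it with NULL returns `ERR`, and in both cases the lock has been released by the time the call returns.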

egos · Post by **egos** » Tue Oct 25, 2011 11:08 am

The kernel knows the maximum lengths for all strings and buffers, which prevents it from executing multi-gigabyte requests. Additionally, for user-space strings the kernel can derive an alternative maximum length as the kernel-space base minus the string pointer.
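As a rough sketch of that length cap, assuming a hypothetical `KERNEL_BASE` where kernel space begins and a per-call maximum `api_max` (both names and the address value are made up here), the kernel could take the smaller of the two limits before scanning for the terminator:

```c
#include <stddef.h>
#include <stdint.h>

#define KERNEL_BASE ((uintptr_t)0x80000000u)   /* hypothetical user/kernel split */

/* Largest number of bytes the kernel should scan for a string at user_ptr. */
static size_t user_string_limit(uintptr_t user_ptr, size_t api_max)
{
    if (user_ptr >= KERNEL_BASE)
        return 0;                               /* not a user pointer at all */
    size_t room = (size_t)(KERNEL_BASE - user_ptr);
    return room < api_max ? room : api_max;
}
```

A string that starts 16 bytes below the kernel base is therefore scanned for at most 16 bytes, even if the API would normally allow 256.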

OSwhatever · Post by **OSwhatever** » Tue Oct 25, 2011 11:17 am

gerryg400 wrote:In that case force the application to pass the string length as a parameter. The strlen will be done in user space.

Also have you considered using some type of fault handling to avoid the need to walk the page tables manually? There are a couple of ways to do this.

1. When a kernel call begins you can set a PARAMETER_CHECK flag. If a trap or fault occurs while this flag is set you can potentially return an error immediately to the application to indicate that the system call failed. If it's a page fault on, for example, an uncommitted piece of stack you can commit some pages and try to continue. Once all parameters are verified you can clear the PARAMETER_CHECK flag.

2. Use something like setjmp at the beginning of the system call and then longjmp to recover from the fault. The good thing about this is that you might be able to figure out whether you faulted while holding locks or refcounts on kernel objects, and drop them before returning an error code to user space.

Like this:

Code:

    lock(object);
    if (setjmp(jmpbuf) != 0) {
        /* Reached via longjmp() from the fault handler. */
        unlock(object);
        return ERR;
    }
    copy_from_user();    /* may fault on a bad user pointer */
    unlock(object);
    return OK;

And in the fault handler you would have a longjmp.

Yes, I have thought about this solution and a few variants. I assume your setjmp also stores the stack pointer, so that you can find your way out of the kernel again. Apart from that, it's quite an interesting solution, since you must return during a fault in order to unlock the object, which is quite consistent I think.

rdos · Post by **rdos** » Tue Oct 25, 2011 2:30 pm

OSwhatever wrote:
rdos wrote:The first problem is that applications could forge kernel pointers, and thus overwrite kernel data. I solve this by requiring applications to load all pointers into (segment) registers, and the kernel then treats pointers as 48-bit pointers (even if they come from a flat-mode application). This way an application can never gain access to kernel data areas. Regarding read/write attributes, I do not check those, but rather rely on protection / access validation when application-supplied data is accessed in the kernel. Exceptions in the kernel are not a problem in my design. They are like exceptions in applications. In the retail distribution, any unhandled exception, in an application or in the kernel, leads to an automatic reboot. In the debug distribution, exceptions can be inspected in the kernel debugger.
Can't you do that just by checking that the pointers are within the user part of virtual memory? That would solve the problem of bad pointers pointing into kernel memory.

No. I also support segmented applications (both 16-bit and 32-bit), and the application API can be (and is) used from kernel code and device drivers that use a segmented memory model. There are even two entry points for API functions that take pointers (one for 16-bit offsets and one for 32-bit offsets). They can also (at least theoretically) be called from V86 mode (DOS applications). The API is defined with all pointers being far, either 32-bit or 48-bit. Since the flat selector of an application does not cover 4G, but only the size of application-addressable memory, bad offsets that point into the kernel would protection-fault when referenced.

Part of the design goal in RDOS is that there should be no parameter validation; rather, any needed enforcement of rules should be done in the API itself. That means that all parameters are passed in registers (passing parameters on the stack is not supported), and that structures are not allowed in the API (there can be no hidden pointers that can be forged). The only few structures that are allowed are filled out by the kernel. Passing structures from applications to the kernel is strictly forbidden. Also, swapping memory to disk is not supported. If physical memory gets exhausted, the filesystem buffers are cleaned up first, and if this is not enough, there will be faults in the kernel when there is no physical memory. Therefore, locking parameters is not required.

Add to that, OpenWatcom supports very powerful pragmas for defining functions. Here is an example of what it can do:

Code:

#define CallGate_add_wait_for_signal 0x67 0x9a 37 1 0 0 3 0    // automatically generated

void RDOSAPI RdosAddWaitForSignal(int Handle, int SignalHandle, void *ID);

#pragma aux RdosAddWaitForSignal = \
    CallGate_add_wait_for_signal  \
    parm [ebx] [eax] [ecx];

All of it will be inlined at the place of the call.

Another design-feature has to do with error-codes. There are generally no error-codes in the system, but many APIs return with CY set if they fail. Because all calls are defined by register-context, if a particular API-function has no binding in kernel, kernel will simply return to the caller with CY set indicating error. The caller cannot tell if the kernel-build does not support the function, or if it failed for other reasons. It is generally safe to discard error-returns as well, because this would simply give more error-returns as all APIs are designed to work even if previous calls failed (often by using handles). That means that application code does not need to contain massive amounts of error-code checking.
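The idea that error returns can be discarded because later calls fail harmlessly can be illustrated with a small handle table. This is only a sketch under assumptions: `open_thing`, `write_thing`, `close_thing` and `MAX_HANDLES` are invented names, and a zero handle plus a `false` return stand in for "returns with CY set"; it is not the RDOS API.

```c
#include <stdbool.h>

#define MAX_HANDLES 4

static bool handle_in_use[MAX_HANDLES + 1];    /* handle 0 is never valid */

/* Returns a handle, or 0 on failure (the analogue of returning with CY set). */
static int open_thing(bool should_succeed)
{
    if (!should_succeed)
        return 0;
    for (int h = 1; h <= MAX_HANDLES; h++) {
        if (!handle_in_use[h]) {
            handle_in_use[h] = true;
            return h;
        }
    }
    return 0;
}

/* Every call validates its handle, so a handle from a failed open is harmless. */
static bool write_thing(int h, int value, int *slot)
{
    if (h < 1 || h > MAX_HANDLES || !handle_in_use[h])
        return false;
    *slot = value;
    return true;
}

/* Closing an invalid handle is a no-op. */
static void close_thing(int h)
{
    if (h >= 1 && h <= MAX_HANDLES)
        handle_in_use[h] = false;
}
```

A caller that ignores a failed `open_thing` and goes on to call `write_thing` and `close_thing` with the resulting zero handle does no damage; each later call just reports failure as well.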

gerryg400 · Post by **gerryg400** » Tue Oct 25, 2011 4:19 pm

OSwhatever wrote:Yes, I have thought about this solution and a few variants. I assume your setjmp also stores the stack pointer, so that you can find your way out of the kernel again. Apart from that, it's quite an interesting solution, since you must return during a fault in order to unlock the object, which is quite consistent I think.

The setjmp is quite standard and does contain %rsp. It saves the registers that need to be preserved across a function call (%rbx, %rsp, %rbp, %r12-%r15 and %rip for x86_64). There is a single jmpbuf per core so the mechanism cannot be nested. Remember that all this is happening on the kernel stack (I only have one kernel stack per core). The user context is safely back on the ring3 stack. There is also an unsetjmp function to turn off the mechanism after the copy to/from user is done.
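For illustration, the state described above could be laid out roughly as follows. The struct and function names, the `armed` flag and `MAX_CORES` are assumptions made for this sketch, and refusing to arm a slot that is already armed is one possible way to handle the "cannot be nested" restriction, not something stated in the post. The actual register save and restore would be written in assembly and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CORES 4   /* hypothetical core count */

/* The callee-saved state listed in the post: eight 64-bit registers. */
struct kernel_jmpbuf {
    uint64_t rbx, rsp, rbp, r12, r13, r14, r15, rip;
};

struct core_recovery {
    struct kernel_jmpbuf buf;
    bool armed;           /* true between arming and the unsetjmp analogue */
};

static struct core_recovery recovery[MAX_CORES];

/* Arm the core's single slot; refuse if already armed, since it cannot nest. */
static bool arm_recovery(unsigned core)
{
    if (core >= MAX_CORES || recovery[core].armed)
        return false;
    recovery[core].armed = true;
    return true;
}

/* Counterpart of the post's unsetjmp: later faults are no longer redirected. */
static void disarm_recovery(unsigned core)
{
    if (core < MAX_CORES)
        recovery[core].armed = false;
}
```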

OSwhatever · Post by **OSwhatever** » Tue Oct 25, 2011 6:03 pm

gerryg400 wrote:The setjmp is quite standard and does contain %rsp. It saves the registers that need to be preserved across a function call (%rbx, %rsp, %rbp, %r12-%r15 and %rip for x86_64). There is a single jmpbuf per core so the mechanism cannot be nested. Remember that all this is happening on the kernel stack (I only have one kernel stack per core). The user context is safely back on the ring3 stack. There is also an unsetjmp function to turn off the mechanism after the copy to/from user is done.

When you are executing an exception handler and discover that there was a page fault on a system call parameter, what do you do then? Do you flatten the exception stack (I use ARM, which remembers the exception mode stack) and jump to the setjmp save point, or do you go all the way back to the exception entry point and modify the exception stack before the return so that it ends up in setjmp?

gerryg400 · Post by **gerryg400** » Tue Oct 25, 2011 7:26 pm

OSwhatever wrote:
gerryg400 wrote:The setjmp is quite standard and does contain %rsp. It saves the registers that need to be preserved across a function call (%rbx, %rsp, %rbp, %r12-%r15 and %rip for x86_64). There is a single jmpbuf per core so the mechanism cannot be nested. Remember that all this is happening on the kernel stack (I only have one kernel stack per core). The user context is safely back on the ring3 stack. There is also an unsetjmp function to turn off the mechanism after the copy to/from user is done.
When you are executing an exception handler and discover that there was a page fault on a system call parameter, what do you do then? Do you flatten the exception stack (I use ARM, which remembers the exception mode stack) and jump to the setjmp save point, or do you go all the way back to the exception entry point and modify the exception stack before the return so that it ends up in setjmp?

I just call longjmp(jmpbuf, err); and jump to the setjmp. This basically discards the entire exception stack frame. Later on when I support demand paging I will maybe commit more pages and restart the instruction with iret. There is no need to unwind the exception stack on x86, you can discard it. But remember I handle my exceptions on the same stack that I perform the system call.

OSDev.org
