Sealed process architecture + paging = hard

Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Sealed process architecture + paging = hard

Post by Colonel Kernel »

I am currently designing my OS's memory manager and I'm running into a very difficult problem. My OS is a microkernel, and I am trying to design it in accordance with the idea of a [ftp=ftp://ftp.research.microsoft.com/pub/tr/TR-2006-51.pdf]sealed process architecture[/ftp]. Not the fancy kind like Singularity with all its compiler trickery, but rather a more conventional microkernel that relies on hardware protection but also avoids things like shared memory and dynamic linking (I think Brendan's OS, BCOS, follows these principles as well, but I'm not sure).

The problem I'm having is designing an interface to an external pager outside the microkernel. I can't get around the fact that the external pager (and any other process with which it communicates) must be trusted, and so far the only trusted part of the system is the kernel itself.

The trust issue exists because a (malicious) pager can manipulate other processes' memory however it wants. Even if I use some kind of "handle" to represent each page so that the pager never sees its contents (i.e., suppose there were a system call to do DMA that accepted these "handles"), the pager could still write the page to an area of the disk that another process can read from and write to -- the kernel can't prevent this unless it has its own disk driver, which might as well be in userspace, which might as well be a trusted pager. Argh.
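To make the "handle" idea concrete, here's roughly the kind of interface I had in mind -- all of these names are invented for illustration, nothing is implemented:

[code]
#include <stdint.h>

typedef uint64_t page_handle_t;  /* opaque token minted by the kernel */

/* The kernel evicts a dirty page and hands the pager a handle to it;
 * the pager never gets a mapping of the page's contents. */
page_handle_t sys_evict_page(int victim_pid, uint64_t vaddr);

/* The pager asks the kernel to DMA the frame to or from a disk block.
 * Here's the hole: nothing stops the pager from choosing a block that
 * some other process can read or write -- the kernel can't validate
 * 'block' without understanding the on-disk layout. */
int sys_dma_write(page_handle_t page, uint32_t disk, uint64_t block);
int sys_dma_read(page_handle_t page, uint32_t disk, uint64_t block);
[/code]

The comment on the last two calls is exactly the hole I can't close without pulling a disk driver (or at least block-ownership tracking) into the kernel.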

I realize that if a malicious pager can be installed, then there are much bigger problems afoot. However, it would be nice to be able to avoid this scenario altogether.

I get the feeling that this is a really hard problem and that's why Singularity doesn't support paging to disk yet. If you were in my shoes, how would you proceed?
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
chasetec

Re: Sealed process architecture + paging = hard

Post by chasetec »

I have to ask, why wouldn't your pager either be trusted or part of the microkernel itself? After reading that pdf it seems that the pager would have to be one of those two.

And if you're removing all the compiler trickery and verification and all the other crazy stuff, it sounds like the best way to "seal a process" would be to design an OS that provided a new virtual machine per process (which is really funny considering how much they talked bad about JVMs).

As far as paging to disk being a problem, is that really anything more than a file locking/permission problem? Just have a per-process page file (or disk blocks) and implement process-level ACLs in your VFS or disk block provider.
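Something like this, maybe (names invented, just to illustrate the ownership check I mean):

[code]
#include <stdint.h>

/* Sketch: the disk-block provider tags every swap extent with an
 * owning process and refuses cross-process access. */
struct swap_extent {
    int      owner_pid;    /* process whose pages live here */
    uint64_t first_block;  /* start of the extent on disk   */
    uint64_t nblocks;      /* length of the extent          */
};

/* Paging I/O on behalf of 'requester_pid' is only allowed against
 * extents owned by that same process. */
static int swap_io_allowed(const struct swap_extent *e, int requester_pid)
{
    return e->owner_pid == requester_pid;
}
[/code]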

The biggest problem to me would be IPC. I mean, if you go the heap exchanger route like they talk about, that just seems to further suggest your pager should be trusted/part of the kernel. And to me the whole idea of IPC doesn't seem to fit well with process sealing. I mean, if it's considered a bad thing to have shared memory, shouldn't IPC be considered bad too?

I just read the PDF (you need to fix your URL, btw) so maybe I'm missing something here....
Colonel Kernel

Re: Sealed process architecture + paging = hard

Post by Colonel Kernel »

Chase@OSDev wrote:I have to ask, why wouldn't your pager either be trusted or part of the microkernel itself? After reading that pdf it seems that the pager would have to be one of those two.
This is the crux of my question -- is there a way to make an untrusted external pager? Sort of like how making a type-safe garbage collector is still an open research question. :)

I can at least imagine how Singularity might use type-safety to prevent the pager from reading or modifying the pages that it's spitting out to disk, but it's non-trivial to prevent it from, say, reading in the wrong disk block on a page fault. Whether this should be classified as malicious behaviour or just a really bad bug, I don't know...
And if you're removing all the compiler trickery and verification and all the other crazy stuff, it sounds like the best way to "seal a process" would be to design an OS that provided a new virtual machine per process (which is really funny considering how much they talked bad about JVMs).
I'm not using any VM-type stuff, just good ol' separate address spaces and C code. The main invariants of a sealed process architecture (fixed code, state isolation, explicit communication, & closed API) actually don't require anything fancier -- it's just that an implementation based on managed code works out better in practice.
As far as paging to disk being a problem, is that really anything more than a file locking/permission problem? Just have a per-process page file (or disk blocks) and implement process-level ACLs in your VFS or disk block provider.
You've hit something here. Maybe self-paging is part of the answer...
The biggest problem to me would IPC. I mean if you go the heap exchanger route like they talk about that just seems to further suggest your pager should be trusted/part of the kernel.
The Exchange Heap is just an implementation detail of Singularity, not a necessary part of a sealed process architecture. My OS will use synchronous copying IPC with page-table flipping optimizations for page-granularity messages, and a few other tidbits. The external pager has nothing to do with page-transfer IPC, since the kernel itself is in control of all the page tables.
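For page-granularity messages, the flipping optimization looks roughly like this (helper names invented; a real implementation also has to deal with SMP TLB shootdown):

[code]
#include <stdint.h>

typedef uint64_t pte_t;
struct address_space;  /* per-process page directory, opaque here */

/* Invented helpers standing in for the real page-table code. */
pte_t pte_get(struct address_space *as, uint64_t va);
void  pte_set(struct address_space *as, uint64_t va, pte_t pte);
void  pte_clear(struct address_space *as, uint64_t va);
void  tlb_invalidate(struct address_space *as, uint64_t va);

/* Move a page-aligned message from sender to receiver by remapping
 * the frame instead of copying it. */
void ipc_flip_page(struct address_space *src, uint64_t src_va,
                   struct address_space *dst, uint64_t dst_va)
{
    pte_t pte = pte_get(src, src_va);  /* sender's mapping of the frame  */
    pte_clear(src, src_va);            /* revoke it from the sender      */
    tlb_invalidate(src, src_va);       /* kill any stale TLB entries     */
    pte_set(dst, dst_va, pte);         /* hand the frame to the receiver */
}
[/code]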
And to me the whole idea of IPC doesn't seem to fit well with process sealing. I mean if it's considered a bad thing to have shared memory shouldn't IPC be considered bad too?
Without IPC you can't get anything done. :) The point of the sealed process architecture is that explicit IPC (i.e. -- point-to-point agreed-upon communication) that follows a protocol is much safer than twiddling bits in shared memory, potentially without the benefit of synchronization or the knowledge of what other processes are doing with that memory.
I just read the PDF (you need to fix your URL, btw) so maybe I'm missing something here....
I fixed the link.
chasetec

Re: Sealed process architecture + paging = hard

Post by chasetec »

What about two pagers? One would be the trusted pager that did nothing but physical memory allocation/locking for a process, and the other would be the process pager that the application interfaced with. The process pager would interface with the physical memory pager and the VFS/Disk provider to manage the process's memory map. Basically the process pager implements application-level malloc and free; the only tricky part would be making sure that the process pager couldn't manipulate the page tables to gain access to other pages. Kinda hard to describe what I'm thinking... does this make any sense? Process pager per process == self-paging.
I'm not using any VM-type stuff, just good ol' separate address spaces and C code. The main invariants of a sealed process architecture (fixed code, state isolation, explicit communication, & closed API) actually don't require anything fancier -- it's just that an implementation based on managed code works out better in practice.
I guess I should have actually called it something else. I was thinking more along the lines of Solaris Zones, but most people don't know them that well. It might be something like Xen that I'm reaching for here, but I'm not that familiar with Xen.
Without IPC you can't get anything done. :) The point of the sealed process architecture is that explicit IPC (i.e. -- point-to-point agreed-upon communication) that follows a protocol is much safer than twiddling bits in shared memory, potentially without the benefit of synchronization or the knowledge of what other processes are doing with that memory.
Well, there would have to be some type of message passing, but what I was trying to say is: should two user processes ever be able to pass messages to each other? A process would have to pass messages to service providers or whatever you want to call them (trusted code), and provider-to-provider messages would be required, but I think traditional IPC should get thrown out the window to implement process sealing. Even with approved communication channels between processes you'd still be opening up processes to buffer overflows. It'd require disallowing multi-process applications, but everyone knows that multi-threaded apps are the only way to go ;) Maybe IPC should be Inter Provider Communication :)
Colonel Kernel

Re: Sealed process architecture + paging = hard

Post by Colonel Kernel »

Chase@OSDev wrote:What about two pagers? One would be the trusted pager that did nothing but physical memory allocation/locking for a process, and the other would be the process pager that the application interfaced with. The process pager would interface with the physical memory pager and the VFS/Disk provider to manage the process's memory map.
I think the first "pager" you described accurately describes the responsibilities of my in-kernel virtual memory manager. It implements physical memory and virtual address space allocation mechanisms and policies.

What I'm aiming for is sort of the opposite of what many microkernel projects like L4 strive for -- I want VM policy in the kernel, but the mechanism in user-space (instead of the other way around). Basically, the system I'm aiming for looks superficially like Mach's external pager. Its only purpose in life is to service in-page and out-page I/O requests from the kernel.
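Concretely, I imagine the kernel-to-pager interface boiling down to something like this (a hypothetical message format -- none of this is settled):

[code]
#include <stdint.h>

typedef uint64_t page_handle_t;  /* opaque token for a physical frame */

enum pager_op { PAGE_IN, PAGE_OUT };

/* Request the kernel sends to the external pager.  The pager sees
 * only handles and backing-store offsets, never page tables --
 * those stay under kernel control. */
struct pager_request {
    enum pager_op op;       /* fill the frame, or drain it */
    page_handle_t page;     /* which frame to fill or drain */
    uint64_t      backing;  /* offset in the backing store */
};

/* The pager's whole life: receive a request, do the I/O, reply. */
[/code]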
Basically the process pager implements application-level malloc and free; the only tricky part would be making sure that the process pager couldn't manipulate the page tables to gain access to other pages. Kinda hard to describe what I'm thinking... does this make any sense? Process pager per process == self-paging.
I don't see what malloc and free have to do with it... I'm thinking along the lines of another server process, not something that lives in each process...

Also, while thinking about the self-paging route I realized that it isn't a good idea to let apps page their own code in and out of memory. :P
I guess I should have actually called it something else. I was thinking more along the lines of Solaris Zones, but most people don't know them that well. It might be something like Xen that I'm reaching for here, but I'm not that familiar with Xen.
Are you suggesting using hardware virtualization...? I've never heard of Solaris zones. I'd just be using separate address spaces as my protection mechanism (i.e. -- each process gets its own page directory).
Well, there would have to be some type of message passing, but what I was trying to say is: should two user processes ever be able to pass messages to each other?
Sure, why not? For example, imagine that your web browser is a process, and each plug-in you want to run is a child process of the browser.
A process would have to pass messages to service providers or whatever you want to call them (trusted code)
One of the goals of a sealed architecture should be to minimize trusted code as much as possible. Typical service providers should not be trusted. In terms of my OS, this means they run in separate address spaces in ring 3.
Even with approved communication channels between processes you'd still be opening up processes to buffer overflows.
Good point... I don't see how this can easily be avoided with message-passing based on copying. Every message pass would need to be a "page flipping" operation in order to avoid that scenario, but this means lots of TLB shootdowns... yuck. :P I like Singularity's way better, but I don't have time to design my own language and write my own compiler and bytecode verifier. ;)
chasetec

Re: Sealed process architecture + paging = hard

Post by chasetec »

Colonel Kernel wrote: I think the first "pager" you described accurately describes the responsibilities of my in-kernel virtual memory manager. It implements physical memory and virtual address space allocation mechanisms and policies.

What I'm aiming for is sort of the opposite of what many microkernel projects like L4 strive for -- I want VM policy in the kernel, but the mechanism in user-space (instead of the other way around). Basically, the system I'm aiming for looks superficially like Mach's external pager. Its only purpose in life is to service in-page and out-page I/O requests from the kernel.
How would you enforce memory protection if the paging mechanism was in user-space? Unless you cause page faults on all user space memory access....
I don't see what malloc and free have to do with it... I'm thinking along the lines of another server process, not something that lives in each process...
Slightly off topic but, how are you implementing your system calls without shared memory? Ints? Will you have read only kernel segments of some type?
Also, while thinking about the self-paging route I realized that it isn't a good idea to let apps page their own code in and out of memory. :P
That's what I was trying to say when I was talking about malloc and free. You want allocation and deallocation, but not actual page-level control, from a user-space API. But I think you could still have a paging provider/service per process.

Are you suggesting using hardware virtualization...? I've never heard of Solaris zones. I'd just be using separate address spaces as my protection mechanism (i.e. -- each process gets its own page directory).
Think of it like v86. Most instructions/code still execute natively. When you have two processes running in Solaris Zones there is no way they can do IPC or even know that the other process exists; they even have their own in-memory copies of OS processes. It also leverages some features of FreeBSD jails/chroot environments to restrict FS access. I'm not saying this fits with what you're working on, but just trying to read the intent from that PDF, it seems the easiest approach would be to provide a limited virtualized environment per program/service.

Sure, why not? For example, imagine that your web browser is a process, and each plug-in you want to run is a child process of the browser.
But why does a web browser with a plugin have to be a multi-process app? Web browser plugins are usually so tightly integrated with the browser that even if they are in their own address space, when they crash the browser isn't going to continue working. I don't think that process separation is going to fix this, because the plugins require such tight integration. Why can't a plugin just be an extra thread in a browser app?
One of the goals of a sealed architecture should be to minimize trusted code as much as possible. Typical service providers should not be trusted. In terms of my OS, this means they run in separate address spaces in ring 3.
But isn't everything below your HAL going to have to be trusted, since you aren't doing the funky verified stuff? What levels do you have besides trusted and user?
Good point... I don't see how this can easily be avoided with message-passing based on copying. Every message pass would need to be a "page flipping" operation in order to avoid that scenario, but this means lots of TLB shootdowns... yuck. :P I like Singularity's way better, but I don't have time to design my own language and write my own compiler and bytecode verifier.
Without dropping message passing(between user processes) I don't see a fix either.
Colonel Kernel

Re: Sealed process architecture + paging = hard

Post by Colonel Kernel »

Chase@OSDev wrote:How would you enforce memory protection if the paging mechanism was in user-space? Unless you cause page faults on all user space memory access....
I think you misunderstood me. I'm talking about paging I/O, not page table manipulation. The kernel does page table manipulation; the external pager actually writes evicted dirty pages to disk and reads pages from disk on "hard" page faults. The point of it being an external pager is that if someone were so inclined, they could implement something like distributed virtual memory, for example (even though I think this is kind of a nutty idea myself).
Slightly off topic but, how are you implementing your system calls without shared memory? Ints? Will you have read only kernel segments of some type?
I'm not doing anything exotic, just using the "int" instruction right now. I plan to optimize it by using sysenter/sysexit and friends later on. The kernel is mapped into the top 0.5 GB of each address space in a bunch of supervisor-level pages. This is all really standard stuff -- nothing fancy going on.
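The user-side stub is about as plain as it gets (the vector number 0x30 is made up; GCC-style inline asm):

[code]
/* One-argument system call via a software interrupt.  The syscall
 * number goes in EAX, the argument in EBX, and the result comes
 * back in EAX. */
static inline long syscall1(long num, long arg0)
{
    long ret;
    __asm__ volatile ("int $0x30"
                      : "=a" (ret)
                      : "a" (num), "b" (arg0)
                      : "memory");
    return ret;
}
[/code]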
That's what I was trying to say when I was talking about malloc and free. You want allocation and deallocation, but not actual page-level control, from a user-space API. But I think you could still have a paging provider/service per process.
Yeah, I think I get it, although it would be crazy to have a dedicated server process for every user process. To me, self-paging means the process itself does its own paging I/O by talking to the file system.
Think of it like v86. Most instructions/code still execute natively. When you have two processes running in Solaris Zones there is no way they can do IPC or even know that the other process exists; they even have their own in-memory copies of OS processes. It also leverages some features of FreeBSD jails/chroot environments to restrict FS access. I'm not saying this fits with what you're working on, but just trying to read the intent from that PDF, it seems the easiest approach would be to provide a limited virtualized environment per program/service.
This is isolating groups of processes from each other completely, which is not the point -- you might as well use virtualization for that. The point is to prevent all the evil ways in which malicious and buggy code can harm running processes.
But why does a web browser with a plugin have to be a multi-process app? Web browser plugins are usually so tightly integrated with the browser that even if they are in their own address space, when they crash the browser isn't going to continue working. I don't think that process separation is going to fix this, because the plugins require such tight integration.
I see no reason for Firefox to crash if Flash stops working, or if QuickTime stops working, or if WMP stops working, or if Adblock stops working, etc...
Why can't a plugin just be an extra thread in a browser app?
Because then you'd be dynamically loading foreign and potentially buggy and malicious code into your browser. The whole point of a sealed process architecture is to avoid doing things like this for the sake of dependability.
But isn't everything below your HAL going to have to be trusted, since you aren't doing the funky verified stuff?
Yes.
What levels do you have besides trusted and user?
Well, there are different kinds of "levels"... There are, of course, the two hardware "levels" -- kernel & user modes (ring 0 and ring 3 on x86). Only the kernel itself (and its HAL) runs in kernel mode, and all processes including device drivers run in user mode. I suppose drivers will need a few extra permissions to use any I/O APIs the kernel might expose (I haven't decided how these will work yet -- whether they'll be system calls, or something else). These are the physical levels.

The logical levels are basically trusted and not. Anything that's trusted must not have bugs, or the system will behave unpredictably and should crash. Trusted code also cannot, ever, be replaced by malicious code without dire consequences. So far, in the absence of external paging, the kernel is trusted, and the first running process (yet to be named) is also trusted since it manages all security and namespace concerns.
Without dropping message passing(between user processes) I don't see a fix either.
I thought it through (again) and I see what you mean... As long as there is no bounds checking, a process might read too much data sent by another process into some buffer -- even if that data was delivered by the kernel via page re-mapping. Maybe this totally dependable kind of system is only possible with type-safe languages... :P
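About the best an unmanaged process can do is refuse to trust the sender's length field -- a trivial sketch ("receive_into" is just illustrative, not a real API of mine):

[code]
#include <stddef.h>
#include <string.h>

/* Copy an incoming message into a local buffer, rejecting anything
 * that would overflow it.  The check has to happen on the receiving
 * side -- the kernel can't know how big the process's buffer is. */
int receive_into(void *buf, size_t buf_size,
                 const void *msg, size_t msg_len)
{
    if (msg_len > buf_size)
        return -1;             /* reject rather than overflow */
    memcpy(buf, msg, msg_len);
    return 0;
}
[/code]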
Ryu

Re: Sealed process architecture + paging = hard

Post by Ryu »

What about one page file per process, backed by a single "device paging manager" server? Since the server is trusted, the kernel can assign an internal handle to each process created, so any paging I/O performed is isolated per process. But I may be missing the whole point, as I've just briefly read through the thread...
Colonel Kernel

Re: Sealed process architecture + paging = hard

Post by Colonel Kernel »

I think the point is mainly that I've become too paranoid. :P I can't see a way around the need to trust the external pager, but I guess that's ok...

About buffer overruns, I completely forgot about the NX/XD bit. :)
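For reference, in PAE and long mode it's just the top bit of each page table entry (EFER.NXE has to be enabled first; sketch only):

[code]
#include <stdint.h>

/* Bit 63 of a PAE/long-mode PTE marks the page no-execute, so data
 * smuggled in via a message buffer can't be run as code. */
#define PTE_NX (1ULL << 63)

static inline uint64_t pte_make_noexec(uint64_t pte)
{
    return pte | PTE_NX;
}
[/code]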