
Has anybody else found a good solution to this?
mystran wrote: What does this mean for total system performance? If you try to implement a capability system on top of L4, you end up with lots of extra IPC calls. Not surprisingly, if the required number of total IPC calls is taken into account, L4 might not be that fast anymore.

IIRC, there was a nice discussion around this in relation to the L4/HURD project. They came to the conclusion that the only way to implement a full, secure capability system on top of L4 is to involve a supervisor process that intercepts _any_ messaging traffic and forwards it to the appropriate receiver. This means that every single message would need at least two context switches.
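To make the cost concrete, here is a toy model of that interposition scheme (all names are mine, not real L4 primitives): every logical message becomes two IPC legs because it must pass through the supervisor, which is where the capability check happens.

```python
# Toy model of IPC interposition: every message from a sender to a
# receiver is routed through a supervisor that consults a capability
# table, so each delivered message costs two IPC hops (context
# switches). Illustrative sketch only, not an L4 API.

class Supervisor:
    def __init__(self):
        self.caps = set()      # (sender, receiver) pairs allowed to talk
        self.hops = 0          # counts IPC legs, i.e. context switches

    def grant(self, sender, receiver):
        self.caps.add((sender, receiver))

    def send(self, sender, receiver, msg):
        self.hops += 1         # leg 1: sender -> supervisor
        if (sender, receiver) not in self.caps:
            return None        # no capability: message dropped
        self.hops += 1         # leg 2: supervisor -> receiver
        return (receiver, msg)

sup = Supervisor()
sup.grant("shell", "fs")
assert sup.send("shell", "fs", "open /etc/motd") == ("fs", "open /etc/motd")
assert sup.send("evil", "fs", "open /etc/shadow") is None
assert sup.hops == 3   # one delivered message (2 hops) + one rejected leg
```

Without the supervisor, a delivered message would be a single hop, which is exactly the overhead being argued about.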
As a side note: I've been thinking of putting some drivers in my microkernel in order to allow it to do things like managing a "default swapfile" where it can swap pages if it's out of space, without needing to communicate with userspace pagers. This way the kernel can safely overcommit physical memory up to the size of the default swapfile, while still avoiding the need for an OOM-killer.
mystran wrote: For what I know, if you want to let a process share a page with another process, you have two options: either you let the first process map a page to another and force the second to figure out what to do if the first revokes it, OR you have a common pager for all the parties.

If two processes decide to share memory, they want to cooperate and therefore have to trust each other to some extent. If a task revokes a page that it has given to another task earlier, the latter gets a message that notifies it about this event and can simply stop using this page (the cooperation is finished). In my opinion this is all it can expect, as there's no point in trying to force the two tasks to share memory. I also don't see how this could work in any other system.
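The cooperative revoke-and-notify handshake described above can be sketched like this (the names are hypothetical, just to show the protocol shape):

```python
# Sketch of the cooperative sharing protocol described above: a grantor
# maps a page into a grantee; on revocation the grantee is notified and
# simply stops using the page. Illustrative only.

class Task:
    def __init__(self, name):
        self.name = name
        self.pages = set()     # pages currently mapped in this task

    def notify_revoked(self, page):
        self.pages.discard(page)   # cooperation over: stop using it

def share(grantor, grantee, page):
    assert page in grantor.pages   # you can only share what you hold
    grantee.pages.add(page)

def revoke(grantor, grantee, page):
    grantee.notify_revoked(page)   # grantee is told, not asked

a, b = Task("A"), Task("B")
a.pages.add(0x1000)
share(a, b, 0x1000)
assert 0x1000 in b.pages
revoke(a, b, 0x1000)
assert 0x1000 not in b.pages       # B stopped using the page
assert 0x1000 in a.pages           # A still holds it
```

The key point is that the grantee never gets a veto: revocation is unilateral, and trust between the two tasks is what makes that acceptable.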
mystran wrote: That need not mean we put the policy in the kernel. Spring is a nice example. There is one VMM per machine which lives in the kernel (AFAIK). There can be any number of pagers. VMM is responsible for dealing with memory (cache objects), and pagers are responsible for dealing with backing store (pager objects). There's then a coherency protocol to make sure the caches and pagers stay in sync.

I'm not familiar with Spring, but in my opinion there's a great danger that such a design eventually leads to a monolithic kernel. By including the VMM in the kernel, some of the policy has been taken from the pagers, because they now have to use a (still very low-level, I admit..) abstraction rather than the raw hardware. The problem is that once you've started including policy in the kernel, it's impossible to stop. Why shouldn't the VMM use an abstraction that is a bit more high-level? Since it's the same trade-off as before between a possibly (!) higher performance and a clean design that allows the pagers more liberties, the answer also has to be the same if you want to build a consistent system. This is why I believe that the only answer can be to go for the lowest abstraction possible, which is the raw hardware.
gaf wrote: Although not being familiar with the EROS design at all, I can't help the feeling that it uses a drastically different approach to implement security. Am I right that EROS builds its whole security on IPC (checking which app is allowed to send what to whom)?

Hmm, not _only_ IPC. The capabilities, which form the base of EROS' security system, can (IIRC) be either an entrypoint (a channel to transfer IPC messages) or a memory page (that can then be mapped into the address space that holds it). In the original EROS design (which was a somewhat monolithic design anyway), there were also capabilities for devices, since they were managed in the kernel as well.
JoeKayzA wrote: The point is, I think, that not every resource can be multiplexed and secured in the same way. A graphics display can be split, or switched on user input (linux vc's), a storage device can be divided into slices, a network adapter's bandwidth can be shared... You got me? So the only sane place where these checks should go is, IMHO, the device drivers, which know the most about the hardware. I have to admit, btw, that I still haven't got how the 'real' exokernels today handle this issue...

I think that this would actually be a pretty bad idea, as it doesn't differ at all from the traditional way as it's done in unix or windows. In such a system security checks are spread across several levels of applications, and the complexity easily leads to bugs and vulnerabilities.
Exokernels (L4) take a different approach by making security checks only at the lowest layer. Every high-level operation eventually results in an access to the system hardware, which is where the exokernel checks whether the application is allowed to use the hardware resource.
JoeKayzA wrote: But I still believe that these two abstractions (entrypoints and pages) are a relatively good decision for a pure microkernel: A device (storage/network/graphics/whatever) is managed by a usermode device driver. Every process that accesses the device gets its own set of capabilities to it, and the driver then multiplexes them in a way that is suitable for the device and for the policy in the system (priorities, quotas and the like).

If I got it right, this sounds pretty much like the 'normal' µkernel design. The problem with it in general is that the almighty kernel of a monolithic operating system was just replaced by an almighty user-space manager. This might make the design a bit more flexible because this user-space manager can be exchanged, but for the application it really doesn't make a big difference, as it still has to use the abstractions a higher instance imposes on it.
JoeKayzA wrote: The point is, I think, that not every resource can be multiplexed and secured in the same way. A graphics display can be split, or switched on user input (linux vc's), a storage device can be divided into slices, a network adapter's bandwidth can be shared... You got me? So the only sane place where these checks should go is, IMHO, the device drivers, which know the most about the hardware.

Yep, that's roughly how it's done: Each device has a driver whose job is to multiplex it, that is, to allow multiple apps to use the device in a secure way. To do so the device has to be virtualized as you described above, and although that's not trivial for some devices (e.g. character devices), there's always a single solution that makes the most sense. The driver then sets up a capability tree and gives its root manager full access to the device. This manager can then either split its capability and share it among other managers or directly offer it to the applications through a simple interface.
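A minimal sketch of that capability-splitting idea, using a block device as the example (the class and the range representation are my own invention, just to illustrate the tree structure):

```python
# Sketch of capability splitting as described above: the driver hands a
# root manager one capability covering the whole device (here a block
# range), and the manager splits it into narrower capabilities for its
# clients. Illustrative names, not a real API.

class BlockCap:
    def __init__(self, start, end):
        self.start, self.end = start, end

    def split(self, at):
        # Divide this capability into two disjoint sub-capabilities.
        assert self.start < at < self.end
        return BlockCap(self.start, at), BlockCap(at, self.end)

    def allows(self, block):
        # The driver's security check: is this block inside the range?
        return self.start <= block < self.end

root = BlockCap(0, 1000)          # full access to a 1000-block device
lo, hi = root.split(500)          # root manager splits it for two clients
assert lo.allows(10) and not lo.allows(600)
assert hi.allows(600) and not hi.allows(10)
```

Each split produces strictly narrower authority, so the driver only ever has to check a presented capability against the request, never who is asking.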
JoeKayzA wrote: So when you put all that information about the hardware into the kernel, isn't that a monolithic kernel then?

Whether a kernel is monolithic or not doesn't depend on the privilege level the drivers are using, but on the internal design. It's therefore perfectly possible to write a µkernel system that runs totally in kernel-space, although some scripting language might then be necessary to ensure that the modules keep to the rules.
gaf wrote: What really matters is how much control the applications have over the system policy.

Why does this really matter so much? Couldn't it be argued that an application that requires that much specialization (e.g. an embedded control system) should have a specialized OS (e.g. VxWorks, QNX) to run on anyway?
gaf wrote: Exokernels therefore try to keep the user-space managers to a minimum that is needed to ensure some system-wide fairness (eg: tasks must be kept from allocating all resources for themselves). The real policy decisions now take place in the apps.

But how does the exokernel decide what division of resources is fair? Isn't that exactly what policy is? I see a contradiction here...
Colonel Kernel wrote: Why does this really matter so much? Couldn't it be argued that an application that requires that much specialization (e.g. an embedded control system) should have a specialized OS (e.g. VxWorks, QNX) to run on anyway?

It's not as if only highly specialized apps could benefit from being able to choose their policy. The kernel policy means overhead for everybody, and I don't think that it's very practical to write a new OS for every single application, especially as this wouldn't really solve the problem either. The idea is therefore to allow applications to decide about their policy according to their own needs, as long as no other apps are degraded (system-wide policy in the managers). The exokernel OS (kernel + user-space servers) can therefore be seen as a minimum platform that ensures fairness, while the apps (read: the libs they're linked to) form a specialized libOS.
Colonel Kernel wrote: But how does the exokernel decide what division of resources is fair?

The kernel doesn't decide about fairness, that's the servers' job; it only ensures that two applications can share a device without conflicts. In order to find an appropriate way of dividing a device, it's useful to categorize it first.
Colonel Kernel wrote: Isn't that exactly what policy is? I see a contradiction here...

The kernel only exports a mechanism that doesn't include any (significant) policy. Unlike a policy, the output of a mechanism depends only on the immediate input and does not involve any internal decisions.
gaf wrote: The kernel policy means overhead for everybody and I don't think that it's very practical to write a new OS for every single application, especially as this wouldn't really solve the problem either.

Not every single application needs its own OS, that's the point. That would just be going too far the other way. What I mean is: is it necessary to have this mix of different policies on a single machine, swappable at run-time? In the real world, aren't most application mixes pretty predictable based on the role of the machine (controller, desktop, gaming rig, DB server, router, etc.)?
gaf wrote: The idea is therefore to allow applications to decide about their policy according to their own needs as long as no other apps are degraded (system-wide policy in the managers). The exokernel OS (kernel + user-space servers) can therefore be seen as a minimum platform that ensures fairness while the apps (read: the libs they're linked to) form a specialized libOS.

Ok, got it. The user-space servers provide system-wide fairness.
gaf wrote:
Colonel Kernel wrote: But how does the exokernel decide what division of resources is fair?
The kernel doesn't decide about fairness, that's the server's job.

But the kernel has to assign an initial set of resources to those servers in the first place. If this initial division is static, it makes things awkward... Does this mean that the user-space servers have to quarrel negotiate amongst themselves to re-allocate resources of the same type that they share? Or does there have to be a root server for each type of resource (c.f. sigma0)?
gaf wrote: It therefore, in my opinion, has the right to forcefully revoke access to pages from user-level pagers. This is straightforward, as the kernel only has to pick the page and then do an unmap() on it. The pager(s) or app holding that page will then be informed that they have lost a page and are given the chance to save the contents somewhere.

I agree that the kernel shouldn't be beholden to user processes to give it memory voluntarily, but I see two problems with this approach.
gaf wrote: Using these two mechanisms (which are protected in a real exokernel, of course) a pager has full access to the hardware and can freely decide about the policy to be used. Policy decisions include which task gets how much memory and which page should be paged out in a low-memory situation.

Assuming that the 'user level pager' is located inside the app (or the libOS) now, I still can't see the benefit of letting it decide which page should be paged out in a low-memory situation: When a page is used less, it will get paged out first; when it is used often, it won't. But the system must still enforce that a page gets swapped out, otherwise the other apps would be degraded, because they don't get their memory. Or did I get that completely wrong?
Colonel Kernel wrote: I think I have a better solution though -- have the kernel "tax" process memory for thread creation. Let's say that it costs the kernel about 2.1 pages to create a new thread (TCB + kernel stack... not sure if 2.1 pages is a reasonable number, but you get the idea). When a thread is created, the kernel will create it out of whatever memory it owns. If this memory is getting low, it "taxes" the address space that made the create-thread system call by stealing four pages. That covers the cost of the thread, and leaves some leftovers for other things (new mappings, page tables, etc.). That way at least the page stealing (ok... "taxing") is fair, because the address space/task requesting new kernel resources is the one paying for them (at least during tough times). If we want to make things a bit nicer, we can let the task decide what pages it wants to sacrifice by specifying them in the create-thread system call.

I start to like that idea.
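The taxing scheme quoted above can be sketched in a few lines. The numbers here (a 4-page tax, a low-water mark, a 3-page thread cost rounding up the quoted 2.1) are illustrative assumptions, not values from the discussion:

```python
# Sketch of the "thread tax" idea: the kernel creates threads from its
# own page pool, and when the pool runs low it taxes the calling
# address space for a few pages. All constants are illustrative.

TAX = 4            # pages taken from the caller when the kernel is low
LOW_WATER = 8      # kernel pool level that triggers taxation
THREAD_COST = 3    # pages a new thread costs (TCB + kernel stack)

class Kernel:
    def __init__(self, pool):
        self.pool = pool       # pages the kernel currently owns

    def create_thread(self, caller_pages):
        taxed = 0
        if self.pool < LOW_WATER:
            taxed = min(TAX, len(caller_pages))
            for _ in range(taxed):
                caller_pages.pop()     # steal pages from the caller
            self.pool += taxed
        self.pool -= THREAD_COST       # build TCB + stack from the pool
        return taxed                   # pages the caller paid

k = Kernel(pool=20)
app = list(range(100))
assert k.create_thread(app) == 0       # pool healthy: no tax
k.pool = 5
assert k.create_thread(app) == TAX     # pool low: caller pays 4 pages
assert len(app) == 96
```

The fairness argument is visible in the sketch: only the caller that triggers kernel allocation during a shortage loses pages, and the surplus above the thread cost replenishes the pool for other bookkeeping.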
Colonel Kernel wrote: What I mean is, is it necessary to have this mix of different policies on a single machine, swappable at run-time? In the real world, aren't most application mixes pretty predictable based on the role of the machine (controller, desktop, gaming rig, DB server, router, etc.)?

In fact the x86 is meant as a general-purpose architecture, and you can run a huge variety of apps on it. You've enumerated some of them yourself already, and my idea would be to provide programmers with a number of libOSes that are specialized for such systems.
Colonel Kernel wrote: I think in reality the benefits of exokernels are more in the areas of security and performance rather than flexibility and generality...

Why, if not due to its flexible design, should an exokernel be any faster than a traditional OS? Security largely depends on the design of the user-space managers, and if they are monolithic it's not any better either.
Colonel Kernel wrote: Does this mean that the user-space servers have to quarrel negotiate amongst themselves to re-allocate resources of the same type that they share? Or does there have to be a root server for each type of resource (c.f. sigma0)?

Yep, the kernel sets up a capability that spans the whole device and sends it to a root manager. The root manager can then split this capability and pass it on to other, lower-level managers.
Colonel Kernel wrote: Does this mean that each "paging server" actually has its own pager thread for itself, whose pager thread is in turn in a higher-level "paging server"? The fact that pagers are threads sometimes throws me for a loop...

The pager itself will hardly be more than a few megs in size, so paging it out doesn't make any sense. You would just take one of the pages it has passed on to its apps, which can then also be dirty.
Colonel Kernel wrote: I think I have a better solution though -- have the kernel "tax" process memory for thread creation.

Wouldn't it be easier if the task had to pay the tax right away? In my opinion the idea requires too much book-keeping (every time a resource is allocated/deallocated) just to make sure that the kernel won't run out of memory, but it should nevertheless work. If you just want to prevent DoS attacks, it'd be more practical to require caps for the creation of tasks etc.
JoeKayzA wrote: Assuming that the 'user level pager' is located inside the app (or the libOS) now, I still can't see the benefit of letting it decide which page should be paged out in a low-memory situation.

Well.. finding the right page is what it's all about, and there are many ways of trying to do this (working set, LRU approximation, global policy <-> local policy). Apart from that, the pager might also decide which app has to evict a page and can thus, for example, spare an important app. Maybe MM just wasn't the best example in the first place, as it's quite theoretical.
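To show that "finding the right page" really is a policy choice, here is one of the mentioned options, an LRU approximation (the classic clock / second-chance algorithm), as a minimal sketch:

```python
# Sketch of an LRU approximation (the "clock" algorithm) mentioned
# above: each frame has a referenced bit; the hand clears bits until it
# finds an unreferenced frame, which becomes the eviction victim.
# Purely illustrative.

def clock_evict(frames, referenced, hand):
    # frames: list of page numbers; referenced: parallel list of bits.
    while True:
        if referenced[hand]:
            referenced[hand] = 0          # second chance: clear and move on
            hand = (hand + 1) % len(frames)
        else:
            victim = frames[hand]         # unreferenced: evict this page
            return victim, hand

frames = [10, 11, 12, 13]
referenced = [1, 1, 0, 1]
victim, hand = clock_evict(frames, referenced, 0)
assert victim == 12               # first frame found with a clear bit
assert referenced[:2] == [0, 0]   # pages 10 and 11 got second chances
```

A working-set pager or an app-supplied priority would pick a different victim from the same state, which is exactly the freedom a user-level pager is supposed to have.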
JoeKayzA wrote: Assuming that the user level pager is a process of its own, this still looks like a normal, pure microkernel system to me... maybe we just have a terminology problem?

There's no fundamental difference between microkernels and exokernels; it's a matter of degree. In a µkernel the user-space pager would decide about the whole paging policy itself, while in an exokernel the pager only decides about the minimum policy needed and leaves the rest to the app.
JoeKayzA wrote: And, yes, when you put fairness and priority issues into the system again, what piece of 'policy' will still remain for the apps?

If you expand the CoyotOS scheme a bit, it's roughly how my own scheduler will work.