Minimalist Microkernel Memory Management
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
Re:Minimalist Microkernel Memory Management
I was thinking it over today and realized: If sigma0 hands out all the physical memory to the top-level pager(s) at boot time, that means the kernel must pre-allocate all the physical memory it thinks it will ever need and keep it out of sigma0, right? And since the kernel itself never unmaps any pages (AFAIK), that means it's stuck with the amount of memory it's got... This seems kinda icky to me, but it also seems like a general problem for microkernels. I've read research papers on having user-level pagers manage kernel memory, but they give me nightmares.
Has anybody else found a good solution to this?
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
Re:Minimalist Microkernel Memory Management
Hello,
I don't like user-level pagers for kernels either because they somewhat reverse the hierarchical order of the OS. After all the kernel is the most privileged instance and can also be seen as root of all memory as it includes a "memory driver" (grant, map, unmap system).
It therefore, in my opinion, has the right to forcefully revoke access to pages from user-level pagers. This is straightforward, as the kernel only has to pick a page and then do an unmap() on it. The pager(s) or app holding that page will then be informed that they have lost a page and given the chance to save its contents somewhere.
If the kernel wants to be a bit friendlier it could also send a message to sigma0 stating that it needs a certain number of pages. Sigma0 can then try to find a solution by communicating with the user-level pagers or, if this doesn't work out, just pick a page and evict it using unmap(). This is possible because it still owns that page; the user-level pagers have only been given permission to use it.
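As a rough sketch of the revocation path described above (all names here are invented, this isn't real L4 code): the kernel picks a victim frame, unmaps it everywhere, and notifies the owning pager so it can save the contents.

```c
#include <stdint.h>
#include <stdbool.h>

struct frame {
    uint32_t paddr;      /* physical address of the frame        */
    int      owner_tid;  /* thread id of the pager that holds it */
    bool     mapped;
};

/* Stubs standing in for real kernel services. */
static void unmap(struct frame *f)              { f->mapped = false; }
static void send_fault_msg(int tid, uint32_t a) { (void)tid; (void)a; }

/* Reclaim one frame for the kernel's own use. */
struct frame *kernel_reclaim(struct frame *pool, int n)
{
    for (int i = 0; i < n; i++) {
        if (pool[i].mapped) {
            unmap(&pool[i]);                  /* revoke all mappings      */
            send_fault_msg(pool[i].owner_tid, /* let the pager save the   */
                           pool[i].paddr);    /* contents somewhere       */
            return &pool[i];
        }
    }
    return 0;  /* nothing left to reclaim */
}
```

The point of the sketch is only the ordering: revoke first, then notify, so the holder can never touch the frame after the kernel has taken it.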
regards,
gaf
Re:Minimalist Microkernel Memory Management
One interesting point some EROS people (IIRC) made on some mailing list at some point (sorry, I don't remember better) was that L4 might be almost as fast as it can be, but a system built on it might still be suboptimal.
Say, L4 doesn't have that good a security model for IPC. EROS has a capability system. That means EROS needs several more memory operations on its IPC fastpath when compared to L4. This will make the total latency of EROS worse. But if you think about it, the capability system is still very little overhead, and furthermore, it can pretty easily be used to emulate any other security policy. Some policies might be harder than others, but most can be mapped onto capabilities (they are general enough) just fine.
What does this mean for total system performance? If you try to implement a capability system on top of L4, you end up with lots of extra IPC calls. Not surprisingly, if the required number of total IPC calls is taken into account, L4 might not be that fast anymore.
I'm not saying that L4 people are wrong, or that the EROS people are right, but I agree that L4 people seem to be a bit too obsessed with getting their kernel fast, instead of making a kernel that allows a system to be fast. There's a difference.
Now, how is this related to memory management? The L4 system might be pretty simple to implement for the kernel, but does it make things easy for the rest of the system? From what I know, if you want to let a process share a page with another process, you have two options: either you let the first process map a page to the other, and force the second to figure out what to do if the first revokes it, OR you have a common pager for all the parties.
But wait, now we have a common pager that we are going to need if we are to share memory. So what reason do we have for not putting that logic into the kernel? Suddenly we can also do copy-on-write and stuff like that without any IPC overhead.
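For illustration, here is a minimal copy-on-write scheme of the kind an in-kernel VMM could implement without any IPC. All structures are invented for the sketch; a real kernel would work on hardware page tables and free the old reference count (leaked here for brevity).

```c
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

struct pte {
    uint8_t *frame;    /* backing frame                 */
    int      writable;
    int      cow;      /* shared, copy on write         */
    int     *refcnt;   /* number of sharers of frame    */
};

/* Share a page read-only between two address-space entries. */
void cow_share(struct pte *src, struct pte *dst)
{
    dst->frame  = src->frame;
    dst->refcnt = src->refcnt;
    src->writable = dst->writable = 0;   /* writes will now fault */
    src->cow = dst->cow = 1;
    (*src->refcnt)++;
}

/* On a write fault, give the faulting entry a private copy. */
void cow_fault(struct pte *p)
{
    if (p->cow && *p->refcnt > 1) {
        uint8_t *copy = malloc(PAGE_SIZE);
        memcpy(copy, p->frame, PAGE_SIZE);
        (*p->refcnt)--;                  /* leave the shared frame */
        p->frame  = copy;
        p->refcnt = malloc(sizeof(int)); /* sketch: old count not freed */
        *p->refcnt = 1;
    }
    p->writable = 1;
    p->cow = 0;
}
```

The "no IPC overhead" claim in the text is exactly this: both the share and the fault are handled inside the kernel, with no message to any pager.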
That need not mean we put the policy in the kernel. Spring is a nice example. There is one VMM per machine which lives in the kernel (AFAIK). There can be any number of pagers. VMM is responsible for dealing with memory (cache objects), and pagers are responsible for dealing with backing store (pager objects). There's then a coherency protocol to make sure the caches and pagers stay in sync.
Now you have a VMM which can do mappings and sharings and transfers and whatnot, but you can still have your pager on the other side of a network. Since the VMM is just a cache, you could share the same "memory" on several machines as well. And being a cache, the VMM can do many of the normal optimizations, like avoid fetching a recently paged-out page again if the page which contains the relevant data hasn't been used for anything else yet, and the cache hasn't been invalidated.
The moral: things easiest for the kernel aren't necessarily the things easiest for the system. Moving policy from kernel to userspace is a good thing (and the real meat of having a microkernel) but moving everything outside the kernel can be counterproductive.
As a side note: I've been thinking of putting some drivers in my microkernel in order to allow it to do things like managing a "default swapfile" where it can swap pages if it's out of space, without needing to communicate with userspace pagers. This way the kernel can safely overcommit physical memory up to the size of the default swapfile, while still avoiding the need for an OOM-killer.
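The overcommit bound here is just arithmetic: a commitment is safe as long as every committed page has either a physical frame or a swap slot to land in. A sketch of the admission check (names invented):

```c
#include <stdint.h>
#include <stdbool.h>

struct mem_account {
    uint64_t phys_pages;  /* physical frames managed by the kernel */
    uint64_t swap_pages;  /* capacity of the default swapfile      */
    uint64_t committed;   /* pages promised so far                 */
};

/* Grant a commitment of n pages iff it stays fully backed by
 * RAM plus swap, so no OOM-killer is ever needed. */
bool commit_pages(struct mem_account *a, uint64_t n)
{
    if (a->committed + n > a->phys_pages + a->swap_pages)
        return false;   /* would be unbacked overcommit */
    a->committed += n;
    return true;
}
```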
Re:Minimalist Microkernel Memory Management
mystran wrote: What this means for the total system performance? If you try to implement a capability system on top of L4, you end up with lots of extra IPC calls. Not surprisingly, if the required number of total IPC calls is taken into account, L4 might not be that fast anymore.
IIRC, there was a nice discussion around this in relation to the L4/HURD project. They came to the conclusion that the only way to implement a full, secure capability system on top of L4 is to involve a supervisor process that intercepts _any_ messaging traffic and forwards it to the appropriate receiver. This means that every single message would need at least two context switches. That's why I'm looking forward to Coyotos (which is the successor to EROS, btw.), since it is a pure microkernel, but still features a very powerful yet simple capability system.
mystran wrote: As a side note: I've been thinking of putting some drivers in my microkernel in order to allow it do things like managing a "default swapfile" where it can swap pages if it's out of space, without needing to communicate with userspace pagers. This way the kernel can safely overcommit physical memory up to the size of default swapfile, while still avoiding the need for an OOM-killer.
What's the problem with involving a userspace swap file manager, provided it is trusted and doesn't request resources just to satisfy a request (risking a deadlock)? Handling a swapfile would also mean putting disk (maybe filesystem) or network drivers into the kernel, otherwise the benefit would be gone anyway (you'd still have to go into userspace). And once you've got those drivers in kernel space, why shouldn't any storage-related thing go into the kernel too (consistency)? And then, where's the microkernel gone???
cheers Joe
Re:Minimalist Microkernel Memory Management
Hello,
Although I'm not familiar with the EROS design at all, I can't help feeling that it uses a drastically different approach to implementing security. Am I right that EROS builds its whole security on IPC (checking which app is allowed to send what to whom)?
I think that this would actually be a pretty bad idea as it doesn't differ at all from the traditional way it's done in unix or windows. In such a system security checks are spread across several levels of applications and the complexity easily leads to bugs and vulnerabilities.
Exokernels (L4) take a different approach by only making security checks at the lowest layer. Every high-level operation will eventually result in an access to the system hardware, which is where the exokernel checks if the application is allowed to use the hardware resource.
L4 doesn't have a uniform security mechanism for all hardware resources, which is certainly one of its downsides, but this doesn't mean that it isn't secure. All the primary resources (memory + I/O, IRQs, CPU-time) require a permission to be accessed. For CPU-time the access control system is pretty crude, as schedulers are simply privileged and therefore allowed to start any number of tasks, but for the other two resources a rather elaborate system is used that can be compared to hierarchical capabilities.
For L4, IPC simply means sending some bytes to another app and doesn't imply any security decisions. Since IPC is a plentiful resource and one can't do much harm with it other than spamming apps, it's not considered worth proper protection, although there are some bits that define whether a task is allowed to send IPC messages and to whom (only to threads in the same context or globally).
For some more detailed information about L4's security system, please refer to these documents:
- Why protection on resource level? (Exokernel)
- IRQ protection: Omega0 (L4)
- Memory and I/O Protection (L4)
mystran wrote: For what I know, if you want to let a process share a page with another process, you have two options: either you let the first process map a page to another, and force the second to figure out what to do if the first revokes it, OR you have a common pager for all the parties.
If two processes decide to share memory they want to cooperate and therefore have to trust each other to some extent. If a task revokes a page that it has given to another task earlier, the latter gets a message that notifies it about this event and can simply stop using the page (the cooperation is finished). In my opinion this is all it can expect, as there's no point in trying to force the two tasks to share memory. I also don't see how this should work in any other system..
mystran wrote: That need not mean we put the policy in the kernel. Spring is a nice example. There is one VMM per machine which lives in the kernel (AFAIK). There can be any number of pagers. VMM is responsible for dealing with memory (cache objects), and pagers are responsible for dealing with backing store (pager objects). There's then a coherency protocol to make sure the caches and pagers stay in sync.
I'm not familiar with Spring, but in my opinion there's a great danger that such a design eventually leads to a monolithic kernel. By including the VMM in the kernel, some of the policy has been taken from the pagers because they now have to use a (still very low-level, I admit..) abstraction rather than the raw hardware. The problem is that once you've started including policy in the kernel it's impossible to stop. Why shouldn't the VMM use an abstraction that is a bit more high-level? Since it's the same trade-off as before between a possibly (!) higher performance and a clean design that allows the pagers more liberties, the answer also has to be the same if you want to build a consistent system. This is why I believe that the only answer can be to go for the lowest abstraction possible, which is the raw hardware...
regards,
gaf
Re:Minimalist Microkernel Memory Management
Hi,
gaf wrote: Although not being familiar with the EROS design at all I can't help the feeling that it uses a drastically different approach to implement security. Am I right that EROS builds its whole security on IPC (checking which app is allowed to send what to whom)?
Hmm, not _only_ IPC. The capabilities, which form the base of EROS' security system, can either be (IIRC) an entrypoint (a channel to transfer IPC messages) or a memory page (that can then be mapped into the address space that holds it). In the original EROS design (which was a somewhat monolithic design anyway), there were also capabilities for devices, since they were managed in the kernel as well.
But I still believe that these two abstractions (entrypoints and pages) are a relatively good decision for a pure microkernel: A device (storage/network/graphics/whatever) is managed by a usermode device driver. Every process that accesses the device gets it's own set of capabilities to it, and the driver then multiplexes them in a way that is suitable for the device, and for the policy in the system (priorities, quotas and the like).
gaf wrote: I think that this would actually be a pretty bad idea as it doesn't differ at all from the traditional way as it's done in unix or windows. In such a system security checks are spread across several levels of applications and the complexity easily leads to bugs and vulnerabilities. Exokernels (L4) take a different approach by only making security checks at the lowest layer. Every high level operation will eventually result in an access to the system hardware which is where the exokernel checks if the application is allowed to use the hardware resource.
The point is, I think, that not every resource can be multiplexed and secured in the same way. A graphics display can be split, or switched on user input (linux vc's), a storage device can be divided into slices, a network adapter's bandwidth can be shared... You got me? So the only sane place where these checks should go is, IMHO, the device drivers, which know the most about the hardware. I have to admit, btw, that I still haven't got how the 'real' exokernels today handle this issue...
So when you put all that information about the hardware into the kernel, isn't that a monolithic kernel then???
Just my 2 cents, no offense
cheers Joe
Re:Minimalist Microkernel Memory Management
JoeKayzA wrote: But I still believe that these two abstractions (entrypoints and pages) are a relatively good decision for a pure microkernel: A device (storage/network/graphics/whatever) is managed by a usermode device driver. Every process that accesses the device gets its own set of capabilities to it, and the driver then multiplexes them in a way that is suitable for the device, and for the policy in the system (priorities, quotas and the like).
If I got it right this sounds pretty much like the 'normal' µkernel design. The problem with it in general is that the almighty kernel of a monolithic operating system has just been replaced by an almighty user-space manager. This might make the design a bit more flexible because this user-space manager can be exchanged, but for the application it really doesn't make a big difference as it still has to use the abstractions a higher instance imposes on it.
JoeKayzA wrote: The point is, I think, that not every resource can be multiplexed and secured in the same way. A graphics display can be split, or switched on user input (linux vc's), a storage device can be divided into slices, a network adapter's bandwidth can be shared... You got me? So the only sane place where these checks should go is, IMHO, the device drivers, which know the most about the hardware.
Yep, that's roughly how it's done: Each device has a driver whose job is to multiplex, that means allowing multiple apps to use the device in a secure way. To do so the device has to be virtualized as you described above, and although that's not trivial for some devices (e.g. character devices), there's always a single solution that makes most sense. The driver then sets up a capability tree and gives its root manager full access to the device. This manager can then either split its capability and share it among other managers or directly offer it to the applications through a simple interface.
Applications can then acquire a capability for a certain part of the device (eg: page-frames 0x100-0x120; pixel-window L100, T100, R200, B200) from one of the managers and use it to directly access the hardware through the device driver, where a single access-check takes place.
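A minimal sketch of such a range capability and the driver's single access check might look like this (all names invented, not taken from any real L4 or EROS API):

```c
#include <stdint.h>
#include <stdbool.h>

struct range_cap {
    uint32_t first;   /* first unit the holder may touch (sector, frame...) */
    uint32_t last;    /* last unit, inclusive                               */
    bool     write;   /* may the holder write?                              */
};

/* A manager splits off a sub-capability; this fails if the child
 * would exceed the parent's range or rights. */
bool cap_derive(const struct range_cap *parent, struct range_cap *child,
                uint32_t first, uint32_t last, bool write)
{
    if (first < parent->first || last > parent->last ||
        first > last || (write && !parent->write))
        return false;
    child->first = first;
    child->last  = last;
    child->write = write;
    return true;
}

/* The driver's single access check per request. */
bool cap_allows(const struct range_cap *c, uint32_t unit, bool write)
{
    return unit >= c->first && unit <= c->last && (!write || c->write);
}
```

The hierarchical property gaf mentions falls out of `cap_derive`: a child can never name a unit or a right its parent didn't already hold.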
JoeKayzA wrote: So when you put all that information about the hardware into the kernel, isn't that a monolithic kernel then???
Whether a kernel is monolithic or not doesn't depend on the privilege level the drivers are using, but on the internal design. It's therefore perfectly possible to write a µkernel system that runs totally in kernel-space, although some scripting language might then be necessary to ensure that the modules keep to the rules.
What really matters is how much control the applications have over the system policy. While they can only slightly influence it in a monolithic kernel (parameters in a system-call), µkernels allow user-space managers to decide about it freely. Unfortunately this doesn't automatically mean that the apps can choose their favourite policy, as the user-space managers might be monolithic themselves. Exokernels therefore try to keep the user-space managers to the minimum that is needed to ensure some system-wide fairness (eg: tasks must be kept from allocating all resources for themselves). The real policy decisions now take place in the apps..
regards,
gaf
- Colonel Kernel
Re:Minimalist Microkernel Memory Management
gaf wrote: What really matters is how much control the applications have over the system policy.
Why does this really matter so much? Couldn't it be argued that an application that requires that much specialization (e.g. an embedded control system) should have a specialized OS (e.g. VxWorks, QNX) to run on anyway?
gaf wrote: Exokernels therefore try to keep the user-space managers to a minimum that is needed to ensure some system-wide fairness (eg: tasks must be kept from allocating all resources for themselves). The real policy decisions now take place in the apps..
But how does the exokernel decide what division of resources is fair? Isn't that exactly what policy is? I see a contradiction here...
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
Re:Minimalist Microkernel Memory Management
Colonel Kernel wrote: Why does this really matter so much? Couldn't it be argued that an application that requires that much specialization (e.g. an embedded control system) should have a specialized OS (e.g. VxWorks, QNX) to run on anyway?
It's not like only highly specialized apps could benefit from being able to choose their policy. The kernel policy means overhead for everybody and I don't think it's very practical to write a new OS for every single application, especially as this wouldn't really solve the problem either. The idea is therefore to allow applications to decide about their policy according to their own needs as long as no other apps are degraded (system-wide policy in the managers). The exokernel OS (kernel + user-space servers) can therefore be seen as a minimum platform that ensures fairness while the apps (read: the libs they're linked to) form a specialized libOS.
Colonel Kernel wrote: But how does the exokernel decide what division of resources is fair?
The kernel doesn't decide about fairness, that's the servers' job; it only ensures that two applications can share a device without conflicts. In order to find an appropriate way of dividing the device, it's useful to categorize it first:
- Block Device (mem, disk, 2D video)
- Character Device - Input (kdb, mouse, ethernet-in)
- Character Device - Output (cpu, ethernet-out, printer)
I assume that you're familiar with this scheme, and although it's not perfect, as the borders sometimes blur, it's still very useful.
- Block devices are the easiest to multiplex because they can simply be divided into pages/sectors that are protected by capabilities. In order to reduce book-keeping it's useful to allow several of these pages to be combined into extents, which are then called "flex-pages" for memory and "files" for hard-disks.
- For character input devices the main question is which application the input is intended for. This problem can in general be solved using filtering techniques that allow applications to define which input they want to get. Such techniques are nothing new to exokernels and have been used successfully for ethernet adapters in FreeBSD for many years. If the protocol doesn't allow this (mouse, partly kbd) a central server has to be used that then decides which app is meant.
- When multiplexing a character output device the mechanism has to allow servers to define how much bandwidth an app may consume and how long a burst may be, and thus decide about throughput and latency.
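One way the character-output case could be realized, as a rough sketch with invented names, is a token bucket: the server configures rate (bandwidth) and depth (maximum burst), and the driver only counts.

```c
#include <stdint.h>
#include <stdbool.h>

struct tbucket {
    uint64_t tokens;  /* bytes the app may still send right now */
    uint64_t burst;   /* bucket depth = maximum burst size      */
    uint64_t rate;    /* bytes refilled per tick                */
};

/* Called once per timer tick: refill, capped at the burst size. */
void tb_tick(struct tbucket *b)
{
    b->tokens += b->rate;
    if (b->tokens > b->burst)
        b->tokens = b->burst;
}

/* Driver-side check before pushing n bytes to the device. */
bool tb_send(struct tbucket *b, uint64_t n)
{
    if (n > b->tokens)
        return false;   /* over budget: refuse or queue */
    b->tokens -= n;
    return true;
}
```

Note how this keeps the split the thread has been discussing: the numbers (rate, burst) are policy and come from a user-space server; the counting is a dumb mechanism and can live with the driver.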
Colonel Kernel wrote: Isn't that exactly what policy is? I see a contradiction here...
The kernel only exports a mechanism that doesn't include any (significant) policy. Unlike a policy, the output of a mechanism only depends on the immediate input and does not include any internal decisions.
Let's use (physical) memory management as an example:
The exokernel would in this case export an interface that allows applications (here the pagers) to allocate flexpages, which are chunks of memory defined by offset and length. Apart from that, a second mechanism is needed to allow the pagers to have a look at the accessed and dirty bits of their pages, as most (not all!) pagers base their decisions on this data.
[pre]flex AllocateFlexPage(uint base, uint size, uint read_write);
bool GetAccessData(flex my_page, uint* copy_here);[/pre]
Using these two mechanisms (which are protected in a real exokernel, of course) a pager has full access to the hardware and can freely decide about the policy to be used. Policy decisions include which task gets how much memory and which page should be paged out in a low-memory situation.
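As a self-contained illustration of how a pager might use that interface: the two prototypes are the ones quoted above, but their bodies here are dummy stand-ins (a real exokernel would do rights checks and read real page-table bits).

```c
#include <stdbool.h>

typedef unsigned int uint;
typedef struct { uint base, size; } flex;

/* Dummy stand-ins for the kernel interface sketched in the post. */
flex AllocateFlexPage(uint base, uint size, uint read_write)
{
    (void)read_write;            /* real kernel: check permissions */
    flex f = { base, size };
    return f;
}

bool GetAccessData(flex my_page, uint *copy_here)
{
    (void)my_page;
    *copy_here = 0x1;            /* pretend: accessed (bit 0), clean (bit 1) */
    return true;
}

/* Pager-side policy built on the mechanism: a page that isn't dirty
 * is a cheap eviction candidate, since it needs no writeback. */
bool is_cheap_victim(flex page)
{
    uint bits = 0;
    return GetAccessData(page, &bits) && (bits & 0x2) == 0;
}
```

The bit layout and the `is_cheap_victim` heuristic are inventions for the example; the point is only that the eviction *policy* lives entirely in the pager, on top of two tiny kernel mechanisms.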
regards,
gaf
- Colonel Kernel
Re:Minimalist Microkernel Memory Management
gaf wrote: The kernel policy means overhead for everybody and I don't think that it's very practical to write a new OS for every single application, especially as this wouldn't really solve the problem either.
Not every single application needs its own OS, that's the point. This would just be going too far the other way. What I mean is, is it necessary to have this mix of different policies on a single machine, swappable at run-time? In the real world, aren't most application mixes pretty predictable based on the role of the machine (controller, desktop, gaming rig, DB server, router, etc.)?
If I'm a Large Software Company selling an OS, and I want that OS to be useful for these different mixes of apps, is it not sufficient (and more cost-effective) to just have a modular architecture that can have policies changed around at compile time (or build-time if I'm just selecting different user-space managers to include with a shipping product)?
I think in reality the benefits of exokernels are more in the areas of security and performance rather than flexibility and generality... I know flexibility makes us feel warm and fuzzy sometimes, but it's often not as necessary as we convince ourselves it is. Also, it's usually a lot more expensive ($$).
gaf wrote: The idea is therefore to allow applications to decide about their policy according to their own needs as long as no other apps are degraded (system-wide policy in the managers). The exokernel OS (kernel + user-space servers) can therefore be seen as a minimum platform that ensures fairness while the apps (read: the libs they're linked to) form a specialized libOS.
Ok, got it. The user-space servers provide system-wide fairness.
gaf wrote: The kernel doesn't decide about fairness, that's the servers' job.
But the kernel has to assign an initial set of resources to those servers in the first place. If this initial division is static, it makes things awkward... Does this mean that the user-space servers have to negotiate amongst themselves to re-allocate resources of the same type that they share? Or does there have to be a root server for each type of resource (c.f. sigma0)?
...continued...
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
- Colonel Kernel
Re:Minimalist Microkernel Memory Management
...continued...
On a related topic -- kernel memory management. I've been thinking about this problem for a while. Your original suggestion was to let the kernel unmap pages, and give the pagers a chance to save any contents of dirty pages (emphasis mine):
gaf wrote: The pager(s) or app holding that page will then be informed that they have lost a page and are given the chance to save the contents somewhere.
The second (and smaller) problem first: What happens if the page the kernel chooses to unmap is dirty and belongs to a server that lives right under sigma0? sigma0 will be unable to save any contents anywhere, so that implies that the server itself must save its own page. Does this mean that each "paging server" actually has its own pager thread for itself, whose pager thread is in turn in a higher-level "paging server"? The fact that pagers are threads sometimes throws me for a loop...
Anyway, I think I have a way for the kernel to steal pages without the need to notify anyone -- it just steals pages that aren't dirty. Since the kernel can see the page tables, it should know which pages are dirty and which aren't. I can think of a really pathological case where there are no clean pages left in the system, but this seems pretty unlikely to me. At the very least, servers and apps should be smart enough to keep the pages containing their code read-only...
Now, back to the first problem -- how does the kernel decide which pages to steal? This is a policy decision, which I guess is the reason that papers have been written about user-level pagers for the kernel. If it steals them randomly, that might be ok, as long as it doesn't happen often. This makes me uneasy though... I can't help but feel there would be unintended consequences.
I think I have a better solution though -- have the kernel "tax" process memory for thread creation. Let's say that it costs the kernel about 2.1 pages to create a new thread (TCB + kernel stack... not sure if 2.1 pages is a reasonable number, but you get the idea ). When a thread is created, the kernel will create it out of whatever memory it owns. If this memory is getting low, it "taxes" the address space that made the create-thread system call by stealing four pages. That covers the cost of the thread, and leaves some leftovers for other things (new mappings, page tables, etc.). That way at least the page stealing (ok... "taxing" ) is fair because the address space/task requesting new kernel resources is the one paying for them (at least during tough times). If we want to make things a bit nicer, we can let the task decide what pages it wants to sacrifice by specifying them in the create-thread system call.
Just like real taxation , this is converting resources into services. Also, the kernel (like govt.) is taking more than it needs "just in case". As long as there are no kickbacks going on...
What do you think? Is this workable?
On a related topic -- kernel memory management. I've been thinking about this problem for a while. Your original suggestion was to let the kernel unmap pages, and give the pagers a chance to save any contents of dirty pages (emphasis mine):

gaf wrote: It therefore in my opinion has the right to forcefully revoke access to pages from user-level pagers. This is straightforward, as the kernel only has to pick the page and then do an unmap() on it. The pager(s) or app holding that page will then be informed that they have lost a page and are given the chance to save the contents somewhere.

I agree that the kernel shouldn't be beholden to user processes to give it memory voluntarily, but I see two problems with this approach.
The second (and smaller) problem first: What happens if the page the kernel chooses to unmap is dirty and belongs to a server that lives right under sigma0? sigma0 will be unable to save any contents anywhere, so that implies that the server itself must save its own page. Does this mean that each "paging server" actually has its own pager thread for itself, whose pager thread is in turn in a higher-level "paging server"? The fact that pagers are threads sometimes throws me for a loop...
Anyway, I think I have a way for the kernel to steal pages without the need to notify anyone -- it just steals pages that aren't dirty. Since the kernel can see the page tables, it should know which pages are dirty and which aren't. I can think of a really pathological case where there are no clean pages left in the system, but this seems pretty unlikely to me. At the very least, servers and apps should be smart enough to keep the pages containing their code read-only...
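The clean-page-stealing idea amounts to a scan over page-table entries: on x86, bit 0 of a PTE is "present" and bit 6 is "dirty", so the kernel can pick any present-but-clean page without consulting a pager. A minimal sketch in C, with the macro and function names invented for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* x86 PTE flag bits (per the Intel manuals); the rest of the names here
 * are hypothetical, not from any real kernel. */
#define PTE_PRESENT (1u << 0)
#define PTE_DIRTY   (1u << 6)

/* Return the index of the first clean, present page in a page table, or -1
 * if every present page is dirty -- the pathological case mentioned above,
 * where the kernel would have to fall back to notifying a pager. */
int find_clean_page(const uint32_t *ptes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((ptes[i] & PTE_PRESENT) && !(ptes[i] & PTE_DIRTY))
            return (int)i;
    }
    return -1;
}
```

A real kernel would also have to flush the TLB entry for the stolen page and mark it not-present, but the victim-selection logic is just this scan.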
Now, back to the first problem -- how does the kernel decide which pages to steal? This is a policy decision, which I guess is the reason that papers have been written about user-level pagers for the kernel. If it steals them randomly, that might be ok, as long as it doesn't happen often. This makes me uneasy though... I can't help but feel there would be unintended consequences.
I think I have a better solution though -- have the kernel "tax" process memory for thread creation. Let's say that it costs the kernel about 2.1 pages to create a new thread (TCB + kernel stack... not sure if 2.1 pages is a reasonable number, but you get the idea). When a thread is created, the kernel will create it out of whatever memory it owns. If this memory is getting low, it "taxes" the address space that made the create-thread system call by stealing four pages. That covers the cost of the thread, and leaves some leftovers for other things (new mappings, page tables, etc.). That way at least the page stealing (ok... "taxing") is fair, because the address space/task requesting new kernel resources is the one paying for them (at least during tough times). If we want to make things a bit nicer, we can let the task decide what pages it wants to sacrifice by specifying them in the create-thread system call.
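The tax policy described above can be sketched in a few lines. This is only an illustration of the bookkeeping, assuming invented names and numbers (`TAX_PAGES`, `LOW_WATERMARK`, the rounded-up cost of 3 pages per thread):

```c
#include <assert.h>
#include <stdbool.h>

#define TAX_PAGES     4   /* pages taxed per create-thread call when memory is low */
#define LOW_WATERMARK 16  /* below this, the kernel starts taxing callers */

struct kernel_pool { int free_pages; };  /* pages the kernel owns outright */
struct task        { int owned_pages; }; /* pages mapped to the calling task */

/* Create a thread, taxing the calling task first if the kernel pool is low.
 * Returns false if the caller cannot pay the tax. */
bool create_thread(struct kernel_pool *pool, struct task *caller)
{
    if (pool->free_pages < LOW_WATERMARK) {
        if (caller->owned_pages < TAX_PAGES)
            return false;              /* caller can't pay: fail the call */
        caller->owned_pages -= TAX_PAGES;
        pool->free_pages   += TAX_PAGES;
    }
    pool->free_pages -= 3;             /* ~2.1 pages for TCB + kernel stack, rounded up */
    return true;
}
```

In good times (pool above the watermark) the call is free; in tough times the caller pays four pages and the kernel keeps the ~1-page surplus for mappings and page tables, as described above.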
Just like real taxation, this is converting resources into services. Also, the kernel (like govt.) is taking more than it needs "just in case". As long as there are no kickbacks going on...
What do you think? Is this workable?
Re:Minimalist Microkernel Memory Management
gaf wrote: Using these two mechanisms (which are protected in a real exokernel, of course) a pager has full access to the hardware and can decide about the policy to be used freely. Policy decisions include which task gets how much memory and which page should be paged out if there's a low-memory situation.

Assuming that the 'user level pager' is located inside the app (or the libOS) now, I still can't see the benefit of letting it decide which page should be paged out in a low-memory situation: when a page is used less, it will get paged out first; when it is used often, it won't. But the system must still enforce that a page gets swapped out, otherwise the other apps would be degraded, because they don't get their memory. Or did I get that completely wrong?
Assuming that the user level pager is a process of its own, this still looks like a normal, pure microkernel system to me... maybe we just have a terminology problem?
And, yes, when you put fairness and priority issues into the system again, what piece of 'policy' will still remain for the apps? When you take a look at how CoyotOS dispatches CPU time, every process gets its own 'schedules' (a capability to some execution time), which it can use to let its code be executed. It's then up to the process to decide what to do with this resource. When it creates other (child) processes, it has to share its schedules with them in order to give them a chance to get executed, but no matter how many child processes it creates, they cannot consume more execution time than the original process had at its disposal with its schedules. Is this somehow what 'let the app decide about policy' means?
cheers Joe
Re:Minimalist Microkernel Memory Management
Colonel Kernel wrote: I think I have a better solution though -- have the kernel "tax" process memory for thread creation. Let's say that it costs the kernel about 2.1 pages to create a new thread (TCB + kernel stack... not sure if 2.1 pages is a reasonable number, but you get the idea). When a thread is created, the kernel will create it out of whatever memory it owns. If this memory is getting low, it "taxes" the address space that made the create-thread system call by stealing four pages. That covers the cost of the thread, and leaves some leftovers for other things (new mappings, page tables, etc.). That way at least the page stealing (ok... "taxing") is fair, because the address space/task requesting new kernel resources is the one paying for them (at least during tough times). If we want to make things a bit nicer, we can let the task decide what pages it wants to sacrifice by specifying them in the create-thread system call.

I start to like that idea. It also seems to solve another problem: a malicious process cannot impair the system by, say, creating tons of threads and processes (and letting the system run out of memory for the data structures these objects need), since it has to 'pay' for the memory itself. The only thing that can happen is that it runs out of memory, but this won't hurt the rest of the system. You could do the same with CPU time (have a look at my previous post).
cheers Joe
Re:Minimalist Microkernel Memory Management
Colonel Kernel wrote: What I mean is, is it necessary to have this mix of different policies on a single machine, swappable at run-time? In the real world, aren't most application mixes pretty predictable based on the role of the machine (controller, desktop, gaming rig, DB server, router, etc.)?

In fact the x86 is meant as a general-purpose architecture and you can run a huge variety of apps on it. You've enumerated some of them yourself already, and my idea would be to provide programmers with a number of libOSes that are specialized for such systems.
What's the advantage of making it static? This would mean that you couldn't even run a game on your "desktop" machine because it has a different user-level manager - hardly practical in my opinion..
Colonel Kernel wrote: I think in reality the benefits of exokernels are more in the areas of security and performance rather than flexibility and generality...

Why, if not due to its flexible design, should an exokernel be any faster than a traditional OS? Security largely depends on the user-space manager design, and if they are monolithic it's not any better either.
Colonel Kernel wrote: Does this mean that the user-space servers have to negotiate amongst themselves to re-allocate resources of the same type that they share? Or does there have to be a root server for each type of resource (c.f. sigma0)?

Yep, the kernel sets up a capability that spans the whole device and sends it to a root manager. The root manager can then split this capability and pass it on to other, lower-level managers.
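The split-and-delegate step might look roughly like this for a capability covering a range of a device (say, disk blocks). A hedged sketch only; the struct and function names are invented, and a real system would also track the derivation tree for revocation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A capability naming a contiguous range of some resource. */
struct cap { uint64_t start, len; };

/* Split `parent` at `offset` units: the parent keeps [start, start+offset),
 * the child receives [start+offset, start+len). Since the child is carved
 * out of the parent, a manager can never hand down more than it holds. */
bool cap_split(struct cap *parent, struct cap *child, uint64_t offset)
{
    if (offset == 0 || offset >= parent->len)
        return false;                 /* bad split point */
    child->start = parent->start + offset;
    child->len   = parent->len - offset;
    parent->len  = offset;
    return true;
}
```

The kernel would hand the root manager a single cap spanning the whole device, and each manager repeats the split for the managers or apps below it.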
Colonel Kernel wrote: Does this mean that each "paging server" actually has its own pager thread for itself, whose pager thread is in turn in a higher-level "paging server"? The fact that pagers are threads sometimes throws me for a loop...

The pager itself will hardly be more than a few megs in size, so paging it out doesn't make any sense. You would just take one of the pages it has passed on to its apps, which can then also be dirty..
Colonel Kernel wrote: I think I have a better solution though -- have the kernel "tax" process memory for thread creation.

Wouldn't it be easier if the task had to pay the tax right away? In my opinion the idea requires too much book-keeping (every time a resource is allocated/deallocated) just to make sure that the kernel won't run out of memory, but it should nevertheless work. If you just want to prevent DoS attacks, it'd be more practical to require caps for the creation of tasks etc..
JoeKayzA wrote: Assuming that the 'user level pager' is located inside the app (or the libOS) now, I still can't see the benefit of letting it decide which page should be paged out in a low-memory situation

Well.. finding the right page is what it's all about, and there are many ways of trying to do this (working set, LRU approximation, global policy <-> local policy). Apart from that, the pager might also decide which app has to evict a page and can thus, for example, spare an important app. Maybe MM just wasn't the best example in the first place, as it's quite theoretical..
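One concrete policy a pager could implement along these lines is the clock (second-chance) approximation of LRU: walk the frames in a circle, clear each hardware-set "accessed" bit as you pass, and evict the first frame whose bit is already clear. A minimal sketch, with all names invented:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pick a victim frame using the clock algorithm. `accessed[i]` mirrors the
 * hardware accessed bit for frame i; `hand` is the clock hand position,
 * persisted across calls. Terminates because each pass clears bits. */
size_t clock_pick_victim(bool *accessed, size_t nframes, size_t *hand)
{
    for (;;) {
        if (!accessed[*hand]) {
            size_t victim = *hand;              /* not touched recently: evict */
            *hand = (*hand + 1) % nframes;
            return victim;
        }
        accessed[*hand] = false;                /* give it a second chance */
        *hand = (*hand + 1) % nframes;
    }
}
```

Whether this runs in the kernel, in a user-level pager, or in the app's own libOS is exactly the policy/mechanism question being debated above.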
JoeKayzA wrote: Assuming that the user level pager is a process of its own, this still looks like a normal, pure microkernel system to me... maybe we just have a terminology problem?

There's no fundamental difference between microkernels and exokernels, it's a matter of degree. In a µkernel the user-space pager would decide about the whole paging policy itself, while in an exokernel the pager only decides about the minimum policy needed and leaves the rest to the app.
JoeKayzA wrote: And, yes, when you put fairness and priority issues into the system again, what piece of 'policy' will still remain for the apps?

If you expand the CoyotOS scheme a bit, it's roughly how my own scheduler will work:
- The (root) task manager gets a capability for the whole CPU
- It can then split it up (50%, 50%) and pass the new caps to two lower level schedulers
- Each lower level scheduler can use its cap to create apps which together may only use 50% of the CPU
- Apps can use the cap they got from the scheduler to start threads; the threads may not use more CPU time than the app holds
The mechanism here is the capability system, the policy is to decide who gets how much..
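The four steps above can be sketched as a derive operation on a CPU-time capability: a holder may carve sub-caps out of its own share, so children can never add up to more than the parent was given. All names and units here are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* A capability for a share of the CPU, in 1/1000ths (permille). */
struct cpu_cap { int permille; };

/* Derive a child cap from `parent`, deducting the share from the parent's
 * remaining budget. Fails if the parent hasn't enough left, which enforces
 * the invariant that a subtree can never exceed its root's share. */
bool cpu_cap_derive(struct cpu_cap *parent, struct cpu_cap *child, int permille)
{
    if (permille <= 0 || permille > parent->permille)
        return false;
    parent->permille -= permille;
    child->permille   = permille;
    return true;
}
```

The root task manager would start with the whole CPU (1000 permille), split it between the two lower-level schedulers, and each of those repeats the derive for its apps and threads -- mechanism in the kernel, policy (who gets how much) in the holders.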
regards,
gaf
Re:Minimalist Microkernel Memory Management
gaf wrote: There's no fundamental difference between microkernels and exokernels, it's a matter of degree. In a µkernel the user-space pager would decide about the whole paging policy itself, while in an exokernel the pager only decides about the minimum policy needed and leaves the rest to the app.
I get the feeling that we were talking about quite the same thing, then. But I would have never thought of L4 or CoyotOS as suitable to form an exokernel system (if I got that right now, that should be possible?). I always thought of exokernels as being something like a hypervisor (in the terms of Pacifica/Vanderpool technology). Anyway...
So if the point is that an app has to share its resources, and gets to decide where they go, that's pretty much my vision of a future [micro/exo]kernel operating system.
cheers Joe