Colonel Kernel wrote:
As I see it, the problems that self-paging attempts to solve are twofold: [...]
Adding a swapping mechanism to a microkernel isn't really hard. What is hard is coming up with a policy that doesn't involve a pager in the kernel and doesn't require unbounded kernel memory usage. If kernel memory usage is unbounded, then you might need to free pages for kernel use, which is the real problem if you're trying to avoid having a default pager in the kernel.
Having a default pager in the kernel is a problem if you want your swap driver in userspace, 'cos while mandating that all processes must trust the default swap driver is fine, your kernel now also needs to trust a userspace process, which is not nice.
While allowing apps to provide hints to the VMM is one option, that doesn't solve certain problems. Suppose I have a database which caches information in memory. Suppose further that the in-memory cache uses a structure optimized for memory access, while the on-disk version is optimized for disk access, and conversion between the two is relatively cheap.
In such a case, it makes no sense at all to use normal paging: if the in-memory version is written to the page file, it'll have to be loaded back into memory later, just to be stored to disk again in a different format. It might be cheaper to simply write it out in the on-disk format directly, and reconstruct the in-memory version later if necessary.
Or suppose I have a game which stores models as spline surfaces, but breaks them into regular polygon meshes with level-of-detail before use, and caches some of the processed meshes in memory to avoid disk access. There's little point in storing the processed meshes to disk at all, and they are probably larger than the original on-disk version too. So if more memory is required, it's better to just throw them away and reconstruct them later. Since they are just a cache, we might not even need them again at all.
I can try to think of better examples too, if you want, but I think you get the point: applications know what is worth saving. Applications also know HOW the data should be saved. By forcing the use of a generalized pager, you might end up doing extra work.
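To make that concrete, here's a rough sketch of the database case from the application's point of view. Everything here is hypothetical; vmm_set_pressure_handler() is an invented name standing in for whatever notification mechanism the VMM would actually expose.

[code]
/* Hypothetical sketch: instead of letting a pager write the in-memory
 * cache to the page file, the app reacts to memory pressure by writing
 * the data out in its own on-disk format and freeing the memory copy.
 * vmm_set_pressure_handler() is an invented name, not a real interface. */
#include <stdlib.h>

struct db_cache {
    void  *mem_form;     /* structure optimized for memory access */
    size_t bytes;
};

static struct db_cache cache;

static void write_disk_form(const struct db_cache *c)
{
    /* Convert mem_form to the disk-optimized layout and write it out;
     * details omitted, the conversion is assumed to be cheap. */
    (void)c;
}

/* Called (hypothetically) by the VMM when physical memory gets scarce.
 * Returns how many bytes were released. */
static size_t on_memory_pressure(size_t bytes_wanted)
{
    (void)bytes_wanted;
    size_t released = cache.bytes;
    write_disk_form(&cache);     /* save it the way *we* want it saved */
    free(cache.mem_form);        /* ...and give the pages back */
    cache.mem_form = NULL;
    cache.bytes = 0;
    return released;
}

/* Hypothetical registration call provided by the VMM library. */
extern void vmm_set_pressure_handler(size_t (*handler)(size_t));

int main(void)
{
    vmm_set_pressure_handler(on_memory_pressure);
    /* ... normal database work ... */
    return 0;
}
[/code]

The game mesh case would be the same pattern, except on_memory_pressure() would just free the meshes without writing anything, since they can be rebuilt from the spline surfaces on disk.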
In my OS, the difficulty I have with VMM in general is that I want to put most of the VMM policy outside the ukernel to keep the kernel simple, but I also want to support fast IPC via page mapping, which doesn't work since the VMM server is supposed to be aware of which process owns which pages at all times (but the kernel itself implements IPC).
Personally, I'm not really interested in trying to keep the µkernel small to make it simpler; I'm more interested in trying to avoid forcing a certain policy. I want to be able to support several policies, running at the same time on the same system, without the kernel having to care.
I suspect that putting the VMM in the kernel is a better idea, but I want to keep my kernel as small as possible. Without swapping, that VMM policy would be much simpler and could easily go in the kernel, but I also suspect that the need for swapping is not going away any time soon.
Well, I'm going to keep my VMM in the kernel, but after thinking about current directions and possibilities, I think I'm going to drop the idea of swapping too; just allocate physical memory (possibly with some quotas) as long as there's some left. Once there's no more, just tell the applications so.
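In rough code, the allocation path I have in mind looks something like the sketch below; the names and the per-process quota field are made up for illustration.

[code]
/* Sketch of "no swapping": hand out physical frames against an optional
 * per-process quota, and when there are none left, report that instead
 * of paging something out. All names are made up for illustration. */
#include <stdint.h>
#include <stddef.h>

#define NO_FRAME ((uintptr_t)0)

struct process {
    size_t frames_used;
    size_t frame_quota;              /* optional per-process limit, 0 = none */
};

static uintptr_t free_frames[1024];  /* stand-in for the real free list */
static size_t    free_count;

uintptr_t alloc_frame(struct process *p)
{
    if (p->frame_quota && p->frames_used >= p->frame_quota)
        return NO_FRAME;             /* over quota: the app gets told, no swap */

    if (free_count == 0)
        return NO_FRAME;             /* physical memory exhausted: same answer */

    p->frames_used++;
    return free_frames[--free_count];
}
[/code]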
I am going to support some form of self-paging though, because it really doesn't need much support from the kernel: you just need to let the process handle its own page faults and manage its own mappings. So the only real questions are whether I'm going to somehow support process swap-out (reducing the resident pages of inactive processes) and shared memory (which introduces a nasty amount of new problems).
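Something along these lines is all I mean by self-paging; fault_upcall_register() and map_frame() are invented names for whatever the actual kernel primitives would be.

[code]
/* Self-paging sketch, seen from the process side. The kernel only has to
 * deliver the fault and install whatever mapping the process asks for;
 * the policy lives entirely in the process. fault_upcall_register() and
 * map_frame() are invented names for the kernel interface. */
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u
#define NFRAMES   64

/* Physical frames this process was granted earlier. */
static uintptr_t my_frames[NFRAMES];
static size_t    next_frame;

/* Hypothetical kernel interface. */
extern void fault_upcall_register(void (*handler)(uintptr_t));
extern int  map_frame(uintptr_t vaddr, uintptr_t frame);

/* Upcall the kernel makes instead of running a default pager. */
static void on_page_fault(uintptr_t fault_addr)
{
    uintptr_t page = fault_addr & ~(uintptr_t)(PAGE_SIZE - 1);

    if (next_frame == NFRAMES) {
        /* Out of frames: this is where *my* policy runs -- drop one of
         * my own caches, unmap something, or fail the operation. */
        return;
    }

    map_frame(page, my_frames[next_frame++]);
}

int main(void)
{
    fault_upcall_register(on_page_fault);
    /* ... */
    return 0;
}
[/code]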
A lot of systems seem to be moving away from shared memory lately, concentrating instead on efficient message passing. Things like copy-on-write are also easier to make fast if shared memory doesn't need to be taken care of.
I'm thinking about dropping (logically) shared memory as well, since it seems to be mostly trouble. I especially liked the point the Singularity folks made:
A big problem with shared memory is that if one process crashes, the other process can't really tell what state the memory is in, so in many cases the best it can do is crash as well.
And thinking about it, in most cases that don't require crashing both processes at the same time, you'd essentially be using shared memory as a fast substitute for message passing, in which case efficient copy-on-write optimized large messages would do just fine. In fact, I can think of several nice tricks that are easier to do with them than with normal shared memory.
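For what it's worth, the usage pattern I have in mind looks roughly like this; ipc_send_cow() and ipc_recv_cow() are placeholder names, not an existing API.

[code]
/* Sketch of large messages via copy-on-write instead of shared memory.
 * ipc_send_cow()/ipc_recv_cow() are placeholder names. The point: after
 * the send, both sides see the same frames read-only; the first write on
 * either side gets a private copy, so one process crashing mid-update
 * can never corrupt the other's view of the message. */
#include <stdint.h>
#include <stddef.h>

/* Hypothetical kernel interface. */
extern int   ipc_send_cow(int port, void *buf, size_t len);  /* remap buf COW into receiver */
extern void *ipc_recv_cow(int port, size_t *len);            /* COW-mapped incoming message */

void sender(int port, uint8_t *big_buffer, size_t len)
{
    /* No copy here: the kernel flips page tables and marks the pages
     * copy-on-write in both address spaces. */
    ipc_send_cow(port, big_buffer, len);

    /* Any write after this point faults in a private copy, so the
     * receiver's view never changes under its feet. */
    big_buffer[0] ^= 0xFF;
}

void receiver(int port)
{
    size_t   len;
    uint8_t *msg = (uint8_t *)ipc_recv_cow(port, &len);

    /* Read or modify freely; writes are private to this process. */
    if (len > 0)
        msg[0] ^= 0xFF;
}
[/code]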