Hi,
Colonel Kernel wrote:A general comment about L4 based on what I've read -- the point of it being so minimalist is not just for the added flexibility, but also to keep its cache footprint as small as possible. RAM is the most critical performance bottleneck in modern hardware, and it's only getting worse. Cache and TLB thrashing seem to be the worst performance killers. Sounds to me like a good reason to keep the kernel as small as possible... Just my $0.02.
Not to me - keeping the kernel small just means having more services in other places, and doesn't make much difference for the code & data caches. The main difference would be the TLBs, where making good use of "global" pages and being careful to avoid changing CR3 can help (which is the main reason why L4 has those "small address spaces" IMHO).
Colonel Kernel wrote:I do have another question though. Let's say that I decide I don't need to have multiple pagers in user-space, and that I want most of memory management in the microkernel. How do I handle page faults that require disk I/O? The disk and filesystem drivers will be in user-space, and it would be awkward for the kernel to have special knowledge of them... Any suggestions?
For my OS there are "thread states" which record why a blocked thread was blocked. One of these states is "waiting for pager/disk IO".
For swap space, each "swap provider" notifies the kernel when it initializes and the kernel keeps a list of them (and keeps track of total size, free size, message port ID, etc for each one). For normal file access the VFS is used.
When data is needed from disk, the thread causes a page fault (page not present), and the page fault handler figures out why, then sends a message to the swap provider or VFS asking for the data. Then the page fault handler stores a sender ID and function ID, marks the thread as "waiting for pager/disk IO", and the scheduler switches to another thread for a while.
Sooner or later a message comes back from the VFS or swap provider containing the status of the operation and hopefully the data. The messaging code notices the thread was "waiting for pager/disk IO" and checks if the message sender matches the sender ID and function ID set by the page fault handler. When this happens it puts the message at the start of the message queue (rather than at the end, which is what would normally happen) and clears the sender ID and function ID. Then the messaging code clears the thread's "waiting for pager/disk IO" state and adds the thread to the scheduler's "ready to run" queue.
When the scheduler gives the thread CPU time the page fault handler gets the first message from the message queue, which happens to be the right message because of the extra stuff done by the messaging code. The message data is checked, and if the status is OK a free page is mapped where it needs to go, the data is copied into the free page and the page fault handler returns. If the status is bad (timeout, file IO error, etc) there's a critical error and the thread is terminated.
When data is swapped out to disk it's mostly the reverse of this (data is sent via messaging to be saved by the swap provider or VFS, and the pager blocks waiting for the returned status). If a page is sent to swap the "block number" is stored in the page table entry so that the data can be found again (where a block number is a reference to which 4096 byte block of swap space was used to store the data).
This gives some restrictions - for a not-present page there are 31 unused bits in the page table entry, and one bit is used to determine if the page is in swap or memory mapped. This means block numbers for swap space must be 30 bits or less, so 2^30 * 4096 gives a maximum swap space size of 4096 GB. When the kernel is using PAE this maximum is increased to 2^74 bytes, as there are 64 bit page table entries.
If a file is memory mapped an index into a "memory mapped range" list is stored in the page table entry. The scheduler steals a few MB of the thread's address space to store this memory mapped range list. Each entry in the list contains a starting address, size and file handle for the memory mapped file.
Of course for this to work you'd need to be able to transfer at least 4100 bytes in a message (easy for my OS). It's definitely not the fastest way, or the most flexible way, or the only way...
Cheers,
Brendan