Brendan wrote:
Not disabling processors is a good thing (how about an 8-way Opteron motherboard?). I'm a bit dubious as to the costs though... Also, "if all is well, the page shouldn't have been in use anyway" doesn't necessarily apply when you consider pages sent to swap and memory-mapped files, or space used for rapidly changing message queues.
Pages sent to swap, and unmapped pages of memory-mapped files, were not in use recently. When such a page is mapped back in, you first load it, then map it, and then INVLPG on your own CPU. No problemo.
Firstly, if a page isn't present you can just map a page and flush the TLB on the local CPU. If another processor tries to access the page it might still be 'not present' in its TLB, so you'd need to try INVLPG and see if that fixes things before mapping a page. INVLPG is much quicker than messing about with IPIs, so we can forget about the costs involved with mapping a page.
True, for mapping pages you don't have to signal anybody.
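To make that concrete, here's a rough sketch of the load-then-map-then-INVLPG ordering described above, assuming x86 paging; alloc_frame(), load_from_swap() and the PTE_* flags are made-up names, not anyone's actual kernel API:

    /* Sketch only: load first, then map, then INVLPG locally. */
    #include <stdint.h>

    #define PTE_PRESENT  0x001ULL
    #define PTE_WRITABLE 0x002ULL

    extern uint64_t alloc_frame(void);                    /* pick a free physical frame */
    extern void load_from_swap(uint64_t frame, void *va); /* fill it with the page data */

    static inline void invlpg(void *addr)
    {
        /* Invalidates the TLB entry for one page on the local CPU only. */
        __asm__ volatile("invlpg (%0)" :: "r"(addr) : "memory");
    }

    void map_in_page(uint64_t *pte, void *vaddr)
    {
        uint64_t frame = alloc_frame();
        load_from_swap(frame, vaddr);                /* 1. load it         */
        *pte = frame | PTE_PRESENT | PTE_WRITABLE;   /* 2. map it          */
        invlpg(vaddr);                               /* 3. INVLPG yourself */
        /* No IPI needed: a CPU that still sees 'not present' just faults,
         * does its own INVLPG and retries. */
    }

The point of the ordering is that no other CPU can ever see a present mapping to a frame whose contents haven't been loaded yet.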
Next, if a page is unmapped in part of the address space that is only ever used by a single thread (my OS supports this & calls it "thread space") then you can forget about TLBs on the other CPUs.
If a page is unmapped in part of the address space that is shared by multiple threads (my OS calls this "process space"), then you only have to worry about flushing TLBs on other CPUs if they are running threads that belong to the same process.
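As an aside, roughly what that "only signal CPUs running the same process" rule might look like in code; cpu_current_process[], NCPUS and send_flush_ipi() are invented names (and the race condition described below is exactly where this scheme needs extra care):

    #define NCPUS 8

    extern const void *cpu_current_process[NCPUS];      /* what each CPU is running */
    extern void send_flush_ipi(int cpu, void *vaddr);   /* ask one CPU to INVLPG    */

    /* Unmap in "process space": only CPUs currently running a thread of
     * the same process get an IPI; everyone else is ignored. */
    void shootdown_process_space(const void *proc, void *vaddr, int self)
    {
        for (int cpu = 0; cpu < NCPUS; cpu++) {
            if (cpu == self)
                continue;                       /* local CPU just uses INVLPG */
            if (cpu_current_process[cpu] == proc)
                send_flush_ipi(cpu, vaddr);
        }
    }

Whether this is safe depends on stale entries being flushed whenever a CPU stops running the process (e.g. by reloading CR3 on every context switch); that's what the example below is about.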
My OS doesn't support thread space, for instance. When unmapping a page you (1) unmap it, (2) flush your own TLB entry, (3) send messages to all processors (not only those currently running threads of the process, see below), and then you continue doing whatever you were doing.
A theoretical race condition: you have processes A, B and C. A uses 3 pages (0-2), B uses 3 pages (3-5). A is running on processor 0 and reading from all three of its pages, B is running on processor 1 and reading from all three of its pages, and C's only relevant property is that it is not A. Processor 0 switches from A to C. Processor 1 sees B needing a page and unmaps page #2 from A's address space; since A isn't active on any processor, no IPIs are sent. Processor 0 later switches back to A, which still has the old TLB entry mapping page #2 to the physical frame that now belongs to B, and thereby overwrites B's data.
You want to signal all processors.
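Here's a rough sketch of that unmap / flush-locally / signal-everyone sequence, plus the handler on the receiving side; ipi_broadcast_flush() and the ack counter are invented, and waiting for acknowledgements before reusing the frame is just one way to know the stale entries are really gone:

    #include <stdatomic.h>
    #include <stdint.h>

    #define NCPUS 8

    extern void ipi_broadcast_flush(void *vaddr);   /* IPI every other CPU */
    static atomic_int flush_acks;

    static inline void invlpg(void *addr)
    {
        __asm__ volatile("invlpg (%0)" :: "r"(addr) : "memory");
    }

    /* Runs on the CPU that takes the page away. */
    void unmap_page(uint64_t *pte, void *vaddr)
    {
        *pte = 0;                                   /* 1. unmap it                 */
        invlpg(vaddr);                              /* 2. flush your own TLB entry */
        atomic_store(&flush_acks, 0);
        ipi_broadcast_flush(vaddr);                 /* 3. signal ALL processors    */
        /* Don't hand the physical frame to anyone else until every other
         * CPU has dropped its (possibly stale) entry. */
        while (atomic_load(&flush_acks) < NCPUS - 1)
            __asm__ volatile("pause");
    }

    /* Runs on every other CPU, in the flush-IPI handler. */
    void flush_ipi_handler(void *vaddr)
    {
        invlpg(vaddr);
        atomic_fetch_add(&flush_acks, 1);
    }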
My processes (applications, drivers, etc.) have a flag in the executable file header that says if it supports multi-processor. If this flag is clear the scheduler will not allow more than one thread from the process to be running. I expect that most processes will not set this flag. Also, if the process only has 1 thread there's no problem. For a very rough estimate I guess that I'll only need to worry about TLBs on other CPUs in process space about 15% of the time.
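For reference, a minimal sketch of the scheduler check such a flag seems to imply; the struct fields are invented, not Brendan's actual code:

    #include <stdbool.h>

    struct process {
        bool mp_capable;       /* flag taken from the executable header      */
        int  running_threads;  /* threads of this process currently on a CPU */
    };

    /* May another thread of this process be put on a CPU right now? */
    bool may_run_another_thread(const struct process *p)
    {
        if (!p->mp_capable && p->running_threads >= 1)
            return false;      /* non-MP process: at most one running thread */
        return true;
    }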
That's very ugly. Multiprocessor issues also occur with ALL multithreading programs, and if a program doesn't multithread there's never a second processor involved anyway. What's the use of the bit?
If my OS is running under a steady load (e.g. a server) the kernel won't have much, if any, overhead from TLB flushes. If the kernel is under fluctuating load (a multi-processor desktop machine?) then there's some overhead in idle time (negligible).
If it's under a steady load you'll still see many processes starting and terminating, worker threads picking up tasks, and buffer pages being swapped in & out, so you'll have a lot of work on your hands.
In addition it's a micro-kernel (device drivers, VFS, etc. are processes) - it's not going to be changing pages as much as a monolithic kernel.
What's different between a monolithic kernel and a microkernel that makes you say this? I actually dare say you'll keep getting TLB flushes. You might differ from a traditional monolithic kernel in that you don't load code you never use in the first place. That doesn't make you any better though: all your components run as separate processes in separate address spaces, giving a load of overhead a monolithic kernel can beat easily. (Yes, an optimized microkernel can be faster than a non-optimized monolithic kernel, pretty *duh* if you ask me.) I'm still going for hybrid.