Today I was involved in a discussion. So far I have no real opinion on the topic, and I'm not sure about the quality of the arguments I heard. So I'd like to hear your thoughts and opinions as experts (in the general field of OS development). For now I'll hold back my own thoughts so as not to distort the picture yours will paint.
So...
x86 has a nice paging mechanism, and the MMU does all the translation and handling for you. TLBs are more or less transparent and managed automatically (except when you change the paging structures: invalidation, TLB shootdown...). In some RISC designs (e.g. Alpha) a TLB miss raises an exception and you can insert a translation by hand (neither is available on x86). That leads to a software-managed TLB.
First question: What do you think, is such a software-managed TLB better or worse than the x86-like TLB? Why?
Let us suppose that we want such a software managed TLB but we are using x86(-64). We can try to simulate it:
- Initially the TLB is flushed and all entries in PML4 (i.e. highest paging structure level) are marked "not present".
- For a memory access we will get a Page Fault. We then have the faulting address and the type of access that was made (read or write).
- We then build page table structures that map the virtual address to the appropriate physical address, and mark the relevant path for the translation "present".
- We do a dummy read/write to the faulting address, so that the address translation is loaded into the TLB.
- We mark the PML4 entries "not present" again.
- End of Page Fault handler.
- The address translation for the memory access that caused the Page Fault can now be satisfied from the TLB.
Pros:
- You don't need big paging structures for every process. You can use only very small dummy structures and one big data structure internal to the OS (e.g. some sort of inverted page table?).
- For the internal address translation structure you can use whatever format suits your needs best (you don't have to use Intel's predefined ones, which may or may not fit your needs).
- It's more flexible. You can supply whatever address translation you want at any time, because you see every memory access that misses the TLB.
Cons:
- You have to emulate some of the bookkeeping the MMU normally does for you for free (e.g. accessed/dirty bits).
- Performance will suck. (Really?)
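To make the fault-handler steps and the bookkeeping point above more concrete, here is a minimal user-space simulation in C. It is only a sketch under assumptions: it does not touch real paging structures, and every name in it (`os_map`, `tlb`, `map_insert`, `map_lookup`, `page_fault`, `translate`, `refill_count`) as well as the table sizes are hypothetical. The "TLB" is a small direct-mapped array, the OS-internal structure is a simple hash table standing in for "some sort of inverted page table", and the accessed/dirty bits are emulated by installing entries read-only until the first write faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define TLB_SLOTS  8    /* hypothetical size, tiny on purpose */
#define MAP_SLOTS  64   /* capacity of the OS-internal table */

/* OS-internal translation record; the format is ours to choose. */
struct mapping {
    bool     used;
    uint64_t vpn;       /* virtual page number */
    uint64_t pfn;       /* physical frame number */
    bool     accessed;  /* emulated "accessed" bit */
    bool     dirty;     /* emulated "dirty" bit */
};

/* One simulated TLB entry. */
struct tlb_entry {
    bool     valid;
    uint64_t vpn;
    uint64_t pfn;
    bool     writable;  /* false until the page has been written */
};

struct mapping   os_map[MAP_SLOTS];
struct tlb_entry tlb[TLB_SLOTS];
unsigned long    refill_count;   /* faults that ended in a TLB refill */

/* Hash lookup with linear probing (no deletions in this sketch). */
struct mapping *map_lookup(uint64_t vpn)
{
    for (size_t i = 0; i < MAP_SLOTS; i++) {
        struct mapping *m = &os_map[(vpn + i) % MAP_SLOTS];
        if (!m->used)
            return NULL;
        if (m->vpn == vpn)
            return m;
    }
    return NULL;
}

bool map_insert(uint64_t vpn, uint64_t pfn)
{
    for (size_t i = 0; i < MAP_SLOTS; i++) {
        struct mapping *m = &os_map[(vpn + i) % MAP_SLOTS];
        if (!m->used || m->vpn == vpn) {
            *m = (struct mapping){ .used = true, .vpn = vpn, .pfn = pfn };
            return true;
        }
    }
    return false;   /* table full */
}

/* Stand-in for the page fault handler: consult the OS table, update the
 * emulated accessed/dirty bits, and load the translation into the TLB. */
bool page_fault(uint64_t vpn, bool is_write)
{
    struct mapping *m = map_lookup(vpn);
    if (m == NULL)
        return false;           /* genuine fault: nothing is mapped here */

    refill_count++;
    m->accessed = true;
    if (is_write)
        m->dirty = true;

    /* Install read-only unless the page is already dirty, so that the
     * first write faults again and the dirty bit can be recorded. */
    tlb[vpn % TLB_SLOTS] = (struct tlb_entry){
        .valid = true, .vpn = vpn, .pfn = m->pfn, .writable = m->dirty
    };
    return true;
}

/* One memory access: TLB hit, or fault and refill. */
bool translate(uint64_t vaddr, bool is_write, uint64_t *paddr)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[vpn % TLB_SLOTS];
    bool hit = e->valid && e->vpn == vpn && (!is_write || e->writable);

    if (!hit && !page_fault(vpn, is_write))
        return false;

    *paddr = (e->pfn << PAGE_SHIFT) | (vaddr & ((1ull << PAGE_SHIFT) - 1));
    return true;
}
```

In this model a read followed by a write to the same page costs two faults, which is the price of emulating the dirty bit. Whether a real x86 TLB would keep the entry after the PML4 entries are marked "not present" again is exactly the open question of this post; the simulation simply assumes it does.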
Second question: Is this feasible at all? To refine that question: the answer is "No" if there's no TLB, and Intel's application note says you can't rely on one being there. Most processors have one, though, so suppose the processor does. Is the scheme feasible the way I described it? Then again, the answer may still be "No", because nothing says the TLB must cache a translation; it merely may. (Or am I wrong?) Does anybody know how current implementations work: do they always cache the latest translation, or only now and then?
Third question: How big would the performance impact be? Any estimates?
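One way to frame such an estimate, as a rough sketch only: treat every TLB miss as costing a fixed penalty and compare the average cost per memory access with a hardware page walk against the cost with a full page fault. The function names and parameters below are hypothetical, and no measured numbers are implied; the actual miss rate and cycle counts would have to come from measurements on real hardware.

```c
#include <assert.h>

/* Average cycles per memory access if each TLB miss adds a fixed penalty. */
double avg_cycles_per_access(double miss_rate, double hit_cycles,
                             double miss_penalty_cycles)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_cycles + miss_rate * miss_penalty_cycles;
}

/* Relative slowdown of the fault-based scheme over a hardware page walk.
 * hw_walk_cycles and sw_fault_cycles are inputs to be measured, not known
 * constants; hit_cycles must be positive. */
double slowdown(double miss_rate, double hit_cycles,
                double hw_walk_cycles, double sw_fault_cycles)
{
    double hw = avg_cycles_per_access(miss_rate, hit_cycles, hw_walk_cycles);
    double sw = avg_cycles_per_access(miss_rate, hit_cycles, sw_fault_cycles);
    return sw / hw;
}
```

The model ignores second-order effects such as cache pollution by the handler and the extra write faults needed to emulate dirty bits, so it is at best a lower bound on the slowdown.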
Fourth question: Any pros/cons that I didn't mention? (I guess there are at least some more cons...)
Fifth question: If you think it would be pretty cool: Why the hell is no one doing it?
(That's my favourite. I'll answer just that one for now... I personally think this is not the first time the idea has come up. But I'm not aware of any system that works like that. So maybe the disadvantages are too big?!)
Your thoughts are very welcome. Please don't think "Uh, that guy again with those weird ideas." As I said, I'm not convinced about this stuff, but I think it's worth discussing.
It's your turn... (please)
--TS