Combuster wrote:
Software TLBs are interesting, but they also pose an additional challenge (one I'm willing to try one day).
I'm willing to try the challenge, too. But for now I'll stick to x86. And I don't think we're going to see software TLBs there in the (near) future.
Combuster wrote:
Also, there is no proper way to manually manage a TLB in software, since it may evict entries at random. There is also the problem of cache colouring when the TLB is not fully associative. Also, a cleared A or D bit will cause a write to the page table when the entry is used (for writing in case of the D bit). If you can guarantee that writebacks and TLB invalidations do not occur randomly, then yes this can work. But then again, I don't have all the TLB-related details, and there's always SMM that has potential of trashing your precious TLB.
I fully agree. The random eviction of entries wouldn't be a (big) problem in our case. And the A and D bits could be set in advance, so no problem there either. So for our situation we're fine. But you're right that this wouldn't be the case in typical systems. Good point.
Combuster wrote:
Turning a hardware TLB into a software TLB is like wasting computing resources. There's only very little chance that custom TLB management is faster.
That's mainly the thing that I'm concerned about.
Combuster wrote:
Every miss will generate a fault which takes an order of magnitude more cycles than the hardware approach.
Exactly this was my point in our discussion. Do you - or someone else here - have an estimate of what it costs (in time) to enter a page fault handler and return from it? (The handling itself could be tweaked, but not the hardware side.)
Combuster wrote:
Then again, the difference will only really matter if whatever application does not fit within the size of the TLB, since only a fixed amount of pagefaults are needed after a task switch (and you could possibly do that greedily for more effect)
This is also the reason why nobody is doing it - the x86 has a hardware TLB and nobody in his right mind is going to break performance by overriding that. You'd only do that when the platform requires you to.
The counter-argument against me was that it would significantly reduce the size of the paging structures. Imagine a 64-bit world where each process has its own page tables (okay, that's common) and every process uses a very large portion of the address space. Then a great deal of RAM is "wasted" on the paging structures alone, and the space needed grows with both the number of processes and the amount of address space used. With the approach I outlined in my original post, the space needed grows only with the number of processes (and the space needed per process is very small), or is even fixed (and extremely small).
What do you - and the others - think? Is that a valid argument? Or is it negligible/less severe?
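To put a rough number on the space argument: here is a back-of-envelope calculation of how much RAM the classic x86-64 4-level paging structures consume when a process densely maps a large address space. The figures (64 GiB mapped per process, 100 processes) are illustrative assumptions, not measurements.

```python
# Back-of-envelope: RAM consumed by x86-64 4-level page tables when a
# process densely maps a large address space with 4 KiB pages.
# The 64 GiB / 100-process numbers are illustrative assumptions.

PAGE = 4096       # bytes per page and per paging-structure page
ENTRIES = 512     # 8-byte entries per paging-structure page

def table_pages(mapped_bytes):
    """Paging-structure pages needed to map `mapped_bytes` densely
    across all four levels (PT, PD, PDPT, PML4)."""
    entries = -(-mapped_bytes // PAGE)   # ceil: leaf PTEs needed
    total = 0
    for _ in range(4):                   # walk up the four levels
        pages = -(-entries // ENTRIES)   # ceil: pages at this level
        total += pages
        entries = pages                  # each page needs one entry above
    return total

mapped = 64 * 2**30                      # 64 GiB per process (assumption)
per_process = table_pages(mapped) * PAGE
print(per_process // 2**20, "MiB of page tables per process")
print(100 * per_process // 2**30, "GiB for 100 such processes")
```

With these assumptions each process needs roughly 128 MiB of page tables, so 100 such processes burn over 12 GiB on translation metadata alone, which is the growth the space argument is about.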
Combuster wrote:
That said, if I were to design a CPU, it would come with SW TLBs (of variable size).
Full agreement. That was more or less what I said in the discussion: "I don't think turning HW TLBs into SW TLBs is the right way (because of performance impact). But generally I would favour SW TLBs over HW TLBs."
Colonel Kernel wrote:
Hyperdrive wrote:
We then build some page table structures that will map the virtual address to the appropriate physical address. Make the relevant path for the transition marked "present".
That's why performance would probably be poor. TLB misses are way, way more frequent than page faults. You want TLB miss handlers to run really, really fast.
Yes and no. I agree with you on the facts (TLB misses are far more frequent, and TLB miss handlers have to be really fast), but I think the TLB miss handler I outlined can be made really fast. Whether it would be fast enough, I don't know. My gut feeling is that it wouldn't be; and I'm afraid only a complete implementation can prove or disprove the estimates.
Colonel Kernel wrote:
IMO the main reasons to go with software-managed TLBs are hardware-related, not software-related. The logic required to implement the x86 page table support and co-ordinate it across multiple processors is non-trivial, and has some bugs (http://download.intel.com/design/mobile ... 922214.pdf). Those transistors would be better spent on bigger TLBs. Also, if there are bugs in the TLB handlers, you can actually fix them.
Right. That's my opinion, too. You can change software much more easily (recompile and it's done). So we all agree on this point: big software TLBs would be nice.
From your two opinions (thank you for those!) I feel affirmed in my position. Well, we are researchers, so the emulated software TLB will most likely be realized - just to see how good or bad it is (I'll bet on "bad", and from our discussion here I think you would, too - but we'll see). If you'd like, I'll keep you informed.
One more quick question: Do the emulators emulate TLBs?
- Bochs
- Qemu
- VirtualBox
- VirtualPC
- VMWare
- ...
I couldn't find really reliable information on that.
--TS