reapersms wrote:
I haven't dealt with any MIPS older than the R3000, and none of the systems used paging/VM even when I did, so I don't have a particularly informed opinion there. Given the choice, I'd probably lean towards the i386, but that would be mostly due to familiarity, and a general sense that MIPS tended to get chosen for being cheap/easy to license. For the systems I did work with, they relied heavily on coprocessors to get anything done in a reasonable timeframe, and had pretty terrible performance for general purpose code relative to their competitors.

I used the R2000 as an example because it was contemporaneous with the i386, being released at about the same time.
reapersms wrote:
As for software paging, I suspect the hardware cost of the fixed function search/TLB fill for the page present case still works out to being a better/faster solution overall than routing to an interrupt every time, but have no hard data there. It certainly seems like there'd be tighter latency guarantees available (though if you care about that, you probably wouldn't have paging enabled anyways).

Yes, I can see how a hardware page walker would be faster for the cases handled by the page table format.
But that's more than offset by the higher hit rate you get from having more TLB entries. The hard data of the time (when both the i386 and R2000 were released) makes for embarrassing reading for Intel, and puts the R2000 at maybe 3x the performance of the i386.
If you need TLB latency guarantees, you can just lock the desired TLB entries on MIPS.
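The hit-rate argument above can be written as a rough cost model. This is only a sketch: the symbols are mine, not from any datasheet, with $m$ the TLB miss rate, $t_{\text{hit}}$ the translation time on a hit and $t_{\text{refill}}$ the extra time spent on a refill (hardware walk or software handler).

```latex
\bar{t} = t_{\text{hit}} + m \, t_{\text{refill}}
\qquad\text{so software refill comes out ahead when}\qquad
m_{\text{sw}} \, t_{\text{sw}} < m_{\text{hw}} \, t_{\text{hw}}
```

In other words, a slower refill path can still win if the larger TLB cuts the miss rate by more than the refill got slower.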
reapersms wrote:
There's probably a tradeoff in there for a larger/more associative TLB, in that that likely slows down the initial lookup, all other things being equal. That slowdown may be generally insignificant (and probably is) overall, but it's something to keep in mind certainly. For the comparison given there, I'd expect the MIPS to have slightly slower TLB-found performance, but hitting that more often. The 386 would be slightly faster, but more likely to fall into a TLB fill situation, and win that one via the known table format. For the page not found case, neither one is going to be fast about it, but that's expected.

A fully associative TLB should be as fast or faster, since all lookups occur in parallel. The cost of fully associative caches is not in lookup performance, but in transistor count, die area and power. Each TLB entry has its own tag and every tag is queried on lookup, but the comparisons happen in parallel, so the latency should be about the same as a single TLB entry lookup no matter how many entries you have.
But the R2000 managed that on a much smaller transistor budget. It had about 110,000 transistors versus the i386's 275,000 transistors, and even using a larger process (2um vs 1.5um), it had a smaller die than the i386. Bear all that in mind, and it still had more registers as well as a larger TLB, and all those registers help avoid register spills to memory and hence memory/TLB use.
In fact, in terms of transistor count and die area, the R2000 was more comparable to an 80286.
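To illustrate the point about fully associative lookup, here is a minimal Python sketch. The entry layout `(tag, pfn, valid)` and the function name are made up for illustration. The list comprehension stands in for what hardware does with one comparator per entry, all firing at once, so the software loop models the cost in comparators but not the latency.

```python
def fully_associative_lookup(entries, vpn):
    """Look up vpn in a list of (tag, pfn, valid) TLB entries.

    Returns (pfn or None, number of tag comparators needed).
    In hardware every tag is compared in the same cycle, so the
    comparator count grows with the TLB size but latency does not.
    """
    hits = [pfn for tag, pfn, valid in entries if valid and tag == vpn]
    comparators = len(entries)  # one comparator per entry: the area/power cost
    return (hits[0] if hits else None), comparators


# Hypothetical three-entry TLB; the last entry is invalid.
tlb = [(0x10, 0xA0, True), (0x20, 0xB0, True), (0x30, 0xC0, False)]
hit = fully_associative_lookup(tlb, 0x20)
miss = fully_associative_lookup(tlb, 0x30)
```

A bigger TLB makes `comparators` larger, which is the transistor and power cost, while the lookup is still a single compare step followed by selecting the matching entry.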
reapersms wrote:
One other consideration is that a full software approach means you're going to be consuming some amount of those larger TLBs or caches to track the translation and cache lines for the code to walk your structures. The 386 would be able to avoid that, as the page tables are referred to with direct physical addresses anyways. The tables themselves I assume dirty the cache either way. With what I recall of MIPS icache performance (when it existed), that would be a pretty heavy cost.

Not at all in the case of TLB for the refill code. Perhaps, in the case of icache for the refill code.
The code to handle TLB refill would be (in fact, must be) in the kseg0 address range, which is a fixed mapping and uses no TLB entries. The MIPS hardware also sets up a pointer to where the lookup would be in a virtual page table, which the refill code can use as is if so inclined, so there is the option of the refill logic being just a handful of instructions. The virtual page table would likely be in kseg2, which is mapped via the TLB, and so would also be subject to TLB refill. That would just be a recursive call into the refill code to map that, and simulates the two level page tables used in the i386.
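As a rough illustration of the refill path described above, here is a small Python simulation. It is a sketch under assumptions, not real handler code: the bit layout follows my understanding of the R3000-style Context register (table base in bits 31..21, faulting VPN in bits 20..2) and 4-byte PTEs with the frame number in bits 31..12, while `PTE_BASE`, `SoftTlb` and the dict standing in for memory are hypothetical names. It also skips the nested miss on the kseg2 page table itself.

```python
PAGE_SHIFT = 12          # 4 KiB pages
PTE_BASE = 0xC0000000    # hypothetical kseg2 base of the virtual page table


def context_register(pte_base, bad_vaddr):
    """Address of the PTE for bad_vaddr in a linear table of 4-byte PTEs.

    Assumed layout: table base in bits 31..21, VPN (address bits 30..12)
    in bits 20..2, low two bits zero.
    """
    bad_vpn = (bad_vaddr >> PAGE_SHIFT) & 0x7FFFF
    return (pte_base & 0xFFE00000) | (bad_vpn << 2)


class SoftTlb:
    """Toy TLB with a software refill handler (no eviction, no kseg2 miss)."""

    def __init__(self, page_table_memory):
        self.memory = page_table_memory  # dict: PTE address -> PTE word
        self.entries = {}                # VPN -> physical frame number
        self.refills = 0

    def refill(self, bad_vaddr):
        # The handler: read the precomputed pointer, load the PTE, write the TLB.
        pte = self.memory[context_register(PTE_BASE, bad_vaddr)]
        self.entries[bad_vaddr >> PAGE_SHIFT] = pte >> PAGE_SHIFT
        self.refills += 1

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.entries:      # TLB miss: run the software handler
            self.refill(vaddr)
        return (self.entries[vpn] << PAGE_SHIFT) | (vaddr & 0xFFF)


# One mapped page: virtual 0x00403000 -> physical 0x00123000, plus flag bits.
memory = {context_register(PTE_BASE, 0x00403ABC): 0x00123000 | 0x600}
tlb = SoftTlb(memory)
first = tlb.translate(0x00403ABC)    # misses, refills once
second = tlb.translate(0x00403010)   # same page, hits
```

The point of the sketch is that `refill` has no table walk in it: the hardware has already formed the PTE address, so the handler is a load and a TLB write.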
My first thought about software TLB miss handling was the same as yours. But if you look into it, it's genius, and produces hella fast processors. It's used not just in MIPS, but also in SPARCv9 and Alpha (which uses PALcode to do the page walking, but that is essentially a soft TLB refill too, since PALcode is just privileged code).
reapersms wrote:
Hardware vs software systems are probably veering further off topic though...

Not really. The topic was about x86 bloat, and hardware page walking is arguably bloat that is not needed.