alexfru wrote:
Page tables and directories are only involved when the addresses aren't in the TLB yet. I guess the CPU would have to bring the addresses into the TLB *before* executing the instruction (at least the addresses for the code pages containing the instruction) and it shouldn't evict those TLB entries (the ones that are valid for the instruction) from the TLB if, say, just one page out of 6 code/data pages isn't accessible, which means you probably don't need to have 13 physical pages mapped at the same time on an 80386. IOW, here's what can happen:
- page fault for the physical address of the first byte of the instruction (fills one TLB entry if handled)
- page fault for the instruction fetch
- page fault for the physical address of the remaining bytes of the instruction (fills one more TLB entry if handled)
- page fault for the instruction fetch
- page faults for the physical address of the data accessed by the instruction (several more TLB entries filled if handled)
- page faults for the data accesses by the instruction
Once filled, the TLB entries should not get invalidated and then filled again when we return from the page fault handler to the instruction (unless the TLB is too small for all the activities performed by the page fault handler). Right? Wrong?
I think your point is right, but I don't think it helps much, if at all. You still need those TLB entries to point to physical memory. If you're going to repurpose the backing memory (RAM) to load in whatever is needed by the code (which caused the #PF to begin with), then you're going to need to fill that RAM with something else.
Essentially you'd need to make the cache think that two different TLB (virtual address) entries are pointing to the same backing memory (same 4KiB RAM page), yet the cache would have different contents for each virtual address.
Or did you mean that when the page directory has a not-present PDE, you'd create one on the fly (including the page table) containing only that one PTE, allowing the #PF to be resolved and a single TLB entry to be "fed" to the TLB, and then always repurpose the same 4KiB RAM page for that use? I think it might be possible to get that to work...
Or did you mean something else?
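To make the "one PTE on the fly" idea concrete, here is a rough sketch, assuming plain 32-bit paging (10/10/12-bit address split) and a single scratch page table that gets wiped and reused each time. The names `map_single_page`, `pde_index` and `pte_index` are made up for illustration, and a real handler would also have to deal with stale TLB entries for whatever the scratch table mapped before.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES 1024u   /* entries per page directory / page table */
#define PG_P    0x001u  /* present bit */
#define PG_RW   0x002u  /* read/write bit */

/* Index helpers for plain 32-bit paging (10 + 10 + 12 bit split). */
static inline uint32_t pde_index(uint32_t va) { return va >> 22; }
static inline uint32_t pte_index(uint32_t va) { return (va >> 12) & 0x3FFu; }

/*
 * Hypothetical helper: wipe the scratch page table, map exactly one
 * 4 KiB frame in it, and point the matching PDE at the table.
 * pt_phys and frame_phys are physical addresses and must be 4 KiB aligned.
 */
static void map_single_page(uint32_t *pd, uint32_t *pt, uint32_t pt_phys,
                            uint32_t va, uint32_t frame_phys)
{
    assert((pt_phys & 0xFFFu) == 0 && (frame_phys & 0xFFFu) == 0);
    memset(pt, 0, ENTRIES * sizeof *pt);              /* only one PTE survives */
    pt[pte_index(va)] = frame_phys | PG_P | PG_RW;
    pd[pde_index(va)] = pt_phys | PG_P | PG_RW;
}
```

For example, mapping virtual address `0xC0401234` this way touches PDE 769 and PTE 1 and leaves every other entry of the scratch table not-present.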
Brendan wrote:
There's multiple cases where CPU accesses "EIP plus 2 other addresses" (MOVSD, CMPSD, PUSH, POP); and with misaligned accesses (e.g. accessing 2 bytes at 0x0FFFFFFF) each of the 3 simultaneous accesses can be split across 2 pages (and page tables, and ..); leading to a worst case of "6 pages plus 6 page tables plus page directory = 52 KiB" (for "plain 32-bit paging"), and a worst case of "6 pages plus 6 page tables plus 6 page directories plus 6 PDPTs plus PML4 = 100 KiB" (for long mode).
This might be getting slightly off topic, but won't you need at least the following for the worst/pathological case:
- IDT
- #PF handler
- Stack (#PF will push to it)
- Current code page
- Source data page
- Destination data page
- Page table for each page (assuming absolutely worst case where everything is spread so far apart that they are in separate page tables)
- Page directory
For most of those you need to account for two pages for unaligned/page-boundary access. I'm not sure if there are some internals of the CPU you could take advantage of to reduce this list slightly, such as relying on the cache still holding old entries, but even if that works it would be a horrible hack.
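As a quick way to check which accesses actually need the "two pages" treatment, here is a small sketch assuming 4 KiB pages and plain 32-bit paging (one page table per 4 MiB). The function names are my own, not from any real kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u  /* 4 KiB pages */
#define PT_SHIFT   22u  /* one page table maps 4 MiB under 32-bit paging */

/* Number of 4 KiB pages touched by an access of len bytes at va.
 * Assumes len > 0 and that the access does not wrap past 4 GiB. */
static uint32_t pages_touched(uint32_t va, uint32_t len)
{
    assert(len > 0 && len - 1 <= UINT32_MAX - va);
    uint32_t last = va + (len - 1);
    return (last >> PAGE_SHIFT) - (va >> PAGE_SHIFT) + 1;
}

/* Number of distinct page tables needed to map that same access. */
static uint32_t page_tables_touched(uint32_t va, uint32_t len)
{
    assert(len > 0 && len - 1 <= UINT32_MAX - va);
    uint32_t last = va + (len - 1);
    return (last >> PT_SHIFT) - (va >> PT_SHIFT) + 1;
}
```

With Brendan's example, `pages_touched(0x0FFFFFFF, 2)` and `page_tables_touched(0x0FFFFFFF, 2)` both give 2, because that address is the last byte of a 4 MiB region as well as the last byte of a page. An access at `0x00000FFF` also straddles two pages but stays within one page table.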
The OS dev could try to minimize the #PF handler and thus merge it with the IDT in the same page.
So I count:
- IDT + #PF handler; 1 page (combined, which means at most ~4KiB for #PF handler)
- Stack (#PF will push to it); 2 pages (page boundary)
- Current code page; 2 pages (page boundary)
- Source data page; 2 pages (page boundary)
- Destination data page; 2 pages (page boundary)
- Page table for each page; 9 pages (one for each of the above pages)
- Page directory; 1 page
Giving a total of 19 pages = 76KiB of RAM (still under Brendan's 256KiB).
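To double-check the arithmetic, the list above can be summed mechanically. This is just a sketch of my own tally with 4 KiB pages; the struct and function names are made up.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_KIB 4u  /* 4 KiB per page */

struct item {
    const char *what;
    unsigned pages;
};

/* The worst-case list from the post, one entry per bullet. */
static const struct item worst_case[] = {
    { "IDT + #PF handler (combined)",     1 },
    { "stack",                            2 },
    { "current code",                     2 },
    { "source data",                      2 },
    { "destination data",                 2 },
    { "page tables (one per page above)", 9 },
    { "page directory",                   1 },
};

/* Sum the page counts of n list entries. */
static unsigned total_pages(const struct item *items, size_t n)
{
    assert(items != NULL);
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += items[i].pages;
    return sum;
}
```

Calling `total_pages(worst_case, sizeof worst_case / sizeof worst_case[0])` gives 19 pages, and multiplying by `PAGE_KIB` gives 76 KiB.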
I'm interested to hear if anyone can come up with anything that must be added to that list or can be removed, keeping in mind the absolute worst/pathological case.
Edit: Forgot to list the GDT, though that could possibly be on the same page as the IDT and #PF handler.