Page Fault after invalidating non-accessed pages

Posted: Tue Jun 06, 2017 7:44 pm
by MintyFresh
I ran into a very bizarre issue last night. It was a bit tricky to debug, but eventually I tracked it down.
It came from my paging_remap_page() function, which caused a Page Fault despite the fact that a physical address was properly mapped to the faulting address.

The problem was caused by this:

pte->address = (paddr & SMALL_PAGE_MASK) | (pte->flags & ~SMALL_PAGE_MASK);
paging_invalidate_page(vaddr & SMALL_PAGE_MASK);
And the fix was an incredibly minor change:

pte->address = (paddr & SMALL_PAGE_MASK) | (pte->flags & ~SMALL_PAGE_MASK);

if(pte->accessed)
{
    paging_invalidate_page(vaddr & SMALL_PAGE_MASK);
}
Apparently invlpg behaves poorly if trying to invalidate a page that hasn't actually been accessed.
I haven't been able to find anything in the literature that documents this nuance, and I was wondering if anyone else has run into the issue.

Also, I'll leave this here if anyone else does run into it.

Re: Page Fault after invalidating non-accessed pages

Posted: Tue Jun 06, 2017 8:35 pm
by goku420
Do you find it more likely that you discovered an undocumented feature that's trivial enough to be independently discovered or that there is a bigger bug in your program and this "fix" made it work by coincidence?

At least give us a rundown of your debugging process (what your initial problem was and how you came to the conclusion that X was the problem and Y was the fix) and the value of SMALL_PAGE_MASK.

FWIW I could not reproduce this.

Re: Page Fault after invalidating non-accessed pages

Posted: Tue Jun 06, 2017 8:44 pm
by MintyFresh
goku420 wrote: Do you find it more likely that you discovered an undocumented feature that's trivial enough to be independently discovered or that there is a bigger bug in your program and this "fix" made it work by coincidence?

At least give us a rundown of your debugging process (what your initial problem was and how you came to the conclusion that X was the problem and Y was the fix) and the value of SMALL_PAGE_MASK.

FWIW I could not reproduce this.
SMALL_PAGE_MASK is just 0xFFFFF000, used to mask away the bottom 12 bits (flags) from a physical address.
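To make the mask arithmetic concrete, here is a minimal sketch of how a raw page table entry could be composed with that mask. It assumes plain 32-bit, two-level paging without PAE; pte_compose, pd_index and pt_index are hypothetical helpers for illustration, not functions from the code discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE_MASK 0xFFFFF000u /* value given in the thread */

/* Hypothetical helper: combine a frame address and flag bits into a raw PTE.
 * Upper 20 bits hold the frame address, lower 12 bits hold the flags. */
static uint32_t pte_compose(uint32_t paddr, uint32_t flags)
{
    return (paddr & SMALL_PAGE_MASK) | (flags & ~SMALL_PAGE_MASK);
}

/* Assumption: two-level 32-bit paging (10-bit directory index,
 * 10-bit table index, 12-bit page offset). */
static uint32_t pd_index(uint32_t vaddr) { return vaddr >> 22; }
static uint32_t pt_index(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }

/* Usage example, including the test address used later in the thread. */
static void pte_example(void)
{
    /* Low 12 bits of the physical address are dropped, flags are kept. */
    assert(pte_compose(0x00123ABCu, 0x003u) == 0x00123003u);
    /* 0xD0000000 falls in directory slot 832, table slot 0. */
    assert(pd_index(0xD0000000u) == 832u);
    assert(pt_index(0xD0000000u) == 0u);
}
```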
My debugging process consisted of testing this across several VMs and 2 physical machines (both Dell, P4 era). It only occurred on the physical hardware.

I debugged this by comparing the loaded pages between the VM and the hardware. I found no discrepancy.
There were no notable differences in the values that were loaded into registers, and the error code included in the interrupt was 0 (no flags set).
Finally, I set up a minimal test case of mapping an available physical address to 0xD0000000, calling invlpg, and then reading from it (which caused the mentioned fault).

What made me suspect that invlpg was the cause was the unit test that was failing. I have 2 tests that call paging_remap_page(), but one of them did so without accessing the mapped page first.
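
For reference, a page fault error code of 0 is itself informative: it means a supervisor-mode read hit a page the CPU considered not present. Below is a minimal sketch of the decoding, based on the x86 page fault error code bit layout; struct pf_info and pf_decode are hypothetical names, not part of the code discussed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the x86 page fault error code. */
struct pf_info {
    bool protection_violation; /* bit 0: 0 = page not present, 1 = protection violation */
    bool write;                /* bit 1: 0 = read access, 1 = write access */
    bool user;                 /* bit 2: 0 = supervisor mode, 1 = user mode */
    bool reserved_bit;         /* bit 3: reserved bit set in a paging structure */
    bool instruction_fetch;    /* bit 4: fault was caused by an instruction fetch */
};

static struct pf_info pf_decode(uint32_t err)
{
    struct pf_info info;
    info.protection_violation = (err & 0x01u) != 0;
    info.write                = (err & 0x02u) != 0;
    info.user                 = (err & 0x04u) != 0;
    info.reserved_bit         = (err & 0x08u) != 0;
    info.instruction_fetch    = (err & 0x10u) != 0;
    return info;
}

/* Usage example: the error code 0 reported above. */
static void pf_example(void)
{
    struct pf_info info = pf_decode(0);
    assert(!info.protection_violation); /* page was seen as not present */
    assert(!info.write);                /* the access was a read */
    assert(!info.user);                 /* it happened in supervisor mode */
}
```

Read this way, the fault says the translation the CPU used had its present bit clear, which is worth comparing against the actual entry in memory at the time of the fault.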

Re: Page Fault after invalidating non-accessed pages

Posted: Tue Jun 06, 2017 9:03 pm
by BrightLight
Without seeing more of your code, I can't guess. But INVLPG should work with both accessed and un-accessed pages, and it does in my OS. You shouldn't check if the page has been accessed; that is not a fix and will only cause problems on other hardware. You should invalidate the TLB entry every time you modify a page directory or page table entry. Since you mention the error only occurred on physical hardware, this is far more likely a caching problem (just guessing).
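
As a rough illustration of the unconditional approach, here is a minimal sketch that rewrites an entry and then always invalidates, without looking at the accessed bit. It treats the PTE as a raw 32-bit value, which is an assumption since the real pte structure was not posted. The invalidate hook and record_invalidate are hypothetical stand-ins so the sketch can run outside a kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE_MASK 0xFFFFF000u /* value given in the thread */

/* Hypothetical hook. In a 32-bit x86 kernel built with GCC or Clang it
 * would typically wrap something like:
 *   __asm__ volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
 */
typedef void (*invalidate_fn)(uint32_t vaddr);

/* Stand-in that records calls, so the logic can be checked in user space. */
static uint32_t last_invalidated;
static unsigned invalidate_count;

static void record_invalidate(uint32_t vaddr)
{
    last_invalidated = vaddr;
    invalidate_count++;
}

/* Sketch: point an existing entry at a new frame, keep its flags, and
 * invalidate the page unconditionally afterwards. */
static void remap_page(volatile uint32_t *pte, uint32_t paddr,
                       uint32_t vaddr, invalidate_fn invalidate)
{
    uint32_t flags = *pte & ~SMALL_PAGE_MASK;

    *pte = (paddr & SMALL_PAGE_MASK) | flags;
    invalidate(vaddr & SMALL_PAGE_MASK); /* no accessed-bit check */
}

/* Usage example: the accessed bit (0x20) is clear, yet the hook still runs. */
static void remap_example(void)
{
    uint32_t pte = 0x00123003u; /* present + writable, not accessed */

    invalidate_count = 0;
    remap_page(&pte, 0x00456000u, 0xD0000123u, record_invalidate);
    assert(pte == 0x00456003u);
    assert(invalidate_count == 1);
    assert(last_invalidated == 0xD0000000u);
}
```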

Tell us more: what is the error code of the page fault? What is in CR0, CR2 and CR4?