Page Fault after invalidating non-accessed pages

MintyFresh
Posts: 5
Joined: Mon Jun 05, 2017 6:09 am

Post by MintyFresh »

I ran into a very bizarre issue last night. It was a bit tricky to debug, but eventually I tracked it down.
It was coming from my paging_remap_page() function, which caused a page fault even though a physical address was properly mapped to the faulting address.

The problem was caused by this:

Code:

pte->address = (paddr & SMALL_PAGE_MASK) | (pte->flags & ~SMALL_PAGE_MASK);
paging_invalidate_page(vaddr & SMALL_PAGE_MASK);

The fix was an incredibly minor change:

Code:

pte->address = (paddr & SMALL_PAGE_MASK) | (pte->flags & ~SMALL_PAGE_MASK);

if(pte->accessed)
{
    paging_invalidate_page(vaddr & SMALL_PAGE_MASK);
}

Apparently invlpg behaves poorly when asked to invalidate a page that hasn't actually been accessed.
I haven't been able to find anything in the literature that documents this nuance, and I was wondering whether anyone else has run into the issue.

Also, I'll leave this here if anyone else does run into it.
goku420
Member
Posts: 51
Joined: Wed Jul 10, 2013 9:11 am

Re: Page Fault after invalidating non-accessed pages

Post by goku420 »

Do you find it more likely that you discovered an undocumented feature that's trivial enough to be independently discovered or that there is a bigger bug in your program and this "fix" made it work by coincidence?

At least give us a rundown of your debugging process (what your initial problem was and how you came to the conclusion that X was the problem and Y was the fix) and the value of SMALL_PAGE_MASK.

FWIW I could not reproduce this.
MintyFresh
Posts: 5
Joined: Mon Jun 05, 2017 6:09 am

Re: Page Fault after invalidating non-accessed pages

Post by MintyFresh »

SMALL_PAGE_MASK is just 0xFFFFF000, used to mask away the bottom 12 bits (flags) from a physical address.
My debugging process consisted of testing this across several VMs and two physical machines (both Dell, P4 era). It only occurred on the physical hardware.
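
For reference, the masking described above can be sketched as follows. This is only an illustration, assuming 32-bit non-PAE entries with 4 KiB pages; `pte_compose`, `pte_frame`, `pte_flags` and `pte_example` are hypothetical names, not functions from the original code.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_PAGE_MASK 0xFFFFF000u  /* value given in the thread */

/* Hypothetical helper: keep the frame bits of paddr and the low 12 flag bits. */
static inline uint32_t pte_compose(uint32_t paddr, uint32_t flags)
{
    return (paddr & SMALL_PAGE_MASK) | (flags & ~SMALL_PAGE_MASK);
}

/* Hypothetical helpers: split an entry back into its two parts. */
static inline uint32_t pte_frame(uint32_t pte) { return pte & SMALL_PAGE_MASK; }
static inline uint32_t pte_flags(uint32_t pte) { return pte & ~SMALL_PAGE_MASK; }

/* Usage example: the offset bits of the physical address are discarded. */
static inline void pte_example(void)
{
    uint32_t pte = pte_compose(0x12345678u, 0x003u);  /* present + writable */
    assert(pte_frame(pte) == 0x12345000u);
    assert(pte_flags(pte) == 0x003u);
}
```

Note that the low 12 bits of the physical address are dropped, so only page-aligned frames can be represented.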

I debugged this by comparing the loaded pages between the VM and the hardware, and found no discrepancy.
There were no notable differences in the values loaded into the registers, and the error code pushed with the interrupt was 0 (no flags set).
Finally, I set up a minimal test case of mapping an available physical address to 0xD0000000, calling invlpg, and then reading from it (which caused the mentioned fault).
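
An error code of 0 can be decoded bit by bit. The sketch below follows the x86 page-fault error code layout; `pf_decode`, `struct pf_info` and `pf_example` are hypothetical names. With every bit clear, the CPU is reporting a supervisor-mode read of a page it considers not present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits, pushed by the CPU on #PF. */
#define PF_PRESENT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE    (1u << 1)  /* 0: read access, 1: write access */
#define PF_USER     (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_RESERVED (1u << 3)  /* reserved bit set in a paging structure */
#define PF_FETCH    (1u << 4)  /* fault caused by an instruction fetch */

/* Hypothetical decoded form of the error code. */
struct pf_info {
    bool not_present;
    bool write;
    bool user;
    bool reserved_bit;
    bool instr_fetch;
};

static inline struct pf_info pf_decode(uint32_t err)
{
    struct pf_info info;
    info.not_present  = (err & PF_PRESENT) == 0;
    info.write        = (err & PF_WRITE) != 0;
    info.user         = (err & PF_USER) != 0;
    info.reserved_bit = (err & PF_RESERVED) != 0;
    info.instr_fetch  = (err & PF_FETCH) != 0;
    return info;
}

/* Usage example: the error code reported in this thread. */
static inline void pf_example(void)
{
    struct pf_info info = pf_decode(0);
    assert(info.not_present && !info.write && !info.user);
}
```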

What made me suspect invlpg was the unit test that was failing. I have two tests that call paging_remap_page, but one of them does so without accessing the mapped page first.
BrightLight
Member
Posts: 901
Joined: Sat Dec 27, 2014 9:11 am
Location: Maadi, Cairo, Egypt

Re: Page Fault after invalidating non-accessed pages

Post by BrightLight »

Without seeing more of your code, I can't guess. But INVLPG should work with both accessed and un-accessed pages, and it does in my OS. You shouldn't check whether the page has been accessed; that is not a fix and will only cause problems on other hardware. You should invalidate the TLB entry every time you modify a page directory or page table entry. Since you mention the error only occurred on physical hardware, this is far more likely a caching problem (just guessing).
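
A minimal sketch of the unconditional approach, assuming 32-bit non-PAE entries and GCC-style inline assembly. `remap_page`, `tlb_invalidate_page` and the `KERNEL_BUILD` macro are hypothetical names, not taken from the original poster's code; without `KERNEL_BUILD` a hosted stub records the call so the logic can be exercised outside ring 0.

```c
#include <stdint.h>

#define SMALL_PAGE_MASK 0xFFFFF000u  /* value given in the thread */

#if defined(KERNEL_BUILD) && (defined(__i386__) || defined(__x86_64__))
/* Real invalidation: invlpg is privileged, so this only works in ring 0. */
static inline void tlb_invalidate_page(uintptr_t vaddr)
{
    __asm__ volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
}
#else
#include <assert.h>  /* hosted checks only */
/* Hosted stub: remember what would have been invalidated. */
static uintptr_t last_invalidated;
static unsigned invalidate_count;
static inline void tlb_invalidate_page(uintptr_t vaddr)
{
    assert((vaddr & 0xFFFu) == 0);  /* caller passes a page-aligned address */
    last_invalidated = vaddr;
    invalidate_count++;
}
#endif

/* Point an existing entry at a new frame, keeping its flags, then always
 * invalidate the TLB entry, whether or not the accessed bit is set. */
static inline void remap_page(volatile uint32_t *pte, uint32_t paddr,
                              uintptr_t vaddr)
{
    *pte = (paddr & SMALL_PAGE_MASK) | (*pte & ~SMALL_PAGE_MASK);
    tlb_invalidate_page(vaddr & ~(uintptr_t)0xFFFu);
}
```

The invalidation is issued after the entry is written, so the next access to the page has to walk the updated tables.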

Tell us more: what is the error code of the page fault? What is in CR0, CR2 and CR4?
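
For anyone collecting this information, here is a rough sketch of which control-register bits are relevant to paging. The bit positions are the architectural ones; `paging_state_decode`, `struct paging_state` and `paging_state_example` are hypothetical names, and the register values are assumed to have been read elsewhere (reading them requires ring 0). CR2 needs no decoding, since it simply holds the faulting linear address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_WP  (1u << 16)  /* supervisor writes honour read-only pages */
#define CR0_CD  (1u << 30)  /* cache disable */
#define CR0_PG  (1u << 31)  /* paging enabled */
#define CR4_PSE (1u << 4)   /* 4 MiB pages allowed */
#define CR4_PAE (1u << 5)   /* physical address extension */
#define CR4_PGE (1u << 7)   /* global pages enabled */

/* Hypothetical summary of the paging-related control bits. */
struct paging_state {
    bool paging;
    bool write_protect;
    bool cache_disabled;
    bool pse;
    bool pae;
    bool global_pages;
};

static inline struct paging_state paging_state_decode(uint32_t cr0, uint32_t cr4)
{
    struct paging_state s;
    s.paging         = (cr0 & CR0_PG) != 0;
    s.write_protect  = (cr0 & CR0_WP) != 0;
    s.cache_disabled = (cr0 & CR0_CD) != 0;
    s.pse            = (cr4 & CR4_PSE) != 0;
    s.pae            = (cr4 & CR4_PAE) != 0;
    s.global_pages   = (cr4 & CR4_PGE) != 0;
    return s;
}

/* Usage example with made-up register values. */
static inline void paging_state_example(void)
{
    struct paging_state s = paging_state_decode(0x80010011u, 0x00000090u);
    assert(s.paging && s.write_protect && s.pse && s.global_pages);
    assert(!s.cache_disabled && !s.pae);
}
```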