
Long mode: page fault while getting to ring 3

Posted: Tue Feb 16, 2016 3:06 am
by lodo1995
Hi,
I'm trying to get to ring 3 with this code:


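asm volatile(
    "mov $0x23, %%ax \n"        // 0x23 = user data segment (GDT offset 0x20) | RPL 3
    "mov %%ax, %%ds \n"
    "mov %%ax, %%es \n"
    "mov %%ax, %%fs \n"
    "mov %%ax, %%gs \n"
    "pushq $0x23 \n"            // SS: user data segment
    "pushq $4088 \n"            // RSP: just push a random (but valid) user stack... we won't use it...
    "pushfq \n"                 // RFLAGS: the current flags
    "pushq $0x1b \n"            // CS: 0x1b = user code segment (GDT offset 0x18) | RPL 3
    "pushq %0 \n"               // RIP: this usermode rip will be 0x80
    "iretq \n"
    :: "r"(ring3_addr)          // "r": a memory operand could be %rsp-relative and go stale after the pushes
    : "rax", "memory"           // %ax is overwritten above, so %0 must not be allocated to %rax
    );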
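For reference, the five pushes above build the frame that `iretq` pops in 64-bit mode. Below is a minimal sketch, assuming 8-byte stack slots, that spells out that layout as a C struct together with a selector helper; `make_selector` and `struct iretq_frame` are hypothetical names for illustration, not part of the original code. It shows why 0x1b and 0x23 are simply the GDT entries at offsets 0x18 and 0x20 with RPL 3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a segment selector is the descriptor's GDT index
 * shifted left by 3, with the table-indicator bit (bit 2) clear for the
 * GDT and the requested privilege level (RPL) in bits 0-1. */
static inline uint16_t make_selector(unsigned gdt_index, unsigned rpl)
{
    return (uint16_t)((gdt_index << 3) | (rpl & 3u));
}

/* Frame popped by a 64-bit iretq, lowest address first: RSP points at
 * rip when iretq executes. The inline asm pushes the same five values
 * in the opposite order (SS first, RIP last). */
struct iretq_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

_Static_assert(offsetof(struct iretq_frame, rip) == 0, "RIP is popped first");
_Static_assert(offsetof(struct iretq_frame, ss) == 32, "SS is popped last");
_Static_assert(sizeof(struct iretq_frame) == 40, "five 8-byte slots");
```

With this helper, `make_selector(3, 3)` is 0x1b (user code) and `make_selector(4, 3)` is 0x23 (user data).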
Running the same code, unchanged, on VirtualBox (which should be a "controlled" environment) gives me a page fault about 50% of the time; the other 50% of the time it works perfectly.
This is what my page fault handler reports:
  • CR2 says the faulting address is 0x80, and the error code says this address is present in the paging hierarchy (so the fault should be an access-rights violation);
  • the faulting RIP is also 0x80, an address that holds a jump to itself (so no data access), yet the error code says the fault was NOT caused by an instruction fetch (quite confusing, given that fetching that instruction is the only memory access involved);
  • the error code says the fault happened in user mode, but the VirtualBox debugger says that page 0x0 is user-accessible, as are its PD, PDPT and PML4 entries, so this should not be a privilege violation;
  • the error code says the fault happened during a read (indeed, there is no write instruction near the faulting address, and the page is write-protected);
  • the error code says no other condition caused the fault (protection keys, SGX, ...).
Do you have any idea what could cause this? How can this fault happen only some of the time? I really can't think of an explanation...

Thanks in advance.

Re: Long mode: page fault while getting to ring 3

Posted: Tue Feb 16, 2016 1:15 pm
by BrightLight
First, the error code only marks a fault as an "instruction fetch" when NX (EFER.NXE) or SMEP is enabled, and I assume you don't have NX enabled.
Also, maybe your page has some reserved bits set, and that's why the page fault occurs.
What's in your PML4? What's in your PD? Are you using 4 KB or 2 MB pages? What's in your page directory/page table entries? What's in your GDT? Are interrupts enabled?

Re: Long mode: page fault while getting to ring 3

Posted: Wed Feb 17, 2016 4:16 am
by lodo1995
Thank you for your time.

This is my GDT (as shown by the VirtualBox debugger):


0008 CodeER Bas=00000000 Lim=00000000 DPL=0 P  A        AVL=0 L=1
0010 DataRW Bas=00000000 Lim=00000000 DPL=0 P  A        AVL=0 L=0
0018 CodeER Bas=00000000 Lim=00000000 DPL=3 P  A        AVL=0 L=1
0020 DataRW Bas=00000000 Lim=00000000 DPL=3 P  A        AVL=0 L=0
0028 Tss64B Bas=ffff8000001e6038 Lim=00000067 DPL=0 P  B         AVL=0 R=0
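The dump above matches flat long-mode descriptors. As a minimal sketch (the `GDT_ENTRY` macro and `example_gdt` table are hypothetical, and the 16-byte TSS descriptor at 0x28 is left out), this is one way the four code/data entries could be encoded: base and limit are zero, the access byte carries P, DPL, S and the type, and only the code segments set the L bit. The CPU sets the accessed bit itself, which is why the dump also shows A.

```c
#include <stdint.h>

/* Hypothetical macro: a flat code/data descriptor with base 0 and limit 0
 * (64-bit mode does not check these for CS/DS/ES/SS).
 * access: bit 7 P, bits 5-6 DPL, bit 4 S (code/data), bits 0-3 type.
 * flags: descriptor bits 52-55, i.e. AVL, L, D/B, G. */
#define GDT_ENTRY(access, flags) \
    (((uint64_t)(access) << 40) | ((uint64_t)((flags) & 0xF) << 52))

#define ACC_P     0x80  /* present */
#define ACC_DPL3  0x60  /* descriptor privilege level 3 */
#define ACC_S     0x10  /* code or data (not a system descriptor) */
#define TYPE_CODE 0x0A  /* execute/read */
#define TYPE_DATA 0x02  /* read/write */
#define FLAG_L    0x2   /* 64-bit code segment (D/B must stay 0) */

static const uint64_t example_gdt[] = {
    0,                                                       /* 0x00: null descriptor */
    GDT_ENTRY(ACC_P | ACC_S | TYPE_CODE, FLAG_L),            /* 0x08: kernel code */
    GDT_ENTRY(ACC_P | ACC_S | TYPE_DATA, 0),                 /* 0x10: kernel data */
    GDT_ENTRY(ACC_P | ACC_DPL3 | ACC_S | TYPE_CODE, FLAG_L), /* 0x18: user code */
    GDT_ENTRY(ACC_P | ACC_DPL3 | ACC_S | TYPE_DATA, 0),      /* 0x20: user data */
    /* 0x28: the 16-byte TSS descriptor would follow here */
};
```

The resulting values are 0x00209A0000000000 and 0x0000920000000000 for the kernel segments, and 0x0020FA0000000000 and 0x0000F20000000000 for the user ones.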
My paging hierarchy (near address 0x80):


cr3=000000003e9d5000 A--:RAM:0001745d5000:0002fd5:-0000 Long Mode
                        P - Present
                        | R/W - Read (0) / Write (1)
                        | | U/S - User (1) / Supervisor (0)
                        | | | A - Accessed
                        | | | | D - Dirty
                        | | | | | G - Global
                        | | | | | | WT - Write thru
                        | | | | | | |  CD - Cache disable
                        | | | | | | |  |  AT - Attribute table (PAT)
                        | | | | | | |  |  |  NX - No execute (K8)
                        | | | | | | |  |  |  |  4K/4M/2M - Page size.
                        | | | | | | |  |  |  |  |  AVL - 3 available bits.
Address          Level  | | | | | | |  |  |  |  |  |    Page
0000000000000000 0 |    P R U A ? . -- -- .. -- .. 000  000000003e9d1000 A--:RAM:0001745d1000:0002fd1:-0000
0000000000000000 1  |   P R U A ? . -- -- .. -- .. 000  000000003e9d0000 A--:RAM:0001745d0000:0002fd0:-0000
0000000000000000 2   |  P R U A ? . -- -- .. -- .. 000  000000003e9cf000 A--:RAM:0001745cf000:0002fcf:-0000
0000000000000000 3    | P R U A - - -- -- -- -- 4K 000  0000000000000000 A--:RAM:00018ed52000:0000465:U003b
0000000000001000 3    | P R S A D - -- -- -- -- 4K 000  000000003e8a0000 A--:RAM:0001744a0000:0002ea0:-0000
Interrupts are disabled.
The page fault error code has the reserved-bit violation flag clear, so that's not the issue.
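The rule the CPU applies to a user-mode access under 4-level paging can be checked against these in-memory entries: every level must be present and user-accessible, and a write also needs R/W set at every level. This is a minimal sketch with hypothetical names (`pt_index`, `user_access_allowed`); it ignores NX, protection keys and SMAP/SMEP, and, like the debugger dump, it only sees the tables in RAM, not whatever the TLB has cached.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P  (1ULL << 0)  /* present */
#define PTE_RW (1ULL << 1)  /* writable */
#define PTE_US (1ULL << 2)  /* user-accessible */

/* Index into each 512-entry table for a 4 KiB mapping.
 * level 0 = PML4, 1 = PDPT, 2 = PD, 3 = PT. */
static unsigned pt_index(uint64_t va, int level)
{
    return (unsigned)((va >> (39 - 9 * level)) & 0x1FF);
}

/* User-mode check over the four entries (PML4E, PDPTE, PDE, PTE) that map
 * one address: all must be present and user-accessible, and for a write
 * all must also be writable. */
static bool user_access_allowed(const uint64_t entries[4], bool write)
{
    for (int i = 0; i < 4; i++) {
        if (!(entries[i] & PTE_P) || !(entries[i] & PTE_US))
            return false;
        if (write && !(entries[i] & PTE_RW))
            return false;
    }
    return true;
}
```

For 0x80 every index is 0, so the walk uses the first entry at each level. With the flags shown in the dump (P, R, U at all four levels), a user read is allowed and a user write is not, so the tables in RAM alone do not explain a read fault.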

Re: Long mode: page fault while getting to ring 3

Posted: Wed Feb 17, 2016 6:09 am
by iansjack
Just to check, what exactly is the error code? Does the VirtualBox log have nothing to say about the error?

Re: Long mode: page fault while getting to ring 3

Posted: Wed Feb 17, 2016 6:18 am
by lodo1995
iansjack wrote: Just to check, what exactly is the error code? Does the VirtualBox log have nothing to say about the error?
The error code is 5: bit 2 means "fault in user mode" and bit 0 means "the faulting page was present in the paging hierarchy" (i.e. a protection violation, not a missing page).
The VirtualBox log does not seem to contain anything relevant (which I think is expected, since page faults are a "normal" event for an OS).
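Decoding the error code mechanically makes the list in the first post easy to double-check. A minimal sketch follows; the bit positions are the ones documented in the Intel SDM (only a subset is shown), while `describe_pf` and the `PF_*` names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Subset of the page-fault (#PF) error-code bits from the Intel SDM. */
enum {
    PF_PRESENT = 1u << 0,  /* 1: protection violation on a present page; 0: not-present page */
    PF_WRITE   = 1u << 1,  /* 1: write access; 0: read access */
    PF_USER    = 1u << 2,  /* 1: access made in user mode (CPL 3) */
    PF_RSVD    = 1u << 3,  /* 1: reserved bit set in a paging-structure entry */
    PF_INSTR   = 1u << 4,  /* 1: instruction fetch (only reported with NX or SMEP enabled) */
    PF_PK      = 1u << 5,  /* 1: protection-key violation */
    PF_SGX     = 1u << 15, /* 1: SGX-specific access-control violation */
};

/* Hypothetical helper: write a short description of err into buf and
 * return snprintf's result (the untruncated length). */
static int describe_pf(uint32_t err, char *buf, size_t len)
{
    return snprintf(buf, len, "%s %s in %s mode%s%s%s%s",
                    (err & PF_PRESENT) ? "protection violation" : "not-present page",
                    (err & PF_WRITE) ? "on write" : "on read",
                    (err & PF_USER) ? "user" : "supervisor",
                    (err & PF_RSVD) ? ", reserved bit set" : "",
                    (err & PF_INSTR) ? ", instruction fetch" : "",
                    (err & PF_PK) ? ", protection key" : "",
                    (err & PF_SGX) ? ", SGX" : "");
}
```

With the value reported here, `describe_pf(5, buf, sizeof buf)` writes "protection violation on read in user mode".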

Re: Long mode: page fault while getting to ring 3

Posted: Wed Feb 17, 2016 6:29 am
by iansjack
All I can think of, from the information we have, is that the fault is caused by the invalid stack. (It is not in a writable page, and a stack clearly needs to be writable.) That your program doesn't use this stack is irrelevant; its values are still loaded into the appropriate registers.

Re: Long mode: page fault while getting to ring 3

Posted: Wed Feb 17, 2016 6:38 am
by lodo1995
iansjack wrote: All I can think of, from the information we have, is that the fault is caused by the invalid stack. (It is not in a writable page, and a stack clearly needs to be writable.) That your program doesn't use this stack is irrelevant; its values are still loaded into the appropriate registers.
I tried making page 0 (which contains both the faulting instruction and the unused user-mode stack pointer) writable, but the fault still happens sometimes.

Re: Long mode: page fault while getting to ring 3

Posted: Wed Feb 17, 2016 7:08 am
by Combuster
Have you tried flushing the TLB (with INVLPG or something else)? Normally you'd map all memory as supervisor-only at first, and later change some of it to be user-accessible as well. Since this page holds the null-pointer address, the IVT and the BDA, the CPU could easily have cached the earlier "present, supervisor-only" translation in the TLB and be rejecting the access because of it, while all the usual inspections look at the page tables in RAM rather than the TLB, and so show you an alternative reality.
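A minimal sketch of that advice, assuming 4 KiB pages and a pointer to the PTE that maps the page: update the entry in RAM, then execute `invlpg` on the virtual address so the CPU cannot keep using a stale translation. `pte_with_rights` and `set_page_rights` are hypothetical names. `invlpg` is privileged, so `set_page_rights` only works in ring 0, and it only affects the current CPU's TLB; reloading CR3 is the coarser alternative that flushes all non-global entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P  (1ULL << 0)  /* present */
#define PTE_RW (1ULL << 1)  /* writable */
#define PTE_US (1ULL << 2)  /* user-accessible */

/* Pure part: compute the new PTE value with the requested U/S and R/W bits. */
static inline uint64_t pte_with_rights(uint64_t pte, bool user, bool writable)
{
    pte &= ~(PTE_US | PTE_RW);
    if (user)
        pte |= PTE_US;
    if (writable)
        pte |= PTE_RW;
    return pte;
}

/* Drop any cached translation for one virtual address (ring 0 only). */
static inline void invlpg(uintptr_t va)
{
    __asm__ volatile("invlpg (%0)" :: "r"(va) : "memory");
}

/* Hypothetical rights-changing helper: write the PTE, then invalidate the
 * TLB entry; without the invlpg the old supervisor-only translation may
 * still be used. */
static inline void set_page_rights(volatile uint64_t *pte, uintptr_t va,
                                   bool user, bool writable)
{
    *pte = pte_with_rights(*pte, user, writable);
    invlpg(va);
}
```

Keeping the bit manipulation in a separate pure function means it can be tested outside the kernel, while the `invlpg` step stays in one place.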

Re: Long mode: page fault while getting to ring 3

Posted: Wed Feb 17, 2016 7:28 am
by lodo1995
Combuster wrote: Have you tried flushing the TLB (with INVLPG or something else)? Normally you'd map all memory as supervisor-only at first, and later change some of it to be user-accessible as well. Since this page holds the null-pointer address, the IVT and the BDA, the CPU could easily have cached the earlier "present, supervisor-only" translation in the TLB and be rejecting the access because of it, while all the usual inspections look at the page tables in RAM rather than the TLB, and so show you an alternative reality.
Thank you very much, Combuster! For some reason I forgot to put the invlpg instruction in my set_memory_rights function! I'm such an idiot!

Thank you again to everybody for your time!