Long mode: page fault while getting to ring 3

lodo1995
Posts: 16
Joined: Thu Nov 12, 2015 6:31 am

Long mode: page fault while getting to ring 3

Post by lodo1995 »

Hi,
I'm trying to get to ring 3 with this code:

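Code:

asm volatile(
    "mov $0x23, %%ax \n"        // 0x20 is a usermode data segment (RPL 3)
    "mov %%ax, %%ds \n"
    "mov %%ax, %%es \n"
    "mov %%ax, %%fs \n"
    "mov %%ax, %%gs \n"
    "pushq $0x23 \n"            // ss: usermode data segment
    "pushq $4088 \n"            // rsp: just push a random (but valid) user stack... we won't use it...
    "pushfq \n"                 // rflags
    "pushq $0x1b \n"            // cs: 0x18 is a usermode code segment (RPL 3)
    "pushq %0 \n"               // rip: this usermode rip will be 0x80
    "iretq \n"
    :: "r"(ring3_addr)          // 64-bit register operand; a stack-relative memory operand would be off after the pushes
    : "rax", "memory"           // ax is overwritten above, so the compiler must not keep %0 in rax
    );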
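For reference, the selector values in the snippet above follow the usual x86 layout: GDT index in bits 3 and up, table indicator in bit 2, requested privilege level in bits 0-1. A minimal sketch of that arithmetic, where `make_selector` is a hypothetical helper name used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a segment selector from its three fields.
 * index = descriptor slot in the table, ti = 0 for GDT / 1 for LDT,
 * rpl = requested privilege level (0..3). */
static inline uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

With this, `make_selector(4, 0, 3)` is 0x23 (the data selector loaded into ds/es/fs/gs/ss) and `make_selector(3, 0, 3)` is 0x1b (the code selector pushed for cs).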
Running this code unchanged on VirtualBox (which should be a "controlled" environment) gives me a page fault about 50% of the time; the other 50% of the time it works perfectly.
This is the information my page fault handler gives me:
  • cr2 says that the faulting address is 0x80, and the error code says that this address is mapped in the hierarchy (so the fault should be an access rights violation);
  • the faulting rip is also 0x80, an address that contains a jump to itself (so no data access), and the error code says that the fault was NOT caused by an instruction fetch (quite confusing, given that the only memory access in this operation is the fetch itself);
  • the error code says that the fault happened in usermode, but the VirtualBox debugger says that page 0x0 is usermode accessible, as are its PD, PDPT and PML4 entries, so the error should not be a privilege violation;
  • the error code says that the fault happened during a read (in fact, there is no write instruction around the faulting address, and the page is write-protected);
  • the error code says that no other condition caused the fault (protection key, SGX, ...).
Do you have any idea what could cause this issue? How is it possible that this fault happens only sometimes? I really can't imagine a possible explanation...

Thanks in advance.
BrightLight
Member
Posts: 901
Joined: Sat Dec 27, 2014 9:11 am
Location: Maadi, Cairo, Egypt

Re: Long mode: page fault while getting to ring 3

Post by BrightLight »

First, the error code only flags a page fault as an "instruction fetch" when NX (or SMEP) is enabled, and I assume you don't have NX enabled.
Also, maybe your page table entries have some reserved bits set, and that's why the page fault occurs.
What's in your PML4? What's in your PD? Are you using 4 KB or 2 MB pages? What's in your page directory/page table entries? What's in your GDT? Are interrupts enabled?
lodo1995
Posts: 16
Joined: Thu Nov 12, 2015 6:31 am

Re: Long mode: page fault while getting to ring 3

Post by lodo1995 »

Thank you for your time.

This is my GDT (thanks to the VirtualBox debugger):

Code:

0008 CodeER Bas=00000000 Lim=00000000 DPL=0 P  A        AVL=0 L=1
0010 DataRW Bas=00000000 Lim=00000000 DPL=0 P  A        AVL=0 L=0
0018 CodeER Bas=00000000 Lim=00000000 DPL=3 P  A        AVL=0 L=1
0020 DataRW Bas=00000000 Lim=00000000 DPL=3 P  A        AVL=0 L=0
0028 Tss64B Bas=ffff8000001e6038 Lim=00000067 DPL=0 P  B         AVL=0 R=0
My paging hierarchy (near address 0x80):

Code:

cr3=000000003e9d5000 A--:RAM:0001745d5000:0002fd5:-0000 Long Mode
                        P - Present
                        | R/W - Read (0) / Write (1)
                        | | U/S - User (1) / Supervisor (0)
                        | | | A - Accessed
                        | | | | D - Dirty
                        | | | | | G - Global
                        | | | | | | WT - Write thru
                        | | | | | | |  CD - Cache disable
                        | | | | | | |  |  AT - Attribute table (PAT)
                        | | | | | | |  |  |  NX - No execute (K8)
                        | | | | | | |  |  |  |  4K/4M/2M - Page size.
                        | | | | | | |  |  |  |  |  AVL - 3 available bits.
Address          Level  | | | | | | |  |  |  |  |  |    Page
0000000000000000 0 |    P R U A ? . -- -- .. -- .. 000  000000003e9d1000 A--:RAM:0001745d1000:0002fd1:-0000
0000000000000000 1  |   P R U A ? . -- -- .. -- .. 000  000000003e9d0000 A--:RAM:0001745d0000:0002fd0:-0000
0000000000000000 2   |  P R U A ? . -- -- .. -- .. 000  000000003e9cf000 A--:RAM:0001745cf000:0002fcf:-0000
0000000000000000 3    | P R U A - - -- -- -- -- 4K 000  0000000000000000 A--:RAM:00018ed52000:0000465:U003b
0000000000001000 3    | P R S A D - -- -- -- -- 4K 000  000000003e8a0000 A--:RAM:0001744a0000:0002ea0:-0000
Interrupts are disabled.
The page fault error code has the reserved-bit violation bit clear, so that's not the issue.
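As a cross-check of the dump above, here is a minimal sketch of how the rights granted by the in-memory tables combine: user access needs U/S set at every level of the walk, and write access needs R/W set at every level as well. The helper names are hypothetical, and this only reflects what the tables in RAM say, not any translation the CPU may have cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PG_PRESENT (1ULL << 0)
#define PG_WRITE   (1ULL << 1)
#define PG_USER    (1ULL << 2)

/* AND together the P, R/W and U/S bits of every entry on the walk
 * (PML4E, PDPTE, PDE, PTE for a 4 KB page). */
static inline uint64_t effective_flags(const uint64_t *entries, size_t levels)
{
    assert(levels > 0);
    uint64_t flags = PG_PRESENT | PG_WRITE | PG_USER;
    for (size_t i = 0; i < levels; i++)
        flags &= entries[i];
    return flags;
}

/* True if ring 3 may read (and fetch from) the page, per the tables. */
static inline bool user_can_read(const uint64_t *entries, size_t levels)
{
    uint64_t f = effective_flags(entries, levels);
    return (f & PG_PRESENT) && (f & PG_USER);
}

/* True if ring 3 may also write to the page, per the tables. */
static inline bool user_can_write(const uint64_t *entries, size_t levels)
{
    return user_can_read(entries, levels) &&
           (effective_flags(entries, levels) & PG_WRITE);
}
```

Given entries with P, U and A set at all four levels (as the dump shows for page 0), `user_can_read` returns true and `user_can_write` returns false, which matches what the debugger reports.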
iansjack
Member
Posts: 4706
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Long mode: page fault while getting to ring 3

Post by iansjack »

Just to check, what exactly is the error code? Does the VirtualBox log have nothing to say about the error?
lodo1995
Posts: 16
Joined: Thu Nov 12, 2015 6:31 am

Re: Long mode: page fault while getting to ring 3

Post by lodo1995 »

iansjack wrote: Just to check, what exactly is the error code? Does the VirtualBox log have nothing to say about the error?
The error code is 5 (bit 2 means "fault during user mode" and bit 0 means "faulting address was present in mapping hierarchy").
The VirtualBox log does not seem to contain any info (I think that's normal, given that page faults are a "normal" event for an OS).
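To make the bit assignments explicit, here is a minimal sketch of a decoder for the low bits of the x86 page fault error code. The struct and function names are hypothetical; the bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the low bits of a page fault error code. */
struct pf_info {
    bool present;   /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;     /* bit 1: 1 = write access, 0 = read access */
    bool user;      /* bit 2: 1 = fault happened in user mode */
    bool reserved;  /* bit 3: 1 = reserved bit set in a paging entry */
    bool fetch;     /* bit 4: 1 = instruction fetch (only reported with NX or SMEP enabled) */
};

static inline struct pf_info decode_pf_error(uint64_t err)
{
    struct pf_info info = {
        .present  = (err >> 0) & 1,
        .write    = (err >> 1) & 1,
        .user     = (err >> 2) & 1,
        .reserved = (err >> 3) & 1,
        .fetch    = (err >> 4) & 1,
    };
    return info;
}
```

For an error code of 5 this gives present and user set, with write, reserved and fetch clear: a user-mode read of a mapped page that was rejected on access rights.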
iansjack
Member
Posts: 4706
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Long mode: page fault while getting to ring 3

Post by iansjack »

All I can think of, from the information that we have, is that the fault is caused by the invalid stack. (It is not in a writable page, and clearly a stack needs to be writable.) That your program doesn't use this stack is irrelevant; its values are still loaded into the appropriate registers.
lodo1995
Posts: 16
Joined: Thu Nov 12, 2015 6:31 am

Re: Long mode: page fault while getting to ring 3

Post by lodo1995 »

iansjack wrote: All I can think of, from the information that we have, is that the fault is caused by the invalid stack. (It is not in a writable page, and clearly a stack needs to be writable.) That your program doesn't use this stack is irrelevant; its values are still loaded into the appropriate registers.
I tried to set page 0 (that contains both the faulting instruction and the unused usermode stack pointer) writable, but the fault still happens sometimes.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Long mode: page fault while getting to ring 3

Post by Combuster »

Have you tried flushing the TLB (using INVLPG or something else)? Normally you'd map all the memory as supervisor-only initially, and then modify the pages to be user-accessible as well. Since this is the page holding the null pointer address, the IVT and the BDA, the CPU could easily have cached the earlier "available supervisor-only" entry in the TLB and be rejecting the access accordingly, while all the default inspection tools will of course look at RAM rather than the TLB and show you an alternative reality.
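A minimal sketch of the pattern described above, assuming GCC-style inline assembly on x86-64 and a caller that already has a pointer to the final page table entry. The signature of `set_memory_rights` and the helper `pte_with_rights` are assumptions made for illustration, not the code from this thread.

```c
#include <stdint.h>

#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)

/* Invalidate the TLB entry for one virtual address (privileged: ring 0 only). */
static inline void invlpg(const void *vaddr)
{
    __asm__ volatile("invlpg (%0)" :: "r"(vaddr) : "memory");
}

/* Pure helper: return the entry with its R/W and U/S bits replaced. */
static inline uint64_t pte_with_rights(uint64_t pte, int writable, int user)
{
    pte &= ~(PTE_WRITABLE | PTE_USER);
    if (writable)
        pte |= PTE_WRITABLE;
    if (user)
        pte |= PTE_USER;
    return pte;
}

/* Hypothetical signature: update the entry in RAM, then drop any stale
 * cached translation so the CPU re-reads the new rights on next access. */
static inline void set_memory_rights(volatile uint64_t *pte, const void *vaddr,
                                     int writable, int user)
{
    *pte = pte_with_rights(*pte, writable, user);
    invlpg(vaddr);
}
```

Without the final `invlpg`, the entry in RAM and the cached translation can disagree, which is the situation described above. On a multiprocessor system the other cores would need their own invalidation too, which this sketch does not cover.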
lodo1995
Posts: 16
Joined: Thu Nov 12, 2015 6:31 am

Re: Long mode: page fault while getting to ring 3

Post by lodo1995 »

Combuster wrote: Have you tried flushing the TLB (using INVLPG or something else)? Normally you'd map all the memory as supervisor-only initially, and then modify the pages to be user-accessible as well. Since this is the page holding the null pointer address, the IVT and the BDA, the CPU could easily have cached the earlier "available supervisor-only" entry in the TLB and be rejecting the access accordingly, while all the default inspection tools will of course look at RAM rather than the TLB and show you an alternative reality.
Thank you very much, Combuster! For some reason I forgot to put the invlpg instruction in my set_memory_rights function! I'm such an idiot!

Thank you again to everybody for your time!