asm volatile(
"mov $0x23, %%ax \n" // 0x20 | RPL 3: usermode data segment
"mov %%ax, %%ds \n"
"mov %%ax, %%es \n"
"mov %%ax, %%fs \n"
"mov %%ax, %%gs \n"
"pushq $0x23 \n" // ss: usermode data segment
"pushq $4088 \n" // rsp: just push a random (but valid) user stack... we won't use it...
"pushfq \n" // rflags
"pushq $0x1b \n" // cs: 0x18 | RPL 3, usermode code segment
"pushq %0 \n" // rip: this usermode rip will be 0x80
"iretq \n"
:: "g"(ring3_addr)
: "rax", "memory" // ax is overwritten above, so the operand must not live in rax
);
Running the same, unmodified code on VirtualBox (which should be a "controlled" environment) gives me a page fault about 50% of the time and works perfectly the other 50%.
This is the information my page fault handler gives me:
cr2 says that the faulting address is 0x80, and the error code says that this address is mapped in the hierarchy (so the fault should be an access rights violation);
the faulting rip is also 0x80, an address that contains a jump to itself (so no memory access), and the error code says that the fault was NOT caused by an instruction fetch (quite confusing, given that the only memory access in this operation is the fetch itself);
the error code says that the fault happened in usermode, but the VirtualBox debugger says that page 0x0 is usermode accessible, as are its pd, pdpt and pml4 entries, so the error should not be a privilege violation;
the error code says that the fault happened during a read (in fact, there is no write instruction around the faulting address, and the page is write-protected);
the error code says that no other condition caused the fault (protection key, sgx, ...).
Do you have any idea of what could cause this issue? How is it possible that this fault happens only sometimes? I really can't imagine a possible explanation...
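For anyone following along, the "user bit set at every level" check described above can be sketched roughly like this. It is only an illustration: the names walk_allows and PG_* are made up here, and the entries are assumed to have already been read from the pml4, pdpt, pd and (for 4 KB pages) pt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits shared by PML4, PDPT, PD and PT entries on x86-64. */
#define PG_PRESENT (1ull << 0)
#define PG_WRITE   (1ull << 1)
#define PG_USER    (1ull << 2)

/* Returns true only if every entry of the walk is present and has `flag` set.
 * `levels` is 4 for a 4 KB page and 3 for a 2 MB page. */
static bool walk_allows(const uint64_t entries[], int levels, uint64_t flag)
{
    assert(entries != NULL && levels > 0);
    for (int i = 0; i < levels; i++) {
        if ((entries[i] & PG_PRESENT) == 0 || (entries[i] & flag) != flag)
            return false;
    }
    return true;
}
```

Note that this only inspects the tables in memory, which is also all a debugger can show. It says nothing about what the CPU has cached in the TLB.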
First, the "instruction fetch" bit of the page fault error code is only reported when NX (or SMEP) is enabled, and I assume you don't have NX enabled.
Also, maybe your page has some reserved bits set, and that's why the page fault occurs.
What's in your PML4? What's in your PD? 4 KB/2 MB pages? What's in your page directory/page entry? What's in your GDT? Are interrupts enabled?
0008 CodeER Bas=00000000 Lim=00000000 DPL=0 P A AVL=0 L=1
0010 DataRW Bas=00000000 Lim=00000000 DPL=0 P A AVL=0 L=0
0018 CodeER Bas=00000000 Lim=00000000 DPL=3 P A AVL=0 L=1
0020 DataRW Bas=00000000 Lim=00000000 DPL=3 P A AVL=0 L=0
0028 Tss64B Bas=ffff8000001e6038 Lim=00000067 DPL=0 P B AVL=0 R=0
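The selectors used in the iretq frame follow from this table: a selector is the GDT offset of the descriptor with the requested privilege level in the low two bits. A minimal sketch, where make_selector is a hypothetical helper and not part of the original code:

```c
#include <assert.h>
#include <stdint.h>

/* Builds a GDT selector from the descriptor's byte offset and the RPL.
 * Bit 2 (table indicator) stays 0, which selects the GDT. */
static uint16_t make_selector(uint16_t gdt_offset, unsigned rpl)
{
    assert((gdt_offset & 7) == 0 && rpl <= 3);
    return (uint16_t)(gdt_offset | rpl);
}
```

With the table above, make_selector(0x18, 3) gives 0x1b for the user code segment and make_selector(0x20, 3) gives 0x23 for the user data segment, matching the values pushed before iretq.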
iansjack wrote:Just to check, what exactly is the error code? Does the VirtualBox log have nothing to say about the error?
The error code is 5 (bit 2 means "fault during user mode" and bit 0 means "faulting address was present in mapping hierarchy").
The VirtualBox log does not seem to contain any relevant info (I think that's normal, given that page faults are a "normal" event for an OS).
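To make the decoding of error code 5 explicit, here is a small sketch of how the bits can be split up. The bit positions are the architectural ones. The names pf_info and pf_decode are just assumptions for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural page fault error code bits on x86-64. */
#define PF_PRESENT  (1u << 0) /* 1 = protection violation, 0 = page not present */
#define PF_WRITE    (1u << 1) /* 1 = write access, 0 = read access */
#define PF_USER     (1u << 2) /* 1 = access was made in user mode */
#define PF_RESERVED (1u << 3) /* 1 = reserved bit set in a paging entry */
#define PF_FETCH    (1u << 4) /* 1 = instruction fetch (needs NX or SMEP enabled) */

/* The value from this thread: present + user, everything else clear. */
static_assert((PF_PRESENT | PF_USER) == 5, "error code 5 = present + user");

struct pf_info {
    bool present, write, user, reserved, fetch;
};

static struct pf_info pf_decode(uint64_t err)
{
    struct pf_info info = {
        .present  = (err & PF_PRESENT) != 0,
        .write    = (err & PF_WRITE) != 0,
        .user     = (err & PF_USER) != 0,
        .reserved = (err & PF_RESERVED) != 0,
        .fetch    = (err & PF_FETCH) != 0,
    };
    return info;
}
```

So pf_decode(5) reports a user-mode read of a present page, with no reserved-bit or instruction-fetch indication.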
All I can think, from the information that we have, is that the fault is caused by the invalid stack. (It is not in a writable page, and clearly a stack needs to be writable.) That your program doesn't use this stack is irrelevant; its values are still loaded into the appropriate registers.
iansjack wrote:All I can think, from the information that we have, is that the fault is caused by the invalid stack. (It is not in a writable page, and clearly a stack needs to be writable.) That your program doesn't use this stack is irrelevant; its values are still loaded into the appropriate registers.
I tried making page 0 (which contains both the faulting instruction and the unused usermode stack pointer) writable, but the fault still happens sometimes.
Have you tried clearing the TLB (using INVLPG or something else)? Normally you'd map all the memory for supervisor initially, and then modify the entries to be user-writable as well. Since this is the page holding the null pointer address, the IVT and the BDA, the CPU could easily have cached the earlier "available supervisor-only" entry in the TLB and be rejecting the access accordingly, while all the default inspections will of course look at RAM and not the TLB and tell you an alternative reality.
Combuster wrote:Have you tried clearing the TLB (using INVLPG or something else)? Normally you'd map all the memory for supervisor initially, and then modify the entries to be user-writable as well. Since this is the page holding the null pointer address, the IVT and the BDA, the CPU could easily have cached the earlier "available supervisor-only" entry in the TLB and be rejecting the access accordingly, while all the default inspections will of course look at RAM and not the TLB and tell you an alternative reality.
Thank you very much, Combuster! For some reason I forgot to put the invlpg instruction in my set_memory_rights function! I'm such an idiot!
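For reference, the fix amounts to something like the sketch below. My real set_memory_rights looks different, so treat the signature, the PTE_* names and the flush callback as assumptions made for illustration. The callback lets the same logic be exercised outside the kernel, since INVLPG itself is a privileged instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (1ull << 0)
#define PTE_WRITE   (1ull << 1)
#define PTE_USER    (1ull << 2)
#define PTE_RIGHTS  (PTE_WRITE | PTE_USER)

/* Invalidates the TLB entry for one virtual address.
 * In the kernel this would wrap INVLPG, for example:
 *   __asm__ volatile("invlpg (%0)" :: "r"(vaddr) : "memory");
 */
typedef void (*tlb_flush_fn)(uintptr_t vaddr);

/* Hosted stand-in for demonstration only: records the flush request
 * instead of executing INVLPG. */
static uintptr_t last_flushed;
static unsigned flush_count;
static void record_flush(uintptr_t vaddr)
{
    last_flushed = vaddr;
    flush_count++;
}

/* Replaces the write/user bits of one page table entry, then drops any
 * stale translation the CPU may still hold for that page. */
static void set_memory_rights(uint64_t *pte, uintptr_t vaddr,
                              uint64_t rights, tlb_flush_fn flush)
{
    assert(pte != NULL && flush != NULL);
    *pte = (*pte & ~PTE_RIGHTS) | (rights & PTE_RIGHTS);
    flush(vaddr); /* this was the missing step */
}
```

Without the flush, whether the access faults depends on whether the old supervisor-only translation happens to still be in the TLB, which would explain why it only failed some of the time.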