Paging - strange(?) problem
Posted: Sun Jul 20, 2014 8:45 am
by begin
Hi guys!
I'm in protected mode (PM) and want to enable paging for x64. I am not sure whether my code for mapping the memory works, but that's not the problem at the moment.
This is what I am doing:
- setup page tables (identity map) from start of code to end of code
- write the phys address of my PML4T into CR3
- set the LME bit (in EFER)
- set the PG bit (in CR0)
After the last step
Code: Select all
mov EAX, CR0
or EAX, (1 << 31)
mov CR0, EAX
I get a page fault for the next instruction.
And here is what I don't understand:
EIP = 0x300765
But CR2 = 0x80 !?
Shouldn't CR2 hold the address of the next instruction?
I am a bit confused now, did I misunderstand anything?
Re: Paging - strange(?) problem
Posted: Sun Jul 20, 2014 8:59 am
by evoex
On a page fault, CR2 contains the address whose read or write caused the fault. So it would seem to me that the page at virtual address 0 isn't mapped.
Re: Paging - strange(?) problem
Posted: Sun Jul 20, 2014 9:33 am
by begin
Right, but why is the page at 0x0 (or 0x80) accessed? The next instructions are only "nop"s (for testing). So there is no memory access, only execution. And as I said, EIP is far away from 0x0. So when the instruction right after "mov cr0, eax" causes a page fault, CR2 should hold the address of that instruction (EIP+X), shouldn't it?
Re: Paging - strange(?) problem
Posted: Sun Jul 20, 2014 9:52 am
by Combuster
It probably means that there are page tables for that address, but they don't point where they should.
What (else) does bochs tell you when you try to run that code?
Re: Paging - strange(?) problem
Posted: Sun Jul 20, 2014 11:56 am
by begin
Bochs says:
#PF, code = 9
#DF (I have no handlers installed at the moment, so this is no problem)
CR2 = 0x80
-> reset
What else do you need?
Just for my understanding:
CR2 holds the linear address (not the physical one) which was accessed and caused the #PF.
Code: Select all
(EIP)mov cr0, eax
(EIP+X)nop
(EIP+Y)nop
...
Now there is 0x80 in CR2. The code at (EIP+X) caused the fault. But this code does no memory access. That means the instruction fetch itself causes the fault. But as I said, EIP is far away from 0x80. So what sets EIP to 0x80? Do you know what I mean? I do identity mapping, so EIP should be (EIP+X), shouldn't it?
Re: Paging - strange(?) problem
Posted: Sun Jul 20, 2014 1:10 pm
by Combuster
What (else) does bochs tell you
At least one big part of your question can be answered by just looking at bochs' dump - it gives you the exact instruction that causes the fault.
And please post that logdump here as well - including all the preceding messages caused by your OS. It's a waste of time to play 100 questions.
Re: Paging - strange(?) problem
Posted: Sun Jul 20, 2014 1:25 pm
by iansjack
Do you have a valid IDT and interrupt handlers? If not, have you considered the possibility that there are multiple page faults and you are seeing the last one rather than the first one?
Re: Paging - strange(?) problem
Posted: Sun Jul 20, 2014 2:49 pm
by begin
@iansjack:
I have no IDT. Bochs quits after the first fault.
Okay, here is the dump:
Code: Select all
(0) Magic breakpoint
00183518510i[XGUI ] Mouse capture off
<bochs:6> disasm /16
0030075a: ( ): mov eax, cr0 ; 0f20c0
0030075d: ( ): or eax, 0x80000000 ; 0d00000080
00300762: ( ): mov cr0, eax ; 0f22c0
00300765: ( ): nop ; 90
00300766: ( ): nop ; 90
00300767: ( ): nop ; 90 // end of code, only trash from here
00300768: ( ): add byte ptr ds:[eax], al ; 0000
0030076a: ( ): add byte ptr ds:[eax], al ; 0000
0030076c: ( ): add byte ptr ds:[eax], al ; 0000
0030076e: ( ): add byte ptr ds:[eax], al ; 0000
00300770: ( ): add byte ptr ds:[eax], al ; 0000
00300772: ( ): add byte ptr ds:[eax], al ; 0000
00300774: ( ): add byte ptr ds:[edx+175], bl ; 009aaf000000
0030077a: ( ): add byte ptr ds:[eax], al ; 0000
0030077c: ( ): add byte ptr ds:[edx+1573039], bl ; 009aaf001800
00300782: ( ): add byte ptr ds:[eax], al ; 0000
00183518510i[XGUI ] Mouse capture off
<bochs:7> trace on
Tracing enabled for CPU0
00183518510i[XGUI ] Mouse capture off
<bochs:8> s
(0).[183518510] [0x00000030075a] 0008:000000000030075a (unk. ctxt): mov eax, cr0 ; 0f20c0
00183518510i[XGUI ] Mouse capture off
<bochs:9> s
(0).[183518510] [0x00000030075d] 0008:000000000030075d (unk. ctxt): or eax, 0x80000000 ; 0d00000080
00183518510i[XGUI ] Mouse capture off
<bochs:10> s
(0).[183518510] [0x000000300762] 0008:0000000000300762 (unk. ctxt): mov cr0, eax ; 0f22c0
00183518510i[XGUI ] Mouse capture off
<bochs:11> s
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Interrupt 0x0e occured (error_code=0x0009)
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Exception 0x08 - (#DF) double fault occured (error_code=0x0000)
CPU 0: Interrupt 0x08 occured (error_code=0x0000)
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
00183518510i[CPU0 ] CPU is in compatibility mode (active)
00183518510i[CPU0 ] CS.mode = 32 bit
00183518510i[CPU0 ] SS.mode = 32 bit
00183518510i[CPU0 ] EFER = 0x00000500
00183518510i[CPU0 ] | RAX=00000000e0000011 RBX=00000000003022e1
00183518510i[CPU0 ] | RCX=00000000c0000080 RDX=0000000000000000
00183518510i[CPU0 ] | RSP=00000000000079dc RBP=00000000000079dc
00183518510i[CPU0 ] | RSI=0000000000100018 RDI=000000000000892a
00183518510i[CPU0 ] | R8=0000000000000000 R9=0000000000000000
00183518510i[CPU0 ] | R10=0000000000000000 R11=0000000000000000
00183518510i[CPU0 ] | R12=0000000000000000 R13=0000000000000000
00183518510i[CPU0 ] | R14=0000000000000000 R15=0000000000000000
00183518510i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF cf
00183518510i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00183518510i[CPU0 ] | CS:0008( 0001| 0| 0) 00000000 ffffffff 1 1
00183518510i[CPU0 ] | DS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00183518510i[CPU0 ] | SS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00183518510i[CPU0 ] | ES:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00183518510i[CPU0 ] | FS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00183518510i[CPU0 ] | GS:0010( 0002| 0| 0) 00000000 ffffffff 1 1
00183518510i[CPU0 ] | MSR_FS_BASE:0000000000000000
00183518510i[CPU0 ] | MSR_GS_BASE:0000000000000000
00183518510i[CPU0 ] | RIP=0000000000300765 (0000000000300765)
00183518510i[CPU0 ] | CR0=0xe0000011 CR2=0x0000000000000080
00183518510i[CPU0 ] | CR3=0x00303000 CR4=0x00000020
(0).[183518510] ??? (physical address not available)
00183518510e[CPU0 ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting
00183518510i[SYS ] bx_pc_system_c::Reset(HARDWARE) called
00183518510i[CPU0 ] cpu hardware reset
Some information right before the crash:
Code: Select all
Global Descriptor Table (base=0x00000000000085be, limit=24):
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Non-Conforming, Accessed, 32-bit
GDT[0x02]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
CR0=0x60000011: pg CD NW ac wp ne ET ts em mp PE
CR2=page fault laddr=0x0000000000000000
CR3=0x000000303000
PCD=page-level cache disable=0
PWT=page-level write-through=0
CR4=0x00000020: smap smep osxsave pcid fsgsbase smx vmx osxmmexcpt osfxsr pce pge mce PAE pse de tsd pvi vme
CR8: 0x0
EFER=0x00000100: ffxsr nxe lma LME sce
Thanks for your help!
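For readers following the dump, the register values can be checked against the architectural bit positions. The sketch below is only an illustration: the helper names are made up for this example, and the values fed to them are the ones shown in the Bochs output above (CR0=0x60000011, CR4=0x20, EFER=0x100 before the crash; CR0=0xe0000011, EFER=0x500 after).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in the control registers and EFER. */
#define CR0_PE   (1u << 0)   /* protected mode enable */
#define CR0_PG   (1u << 31)  /* paging enable */
#define CR4_PAE  (1u << 5)   /* physical address extension */
#define EFER_LME (1u << 8)   /* long mode enable (set by software) */
#define EFER_LMA (1u << 10)  /* long mode active (set by the CPU) */

/* Hypothetical helper: are the preconditions for setting CR0.PG
 * with long mode enabled satisfied? */
static bool ready_for_long_mode_paging(uint32_t cr0, uint32_t cr4, uint32_t efer)
{
    return (cr0 & CR0_PE) && (cr4 & CR4_PAE) && (efer & EFER_LME);
}

/* Hypothetical helper: does the CPU report long mode as active? */
static bool long_mode_active(uint32_t cr0, uint32_t efer)
{
    return (cr0 & CR0_PG) && (efer & EFER_LMA);
}
```

With the values from the dump, the preconditions hold before the write to CR0, and afterwards EFER.LMA is set. That matches Bochs reporting "compatibility mode (active)": the switch itself worked and the fault comes from the page tables.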
Re: Paging - strange(?) problem
Posted: Sun Jul 20, 2014 3:15 pm
by iansjack
Hmm:
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Interrupt 0x0e occured (error_code=0x0009)
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Exception 0x08 - (#DF) double fault occured (error_code=0x0000)
CPU 0: Interrupt 0x08 occured (error_code=0x0000)
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
...
3rd (14) exception with no resolution
I'm not convinced that you are seeing the first page fault exception.
Create a handler for the page fault exception and then you can capture and inspect the first page fault, not the wreckage left after the third one.
Edit: As for what causes the first page fault - well, what do you suppose is going to happen when the processor tries to execute the instruction starting at 0x00300768?
Edit 2: Oops - forget that; I see that you are single-stepping the code. Probably an invalid Page Table then.
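The error code in the log (0x0009) can be decoded bit by bit, which supports the "invalid page table" guess. This is a minimal sketch; the struct and function names are invented for illustration, only the bit meanings are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code bits as pushed by the CPU. */
#define PF_P    (1u << 0)  /* 0: non-present page, 1: page was present */
#define PF_WR   (1u << 1)  /* 0: read access, 1: write access */
#define PF_US   (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_RSVD (1u << 3)  /* 1: a reserved bit was set in a paging entry */
#define PF_ID   (1u << 4)  /* 1: instruction fetch (only reported with NX/SMEP) */

/* Hypothetical container for the decoded flags. */
struct pf_info {
    bool present;
    bool write;
    bool user;
    bool reserved;
    bool fetch;
};

static struct pf_info decode_pf(uint32_t err)
{
    struct pf_info info;
    info.present  = (err & PF_P) != 0;
    info.write    = (err & PF_WR) != 0;
    info.user     = (err & PF_US) != 0;
    info.reserved = (err & PF_RSVD) != 0;
    info.fetch    = (err & PF_ID) != 0;
    return info;
}
```

For 0x9 this yields present = true and reserved = true with everything else false: a supervisor-mode read that walked into a paging-structure entry with reserved bits set, which is what a garbage or wrongly encoded entry typically produces.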
Re: Paging - strange(?) problem
Posted: Mon Jul 21, 2014 7:37 am
by begin
Yep, adding an IDT was a very good idea! I thought Bochs would stop after the first error, but it doesn't.
I get some errors much earlier ("random" double faults at different locations, so there is at least one other big problem). I will try to investigate them first. Maybe this will fix all further errors.
I will come back when I am done.
Thank you!
Re: Paging - strange(?) problem
Posted: Mon Jul 21, 2014 9:50 am
by iansjack
To be able to stop after the first exception you would have to be debugging at the microcode level (and I'm not sure that is possible). What is happening is that the instruction is causing a page fault for some reason. Without running any further program instructions the processor then tries to run the exception handler by loading the code pointed to by the IDT for that exception. In your case that could be anywhere as you have no handlers. The processor is trying to read the instruction from a random memory location. The result is that the processor flags a second page fault exception (unless by a remarkable coincidence that random memory location happened to be in a valid page) and tries to process it. This time the result is exactly the same as before. All the while it is still trying to process the original faulting instruction so has nowhere to "break into" the running program even if you are single-stepping it. Processors don't like three unhandled exceptions in a row, so at this point it throws up its hands. The result, in Bochs, is to halt execution - on a real processor it would initiate a reset. This is the infamous triple fault.
Now as long as you have an exception handler - it doesn't have to do anything; a simple "hlt" (or even, for your purposes, a "nop") would do - and a valid IDT pointing to it, things are different. This time the processor will proceed to the next program instruction (in this case the start of the exception handler) so the program can break. If you are single-stepping the program would halt at this point and you would say "hey - why has the IP suddenly jumped somewhere else?".
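As a rough illustration of "a valid IDT pointing to it", here is how one 8-byte protected-mode interrupt gate could be encoded. The struct and function names are hypothetical; the field layout and the 0x8E type byte (present, DPL 0, 32-bit interrupt gate) are the standard encoding. Long mode uses 16-byte gates, which this sketch does not cover.

```c
#include <stdint.h>

/* One 8-byte IDT entry in 32-bit protected mode. */
struct idt_gate32 {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector, e.g. 0x08 */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* P | DPL | gate type */
    uint16_t offset_high;  /* handler address bits 16..31 */
};

_Static_assert(sizeof(struct idt_gate32) == 8, "IDT gate must be 8 bytes");

/* Hypothetical helper: build a present 32-bit interrupt gate (type 0xE). */
static struct idt_gate32 make_interrupt_gate32(uint32_t handler,
                                               uint16_t selector,
                                               uint8_t dpl)
{
    struct idt_gate32 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.zero        = 0;
    g.type_attr   = (uint8_t)(0x80u | ((dpl & 3u) << 5) | 0x0Eu);
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

The handler address passed in only needs to point at a stub that halts; that is enough for a debugger to break on the first exception instead of the third.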
Re: Paging - strange(?) problem
Posted: Mon Jul 21, 2014 10:49 am
by begin
Okay, here I am again
I am still getting (random?) #DF. Sometimes
Code: Select all
lidt fword ptr [whatever]
sti // here
add esp, 8Ch // or here
or later in memset for different destination addresses.
These addresses are valid and free to use, I checked it before (E820 memory map).
The problem is, #DF should push error code 0 on the stack, but it doesn't. Stack layout is EIP, CS, EFLAGS.
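Whether an error code sits on the stack depends on how the vector was entered. The helper below is a sketch with an invented name; it lists the classic exception vectors for which the CPU pushes an error code. Newer CPUs define a few more (for example #CP, vector 21), which are left out here.

```c
#include <stdbool.h>

/* Hypothetical helper: does a CPU exception on this vector push an
 * error code? Covers the classic set only:
 * #DF(8), #TS(10), #NP(11), #SS(12), #GP(13), #PF(14), #AC(17). */
static bool exception_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8: case 10: case 11: case 12: case 13: case 14: case 17:
        return true;
    default:
        return false;
    }
}
```

Hardware interrupts and software `int n` never push an error code, whatever the vector number. So a handler for vector 8 that finds only EIP, CS and EFLAGS on the stack was not entered through a real #DF.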
IDT seems to be fine:
Code: Select all
Interrupt Descriptor Table (base=0x0000000000303000, limit=167):
IDT[0x00]=32-Bit Interrupt Gate target=0x0008:0x00300751, DPL=0
IDT[0x01]=32-Bit Interrupt Gate target=0x0008:0x00300760, DPL=0
IDT[0x02]=32-Bit Interrupt Gate target=0x0008:0x0030076f, DPL=0
IDT[0x03]=32-Bit Interrupt Gate target=0x0008:0x0030077e, DPL=0
IDT[0x04]=32-Bit Interrupt Gate target=0x0008:0x0030078d, DPL=0
IDT[0x05]=32-Bit Interrupt Gate target=0x0008:0x0030079c, DPL=0
IDT[0x06]=32-Bit Interrupt Gate target=0x0008:0x003007ab, DPL=0
IDT[0x07]=32-Bit Interrupt Gate target=0x0008:0x003007ba, DPL=0
IDT[0x08]=32-Bit Interrupt Gate target=0x0008:0x003007c9, DPL=0
IDT[0x09]=32-Bit Interrupt Gate target=0x0008:0x003007d3, DPL=0
IDT[0x0a]=32-Bit Interrupt Gate target=0x0008:0x003007e2, DPL=0
IDT[0x0b]=32-Bit Interrupt Gate target=0x0008:0x003007ec, DPL=0
IDT[0x0c]=32-Bit Interrupt Gate target=0x0008:0x003007f6, DPL=0
IDT[0x0d]=32-Bit Interrupt Gate target=0x0008:0x00300800, DPL=0
IDT[0x0e]=32-Bit Interrupt Gate target=0x0008:0x0030080a, DPL=0
IDT[0x0f]=32-Bit Interrupt Gate target=0x0008:0x00000000, DPL=0
IDT[0x10]=32-Bit Interrupt Gate target=0x0008:0x00300814, DPL=0
IDT[0x11]=32-Bit Interrupt Gate target=0x0008:0x00300823, DPL=0
IDT[0x12]=32-Bit Interrupt Gate target=0x0008:0x0030082d, DPL=0
IDT[0x13]=32-Bit Interrupt Gate target=0x0008:0x0030083c, DPL=0
IDT[0x14]=32-Bit Interrupt Gate target=0x0008:0x0030084b, DPL=0
GDT seems to be fine also:
Code: Select all
Global Descriptor Table (base=0x00000000000085be, limit=24):
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Non-Conforming, Accessed, 32-bit
GDT[0x02]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
I found the same problem ("random" #DF after sti) on this forum, but that problem had something to do with PIC initialization (I don't do any hardware initialization yet).
Edit:
Code: Select all
CS = 0x08
DS = SS = ES = FS = GS = 0x10
Re: Paging - strange(?) problem
Posted: Mon Jul 21, 2014 3:03 pm
by Octocontrabass
begin wrote: The problem is, #DF should push error code 0 on the stack, but it doesn't.
Sounds like a hardware interrupt.
begin wrote: I found the same problem ("random" #DF after sti) on this forum, but that problem had something to do with PIC initialization (I don't do any hardware initialization yet).
So you're still receiving IRQ0 around 18 times a second, and IRQ0 is still mapped to interrupt 8?
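The usual way out is to reprogram the two 8259 PICs so that IRQ0-15 no longer overlap the CPU exception vectors. The sketch below only builds the sequence of port writes; the type and function names are hypothetical, and a real kernel would feed each pair to its own `outb` routine (not shown, and possibly with a short I/O delay between writes on real hardware). The base vectors 0x20 and 0x28 are a common choice, not a requirement.

```c
#include <stdint.h>

#define PIC1_CMD  0x20
#define PIC1_DATA 0x21
#define PIC2_CMD  0xA0
#define PIC2_DATA 0xA1
#define PIC_REMAP_WRITES 10

/* One I/O port write: value goes to port. */
struct port_write {
    uint16_t port;
    uint8_t  value;
};

/* Hypothetical helper: fill seq with the 8259 initialization sequence
 * that moves the master PIC to master_base and the slave to slave_base. */
static void pic_remap_sequence(uint8_t master_base, uint8_t slave_base,
                               struct port_write seq[PIC_REMAP_WRITES])
{
    const struct port_write writes[PIC_REMAP_WRITES] = {
        { PIC1_CMD,  0x11 },        /* ICW1: begin init, ICW4 follows */
        { PIC2_CMD,  0x11 },
        { PIC1_DATA, master_base }, /* ICW2: vector base for IRQ0-7 */
        { PIC2_DATA, slave_base },  /* ICW2: vector base for IRQ8-15 */
        { PIC1_DATA, 0x04 },        /* ICW3: slave attached to IRQ2 */
        { PIC2_DATA, 0x02 },        /* ICW3: slave cascade identity */
        { PIC1_DATA, 0x01 },        /* ICW4: 8086 mode */
        { PIC2_DATA, 0x01 },
        { PIC1_DATA, 0xFF },        /* mask every IRQ on the master */
        { PIC2_DATA, 0xFF },        /* mask every IRQ on the slave */
    };
    for (int i = 0; i < PIC_REMAP_WRITES; i++)
        seq[i] = writes[i];
}
```

The last two writes leave all IRQs masked, so `sti` is safe even before any IRQ handlers exist; individual lines can be unmasked once handlers that send an EOI are installed.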
Re: Paging - strange(?) problem
Posted: Mon Jul 21, 2014 4:24 pm
by begin
You're absolutely right.
It's a hardware interrupt. I don't know why I didn't test this before. I just created an idle loop and the interrupt handler fired...
I did not map any IRQs, because I thought I had read some time ago that the PIC must be enabled first before it fires IRQs... That's wrong, I see.
So I think that's the reason.
I have no IDT handler for IRQs -> #DF.
So many hours of debugging because of such a stupid thing
Thank you all and sorry for taking your time for such a **** "problem"...
Re: Paging - strange(?) problem
Posted: Tue Jul 22, 2014 9:40 am
by begin
Hey guys I am already back again
Next problem about paging. I am trying to map virtual address 0x0 to physical address 0x0.
This is my code (actually it is the Visual Studio version, which I use for debugging):
Code: Select all
// ...
#define PG_PSZ 4096 // page size
#define PG_EPT 512 // entries per table
// ...
pg_pml4 = (PML4T*)_aligned_malloc(sizeof(PML4T) * PG_EPT, PG_PSZ);
memset(pg_pml4, 0, sizeof(PML4T) * PG_EPT);
// ...
map(0, 0);
// ...
bool map(uint32_t V, uint32_t P){
    // split the virtual address into the page offset and the four table indices
    uint64_t va = V;
    uint32_t off_idx = va & 0xfff;
    va >>= 12;
    uint32_t pt_idx = va & 0x1ff;
    va >>= 9;
    uint32_t pdt_idx = va & 0x1ff;
    va >>= 9;
    uint32_t pdpt_idx = va & 0x1ff;
    va >>= 9;
    uint32_t pmlt4_idx = va & 0x1ff;

    // PML4 entry -> PDPT (allocated on first use)
    PML4T* pml4e = &pg_pml4[pmlt4_idx];
    if (!pml4e->PhysAddr){
        pml4e->PhysAddr = (uint32_t)_aligned_malloc(sizeof(PDPT) * PG_EPT, PG_PSZ);
        if (!pml4e->PhysAddr)
            return false;
        memset((void*)(uint32_t)pml4e->PhysAddr, 0, sizeof(PDPT) * PG_EPT);
    }
    pml4e->P = pml4e->RW = 1;

    // PDPT entry -> PDT (allocated on first use)
    PDPT* pdpt = (PDPT*)(uint32_t)pml4e->PhysAddr;
    PDPT* pdpte = &pdpt[pdpt_idx];
    if (!pdpte->PhysAddr){
        pdpte->PhysAddr = (uint32_t)_aligned_malloc(sizeof(PDT) * PG_EPT, PG_PSZ);
        if (!pdpte->PhysAddr)
            return false;
        memset((void*)(uint32_t)pdpte->PhysAddr, 0, sizeof(PDT) * PG_EPT);
    }
    pdpte->P = pdpte->RW = 1;

    // PDT entry -> PT (allocated on first use)
    PDT* pdt = (PDT*)(uint32_t)pdpte->PhysAddr;
    PDT* pdte = &pdt[pdt_idx];
    if (!pdte->PhysAddr){
        pdte->PhysAddr = (uint32_t)_aligned_malloc(sizeof(PT) * PG_EPT, PG_PSZ);
        if (!pdte->PhysAddr)
            return false;
        memset((void*)(uint32_t)pdte->PhysAddr, 0, sizeof(PT) * PG_EPT);
    }
    pdte->P = pdte->RW = 1;

    // PT entry -> physical page
    PT* pt = (PT*)(uint32_t)pdte->PhysAddr;
    PT* pte = &pt[pt_idx];
    pte->P = pte->RW = 1;
    pte->PhysAddr = P;
    return true;
}
- All paging structs have the right size (8 bytes)
- memory allocation does not fail
Then I break right after enabling paging (setting the PG bit in CR0) so that it does not crash, because I did not map any page other than page 0x0.
I use the bochs "page 0x0" command to view the mapping.
But bochs says:
<bochs:7> page 0
PML4: 0x0000000106000003 ps a pcd pwt S W P
// 0x106000 is pml4e->PhysAddr
PDPE: 0xffffffffffffffff XD PS G PAT D A PCD PWT U W P
physical address not available for linear 0x0000000000000000
I read Intel's 64 and IA-32 Architectures manual - which part did I misunderstand?
And thank you again guys
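One observation on the Bochs output, offered as a hint rather than a confirmed diagnosis (the struct definitions are not shown in the post): in a 64-bit paging entry, bits 12..51 hold the physical address of the next table. Decoding the PML4 entry 0x0000000106000003 that way gives 0x106000000, not 0x106000. That would be consistent with `PhysAddr` being a bitfield that starts at bit 12 and receives the full address instead of the page frame number. The helper names below are made up for this sketch.

```c
#include <stdint.h>

/* Bits 12..51 of a 64-bit paging entry: physical address of the next level. */
#define ENTRY_ADDR_MASK 0x000FFFFFFFFFF000ull

/* Hypothetical helper: physical address an entry points to. */
static uint64_t entry_phys_addr(uint64_t entry)
{
    return entry & ENTRY_ADDR_MASK;
}

/* Hypothetical helper: build an entry from a 4 KiB aligned address and flags. */
static uint64_t make_entry(uint64_t phys, uint64_t flags)
{
    return (phys & ENTRY_ADDR_MASK) | flags;
}

/* Hypothetical helper: value a bitfield starting at bit 12 would need,
 * i.e. the page frame number. */
static uint64_t frame_number(uint64_t phys)
{
    return phys >> 12;
}
```

Under that assumption, `make_entry(0x106000, 0x3)` gives 0x106003, which is what `page 0` would be expected to show for a PDPT located at 0x106000. With the entry as posted, the CPU reads the PDPT from 0x106000000, where Bochs finds 0xffffffffffffffff.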