Paging - strange(?) problem

Post by begin »

Hi guys!

I'm in protected mode (PM) and want to enable paging for x64. I am not sure whether my code for mapping the memory works, but that's not the problem at the moment.

This is what I am doing:
- setup page tables (identity map) from start of code to end of code
- write the phys address of my PML4T into CR3
- set the LME (long mode enable) bit in EFER
- set PG bit
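
The bits touched by the steps above can be written down as constants. This is a minimal sketch, not the poster's code: the macro and function names are made up for illustration, and the bit positions are the architectural ones from the Intel and AMD manuals. CR4.PAE is included because long mode paging requires it, even though the list does not mention it.

```c
#include <stdint.h>

/* Architectural bit positions; the names are illustrative. */
#define CR4_PAE  (1u << 5)      /* physical address extension */
#define MSR_EFER 0xC0000080u    /* MSR index of EFER */
#define EFER_LME (1u << 8)      /* long mode enable, set by software */
#define EFER_LMA (1u << 10)     /* long mode active, set by the CPU */
#define CR0_PG   (1u << 31)     /* paging enable */

/* Value CR0 should hold after the final step (setting PG). */
static uint32_t cr0_enable_paging(uint32_t cr0)
{
    return cr0 | CR0_PG;
}

/* EFER as the CPU reports it once LME is set and paging is on. */
static uint32_t efer_after_paging(uint32_t efer)
{
    return efer | EFER_LME | EFER_LMA;
}
```

With the values from the Bochs dumps in this thread, CR0=0x60000011 becomes 0xe0000011 and EFER=0x00000100 becomes 0x00000500, which matches the register dump printed after the fault.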

After the last step

Code: Select all

mov		EAX, CR0
or		EAX, (1 << 31)
mov		CR0, EAX
I get a page fault on the next instruction.
And here is what I don't understand:
EIP = 0x300765
But CR2 = 0x80 !?
Shouldn't CR2 hold the address of the next instruction?
I am a bit confused now. Did I misunderstand something?

Post by evoex »

On page fault, CR2 contains the address that was written to or read from causing the page fault. So it would seem to me that the page at virtual address 0 isn't mapped.

Post by begin »

Right, but why is the page at 0x0 (or 0x80) accessed? The next instructions are only "nop"s (for testing), so there is no memory access, only execution. And as I said, EIP is far away from 0x0. So when the instruction right after "mov cr0, eax" page faults, CR2 should hold the address of that instruction (EIP+X), shouldn't it?

Post by Combuster »

It probably means that there are page tables for that address, but they don't point where they should point.

What (else) does bochs tell you when you try to run that code?

Post by begin »

Bochs says:
#PF, code = 9
#DF (I have no handlers installed at the moment, so this is no problem)
CR2 = 0x80
-> reset

What else do you need?

Just for my understanding:
CR2 holds the linear address (not the physical one) which was accessed and caused the #PF.

Code: Select all

(EIP)mov cr0, eax
(EIP+X)nop
(EIP+Y)nop
...
Now there is 0x80 in CR2. The code at (EIP+X) caused the fault, but this code does no memory access. That means the instruction itself causes the fault. But as I said, EIP is far away from 0x80. So what sets EIP to 0x80? Do you know what I mean? I do identity mapping, so EIP should be (EIP+X), shouldn't it?? :?

Post by Combuster »

Combuster wrote:What (else) does bochs tell you
At least one big part of your question can be answered by just looking at bochs' dump - it gives you the exact instruction that causes the fault.

And please post that logdump here as well - including all the preceding messages caused by your OS. It's a waste of time to play 100 questions.

Post by iansjack »

Do you have a valid IDT and interrupt handlers? If not, have you considered the possibility that there are multiple page faults and you are seeing the last one rather than the first one?

Post by begin »

@iansjack:
I have no IDT. Bochs quits after the first fault.

Okay, here is the dump:

Code: Select all

(0) Magic breakpoint
00183518510i[XGUI  ] Mouse capture off
<bochs:6> disasm /16
0030075a: (                    ): mov eax, cr0              ; 0f20c0
0030075d: (                    ): or eax, 0x80000000        ; 0d00000080
00300762: (                    ): mov cr0, eax              ; 0f22c0
00300765: (                    ): nop                       ; 90
00300766: (                    ): nop                       ; 90
00300767: (                    ): nop                       ; 90 // end of code, only trash from here
00300768: (                    ): add byte ptr ds:[eax], al ; 0000
0030076a: (                    ): add byte ptr ds:[eax], al ; 0000
0030076c: (                    ): add byte ptr ds:[eax], al ; 0000
0030076e: (                    ): add byte ptr ds:[eax], al ; 0000
00300770: (                    ): add byte ptr ds:[eax], al ; 0000
00300772: (                    ): add byte ptr ds:[eax], al ; 0000
00300774: (                    ): add byte ptr ds:[edx+175], bl ; 009aaf000000
0030077a: (                    ): add byte ptr ds:[eax], al ; 0000
0030077c: (                    ): add byte ptr ds:[edx+1573039], bl ; 009aaf001800
00300782: (                    ): add byte ptr ds:[eax], al ; 0000
00183518510i[XGUI  ] Mouse capture off
<bochs:7> trace on
Tracing enabled for CPU0
00183518510i[XGUI  ] Mouse capture off
<bochs:8> s
(0).[183518510] [0x00000030075a] 0008:000000000030075a (unk. ctxt): mov eax, cr0              ; 0f20c0
00183518510i[XGUI  ] Mouse capture off
<bochs:9> s
(0).[183518510] [0x00000030075d] 0008:000000000030075d (unk. ctxt): or eax, 0x80000000        ; 0d00000080
00183518510i[XGUI  ] Mouse capture off
<bochs:10> s
(0).[183518510] [0x000000300762] 0008:0000000000300762 (unk. ctxt): mov cr0, eax              ; 0f22c0
00183518510i[XGUI  ] Mouse capture off
<bochs:11> s
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Interrupt 0x0e occured (error_code=0x0009)
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Exception 0x08 - (#DF) double fault occured (error_code=0x0000)
CPU 0: Interrupt 0x08 occured (error_code=0x0000)
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
00183518510i[CPU0  ] CPU is in compatibility mode (active)
00183518510i[CPU0  ] CS.mode = 32 bit
00183518510i[CPU0  ] SS.mode = 32 bit
00183518510i[CPU0  ] EFER   = 0x00000500
00183518510i[CPU0  ] | RAX=00000000e0000011  RBX=00000000003022e1
00183518510i[CPU0  ] | RCX=00000000c0000080  RDX=0000000000000000
00183518510i[CPU0  ] | RSP=00000000000079dc  RBP=00000000000079dc
00183518510i[CPU0  ] | RSI=0000000000100018  RDI=000000000000892a
00183518510i[CPU0  ] |  R8=0000000000000000   R9=0000000000000000
00183518510i[CPU0  ] | R10=0000000000000000  R11=0000000000000000
00183518510i[CPU0  ] | R12=0000000000000000  R13=0000000000000000
00183518510i[CPU0  ] | R14=0000000000000000  R15=0000000000000000
00183518510i[CPU0  ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF cf
00183518510i[CPU0  ] | SEG sltr(index|ti|rpl)     base    limit G D
00183518510i[CPU0  ] |  CS:0008( 0001| 0|  0) 00000000 ffffffff 1 1
00183518510i[CPU0  ] |  DS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00183518510i[CPU0  ] |  SS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00183518510i[CPU0  ] |  ES:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00183518510i[CPU0  ] |  FS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00183518510i[CPU0  ] |  GS:0010( 0002| 0|  0) 00000000 ffffffff 1 1
00183518510i[CPU0  ] |  MSR_FS_BASE:0000000000000000
00183518510i[CPU0  ] |  MSR_GS_BASE:0000000000000000
00183518510i[CPU0  ] | RIP=0000000000300765 (0000000000300765)
00183518510i[CPU0  ] | CR0=0xe0000011 CR2=0x0000000000000080
00183518510i[CPU0  ] | CR3=0x00303000 CR4=0x00000020
(0).[183518510] ??? (physical address not available)
00183518510e[CPU0  ] exception(): 3rd (14) exception with no resolution, shutdown status is 00h, resetting
00183518510i[SYS   ] bx_pc_system_c::Reset(HARDWARE) called
00183518510i[CPU0  ] cpu hardware reset
Some information right before the crash:

Code: Select all

Global Descriptor Table (base=0x00000000000085be, limit=24):
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Non-Conforming, Accessed, 32-bit
GDT[0x02]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed

CR0=0x60000011: pg CD NW ac wp ne ET ts em mp PE
CR2=page fault laddr=0x0000000000000000
CR3=0x000000303000
    PCD=page-level cache disable=0
    PWT=page-level write-through=0
CR4=0x00000020: smap smep osxsave pcid fsgsbase smx vmx osxmmexcpt osfxsr pce pge mce PAE pse de tsd pvi vme
CR8: 0x0
EFER=0x00000100: ffxsr nxe lma LME sce
Thanks for your help!

Post by iansjack »

Hmm:
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Interrupt 0x0e occured (error_code=0x0009)
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Exception 0x08 - (#DF) double fault occured (error_code=0x0000)
CPU 0: Interrupt 0x08 occured (error_code=0x0000)
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
...
3rd (14) exception with no resolution
I'm not convinced that you are seeing the first page fault exception.

Create a handler for the page fault exception and then you can capture and inspect the first page fault, not the wreckage left after the third one.

Edit: As for what causes the first page fault - well, what do you suppose is going to happen when the processor tries to execute the instruction starting at 0x00300768?

Edit 2: Oops - forget that; I see that you are single-stepping the code. Probably an invalid Page Table then.
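
The error code in the log can narrow this down further. A minimal sketch of decoding it, using the page-fault error code bit layout from the Intel SDM; the macro and function names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_PRESENT  (1u << 0)  /* 0: page not present, 1: fault on a present page */
#define PF_WRITE    (1u << 1)  /* 1: the access was a write */
#define PF_USER     (1u << 2)  /* 1: the access came from user mode */
#define PF_RESERVED (1u << 3)  /* 1: a reserved bit was set in a paging-structure entry */
#define PF_FETCH    (1u << 4)  /* 1: instruction fetch (only reported in some configurations) */

/* True if the fault was caused by a malformed paging-structure entry. */
static bool pf_reserved_bit_violation(uint32_t error_code)
{
    return (error_code & PF_RESERVED) != 0;
}

/* True if the faulting access was a write. */
static bool pf_is_write(uint32_t error_code)
{
    return (error_code & PF_WRITE) != 0;
}
```

For error_code=0x0009, bits 0 and 3 are set: the page walk found a reserved bit set in some paging-structure entry. That fits the guess that a table entry holds a bad value rather than simply being absent.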

Post by begin »

Yep, adding an IDT was a very good idea! I thought Bochs would stop after the first error, but it doesn't.
I get some errors much earlier ("random" double faults at different locations, so there is at least one other big problem #-o). I will try to investigate them first. Maybe this will fix all further errors.
I will come back when I am done.
Thank you!

Post by iansjack »

To be able to stop after the first exception you would have to be debugging at the microcode level (and I'm not sure that is possible). What is happening is that the instruction is causing a page fault for some reason. Without running any further program instructions the processor then tries to run the exception handler by loading the code pointed to by the IDT for that exception. In your case that could be anywhere as you have no handlers. The processor is trying to read the instruction from a random memory location. The result is that the processor flags a second page fault exception (unless by a remarkable coincidence that random memory location happened to be in a valid page) and tries to process it. This time the result is exactly the same as before. All the while it is still trying to process the original faulting instruction so has nowhere to "break into" the running program even if you are single-stepping it. Processors don't like three unhandled exceptions in a row, so at this point it throws up its hands. The result, in Bochs, is to halt execution - on a real processor it would initiate a reset. This is the infamous triple fault.

Now as long as you have an exception handler - it doesn't have to do anything; a simple "hlt" (or even, for your purposes, a "nop") would do - and a valid IDT pointing to it, things are different. This time the processor will proceed to the next program instruction (in this case the start of the exception handler) so the program can break. If you are single-stepping the program would halt at this point and you would say "hey - why has the IP suddenly jumped somewhere else?".
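
For reference, a minimal sketch of filling one protected-mode IDT entry so that an exception vector points at such a handler. The struct and function names are hypothetical; the field layout and the 0x8E attribute byte (present, DPL 0, 32-bit interrupt gate) follow the architectural descriptor format:

```c
#include <stdint.h>

/* One 8-byte protected-mode IDT entry (32-bit gate). */
struct idt_gate32 {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* present bit, DPL and gate type */
    uint16_t offset_high;  /* handler address bits 16..31 */
};

#define GATE_INT32_DPL0 0x8Eu  /* present, DPL 0, 32-bit interrupt gate */

/* Build a gate that sends a vector to `handler` in segment `selector`. */
static struct idt_gate32 make_gate(uint32_t handler, uint16_t selector)
{
    struct idt_gate32 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.zero        = 0;
    g.type_attr   = GATE_INT32_DPL0;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

Entry 14 (#PF) would be filled with something like `make_gate(handler_address, 0x08)`, where 0x08 is the code selector used in this thread. This covers only the 32-bit case; once the kernel runs in 64-bit mode the gate format is 16 bytes.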

Post by begin »

Okay, here I am again :roll:

I am still getting (random?) #DF. Sometimes

Code: Select all

lidt fword ptr [whatever]
sti              ; here
add     esp, 8Ch ; or here
or later in memset for different destination addresses.
These addresses are valid and free to use, I checked it before (E820 memory map).
The problem is, #DF should push error code 0 on the stack, but it doesn't. The stack layout is EIP, CS, EFLAGS.
IDT seems to be fine:

Code: Select all

Interrupt Descriptor Table (base=0x0000000000303000, limit=167):
IDT[0x00]=32-Bit Interrupt Gate target=0x0008:0x00300751, DPL=0
IDT[0x01]=32-Bit Interrupt Gate target=0x0008:0x00300760, DPL=0
IDT[0x02]=32-Bit Interrupt Gate target=0x0008:0x0030076f, DPL=0
IDT[0x03]=32-Bit Interrupt Gate target=0x0008:0x0030077e, DPL=0
IDT[0x04]=32-Bit Interrupt Gate target=0x0008:0x0030078d, DPL=0
IDT[0x05]=32-Bit Interrupt Gate target=0x0008:0x0030079c, DPL=0
IDT[0x06]=32-Bit Interrupt Gate target=0x0008:0x003007ab, DPL=0
IDT[0x07]=32-Bit Interrupt Gate target=0x0008:0x003007ba, DPL=0
IDT[0x08]=32-Bit Interrupt Gate target=0x0008:0x003007c9, DPL=0
IDT[0x09]=32-Bit Interrupt Gate target=0x0008:0x003007d3, DPL=0
IDT[0x0a]=32-Bit Interrupt Gate target=0x0008:0x003007e2, DPL=0
IDT[0x0b]=32-Bit Interrupt Gate target=0x0008:0x003007ec, DPL=0
IDT[0x0c]=32-Bit Interrupt Gate target=0x0008:0x003007f6, DPL=0
IDT[0x0d]=32-Bit Interrupt Gate target=0x0008:0x00300800, DPL=0
IDT[0x0e]=32-Bit Interrupt Gate target=0x0008:0x0030080a, DPL=0
IDT[0x0f]=32-Bit Interrupt Gate target=0x0008:0x00000000, DPL=0
IDT[0x10]=32-Bit Interrupt Gate target=0x0008:0x00300814, DPL=0
IDT[0x11]=32-Bit Interrupt Gate target=0x0008:0x00300823, DPL=0
IDT[0x12]=32-Bit Interrupt Gate target=0x0008:0x0030082d, DPL=0
IDT[0x13]=32-Bit Interrupt Gate target=0x0008:0x0030083c, DPL=0
IDT[0x14]=32-Bit Interrupt Gate target=0x0008:0x0030084b, DPL=0
GDT seems to be fine also:

Code: Select all

Global Descriptor Table (base=0x00000000000085be, limit=24):
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, base=0x00000000, limit=0xffffffff, Execute/Read, Non-Conforming, Accessed, 32-bit
GDT[0x02]=Data segment, base=0x00000000, limit=0xffffffff, Read/Write, Accessed
I found the same problem ("random" #DF after sti) on this forum, but the problem had something to do with PIC initialization (I don't do any hardware initialization yet).

Edit:

Code: Select all

CS = 0x08
DS = SS = ES = FS = GS = 0x10

Post by Octocontrabass »

begin wrote:The problem is, #DF should push error code 0 on the stack, but it doesnt.
Sounds like a hardware interrupt.
begin wrote:I found the same problem ("random" #DF after sti) on this forum, but the problem had something to do with PIC initialization (I dont do any hardware initialization yet).
So you're still receiving IRQ0 around 18 times a second, and IRQ0 is still mapped to interrupt 8?
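
The collision can be made explicit with a little arithmetic. A minimal sketch, assuming the usual PC BIOS defaults for the two 8259 PICs (master base 0x08, slave base 0x70) and a commonly used remap target of 0x20/0x28; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Vector bases the PC BIOS programs into the two 8259 PICs by default. */
#define BIOS_MASTER_BASE 0x08u   /* IRQ0..7  -> vectors 0x08..0x0F */
#define BIOS_SLAVE_BASE  0x70u   /* IRQ8..15 -> vectors 0x70..0x77 */

/* A common choice after remapping; bases must be multiples of 8
 * and should lie above the 32 vectors reserved for exceptions. */
#define REMAP_MASTER_BASE 0x20u
#define REMAP_SLAVE_BASE  0x28u

/* Interrupt vector the CPU receives for a given IRQ line. */
static uint8_t irq_to_vector(unsigned irq, uint8_t master_base, uint8_t slave_base)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? master_base + irq : slave_base + (irq - 8));
}
```

With the BIOS defaults, IRQ0 (the timer) arrives as vector 8, the same vector the CPU uses for #DF, so the "double fault" handler runs without an error code on the stack. Remapping the PIC, or masking its interrupts until real handlers exist, avoids the collision.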

Post by begin »

You're absolutely right.
It's a hardware interrupt. I don't know why I didn't test this before. I just created an idle loop and the interrupt handler fired...

I did not map any IRQs, because I thought I had read some time ago that the PIC must be enabled first before it fires IRQs... That's wrong, I see.

So I think that's the reason.
I have no IDT handler for IRQs -> #DF.

So many hours of debugging because of such a stupid thing :oops: :oops: :oops:

Thank you all and sorry for taking your time for such a **** "problem"...

Post by begin »

Hey guys I am already back again :oops:

Next problem about paging. I am trying to map virtual address 0x0 to physical address 0x0.
This is my code (actually it is the Visual Studio version, which I use for debugging):

Code: Select all

// ...
#define PG_PSZ 4096 // page size
#define PG_EPT 512 // entries per table
// ...
pg_pml4 = (PML4T*)_aligned_malloc(sizeof(PML4T) * PG_EPT, PG_PSZ);
memset(pg_pml4, 0, sizeof(PML4T) * PG_EPT);
// ...
map(0, 0);
// ...

bool map(uint32_t V, uint32_t P){
	uint64_t va = V;

	uint32_t off_idx = va & 0xfff;
	va >>= 12;
	uint32_t pt_idx = va & 0x1ff;
	va >>= 9;
	uint32_t pdt_idx = va & 0x1ff;
	va >>= 9;
	uint32_t pdpt_idx = va & 0x1ff;
	va >>= 9;
	uint32_t pmlt4_idx = va & 0x1ff;

	PML4T* pml4e = &pg_pml4[pmlt4_idx];
	if (!pml4e->PhysAddr){
		pml4e->PhysAddr = (uint32_t)_aligned_malloc(sizeof(PDPT) * PG_EPT, PG_PSZ);
		if (!pml4e->PhysAddr)
			return false;
		memset((void*)(uint32_t)pml4e->PhysAddr, 0, sizeof(PDPT) * PG_EPT);
	}
	pml4e->P = pml4e->RW = 1;

	PDPT* pdpt = (PDPT*)(uint32_t)pml4e->PhysAddr;
	PDPT* pdpte = &pdpt[pdpt_idx];
	if (!pdpte->PhysAddr){
		pdpte->PhysAddr = (uint32_t)_aligned_malloc(sizeof(PDT) * PG_EPT, PG_PSZ);
		if (!pdpte->PhysAddr)
			return false;
		memset((void*)(uint32_t)pdpte->PhysAddr, 0, sizeof(PDT) * PG_EPT);
	}
	pdpte->P = pdpte->RW = 1;

	PDT* pdt = (PDT*)(uint32_t)pdpte->PhysAddr;
	PDT* pdte = &pdt[pdt_idx];
	if (!pdte->PhysAddr){
		pdte->PhysAddr = (uint32_t)_aligned_malloc(sizeof(PT) * PG_EPT, PG_PSZ);
		if (!pdte->PhysAddr)
			return false;
		memset((void*)(uint32_t)pdte->PhysAddr, 0, sizeof(PT) * PG_EPT);
	}
	pdte->P = pdte->RW = 1;

	PT* pt = (PT*)(uint32_t)pdte->PhysAddr;
	PT* pte = &pt[pt_idx];
	pte->P = pte->RW = 1;
	pte->PhysAddr = P;
	return true;
}
- All paging structs have the right size (8 bytes)
- memory allocation does not fail
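
The index arithmetic in map() can be checked in isolation. A minimal sketch that does the same split with fixed shifts; `va_parts` and `split_va` are hypothetical names, not part of the posted code:

```c
#include <stdint.h>

/* Table indices and page offset of a 48-bit virtual address (4 KiB pages). */
struct va_parts {
    unsigned pml4;    /* bits 39..47 */
    unsigned pdpt;    /* bits 30..38 */
    unsigned pd;      /* bits 21..29 */
    unsigned pt;      /* bits 12..20 */
    unsigned offset;  /* bits 0..11  */
};

static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.offset = (unsigned)(va & 0xFFF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    return p;
}
```

For the code address 0x300765 seen earlier in the thread this gives PML4 index 0, PDPT index 0, PD index 1, PT index 0x100 and offset 0x765. Since map() takes a 32-bit V, the PML4 index is always 0 and the PDPT index is at most 3.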

Then I break right after enabling paging (setting the PG bit in CR0) so that it does not crash, because I did not map any page other than page 0x0.
I use the bochs "page 0x0" command to view the mapping.
But bochs says:
<bochs:7> page 0
PML4: 0x0000000106000003 ps a pcd pwt S W P
// 0x106000 is pml4e->PhysAddr
PDPE: 0xffffffffffffffff XD PS G PAT D A PCD PWT U W P
physical address not available for linear 0x0000000000000000
I read Intel's 64 and IA-32 Architectures manual - which part did I misunderstand?
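
One thing the Bochs output hints at: the PML4 entry reads 0x0000000106000003, while the table it should point to was allocated at 0x106000. That would be consistent with a `PhysAddr` bitfield that starts at bit 12 and is assigned the full byte address instead of the page frame number, so the address ends up shifted left by 12. The struct definitions are not shown in the post, so this is an assumption. A minimal sketch of the two encodings using plain integers; all names are illustrative:

```c
#include <stdint.h>

#define PTE_P         (1ull << 0)            /* present */
#define PTE_RW        (1ull << 1)            /* writable */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ull  /* bits 12..51: physical address */

/* What assigning a byte address to a field that starts at bit 12 produces. */
static uint64_t entry_addr_in_frame_field(uint64_t phys)
{
    return (phys << 12) | PTE_RW | PTE_P;
}

/* Intended encoding: address bits stay in place, flags go in the low bits. */
static uint64_t entry_from_phys(uint64_t phys)
{
    return (phys & PTE_ADDR_MASK) | PTE_RW | PTE_P;
}

/* Physical address the CPU follows for a given entry. */
static uint64_t entry_target(uint64_t entry)
{
    return entry & PTE_ADDR_MASK;
}
```

Under that assumption the CPU looks for the PDPT at 0x106000000 rather than 0x106000, and reading all ones from an address with no RAM behind it would explain the PDPE of 0xffffffffffffffff. If the assumption holds, storing the address shifted right by 12 in the field (or masking as in `entry_from_phys`) would be the corresponding fix.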

And thank you again guys =D>