[SOLVED] Something is setting bit 20 (reserved, MBZ) of my pd


Post by cds84 »

Hi guys!

Just as I thought I understood paging, I've been stuck for weeks, and cannot for the life of me see anything wrong with my code!

I MUST be doing something really stupid here, but I cannot work out what, and I'm out of ideas!
Perhaps a fresh pair of eyes could help me out?

In my boot loader, I set up some initial page tables using 2 MiB pages:
first gigabyte of physical memory identity mapped at virtual address 0x0
first gigabyte of physical memory offset mapped at virtual 0xffff800000000000
first free whole 2 MiB of physical memory for the kernel at virtual 0xffffffff80000000 (for -mcmodel=kernel)
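
For reference, here is a minimal sketch of how those virtual addresses split into table indices under standard 4-level x86-64 paging. The helper names (`pml4_index`, `pdpt_index`, `pd_index`, `entry_address`) are made up for illustration and are not taken from the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Each paging level is indexed by 9 bits of the virtual address. */
#define PT_INDEX_MASK 0x1ffULL

/* Bits 47..39 select the PML4 entry. */
static inline unsigned pml4_index(uint64_t va)
{
    return (unsigned)((va >> 39) & PT_INDEX_MASK);
}

/* Bits 38..30 select the PDPT entry. */
static inline unsigned pdpt_index(uint64_t va)
{
    return (unsigned)((va >> 30) & PT_INDEX_MASK);
}

/* Bits 29..21 select the PD entry (the last level for 2 MiB pages). */
static inline unsigned pd_index(uint64_t va)
{
    return (unsigned)((va >> 21) & PT_INDEX_MASK);
}

/* Physical address of entry `index` in a table located at `table_phys`.
 * Every entry is 8 bytes wide. */
static inline uint64_t entry_address(uint64_t table_phys, unsigned index)
{
    assert(index < 512);
    return table_phys + (uint64_t)index * 8;
}
```

With these helpers, 0xffff800000000000 lands in PML4 slot 256 and 0xffffffff80000000 in PML4 slot 511, PDPT slot 510, which is the placement -mcmodel=kernel expects.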

I then load the kernel and switch to long mode.

At this point, I am able to print characters to the screen using virtual address 0xb8000 (identity map) OR 0xffff8000000b8000 (offset map).
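
The relationship between the two addresses is just a fixed offset. A small sketch, assuming the offset map covers exactly the first gigabyte as described above (the macro and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_MAP_BASE 0xffff800000000000ULL /* offset-map base from the post */
#define PHYS_MAP_SIZE 0x40000000ULL         /* only the first gigabyte is mapped */

/* Translate a physical address into its offset-mapped virtual address. */
static inline uint64_t phys_to_virt(uint64_t phys)
{
    assert(phys < PHYS_MAP_SIZE);
    return PHYS_MAP_BASE + phys;
}

/* Translate an offset-mapped virtual address back to physical. */
static inline uint64_t virt_to_phys(uint64_t virt)
{
    assert(virt >= PHYS_MAP_BASE && virt - PHYS_MAP_BASE < PHYS_MAP_SIZE);
    return virt - PHYS_MAP_BASE;
}
```

So `phys_to_virt(0xb8000)` gives the 0xffff8000000b8000 framebuffer address used above.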

Now, here is where my trouble starts!
I want to regenerate my page tables to get them out of low memory and into memory controlled by my physical allocator.
For the moment, I am creating a set of mappings identical to the one created by the boot loader, but in the future I will remove the identity map (so that dereferencing NULL causes a page fault!).
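
As an illustration of what rebuilding such a mapping involves, here is a rough sketch that fills one page directory with 2 MiB entries covering one gigabyte. It is not the kernel's actual code; `fill_pd_2m` and `table_entry` are hypothetical names, and the sketch assumes the table memory is already allocated and page-aligned.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_PS       (1ULL << 7)   /* in a PDE: entry maps a 2 MiB page */
#define PAGE_2M      0x200000ULL

/* Fill one page directory so that its 512 entries map 1 GiB of physical
 * memory starting at phys_base, using 2 MiB pages. */
static void fill_pd_2m(uint64_t pd[512], uint64_t phys_base)
{
    assert((phys_base & (PAGE_2M - 1)) == 0);   /* base must be 2 MiB aligned */
    for (unsigned i = 0; i < 512; i++)
        pd[i] = (phys_base + i * PAGE_2M) | PTE_PS | PTE_WRITABLE | PTE_PRESENT;
}

/* Build a non-terminal entry (PML4E or PDPTE) pointing at the next table. */
static uint64_t table_entry(uint64_t next_table_phys)
{
    assert((next_table_phys & 0xfff) == 0);     /* tables are 4 KiB aligned */
    return next_table_phys | PTE_WRITABLE | PTE_PRESENT;
}
```

The first directory entry comes out as 0x83 and a pointer to a table at 0x3fe000 as 0x3fe003, which are the values a fresh set of tables should show before the CPU sets any accessed or dirty bits.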

But after writing the address of my new pml4 into cr3, any attempt to use my offset mapping to write to the screen causes a page fault!
However, writing to the screen with the identity mapping works as expected!

At first, I thought there must be a fault in my tables, but after a long time with my debugger I have found nothing, and I am out of ideas!

some debug output...

// tables set up by bootloader (working!)


walking page tables for virtual address 0xffff800000000000.
pml4e @ 0x10000 - pml4e[256] = 0x13023  ( accessed, dirty, writable, present )
pdpe  @ 0x13000 - pdpe[0]      = 0x14023  ( accessed, dirty, writable, present )
pdp    @ 0x14000 - pdp[0]       = 0xe3       ( terminal, accessed, dirty, writable, present )

// tables set up by kernel (page fault!)


walking page tables for virtual address 0xffff800000000000.
pml4e @ 0x3ff000 - pml4e[256] = 0x3fe003  ( writable, present )
pdpe  @ 0x3fe000 - pdpe[0]     = 0x3fd003  ( writable, present )
pdp    @ 0x3fd000 - pdp[0]      = 0x83       ( terminal, writable, present )

Apart from the dirty/accessed bits, they are identical!

Bochs debug info:


00067792014i[CPU0 ] CPU is in long mode (active)
00067792014i[CPU0 ] CS.d_b = 16 bit
00067792014i[CPU0 ] SS.d_b = 16 bit
00067792014i[CPU0 ] EFER   = 0x00000500
00067792014i[CPU0 ] | RAX=ffff8000000b80a0  RBX=0000000000000000
00067792014i[CPU0 ] | RCX=00000000000000a0  RDX=ffff8000000b8000
00067792014i[CPU0 ] | RSP=000000000007ff2f  RBP=ffffffff80000f6e
00067792014i[CPU0 ] | RSI=ffff8000000b8140  RDI=000000000000000a
00067792014i[CPU0 ] |  R8=0000000040000000   R9=0000000000000000
00067792014i[CPU0 ] | R10=000000000000ffb7  R11=ffffffff80000000
00067792014i[CPU0 ] | R12=0000000000000000  R13=000000000000000a
00067792014i[CPU0 ] | R14=0000000000000000  R15=0000000000000000
00067792014i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF CF
00067792014i[CPU0 ] | SEG selector     base    limit G D
00067792014i[CPU0 ] | SEG sltr(index|ti|rpl)     base    limit G D
00067792014i[CPU0 ] |  CS:0008( 0001| 0|  0) 00000000 000fffff 0 0
00067792014i[CPU0 ] |  DS:0018( 0003| 0|  0) 00000000 000fffff 0 0
00067792014i[CPU0 ] |  SS:0018( 0003| 0|  0) 00000000 000fffff 0 0
00067792014i[CPU0 ] |  ES:0018( 0003| 0|  0) 00000000 000fffff 0 0
00067792014i[CPU0 ] |  FS:0018( 0003| 0|  0) 00000000 000fffff 0 0
00067792014i[CPU0 ] |  GS:0018( 0003| 0|  0) 00000000 000fffff 0 0
00067792014i[CPU0 ] |  MSR_FS_BASE:0000000000000000
00067792014i[CPU0 ] |  MSR_GS_BASE:0000000000000000
00067792014i[CPU0 ] | RIP=ffffffff80000a68 (ffffffff80000a68)
00067792014i[CPU0 ] | CR0=0xe0000011 CR2=0xffff8000000b80a0
00067792014i[CPU0 ] | CR3=0x003ff000 CR4=0x000000a0
(0).[67792014] [0x00200a68] 0008:ffffffff80000a68 (unk. ctxt): mov dil, byte ptr ds:[rax] ; 408a38
00067792014e[CPU0 ] exception(): 3rd (13) exception with no resolution, shutdown status is 00h, resetting

The fault was caused by an attempt to read a byte from the framebuffer at 0xffff8000000b80a0 (see cr2 and rax):
mov dil, byte ptr ds:[rax]

According to cr3, I have the correct pml4 loaded.

What could possibly have gone wrong?
What more can I do to debug?
I am using Bochs x86-64 2.4.5 (Gentoo).
The same fault also shows up on QEMU.

And now, for your pleasure, the fault occurring with trace and trace-mem on!


(0) Breakpoint 1, 0xffffffff80000a46 in ?? ()
Next at t=39766823
(0) [0x00200a46] 0008:ffffffff80000a46 (unk. ctxt): rep movsd dword ptr es:[rdi], dword ptr ds:[rsi] ; f3a5
<bochs:7> page 0x0000000000000000
linear page 0x0000000000000000 maps to physical page 0x00000000
<bochs:8> page 0xffff800000000000
physical address not available for linear 0xffff800000000000
<bochs:9> trace-mem on
Memory-Tracing enabled for CPU0
<bochs:10> trace on
Tracing enabled for CPU0
<bochs:11> s
(0).[39766823] [0x00200a46] 0008:ffffffff80000a46 (unk. ctxt): rep movsd dword ptr es:[rdi], dword ptr ds:[rsi] ; f3a5
[CPU0 RD]: PHY 0x003ff800 (len=8): 0x00000000 0x003FE003        ; PML4E
[CPU0 RD]: PHY 0x003fe000 (len=8): 0x00000000 0x003FD003        ; PDPTE
[CPU0 RD]: PHY 0x003fd000 (len=8): 0x00000000 0x00100083        ; PDE                <****** HERE! why is bit 20 set? **********>
CPU 0: Exception 0x0e - (#PF) page fault occured (error_code=0x0009)
CPU 0: Interrupt 0x0e occured (error_code=0x0009)
[CPU0 RD]: PHY 0x003ff000 (len=8): 0x00000000 0x003FC023        ; PML4E
[CPU0 RD]: PHY 0x003fc000 (len=8): 0x00000000 0x003FB023        ; PDPTE
[CPU0 RD]: PHY 0x003fb000 (len=8): 0x00000000 0x000000E3        ; PDE
[CPU0 RD]: LIN 0x00000000000000e0 PHY 0x000000e0 (len=8, pl=0): 0xF000FF53 0xF000FF53
[CPU0 RD]: LIN 0x00000000000000e8 PHY 0x000000e8 (len=8, pl=0): 0xF000FF53 0xF000FF53
00039766823e[CPU0 ] interrupt(long mode): IDT entry extended attributes DWORD4 TYPE != 0
CPU 0: Exception 0x0d - (#GP) general protection fault occured (error_code=0x0072)
CPU 0: Exception 0x08 - (#DF) double fault occured (error_code=0x0000)
CPU 0: Interrupt 0x08 occured (error_code=0x0000)

Notice that bit 20 is set in the PDE? I didn't set that!
So I added a write watchpoint at that physical address.
This watchpoint gets triggered all over the place, on random instructions that SHOULD NOT WRITE to any memory.
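
For anyone hitting the same symptom, the trace above is consistent with a reserved-bit fault: in a PDE that maps a 2 MiB page (PS=1), bit 12 is PAT and bits 13 to 20 must be zero, and the page-fault error code 0x0009 has both the "present" and "reserved bit" flags set. A small sketch of both checks follows. The names are made up for illustration, and the check ignores the reserved bits above the CPU's physical address width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PDE_PRESENT     (1ULL << 0)
#define PDE_PS          (1ULL << 7)
/* With PS=1, bit 12 is PAT and bits 13..20 must be zero. */
#define PDE_2M_RESERVED (0xffULL << 13)

/* True if a present 2 MiB PDE has any of bits 13..20 set. */
static bool pde_2m_has_reserved_bits(uint64_t pde)
{
    assert((pde & PDE_PRESENT) && (pde & PDE_PS));
    return (pde & PDE_2M_RESERVED) != 0;
}

/* Page-fault error code bits pushed by the CPU. */
#define PF_PRESENT  (1u << 0)  /* fault on a present entry, not a missing one */
#define PF_WRITE    (1u << 1)  /* access was a write */
#define PF_USER     (1u << 2)  /* access came from user mode */
#define PF_RESERVED (1u << 3)  /* a reserved bit was set in a paging entry */

/* True if the fault was raised because of a reserved bit. */
static bool pf_is_reserved_bit_violation(uint32_t error_code)
{
    return (error_code & PF_RESERVED) != 0;
}
```

Running the PDE value from the trace, 0x00100083, through `pde_2m_has_reserved_bits` flags it, while the bootloader's 0xe3 and the intended 0x83 pass.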

for example


00053010136i[CPU0 ] [53010136] Caught write watch point
(0) Caught write watch point at 0x003fd000
Next at t=53010136
(0) [0x00200d9b] 0008:ffffffff80000d9b (unk. ctxt): xor ebx, ebx              ; 31db

Huh?

Interrupts are turned off, so I guess hardware is writing to this address?
It shouldn't be! My physical allocator only gives out addresses that the BIOS memory map labeled USABLE.

Any ideas?

Thanks!
Last edited by cds84 on Thu Mar 31, 2011 2:38 pm, edited 1 time in total.

Re: Something is setting bit 20 (reserved, MBZ) of my pdp..??

Post by Combuster »

Are you sure you are not looking at the instruction after the watchpoint?

Re: Something is setting bit 20 (reserved, MBZ) of my pdp..??

Post by cds84 »

SOLVED!

Thanks for the hint Combuster, that got me moving again!

Like I guessed, it was a stupid mistake!


/*** physical memory bitmap ***/
static uint64_t phy_bitmap[(PAGE_MAX/64)+1];

.... blah blah blah ....
phy_bitmap[0] = 1; // FIXME: HACK - set_page_*() funcs ignore NULL page. manually make it un-available here.

which makes pages 1 to 15 available again!

It obviously should have been:


phy_bitmap[0] |= 1; // FIXME: HACK - set_page_*() funcs ignore NULL page. manually make it un-available here.

It turned out that bit 20 was part of my console FIFO spin lock, so only accesses to the offset map during a kprintf were causing a page fault!
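
The difference between the two statements is easy to reproduce outside the kernel. The sketch below assumes one bit per page with a set bit meaning "unavailable"; `PAGE_MAX` is given an arbitrary value, and `mark_used` and `is_used` are hypothetical stand-ins for the set_page_*() functions, which are not shown in the post.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MAX 1024   /* arbitrary value for this sketch */

/*** physical memory bitmap: one bit per page, set = unavailable ***/
static uint64_t phy_bitmap[(PAGE_MAX / 64) + 1];

/* Hypothetical helper: mark a page as unavailable. */
static void mark_used(uint64_t page)
{
    assert(page < PAGE_MAX);
    phy_bitmap[page / 64] |= 1ULL << (page % 64);
}

/* Hypothetical helper: returns 1 if the page is marked unavailable. */
static int is_used(uint64_t page)
{
    assert(page < PAGE_MAX);
    return (int)((phy_bitmap[page / 64] >> (page % 64)) & 1);
}
```

After `mark_used(5)`, the assignment `phy_bitmap[0] = 1;` leaves page 0 marked but silently clears page 5 along with every other bit in that word, whereas `phy_bitmap[0] |= 1;` keeps both.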

You really can't lose focus even for a second when developing an OS, or like me you will be paying with hours of debugging!

Again, thanks!