Second Page Mapping Causes OS To Crash

maxtyson123
Posts: 24
Joined: Wed Apr 19, 2023 1:40 am
Libera.chat IRC: maxtyson123

Second Page Mapping Causes OS To Crash

Post by maxtyson123 »

Hi,

Whenever I map my second page and then try to use my printing function, the OS crashes. This happens regardless of the address used for the first or the second page, and other code that I run before the printing function still works.

https://github.com/maxtyson123/MaxOS/tr ... Management

More details:
After successfully mapping an address:

Code: Select all

physical = {MaxOS::memory::physical_address_t *} 0xfee00000 
virtual_address = {MaxOS::memory::virtual_address_t *} 0xffffffff7ee00000 
My logging function causes the OS to break; however, the reason for this is odd.
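
For reference, here is a minimal sketch of how the virtual address above splits into the four x86-64 paging levels, assuming 4 KiB pages and 4-level paging. The names `PageIndices` and `split_virtual_address` are made up for illustration and are not taken from the MaxOS source.

```cpp
#include <cstdint>

// Hypothetical helper type: one index per paging level plus the page offset.
struct PageIndices {
    uint16_t pml4;    // bits 47..39 of the virtual address
    uint16_t pdpt;    // bits 38..30
    uint16_t pd;      // bits 29..21
    uint16_t pt;      // bits 20..12
    uint16_t offset;  // bits 11..0
};

// Split a canonical 64-bit virtual address into its table indices.
// Each level indexes a 512-entry table, hence the 9-bit mask 0x1FF.
constexpr PageIndices split_virtual_address(uint64_t address) {
    return PageIndices{
        static_cast<uint16_t>((address >> 39) & 0x1FF),
        static_cast<uint16_t>((address >> 30) & 0x1FF),
        static_cast<uint16_t>((address >> 21) & 0x1FF),
        static_cast<uint16_t>((address >> 12) & 0x1FF),
        static_cast<uint16_t>(address & 0xFFF),
    };
}
```

For 0xffffffff7ee00000 this gives PML4 index 511, PDPT index 509, PD index 503 and PT index 0, whereas the kernel addresses in the dumps (0xffffffff801.....) sit under PDPT index 510. So this mapping goes through a different PDPT entry than the kernel image does.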

Using GDB, I inspected my function:

Code: Select all

void _kprintf_internal(uint8_t type, const char* file, int line, const char* func, const char* format, ...)
...
type = {uint8_t} 0 '\000' [0x0]
file = {const char *} 0xffffffff801b0ba0 "/home/max/MaxOS/kernel/src/memory/physical.cpp"
line = {int} 293 [0x125]
func = {const char *} 0xffffffff801b0c63 "map"
format = {const char *} 0xffffffff801b0c97 "Mapped: 0x%x to 0x%x\n"
These values are correct, but one layer deeper, in the first line of the function (which prints a header), they are not:

Code: Select all

pre_kprintf(const char* file, int line, const char* func, uint8_t type)
...
file = {const char *} 0xffffffff80103f7d ""
line = {int} -1 [0xffffffff]
func = {const char *} 0xffffffff801c5a18 "p\226\033\200\377\377\377\377"
type = {uint8_t} 224 '\340' [0xe0]
This suggests to me that I have somehow overwritten something somewhere with my page mapping?

Registers at the time of the crash:

Code: Select all

rax            0xffffffff801b0ba0  -2145711200
rbx            0x2                 2
rcx            0x0                 0
rdx            0xffffffff801b0c63  -2145711005
rsi            0x125               293
rdi            0xffffffff801b0ba0  -2145711200
rbp            0xffffffff801c21d0  0xffffffff801c21d0
rsp            0xffffffff801c20f8  0xffffffff801c20f8
r8             0xffffffff801b0c97  -2145710953
r9             0xfee00000          4276092928
r10            0x0                 0
r11            0x0                 0
r12            0x0                 0
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0xffffffff801010cc  0xffffffff801010cc <MaxOS::hardwarecommunication::InterruptManager::HandleException0x03()>
eflags         0x200083            [ ID IOPL=0 SF CF ]
cs             0x8                 8
ss             0x10                16
ds             0x10                16
es             0x10                16
fs             0x10                16
gs             0x10                16
fs_base        0x0                 0
gs_base        0x0                 0
k_gs_base      0x0                 0
cr0            0x80010011          [ PG WP ET PE ]
cr2            0x0                 0
cr3            0x1bc000            [ PDBR=444 PCID=0 ]
cr4            0x20                [ PAE ]
cr8            0x0                 0
efer           0x500               [ LMA LME ]
mxcsr          0x1f80              [ IM DM ZM OM UM PM ]

When inspecting the page mapping in QEMU, it looks like physical address 0x210000 has been mapped 146 times. The TLB only seems to be "dirty" like that after it crashes; for instance, it is as expected after writing the new page, and even during interrupt handling of the crash. I also noticed that when running with a debugger it is filled with 0x20..... instead of 0x21......

It seems like as soon as that second entry is set, the next print fails to execute.
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Second Page Mapping Causes OS To Crash

Post by Octocontrabass »

maxtyson123 wrote: Tue Jul 02, 2024 10:45 pm

Code: Select all

rsp            0xffffffff801c20f8  0xffffffff801c20f8

cr3            0x1bc000            [ PDBR=444 PCID=0 ]
CR3 and therefore p4_table points to physical address 0x1bc000, which means p3_table is 0x1bd000, p3_table_hh is 0x1be000, p2_table is 0x1bf000, pt_tables is 0x1c0000, and the bottom of your stack is 0x1c2000.

How much stack does your logging function use? I'm guessing the answer is too much.
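
The arithmetic above can be sketched as follows. This assumes the tables are laid out back to back in the order listed, that pt_tables takes two pages (inferred from the gap between 0x1c0000 and 0x1c2000), and that the kernel is mapped at physical + 0xffffffff80000000 (inferred from the addresses in the dumps, not confirmed anywhere in this thread). `BootLayout`, `layout_from_cr3` and `stack_headroom` are illustrative names.

```cpp
#include <cstdint>

constexpr uint64_t kPageSize = 0x1000;
// Assumption: higher-half offset, inferred from addresses such as 0xffffffff801b0ba0.
constexpr uint64_t kKernelOffset = 0xffffffff80000000ULL;
// CR3 bits 51..12 hold the physical base of the top-level table.
constexpr uint64_t kCr3AddressMask = 0x000FFFFFFFFFF000ULL;

// Physical addresses of the boot-time tables and of the lowest stack address.
struct BootLayout {
    uint64_t p4_table;
    uint64_t p3_table;
    uint64_t p3_table_hh;
    uint64_t p2_table;
    uint64_t pt_tables;
    uint64_t stack_bottom;
};

// Assumes one page per table, except pt_tables, which is assumed to take two.
inline BootLayout layout_from_cr3(uint64_t cr3) {
    const uint64_t base = cr3 & kCr3AddressMask;
    BootLayout layout{};
    layout.p4_table     = base;
    layout.p3_table     = base + 1 * kPageSize;
    layout.p3_table_hh  = base + 2 * kPageSize;
    layout.p2_table     = base + 3 * kPageSize;
    layout.pt_tables    = base + 4 * kPageSize;
    layout.stack_bottom = base + 6 * kPageSize;
    return layout;
}

// Bytes left between the stack pointer and the lowest stack address.
inline uint64_t stack_headroom(uint64_t rsp_virtual, uint64_t stack_bottom_physical) {
    return (rsp_virtual - kKernelOffset) - stack_bottom_physical;
}
```

With CR3 = 0x1bc000 and RSP = 0xffffffff801c20f8 from the register dump, this puts the stack bottom at 0x1c2000 and leaves 0xf8 (248) bytes of headroom. Anything pushed below that lands in pt_tables.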
maxtyson123
Posts: 24
Joined: Wed Apr 19, 2023 1:40 am
Libera.chat IRC: maxtyson123

Re: Second Page Mapping Causes OS To Crash

Post by maxtyson123 »

I'll have a look into it. How do you reckon I should fix this?
maxtyson123
Posts: 24
Joined: Wed Apr 19, 2023 1:40 am
Libera.chat IRC: maxtyson123

Re: Second Page Mapping Causes OS To Crash

Post by maxtyson123 »

Also, how would this happen if it's able to work perfectly fine the first time I map the page and only fails the second?
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Second Page Mapping Causes OS To Crash

Post by Octocontrabass »

maxtyson123 wrote: Wed Jul 03, 2024 12:54 am
How do you reckon I should fix this?
Make your stack bigger? Rewrite your code to use less stack space? Add some guard pages so a stack overflow causes a double fault?

That's assuming it's a stack overflow, but that should be easy to check just by making the stack bigger.
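
Another way to check is to "paint" the stack with a known byte before it is used and later see how much of the pattern survived. Below is a minimal sketch of that idea; the fill value and the function names are arbitrary, and in a real kernel the painting would have to happen before the stack is in use (or only below the current stack pointer).

```cpp
#include <cstddef>
#include <cstdint>

// Arbitrary fill byte; any value that is unlikely to be written by real code works.
constexpr uint8_t kStackFill = 0xAA;

// Fill the whole stack region with the pattern. `bottom` is the lowest address.
inline void paint_stack(uint8_t* bottom, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        bottom[i] = kStackFill;
    }
}

// The x86 stack grows downwards, so untouched bytes sit at the low end.
// Count how many bytes from the bottom still hold the pattern.
inline size_t untouched_bytes(const uint8_t* bottom, size_t size) {
    size_t count = 0;
    while (count < size && bottom[count] == kStackFill) {
        ++count;
    }
    return count;
}

// Peak usage is everything above the surviving pattern.
inline size_t peak_stack_usage(const uint8_t* bottom, size_t size) {
    return size - untouched_bytes(bottom, size);
}
```

If `untouched_bytes` comes back as 0 after the logging call, the stack was used all the way down to its bottom and has very likely overflowed into whatever sits below it.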
maxtyson123 wrote: Wed Jul 03, 2024 4:05 pm
Also, how would this happen if it's able to work perfectly fine the first time I map the page and only fails the second?
You're not flushing the TLB when you corrupt your page tables.
maxtyson123
Posts: 24
Joined: Wed Apr 19, 2023 1:40 am
Libera.chat IRC: maxtyson123

Re: Second Page Mapping Causes OS To Crash

Post by maxtyson123 »

Octocontrabass wrote: Wed Jul 03, 2024 9:49 pm
That's assuming it's a stack overflow, but that should be easy to check just by making the stack bigger.
You're not flushing the TLB when you corrupt your page tables.
I made the stack 32 KiB and it still does not work. It fails even if I comment out that particular print; the one on line 48 of apic.cpp fails instead.
I added the TLB flush, thank you for pointing that out.

I have been trying to fix this on and off for months and am completely lost. Any help would be greatly appreciated.
iansjack
Member
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Second Page Mapping Causes OS To Crash

Post by iansjack »

Have you tried setting a watch on the address(es) you suspect are being corrupted? This might give you a fix on when the corruption occurs.
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Second Page Mapping Causes OS To Crash

Post by Octocontrabass »

What does QEMU's "info tlb" and/or "info mem" say when the mappings are messed up? It might help in figuring out which part of your page tables gets overwritten.
maxtyson123
Posts: 24
Joined: Wed Apr 19, 2023 1:40 am
Libera.chat IRC: maxtyson123

Re: Second Page Mapping Causes OS To Crash

Post by maxtyson123 »

Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Second Page Mapping Causes OS To Crash

Post by Octocontrabass »

I still see a stack overflow. That weird physical address that gets repeated a bunch is RFLAGS being pushed by nested interrupts.
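
To illustrate why a pushed RFLAGS value shows up as a mapping, here is a small sketch that decodes a 64-bit value the way the MMU would read a 4 KiB page-table entry. The value 0x200083 is the eflags from the register dump in the first post. The variant with bit 16 (RF) set is only an assumed example of how 0x21.... could appear; the thread does not show the exact values that were pushed.

```cpp
#include <cstdint>

// Bits 51..12 of a page-table entry hold the physical frame address.
constexpr uint64_t kPteAddressMask = 0x000FFFFFFFFFF000ULL;
// Bit 0 of a page-table entry is the Present flag; in RFLAGS, bit 0 is CF.
constexpr uint64_t kPtePresent = 1ULL << 0;
// Bit 16 of RFLAGS is RF, the resume flag.
constexpr uint64_t kRflagsRF = 1ULL << 16;

// Physical frame address the MMU would take from this entry.
constexpr uint64_t pte_physical_address(uint64_t entry) {
    return entry & kPteAddressMask;
}

// Whether the MMU would treat this entry as a valid mapping.
constexpr bool pte_present(uint64_t entry) {
    return (entry & kPtePresent) != 0;
}
```

Read as a page-table entry, 0x200083 is a present mapping to physical 0x200000, and the same flags with RF set (0x210083) map to 0x210000. Every interrupt frame pushed into the page tables leaves one such entry behind.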

Can you find those interrupts in QEMU's "-d int" log?