Weird page fault triggering bug...

Matt1223
Member
Member
Posts: 45
Joined: Mon Jul 30, 2018 2:58 am

Weird page fault triggering bug...

Post by Matt1223 »

I'm having a very strange bug that triggers a page fault. Apparently, these are the instructions that trigger it:

Code: Select all

(0) [0x000000402f42] 0008:0000000000402f42 (unk. ctxt): add eax, 0x2345e064       ; 0564e04523
<bochs:8> s
Next at t=1451281218
(0) [0x000000402f47] 0008:0000000000402f47 (unk. ctxt): add dword ptr ds:[eax], eax ; 0100
The fault makes sense, as 0x2345e064 is not a valid address. The problem, however, is that when I look at my kernel.exe in IDA, it shows the following instructions at the same spot:

Code: Select all

.text:00402F42                 add     eax, offset word_40E064
.text:00402F47                 movzx   eax, word ptr [eax]
It looks like the code is being modified while the OS is running...

What's more interesting: I started having this issue after changing this line of code:

Code: Select all

terminal_print(term, "\tAvailable memory: %uMB\n", memory_get_available()/1024/1024);
to this one:

Code: Select all

terminal_print(term, "\tAvailable memory: %uMB\n", memory_get_available()/1000000);
in a function that has absolutely nothing to do with the function that triggers the page fault.

I have absolutely no idea what is going on here. Can you help me understand it?
Octocontrabass
Member
Member
Posts: 5562
Joined: Mon Mar 25, 2013 7:01 pm

Re: Weird page fault triggering bug...

Post by Octocontrabass »

You've got a pointer somewhere that points to something it shouldn't, and at some point something uses that pointer to write the value 0x12345 into memory.

It's impossible to narrow it down further without more information.
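The 0x12345 guess can be checked by hand against the two disassemblies. Below is a minimal sketch in C, assuming the eight bytes at 0x402f42 are the encodings IDA implies (`05 64 e0 40 00` for the `add`, `0f b7 00` for the `movzx`) and that the stray write is a 32-bit little-endian store landing three bytes into the `add`. The helper names `store_le32` and `load_le32` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulate a 32-bit little-endian store of `value` at `offset` in `buf`. */
static void store_le32(uint8_t *buf, size_t len, size_t offset, uint32_t value)
{
    assert(offset + 4 <= len);
    for (int i = 0; i < 4; i++)
        buf[offset + i] = (uint8_t)(value >> (8 * i));
}

/* Read a 32-bit little-endian value from `buf` at `offset`. */
static uint32_t load_le32(const uint8_t *buf, size_t len, size_t offset)
{
    assert(offset + 4 <= len);
    return (uint32_t)buf[offset]
         | (uint32_t)buf[offset + 1] << 8
         | (uint32_t)buf[offset + 2] << 16
         | (uint32_t)buf[offset + 3] << 24;
}

/*
 * Assumed original bytes at 0x402f42:
 *   05 64 e0 40 00   add   eax, 0x0040e064
 *   0f b7 00         movzx eax, word ptr [eax]
 */
static const uint8_t original_code[8] = {
    0x05, 0x64, 0xe0, 0x40, 0x00, 0x0f, 0xb7, 0x00
};
```

Copying `original_code` into a scratch array and calling `store_le32(code, 8, 3, 0x00012345)` turns the immediate read by `load_le32(code, 8, 1)` into 0x2345e064, and leaves `01 00` where the `movzx` used to start, which is the `add dword ptr ds:[eax], eax` that Bochs shows.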
Matt1223
Member
Member
Posts: 45
Joined: Mon Jul 30, 2018 2:58 am

Re: Weird page fault triggering bug...

Post by Matt1223 »

Octocontrabass wrote:You've got a pointer somewhere that points to something it shouldn't, and at some point something uses that pointer to write the value 0x12345 into memory.

It's impossible to narrow it down further without more information.
Is there a way to make bochsdbg tell me when memory at a particular address gets modified?
Octocontrabass
Member
Member
Posts: 5562
Joined: Mon Mar 25, 2013 7:01 pm

Re: Weird page fault triggering bug...

Post by Octocontrabass »

Yes, it's called a watchpoint. Here's the documentation for that.
Matt1223
Member
Member
Posts: 45
Joined: Mon Jul 30, 2018 2:58 am

Re: Weird page fault triggering bug...

Post by Matt1223 »

So it looks like this address is written to only once, when the kernel is loaded by the bootloader, and it's already wrong at that point. Maybe there is a bug in my bootloader's load_kernel function:

Code: Select all

load_kernel: ; ret eax = entry point , edx = ImageBase, edi = SizeOfImage
  mov eax, [0x00010000+0x3C];PE offset

  xor ecx, ecx
  mov ecx, [0x00010000+eax+0x50] ;SizeOfImage
  push ecx; push SizeOfImage

  xor ecx, ecx
  mov cx, [0x00010000+eax+6] ;number of section

  mov edx, [0x00010000+eax+0x34] ;image base
  push edx ; push ImageBase

  xor ebx, ebx
  mov bx, [0x00010000+eax+0x14];optional Header Size
  add ebx, 0x00010000+0x18
  add ebx, eax; ebx - section table

  mov eax,[0x00010000+eax+0x28]
  add eax, edx ;eax - AddressOfEntryPoint

  .l1:

  push ecx

  mov ecx, [ebx+0x10];SizeOfRawData
  mov edi, [ebx+0xC];VirtualAddress
  mov esi, [ebx+0x14];PointerToRawData

  add edi, edx
  add esi, 0x00010000

  cld
  rep movsb 

  add ebx, 0x28 
  pop ecx

  cmp ecx, 5
  loop .l1

  pop edx ; edx = ImageBase
  pop edi ; edi = SizeOfImage

  ret
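For readers who find the assembly hard to follow, the same header walk can be sketched in C. This is only an illustration under assumptions: the field offsets are the ones used in the listing above, while `pe_info`, `load_pe_sections`, `rd16`, `rd32` and the `mem` buffer (standing in for memory starting at ImageBase) are hypothetical names. Like the original, it does no bounds checking on the file itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian field readers for the raw file image. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

typedef struct {
    uint32_t entry_point;   /* ImageBase + AddressOfEntryPoint (eax) */
    uint32_t image_base;    /* edx */
    uint32_t size_of_image; /* edi */
} pe_info;

/*
 * file: raw PE file (the bootloader keeps it at physical 0x10000).
 * mem:  hypothetical buffer standing in for memory at ImageBase; a real
 *       loader would write to ImageBase + VirtualAddress directly.
 */
static pe_info load_pe_sections(const uint8_t *file, uint8_t *mem, size_t mem_size)
{
    uint32_t pe = rd32(file + 0x3C);              /* offset of PE header */
    uint16_t nsections = rd16(file + pe + 0x06);  /* NumberOfSections */
    uint16_t opt_size = rd16(file + pe + 0x14);   /* SizeOfOptionalHeader */

    pe_info info;
    info.image_base = rd32(file + pe + 0x34);
    info.size_of_image = rd32(file + pe + 0x50);
    info.entry_point = info.image_base + rd32(file + pe + 0x28);

    /* The section table follows the optional header; entries are 0x28 bytes. */
    const uint8_t *sec = file + pe + 0x18 + opt_size;
    for (uint16_t i = 0; i < nsections; i++, sec += 0x28) {
        uint32_t va = rd32(sec + 0x0C);        /* VirtualAddress */
        uint32_t raw_size = rd32(sec + 0x10);  /* SizeOfRawData */
        uint32_t raw_off = rd32(sec + 0x14);   /* PointerToRawData */
        assert((size_t)va + raw_size <= mem_size);
        memcpy(mem + va, file + raw_off, raw_size);  /* the rep movsb */
    }
    return info;
}
```

Note that the loop only reads from the file and writes to the destination, so a routine of this shape copies whatever bytes are in the source buffer at the time it runs.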
Octocontrabass
Member
Member
Posts: 5562
Joined: Mon Mar 25, 2013 7:01 pm

Re: Weird page fault triggering bug...

Post by Octocontrabass »

That function is supposed to copy the kernel from one location to another, right?

Have you checked to see if it's already corrupt before the copy happens?
Matt1223
Member
Member
Posts: 45
Joined: Mon Jul 30, 2018 2:58 am

Re: Weird page fault triggering bug...

Post by Matt1223 »

Octocontrabass wrote:That function is supposed to copy the kernel from one location to another, right?
Right. The kernel.exe file is present at 0x10000.
Octocontrabass wrote:Have you checked to see if it's already corrupt before the copy happens?
I'm working on it right now. There are some interesting things happening. It looks like the load_kernel function corrupts the file it is supposed to be copying. The instruction is still correct before it's copied to its destination:

Code: Select all

<bochs:14> disasm 0x00000012342
0000000000012342: (                    ): add eax, 0x0040e064       ; 0564e04000
Then after the load_kernel function is called this happens:

Code: Select all

<bochs:17> disasm 0x00000012342
0000000000012342: (                    ): add eax, 0x2345e064       ; 0564e04523
<bochs:18> disasm 0x000000402f42
0000000000402f42: (                    ): add eax, 0x2345e064       ; 0564e04523
The code that was only meant to be copied got modified, and the modified code was then copied to 0x402f42.
Matt1223
Member
Member
Posts: 45
Joined: Mon Jul 30, 2018 2:58 am

Re: Weird page fault triggering bug...

Post by Matt1223 »

I've found the problem!

So before loading the kernel I was enabling A20 with the following function:

Code: Select all

enable_a20:
  ;set a20 http://wiki.osdev.org/A20 (Fast A20 Gate)
  pushad ;test a20
  mov edi,0x112345  ;odd megabyte address.
  mov esi,0x012345  ;even megabyte address.
  mov [esi],esi     ;making sure that both addresses contain different values.
  mov [edi],edi     ;(if A20 line is cleared the two pointers would point to the address 0x012345 that would contain 0x112345 (edi)) 
  cmpsd             ;compare addresses to see if they're equivalent.
  popad
  jne A20_on        ;if not equivalent , A20 line is set.
    
  in al, 0x92 ;enable a20
  test al, 2
  jnz A20_on
  or al, 2
  and al, 0xFE
  out 0x92, al
   
  A20_on:

  ret

Code: Select all

  mov edi,0x112345  ;odd megabyte address.
  mov esi,0x012345  ;even megabyte address.
  mov [esi],esi     ;making sure that both addresses contain different values.
  mov [edi],edi     ;(if A20 line is cleared the two pointers would point to the address 0x012345 that would contain 0x112345 
This looks suspicious, right? =P~
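To spell out why it is suspicious: kernel.exe sits at 0x10000, so the test address 0x012345 lies inside the file, three bytes into the `add` instruction that Bochs showed at 0x12342. A rough sketch of that check in C follows; `ranges_overlap` and `copied_address` are hypothetical helpers, and the address mapping relies only on the one pair of addresses seen in the debugger output (0x12342 ends up at 0x402f42), so it is an assumption that it holds only within that section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [a, a + a_len) and [b, b + b_len) share at least one byte. */
static bool ranges_overlap(uint32_t a, uint32_t a_len, uint32_t b, uint32_t b_len)
{
    assert(a_len > 0 && b_len > 0);
    return a < b + b_len && b < a + a_len;
}

/*
 * Map a physical address inside the loaded file to the address that
 * load_kernel copies it to, using the known pair 0x12342 -> 0x402f42.
 */
static uint32_t copied_address(uint32_t phys)
{
    const uint32_t known_phys = 0x00012342u;
    const uint32_t known_virt = 0x00402f42u;
    return phys - known_phys + known_virt;
}
```

With these, `ranges_overlap(0x12345, 4, 0x12342, 5)` is true (the 4-byte store hits the 5-byte `add`), and `copied_address(0x12345)` gives 0x402f45, the first corrupted byte in the running kernel. Picking test addresses that are not inside the region holding kernel.exe, or saving and restoring the original contents around the test, would presumably avoid the corruption.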

What a stupid bug to make!