Debugging a page fault on function return?

exscape
Posts: 3
Joined: Sun Sep 04, 2011 5:13 am

Debugging a page fault on function return?

Post by exscape »

Hi, everyone!
I'm new to OS development, but not programming in general. I'm working on my first project, mostly by following JamesM's tutorial. Some stuff is my own, but the major things are still from JamesM. I'm currently working on debugging the stuff from chapter 7, "the heap".
After fixing (or so it seems) two or three bugs that seemed to occur even in the downloadable code, I found another problem while stress testing kmalloc() and kfree(): I get a page fault that I can't seem to debug properly.
It occurs after a bunch of "random" (deterministic) allocations and frees, and is triggered during allocation of a block.

Now, here's the problem... With the previous page faults I've gotten, it was relatively easy to find out which line of code triggered it. In this case, GDB doesn't help me at all! I might be missing something, though, which is why I ask you, the more knowledgeable. ;)
The interrupt handler appears to be called on returning from a successful allocation, and I don't see why. When debugging page faults previously, they occurred on access (i.e. when executing *some_pointer = 0; or similar); not this time.
Here's the entire GDB output from boot to panic: http://pastebin.com/NHnNwknj (all the lines with just (gdb) on them are implicit next commands)
Mostly uninteresting things; I just don't understand why it enters the interrupt handler when exiting the function...?
I actually don't think any more code than in the debug output is necessary for now; I mostly want to get hints on *how* to debug this.

All my code is in C or assembly (only tiny bits of assembly), and I use gcc 4.6 (no optimization flags; -O0 -ggdb3) to compile.

Relevant details for the page fault:
CR2 (faulting address) is 0x2be3a70 (far from any relevant addresses, i.e. ~0x0... for the kernel + stack and 0xC0000000 for the heap).
Error code = 0x02 (page not present, action=write, kernel mode).
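
For reference, the reading of the error code above can be checked bit by bit. This is a minimal sketch in C of decoding the low three bits of the x86 page-fault error code; the macro, struct, and function names are hypothetical and not taken from the tutorial's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code pushed by the CPU. */
#define PF_PRESENT 0x1u /* 0 = page not present, 1 = protection violation */
#define PF_WRITE   0x2u /* 0 = read access,      1 = write access */
#define PF_USER    0x4u /* 0 = kernel mode,      1 = user mode */

/* Hypothetical helper type holding the decoded bits. */
struct pf_info {
    bool protection_violation;
    bool write;
    bool user_mode;
};

/* Split the error code into its three basic flags. */
struct pf_info decode_pf_error(uint32_t err)
{
    struct pf_info info;
    info.protection_violation = (err & PF_PRESENT) != 0;
    info.write = (err & PF_WRITE) != 0;
    info.user_mode = (err & PF_USER) != 0;
    return info;
}
```

Calling decode_pf_error(0x02) yields a not-present page, a write access, and kernel mode, which matches the description above.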

Again, I'm not sure which details to post, but since I'm really asking how to begin debugging this I won't bother posting bunches of source code.
It's all available (with some lag, but it's up-to-date as of right now) on github, though.

Thanks in advance. :)
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: Debugging a page fault on function return?

Post by xenos »

I agree with berkus - a page fault with some random address upon return from a function usually means that you trashed your stack and tried to return to some random, non-present address. Another possibility would be trashing the stack pointer so that it points to that non-present address. The stack is the only memory accessed by the "ret" instruction, and the memory access to the return address occurs because of the following instruction fetch.

However, since your error code indicates a write access, and the two mentioned cases are both read accesses, I wonder whether the "ret" instruction is actually causing the page fault. Have you checked at which (assembly) instruction the page fault occurs? You can find this address on the page fault handler's stack along with the error code, or use bochs' debugger and single step (one step per instruction) through your code to see what actually happens.
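
As a rough illustration of where that address sits, here is a sketch in C of the values the CPU leaves on the stack at page-fault handler entry on 32-bit x86 when no privilege change occurs. The struct and function names are hypothetical, and a handler stub that pushes extra registers before calling into C (as many tutorial stubs do) will shift these offsets.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Stack contents at page-fault handler entry on 32-bit x86 with no
 * privilege change, lowest address first. Names are hypothetical.
 */
struct pf_frame {
    uint32_t err_code; /* page-fault error code */
    uint32_t eip;      /* address of the faulting instruction */
    uint32_t cs;
    uint32_t eflags;
};

/*
 * True if the fault looks like an instruction fetch from a bad address,
 * which is what a "ret" to a trashed return address produces: the saved
 * EIP and CR2 are then the same address. If they differ, the instruction
 * at the saved EIP faulted on a data access to CR2.
 */
bool pf_is_bad_jump(const struct pf_frame *frame, uint32_t cr2)
{
    return frame->eip == cr2;
}
```

If the check returns false, disassembling the code around the saved EIP (for example in GDB or the Bochs debugger) shows which instruction performed the faulting access.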
exscape
Posts: 3
Joined: Sun Sep 04, 2011 5:13 am

Re: Debugging a page fault on function return?

Post by exscape »

Ah, I fixed it - thanks for the tip regarding the bochs debugger. :)
I hadn't used Bochs before because I couldn't get it to compile properly on OS X.

Anyhow, the error was that I used the wrong variable for array indexing back where the call to kmalloc() was made, in kmain(). I had a (stack-based) array with 1000 elements and wrote to it (saving a variable to it) in a loop over indices 0 through 9999.
I wish GDB would've told me that, rather than having to break into disassembly, though. :)
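
One way to make this kind of mistake harder to write is to derive the loop bound from the array itself. The following is only a sketch in C under that assumption; ARRAY_LEN and fill_slots are hypothetical names, and the NULL store stands in for saving a kmalloc() result.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical stand-in for the stress-test loop: stores at most
 * num_slots entries no matter how many requests are asked for, and
 * returns how many were actually stored.
 */
size_t fill_slots(void **slots, size_t num_slots, size_t num_requests)
{
    size_t stored = 0;
    for (size_t i = 0; i < num_requests && i < num_slots; i++) {
        slots[i] = NULL; /* a real test would store kmalloc(...) here */
        stored++;
    }
    return stored;
}
```

With a 1000-element array declared as void *slots[1000], calling fill_slots(slots, ARRAY_LEN(slots), 10000) stops after 1000 stores instead of writing past the end of the array.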