Debugging a page fault on function return?
Posted: Mon Sep 12, 2011 1:36 pm
Hi, everyone!
I'm new to OS development, but not programming in general. I'm working on my first project, mostly by following JamesM's tutorial. Some stuff is my own, but the major things are still from JamesM. I'm currently working on debugging the stuff from chapter 7, "the heap".
After fixing (or so it seems) two or three bugs that seemed to occur even in the downloadable code, I found another problem after doing some stress testing with kmalloc() and kfree(); namely, I get a page fault that I can't seem to debug properly.
It occurs after a bunch of "random" (deterministic) allocations and frees, and is triggered during allocation of a block.
Now, here's the problem... With the previous page faults I've gotten, it was relatively easy to find out which line of code triggered it. In this case, GDB doesn't help me at all! I might be missing something, though, which is why I ask you, the more knowledgeable.
The interrupt handler appears to be called on returning from a successful allocation, and I don't see why. The page faults I debugged previously occurred on a memory access (i.e. when executing *some_pointer = 0; or such); this one doesn't.
Here's the entire GDB output from boot to panic: http://pastebin.com/NHnNwknj (all the lines with just (gdb) on them are implicit next commands)
Mostly uninteresting things; I just don't understand why it enters the interrupt handler when exiting the function...?
I actually don't think any more code than in the debug output is necessary for now; I mostly want to get hints on *how* to debug this.
All my code is in C or assembly (only tiny bits of assembly), and I use gcc 4.6 (no optimization flags; -O0 -ggdb3) to compile.
Relevant details for the page fault:
CR2 (faulting address) is 0x2be3a70 (far from any relevant addresses, i.e. ~0x0... for the kernel + stack and 0xC0000000 for the heap).
Error code = 0x02 (page not present, action=write, kernel mode).
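For reference, this is how I read the error code: a minimal sketch based on the low three bits of the x86 page-fault error code. The names decode_pf_error and struct pf_info are hypothetical helpers of my own for illustration, not something from JamesM's tutorial.

```c
#include <stdint.h>

/* Low bits of the error code the CPU pushes for a page fault (int 14). */
#define PF_PRESENT 0x01u /* 0 = page not present, 1 = protection violation */
#define PF_WRITE   0x02u /* 0 = read access,      1 = write access */
#define PF_USER    0x04u /* 0 = kernel mode,      1 = user mode */

/* Hypothetical helper type: one flag per decoded bit. */
struct pf_info {
    int present; /* nonzero if the page was present */
    int write;   /* nonzero if the access was a write */
    int user;    /* nonzero if the CPU was in user mode */
};

/* Split the raw error code into its three basic flags. */
static struct pf_info decode_pf_error(uint32_t err)
{
    struct pf_info info;
    info.present = (err & PF_PRESENT) != 0;
    info.write   = (err & PF_WRITE) != 0;
    info.user    = (err & PF_USER) != 0;
    return info;
}
```

With err = 0x02 this gives present = 0, write = 1, user = 0, which matches the "page not present, write, kernel mode" reading above.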
Again, I'm not sure which details to post, but since I'm really asking how to begin debugging this I won't bother posting bunches of source code.
It's all available (with some lag, but it's up-to-date as of right now) on github, though.
Thanks in advance.