tripple fault when enabling paging
I return again with yet another problem, after battling with paging all day long.
I followed this tutorial today:
"Implementing Basic Paging" - http://www.osdever.net/tutorials/paging.php?the_id=43
The first error took me hours of debugging to figure out. What the author doesn't mention in his tutorial is that by choosing the addresses he suggests, and following his code, you end up bumping into VRAM! So when you try to read back the page table address, you get 0xffffffff (on Bochs) or 0x00000000 (on VMware)!
So I tried relocating my page tables to the 1MB boundary (0x100000) and that seemed to work, in the sense that the page table entries now seemed to be correct (0x0003, 0x1003, 0x2003, 0x3003 etc). The page directory contains 0x101003 as its first entry, and 0x0 for all the other entries (not-present).
Following the tutorial, I set up paging to map the first 4MB of memory to their real physical addresses. If I understand correctly, I should then be able to use the first 4MB of memory with paging turned on without getting any page faults. This was a bit of a major assumption since I haven't yet set up any interrupt tables or interrupt handlers.
However, right after turning on paging, the CPU triple faults with a page fault, which I didn't expect, since I'm not accessing anything beyond the first 4MB.
The message in bochs is this:
"Message: exception(): 3rd (14) exception with no resolution"
I thought maybe the A20 line hadn't been enabled properly, but I verified this by writing a byte to 0x100000 then reading 0x000000 and checking that they were not the same - they weren't so the A20 line seems fine.
Could anybody verify that this shouldn't happen, and if so, possibly suggest why it might?
My code is attached.
Re:tripple fault when enabling paging
Reasons for a page fault:
(1) You try to access a page from user mode, which is marked as supervisor.
- All my pages have the supervisor bit set, but I'm the kernel and I don't even have such a thing as user mode yet. So this can't be the problem.
(2) You try to write to a read only page.
- All my pages are set to read/write access. So this can't be the problem either.
(3) You try to access a page marked as "not present".
- I have my entire first page table (the first 4MB) marked as present, and I don't access memory above the 4MB mark. So I fail to see why this would be a problem either.
So my question is, why am I getting a page fault? If you need more source other than the paging code attached in the above message, feel free to ask and I can upload the rest (there isn't really much more)
Re:tripple fault when enabling paging
Hi,
What does Bochs say (at the end of bochsout.txt), especially the value in CR2?
Also you could halt the CPU just before paging is enabled and use Bochs debugger to examine the physical memory (and registers like CR3) to make sure everything is correct.
I'd start with CR3 and then check the page directory, then the page table (walk through it the same as the CPU would).
If you get a page fault immediately after your "movl %0, %%cr0" then I'd guess that the CPU can't read your code anymore.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
- Contact:
Re:tripple fault when enabling paging
Just a thought: is your data segment zero-based? If not, the address you use to fill in the tables and the address CR3 refers to are *not* the same ...
Re:tripple fault when enabling paging
Well I wrote debugging code which does the following just before turning on paging:
Writes out the first 3 entries of the page directory to the screen.
Writes out the first 3 entries of the page table to screen.
Tries setting the cr3 register and writing out its value to the screen.
All of the values seem perfectly fine:
Page Directory: 0x101003 0x000002 0x000002
Page Table: 0x000003 0x001003 0x002003
CR3 Register: 0x100000
I'm not quite sure how I would use the bochs debugger to debug this. Perhaps set a "watch read" on 0x100000 or something? I dunno.
So at least I know now it's not something obvious and stupid I've done, otherwise I guess you guys (and the various other people on IRC etc.) would have spotted it by now.
From bochsout I have:
Code: Select all
00001310831p[CPU ] >>PANIC<< exception(): 3rd (14) exception with no resolution
00001310831i[SYS ] Last time is 1092310929
00001310831i[CPU ] protected mode
00001310831i[CPU ] CS.d_b = 32 bit
00001310831i[CPU ] SS.d_b = 32 bit
00001310831i[CPU ] | EAX=e0000011 EBX=60000011 ECX=00000520 EDX=0000000f
00001310831i[CPU ] | ESP=0008ffd8 EBP=0008ffd8 ESI=0000069a EDI=00000005
00001310831i[CPU ] | IOPL=0 NV UP DI NG NZ NA PE NC
00001310831i[CPU ] | SEG selector base limit G D
00001310831i[CPU ] | SEG sltr(index|ti|rpl) base limit G D
00001310831i[CPU ] | DS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00001310831i[CPU ] | ES:0010( 0002| 0| 0) 00000000 000fffff 1 1
00001310831i[CPU ] | FS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00001310831i[CPU ] | GS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00001310831i[CPU ] | SS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00001310831i[CPU ] | CS:0008( 0001| 0| 0) 00000000 000fffff 1 1
00001310831i[CPU ] | EIP=000100d7 (000100d7)
00001310831i[CPU ] | CR0=0xe0000011 CR1=0x00000000 CR2=0x00000040
00001310831i[CPU ] | CR3=0x00100000 CR4=0x00000000
00001310831i[ ] restoring default signal behavior
00001310831i[CTRL ] quit_sim called with exit code 1
I think what I might have to do is go learn interrupt handling today and set up an interrupt handler for the page fault. Then I can see more details, like what caused it (not present? not enough privileges? no write access?).
Although, that CR2=0x00000040 looks strange to me. That would be the address which was accessed that caused the page fault, right? Hm... strange, since all my code is loaded well above that.
Well I guess I'll go learn interrupt handling now. Of course if anybody has any more ideas, please share.
And thanks for at least taking a look.
Re:tripple fault when enabling paging
Hi,
IRBMe wrote: I'm not quite sure how I would use the bochs debugger to debug this. Perhaps set a "watch read" on 0x100000 or something? I dunno.
First you need to stop execution at the right spot. Putting a "for(;;)" just before turning paging on with "WriteCR0()" will do. Then go to the console screen and press "control + c" to stop Bochs inside the endless loop (you'll get a command prompt from Bochs). Now, try "info cpu" and see what it says for CR3. Then try "x /32 0x100000", which will (hopefully) display your page directory contents, followed by "x /32 0x101000" for the first page table.
If all that looks good, shift the endless loop to just after paging is enabled and see if Bochs crashes or not.
IRBMe wrote: So at least I know now it's not something obvious and stupid I've done, otherwise I guess you guys (and the various other people on IRC etc) would have spotted it by now.
I wouldn't have noticed much - I only had a brief look (it's written in C & I'm an assembly programmer). I'm also starting to realise that it's better to teach someone how to fix their own bugs, rather than giving an answer like "change <something> on line <something> to <something> and it'll work"... There's a saying - something like "give a man a fish and he'll eat for a day, but teach a man to catch fish and he'll eat until he's old and senile..."
IRBMe wrote: Although, that CR2=0x00000040 looks strange to me. That would be the address which was accessed that caused the page fault right? hm...strange since all my code is loaded well above that.
It is strange and it's also probably wrong. Bochs gave me the same CR2=0x00000040 when I completely messed up PAE paging (which I found out Bochs only pretends to support, after much confusion). I'd ignore CR2 as I think it's a bug in Bochs. A normal page fault is handled by Bochs correctly though, so I'm going to guess that something's fairly messed up (I'm still thinking the instruction after the "movl %0, %%cr0" can't be accessed after paging is initialized).
IRBMe wrote: I think what I might have to do is go learn interrupt handling today and set up an interrupt handler for the page fault. Then I can see more details like what caused it (not present? not enough privileges? no write access?).
If I'm right and the CPU/Bochs can't access your code when paging is enabled, then it probably won't be able to access your exception handlers and/or stack either. In this case even very good exception handlers won't help.
To me everything I've seen so far looks good (apart from the minor/unrelated "CR0=0xe0000011" indicating that the CPU caches are disabled). IMHO there's only 3 ways to find your bug. The first way is to step through all your paging structures with Bochs (as described above). The next way is to find something wrong in the C source (which no-one's been able to do so far). The other way is to go through a disassembly of your binary with pen, paper and magnifying glass. This last method will find problems caused by the compiler (e.g. compiler bugs or unexpected padding/alignment of unsigned longs).
Cheers,
Brendan
Re:tripple fault when enabling paging
Well, I think I've found out a good bit more about the problem.
I have these 2 global variables:
Code: Select all
unsigned long *page_directory = (unsigned long *) PAGE_DIRECTORY_ADDRESS;
unsigned long *page_table = (unsigned long *) PAGE_TABLE_ADDRESS;
PAGE_DIRECTORY_ADDRESS = 0x100000
PAGE_TABLE_ADDRESS = 0x101000
Now, when I write to page_directory like so:
Code: Select all
page_directory[0] = 0x1234;
page_directory[1] = 0xabcd;
...then I examine memory at 0x100000. And sure enough, there's 0x1234 0xabcd. OK! Great!
Now I repeat for page_table:
Code: Select all
page_table[0] = 0x1234;
page_table[1] = 0xabcd;
...then I examine memory at 0x101000. And... uh oh! It's 0x0000 0x0000!
If I read the values back with code like so:
Code: Select all
value = page_table[0];
value2 = page_table[1];
then sure enough, value and value2 have the original values I assigned: 0x1234 and 0xabcd respectively.
Something's not right there!
Furthermore, if I create a LOCAL variable and write to that:
Code: Select all
unsigned long *p = (unsigned long *) PAGE_TABLE_ADDRESS;
p[0] = 0x1234;
p[1] = 0xabcd;
then I examine memory at 0x101000, and sure enough there's 0x1234 and 0xabcd!
If, after doing that, I read back my page_table variable again:
Code: Select all
value = page_table[0];
value2 = page_table[1];
They still contain 0!
So it seems that the global variable:
Code: Select all
unsigned long * page_table = (unsigned long *) PAGE_TABLE_ADDRESS;
is NOT the same as the local variable:
Code: Select all
unsigned long * p = (unsigned long *) PAGE_TABLE_ADDRESS;
Yet it seems to work OK with the "page_directory" global variable.
This is the root of my page table problem. Due to that weird error, my page_table variable (which should point to 0x101000) contains the correct page table. But the actual address 0x101000, where the first page directory entry points, is full of 0's!
Now what in the heck could cause that?!
- Pype.Clicker
Re:tripple fault when enabling paging
Okay, so your very first page is not present (which is better if you want to catch null pointers) and 0x40 is on that page, so I suppose some null/misinitialized pointer is being used here (possibly with an offset).
The real question is "how far is EIP=000100d7 (000100d7) from the paging initialization thing".
Indeed, objdump -drS yourkernel.o might be the best option for you ...
Then look where you are (in the ASM and in the C source) and *then* wonder how you got there...
Re:tripple fault when enabling paging
hahaha I can't f***ing believe it!
Broken code:
Code: Select all
unsigned long *page_directory = (unsigned long *) PAGE_DIRECTORY_ADDRESS;
unsigned long *page_table = (unsigned long *) PAGE_TABLE_ADDRESS;
Fixed code:
Code: Select all
unsigned long *page_directory = (unsigned long *) (PAGE_DIRECTORY_ADDRESS);
unsigned long *page_table = (unsigned long *) (PAGE_TABLE_ADDRESS);
I forgot the brackets around the defines, so it turned into:
Code: Select all
unsigned long *page_table = (unsigned long *) PAGE_DIRECTORY_ADDRESS + PAGE_DIRECTORY_SIZE
instead of
Code: Select all
unsigned long *page_table = (unsigned long *) (PAGE_DIRECTORY_ADDRESS + PAGE_DIRECTORY_SIZE)
So, it was more of a glaringly obvious error than I thought (only not that obvious really).
*still laughing in disbelief*
Thanks for the help though. ;)
- Pype.Clicker
Re:tripple fault when enabling paging
For further steps, make sure that each of your #defines can be used as a single token,
like #define X (Y+Z) ...
It will work better than having to write (X) instead of X everywhere in the code ...
Re:tripple fault when enabling paging
Yup, that's how I did it. Live and learn, I believe, is the expression.
- Pype.Clicker
Re:tripple fault when enabling paging
In the same "folder of recommendations": it is suggested that you use ".H" files for purely declarative (i.e. no code) interfaces and ".C" files for "pure code" (no structure definitions, no #defines or anything of the sort) ...
It eases the "modularization" of your code: Video.H describes everything other components need in order to access the functions/abstractions offered by the video module, and Video.C is the code itself, compiled separately and linked with the rest, so that the clear_screen() function is present only once in your project.
And by having Video.C include Video.H, you can check that the implementation stays consistent with the interface ...
Re:tripple fault when enabling paging
Pype.Clicker wrote: in the same "folder of recommendations", it is suggested that you have ".H" for pure-declarative (i.e. no code) interfaces and .C for "pure code" (no structure definitions, no #defines or whatsoever) ...
And in the alternate view department: Your *.c files will (or at least should) have structure definitions and #defines for anything that ought not be externally referenced. Your *.h files should contain as little as possible. This prevents other code from messing around in bits it shouldn't. It's called "encapsulation" and is one of the things, along with "inheritance" and "polymorphism", that many people mistakenly believe you need C++ for but that is, in fact, quite useful in plain old ANSI C.
The latter two are only an issue if you're using object oriented design, but encapsulation is always a good idea. I shudder to think of projects where all the nitty-gritty details of a particular .c file's implementation are available for all to see in its .h file. Don't declare any structures there at all if you can avoid it (and you usually can). Don't declare global variables there either. Instead of this:
Code: Select all
extern volatile int systemTicks;
do this:
Code: Select all
inline int getSystemTicks()
{
    extern volatile int systemTicks;
    return systemTicks;
}
Encapsulation. Gotta love it...
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
- Contact:
Re:tripple fault when enabling paging
I didn't think "inline" was part of ANSI C...
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
Re:tripple fault when enabling paging
Colonel Kernel wrote: I didn't think "inline" was part of ANSI C...
Yup, it is. It was introduced in the C99 standard.