
[Solved] OS crashes on paging enable

Posted: Fri Jun 02, 2017 3:38 pm
by BluCode
I am trying to implement paging in my OS, but as soon as I set the paging bit of cr0, the OS crashes. I am following James Molloy's tutorial, but I can't get it to work. When I follow the tutorial on the OSDev wiki here, it works fine, and I don't understand paging well enough to debug it (yet). When I use gdb, here is what I get:

Code:

(gdb) b switch_page_directory
Breakpoint 1 at 0x84ba: file kernel/paging.c, line 170.
(gdb) continue
Continuing.

Breakpoint 1, switch_page_directory (dir=0xc000) at kernel/paging.c:170
170     {
(gdb) step
171        current_directory = dir;
(gdb)
172        asm volatile("mov %0, %%cr3":: "r"(&dir->tablesPhysical));
(gdb)
174        asm volatile("mov %%cr0, %0": "=r"(cr0));
(gdb)
175        cr0 |= 0x80000000; // Enable paging!
(gdb)
176        asm volatile("mov %0, %%cr0":: "r"(cr0));
(gdb)
177     }
(gdb)
0x0000e05b in ?? ()
(gdb)
Cannot find bounds of current function
(gdb)
At this point, if I continue, QEMU resets.

My paging.c file: https://hastebin.com/akequpexij.cs
Can anyone see what I am doing wrong?

Re: OS crashes on paging enable

Posted: Fri Jun 02, 2017 4:58 pm
by BrightLight
I just quickly scanned your code, as I don't have much time right now and I'm more sleepy than awake, but it's worth noting that all paging structures must be referenced by physical address and aligned to 4 KB (4096 bytes). In short, the address in hexadecimal should always end in three zero digits (e.g. 0x1000, 0x2000, 0x7A8000). I see you are using kmalloc to allocate the page directory; how does your kmalloc work? My wild guess is that your kmalloc doesn't return page-aligned addresses.
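A quick way to check the alignment rule is to test the low 12 bits of an address, which are exactly the lowest three hex digits. The helpers below, is_page_aligned() and page_align_up(), are hypothetical names for illustration, not part of BluCode's kernel:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u /* 4 KiB */

/* True if addr lies on a 4 KiB boundary: its low 12 bits are all zero. */
bool is_page_aligned(uint32_t addr)
{
    return (addr & (PAGE_SIZE - 1)) == 0;
}

/* Round addr up to the next 4 KiB boundary (unchanged if already aligned).
   Assumes addr is not inside the last page below 4 GiB, where it would wrap. */
uint32_t page_align_up(uint32_t addr)
{
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}
```

Asserting is_page_aligned() on each address a page-aligned allocator hands back (for example the result of kmalloc_a) would quickly confirm or rule out the alignment guess.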
Also, why are you hard-coding the amount of memory as 16 MB? If you are using a Multiboot-compatible boot loader (like GRUB), the memory map information is passed to you.
As a better technique for debugging paging, try stopping right before you set bit 31 of the CR0 register, and dump all paging-related structures and registers, including CR3, the complete page directory and the complete page tables.
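A dump like that can also be written in the kernel itself. The following is a hosted-C sketch under stated assumptions: describe_entry() and dump_directory() are hypothetical helpers, printf/snprintf stand in for whatever output routine the kernel actually has, and tables[i] is assumed to be a readable pointer to the page table that directory entry i points to. It only decodes the PRESENT, R/W and U/S bits of 32-bit non-PAE entries; CR3 would be printed separately.

```c
#include <stdint.h>
#include <stdio.h>

/* Low flag bits shared by 32-bit (non-PAE) PDEs and PTEs. */
#define PG_PRESENT   0x1u
#define PG_RW        0x2u
#define PG_USER      0x4u
#define PG_ADDR_MASK 0xFFFFF000u

/* Format one entry as e.g. "0x0000f000 P RW US"; returns snprintf's length.
   "-" = not present, "RO" = R/W clear, "SV" = U/S clear (supervisor only). */
int describe_entry(uint32_t e, char *buf, size_t len)
{
    return snprintf(buf, len, "0x%08x%s%s%s",
                    (unsigned)(e & PG_ADDR_MASK),
                    (e & PG_PRESENT) ? " P" : " -",
                    (e & PG_RW) ? " RW" : " RO",
                    (e & PG_USER) ? " US" : " SV");
}

/* Print every present PDE, then the present PTEs of the table behind it.
   tables[i] is only dereferenced when pd[i] is present. */
void dump_directory(const uint32_t pd[1024], const uint32_t *const tables[1024])
{
    char buf[40];
    for (int i = 0; i < 1024; i++) {
        if (!(pd[i] & PG_PRESENT))
            continue;
        describe_entry(pd[i], buf, sizeof buf);
        printf("PDE %4d: %s\n", i, buf);
        for (int j = 0; j < 1024; j++) {
            if (!(tables[i][j] & PG_PRESENT))
                continue;
            describe_entry(tables[i][j], buf, sizeof buf);
            printf("  PTE %4d (virt 0x%08x): %s\n", j,
                   (unsigned)(((uint32_t)i << 22) | ((uint32_t)j << 12)), buf);
        }
    }
}
```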

Re: OS crashes on paging enable

Posted: Fri Jun 02, 2017 6:33 pm
by BluCode
My kmalloc_a function always returns an aligned address. I am hard-coding the amount because I am using my own bootloader and don't know how to get that information myself. Could you elaborate on which commands I should use to dump all of that?

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 2:12 am
by iansjack
Stop the code just before you enable paging (asm("jmp .") will do the job), and then type "info mem" in the QEMU monitor. This will show you which address ranges are mapped by your page directory. You should also examine your page tables (use the "print" or "x" commands in gdb, or the "x" command in the QEMU monitor) to see if they look right. Once you see what is wrong with them, it should be possible to work out where your error is.

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 6:19 am
by LtG
Since you already use gdb, you don't need to add any "jmp ."; you can just single-step (as you were doing) and check the paging stuff from QEMU once CR3 is set.

As for getting the memory info, see the wiki:
http://wiki.osdev.org/Detecting_Memory_ ... .3D_0xE820
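Once a custom bootloader has collected the E820 map (INT 0x15, EAX=0xE820, in real mode), the kernel can derive the end of memory from it instead of hard-coding 16 MB. This sketch assumes the bootloader stored `count` entries contiguously with a 24-byte stride and passed their address to the kernel. The struct follows the E820 entry layout (base, length, type, optional ACPI 3.0 attributes); e820_usable_end() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* One E820 entry as written by the BIOS into the bootloader's buffer. */
struct e820_entry {
    uint64_t base;      /* start of the region */
    uint64_t length;    /* size of the region in bytes */
    uint32_t type;      /* 1 = usable RAM; 2 = reserved; others are ACPI etc. */
    uint32_t acpi_attr; /* ACPI 3.0 extended attributes (24-byte entries only) */
};

_Static_assert(sizeof(struct e820_entry) == 24, "E820 entries are 24 bytes");

#define E820_USABLE 1u

/* Highest end address of any usable region. Unsorted or overlapping
   entries are fine, since only the maximum is kept. */
uint64_t e820_usable_end(const struct e820_entry *map, size_t count)
{
    uint64_t end = 0;
    for (size_t i = 0; i < count; i++) {
        if (map[i].type != E820_USABLE)
            continue;
        uint64_t region_end = map[i].base + map[i].length;
        if (region_end > end)
            end = region_end;
    }
    return end;
}
```

The maximum still spans holes (for example the BIOS/VGA area just below 1 MiB), so a frame allocator would also need to mark non-usable ranges as used, and a 32-bit kernel would clamp the result to 4 GiB.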

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 6:33 am
by iansjack
It's much more efficient to put judicious halts into a program than to single-step through all the instructions up to the point of interest. Alternatively, you can set breakpoints at the appropriate places. If you're going to run the program up to a certain point a number of times, an inserted halt is the most efficient solution.

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 6:55 am
by LtG
iansjack wrote:It's much more efficient to put judicious halts into a program than to single-step through all the instructions up to the point of interest. Alternatively, you can set breakpoints at the appropriate places. If you're going to run the program up to a certain point a number of times, an inserted halt is the most efficient solution.
To each their own, I guess. I find using breakpoints and single stepping far easier and faster. It allows me to dynamically inspect what I want, instead of modifying the code all the time and recompiling to move the infinite-loop "breakpoint".

I, (hopefully) obviously, wasn't suggesting single-stepping through a 1 MiB executable for the first 500k steps =)

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 8:03 am
by BluCode
After doing some testing here is what I get:

Code:

(gdb) x kernel_directory
0xc000: 0x0000f000
(gdb) x 0xf000
0xf000: 0x00000005
(gdb) x 0xf004
0xf004: 0x00001005
(gdb) x 0xf008
0xf008: 0x00002005
(gdb) x 0xf030
0xf030: 0x0000c005
(gdb) x 0xf03c
0xf03c: 0x0000f005
(gdb) x 0xf040
0xf040: 0x00000000

Code:

(qemu) info mem
0000000000000000-0000000000010000 0000000000010000 -r-
(qemu)
Which leads me to believe that for some reason it isn't paging enough of the memory, but I don't really know. When I use x in gdb to inspect the page tables at 0x0000, 0x1000, 0x2000 etc I get very different results for each. I hope you can do something more with this info.

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 8:25 am
by goku420
You have a number of subtle bugs. First:

Code:

page->user = (is_kernel)?0:1;
Setting user/supervisor bit sets user mode, not the other way around.

Code:

dir->tablesPhysical[table_idx] = tmp | 0x7; // PRESENT, RW, US.
Again, this sets user mode.

I have a feeling that you copy/pasted some of James Molloy's code which contains gotchas for the unobservant.
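For reference, the 0x7 in that line combines three low bits of the entry: PRESENT (bit 0), R/W (bit 1) and U/S (bit 2). A minimal sketch with hypothetical helper names, contrasting the quoted user-accessible directory entry with a supervisor-only one:

```c
#include <stdint.h>

/* Low flag bits shared by 32-bit (non-PAE) PDEs and PTEs. */
#define PG_PRESENT 0x1u /* bit 0: entry is valid */
#define PG_RW      0x2u /* bit 1: writable; clear = read-only */
#define PG_USER    0x4u /* bit 2: U/S; set = ring 3 may access */

/* Equivalent of `tmp | 0x7`: present, writable, user-accessible. */
uint32_t pde_user(uint32_t table_phys)
{
    return (table_phys & 0xFFFFF000u) | PG_PRESENT | PG_RW | PG_USER;
}

/* Supervisor-only variant: U/S left clear, so ring 3 cannot use
   anything mapped through this directory entry. */
uint32_t pde_kernel(uint32_t table_phys)
{
    return (table_phys & 0xFFFFF000u) | PG_PRESENT | PG_RW;
}
```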

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 8:42 am
by BluCode
goku420 wrote:

Code:

page->user = (is_kernel)?0:1;
Setting user/supervisor bit sets user mode, not the other way around.
As I understand it, that code means that if is_kernel is true, the value will be zero, and if not it will be 1, which is correct.
goku420 wrote:

Code:

dir->tablesPhysical[table_idx] = tmp | 0x7; // PRESENT, RW, US.
Again, this sets user mode.

I have a feeling that you copy/pasted some of James Molloy's code which contains gotchas for the unobservant.
And what is the issue with setting user mode? Would that cause the kernel to reset? According to the wiki 'If the bit is set, then the page may be accessed by all' so that shouldn't be a problem.

Yes I am reusing his code, but as far as I can tell what you pointed out won't have a significant effect.

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 8:47 am
by goku420
BluCode wrote:And what is the issue with setting user mode? Would that cause the kernel to reset? According to the wiki 'If the bit is set, then the page may be accessed by all' so that shouldn't be a problem.
The very same page you referenced says "Therefore if you wish to make a page a user page, you must set the user bit in the relevant page directory entry as well as the page table entry.". Also it's clear from your page directory* that somehow the flags are getting screwed up (0x5 implies the RW bit is cleared).
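The rule in that quote, together with the matching R/W rule, can be written as a check on a PDE/PTE pair. This sketch follows the 32-bit non-PAE access rules in Intel's manuals and ignores SMEP/SMAP: ring-3 access needs U/S set in both entries, ring-3 writes also need R/W set in both, and ring-0 writes only honour R/W when CR0.WP is set. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_PRESENT 0x1u
#define PG_RW      0x2u
#define PG_USER    0x4u

/* Ring-3 access needs PRESENT and U/S set in BOTH the PDE and the PTE. */
bool user_can_access(uint32_t pde, uint32_t pte)
{
    return (pde & pte & (PG_PRESENT | PG_USER)) == (PG_PRESENT | PG_USER);
}

/* Ring-3 writes additionally need R/W set in both entries. */
bool user_can_write(uint32_t pde, uint32_t pte)
{
    return user_can_access(pde, pte) && (pde & pte & PG_RW);
}

/* Ring-0 writes: with CR0.WP clear the R/W bits are ignored;
   with CR0.WP set, R/W must be set in both entries. */
bool kernel_can_write(uint32_t pde, uint32_t pte, bool cr0_wp)
{
    if (!(pde & pte & PG_PRESENT))
        return false;
    return !cr0_wp || (pde & pte & PG_RW);
}
```

With a directory entry ending in 0x7 and a table entry ending in 0x5, user-mode writes would fault, but kernel writes are still permitted while CR0.WP is clear (its reset value), assuming nothing before the `cr0 |= 0x80000000` line has set it.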

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 8:56 am
by BluCode
I see, do you have any idea as to how I can fix my flags? I'm afraid that I really don't have much idea what I'm doing here yet.

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 2:38 pm
by iocoder
Hi!

I'd like to leave several notes that might help you with your issue.

First, what is the value of cr3? The gdb printout suggests that the value of the "kernel_directory" variable is 0x0000C000:

Code:

Breakpoint 1, switch_page_directory (dir=0xc000) at kernel/paging.c:170
Looking at your code, it seems to me that kernel_directory points to a data structure that contains, at least, these two arrays:
  • An array that stores the virtual addresses of your allocated page tables (called tables[]).
  • An array that stores page entries to your page tables (called tablesPhysical[]).
The second array is what we call a page directory. Each page entry consists of flags + physical address of a page table. If you go to that page table, you shall find other 1024 page entries that actually point to actual memory frames. Each page entry consists of flags + physical address of an actual memory frame.
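The two-level lookup described above can be sketched in C. This is a minimal model assuming 32-bit non-PAE paging with 4 KiB pages only (no 4 MiB pages); translate() is a hypothetical helper, and tables[i] stands in for a readable pointer to the page table that directory entry i points to (the role of the tables[] array mentioned above).

```c
#include <stdint.h>

/* Split a 32-bit linear address the way non-PAE paging does. */
uint32_t pd_index(uint32_t va)  { return va >> 22; }            /* bits 31..22 */
uint32_t pt_index(uint32_t va)  { return (va >> 12) & 0x3FFu; } /* bits 21..12 */
uint32_t pg_offset(uint32_t va) { return va & 0xFFFu; }         /* bits 11..0  */

/* Walk directory -> table -> frame. Returns 0 and stores the physical
   address in *pa, or -1 if either entry is not present (where the CPU
   would raise a page fault). tables[i] is only used when pd[i] is present. */
int translate(const uint32_t pd[1024], const uint32_t *const tables[1024],
              uint32_t va, uint32_t *pa)
{
    uint32_t pde = pd[pd_index(va)];
    if (!(pde & 0x1u))
        return -1;
    uint32_t pte = tables[pd_index(va)][pt_index(va)];
    if (!(pte & 0x1u))
        return -1;
    *pa = (pte & 0xFFFFF000u) | pg_offset(va);
    return 0;
}
```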

However, there seems to be some confusion in the comments above...

Code:

dir->tablesPhysical[table_idx] = tmp | 0x7; // PRESENT, RW, US.
This peaceful line of code sets the flags of an entry in the page directory, not in a page table. However, you are referring to the part of the wiki article that talks about page table entries, not page directory entries:
BluCode wrote:
goku420 wrote:

Code:

page->user = (is_kernel)?0:1;
Setting user/supervisor bit sets user mode, not the other way around.
goku420 wrote:

Code:

dir->tablesPhysical[table_idx] = tmp | 0x7; // PRESENT, RW, US.
Again, this sets user mode.

I have a feeling that you copy/pasted some of James Molloy's code which contains gotchas for the unobservant.
And what is the issue with setting user mode? Would that cause the kernel to reset? According to the wiki 'If the bit is set, then the page may be accessed by all' so that shouldn't be a problem.

Yes I am reusing his code, but as far as I can tell what you pointed out won't have a significant effect.
So, if I understand your code correctly, I expect that your page directory actually contains one entry [at index 0]: 0x0000F007. It obviously points to a page table at 0x0000F000, and the flags PRESENT, RW, and US are all set (and it's OK).

Now if we go to your page table at 0x0000F000, we should see entries to actual memory frames. This is related to line 159:

Code:

       // Kernel code is readable but not writeable from userspace.
       alloc_frame( get_page(i, 1, kernel_directory), 0, 0);
You pass 0 and 0 to alloc_frame(), which means: set US but please don't set RW. Thus, you will have only the US and PRESENT flags set [i.e., 0x05]:

Code:

(gdb) x 0xf000
0xf000: 0x00000005
(gdb) x 0xf004
0xf004: 0x00001005
(gdb) x 0xf008
0xf008: 0x00002005
(gdb) x 0xf030
0xf030: 0x0000c005
(gdb) x 0xf03c
0xf03c: 0x0000f005
(gdb) x 0xf040
0xf040: 0x00000000
This seems sensible to me. If the flag of your first page directory entry is 0x7 and the flags of your page entries are 0x5, then your US flag is actually set correctly.
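The flag arithmetic can be checked with a mask-based sketch. make_pte() is a hypothetical helper that assumes alloc_frame(page, is_kernel, is_writeable) sets present = 1, rw = is_writeable and user = !is_kernel, as described above; it is not the original bitfield code.

```c
#include <stdint.h>

/* Build a PTE the way alloc_frame() is described as filling one in:
   PRESENT always, RW only if writeable, USER only if not a kernel page. */
uint32_t make_pte(uint32_t frame_phys, int is_kernel, int is_writeable)
{
    uint32_t e = (frame_phys & 0xFFFFF000u) | 0x1u; /* PRESENT */
    if (is_writeable)
        e |= 0x2u;                                  /* RW */
    if (!is_kernel)
        e |= 0x4u;                                  /* USER */
    return e;
}
```

With both arguments 0, the frame at 0x1000 becomes 0x00001005, which is the value gdb showed at 0xf004.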
BluCode wrote:I see, do you have any idea as to how I can fix my flags? I'm afraid that I really don't have much idea what I'm doing here yet.
From my point of view, your flags are OK.

So how can we make sure that the flags are set correctly? Print the value of cr3, not the kernel_directory variable. I believe (according to your gdb output) that kernel_directory points to 0xC000, which is the base of a data structure that holds the two arrays explained above. It seems to me that the first array (the virtual addresses of page tables) is actually based at 0xC000, while the second one might be at 0xD000 or 0xE000. We want the second one.

Code:

(gdb) x kernel_directory
0xc000: 0x0000f000
^ This prints the first entry of kernel_directory->tables[] (the first array). We actually want the first entry of kernel_directory->tablesPhysical[] (the second array), since this is what your processor actually sees and translates as a page directory. Remember how you loaded cr3:

Code:

172        asm volatile("mov %0, %%cr3":: "r"(&dir->tablesPhysical));
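To see where the second array lands, here is a model of the directory layout. It assumes the struct declares tables[1024] followed immediately by tablesPhysical[1024], as described above, and that pointers are 4 bytes as on i386; uint32_t replaces the pointer type so the offsets stay the same when compiled on a 64-bit host. page_directory_model and cr3_for() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit model of the directory struct: 1024 four-byte table pointers,
   then the 1024 entries the CPU actually walks. */
typedef struct {
    uint32_t tables[1024];         /* virtual addresses of page tables */
    uint32_t tablesPhysical[1024]; /* the real page directory (PDEs)   */
} page_directory_model;

/* What `&dir->tablesPhysical` evaluates to (and so what goes into CR3)
   for a struct at address dir, while memory is still identity-mapped. */
uint32_t cr3_for(uint32_t dir)
{
    return dir + (uint32_t)offsetof(page_directory_model, tablesPhysical);
}
```

Under these assumptions, a directory struct at 0xC000 puts tablesPhysical at 0xD000, and `x kernel_directory` shows tables[0] rather than the first directory entry.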
-----------------------------------------------

Now let's move to another point...
BluCode wrote:

Code:

(qemu) info mem
0000000000000000-0000000000010000 0000000000010000 -r-
(qemu)
Which leads me to believe that for some reason it isn't paging enough of the memory, but I don't really know. When I use x in gdb to inspect the page tables at 0x0000, 0x1000, 0x2000 etc I get very different results for each. I hope you can do something more with this info.

Code:

// Defined in kheap.c
extern u32int placement_address;

Code:

   // We need to identity map (phys addr = virt addr) from
   // 0x0 to the end of used memory, so we can access this
   // transparently, as if paging wasn't enabled.
   // NOTE that we use a while loop here deliberately.
   // inside the loop body we actually change placement_address
   // by calling kmalloc(). A while loop causes this to be
   // computed on-the-fly rather than once at the start.
   int i = 0;
   while (i < placement_address)
   {
       // Kernel code is readable but not writeable from userspace.
       alloc_frame( get_page(i, 1, kernel_directory), 0, 0);
       i += 0x1000;
   }
Could you please check the value of the "placement_address" variable?
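The coverage of that loop can be worked out directly. identity_map_end() is a hypothetical helper that repeats the loop with a fixed end value; the real loop differs because get_page(..., 1, ...) may call kmalloc() and move placement_address while it runs.

```c
#include <stdint.h>

/* Step through 4 KiB pages from 0 while the page address is below end,
   as the quoted while loop does, and return the first unmapped address.
   (In the original, `int i` is compared with the u32int placement_address,
   so i is converted to unsigned; an unsigned counter behaves the same here.) */
uint32_t identity_map_end(uint32_t end)
{
    uint32_t i = 0;
    while (i < end)
        i += 0x1000;
    return i;
}
```

Assuming placement_address is 0xFFFF (65535, the value reported later in this thread), pages 0x0000 through 0xF000 are mapped and the first unmapped address is 0x10000, consistent with the 0x0-0x10000 range that `info mem` printed. Under that assumption, any code, data or stack at or above 0x10000 would not be reachable once paging is enabled.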

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 3:21 pm
by BluCode
Wow, thanks for all that. Could you please clarify this sentence as you say page entry twice with different meanings?
iocoder wrote:The second array is what we call a page directory. Each page entry consists of flags + physical address of a page table. If you go to that page table, you shall find other 1024 page entries that actually point to actual memory frames. Each page entry consists of flags + physical address of an actual memory frame.
placement_address has a value of 65535 before paging is enabled, and cr3 is set to 0x0000D000.

I also get this when looking at 0xD000:

Code:

(gdb) x 0xd000
0xd000: 0x0000f007
(gdb) x 0xd004
0xd004: 0x00000000
(gdb)
Which seems to suggest that there is only one table mapped, at 0xF000, which when read gives 0x00000005, as before. Thanks again for your help.

Re: OS crashes on paging enable

Posted: Sat Jun 03, 2017 4:05 pm
by iocoder
BluCode wrote:Wow, thanks for all that. Could you please clarify this sentence as you say page entry twice with different meanings?
iocoder wrote:The second array is what we call a page directory. Each page entry consists of flags + physical address of a page table. If you go to that page table, you shall find other 1024 page entries that actually point to actual memory frames. Each page entry consists of flags + physical address of an actual memory frame.
placement_address has a value of 65535 before paging is enabled, and cr3 is set to 0x0000D000.

I also get this when looking at 0xD000:

Code:

(gdb) x 0xd000
0xd000: 0x0000f007
(gdb) x 0xd004
0xd004: 0x00000000
(gdb)
Which seems to suggest that there is only one table mapped, at 0xF000, which when read gives 0x00000005, as before. Thanks again for your help.
Sorry for not choosing my terms precisely. The entries of the page directory should be called 'page directory entries', and the entries of a page table should be called 'page table entries'. These are the terms used in the original Intel 80386 Programmer's Reference Manual (1986), which also uses the abbreviations 'DIR ENTRY' and 'PG TBL ENTRY', respectively.