Reboot loop on paging enable

cormacobrien
Posts: 13
Joined: Thu Oct 02, 2014 6:24 pm
Location: Chicago

Reboot loop on paging enable

Post by cormacobrien »

Git repo is here.
The operating system allocates page frames from a stack located right after the kernel. This stack is initialized in src/kernel/kernel/kernel_init_pframe_stack.c, and allocation/freeing are performed by s/k/k/kernel_alloc_pframe.c and s/k/k/kernel_free_pframe.c, respectively. Everything works swimmingly up until the routine to enable paging, in src/kernel/arch/i386/kernel_enable_paging.s.

Here's the kernel_enable_paging routine:

Code: Select all

extern _kernel_page_directory
global kernel_enable_paging
section .text
kernel_enable_paging:
    mov eax, [_kernel_page_directory] ; load the page directory's physical address
    mov cr3, eax                      ; CR3 = page directory base

    mov eax, cr0
    or  eax, 0x80000000               ; set CR0.PG (bit 31)
    mov cr0, eax ; This line causes a reboot. It's commented out in the repo
    ret
I know that line causes a reboot because I can place a call to abort() before it and cause a kernel panic, but placing the abort() call after it does nothing. Any ideas what I've set up incorrectly?
Find me on Github and Bitbucket.
Octocontrabass
Member
Posts: 5590
Joined: Mon Mar 25, 2013 7:01 pm

Re: Reboot loop on paging enable

Post by Octocontrabass »

Run your code in Bochs and read the error log. The log won't tell you exactly why it doesn't work, but it's often good enough to at least narrow down the possible causes. You should post the log here, too; it will help us narrow down the real cause of the reboot.

Add a "magic breakpoint" instruction and run your code in Bochs with the debugger and magic breakpoints enabled. When the breakpoint is hit, examine the values of the registers and memory associated with paging. Are they correct?
cormacobrien
Posts: 13
Joined: Thu Oct 02, 2014 6:24 pm
Location: Chicago

Re: Reboot loop on paging enable

Post by cormacobrien »

The page enabling code really seems to go through without a hitch. Here's how things go:

Call the paging enable function:

Code: Select all

 > step
(0) [0x0000001003ff] 0010:00000000001003ff (unk. ctxt): jmp .-644 (0x00100180)    ; e97cfdffff
Move the address of the page directory into EAX:

Code: Select all

> step
(0) [0x000000100180] 0010:0000000000100180 (unk. ctxt): mov eax, dword ptr ds:0x00102000 ; a100201000

> reg
rax: 00000000_0ffef000
The value in RAX is correct, it's the same one that my kernel reports.

Copy the page directory address into CR3:

Code: Select all

> step
(0) [0x000000100185] 0010:0000000000100185 (unk. ctxt): mov cr3, eax              ; 0f22d8

> creg
CR3=0x00000ffef000
Note that this information was actually from the CR* poll from after the next call. I guess it takes longer to update...?

Copy CR0 to EAX:

Code: Select all

> step
(0) [0x000000100188] 0010:0000000000100188 (unk. ctxt): mov eax, cr0              ; 0f20c0

> reg
rax: 00000000_60000011
This one also took two steps to take effect, apparently.

Set the high bit:

Code: Select all

> step
(0) [0x00000010018b] 0010:000000000010018b (unk. ctxt): or eax, 0x80000000        ; 0d00000080

> reg
rax: 00000000_e0000011
Looks right to me.

Copy the correct value back into CR0:

Code: Select all

> step
(0) [0x000000100190] 0010:0000000000100190 (unk. ctxt): mov cr0, eax              ; 0f22c0

> creg
CR0=0xe0000011: PG CD NW ac wp ne ET ts em mp PE
Houston, we have paging.
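For reference, the flag names in that creg output can be decoded by hand. This is a minimal C sketch, assuming the standard CR0 bit layout from the Intel manuals; the helper names are made up for illustration and are not part of my kernel:

```c
#include <assert.h>
#include <stdint.h>

/* CR0 bit positions (only the ones set in the trace above). */
#define CR0_PE (1u << 0)   /* protected mode enable */
#define CR0_ET (1u << 4)   /* extension type */
#define CR0_NW (1u << 29)  /* not write-through */
#define CR0_CD (1u << 30)  /* cache disable */
#define CR0_PG (1u << 31)  /* paging enable */

/* Returns nonzero if the given CR0 value has paging enabled. */
static int cr0_paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}

/* Mirrors "or eax, 0x80000000". Paging is only valid in protected mode,
 * so the sketch asserts that PE is already set. */
static uint32_t cr0_with_paging(uint32_t cr0)
{
    assert(cr0 & CR0_PE);
    return cr0 | CR0_PG;
}
```

Feeding it the value read before the OR (0x60000011) gives 0xe0000011, which is PG, CD, NW, ET and PE, matching the upper-case names Bochs prints.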

Then these two guys showed up:

Code: Select all

> step
(0) [0x00000feed193] 0010:0000000000100193 (unk. ctxt): add byte ptr ds:[eax+edx*2], 0xfc ; 800450fc

> step
(0).[101655341] [0x00000feed193] 0010:0000000000100193 (unk. ctxt): add byte ptr ds:[eax+edx*2], 0xfc ; 800450fc
And one more step brings me to

Code: Select all

> step
(0) [0x0000fffffff0] f000:fff0 (unk. ctxt): jmpf 0xf000:e05b          ; ea5be000f0
Continuing from here gives me a reboot. Any ideas?
Nable
Member
Posts: 453
Joined: Tue Nov 08, 2011 11:35 am

Re: Reboot loop on paging enable

Post by Nable »

It looks like there's something wrong with your page tables (i.e. this function isn't inside an identity-mapped region). AFAIR, you can examine the page tables from the Bochs debugger, so you should look at them right after executing "mov cr0, eax" and before going any further.
cormacobrien
Posts: 13
Joined: Thu Oct 02, 2014 6:24 pm
Location: Chicago

Re: Reboot loop on paging enable

Post by cormacobrien »

Anything in particular I should be looking for? Bochs reports 1024 contiguous 4K blocks, looks right to me.
stlw
Member
Posts: 357
Joined: Fri Apr 04, 2008 6:43 am

Re: Reboot loop on paging enable

Post by stlw »

Ceann wrote:Anything in particular I should be looking for? Bochs reports 1024 contiguous 4K blocks, looks right to me.
Right after you enable paging, do you have a mapping for your EIP?
The EIP points to the 'ret' instruction inside your kernel_enable_paging function.
This linear address must have a 1:1 mapping so that the 'ret' instruction can be fetched immediately after paging is enabled.

Right after the mov to CR0, type:
> page CS:EIP
Does it map to the place you expected?
I.e. physical(CS:EIP) == linear(CS:EIP) == CS.BASE + EIP?
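
To make that check concrete, here is a rough C sketch of the translation the CPU performs, assuming plain 32-bit paging with 4 KiB pages (no PAE, no 4 MiB pages). Physical memory is simulated as a byte buffer so the code can run on a host; phys_read32, translate and is_identity_mapped are hypothetical names, not anything from Bochs or from the kernel in question:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_PRESENT 0x1u

/* Read a 32-bit entry from simulated physical memory (hypothetical helper). */
static uint32_t phys_read32(const uint8_t *phys, uint32_t addr)
{
    uint32_t value;
    memcpy(&value, phys + addr, sizeof value);
    return value;
}

/* Walk page directory -> page table for one linear address.
 * Returns 1 and stores the physical address in *out, or 0 if either
 * level is marked not present. */
static int translate(const uint8_t *phys, uint32_t cr3,
                     uint32_t linear, uint32_t *out)
{
    uint32_t pde = phys_read32(phys, (cr3 & 0xFFFFF000u) + 4u * (linear >> 22));
    if (!(pde & PAGE_PRESENT))
        return 0;

    uint32_t pte = phys_read32(phys, (pde & 0xFFFFF000u)
                                     + 4u * ((linear >> 12) & 0x3FFu));
    if (!(pte & PAGE_PRESENT))
        return 0;

    *out = (pte & 0xFFFFF000u) | (linear & 0xFFFu);
    return 1;
}

/* The condition asked about above: physical == linear. */
static int is_identity_mapped(const uint8_t *phys, uint32_t cr3, uint32_t linear)
{
    uint32_t physical;
    return translate(phys, cr3, linear, &physical) && physical == linear;
}
```

In the trace posted earlier, the bracketed physical address changes from 0x000000100190 to 0x00000feed193 while the linear address continues at 0x00100193, which is the kind of mismatch is_identity_mapped would report.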

Stanislav
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Reboot loop on paging enable

Post by Brendan »

Hi,
Ceann wrote:Anything in particular I should be looking for? Bochs reports 1024 contiguous 4K blocks, looks right to me.
The virtual address space may have 1024 contiguous virtual pages; but that doesn't mean the physical pages are contiguous, and even if the physical pages were contiguous it still doesn't mean that the kernel is identity mapped.

Your bug is here:

Code: Select all

    const uint32_t page_table_addr = kernel_alloc_pframe();
    printf("First page table at 0x%x.\n", page_table_addr);
    uint32_t * const page_table = (uint32_t *)page_table_addr;
    for(size_t i = 0; i < 1024; ++i) {
        page_table[i] = kernel_alloc_pframe() | 3;           // *** BUG ***
    }
For identity mapping you'd want something more like:

Code: Select all

    const uint32_t page_table_addr = kernel_alloc_pframe();
    uint32_t physical_address = 0;
    printf("First page table at 0x%x.\n", page_table_addr);
    uint32_t * const page_table = (uint32_t *)page_table_addr;
    for(size_t i = 0; i < 1024; ++i) {
        page_table[i] = physical_address | 3;
        physical_address += 0x00001000;
    }
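
The snippet above only fills the page table; the table also has to be hooked into the page directory slot that covers the kernel. As a hedged, host-testable sketch of both steps (the function names and flag macros here are illustrative assumptions, not code from the repo):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE     0x1000u
#define ENTRIES       1024u
#define PAGE_PRESENT  0x1u
#define PAGE_WRITABLE 0x2u

/* Fill one page table so that entry i maps first_phys + i * 4 KiB.
 * With first_phys == 0 and the table installed in directory slot 0,
 * this identity maps the first 4 MiB. */
static void fill_page_table(uint32_t table[ENTRIES], uint32_t first_phys)
{
    assert((first_phys & (PAGE_SIZE - 1)) == 0);  /* must be page aligned */
    for (size_t i = 0; i < ENTRIES; ++i) {
        table[i] = (first_phys + (uint32_t)i * PAGE_SIZE)
                   | PAGE_PRESENT | PAGE_WRITABLE;
    }
}

/* Point the directory entry covering `linear` at the table located at
 * physical address table_phys. */
static void install_page_table(uint32_t directory[ENTRIES],
                               uint32_t linear, uint32_t table_phys)
{
    assert((table_phys & (PAGE_SIZE - 1)) == 0);  /* must be page aligned */
    directory[linear >> 22] = table_phys | PAGE_PRESENT | PAGE_WRITABLE;
}
```

The "| 3" in the original code is PAGE_PRESENT | PAGE_WRITABLE. Entry 0x100 of the filled table ends up as 0x00100003, so linear 0x00100000 (the kernel's load address) maps to physical 0x00100000.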
Note: I also happened to notice that your physical memory manager's initialisation is broken - it's a pity that C allows you to try to jam a 64-bit value into a 32-bit integer. ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
xenos
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia

Re: Reboot loop on paging enable

Post by xenos »

Ceann wrote:Note that this information was actually from the CR* poll from after the next call. I guess it takes longer to update...?
...
This one also took two steps to take effect, apparently.
After you type "step", Bochs executes one instruction and then shows the next instruction to be executed, i.e., the instruction on screen has not been executed yet. So when it shows the mov to cr3, you need to type "step" (or simply "s") once more to execute it, and only then examine cr3. Likewise, when you first enter single-step mode, Bochs shows the instruction that the next step will execute.
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
cormacobrien
Posts: 13
Joined: Thu Oct 02, 2014 6:24 pm
Location: Chicago

Re: Reboot loop on paging enable

Post by cormacobrien »

Brendan wrote:Note: I also happened to notice that your physical memory manager's initialisation is broken - it's a pity that C allows you to try to jam a 64-bit value into a 32-bit integer.
Any chance you could point me to where that's happening? I can't seem to find it :(

I have got it working now, thanks so much for your help. I'm a bit confused, though. I understand now why the kernel should be identity mapped (seems obvious), but am I supposed to only map the virtual addresses to the same physical ones up until the end of the kernel? Some of the ones that get mapped by the loop you wrote are also on the page frame stack, albeit at the very bottom, i.e.:

0x07fdc000 is the first free page frame. The last free page frame is the first 4K aligned block after the kernel. The kernel starts at 1M and only goes for a few KiB, but the first page table maps 4MiB of memory. What happens when the page frames in this first 4MiB segment get allocated? Or should I just set that memory aside?

Thanks all for your help.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Reboot loop on paging enable

Post by Brendan »

Hi,
Ceann wrote:
Brendan wrote:Note: I also happened to notice that your physical memory manager's initialisation is broken - it's a pity that C allows you to try to jam a 64-bit value into a 32-bit integer.
Any chance you could point me to where that's happening? I can't seem to find it :(
Imagine a computer that has 6 GiB of RAM, where there's a 2 GiB area of RAM from 0x000000100000000 to 0x000000180000000 (e.g. the memory map has an entry where "start = 0x000000100000000" and "size = 0x80000000"). Now take a look at how the function "void kernel_init_pframe_stack(const mb_info_t * const mbi)" would handle this.

First it does "uint32_t first_pframe = mmap->addr;" so that "first_pframe = 0x00000000" (because the highest 32-bits get truncated).

Then it does "const uint32_t last_pframe = mmap->addr + mmap->len - ((mmap->addr + mmap->len) % PAGE_SIZE + PAGE_SIZE);". This ends up being like "const uint32_t last_pframe = 0x000000180000000" where the highest 32-bits get truncated, so "last_pframe = 0x80000000".

Next you do "for ( next_pframe = first_pframe to last_pframe )" which ends up being "for ( next_pframe = 0x00000000 to 0x80000000)".

The end result is that you end up adding a lot of pages onto the free page stack, which have the wrong physical address, and are either already on the free page stack or not usable RAM at all. This can lead to unexpected crashes later (e.g. allocating a physical page for a page table without knowing the actual physical page has been allocated twice).
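
One possible way to guard against this, sketched in C under the assumption that the kernel can only use physical memory below 4 GiB: do all arithmetic in 64 bits and clamp each region before anything is narrowed. The struct and function names below are hypothetical stand-ins, not the Multiboot structures or the code in the repo:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_64 0x1000ull
#define LIMIT_4GIB   0x100000000ull

/* Hypothetical stand-in for a memory map entry with 64-bit fields. */
struct region {
    uint64_t addr;
    uint64_t len;
};

/* What an implicit conversion to uint32_t does: keeps the low 32 bits. */
static uint32_t truncate_to_32(uint64_t value)
{
    return (uint32_t)value;
}

/* Clamp a region to whole pages below 4 GiB.
 * Returns 1 and sets *first_frame / *frame_count, or 0 if nothing in the
 * region is usable by a 32-bit physical memory manager. */
static int usable_frames_32(struct region r,
                            uint32_t *first_frame, uint32_t *frame_count)
{
    assert(r.len <= UINT64_MAX - r.addr);  /* addr + len must not wrap */
    if (r.addr >= LIMIT_4GIB)
        return 0;                          /* entirely above 4 GiB */

    uint64_t start = (r.addr + PAGE_SIZE_64 - 1) & ~(PAGE_SIZE_64 - 1);
    uint64_t end = r.addr + r.len;
    if (end > LIMIT_4GIB)
        end = LIMIT_4GIB;
    end &= ~(PAGE_SIZE_64 - 1);            /* round down to a page */

    if (start >= end)
        return 0;
    *first_frame = (uint32_t)start;        /* safe: start < 4 GiB here */
    *frame_count = (uint32_t)((end - start) / PAGE_SIZE_64);
    return 1;
}
```

With the example region above (start 0x100000000, size 0x80000000), truncate_to_32 shows the aliasing to 0x00000000 and 0x80000000, while usable_frames_32 simply rejects the region instead of pushing bogus frames.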
Ceann wrote:I have got it working now, thanks so much for your help. I'm a bit confused, though. I understand now why the kernel should be identity mapped (seems obvious), but am I supposed to only map the virtual addresses to the same physical ones up until the end of the kernel? Some of the ones that get mapped by the loop you wrote are also on the page frame stack, albeit at the very bottom, i.e.:

0x07fdc000 is the first free page frame. The last free page frame is the first 4K aligned block after the kernel. The kernel starts at 1M and only goes for a few KiB, but the first page table maps 4MiB of memory. What happens when the page frames in this first 4MiB segment get allocated? Or should I just set that memory aside?
In general; I'd map the kernel where it belongs (e.g. maybe at 0xC0000000) and just have a minimal/temporary identity mapping (at 0x00100000) for the purpose of enabling paging; where after you've enabled paging you jump to the kernel (at 0xC0000000) and then remove the temporary identity mapping.

Of course this creates a mess in your linker script, etc (different sections for code and data at different virtual addresses - e.g. at 0x00100000 and at 0xC0000000); which is why I'd recommend setting up paging in the boot loader so that the entire kernel runs at virtual address (e.g.) 0xC0000000.

Note: This has the additional benefit of shifting some "only run once" initialisation code from the kernel (where it permanently wastes space) to the boot loader (where the RAM is free after the boot loader is finished); and may have other benefits (like, if the RAM at 0x00100000 is faulty and unusable then the boot loader could just load the kernel into different physical pages and map them at the exact same virtual address so that the kernel needn't care).
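
As a rough illustration of the temporary identity mapping plus higher-half mapping described above, here is a C sketch of the page directory bookkeeping only. It assumes a single page table that maps physical 0 to 4 MiB and is shared by both directory slots, so a kernel loaded at physical 0x00100000 would appear at 0xC0100000; the names and the exact layout are assumptions for the example:

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_VIRT_BASE 0xC0000000u  /* example value from the post */
#define KERNEL_PHYS_BASE 0x00100000u  /* where the kernel is loaded */
#define PDE_FLAGS        0x3u         /* present | writable */

/* Index of the page directory entry covering a linear address. */
static uint32_t pde_index(uint32_t linear)
{
    return linear >> 22;
}

/* Point both the low (identity) slot and the higher-half slot at the
 * same page table, so the code enabling paging keeps running. */
static void map_low_and_high(uint32_t directory[1024], uint32_t table_phys)
{
    assert((table_phys & 0xFFFu) == 0);  /* table must be page aligned */
    directory[pde_index(KERNEL_PHYS_BASE)] = table_phys | PDE_FLAGS;
    directory[pde_index(KERNEL_VIRT_BASE)] = table_phys | PDE_FLAGS;
}

/* After jumping to the higher half, remove the temporary mapping.
 * A real kernel must also flush the TLB here (reload CR3 or use invlpg). */
static void drop_identity_mapping(uint32_t directory[1024])
{
    directory[pde_index(KERNEL_PHYS_BASE)] = 0;
}

/* With the shared table above, virtual and physical differ by a constant. */
static uint32_t kernel_virt_to_phys(uint32_t virt)
{
    assert(virt >= KERNEL_VIRT_BASE);
    return virt - KERNEL_VIRT_BASE;
}
```

0xC0000000 falls in directory slot 768, so only slots 0 and 768 are touched; once slot 0 is cleared, low addresses are unmapped.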


Cheers,

Brendan