Crash when enabling paging - but memdump looks fine?

minater247
Posts: 17
Joined: Sat Jun 18, 2022 11:38 pm

Post by minater247 »

I'm attempting to set up a simple paging system using a combination of information from here and the Intel manual, and it was initially working well until I tried to move the kernel to the higher half. Now it just immediately closes, and although I've tried the -d flags with qemu, I'm not getting much of anything.

After running the code, CR3 holds 0x103000, so I dump the memory at that address:

Code:

(qemu) xp/769xw 0x103000
0000000000103000: 0x00104000 0x00000000 0x00000000 0x00000000
0000000000103010: 0x00000000 0x00000000 0x00000000 0x00000000
0000000000103020: 0x00000000 0x00000000 0x00000000 0x00000000
[...many zeroes follow]
0000000000103be0: 0x00000000 0x00000000 0x00000000 0x00000000
0000000000103bf0: 0x00000000 0x00000000 0x00000000 0x00000000
0000000000103c00: 0x00105000
So it seems to be working properly, the page table's first entry is 0x104000, the next 4096 bytes, and the 768th entry is 0x105000, the next 4096 bytes after that.
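
The index arithmetic behind those two dump offsets can be sketched in C. This is only an illustration of how 32-bit, 4 KiB paging splits a virtual address; the helper names `pd_index`, `pt_index`, and `pde_byte_offset` are hypothetical and not taken from the kernel discussed here.

```c
#include <stdint.h>

/* Index into the page directory: bits 31..22 of the virtual address. */
static inline uint32_t pd_index(uint32_t vaddr)
{
    return vaddr >> 22;
}

/* Index into the page table: bits 21..12 of the virtual address. */
static inline uint32_t pt_index(uint32_t vaddr)
{
    return (vaddr >> 12) & 0x3FFu;
}

/* Byte offset of the directory entry from the directory base.
   Each entry is 4 bytes wide. */
static inline uint32_t pde_byte_offset(uint32_t vaddr)
{
    return pd_index(vaddr) * 4u;
}
```

With the directory at 0x103000, the entry covering 0xC0000000 is index 768, which sits at byte offset 0xC00, i.e. at 0x103c00 as in the dump above.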

Taking a look at one of the page tables:

Code:

(qemu) xp/500xw 0x104000
0000000000104000: 0x00000003 0x00001003 0x00002003 0x00003003
0000000000104010: 0x00004003 0x00005003 0x00006003 0x00007003
0000000000104020: 0x00008003 0x00009003 0x0000a003 0x0000b003
0000000000104030: 0x0000c003 0x0000d003 0x0000e003 0x0000f003
0000000000104040: 0x00010003 0x00011003 0x00012003 0x00013003
0000000000104050: 0x00014003 0x00015003 0x00016003 0x00017003
0000000000104060: 0x00018003 0x00019003 0x0001a003 0x0001b003
[...]
That... also looks right. That's the first entry, which should identity map the first 4MB. The same with the next page table, which maps the same addresses at a 3GB offset, so it looks like both the directory and table work fine. So I'm lost, the memory looks exactly like the manual says it should, the PG and PE bits are enabled, so I don't see what's causing it to crash. Any help is much appreciated!
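
For comparison with the dump, the expected contents of an identity-mapping table can be written as a small C sketch. The function name `fill_identity_table` and the macro names are hypothetical; the flag value 0x3 (present plus writable) is the one visible in the dumped entries.

```c
#include <stdint.h>

#define PAGE_SIZE   4096u
#define PTE_PRESENT 0x1u   /* bit 0 */
#define PTE_WRITE   0x2u   /* bit 1 */

/* Fill `count` entries so that entry i maps physical frame i,
   which identity-maps the first count * 4 KiB of memory. */
static void fill_identity_table(uint32_t *table, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        table[i] = (i * PAGE_SIZE) | PTE_PRESENT | PTE_WRITE;
}
```

Entry 1 comes out as 0x00001003 and entry 0x1b as 0x0001b003, matching the values shown in the dump.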


Now for the code:

Code:

.section .data
.align 4096
boot_page_directory:
	# skip 4096 bytes
	.skip 4096
boot_page_table1:
	# skip 4096 bytes
	.skip 4096
boot_page_table2:
	# skip 4096 bytes
	.skip 4096

# The kernel entry point.
.section .text
.global _start
_start:
	# Set up the stack.
	mov $stack_top, %esp
	subl $VIRTUAL_BASE, %esp

	# Set up paging
	pushl %eax
	pushl %ebx
	pushl %ecx
	pushl %edx

	# %eax = physical address of page directory
	movl $boot_page_directory, %eax
	subl $VIRTUAL_BASE, %eax
	andl $0xFFFFF000, %eax # align to 4 KiB
	
	# %ebx = physical address of page table 1
	movl $boot_page_table1, %ebx
	subl $VIRTUAL_BASE, %ebx
	andl $0xFFFFF000, %ebx # align to 4 KiB

	# Move the page table into the first entry of the page directory.
	movl %ebx, (%eax)

	# Fill the page table with entries
	# %eax = physical address of the page frame being mapped, starting at 0x00000000 and incrementing by 4096
	# %ecx = loop counter (the loop instruction runs the body %ecx times, so this fills 1023 entries; the last entry stays zero)
	# Since we have the address of the page table in ebx, we can increment that by 4 each time.
	movl $0x0, %eax
	movl $1023, %ecx
1:
	# Add the flags to the address
	addl $0x3, %eax
	# Move the entry into the table
	movl %eax, (%ebx)
	# Remove the flags
	subl $0x3, %eax
	# Move to the next table entry
	addl $0x4, %ebx
	# Increase the physical address set by 4096
	addl $4096, %eax
	# Loop
	loop 1b

	# %ebx = physical address of page table 2
	movl $boot_page_table2, %ebx
	subl $VIRTUAL_BASE, %ebx
	andl $0xFFFFF000, %ebx # align to 4 KiB

	# This part is a little tricky - we need to set it to entry 768 (0x300) in the page directory to offset it by 3 GiB.
	# We can do this by adding 0x300 * 4 to the page directory address.

	# %eax = physical address of page directory
	movl $boot_page_directory, %eax
	subl $VIRTUAL_BASE, %eax
	andl $0xFFFFF000, %eax # align to 4 KiB
	# Add 0x300 * 4 to the page directory address
	addl $0xC00, %eax

	# %ebx = physical address of page table 2
	movl $boot_page_table2, %ebx
	subl $VIRTUAL_BASE, %ebx
	andl $0xFFFFF000, %ebx # align to 4 KiB

	# Move the second page table into the 768th entry of the page directory.
	movl %ebx, (%eax)

	# Fill the page table with entries
	# %eax = physical address of the page frame being mapped, starting at 0x00000000 and incrementing by 4096
	# %ecx = loop counter (the loop instruction runs the body %ecx times, so this fills 1023 entries; the last entry stays zero)
	# Since we have the address of the page table in ebx, we can increment that by 4 each time.
	movl $0x0, %eax
	movl $1023, %ecx
2:
	# Add the flags to the address
	addl $0x3, %eax
	# Move the entry into the table
	movl %eax, (%ebx)
	# Remove the flags
	subl $0x3, %eax
	# Move to the next table entry
	addl $0x4, %ebx
	# Increase the physical address set by 4096
	addl $4096, %eax
	# Loop
	loop 2b
	
	# Move the directory address into cr3
	movl $boot_page_directory, %eax
	subl $VIRTUAL_BASE, %eax
	andl $0xFFFFF000, %eax # align to 4 KiB
	movl %eax, %cr3

	# Enable PG and PE
	movl %cr0, %eax
	orl $0x80000001, %eax

stop:
	jmp stop # in case you're actually running this, stop here
	
	movl %eax, %cr0

	# Anything after enabling paging crashes. I can't tell if it's actually enabling
	# paging that does it, or if it's trying to run code *after* paging, but anything
	# after this point crashes the system.
And the linker file, since we're dealing with higher-half stuff:

Code:

/* The bootloader will look at this image and start execution at the symbol
   designated as the entry point. */
ENTRY(_start)

/* A few definitions the assembly code can use for physical addresses. */
PHYSICAL_BASE = 0x00100000;
VIRTUAL_BASE = 0xC0000000;
 
/* Tell where the various sections of the object files will be put in the final
   kernel image. */
SECTIONS
{
	/* Begin putting sections at 1 MiB, a conventional place for kernels to be
	   loaded at by the bootloader, and add C0000000 (3GB) for paging. */
	. = 0xC0100000;
 
	/* First put the multiboot header, as it is required to be put very early
	   in the image or the bootloader won't recognize the file format.
	   Next we'll put the .text section. */
	.text ALIGN (0x1000) : AT(ADDR(.text)-0xC0000000)
	{
		*(.multiboot)
		*(.text)
	}
 
	/* Read-only data. */
	.rodata ALIGN (0x1000) : AT(ADDR(.rodata)-0xC0000000)
	{
		*(.rodata)
	}
 
	/* Read-write data (initialized) */
	.data ALIGN (0x1000) : AT(ADDR(.data)-0xC0000000)
	{
		*(.data)
	}
 
	/* Read-write data (uninitialized) and stack */
	.bss ALIGN (0x1000) : AT(ADDR(.bss)-0xC0000000)
	{
		*(COMMON)
		*(.bss)
	}
 
	/* The compiler may produce other sections, by default it will put them in
	   a segment with the same name. Simply add stuff here as needed. */
}
songziming
Member
Posts: 70
Joined: Fri Jun 28, 2013 1:48 am

Re: Crash when enabling paging - but memdump looks fine?

Post by songziming »

Test on bochs, it tells you the type of exception and where it happens. You can also check paging structure with command `page`.
minater247 wrote:and it was initially working well until I tried to move the kernel to the higher half
You don't need to move the kernel, just remap it (to a different virtual address).

Where did you link your kernel code to: the lower half or the higher half?

After jumping to the higher half, did you keep the lower-half mapping, or did you clear it?
Reinventing the Wheel, code: https://github.com/songziming/wheel
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Crash when enabling paging - but memdump looks fine?

Post by Octocontrabass »

minater247 wrote:although I've tried the -d flags with qemu, I'm not getting much of anything.
If things aren't being logged, add "-accel tcg" to your QEMU command line. You should see a page fault with bit 0 of the error code set.
minater247 wrote:So it seems to be working properly, the page table's first entry is 0x104000, the next 4096 bytes, and the 768th entry is 0x105000, the next 4096 bytes after that.
You mean page directory? And those are the correct addresses, but they're not marked present (or writable), so as far as the CPU is concerned the entire page directory is empty.
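
A minimal C sketch of that check, assuming plain 32-bit paging with 4-byte entries. The helper and macro names are hypothetical; the bit positions (bit 0 present, bit 1 writable, bits 31..12 frame address) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x1u  /* bit 0: the entry is valid */
#define ENTRY_WRITE   0x2u  /* bit 1: writes are allowed */

/* True if the CPU will treat this directory or table entry as valid. */
static bool entry_present(uint32_t entry)
{
    return (entry & ENTRY_PRESENT) != 0;
}

/* True if the entry permits writes. */
static bool entry_writable(uint32_t entry)
{
    return (entry & ENTRY_WRITE) != 0;
}

/* Physical address of the table or page frame the entry points to. */
static uint32_t entry_frame(uint32_t entry)
{
    return entry & 0xFFFFF000u;
}
```

The dumped directory entry 0x00104000 has the right frame address but fails `entry_present`, whereas 0x00104003 passes both flag checks.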

Do you really need two separate page tables? It looks like you're using them to map the same memory, so you could share a single page table across both page directory entries.
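
The shared-table idea can be sketched in C as follows. `map_shared_table` and `ENTRY_FLAGS` are hypothetical names, and the sketch assumes the directory is a zeroed array of 1024 32-bit entries.

```c
#include <stdint.h>

#define ENTRY_FLAGS 0x3u  /* present | writable */

/* Point directory slots 0 and 768 at the same page table, so the
   same physical memory is visible at 0x00000000 and at 0xC0000000. */
static void map_shared_table(uint32_t *page_directory, uint32_t table_phys)
{
    uint32_t pde = (table_phys & 0xFFFFF000u) | ENTRY_FLAGS;

    page_directory[0]   = pde;  /* identity mapping: 0x00000000..0x003FFFFF */
    page_directory[768] = pde;  /* higher half:      0xC0000000..0xC03FFFFF */
}
```

Only one 4 KiB table is needed this way, and removing the identity mapping later is a matter of clearing slot 0.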
minater247 wrote:And the linker file, since we're dealing with higher-half stuff:
You're missing wildcards on your input sections. That might cause problems for you in the future.
minater247
Posts: 17
Joined: Sat Jun 18, 2022 11:38 pm

Re: Crash when enabling paging - but memdump looks fine?

Post by minater247 »

songziming wrote:Test on bochs, it tells you the type of exception and where it happens.
That's gotten me a little further! I'm still not getting anything from the logs about the reset, but I do get to the magic breakpoint after enabling paging. So it looks like setting CR0 to enable paging isn't what causes the crash; it's doing anything after that. I should have included this earlier, but here are the two lines of higher-half jump code I forgot, which come right after enabling PG/PE:

Code:

	lea (higher_half), %eax
	jmp *%eax

higher_half:
As for the structure and format of the pages, I kept the identity mapping in case I missed any direct memory accesses in the C code; I figured it'd be easier to let those work and remove or change the mapping later. And that's what I meant: the kernel is linked to the higher half but loaded in the lower half, hence subtracting $VIRTUAL_BASE from my addresses (unless I've somehow massively screwed up the linker file).
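
The address fix-up described here can be sketched in C. `VIRTUAL_BASE` has the value from the linker script; the helper names are hypothetical, and the link address 0xC0103000 for the page directory is inferred from the CR3 value in the first post rather than stated in the thread.

```c
#include <stdint.h>

#define VIRTUAL_BASE 0xC0000000u  /* same value as in the linker script */

/* Convert a link-time (higher-half) address to the physical address
   the image is actually loaded at. */
static inline uint32_t virt_to_phys(uint32_t vaddr)
{
    return vaddr - VIRTUAL_BASE;
}

/* Inverse conversion, usable once the higher-half mapping is active. */
static inline uint32_t phys_to_virt(uint32_t paddr)
{
    return paddr + VIRTUAL_BASE;
}
```

Under that assumption, a symbol linked at 0xC0103000 is found at physical 0x00103000 before paging is enabled.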
songziming wrote:You can also check paging structure with command `page`.
I'm guessing my version is different, I have `info tab` to show page tables, but even after enabling the PG and PE bits I still get:

Code:

<bochs:14> info tab
paging off
<bochs:15>
Which is odd, but I guess the processor hasn't refreshed its state since the last instruction was enabling it.


Octocontrabass wrote:If things aren't being logged, add "-accel tcg" to your QEMU command line. You should see a page fault with bit 0 of the error code set.
I tried this, and now the CPU resets do get logged, but nothing else. No page fault is printed either, even though page faults were printed by another instance of QEMU. At this point my command is `qemu-system-i386 -cdrom os.iso -monitor stdio -d int,cpu_reset -accel tcg`, in case that helps?
Octocontrabass wrote:And those are the correct addresses, but they're not marked present (or writable), so as far as the CPU is concerned the entire page directory is empty.
You're completely right, I missed that! It still didn't change anything though, it still triple faults immediately on anything other than a breakpoint. Bits 0 and 1 are set now, like 0x104003, so it should detect them.
Octocontrabass wrote:[...] you could share a single page table across both page directory entries.
That's a good point, I'd never thought about that. It would save quite a bit of memory, thanks for the tip! And I've added wildcards, so that should be good now.



Unfortunately the problem still persists through this all, and I still can't get either bochs or qemu to give me any useful information. It'd be helpful if it could analyze existing paging structures despite the paging bit being off! I'm looking for any more settings that could affect the logs now.
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Crash when enabling paging - but memdump looks fine?

Post by Octocontrabass »

minater247 wrote:At this point my command is `qemu-system-i386 -cdrom os.iso -monitor stdio -d int,cpu_reset -accel tcg`, in case that helps?
You've got the monitor and the debug logs both attached to stdio. Try either removing "-monitor stdio" or adding "-D qemu-debug.log" and see if that helps.
minater247 wrote:It still didn't change anything though, it still triple faults immediately on anything other than a breakpoint.
That's odd, I didn't see any other problems with your page tables. Are interrupts still disabled? Are there any unexpected bits set in CR4?
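
As a rough illustration of which CR4 bits would matter for tables laid out like the ones in this thread, here is a C sketch. The helper names are hypothetical; the bit positions (PSE is bit 4, PAE is bit 5) are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_PSE (1u << 4)  /* allows 4 MiB pages via bit 7 of a directory entry */
#define CR4_PAE (1u << 5)  /* switches the CPU to the PAE paging format */

/* With PAE set, paging entries are 8 bytes wide, so a directory built
   from 4-byte entries would be misread. */
static bool cr4_pae_enabled(uint32_t cr4)
{
    return (cr4 & CR4_PAE) != 0;
}

/* With PSE set, only directory entries that also have bit 7 set change
   meaning; an entry such as 0x00104003 is unaffected. */
static bool cr4_pse_enabled(uint32_t cr4)
{
    return (cr4 & CR4_PSE) != 0;
}
```

A CR4 value can be read from the emulator's register dump and passed to these helpers by hand.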
minater247
Posts: 17
Joined: Sat Jun 18, 2022 11:38 pm

Re: Crash when enabling paging - but memdump looks fine?

Post by minater247 »

There we go, those logs look much more familiar and have so much more detail! And I finally found the problem, although it was pretty much just a combination of problems! This final one was that although now I did set the flags for each table before loading the entries into the page directory, I never cleared them before using the value again - so all of my page table entries were offset by 3. I fixed that, and no more crash!
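
One plausible reading of "offset by 3" is that the table pointer still carried the 0x3 flag bits when it was reused as the write address, so every entry landed 3 bytes past its aligned slot. The C sketch below models that on a byte buffer; it is an interpretation rather than the poster's actual code, and all names in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value little-endian at an arbitrary byte offset,
   as an unaligned x86 store would. */
static void store_le32(uint8_t *buf, size_t off, uint32_t v)
{
    for (int b = 0; b < 4; b++)
        buf[off + b] = (uint8_t)(v >> (8 * b));
}

/* Read the aligned 32-bit entry `index`, which is what the MMU sees. */
static uint32_t load_entry(const uint8_t *buf, size_t index)
{
    uint32_t v = 0;
    for (int b = 0; b < 4; b++)
        v |= (uint32_t)buf[index * 4 + b] << (8 * b);
    return v;
}

/* Write `count` identity entries starting `skew` bytes past the base.
   skew = 0 models the fixed code; skew = 3 models a pointer that still
   has the flag bits set. The buffer must hold skew + count * 4 bytes. */
static void fill_table(uint8_t *buf, size_t skew, size_t count)
{
    for (size_t i = 0; i < count; i++)
        store_le32(buf, skew + i * 4, (uint32_t)(i << 12) | 0x3u);
}
```

In this model the aligned slots of the skewed table never have bit 0 set, so the CPU would see every page as not present even though a casual look at the memory shows plausible values.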

Image

(still a bit to work on, but I've been trying to get to that message for a while now, this is awesome!)

Thank you so much for all the help!! This was a really confusing one and I'm glad you were able to help me track down all the little problems!