Hi all, I started hitting a problem today, and I'm wondering if anyone can spot anything stupid I'm doing or offer tips for debugging.
I have a kernel that has previously been working fine. I've been working on implementing MSI for PCI device interrupts, and decided I needed more than the 22 IRQs I currently have (the number my IOAPIC reports) to assign to the MSIs, so I added a bunch of ISRs.
I discovered that above a certain number of defined ISRs, I started getting strange #GP and #UD exceptions early after jumping into my kernel from the bootloader. Narrowing it down, it starts happening if I add enough ISRs that my kernel grows to require 100KiB of mapped pages.
I'm at a loss for how to debug this. The pages I'm allocating are free according to UEFI (it's UEFI that's giving them to me in the first place), and I always map 2 pages of font data right after that, so I know the memory area is usable anyway.
Connecting GDB to QEMU, I see it die at this switch statement. The values of the local variables look OK (the first descriptor's type is efi_boot_services_code), but instead of jumping to the proper case label, it jumps way off into the .data segment.
As for my UEFI bootloader, it reads the ELF kernel, finds the LOAD program header, and allocates enough pages to hold its vmem size, then loads the specified number of bytes from the file into that memory (bootloader main, ELF loader code). And here's the rest of the code at that commit. As is, that code works fine - but increase the IRQs up to vector 0x63 and the error occurs.
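For reference, the "allocate enough pages to hold its vmem size" step can be sketched roughly like this. This is a minimal sketch, not the actual bootloader code: the struct follows the ELF64 program header layout from the ELF specification, but `elf64_phdr`, `load_pages_needed`, and the fixed 4KiB `PAGE_SIZE` are names assumed here for illustration. It sizes the allocation over every LOAD header rather than just the first one found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD   1u        /* ELF program header type: loadable segment */
#define PAGE_SIZE 0x1000u   /* assumed 4KiB pages */

/* Field layout of an ELF64 program header, per the ELF specification. */
typedef struct {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;  /* where the segment's bytes start in the file */
    uint64_t p_vaddr;   /* where the segment wants to live in memory */
    uint64_t p_paddr;
    uint64_t p_filesz;  /* bytes present in the file */
    uint64_t p_memsz;   /* bytes occupied in memory (>= p_filesz) */
    uint64_t p_align;
} elf64_phdr;

/* Hypothetical helper: number of pages needed to cover every PT_LOAD
 * segment. Writes the page-aligned lowest virtual address to *base.
 * Returns 0 if there is nothing to load. */
static uint64_t load_pages_needed(const elf64_phdr *ph, size_t count,
                                  uint64_t *base)
{
    uint64_t lo = UINT64_MAX, hi = 0;

    assert(ph != NULL && base != NULL);

    for (size_t i = 0; i < count; i++) {
        if (ph[i].p_type != PT_LOAD || ph[i].p_memsz == 0)
            continue;
        uint64_t start = ph[i].p_vaddr & ~(uint64_t)(PAGE_SIZE - 1);
        uint64_t end = ph[i].p_vaddr + ph[i].p_memsz;
        if (start < lo) lo = start;
        if (end > hi) hi = end;
    }
    if (hi == 0)
        return 0;

    *base = lo;
    return (hi - lo + PAGE_SIZE - 1) / PAGE_SIZE;  /* round up */
}
```

The returned count is what would be handed to the page allocator (UEFI's AllocatePages in this case), and `*base` is the virtual address that the first allocated page corresponds to.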
Any ideas?
[SOLVED] Invalid Opcode when my kernel grows >100KiB
Re: Invalid Opcode when my kernel grows >100KiB
Hah! I think I found it. Of course, posting about this made me think about it a bit differently. The cause, in case it's ever helpful to anyone else: my bootloader's simple ELF loader just loads one contiguous region described by the program header, instead of loading section by section from the section headers. This would be (mostly) fine, except I had forgotten that I added alignment to the sections in my linker script way back when I first wrote it (the script is at the end of this post).
So this apparently just happened to work as-is when my .text section was roughly page-sized, but once it crossed a page boundary, I'm guessing the linker didn't want to put 4KiB of zeros into the output file, so loading it flat stopped working.
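One way to stop depending on the file layout matching the memory layout is to place each LOAD segment by its own header fields. What follows is a minimal sketch under assumptions, not the fix that actually went into the bootloader: `elf64_phdr` mirrors the ELF64 program header layout from the ELF specification, while `load_segments` and its parameters are hypothetical names. It copies `p_filesz` bytes from file offset `p_offset` to the segment's `p_vaddr` (relative to the allocation's base address) and zero-fills the rest up to `p_memsz`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_LOAD 1u  /* ELF program header type: loadable segment */

/* Field layout of an ELF64 program header, per the ELF specification. */
typedef struct {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;  /* where the segment's bytes start in the file */
    uint64_t p_vaddr;   /* where the segment wants to live in memory */
    uint64_t p_paddr;
    uint64_t p_filesz;  /* bytes present in the file */
    uint64_t p_memsz;   /* bytes occupied in memory (>= p_filesz) */
    uint64_t p_align;
} elf64_phdr;

/* Hypothetical helper: place every PT_LOAD segment into dest, where
 * dest[0] corresponds to virtual address base. Returns 0 on success,
 * -1 if a header points outside the file or the destination. */
static int load_segments(const uint8_t *file, size_t file_len,
                         const elf64_phdr *ph, size_t count,
                         uint8_t *dest, size_t dest_len, uint64_t base)
{
    assert(file != NULL && ph != NULL && dest != NULL);

    for (size_t i = 0; i < count; i++) {
        const elf64_phdr *p = &ph[i];
        if (p->p_type != PT_LOAD)
            continue;

        /* Reject headers that would read or write out of bounds. */
        if (p->p_filesz > p->p_memsz)
            return -1;
        if (p->p_offset > file_len || p->p_filesz > file_len - p->p_offset)
            return -1;
        if (p->p_vaddr < base)
            return -1;
        uint64_t off = p->p_vaddr - base;
        if (off > dest_len || p->p_memsz > dest_len - off)
            return -1;

        /* File-backed part first, then zero the remainder (e.g. .bss). */
        memcpy(dest + off, file + p->p_offset, (size_t)p->p_filesz);
        memset(dest + off + p->p_filesz, 0,
               (size_t)(p->p_memsz - p->p_filesz));
    }
    return 0;
}
```

Running `readelf -l` on the kernel binary shows the Offset, VirtAddr, FileSiz, and MemSiz of each LOAD header, which makes it easy to see whether a flat copy from the start of the segment could ever line up with the addresses the code was linked for.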
Man, I love/hate some of the bugs you get to run into when doing OS development.
ENTRY(_start)
SECTIONS
{
    OFFSET = 0xFFFF800000000000;
    . = OFFSET + 0x100000;

    .header : {
        __header_start = .;
        KEEP(*(.header))
        __header_end = .;
    }

    .text : {
        *(.text)
    }

    .data ALIGN(0x1000) : {
        *(.data)
        *(.rodata)
    }

    .isrs : {
        *(.isrs)
    }

    .bss ALIGN(0x1000) : {
        __bss_start = .;
        *(.bss)
        __bss_end = .;
    }

    .note ALIGN(0x1000) : {
        *(.note.*)
    }

    kernel_end = ALIGN(4096);
}
Developing: jsix - UEFI-booted x64 kernel