Best strategy to load a kernel elf binary to high vmem?

Posted: Fri Jan 08, 2016 2:15 pm
by cmpxchg64
Hi everyone,

I have a question regarding the best way of loading a kernel binary.
Currently I'm using an identity-mapped (base) kernel which sits inside the reserved sectors (now over 320!) of my FAT16 boot partition. The bootsector code loads this blob to 0x7e00 and above. That was acceptable while the kernel was small, but now I'm running out of "early memory", i.e. the memory I use for a "fixed allocator" which returns identity-mapped pages from below 1MB. Also, I'm not fond of abusing the FAT filesystem's reserved-sector feature as a dump for binary data. To show you the mess, here is the linker script for generating the FAT16 image:

Code:

OUTPUT_FORMAT(binary)
/*STARTUP(prog.o)*/
SECTIONS
{
	. = 0x7c00 ;
	bootsect :
	{
		prog.o (.bstext)
		prog.o (.bsdata)
		endbootsect = . ;
		FILL(0x00)
		/* partition table */
/*		. = 0x1be ;
		BYTE(0x80) *//* bootable */
/*		BYTE(0x00) *//* head 0 */
/*		BYTE(0x02) *//* sector 2 LBA 1 */
/*		BYTE(0x00) *//* cylinder 0 */
/*		BYTE(0x04) *//* type - fat16 */
		/* last sector 5760 CHS (18 per side) last is before cyl 160 */
/*		BYTE(0x01)
		BYTE(0x11)*/ /* 17 */
/*		BYTE(0x9f)*/
		/* start sector lba */
/*		LONG(0x00000001)*/
		/* end sector */
/*		LONG(5760)*/ /* minus a? */
		. = 0x1fe ;
		SHORT(0xaa55)
	}
	. = 0x7e00 ;
	rest :
	{
		rest = . ;
		idt.o (.idt)
		/*. = 0x1fe ;*/
		/*SHORT(0x1111)*/
		. = 0x200 ;
		idt.o (.trampo)
		. = 0x3fe ;
		idt.o (.idtext)
		FILL(0x00)
		. = 0x7fe ;
		SHORT(0x1d35)
		prog.o (.bsrmio)
		vga.o (.vga)
		. = 0xdfe ;
		SHORT(0xf1f0)
		endbase = . ;
		*(.text)
		*(.rodata*)
		. = ALIGN(0x1000) ;
		endtext = . ;
		*(.data)
		*(.bss)
		*(COMMON)
		FILL(0x00)
		. = 0x39dfe ;
		SHORT(0x789a)
		endrest = . ;
	}
	basesize = (endbase - rest) / 0x200 + 1;
	restsize = (endrest - rest) / 0x200 + 1;
	. = 0x47c00 ; /*(0x28000=512*512)*/
	fat16 :
	{
		/* media descriptor & 1 for sector 1 */
		SHORT(0xfff0)
		SHORT(0xffff) /* mandatory second entry */
		/* first test file */
		SHORT(0x0003)
		SHORT(0x0004)
		SHORT(0xffff)
		INCLUDE ifats.ld
		/* free clusters (2880-2*11-512) */
		FILL(0x00)
		. = 0x2c00 ; /* 11 sectors per fat */
	}
	fat16b : /* la meme chose */
	{
		SHORT(0xfff0)
		SHORT(0xffff)
		SHORT(0x0003)
		SHORT(0x0004)
		SHORT(0xffff)
		INCLUDE ifats.ld
		FILL(0x00)
		. = 0x2c00 ;
	}
	fat16f :
	{
		prog.o (.fat16dir)
		INCLUDE ifdirs.ld
		FILL(0x00)
		. = 0x1c00 ; /* 224*32 */
		/* fat16 files */
		prog.o (.fat16files)
		FILL(0x00)
		. = ALIGN(0x200) ;
		INCLUDE ifils.ld
	}
}
Symbols are added by automatically generating a .gdbinit file which contains the "add-symbol-file *.o ofs -s .abc sofs1" commands gdb needs to understand the binary blob (works surprisingly well for debugging):

Code:

add-symbol-file apic.o 0x0000000000011f01 -s .bss 0x000000000002d5ac -s .data 0x000000000002d0a0 -s .rodata 0x0000000000028a68 
add-symbol-file early.o 0x0000000000008c00 -s .bss 0x000000000002d3e4 -s .data 0x000000000002d000 -s .rodata 0x0000000000026fc8 
add-symbol-file framebuffer.o 0x000000000000abcc -s .bss 0x000000000002d3f0 -s .data 0x000000000002d020 -s .rodata 0x0000000000027648 
add-symbol-file i386-stub.o 0x000000000001622b -s .bss 0x000000000002d960 -s .data 0x000000000002d16c -s .rodata 0x000000000002994c 
add-symbol-file idt.o 0x0000000000008200 -s .idt 0x0000000000007e00 -s .idtext 0x0000000000008200 -s .trampo 0x0000000000008000 
add-symbol-file kbd.o 0x00000000000177b3 -s .bss 0x000000000002dcb8 -s .data 0x000000000002d180 -s .rodata 0x0000000000029b18 
add-symbol-file kernel.o 0x000000000000daa1 -s .bss 0x000000000002d400 -s .data 0x000000000002d060 -s .rodata 0x0000000000027880 
add-symbol-file kio.o 0x00000000000183db -s .bss 0x000000000002dcc4 -s .data 0x000000000002d2e0 -s .rodata 0x0000000000029bc4 
add-symbol-file libc.o 0x000000000001a7e9 -s .bss 0x000000000002dce0 -s .data 0x000000000002d340 -s .rodata 0x0000000000029ca0 
add-symbol-file mmgr.o 0x000000000001bdc3 -s .bss 0x000000000002dd80 -s .data 0x000000000002d37c -s .rodata 0x0000000000029d34 
add-symbol-file module.o 0x0000000000024416 -s .bss 0x000000000002de00 -s .data 0x000000000002d3c0 -s .rodata 0x000000000002b358 
add-symbol-file prog.o 0x0000000000007c00 -s .bsdata 0x0000000000007dc5 -s .bsrmio 0x0000000000008600 -s .bstext 0x0000000000007c00 
add-symbol-file smp.o 0x0000000000012a9c -s .bss 0x000000000002d5b0 -s .data 0x000000000002d0b0 -s .rodata 0x0000000000028d7c 
add-symbol-file tasks.o 0x0000000000013db4 -s .bss 0x000000000002d5e0 -s .data 0x000000000002d0c0 -s .rodata 0x000000000002944c 
add-symbol-file vga.o 0x0000000000008b84 -s .vga 0x0000000000008b84 
Later, inside the C code, I first load and parse an nm-style symbol table, then load the module files (relocatable ELF output, practically the output of gcc -c) and perform the relocations on them; after that everything is set up.
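For readers unfamiliar with the relocation step described above, here is a minimal sketch of patching one relocation entry in a loaded section. It is not the poster's code. It assumes 32-bit x86 objects with REL-style entries (the addend is stored in the bytes being patched) and a little-endian host; the function name `apply_rel386` and its parameters are made up for illustration. Only the two most common types produced by non-PIC `gcc -c` output are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the i386 ELF psABI. */
#define R_386_32   1u   /* S + A     : absolute address    */
#define R_386_PC32 2u   /* S + A - P : PC-relative address */

/*
 * Patch one 32-bit field inside a loaded section (hypothetical helper).
 *   section      - the section's bytes in memory
 *   section_size - size of that buffer in bytes
 *   section_addr - address the section will run at
 *   r_offset     - offset of the field within the section
 *   type         - relocation type (R_386_*)
 *   sym_value    - final address of the referenced symbol (S)
 * Returns 0 on success, -1 on a bad offset or unsupported type.
 */
int apply_rel386(uint8_t *section, uint32_t section_size, uint32_t section_addr,
                 uint32_t r_offset, uint32_t type, uint32_t sym_value)
{
    uint32_t addend, value;

    assert(section != NULL);
    if (section_size < 4 || r_offset > section_size - 4)
        return -1;

    /* REL entries keep the addend (A) in the place being patched. */
    memcpy(&addend, section + r_offset, sizeof addend);

    switch (type) {
    case R_386_32:
        value = sym_value + addend;
        break;
    case R_386_PC32:
        /* P is the run-time address of the patched field itself. */
        value = sym_value + addend - (section_addr + r_offset);
        break;
    default:
        return -1;
    }

    memcpy(section + r_offset, &value, sizeof value);
    return 0;
}
```

For a `call` at 0x1000 whose operand holds the usual -4 addend, targeting a symbol at 0x2000, the patched operand becomes 0xffb, which is the distance from the next instruction (0x1005) to the target.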

The main incentive for staying with this rather arcane system is that I really don't want to duplicate the mmgr code, containing paging, PAE, ... for loading a kernel image into virtual addresses.

What are possible solutions for this? Compiling mmgr.c as position independent code? With this I could use it inside the bootloader and then relocate it into the kernel high-vmem. Other approaches?

Re: Best strategy to load a kernel elf binary to high vmem?

Posted: Fri Jan 08, 2016 7:22 pm
by Brendan
Hi,
cmpxchg64 wrote:What are possible solutions for this? Compiling mmgr.c as position independent code? With this I could use it inside the bootloader and then relocate it into the kernel high-vmem. Other approaches?
I set up a simple physical memory manager (just a "one bit per page" bitmap, and only for pages with 32-bit physical addresses) and set up paging before I load anything slightly large from disk (or from CD or network or wherever); so I can do a "for each page { load 4 KiB of data into buffer; allocate physical page; copy data from buffer into page; map page into virtual address space }" type of thing. This makes it immune to physical memory layout problems (for an extreme example; if boot loader fails to enable A20 I don't even care, and can load a 512 MiB "file containing many files" even if physical memory is fragmented into <= 1 MiB pieces).
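The loop quoted above could look roughly like the following. This is only a sketch under assumptions, not Brendan's code: the bitmap convention (1 = free), the names `pmm_mark_free`, `pmm_alloc_page`, `load_image` and the `struct loader_ops` callbacks are all hypothetical stand-ins for the firmware read wrapper, the temporary-mapping copy and the page-table update that real boot code would supply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define MAX_PAGES   (1u << 20)     /* 32-bit physical addresses: 2^20 pages */
#define PMM_NO_PAGE UINT32_MAX

/* One bit per page, 1 = free (assumed convention). 128 KiB in total. */
static uint32_t page_bitmap[MAX_PAGES / 32];

void pmm_mark_free(uint32_t page)
{
    assert(page < MAX_PAGES);
    page_bitmap[page / 32] |= 1u << (page % 32);
}

/* Returns the lowest free page number, or PMM_NO_PAGE if none is left. */
uint32_t pmm_alloc_page(void)
{
    for (uint32_t word = 0; word < MAX_PAGES / 32; word++) {
        if (page_bitmap[word] == 0)
            continue;
        for (uint32_t bit = 0; bit < 32; bit++) {
            if (page_bitmap[word] & (1u << bit)) {
                page_bitmap[word] &= ~(1u << bit);
                return word * 32 + bit;
            }
        }
    }
    return PMM_NO_PAGE;
}

/* Hypothetical hooks the boot code would provide. */
struct loader_ops {
    size_t (*read)(uint64_t offset, void *buf, size_t len); /* firmware wrapper   */
    void   (*copy_to_page)(uint32_t page, const void *buf); /* via a temp mapping */
    int    (*map)(uint64_t virt, uint32_t page);            /* install the PTE    */
};

/* Load `size` bytes to `virt_base`, one 4 KiB page at a time. */
int load_image(const struct loader_ops *ops, uint64_t virt_base, uint64_t size)
{
    static uint8_t buffer[PAGE_SIZE];

    for (uint64_t off = 0; off < size; off += PAGE_SIZE) {
        size_t want = (size - off < PAGE_SIZE) ? (size_t)(size - off) : PAGE_SIZE;

        memset(buffer, 0, sizeof buffer);          /* zero-pad the last page */
        if (ops->read(off, buffer, want) != want)
            return -1;

        uint32_t page = pmm_alloc_page();
        if (page == PMM_NO_PAGE)
            return -1;

        ops->copy_to_page(page, buffer);
        if (ops->map(virt_base + off, page) != 0)
            return -1;
    }
    return 0;
}
```

Because each page is allocated individually, the image never needs a contiguous physical range; only the virtual addresses are contiguous.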

Paging is already set up before kernel is started, so kernel doesn't need to do insane shenanigans to work around problems that shouldn't have existed (like having a special section at a freaky address). Kernel sets up its own physical memory manager by using data the boot code was using (e.g. "for each page that's marked as free in boot code's simple bitmap { add page to kernel's high-performance/complicated physical memory manager }") which means it doesn't need to tip-toe around RAM that is currently in use (for its own code, page tables, RAM used to store information from boot loader, etc).
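A sketch of that hand-over, again with made-up names: `struct kpmm` is a deliberately trivial stand-in (a stack of free page numbers) for whatever the kernel's real physical memory manager is, and the boot bitmap is assumed to use 1 = free with bit 0 of word 0 describing page 0.

```c
#include <assert.h>
#include <stdint.h>

#define KPMM_CAPACITY 4096u   /* arbitrary limit for this sketch */

/* Placeholder for the kernel's real allocator: a stack of free pages. */
struct kpmm {
    uint32_t free_pages[KPMM_CAPACITY];
    uint32_t count;
};

void kpmm_add_free_page(struct kpmm *k, uint32_t page)
{
    assert(k->count < KPMM_CAPACITY);
    k->free_pages[k->count++] = page;
}

/*
 * Walk the boot code's bitmap and hand every free page to the kernel's
 * allocator. Pages the boot code already used (kernel image, page tables,
 * boot information) are simply never marked free, so they are skipped
 * without any special cases. Returns the number of pages imported.
 */
uint32_t kpmm_import_boot_bitmap(struct kpmm *k, const uint32_t *bitmap,
                                 uint32_t page_count)
{
    uint32_t imported = 0;

    for (uint32_t page = 0; page < page_count; page++) {
        if (bitmap[page / 32] & (1u << (page % 32))) {
            kpmm_add_free_page(k, page);
            imported++;
        }
    }
    return imported;
}
```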

Note that the memory management done by boot code should be relatively simple, and the memory management done by kernel should not be simple (e.g. taking into account multi-CPU locking and TLB shootdown, memory mapped files, various "copy on write" schemes, supporting the difference/s between user-space and kernel-space, NUMA optimisations, etc). If you can use the same code for both, then either your boot code is far too complicated or your kernel is far too simple.


Cheers,

Brendan

Re: Best strategy to load a kernel elf binary to high vmem?

Posted: Sat Jan 09, 2016 8:24 am
by cmpxchg64
Thanks Brendan, that sounds very reasonable with the stripped-down version.

Yeah until now I thought it would be overkill to move the memory manager into the boot code, as it has locking (which means I have to move the kernel lock-support into boot code), 7-bitmap levels lvl0-lvl7 (1KiB per 16MiB of physical memory, 4096 bits in lvl0 -> 32 bit in level7) and the other jazz (like 2800 lines of cruft).

But now I think I could let it work exclusively on level0 and then after the kernel starts up do a fill_bitmap-like initialization.
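To make the idea concrete, here is a rough sketch of what such a fill_bitmap-like pass might look like for one 16 MiB region. The post does not say what a bit in the upper levels means, so this assumes a buddy-style rule: a bit at level n+1 is set only when both bits it covers at level n are set. The layout (levels packed back to back, lvl0 first) and all function names are assumptions too. With lvl0 through lvl7 halving from 4096 bits down to 32 bits, the total is 8160 bits, i.e. 1020 bytes, which matches the "1KiB per 16MiB" figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEVELS    8u      /* lvl0 .. lvl7                         */
#define L0_BITS   4096u   /* 16 MiB / 4 KiB pages                 */
#define MAP_BYTES 1020u   /* (4096 + 2048 + ... + 32) bits / 8    */

/* Number of bits in a given level: 4096, 2048, ..., 32. */
size_t level_bits(unsigned level)
{
    assert(level < LEVELS);
    return L0_BITS >> level;
}

/* Bit offset of a level inside the packed map; level == LEVELS gives the total. */
size_t level_offset(unsigned level)
{
    size_t off = 0;

    assert(level <= LEVELS);
    for (unsigned i = 0; i < level; i++)
        off += level_bits(i);
    return off;
}

int get_bit(const uint8_t *map, size_t bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

static void set_bit(uint8_t *map, size_t bit, int value)
{
    if (value)
        map[bit / 8] |= (uint8_t)(1u << (bit % 8));
    else
        map[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

/*
 * Rebuild lvl1..lvl7 from lvl0. The boot code only ever touches lvl0;
 * the kernel runs this once at start-up before using the full structure.
 */
void fill_upper_levels(uint8_t *map)
{
    for (unsigned level = 1; level < LEVELS; level++) {
        size_t below = level_offset(level - 1);
        size_t here  = level_offset(level);

        for (size_t i = 0; i < level_bits(level); i++) {
            int both = get_bit(map, below + 2 * i) &&
                       get_bit(map, below + 2 * i + 1);
            set_bit(map, here + i, both);
        }
    }
}
```

Since the boot code never takes locks or reads the upper levels, none of the kernel's locking support has to move into the boot stage under this scheme.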

On loading: Are you relying on BIOS routines to get your disk data? I'm still a bit anxious to switch on-and-off PM plus paging to fetch more sectors from disk, as there's more chance that I accidentally change the state the BIOS expects the system to be in for proper execution.

With UEFI on the other hand, everything is fine and dandy, but relying on it might be a little too much, as I still have non-UEFI hardware lying around.

Re: Best strategy to load a kernel elf binary to high vmem?

Posted: Sat Jan 09, 2016 10:11 pm
by Brendan
Hi,
cmpxchg64 wrote:Yeah until now I thought it would be overkill to move the memory manager into the boot code, as it has locking (which means I have to move the kernel lock-support into boot code), 7-bitmap levels lvl0-lvl7 (1KiB per 16MiB of physical memory, 4096 bits in lvl0 -> 32 bit in level7) and the other jazz (like 2800 lines of cruft).

But now I think I could let it work exclusively on level0 and then after the kernel starts up do a fill_bitmap-like initialization.

On loading: Are you relying on BIOS routines to get your disk data?
I rely on firmware for reading files from whatever the boot device is (and a bunch of other things).
cmpxchg64 wrote:I'm still a bit anxious to switch on-and-off PM plus paging to fetch more sectors from disk, as there's more chance that I accidentally change the state the BIOS expects the system to be in for proper execution.
For my boot code; there's a boot loader, then a "Boot Abstraction Layer" (BAL). The BAL always runs in 32-bit protected mode with paging enabled. The boot loader provides a "firmware wrapper" (e.g. similar to this for BIOS) that takes care of switching CPU modes, etc. For things like loading files, character output (before graphics mode is set up), getting a memory map, getting a list of video modes and EDID for each monitor, finding ACPI tables, etc; the boot loader does it (and typically just gives the information to the BAL, via an API provided by the BAL). The BAL does all the common grunt-work (building the OS's "physical address space map", decompressing the boot image, parsing ACPI tables, generating my "display information data" files from EDID, choosing a video mode, etc).

There are also "boot modules" that are started by the BAL for various things (e.g. if you have 2 monitors and 2 serial ports, the BAL might start 4 "boot log output" boot modules; plus another boot module for detecting CPU features, errata, etc). The boot modules run in their own virtual address space at CPL=3; and later (after kernel is started) are "promoted to processes" so that they end up looking like normal 32-bit processes running on the OS.

When an IRQ occurs while a boot module is running; the BAL's interrupt handlers switch back to the virtual address space that the rest of the boot code uses (and back to CPL=0), and then call a boot loader's callback to handle the IRQ; and the boot loader switches back to whatever the firmware expects (real mode for BIOS) and gets the firmware to handle the IRQ. The reverse happens on the return path (boot loader switches back to protected mode with paging and returns to BAL, BAL switches virtual address space and returns to the interrupted boot module).

For UEFI, it'll be the same. The boot loader will create a firmware wrapper, and handle switching between (e.g.) 64-bit long mode and 32-bit protected mode.

Eventually (when firmware is no longer needed) the boot loader is discarded. Then the BAL uses a bunch of information (the features of all CPUs, how much memory there is, which kernels are present in the boot image, etc) to decide which kernel to start; and then starts the selected kernel's "Kernel setup module". If the selected kernel happens to want long mode, then its "kernel setup module" switches to long mode.

The BAL does a lot of work; but it doesn't know what the boot device is, or what the boot loader is, or what the firmware is, or (for the majority of its work) what the kernel will be.

Now...

After reading about this, are you still anxious to switch on-and-off PM plus paging? ;)
cmpxchg64 wrote:With UEFI on the other hand, everything is fine and dandy, but relying on it might be a little too much, as I still have non-UEFI hardware lying around.
Even without BIOS, there's still plenty of "non-UEFI" that you might care about. For a simple example; I want something a little bit like kexec for very fast reboot (and kernel updates). In the past I've done "boot from ROM" (for Bochs) where the system's firmware contains a custom designed boot loader and the OS's boot files (which tends to make the OS boot extremely quickly - minimal firmware and no disk IO).


Cheers,

Brendan