Kernel text mapping issue

Posted: Sat Aug 04, 2018 11:46 am
by CRoemheld
I am (still) trying to adhere to the Linux kernel conventions for mapping addresses. With your help I have gained a better understanding of the Linux kernel, but there is still one big issue that I can neither solve nor fully understand: the mapping of the kernel and its text section.

So far I have been following the guide for creating a 64-bit kernel using a separate loader. However, in a rather important section of this guide (section "Loader"), the author uses code that is not really helpful:

Code:

        #include "elf64.h" // Also requires elf64.c
 
        char* kernel_elf_space[sizeof(elf_file_data_t)];
        elf_file_data_t* kernel_elf = (elf_file_data_t*) kernel_elf_space;                                          /* Pointer to elf file structure (remember there is no memory management yet) */
 
        /* This function parses the ELF file and returns the entry point */
        void* load_elf_module(multiboot_uint32_t mod_start, multiboot_uint32_t mod_end){
                unsigned long err = parse_elf_executable((void*)mod_start, sizeof(elf_file_data_t), kernel_elf);    /* Parses ELF file and returns an error code */
                if(err == 0){                                                                                       /* No errors occurred while parsing the file */
                        for(int i = 0; i < kernel_elf->numSegments; i++){
                                elf_file_segment_t seg = kernel_elf->segments[i];                                   /* Load all the program segments into memory */
                                                                                                                    /*  if you want to do relocation you should do so here, */
                                const void* src = (const void*) (mod_start + seg.foffset);                          /*  though that would require some changes to parse_elf_executable */
                                memcpy((void*) seg.address, src, seg.flength);
                        }
                        return (void*) kernel_elf->entryAddr;                                                       /* Finally we can return the entry address */
                }
                return NULL;
        }
Neither the code nor the struct members are documented, so I have no idea what seg.address or seg.foffset are (did he or she mean the p_offset member of the ELF program header? Which address in the program header, p_vaddr or p_paddr?).
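
For comparison, a loader of this kind can walk the ELF64 program headers directly. The following is a minimal sketch, not the guide's implementation. It assumes that seg.foffset corresponds to p_offset, seg.flength to p_filesz, and seg.address to p_paddr (that last choice is a guess; the guide may well mean p_vaddr). The struct layouts follow the ELF64 specification, while load_segments and its mem parameter are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PT_LOAD 1u  /* loadable program segment */

/* ELF64 file header and program header, laid out as in the ELF64 specification. */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} elf64_ehdr_t;

typedef struct {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
} elf64_phdr_t;

/* Hypothetical helper: copy every PT_LOAD segment of the ELF image at `file`
 * to `mem + p_paddr`, then zero the tail that has no file backing (.bss).
 * In a real loader `mem` would be the start of identity-mapped physical memory.
 * Returns the number of segments loaded. */
static unsigned load_segments(const uint8_t *file, uint8_t *mem)
{
    elf64_ehdr_t eh;
    unsigned loaded = 0;

    memcpy(&eh, file, sizeof eh);
    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        elf64_phdr_t ph;
        memcpy(&ph, file + eh.e_phoff + (uint64_t)i * eh.e_phentsize, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;
        assert(ph.p_filesz <= ph.p_memsz);
        memcpy(mem + ph.p_paddr, file + ph.p_offset, ph.p_filesz);
        memset(mem + ph.p_paddr + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
        loaded++;
    }
    return loaded;
}
```

Copying to p_paddr only makes sense if the destination range is free; a segment whose p_paddr is very low would overwrite memory the loader still needs.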

Currently the situation is as follows:

I am using the following linker script for the kernel, which is similar in structure to the one Linux uses:

Code:

ENTRY(_entry)

KERNEL_VMA = 0xffffffff80100000;
KERNEL_OFF = 0xffffffff80000000;

SECTIONS
{
	. = KERNEL_VMA;
	_kernel = .;

	.text ALIGN(4K) : AT(ADDR(.text) - KERNEL_OFF)
	{
		_text = .;
		*(.text)
		_etext = .;
	}

...
As you can see, the addresses used are the same as in Linux. However, when I retrieve the kernel entry point address from the ELF header (elf64_ehdr->e_entry) and jump to it via a long jump, the address leads to a completely different instruction. So obviously I am missing something, but I really can't find the problem.

The values are as follows:

Code:

memmap: 0x0 -> 0x0, flags: 0x3, size: 0x400000
memmap: 0xffff880000000000 -> 0x0, flags: 0x3, size: 0x40000000
memmap: 0xffffffff80000000 -> 0x0, flags: 0x3, size: 0x20000000
Kernel module located at 0x193000 - 0xAA8AF0
Segment [0]: 0xffffffff80000000 -> 0x193000, size: 0x9130b0
Section [1]: 0xffffffff80100000 -> 0x293000: .text, size: 0x5711
Section [2]: 0xffffffff80106000 -> 0x299000: .rodata, size: 0x9ee
Section [3]: 0xffffffff801069f0 -> 0x2999f0: .eh_frame, size: 0x1990
Section [4]: 0xffffffff80109000 -> 0x29c000: .data, size: 0x88
Section [5]: 0xffffffff8010A000 -> 0x29d000: .bss, size: 0x8830
Section [6]: 0xffffffff80113000 -> 0x2a6000: .mm, size: 0x8000b0
The values above are calculated with the code below:

Code:

/* Debugging */

void dbg_print_segments(elf64_ehdr_t *elf64_ehdr)
{
	elf64_phdr_t *elf64_phdr = get_elf64_phdr(elf64_ehdr);

	for(uint32_t i = 0; i < elf64_ehdr->e_phnum; i++) {
		elf64_phdr_t *elf64_segment = &elf64_phdr[i];

		uint64_t vaddr = elf64_segment->p_vaddr;
		/* Where the segment's bytes currently sit inside the loaded module (not p_paddr) */
		uint64_t paddr = void_ptrtu32(elf64_ehdr) + elf64_segment->p_offset;
		uint64_t memsz = elf64_segment->p_memsz;

		info("Segment [%d]: 0x%016llx -> 0x%llx, size: 0x%llx",
			i, vaddr, paddr, memsz);
	}
}

void dbg_print_sections(elf64_ehdr_t *elf64_ehdr)
{
	elf64_shdr_t *elf64_shdr = get_elf64_shdr(elf64_ehdr);

	/* Index 0 is reserved */
	for(uint32_t i = 1; i < elf64_ehdr->e_shnum; i++) {
		elf64_shdr_t *elf64_section = &elf64_shdr[i];

		char *sname = get_elf64_shdr_name(elf64_ehdr, elf64_section);
		uint64_t vaddr = elf64_section->sh_addr;
		/* Where the section's bytes currently sit inside the loaded module */
		uint64_t paddr = void_ptrtu32(elf64_ehdr) + elf64_section->sh_offset;
		uint64_t memsz = elf64_section->sh_size;

		if(sname != NULL) {
			info("Section [%d]: 0x%016llx -> 0x%llx: %s, size: 0x%llx",
				i, vaddr, paddr, sname, memsz);
		}
	}
}
The kernel entry point is at 0xffffffff80101c68, according to the ELF header of the kernel module. When jumping there, the instruction at that address is a completely different one, and upon further examination the address is not even inside the kernel's text section. After a while, I found out that the real entry point is at 0xffffffff80294c68, which is the entry point stated by the ELF header plus the physical address at which the kernel module was loaded (0xffffffff80101c68 + 0x193000 = 0xffffffff80294c68). However, simply adding this offset to the entry point does not solve the problem, because all symbol addresses inside the kernel are around 0xffffffff80100000, meaning they do not include the physical load offset.
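
The arithmetic in this paragraph can be checked with a small sketch. It assumes that segment 0 has a p_offset of 0 (inferred from the log, where the segment's bytes start at the module address 0x193000) and that the 0xffffffff80000000 -> 0x0 mapping from the memmap output is active. The helper names module_phys_of and visible_at are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address of the byte that was linked at `vaddr`, while the ELF file
 * still sits unmodified inside the module loaded at `mod_start`. */
static uint64_t module_phys_of(uint64_t vaddr, uint64_t seg_vaddr,
                               uint64_t seg_offset, uint64_t mod_start)
{
    assert(vaddr >= seg_vaddr);
    return mod_start + seg_offset + (vaddr - seg_vaddr);
}

/* Virtual address through which physical address `phys` is visible under a
 * linear mapping that maps `map_virt` to `map_phys`. */
static uint64_t visible_at(uint64_t phys, uint64_t map_virt, uint64_t map_phys)
{
    assert(phys >= map_phys);
    return map_virt + (phys - map_phys);
}
```

With the values from the log, module_phys_of(0xffffffff80101c68, 0xffffffff80000000, 0, 0x193000) gives 0x294c68, and visible_at(0x294c68, 0xffffffff80000000, 0) gives 0xffffffff80294c68. The 0x193000 shift therefore comes from the file never having been copied to the place it was linked for.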

So I have quite the dilemma: I cannot simply change the entry point address by adding an offset, because every function lives in the text section starting at 0xffffffff80100000, just as my linker script above intends. What do I do?

Should I maybe try to create a minimal example and post it on GitHub?

EDIT: Added minimal crash example on GitHub: click here!

In this case I removed the code that is unnecessary for reproducing the issue. When I compiled it, the entry point of the kernel was located at 0xffffffff80100af8 according to the ELF header, but the actual entry point is at 0xffffffff80293af8, so it is located 0x193000 bytes further on, just as described above.

I am mapping the following address ranges, all according to the Linux documentation:

Identity mapping: 0x0 (virtual) to 0x0 (physical), 4 MiB long
Mapping of the complete memory: 0xffff880000000000 (virtual) to 0x0 (physical), 1 GiB long
Kernel text mapping: 0xffffffff80000000 (virtual) to 0x0 (physical), 512 MiB long
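
For reference, the page-table slots that these three mappings touch can be computed as follows. This is a sketch assuming 4-level paging with 9 index bits per level. The use of 2 MiB pages and the names pt_indices and pages_2m are assumptions, not taken from the code in this thread.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { unsigned pml4, pdpt, pd, pt; } pt_index_t;

/* Split a canonical x86-64 virtual address into its four table indices. */
static pt_index_t pt_indices(uint64_t vaddr)
{
    /* Canonical form: bits 63..47 are all zeros or all ones. */
    uint64_t top = vaddr >> 47;
    assert(top == 0 || top == 0x1ffff);

    pt_index_t ix;
    ix.pml4 = (unsigned)((vaddr >> 39) & 0x1ff);
    ix.pdpt = (unsigned)((vaddr >> 30) & 0x1ff);
    ix.pd   = (unsigned)((vaddr >> 21) & 0x1ff);
    ix.pt   = (unsigned)((vaddr >> 12) & 0x1ff);
    return ix;
}

/* Number of 2 MiB pages needed to cover `length` bytes. */
static uint64_t pages_2m(uint64_t length)
{
    return (length + 0x1fffff) >> 21;
}
```

Under these assumptions, 0xffff880000000000 lands in PML4 slot 272, and 0xffffffff80000000 lands in PML4 slot 511, PDPT slot 510. The 512 MiB kernel text mapping would need 256 page-directory entries of 2 MiB each.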

Re: Kernel text mapping issue

Posted: Tue Aug 07, 2018 7:26 am
by CRoemheld
Sorry, I didn't intend to double post, but since I now have a clearer understanding of the root of my problem, I wanted to state it in a separate post:

Given the approach of creating a 64-bit kernel using a separate loader, is there ANY chance to relocate the kernel to a fixed address, let's say physical address 0x100000? If that were possible, all of the problems described above would be solved.

My research has brought me this far:
  - Since the grub.cfg in the tutorial loads the kernel as a module into memory, there's no way to specify the address at which the kernel is loaded. This only works for the bootstrap ELF, which is loaded as a multiboot-compliant ELF.
  - My first thought was to relocate the kernel. I would load the bootstrap ELF somewhere else, so that the kernel could be copied to 0x100000. The copying itself would take place in the bootstrap ELF, before the jump into the kernel. However, I have no idea how to relocate the kernel, which is loaded as a module. Simply copying the kernel from memory does not work; the OS simply crashes.
Is there any way to accomplish this with the separate-loader approach?

Re: Kernel text mapping issue

Posted: Tue Aug 07, 2018 11:46 am
by simeonz
The type of code determines the action necessary after moving the code. Compiled c/c++ is either non-pic relocatable, pic relocatable or non-relocatable. Non-pic relocatable means that the code had been linked with -shared -Bstatic -Bsymbolic and compiled with -mcmodel=large. It must be adjusted by the loader through code fix-ups. Pic relocatable means that the code had been linked with -shared -pie -Bstatic -Bsymbolic and compiled with -fpie. It will be adjusted by the loader through got fix-ups. Non-relocatable means code linked with -static and requiring no post-load adjustment. Note that all executables requiring any kind and amount of relocation fix-ups (pic or non-pic) have type ET_DYN and dynamic segment of type PT_DYNAMIC, but you can have pc-relative, or base-address-relative, or non-relocatable assembly code mixed in that executable as well.
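
Which of these categories a given kernel image falls into can be read from the file itself. The following is a sketch, assuming the standard ELF64 layout and the standard constant values (ET_REL = 1, ET_EXEC = 2, ET_DYN = 3, PT_DYNAMIC = 2). The function names elf_kind and has_dynamic_segment are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ET_REL = 1, ET_EXEC = 2, ET_DYN = 3 };  /* e_type values */
enum { PT_DYNAMIC = 2 };                        /* p_type value  */

typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} elf64_ehdr_t;

typedef struct {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
} elf64_phdr_t;

/* Returns 1 if the image has a PT_DYNAMIC program header, else 0. */
static int has_dynamic_segment(const uint8_t *file)
{
    elf64_ehdr_t eh;
    memcpy(&eh, file, sizeof eh);
    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        elf64_phdr_t ph;
        memcpy(&ph, file + eh.e_phoff + (uint64_t)i * eh.e_phentsize, sizeof ph);
        if (ph.p_type == PT_DYNAMIC)
            return 1;
    }
    return 0;
}

/* Short description of what the loader is dealing with. */
static const char *elf_kind(const uint8_t *file)
{
    elf64_ehdr_t eh;
    memcpy(&eh, file, sizeof eh);
    switch (eh.e_type) {
    case ET_REL:  return "ET_REL: relocatable object file";
    case ET_EXEC: return "ET_EXEC: executable linked for fixed addresses";
    case ET_DYN:  return has_dynamic_segment(file)
                         ? "ET_DYN with PT_DYNAMIC: fix-ups expected at load time"
                         : "ET_DYN without PT_DYNAMIC";
    default:      return "other";
    }
}
```

Running readelf -h and readelf -l on the kernel binary shows the same two pieces of information without writing any code.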

I suspect that your code is currently non-relocatable. This means that the c/c++ language compiler has generated some absolute references and their computed addresses depend on the program segments being accessible at the p_vaddr location. Whether this location is mapped by paging or is the physical location prior to enabling paging is irrelevant. The code will assume that a given segment can be found and accessed at p_vaddr at the time when it is executed or referenced. p_vaddr and p_paddr can differ if you use the AT attribute in the linker script, and different parts of your code (such as coming from different intermediate object files) can be set to vastly different discontiguous parts of memory. This enables you to set only part of the code to execute from their physical addresses prior to enabling paging. Pc-relative and base-relative assembly can also execute from its physical load address, because such code will be able to execute anywhere.

There are multiple options. You can set in the linker script the virtual address of a portion of the kernel code, used during early initialization, to coincide with the physical address where it is loaded. This code will be executed directly and will be used to create the page directories that map the rest of the kernel in the desired higher memory address. Another possibility is to make the kernel relocatable and to apply the relocations to it using the bootstrapping loader. Another way to process the relocations is to use base-relative or pc-relative assembly initialization code in the kernel to perform the fix-up.

Non-pic code fix-up requires processing the relocation arrays pointed to by the dynamic segment. You have to establish which instruction needs fixing up and this requires inspecting each relocation entry individually. Pic fix-up is simpler, because all relocations are applied to the got section, and all are of type R_X86_64_RELATIVE. This means that the loader has to simply iterate from the start to the end of the section and add to each qword the difference between the designated and the actual virtual base address.
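
The got fix-up described here amounts to a single loop. The following is a sketch that only holds if every 8-byte slot already contains a link-time absolute address. The function name got_rebase is hypothetical, and in a kernel the two pointers would typically come from delimiting symbols defined in the linker script (also an assumption).

```c
#include <assert.h>
#include <stdint.h>

/* Add `delta` (actual base minus link-time base) to every 8-byte got slot
 * in the half-open range [got_start, got_end). Unsigned wrap-around makes
 * this work for negative deltas as well. */
static void got_rebase(uint64_t *got_start, uint64_t *got_end, uint64_t delta)
{
    assert(got_start <= got_end);
    for (uint64_t *slot = got_start; slot < got_end; slot++)
        *slot += delta;
}
```

For example, a slot holding 0xffffffff80100000 rebased with a delta of 0x193000 becomes 0xffffffff80293000.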

Re: Kernel text mapping issue

Posted: Tue Aug 07, 2018 3:49 pm
by CRoemheld
simeonz wrote:The type of code determines the action necessary after moving the code. Compiled c/c++ is either non-pic relocatable, pic relocatable or non-relocatable. Non-pic relocatable means that the code had been linked with -shared -Bstatic -Bsymbolic and compiled with -mcmodel=large. It must be adjusted by the loader through code fix-ups. Pic relocatable means that the code had been linked with -shared -pie -Bstatic -Bsymbolic and compiled with -fpie. It will be adjusted by the loader through got fix-ups. Non-relocatable means code linked with -static and requiring no post-load adjustment. Note that all executables requiring any kind and amount of relocation fix-ups (pic or non-pic) have type ET_DYN and dynamic segment of type PT_DYNAMIC, but you can have pc-relative, or base-address-relative, or non-relocatable assembly code mixed in that executable as well.
So, given my current linker and compiler flags, which category would my code fall into? It should be noted that compiling and linking with the command below creates an ELF executable (ET_EXEC). If I add the -r flag to the linker, it creates a relocatable ELF (ET_REL). However, moving the bytes of the kernel to the fixed address won't solve it either way:

Code:

CCFLAGS 	:= -ffreestanding -mcmodel=large 
CCFLAGS		+= -mno-red-zone -mno-mmx -mno-sse -mno-sse2

LDFLAGS			:= -ffreestanding -O2 -nostdlib $(OBJS)
LGCC			:= -lgcc

...

$(KERNEL): $(OBJS) $(LINKER)
	$(LD) $(BINDIR)/*.o -T $(LINKER) -o $(KERNEL) $(LDFLAGS) $(LGCC)
simeonz wrote:I suspect that your code is currently non-relocatable. This means that the c/c++ language compiler has generated some absolute references and their computed addresses depend on the program segments being accessible at the p_vaddr location. Whether this location is mapped by paging or is the physical location prior to enabling paging is irrelevant. The code will assume that a given segment can be found and accessed at p_vaddr at the time when it is executed or referenced. p_vaddr and p_paddr can differ if you use the AT attribute in the linker script, and different parts of your code (such as coming from different intermediate object files) can be set to vastly different discontiguous parts of memory. This enables you to set only part of the code to execute from their physical addresses prior to enabling paging. Pc-relative and base-relative assembly can also execute from its physical load address, because such code will be able to execute anywhere.
My current linker script is taken from the OSDev tutorial for the 64-bit kernel. To show you what my linker script would look like if everything worked fine:

Code:

ENTRY(_entry)

KERNEL_VMA = 0xffffffff80100000;
KERNEL_OFF = 0xffffffff80000000;

SECTIONS
{
	. = KERNEL_VMA;
	_kernel = .;

	.text ALIGN(4K) : AT(ADDR(.text) - KERNEL_OFF)
	{
		_text = .;
		*(.text)
		_etext = .;
	}

	.rodata ALIGN(4K) : AT(ADDR(.rodata) - KERNEL_OFF)
	{
		_rodata = .;
		*(.rodata)
		_erodata = .;
	}

	.data ALIGN(4K) : AT(ADDR(.data) - KERNEL_OFF)
	{
		_data = .;
		*(.data)
		_edata = .;
	}

	.bss ALIGN(4K) : AT(ADDR(.bss) - KERNEL_OFF)
	{
		_bss = .;
		*(COMMON)
		*(.bss)
		*(.kernel_heap)
		*(.kernel_stack)
		_ebss = .;
	}

	/DISCARD/ :
	{
		*(.comment)
	}

	_ekernel = .;
}
simeonz wrote:There are multiple options. You can set in the linker script the virtual address of a portion of the kernel code, used during early initialization, to coincide with the physical address where it is loaded. This code will be executed directly and will be used to create the page directories that map the rest of the kernel in the desired higher memory address. Another possibility is to make the kernel relocatable and to apply the relocations to it using the bootstrapping loader. Another way to process the relocations is to use base-relative or pc-relative assembly initialization code in the kernel to perform the fix-up.
I would go with the approach of applying the relocations in the bootstrap loader; that would be my goal for solving this problem. Is it possible to make the kernel relocatable in such a way that I would simply need to copy all the sections of the kernel ELF to my intended address?

Re: Kernel text mapping issue

Posted: Wed Aug 08, 2018 2:07 pm
by simeonz
CRoemheld wrote:I would go with the approach of applying the relocations in the bootstrap loader; that would be my goal for solving this problem. Is it possible to make the kernel relocatable in such a way that I would simply need to copy all the sections of the kernel ELF to my intended address?
Unfortunately, I recalled later that the amd64 abi requires relocations of the RELA type, meaning with an explicit addend in the relocation entry, not an implicit addend stored at the relocated address. Therefore, the got is probably a zero-filled array, rather than an array of image-base-relative offsets as on ia32. This leaves the harder approach: iterating the entries pointed to by the DT_RELA tag in the PT_DYNAMIC segment. This is actually easier for the bootstrap loader. The kernel code itself has the advantage that it can simply wrap the got section in two delimiting symbols using the linker script and iterate between them. But if the got does not directly hold the necessary information to make it usable by adding the rebasing offset, processing the RELA array is the only way to go. Another thing to note: even if the bootstrap loader performs one relocation pass before jumping into the kernel, the kernel has to perform another one after establishing the higher half mapping and before jumping into it.
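
Processing the RELA array could look roughly like the following sketch. It handles only R_X86_64_RELATIVE entries (type 8, slot = load bias + addend) and counts everything else as skipped. The entry layout follows the ELF64 specification; the function name apply_rela and its parameters are hypothetical, and the rela pointer and rela_size are assumed to come from the DT_RELA and DT_RELASZ entries of the dynamic segment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define R_X86_64_RELATIVE 8u

typedef struct {
    uint64_t r_offset;  /* link-time virtual address of the slot to patch */
    uint64_t r_info;    /* symbol index (high 32 bits), type (low 32 bits) */
    int64_t  r_addend;  /* explicit addend, as required by RELA */
} elf64_rela_t;

/* `image` points at the bytes that were linked at `link_base`; the image is
 * meant to run at `new_base`. Patches each R_X86_64_RELATIVE slot with
 * (new_base - link_base) + addend and returns the number of skipped entries. */
static unsigned apply_rela(uint8_t *image, uint64_t link_base, uint64_t new_base,
                           const elf64_rela_t *rela, uint64_t rela_size)
{
    unsigned skipped = 0;
    uint64_t bias = new_base - link_base;

    for (uint64_t i = 0; i < rela_size / sizeof *rela; i++) {
        if ((uint32_t)rela[i].r_info != R_X86_64_RELATIVE) {
            skipped++;              /* other types need symbol handling */
            continue;
        }
        assert(rela[i].r_offset >= link_base);
        uint64_t value = bias + (uint64_t)rela[i].r_addend;
        memcpy(image + (rela[i].r_offset - link_base), &value, sizeof value);
    }
    return skipped;
}
```

A second pass after the higher half mapping is established would call the same routine again with the new base.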

In any case, my hard drive gave up a couple of days ago, leaving me with my system ssd. My VMs were on the hdd. I plan to reinstall the system, at which point I will be able to perform some basic tests and provide you with some more information. I also think I should confirm if the amd64 linker output makes the simpler got fix-up approach indeed impossible. But in the meantime, if someone has some better ideas, they are welcome to chime in :)

Re: Kernel text mapping issue

Posted: Tue Aug 14, 2018 2:27 pm
by CRoemheld
simeonz wrote:In any case, my hard drive gave up a couple of days ago, leaving me with my system ssd. My VMs were on the hdd. I plan to reinstall the system, at which point I will be able to perform some basic tests and provide you with some more information. I also think I should confirm if the amd64 linker output makes the simpler got fix-up approach indeed impossible. But in the meantime, if someone has some better ideas, they are welcome to chime in :)
That sounds pretty bad, I hope you get everything back to normal soon ;)

I am currently researching other topics in kernel development, so I guess this topic is on hold for now. I am however awaiting your response when you are ready :)