Help with debugging my ELF loader

Hanz
Member
Posts: 29
Joined: Sun Mar 09, 2014 10:14 am

Help with debugging my ELF loader

Post by Hanz »

I am writing an x86_64 OS in Rust. My kernel works well until the binary grows past a certain size. After that, the mutex (a spinlock) guarding the VGA text buffer (the one at 0xb8000) breaks. After a couple dozen hours I finally found the issue (or at least I think so): my ELF loader doesn't initialize the lock's value properly. With a smaller binary, it correctly copies zero from the ELF image. However, when the binary gets too big it doesn't work anymore, because the ELF loader copies the wrong value.

I'm quite sure that the problem occurs because the ELF is loaded incorrectly. I checked the most common other issues, like paging and the stack, with the internal Bochs debugger, but I was unable to find any problems with them. I used memory watchpoints and single-stepping, and it looks like 0x69 keeps getting copied instead of the required 0x00.

The ELF loader is written in assembly. It (partially) verifies the ELF file located at 0xA000 and loads it to the address specified in the ELF file, 0x100000 in my case. It loads the file according to the instructions on the ELF wiki page.

I loop through all the program headers, and do the following:
  1. Test that type is 1 - load: if not, jump to next header
  2. Clear p_memsz bytes at p_vaddr to 0
  3. Copy p_filesz bytes from p_offset to p_vaddr
  4. Next header
Since there is only one header with the correct type, only that one is loaded. What could cause this kind of problem when loading an ELF binary?

Output from readelf -l looks like this:

Code:

vagrant@vagrant-ubuntu-trusty-64:/vagrant$ readelf -l build/kernel.bin 

Elf file type is EXEC (Executable file)
Entry point 0x100000
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x00000000000000b0 0x0000000000100000 0x0000000000100000
                 0x000000000000b000 0x000000000005b000  RWE    10
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    10

 Section to Segment mapping:
  Segment Sections...
   00     .entry .text .rodata .data .bss 
   01
And relevant asm code: (rbx points to the current program header entry):

Code:

    mov rdi, [rbx + 16] ; p_vaddr
    mov rcx, [rbx + 40] ; p_memsz

    ; Clear p_memsz bytes at p_vaddr to 0
.loop_clear:
    mov byte [rdi], 0
    inc rdi
    loop .loop_clear

    mov rsi, [rbx + 8]  ; p_offset
    add rsi, img_loc    ; ELF image location (0xA000)
    mov rdi, [rbx + 16] ; p_vaddr

    mov rcx, [rbx + 32] ; p_filesz
    rep movsb           ; copy p_filesz bytes from p_offset to p_vaddr
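
The offsets used above (+8, +16, +32, +40) can be cross-checked against the Elf64_Phdr layout on the host. The following is only a sketch in Python; the names `PHDR`, `FIELDS`, `field_offsets` and `parse_phdr` are made up for illustration and are not part of the loader.

```python
import struct

# Elf64_Phdr, little-endian: p_type and p_flags are 4 bytes each,
# the remaining six fields are 8 bytes each (56 bytes in total).
PHDR = struct.Struct("<IIQQQQQQ")
FIELDS = ("p_type", "p_flags", "p_offset", "p_vaddr",
          "p_paddr", "p_filesz", "p_memsz", "p_align")

def field_offsets():
    """Return the byte offset of each field inside one program header."""
    offsets, pos = {}, 0
    for name, size in zip(FIELDS, (4, 4, 8, 8, 8, 8, 8, 8)):
        offsets[name] = pos
        pos += size
    return offsets

def parse_phdr(raw):
    """Unpack one 56-byte program header into a dict (hypothetical helper)."""
    return dict(zip(FIELDS, PHDR.unpack(raw)))

offsets = field_offsets()
# Prints: 8 16 32 40, the same displacements the assembly uses.
print(offsets["p_offset"], offsets["p_vaddr"],
      offsets["p_filesz"], offsets["p_memsz"])
```

So the field displacements themselves agree with the ELF-64 layout; this says nothing about whether the bytes at 0xA000 are what the file on disk contains.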
Boris
Member
Posts: 145
Joined: Sat Nov 07, 2015 3:12 pm

Re: Help with debugging my ELF loader

Post by Boris »

Hi, where is your 0x00 value located? In the image of your ELF?
Maybe you have a memory corruption:
- your physical allocator gave the same page twice
- your elf image has been altered
I'd extract your ELF loader and make it work on Linux (or whatever OS you are developing from). Make it print the resulting memory blob after loading. If it works outside your boot loader, you will know which part of it is the culprit.
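
A host-side version of the loader could be sketched roughly like this in Python. It follows the clear-then-copy steps from the first post; `load_segments`, the tuple layout of `phdrs` and the 0xcc fill marker are assumptions made for this example, not anything from the actual boot loader.

```python
PT_LOAD = 1

def load_segments(image, phdrs, base, size):
    """Emulate the boot-stage loader on the host.

    image: bytes of the ELF file
    phdrs: iterable of (p_type, p_offset, p_vaddr, p_filesz, p_memsz)
    base, size: the address window modelled by the returned bytearray
    """
    # Fill with a marker so bytes the loader never touched stay visible.
    mem = bytearray(b"\xcc" * size)
    for p_type, p_offset, p_vaddr, p_filesz, p_memsz in phdrs:
        if p_type != PT_LOAD:
            continue  # step 1: skip headers that are not loadable
        start = p_vaddr - base
        if start < 0 or start + p_memsz > size:
            raise ValueError("segment falls outside the modelled window")
        if p_offset + p_filesz > len(image):
            raise ValueError("segment extends past the end of the file")
        # Step 2: clear p_memsz bytes at p_vaddr.
        mem[start:start + p_memsz] = bytes(p_memsz)
        # Step 3: copy p_filesz bytes from p_offset to p_vaddr.
        mem[start:start + p_filesz] = image[p_offset:p_offset + p_filesz]
    return mem

# Tiny synthetic image: 4 bytes come from the file, 4 are zero-filled,
# and the last 4 bytes of the window are left untouched.
image = bytes(range(16))
mem = load_segments(image, [(PT_LOAD, 4, 0x100000, 4, 8)], 0x100000, 12)
print(mem.hex())  # 0405060700000000cccccccc
```

The returned blob can then be written to a file and compared with a memory dump taken from the emulator after the real loader has run.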
Hanz
Member
Posts: 29
Joined: Sun Mar 09, 2014 10:14 am

Re: Help with debugging my ELF loader

Post by Hanz »

Hi,

Sorry, I was a bit unclear in my first post. I was talking about my boot-stage ELF loader, which loads my kernel that is stored as an ELF image. The good thing is that you seem to have understood it correctly anyway.

The 0x0 value is both in the ELF image and in the resulting binary. When the binary gets too large, some other value, 0x69 in this case, gets copied from the image instead of the correct value, 0x0.

Memory corruption sounds like what is happening. However, it cannot be my physical memory allocator, since it isn't active yet at this stage. The ELF image getting altered is probably the actual cause here. I have verified (using objdump and memory dumps) that the image is correct when it gets loaded from the disk to RAM.
Boris wrote: I'd extract your ELF loader and make it work on Linux (or whatever OS you are developing from). Make it print the resulting memory blob after loading. If it works outside your boot loader, you will know which part of it is the culprit.
Why didn't I think about this before? I'm going to do so now.
Hanz
Member
Posts: 29
Joined: Sun Mar 09, 2014 10:14 am

Re: Help with debugging my ELF loader

Post by Hanz »

I'm starting to think that the issue is actually with my linker script, or even in my code. I wouldn't be surprised about that.
Last edited by Hanz on Fri Dec 30, 2016 6:23 am, edited 1 time in total.
dchapiesky
Member
Posts: 204
Joined: Sun Dec 25, 2016 1:54 am
Libera.chat IRC: dchapiesky

Re: Help with debugging my ELF loader

Post by dchapiesky »

You might try a different loader just to see...

https://github.com/ReturnInfinity/Pure64

You will need to objcopy your kernel and concatenate it with pure64.sys (see the docs).

Pure64 is well documented. You might also compare notes.

good luck!
Boris
Member
Posts: 145
Joined: Sat Nov 07, 2015 3:12 pm

Re: Help with debugging my ELF loader

Post by Boris »

You must have a physical allocator of sorts, even if it is just a set of functions altering the multiboot header. Even if it is a set of functions probing memory using BIOS ints.

Remember those allocators often have overlapping areas.
Hanz wrote: Why didn't I think about this before? I'm going to do so now.
Because you are like most humans: thinking about producing a result instead of building tools that help you produce it.
That's not a bad thing; getting results is what gives you the energy to continue on your path.
Hanz
Member
Posts: 29
Joined: Sun Mar 09, 2014 10:14 am

Re: Help with debugging my ELF loader

Post by Hanz »

I wrote it again from scratch in Python, tested it, and got exactly the same output from both. I dumped the compiled image and compared it to the one on disk, and they were identical. The image is extended correctly. So now I know that the problem isn't with my ELF loader; it is something about my linker script, compiler settings, or maybe the kernel code itself.
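
For reference, a comparison like this can be automated with a few lines of Python. This is only a sketch: `first_difference` and `report` are hypothetical helpers, and the default base of 0x100000 is simply the load address mentioned earlier in the thread.

```python
def first_difference(expected, actual):
    """Return the first offset where two dumps differ, or None if equal."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    # One dump is a prefix of the other: they differ where the shorter ends.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None

def report(expected, actual, base=0x100000):
    """Describe the first mismatch as an address relative to base."""
    off = first_difference(expected, actual)
    if off is None:
        return "dumps are identical"
    return "first mismatch at 0x%x" % (base + off)
```

For instance, `report(b"\x00\x00\x00", b"\x00\x69\x00")` returns `"first mismatch at 0x100001"`, which gives an address to set a watchpoint on.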
Boris wrote:You must have a physical allocator of sorts, even if it is just a set of functions altering the multiboot header. Even if it is a set of functions probing memory using BIOS ints.
Actually, I use my own boot loader with only fixed memory locations. No dynamic allocation of any kind is ever made. Well, I have one null-terminated array at 0x2000 for the memory map from BIOS, but everything else is fixed.
dchapiesky wrote:You might try a different loader just to see...

https://github.com/ReturnInfinity/Pure64
My current kernel booting process isn't Multiboot compliant. Probably a big mistake, but I don't care at this point. Well, I'm going to try Pure64 (and probably GRUB too) next week if I don't manage to find the real root cause by then.
bauen1
Member
Posts: 29
Joined: Sun Dec 11, 2016 3:31 am
Libera.chat IRC: bauen1
Location: In your computer

Re: Help with debugging my ELF loader

Post by bauen1 »

Are you maybe trying to read the .bss section from the file? It isn't included in the file, so you could be reading code from unrelated sections after it.
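
To illustrate the point: the part of a LOAD segment beyond p_filesz has no bytes in the file and must end up as zeros in memory. A small Python sketch, using the FileSiz and MemSiz values from the readelf output above (the helper names `bss_range` and `nonzero_in_tail` are invented for this example):

```python
def bss_range(p_vaddr, p_filesz, p_memsz):
    """Address range [start, end) that has no backing bytes in the file."""
    return p_vaddr + p_filesz, p_vaddr + p_memsz

def nonzero_in_tail(mem, base, p_vaddr, p_filesz, p_memsz):
    """List (address, value) pairs in the zero-fill tail that are not 0.

    mem is a dump of memory starting at address base.
    """
    start, end = bss_range(p_vaddr, p_filesz, p_memsz)
    return [(addr, mem[addr - base])
            for addr in range(start, end) if mem[addr - base] != 0]

start, end = bss_range(0x100000, 0xb000, 0x5b000)
print(hex(start), hex(end))  # 0x10b000 0x15b000
```

Running `nonzero_in_tail` on a memory dump taken after loading would show whether anything other than zero ended up in that range.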
Hanz
Member
Posts: 29
Joined: Sun Mar 09, 2014 10:14 am

Re: Help with debugging my ELF loader

Post by Hanz »

bauen1 wrote:Are you maybe trying to read the bss section from the file? because it isn't included in the file and you could be reading code from unrelated sections after it.
Since my knowledge of ELF files is almost completely based on OSDev wiki pages and reading the output of hexdump, readelf and objdump, it is certainly possible that this is the case. However, I believe that I am correctly zeroing out the bytes (p_memsz). I don't do any segment mapping, though; I just copy everything from the beginning, just like the wiki page says.
1 = load - clear p_memsz bytes at p_vaddr to 0, then copy p_filesz bytes from p_offset to p_vaddr;
Do I have to use the section info in some way? I thought that the linker already handled this for me so I could just copy the bytes, since the readelf output shows only two program headers, and the second one is an empty GNU_STACK.
iansjack
Member
Posts: 4706
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: Help with debugging my ELF loader

Post by iansjack »

Hanz wrote: Since my knowledge about ELF files is almost completely based on OSDev wiki pages and reading outputs of hexdump, readelf and objdump, it certainly possible that this is the case.
Does that mean that you haven't read the specifications ( http://www.skyfree.org/linux/references/ELF_Format.pdf )? That would seem to be a sensible first step if you are going to use elf files.
Hanz
Member
Posts: 29
Joined: Sun Mar 09, 2014 10:14 am

Re: Help with debugging my ELF loader

Post by Hanz »

iansjack wrote: Does that mean that you haven't read the specifications ( http://www.skyfree.org/linux/references/ELF_Format.pdf )? That would seem to be a sensible first step if you are going to use elf files.
Yes, it does. The bootloader is supposed to parse only the minimal subset required for my kernel. I only tried to make a simplified loader for now, and planned to rework it when I move to a proper filesystem instead of a dd-dumped ELF image. I thought that the version in the OSDev wiki would be sufficient for this, but it looks like I was wrong. Well, it looks like it's time to read the specification now.

I'm using the ELF-64 format, so actually this (http://www.staroceans.org/e-book/elf-64-hp.pdf) should be the correct specification.
Hanz
Member
Posts: 29
Joined: Sun Mar 09, 2014 10:14 am

Re: Help with debugging my ELF loader

Post by Hanz »

I read the manual, looked at my code and realized it was clearly incorrect. I "fixed" the code, read the manual again. Fixed the code again.

Then I found the error in my disk loading code, and fixed that. Apparently they were both subtly broken. The loader didn't use segments correctly; it was just a simple mathematical error.
dozniak
Member
Posts: 723
Joined: Thu Jul 12, 2012 7:29 am
Location: Tallinn, Estonia

Re: Help with debugging my ELF loader

Post by dozniak »

So, does it work now?
Hanz
Member
Posts: 29
Joined: Sun Mar 09, 2014 10:14 am

Re: Help with debugging my ELF loader

Post by Hanz »

Probably. I really hope so.