GRUB2 Woes - Fixed (or not)

zerosum
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm


Post by zerosum »

Hi all,

I'm extremely new to OS development, having followed various tutorials (and the Intel / AMD manuals) to get an i386 "Hello world" type "kernel." I put "kernel" in quotation marks because what I have completed so far could not really be called a kernel, but I do not know what else to call it; all it does so far is set up paging and identity-map all available physical memory, set up interrupt handlers, and provide CGA output with a limited printf.

Anyway, what I have been wanting to do is create a pure 64-bit OS, with no backwards compatibility for earlier, 32-bit processors. The problem is that due to device driver issues, even though I have a 64-bit processor, I am running 32-bit Linux and so I need to cross-compile my "kernel."

I tried creating a tiny piece of code (with a GRUB2 multiboot header) which does nothing but hlt. I have managed to get it to compile doing the following:

nasm -felf64 -o test.o test.s
ld -o kernel -Ttext 0x10000 --oformat=elf64-x86-64 -melf_x86_64 test.o

Now I've run BOCHS, loaded up GRUB2 and tried to boot this test code and all I get is a triple-fault (GPF). I am assuming GRUB2 doesn't activate long mode, that it just handles elf64 and jumps to the loaded kernel in pmode, so while the output binary is elf64, the code in "test.s" is 32-bits.

Can anyone tell me why this is triple-faulting? I'm guessing there's something wrong with the output binary......?

Many thanks,
Lee
Last edited by zerosum on Wed Apr 09, 2008 11:00 pm, edited 3 times in total.
Zenith
Member
Posts: 224
Joined: Tue Apr 10, 2007 4:42 pm

Post by Zenith »

Welcome to the osdev forums! :)

A few things to note:

- Maybe use a linker script? -Ttext 0x10000 doesn't cut it (and that should be 0x100000, a big difference :wink: ). In the future, you'll actually have 64-bit code at a virtual address and this won't be enough.

- GRUB2 (which incidentally, I'm using with my 64-bit ELF kernel too :wink:), or more specifically the current version has a lot of bugs. After experimentation, I found that it doesn't even support the new multiboot draft properly! And you are using the draft one, right?

- If all else fails, post the code in test.s and try objdumping your code.
"Sufficiently advanced stupidity is indistinguishable from malice."
zerosum
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

Thanks for your swift response, karekare0 :-)

Heh, good point about the 0x10000. I'll try changing it and see what happens. I do have a linker script for my 32-bit "kernel" and it does have 0x100000; I'm just not using it when linking the test code :-)

When I first created the test code, I had the old multiboot header, but then I saw a link to the new multiboot specification and I changed it to that...... I guess I should change it back. You would think GRUB2 would error out in the event it can't find a proper multiboot header, rather than cause a triple-fault!

There's nothing worth mentioning in the test.s code, literally all it has is a multiboot header, a _start label with a hlt and a couple of directives for the assembler.
zerosum
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

Okay, I fixed up the aforementioned linker flag, changed the multiboot header back to the old one and tried booting it again. This time I didn't get a triple-fault, but things did not go well either. The BOCHS log shows me the following, after GRUB2 tries to load the test code:

Code: Select all

00101746494i[CPU0 ] math_abort: MSDOS compatibility FPU exception
00101752626e[CPU0 ] SLDT: not recognized in real or virtual-8086 mode
Whatever is causing the bottom line seems to be repeating, as it fills the error log until I shut BOCHS down.

Here's the contents of test.s anyway, though I don't think it'll help solve the problem:

Code: Select all

[bits 32]
[global _start]
align 4

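MULTIBOOT_HEADER_MAGIC	dd	0x1BADB002		; legacy multiboot magic
MULTIBOOT_HEADER_FLAGS	dd	11b			; bit 0: page-align modules, bit 1: request memory info
CHECKSUM 		dd	-(0x1BADB002+11b)	; magic + flags + checksum must wrap to 0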

_start:
    hlt
Any ideas?
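
For anyone wanting to double-check the header arithmetic above, here is a minimal Python sketch. It relies only on the rule that the three 32-bit words of the old multiboot header (magic, flags, checksum) must sum to zero modulo 2^32. The function names are made up for illustration and are not part of any tool.

```python
MULTIBOOT1_MAGIC = 0x1BADB002  # magic value used in test.s above


def multiboot1_checksum(flags: int, magic: int = MULTIBOOT1_MAGIC) -> int:
    """Return the 32-bit word that makes magic + flags + checksum wrap to zero."""
    return (-(magic + flags)) & 0xFFFFFFFF


def header_is_valid(magic: int, flags: int, checksum: int) -> bool:
    """True when the magic matches and the three words sum to 0 modulo 2**32."""
    return magic == MULTIBOOT1_MAGIC and ((magic + flags + checksum) & 0xFFFFFFFF) == 0


if __name__ == "__main__":
    flags = 0b11  # same flags as test.s
    checksum = multiboot1_checksum(flags)
    print(hex(checksum), header_is_valid(MULTIBOOT1_MAGIC, flags, checksum))
```

This agrees with the `-(0x1BADB002+11b)` expression in test.s, so the checksum itself does not look like the culprit here.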
Zenith
Member
Posts: 224
Joined: Tue Apr 10, 2007 4:42 pm

Post by Zenith »

I got that same error when working on the early stages of my port. I'm sorry I wasn't so clear, but GRUB2 only supports the multiboot draft (incompletely). It does not support the 'legacy' multiboot spec (for ELF64 files, AFAIK). And am I also right in assuming that the abort occurred in real mode in the first MB (as in my case)?

When I said support was incomplete, I meant that values of some things in the multiboot information struct were inaccurate, but GRUB 2 does actually load ELF files with the multiboot2 header.

And GRUB2 does warn when it doesn't find the multiboot header...
"Sufficiently advanced stupidity is indistinguishable from malice."
zerosum
Member
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

Ahh, sorry, I misunderstood.

I've changed it back to the new multiboot header, but it's still not working. :-(
zerosum
Member
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

I seem to have fixed the problem now, although I'm not sure exactly what it was that caused it! :-)

It seems GRUB2 was never even making it to booting my kernel. I was entering the command 'multiboot /boot/kernel' at the GRUB2 shell and that was when the crashes were taking place.

I have now successfully gotten the test code to be loaded and executed using the OLD multiboot header (0x1BADB002 etc) with elf64.

I might also add that I am now cross-linking, which may have played a large part in the earlier failures.
zerosum
Member
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

Okay, something strange is going on now.

Earlier, I got the test code to boot/execute fine using ELF64 and the old multiboot header, as stated above.

I then tried recompiling my "real" kernel and it failed because my cross-compiler was linked against something not in the linker's path. I added the path to ld.so.conf and ran ldconfig.

After this, my "real" kernel compiled fine, but GRUB2 wouldn't load it; it just causes GRUB2 to crash like it did earlier.

A little confused at this point, I recompiled the test code and tried to get GRUB2 to boot it. And it failed. I have absolutely no idea why or how, but it looks like adding that path and running ldconfig has screwed it up somehow :?
zerosum
Member
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

I'm an idiot. I think I had accidentally compiled the "working" elf64 binary as elf. So it seems I never had it working.

After going through the GRUB2 source code, I discovered that if you set the environment variable "debug" to "all" while running in the GRUB2 shell, you get many interesting messages.

So, I set debug to all and if you have a look at the attached screenshot, you can see what's going on before GRUB2 dies.

As you may note, it outputs an error saying no GRUB2 multiboot header was found and then continues to load *something* anyway, which is where it crashes sometimes; other times it just hangs.

Anyway, my multiboot header is simple. It is as follows:

Code: Select all

MULTIBOOT_HEADER_MAGIC	dd	0xe85250d6
MULTIBOOT_HEADER_FLAGS	dd	0
Is there something wrong with that? I've tried 64-bit alignment (as per draft specs) and 32-bit alignment (as per old multiboot specs) and no alignment at all (as above), and every time, debug outputs that it doesn't find the multiboot header and proceeds to do god-knows-what.
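
For comparison only, here is a Python sketch that builds the smallest header I understand the later, released Multiboot2 specification to require: four 32-bit words (magic, architecture, header length, checksum) followed by a terminating end tag. That layout is an assumption relative to this thread, since the 2008 draft and the GRUB2 build discussed here may expect something different, and `build_multiboot2_header` is a made-up name.

```python
import struct

MULTIBOOT2_MAGIC = 0xE85250D6  # same magic as in the header above
ARCH_I386 = 0                  # architecture field: 32-bit protected mode (released spec)


def build_multiboot2_header() -> bytes:
    """Build a minimal header: four 32-bit words plus an 8-byte end tag."""
    end_tag = struct.pack("<HHI", 0, 0, 8)  # tag type 0, flags 0, size 8
    header_length = 16 + len(end_tag)       # fixed part plus all tags
    # The four fixed words must sum to zero modulo 2**32.
    checksum = (-(MULTIBOOT2_MAGIC + ARCH_I386 + header_length)) & 0xFFFFFFFF
    fixed = struct.pack("<IIII", MULTIBOOT2_MAGIC, ARCH_I386, header_length, checksum)
    return fixed + end_tag


if __name__ == "__main__":
    print(build_multiboot2_header().hex())
```

If the loader in use follows that layout, a header with only a magic word and a zero flags word, like the one above, would be missing the length and checksum fields.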
Attachment: grub2.jpg (93.98 KiB), screenshot of the GRUB2 debug output
zerosum
Member
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

Okay, I think I may have found the problem, I just have no idea how to fix it.

For some reason, my cross-linker is filling the elf64 with null bytes up until offset 0x100000 -- which just happens to be the entry point.

Why would it do this and how can I stop it from doing it?

To demonstrate, my native linker outputs a file of 4690 bytes. The cross-linker is outputting a file over 1 MB.

Since GRUB only looks for the multiboot magic numbers in the first 8 KB, it's not finding mine, because the cross-linker is dumping that magic number at 0x100000. The only way that I've found so far to relocate this is to put the magic numbers in the data section, and tell the linker to put that section below the 8 KB mark, which is simply not going to work.

Can anyone please tell me how to stop the cross-linker from padding the code like this?
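
A quick way to confirm where the magic number ends up is to scan the output file the way the loader is described as doing above: look only at the first 8 KB, on 32-bit boundaries. The Python below is a rough sketch of that check. `find_multiboot_magic` is a made-up helper name, and the default window and alignment are just the values quoted in this thread, so adjust them for the draft header.

```python
import struct


def find_multiboot_magic(image: bytes, magic: int = 0x1BADB002,
                         search_limit: int = 8192, alignment: int = 4):
    """Return the file offset of the first aligned magic word, or None if absent."""
    needle = struct.pack("<I", magic)        # magic as little-endian bytes
    end = min(len(image), search_limit)      # only the search window matters
    for offset in range(0, end - len(needle) + 1, alignment):
        if image[offset:offset + len(needle)] == needle:
            return offset
    return None


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:       # e.g. the linked kernel file
        print(find_multiboot_magic(f.read()))
```

Run against the padded file, this should report None, since the magic sits at 0x100000, far outside the window.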
AJ
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Post by AJ »

Hi,

I know this sounds odd, but the following may be worth trying. When I started using elf64, the linker (built from the Cygwin GCC Cross-Compiler) was adding in library calls, adding 2 MiB to the start of my kernel. Try adding:

Code: Select all

-nostdinc -nostdlib
to the linker flags, even though you are assembling with nasm. As I said, it doesn't sound like a likely solution for an assembled file, but it seemed to do the trick for me.

Cheers,
Adam
zerosum
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

Hi Adam,

Thanks for your post :-)

I gave that a shot and it produced exactly the same output.

Seems to me like the linker thinks it needs to pad the code out to the absolute address of the entry point, which is, of course, not the case.

I thought linking with the -r (relocatable) flag may have helped, but this does not work either (grub can't read it).

Thanks again :-)
Lee
Laksen
Member
Member
Posts: 140
Joined: Fri Nov 09, 2007 3:30 am
Location: Aalborg, Denmark

Post by Laksen »

It seems to me that the GNU linker is teh b0rkens. Try outputting to elf64-little instead of elf64-x86-64. On my linker it screws up the architecture flag, but this can be fixed every time you compile, and it doesn't add the padding.
AJ
Member
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Post by AJ »

As yet another suggestion, you may also like to see if it works if you use a linker script, manually defining the location of .text, .data, .rodata and .bss.

Cheers,
Adam
zerosum
Member
Member
Posts: 63
Joined: Wed Apr 09, 2008 6:57 pm

Post by zerosum »

Okay, I have it working, although I don't understand how it is working.

I was under the impression that the -Ttext ld flag was similar to the org assembler directive; i.e. it told the linker that offsets were relative to the given argument.

Based on that belief, I was giving ld the -Ttext 0x100000 flag, since this is where the kernel was going to be loaded.

I have now removed the -Ttext 0x100000 and the padding is no longer present.
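
One possible explanation, which nothing in this thread confirms: ELF wants a loadable segment's file offset to be congruent to its virtual address modulo the segment alignment, and x86-64 builds of ld have traditionally defaulted to a 2 MiB maximum page size, where i386 builds use 4 KiB. The Python sketch below works through that arithmetic. `first_load_offset` is a made-up helper, and the 0x78 header size (a 64-byte ELF64 header plus one 56-byte program header) is only an example.

```python
def first_load_offset(vaddr: int, page_size: int, headers_end: int) -> int:
    """Smallest file offset >= headers_end with offset % page_size == vaddr % page_size."""
    remainder = vaddr % page_size
    base = headers_end - (headers_end % page_size)  # start of the page holding headers_end
    offset = base + remainder
    if offset < headers_end:                        # would overlap the headers, use next page
        offset += page_size
    return offset


if __name__ == "__main__":
    # Text at 0x100000 with a 2 MiB page size versus a 4 KiB page size.
    print(hex(first_load_offset(0x100000, 0x200000, 0x78)))
    print(hex(first_load_offset(0x100000, 0x1000, 0x78)))
```

With a 2 MiB page size the text lands at file offset 0x100000, which matches the padding I saw; with 4 KiB it lands at 0x1000, roughly in line with the small file from the native linker. If that is really the cause, passing `-z max-page-size=0x1000` to ld might be worth a try instead of dropping -Ttext.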

Now my next question is, how can I generate an elf64 binary with 32-bit code in it? It's easy in asm, I just specify [bits 32] but use the -felf64 flag and it's all good. Now I've tried compiling my C++ code with -m32, and it outputs 32-bit code.... but ld won't link it all and output an elf64 binary.

The reason I ask this is because I want to do most of the ground work for enabling long mode (paging etc) in C++, rather than asm, but if ld/g++ will only output elf64 with 64-bit code, then I seem to have no way to setup long mode in C++.

Does anyone have any thoughts / suggestions on this?