Strange error after moving to 64-bit

Posted: Sat Oct 17, 2009 12:22 am
by zity
Hello again :)

I have finally got some time to play around with 64-bit OS development. I've decided to continue using GRUB as my bootloader, and I switched to 64-bit ELF using AOUT_KLUDGE. I've now managed to get the computer into long mode, and I can jump to my C kernel. It all seems to work very well, but I've encountered a strange error while trying to implement a simple text mode video driver. My clear screen function uses memsetw() to clear the screen buffer. Below are two different implementations of this function, one that uses pointers (1) and one that uses the screen buffer as an array (2). In my 32-bit OS I use the one with pointers (1) and it works very well. In my new 64-bit test system, (1) causes a reboot in both QEMU and Bochs, but (2) works fine.

Number 1, with pointers

uint16_t *memsetw(uint16_t *dest, uint16_t val, int len)
{
   uint16_t *dp = (uint16_t *)dest;
   while(len-- > 0) *dp++ = val;
   return dest;
}
Number 2, as an array

uint16_t *memsetw(uint16_t *dest, uint16_t val, int len)
{
   while(len-- > 0) dest[len] = val;
   return dest;
}
Does anybody have an idea why it does not work with pointers, but does work as an array? Bochs reports the following error:

00055164694e[CPU0 ] interrupt(long mode): IDT entry extended attributes DWORD4 TYPE != 0  
00055164694e[CPU0 ] interrupt(long mode): IDT entry extended attributes DWORD4 TYPE != 0  
00055164694e[CPU0 ] interrupt(long mode): IDT entry extended attributes DWORD4 TYPE != 0  
00055164694i[CPU0 ] CPU is in long mode (active)                                          
00055164694i[CPU0 ] CS.d_b = 16 bit                                                       
00055164694i[CPU0 ] SS.d_b = 32 bit                                                       
00055164694i[CPU0 ] EFER   = 0x00000501
My linker script and assembler stub can be downloaded here:
http://pub.mitsted.dk/link.ld
http://pub.mitsted.dk/boot.asm
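
As a side note on the Bochs dump above, the EFER value can be decoded by hand. The following is a minimal sketch in hosted C, not kernel code; the macro and function names are made up for illustration, while the bit positions are the architectural ones (SCE bit 0, LME bit 8, LMA bit 10, NXE bit 11). It only confirms what the dump already says: long mode is enabled and active.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions in the EFER MSR. */
#define EFER_SCE (1u << 0)   /* SYSCALL/SYSRET enable */
#define EFER_LME (1u << 8)   /* long mode enable      */
#define EFER_LMA (1u << 10)  /* long mode active      */
#define EFER_NXE (1u << 11)  /* no-execute enable     */

/* Hypothetical helper: returns 1 if long mode is both enabled and active. */
int efer_long_mode_active(uint32_t efer)
{
   return (efer & EFER_LME) && (efer & EFER_LMA);
}

/* Usage example with the value from the Bochs dump. */
void efer_example(void)
{
   uint32_t efer = 0x00000501u;
   assert(efer == (EFER_SCE | EFER_LME | EFER_LMA));
   assert(efer_long_mode_active(efer));
}
```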

Re: Strange error after moving to 64-bit

Posted: Sun Oct 18, 2009 6:06 am
by zity
I've been playing around with my source for a while now, and I simply can't seem to figure out the problem. It doesn't seem to have anything to do with the long mode enabling code (mapping, GDT and so on), because even if I jump to the main function in my C kernel immediately after the start label in my assembly stub (when I'm still in protected mode) it does not work.

I think it has something to do with the way GRUB loads my kernel or the way it's linked, but I don't know what to try next. I've changed my assembly stub a little so the stack is set up properly (or at least I think so), but it did not change anything. I changed my linker script too, without any difference :(

Re: Strange error after moving to 64-bit

Posted: Sun Oct 18, 2009 6:28 am
by NickJohnson
Did you try looking at the assembly output for both functions? As far as I can tell, both should produce the same result, but by very different mechanisms. The first one doesn't use offsets and fills from low to high addresses, and the second uses offsets and fills from high to low offsets. Both should have short assembly representations, so you could post them.
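
To make the point about fill direction concrete, here is a small hosted-C sketch (run on a normal OS, not in the kernel) that calls both variants from the first post on ordinary arrays and compares the results. The names memsetw_ptr, memsetw_idx and variants_agree are invented for this example, and the 80x25 buffer size is only an assumption about a text mode screen.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CELLS (80 * 25)  /* assumed text mode screen size */

/* Variant 1 from the first post: walk a pointer from low to high addresses. */
uint16_t *memsetw_ptr(uint16_t *dest, uint16_t val, int len)
{
   uint16_t *dp = dest;
   while (len-- > 0) *dp++ = val;
   return dest;
}

/* Variant 2 from the first post: index from the last element down to 0. */
uint16_t *memsetw_idx(uint16_t *dest, uint16_t val, int len)
{
   while (len-- > 0) dest[len] = val;
   return dest;
}

/* Hypothetical helper: returns 1 if both variants leave identical buffers. */
int variants_agree(uint16_t val, int len)
{
   uint16_t a[CELLS] = {0}, b[CELLS] = {0};
   assert(len >= 0 && len <= CELLS);
   memsetw_ptr(a, val, len);
   memsetw_idx(b, val, len);
   return memcmp(a, b, sizeof a) == 0;
}
```

If the two agree in a hosted build at every optimization level, the difference seen in the kernel most likely comes from the environment the generated code runs in rather than from the C source itself, which is why comparing the assembly output is the useful next step.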

Re: Strange error after moving to 64-bit

Posted: Sun Oct 18, 2009 6:43 am
by zity
I went to wash the dishes, when I began to wonder whether my parameters for gcc could cause all this trouble. Guess what, THEY DID. I have always been using the -O3 optimization flag, but if I use -O0 (or anything below 3) instead, my code works just fine. So the optimization must have broken my code. Guess I'll stop using -O3 then :)

Re: Strange error after moving to 64-bit

Posted: Sun Oct 18, 2009 6:54 am
by Creature
The only difference I can see at first glance is that pointers in 64-bit are usually 64-bit and in 32-bit they are 32-bit. But that shouldn't change anything in the functions themselves, unless you are doing some crazy operations on the pointers that assume they are 32-bit or something.

I'm also not sure (haven't tried long mode yet), but isn't your CS supposed to be 32-bit (unless you're in 16-bit protected mode or real mode or something)?

EDIT: Never mind :P, optimizations tended to break my memcpy and memset too, until I put volatile all over the place to stop the bastard from optimizing it away.
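
For reference, one way "putting volatile in" can look for a screen buffer fill is sketched below. This is only an illustration under the assumption that the destination is memory the compiler must not second-guess (such as a text mode buffer); the name memsetw_volatile is made up, and nothing in this thread establishes that a missing volatile was the actual cause of the reboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant: every store goes through a volatile lvalue, so the
   compiler has to emit each 16-bit write, in order, and cannot drop or
   combine them. */
volatile uint16_t *memsetw_volatile(volatile uint16_t *dest, uint16_t val, size_t len)
{
   volatile uint16_t *dp = dest;
   while (len-- > 0) *dp++ = val;
   return dest;
}

/* Usage example on an ordinary array standing in for the screen buffer. */
void memsetw_volatile_example(void)
{
   uint16_t buf[8] = {0};
   memsetw_volatile(buf, 0x0720, 4);   /* 0x0720: grey-on-black space */
   assert(buf[0] == 0x0720 && buf[3] == 0x0720);
   assert(buf[4] == 0);                /* cells past len stay untouched */
}
```

The trade-off is that volatile also blocks legitimate optimizations, so it is usually better kept to memory that really is device-like than applied to every memset and memcpy.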

Re: Strange error after moving to 64-bit

Posted: Sun Oct 18, 2009 7:08 am
by zity
Nick: I actually thought about getting the assembly output, but luckily I checked gcc first :)

Creature: I'm wondering about CS being 16 bit too, but it seems to work anyway. Maybe someone with a little more knowledge on 64-bit can answer this question? :)

Re: Strange error after moving to 64-bit

Posted: Sun Oct 18, 2009 12:51 pm
by NickJohnson
zity wrote: I went to wash the dishes, when I began to wonder whether my parameters for gcc could cause all this trouble. Guess what, THEY DID. I have always been using the -O3 optimization flag, but if I use -O0 (or anything below 3) instead, my code works just fine. So the optimization must have broken my code. Guess I'll stop using -O3 then :)

The thing is, optimizations don't break correctly written code. If your function works with -O0, it may still be technically incorrect, just not enough to break at that level. I would really look at the assembly and try to figure out what -O3 did to break it, and try to fix that in C. Just a suggestion though - you can still have a functioning kernel with -O0 of course, but it may break with other compilers and architectures.

Re: Strange error after moving to 64-bit

Posted: Mon Oct 19, 2009 9:56 am
by dosfan
Those messages look like a page fault occurring without a valid IDT, caused by the screwy compiler output.

BTW Bochs reports 16 bit CS for me also in long mode. Haven't looked into it yet.
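
For context on the "IDT entry extended attributes" messages: in long mode each IDT entry is a 16-byte gate descriptor whose last doubleword is reserved, so a fault taken with no IDT set up (or with memory that does not hold 16-byte entries) will hit whatever bytes happen to be there. The struct below is a sketch of that layout in C; the struct, field and function names are illustrative and do not come from the kernel discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout of one long-mode IDT gate descriptor (16 bytes). */
struct idt_entry64 {
   uint16_t offset_low;   /* handler address bits 0..15              */
   uint16_t selector;     /* code segment selector                   */
   uint8_t  ist;          /* interrupt stack table index, low 3 bits */
   uint8_t  type_attr;    /* gate type, DPL and present bit          */
   uint16_t offset_mid;   /* handler address bits 16..31             */
   uint32_t offset_high;  /* handler address bits 32..63             */
   uint32_t reserved;     /* reserved, written as zero here          */
};

/* All fields are naturally aligned, so no padding is expected. */
_Static_assert(sizeof(struct idt_entry64) == 16, "long-mode IDT entries are 16 bytes");

/* Hypothetical helper: build one gate for a handler at the given address. */
struct idt_entry64 idt_make_gate(uint64_t handler, uint16_t selector, uint8_t type_attr)
{
   struct idt_entry64 e;
   e.offset_low  = (uint16_t)(handler & 0xFFFF);
   e.selector    = selector;
   e.ist         = 0;                       /* no IST stack switch */
   e.type_attr   = type_attr;               /* e.g. 0x8E: present interrupt gate */
   e.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
   e.offset_high = (uint32_t)(handler >> 32);
   e.reserved    = 0;
   return e;
}

/* Usage example: split a 64-bit handler address across the three fields. */
void idt_example(void)
{
   struct idt_entry64 e = idt_make_gate(0x123456789ABCDEF0ULL, 0x08, 0x8E);
   assert(e.offset_low == 0xDEF0 && e.offset_mid == 0x9ABC);
   assert(e.offset_high == 0x12345678u && e.reserved == 0);
}
```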

Re: Strange error after moving to 64-bit

Posted: Mon Oct 19, 2009 10:57 am
by stlw
dosfan wrote: Those messages look like a page fault occurring without a valid IDT, caused by the screwy compiler output.

BTW Bochs reports 16 bit CS for me also in long mode. Haven't looked into it yet.

Bochs will always print the debug dump with 16-bit CS in long mode.
"16-bit CS" here just means that CS.D=0. 64-bit mode is indicated by CS.L=1, CS.D=0; the combination CS.L=1, CS.D=1 is illegal.

Stanislav
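
The CS.L / CS.D combinations described above can be written down as a small decoder. This is a hosted-C sketch, assuming long mode is active and that the input is a raw 8-byte code segment descriptor as stored in the GDT (L is bit 53, D is bit 54); the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

enum cs_mode { CS_16BIT, CS_32BIT, CS_64BIT, CS_RESERVED };

/* Hypothetical helper: classify a code segment descriptor by its L and D
   bits, assuming long mode is active. */
enum cs_mode cs_mode_from_descriptor(uint64_t desc)
{
   int l = (int)((desc >> 53) & 1);  /* L: 64-bit code segment     */
   int d = (int)((desc >> 54) & 1);  /* D: default operand size    */
   if (l) return d ? CS_RESERVED : CS_64BIT;
   return d ? CS_32BIT : CS_16BIT;   /* compatibility mode segments */
}

/* Usage example with commonly seen GDT values. */
void cs_mode_example(void)
{
   assert(cs_mode_from_descriptor(0x00209A0000000000ULL) == CS_64BIT);
   assert(cs_mode_from_descriptor(0x00CF9A000000FFFFULL) == CS_32BIT);
}
```

A debug dump that prints only the D bit will therefore show "16 bit" for a valid 64-bit code segment, which matches the output in the first post.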