I have finally got some time to play around with 64-bit OS development. I've decided to continue using GRUB as my bootloader, and I switched to 64-bit ELF using AOUT_KLUDGE. I've now managed to get the computer into long mode, and I can jump to my C kernel. It all seems to work very well, but I've encountered a strange error while trying to implement a simple text mode video driver. My clear screen function uses memsetw() to clear the screen buffer. In the following I've created two different implementations of this function, one that uses pointers (1) and one that uses the screen buffer as an array (2). In my 32-bit OS I use the one with pointers (1) and it works very well. In my new 64-bit test system, (1) causes a reboot in both qemu and bochs, but (2) works fine.
I've been playing around with my source for a while now, and I simply can't seem to figure out the problem. It doesn't seem to have anything to do with the long mode enabling code (mapping, GDT and so on), because even if I jump to the main function in my C kernel immediately after the start label in my assembly stub (when I'm still in protected mode) it does not work.
I think it has something to do with the way GRUB loads my kernel or the way it's linked, but I don't know what to try next. I've changed my assembly stub a little so that the stack is set up properly (or at least I think so), but it did not change anything. I changed my linker script too, without any difference.
Did you try looking at the assembly output for both functions? As far as I can tell, both should produce the same result, but they get there by very different mechanisms. The first one doesn't use offsets and fills from low to high addresses, and the second uses offsets and fills from high to low offsets. Both should have short assembly representations, so you could post them.
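For reference, the two shapes I'm describing would look roughly like this. It's only a sketch based on my reading of your post, not your actual code; the names memsetw_ptr and memsetw_idx are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant (1), hypothetical: walk a pointer from low to high addresses. */
uint16_t *memsetw_ptr(uint16_t *dest, uint16_t val, size_t count)
{
    uint16_t *p = dest;

    while (count-- > 0)
        *p++ = val;          /* store, then advance the pointer */
    return dest;
}

/* Variant (2), hypothetical: index the buffer as an array,
 * filling from the highest offset down to offset 0. */
uint16_t *memsetw_idx(uint16_t *dest, uint16_t val, size_t count)
{
    while (count > 0) {
        count--;
        dest[count] = val;   /* store at offset count */
    }
    return dest;
}
```

Both should leave the buffer in the same state, so if only one of them reboots the machine, the difference has to be in the code the compiler generated for it.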
I went to wash the dishes, when I began to think whether my parameters for gcc could cause all this trouble. Guess what, THEY DID. I have always been using the -O3 optimization flag, but if I use -O0 (or anything below 3) instead, my code works just fine. So the optimization must have broken my code. Guess I'll stop using -O3 then.
The only difference I can see at first glance is that pointers are usually 64 bits wide in 64-bit code and 32 bits wide in 32-bit code. But that shouldn't change anything in the functions themselves, unless you are doing some crazy operations on the pointers that assume they are 32-bit or something.
I'm also not sure (haven't tried long mode yet), but isn't your CS supposed to be 32-bit (unless you're in 16-bit protected mode or real mode or something)?
EDIT: Never mind, optimizations tended to break my memcpy and memset too, until I put volatile all over the place to stop the bastard from optimizing it away.
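What I mean by that is roughly the following. It's a minimal sketch, not my actual code; fill_cells and BLANK_CELL are names I made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Blank text-mode cell: a space with attribute 0x07 (light grey on black). */
#define BLANK_CELL ((uint16_t)((0x07u << 8) | ' '))

/* Hypothetical helper: fill count 16-bit cells through a volatile pointer.
 * Each store through a volatile lvalue has to be emitted by the compiler,
 * so the loop cannot be dropped as "dead" writes. */
void fill_cells(volatile uint16_t *buf, uint16_t cell, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = cell;
}
```

In a kernel you would call it on the VGA text buffer, something like fill_cells((volatile uint16_t *)0xB8000, BLANK_CELL, 80 * 25). It only stops the stores from being removed or merged; it won't cure a real bug somewhere else.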
Nick: I actually thought about getting the assembly output, but luckily I checked gcc first.
Creature: I'm wondering about CS being 16 bit too, but it seems to work anyway. Maybe someone with a little more knowledge on 64-bit can answer this question?
zity wrote: I went to wash the dishes, when I began to think whether my parameters for gcc could cause all this trouble. Guess what, THEY DID. I have always been using the -O3 optimization flag, but if I use -O0 (or anything below 3) instead, my code works just fine. So the optimization must have broken my code. Guess I'll stop using -O3 then.
The thing is, optimizations don't break correctly written code. If your function works with -O0, it may still be technically incorrect, just not enough to break at that level. I would really look at the assembly and try to figure out what -O3 did to break it, and try to fix that in C. Just a suggestion though - you can still have a functioning kernel with -O0 of course, but it may break with other compilers and architectures.
dosfan wrote: Those messages look like a page fault occurring without a valid IDT, caused by the screwy compiler output.
BTW, Bochs reports a 16-bit CS for me in long mode as well. Haven't looked into it yet.
Bochs will always show a 16-bit CS in its debug dump in long mode.
"16-bit CS" here just means that CS.D=0. 64-bit mode is indicated by CS.L=1, CS.D=0; the combination CS.L=1, CS.D=1 is illegal.
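As a small sketch of that rule, here is how the two bits could be read out of an 8-byte GDT code-segment descriptor, assuming long mode is active so that the L bit is meaningful. The enum and function names are invented for the example; the bit positions (L is bit 53, D is bit 54 of the descriptor) are the standard layout.

```c
#include <stdint.h>

/* Hypothetical classification of a code segment by its L and D bits. */
enum cs_mode { CS_16BIT, CS_32BIT, CS_64BIT, CS_RESERVED };

/* desc is the raw 8-byte descriptor as stored in the GDT. */
enum cs_mode cs_mode_from_descriptor(uint64_t desc)
{
    int l = (int)((desc >> 53) & 1);   /* L: 64-bit code segment */
    int d = (int)((desc >> 54) & 1);   /* D: default operand size */

    if (l && d)
        return CS_RESERVED;            /* L=1, D=1: not a legal combination */
    if (l)
        return CS_64BIT;               /* L=1, D=0: 64-bit mode */
    return d ? CS_32BIT : CS_16BIT;    /* L=0: D picks 32-bit or 16-bit */
}
```

A typical 64-bit code descriptor such as 0x00209A0000000000 has L=1 and D=0, which is why a dump that only looks at D labels it "16-bit".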