GPF, probably caused by incorrect GDT

Ready4Dis · Post by **Ready4Dis** » Thu Apr 17, 2008 11:40 am

Try setting up your GDT like so:

[bits 32]
align 8   ;Align on 8-byte boundary
gdt_begin: ;This is our NULL descriptor AND gdt_descriptor, why waste free space?
  dw gdt_end-gdt_begin-1  ;GDT limit (size in bytes, minus 1)
  dd  gdt_begin
  dw 0  ;Padding because GDT entry size is 2-bytes more than gdt descriptor
;0x08 - Code Selector
  dw 0xffff
  dw 0
  db 0
  db 10011010b
  db 11001111b
  db 0
;0x10 - Data Selector
  dw 0xffff
  dw 0
  db 0
  db 10010010b
  db 11001111b
  db 0
gdt_end:

Your GDT descriptor is then aligned on an 8-byte boundary. Just change lgdt [gdtr] to lgdt [gdt_begin] and the whole GDT setup should be good. If it still crashes after this, you can safely assume your GDT is not the cause.
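The access and flag bytes in the listing above are easier to check when each descriptor is built from named fields. A minimal sketch in NASM: GDT_ENTRY is a hypothetical helper macro (it does not appear in the thread), and it assumes constant, non-relocatable arguments.

```nasm
[bits 32]
; Hypothetical helper (not from the thread): emits one 8-byte segment descriptor.
; %1 = base (32-bit), %2 = limit (20-bit), %3 = access byte, %4 = flags nibble
%macro GDT_ENTRY 4
    dw (%2) & 0xFFFF                                   ; limit bits 0-15
    dw (%1) & 0xFFFF                                   ; base bits 0-15
    db ((%1) >> 16) & 0xFF                             ; base bits 16-23
    db (%3)                                            ; access byte
    db (((%4) & 0x0F) << 4) | (((%2) >> 16) & 0x0F)    ; flags + limit bits 16-19
    db ((%1) >> 24) & 0xFF                             ; base bits 24-31
%endmacro

; access 10011010b = present, ring 0, code segment, readable
; access 10010010b = present, ring 0, data segment, writable
; flags  1100b     = 4 KiB granularity, 32-bit segment
GDT_ENTRY 0, 0xFFFFF, 10011010b, 1100b   ; same 8 bytes as the 0x08 code entry
GDT_ENTRY 0, 0xFFFFF, 10010010b, 1100b   ; same 8 bytes as the 0x10 data entry
```

Both invocations expand to the same bytes as the hand-written entries: limit word 0xFFFF, base 0, and a sixth byte of 11001111b.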

I wouldn't touch your call init_gdt code; it's fine. You call it knowing that you are in a valid code segment, and you return in a valid code segment. You aren't doing a far call or far return, so it's not touching CS once you set it. If you really want to do away with the far jump inside of your init_gdt, you can do this:

Code: Select all

push dword 0x08 ;Code segment on stack for the far return to pop!
call init_gdt

Then in your init_gdt, you would have a retf (far return) instead of a normal return, and it would return to 0x08:ReturnAddr. It's a bit non-obvious though, so I would just stick with what you have; it will work perfectly fine.
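Put together, the far-return variant described above might look like the following sketch. The label after_gdt is a hypothetical name, and the GDT is the same layout as the listing earlier in this post, written compactly as two dq values so the fragment assembles on its own.

```nasm
[bits 32]
    push dword 0x08          ; CS value that retf will pop
    call init_gdt            ; near call: pushes only the return EIP on top of it
after_gdt:                   ; hypothetical label: execution resumes here with CS = 0x08
    jmp $                    ; placeholder for the rest of the boot code

init_gdt:
    lgdt [gdt_begin]
    mov ax, 0x10             ; data selector
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax
    retf                     ; pops EIP first, then CS (0x08): reloads CS without a far jump

align 8
gdt_begin:                   ; first slot doubles as the lgdt operand, as above
    dw gdt_end - gdt_begin - 1
    dd gdt_begin
    dw 0
    dq 0x00CF9A000000FFFF    ; 0x08: flat code descriptor
    dq 0x00CF92000000FFFF    ; 0x10: flat data descriptor
gdt_end:
```

The order matters: the selector has to be pushed before the call so that it sits underneath the return EIP when retf runs.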

Hamster1800 · Post by **Hamster1800** » Thu Apr 17, 2008 7:45 pm

Okay I have made the suggested changes, and obtain the same behavior, so as requested before, I will post the code.

boot.asm
gdt.asm

I have tested my screen printing functions (clrscr, movcur, kprintf, kscroll) and am relatively certain that they are correct (also, none of them are being called currently). In addition, I am relatively certain that the IDT is not causing the error, since placing an infinite loop with interrupts enabled between call init_gdt and call init_idt results in a triple fault (indicating that the GPF fired).

edfed · Post by **edfed** » Thu Apr 17, 2008 8:06 pm

null descriptor SHALL be set to 0
it is clearly stated in all documents from INTEL.
and it is not a waste, it is only 8 bytes.

Code: Select all

align 8
dw 0   ; 2 bytes of padding: gdtr is 6 bytes, so gdt below starts on an 8-byte boundary.
gdtr:
dw gdt.end-gdt-1
dd gdt

gdt:
.null: dq 0
.code:
; bla bla (code descriptor goes here)
.data:
; blablah (data descriptor goes here)
.end:

something is hurting my eyes on the first page of this thread:
the ret after resetting gdtr and the segment registers.

even if the system is loaded from grub, afterwards it will be a standalone system. so, no ret.
and how can you be sure the return address will return to the right place?

this is very dirty. a bad design to avoid: possible to do, but bad.

Zenith · Post by **Zenith** » Thu Apr 17, 2008 8:07 pm

Oh my <insert deity here>...

@Ready4Dis: The NULL selector is not free space! It is reserved, and must be set to zero!!!

@Hamster1800:
From the looks of this, you have not followed my advice (or Ready4Dis's advice, for that matter) at all. You're still calling init_gdt, and doing a near return from gdt_return. Why would you expect anything to change? Of course you'll still get a GPF!

Ready4Dis · Post by **Ready4Dis** » Thu Apr 17, 2008 8:20 pm

The only suggestion I can make is to try putting infinite loops at various points and checking which line is causing the crash. Does it crash while trying to call init_gdt, or while trying to return? Is it when you do the far jump to set CS? Just try to narrow it down and see where it's screwing up/crapping out. From what I can see, the code looks fine. Maybe try doing nothing in your init_gdt call and see if it still crashes when just calling it and returning. It's hard to say, as I use a completely different environment, but everything looks OK so far.

I am not sure what was being talked about earlier, as you are NOT doing a far call. It's a normal call, so only the EIP should be on the stack for the call (which you can verify with infinite loops and/or printing out ESP at various points to see how much it has moved). When you return, it doesn't care what code segment you called with, as it didn't store it; it will jump back using the code segment you set (0x08) and the EIP that was stored (I believe the Multiboot spec says it must be a flat memory model, so the EIP should still be valid). You can try checking what the pushed EIP is (the easiest way is to POP EAX at the start of the function, then loop forever and check the Bochs registers; EAX will hold the return EIP), and see if it matches the location you are trying to return to.
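The POP EAX check could be sketched like this. It is a temporary debugging stub, not a replacement for init_gdt, and it assumes interrupts are still disabled at this point.

```nasm
[bits 32]
init_gdt:
    pop eax          ; EAX = return address that `call init_gdt` pushed
    jmp $            ; spin here; read EAX from the Bochs register dump or debugger
```

If EAX matches the address of the instruction right after call init_gdt (from the linker map or a disassembly), then the near call and the stack are behaving as expected.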

Zenith · Post by **Zenith** » Thu Apr 17, 2008 8:34 pm

@Ready4Dis: We already determined earlier that the near return was causing the GPF, as replacing it with an infinite loop caused the GPFs to stop.

@Hamster1800: Before reading Ready4Dis's not so reliable suggestion, please try mine first. Please?

@Self: This constant multi-replying is getting annoying...

Hamster1800 · Post by **Hamster1800** » Thu Apr 17, 2008 8:45 pm

karekare0: I did try inlining my code into the boot sequence, and it gave me the GPF. Also, I am using call to call init_gdt. Other kernels do the same. I still do not understand your objection.

all: I placed infinite loops at each line in boot.asm looking for the faulty one, moving it up each time. Please tell me if I was doing something wrong, but I enabled interrupts to check for a triple fault. Each time, bochs told me that interrupt 13 had fired and not been handled causing a triple fault. Indeed, even if I made the code look as follows:

Code: Select all

multiboot_entry:
        sti
        jmp $

it triple faults.

edfed · Post by **edfed** » Thu Apr 17, 2008 8:48 pm

now, it is clear, it is the IDT that is not set correctly.

0x13 or 13 ?

Ready4Dis · Post by **Ready4Dis** » Thu Apr 17, 2008 8:49 pm

karekare0 wrote:Oh my <insert deity here>...

@Ready4Dis: The NULL selector is not free space! It is reserved, and must be set to zero!!!

That is not correct. It's called the NULL descriptor because it is used to catch uses of a NULL selector, not because it must remain NULL. Please read up on it before calling me a liar. I use this in my OS, which runs on many hardware configs without issue, and all the Intel documents agree. Please do some research before calling people out on things that you don't know. Just because some guy said one time that it's a NULL descriptor and to set everything to NULL doesn't make it right (just as me telling you that it's OK to use doesn't make it right either; do the research yourself and find the answer, but I can tell you, I did my own research and I'm right).

And please, did you read his code man? He's not doing a far call, why on earth would a near return cause a GPF? Even if he DID a far call, and a near return, it would just waste 4-bytes of his stack and cause no ill side effects, please learn how these things work before making assumptions. Stop calling my information unreliable, then spatting off crap and read up some more. Let me walk you through why my 'not so reliable suggestion' is more reliable than you think:

Directly from his code (did you even read it before responding?):

Code: Select all

init_gdt:
	lgdt [gdt_begin]

	mov ax,0x10
	mov ds,ax
	mov es,ax
	mov gs,ax
	mov fs,ax
	mov ss,ax
	
	jmp 0x08:gdt_return

gdt_return:
	ret

Code: Select all

call init_gdt

Where in the world do you see a far call there? It looks like a regular call, not a callf, nor a call 0x08:init_gdt. It's a plain Jane 32-bit function call: it pushes EIP on the stack, and EIP only. A normal return is all that's required. Even if, for some godforsaken reason, it decided to push a random value on the stack and treat a normal call as a far call, the segment selector is pushed FIRST and EIP second, so the near return would still return to the correct EIP and just leave the stack 4 bytes off! Now, if he did a far return after a near call, it would be popping an invalid CS off the stack, but the other way around will not cause his GPF. Sorry, keep digging, you didn't solve it. I'm not saying it's not a stack or call frame problem, but if he's doing a normal call with a normal return, then there is no issue.

Now, I would try looking at where GRUB leaves you, does it leave you in a flat memory model, or does it actually use an offset for something? Are you sure the stack is getting setup properly with valid memory (maybe try setting ESP to a known free memory address to make sure, like 0x5000 or something)? Is your code binary, or a relocatable format? Is it linked to the correct location? These are the things to be looking for, and that is what I was trying to narrow down to find out what exactly is going on. Please stop talking crap that you don't know. I was trying to be nice, but now you're just blatantly calling people liars with bad information of your own.

Ready4Dis · Post by **Ready4Dis** » Thu Apr 17, 2008 8:56 pm

Yes, make sure you don't enable interrupts until your IDT is completely set up and the interrupt handlers are in place! Also, make sure to disable the PIC interrupts until they are handled as well. If you do an STI before you set up your IDT, it will crash any time an interrupt fires. 0x13 is a SIMD floating-point exception. 13 (0xD) is a general protection fault (a page fault would be 14, 0xE). Neither should go off from any code you've shown us (unless something is trying to access a bad memory address somewhere, like a bad IDT pointer or jump, etc). Leave interrupts disabled, and see if it gets all the way past the GDT code.
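Masking the PIC separately from the CPU's interrupt flag can be sketched as follows, assuming the standard PC layout with the two 8259A controllers at ports 0x20/0x21 (master) and 0xA0/0xA1 (slave):

```nasm
[bits 32]
    cli                  ; IF = 0: the CPU ignores maskable interrupts
    mov al, 0xFF         ; one bit per IRQ line, 1 = masked
    out 0xA1, al         ; slave PIC data port: mask IRQ 8-15
    out 0x21, al         ; master PIC data port: mask IRQ 0-7
```

With every line masked, a later STI no longer lets hardware IRQs through; individual lines can be unmasked by clearing their bits once their handlers are installed.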

Hamster1800 · Post by **Hamster1800** » Thu Apr 17, 2008 9:00 pm

@edfed: Of course the IDT isn't set up correctly, I'm putting it in an infinite loop with interrupts enabled before I set it up. My question is why the interrupt is firing in the first place. Also, it is 13 (0xD).

@Ready4Dis: Let me see if I can get all of your questions.

Now, I would try looking at where GRUB leaves you, does it leave you in a flat memory model, or does it actually use an offset for something?

I checked the Multiboot specification. The offset is defined to be 0 and the limit 0xFFFFFFFF for all segment registers.

Are you sure the stack is getting setup properly with valid memory (maybe try setting ESP to a known free memory address to make sure, like 0x5000 or something)?

I am setting esp to a location directly after a sizeable resb. If that's not a known free memory address, please correct me.

Is your code binary, or a relocatable format? Is it linked to the correct location?

I am not sure exactly what a relocatable format is, but I am using an ELF binary (which hopefully is sufficient to answer your question).

I would be most embarrassed if it were a PIC interrupt, but I will check and try to determine if it is.

I am not certain if I was clear about the behavior based on your response. If I leave interrupts disabled, the code returns correctly and gets through initializing the GDT and IDT until the infinite loop after the call to init_idt. I am simply at a loss as to why the interrupt is being triggered in the first place.

Also, I haven't been able to find anything about disabling only PIC interrupts. Can these be treated separately or not?

neon · Post by **neon** » Thu Apr 17, 2008 9:02 pm

For the sake of debugging purposes: Take the call and ret out completely, and leave it that way. You will still get your #gpf, but at least it will take several possibilities out to ease debugging the problem.

With this, everything Ready4Dis posted is very true. I want to take the possibility of a stack problem out, though, which is why I suggest removing the call completely.

Also, why do you enable hardware interrupts as soon as you create your IDT? Of course, your IDT is valid, but no IRQ handlers are there. So, what happens when the PIT fires off an IRQ on the next tick? #GPF, because no valid handler exists for that IRQ.

Keep hardware interrupts disabled (don't use sti) until you have both a) set up your IRQ handlers and b) remapped the PIC and pointed the PIT's IRQ at its proper handler.

*edit:

oops--Ready4Dis beat me to it

Basically I am suggesting that removing your sti instruction in your multiboot_entry routine might work.
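For reference, remapping the PIC as mentioned above is usually done with the 8259A initialization sequence. This is a sketch only: remap_pic is a hypothetical routine name, and the vector bases 0x20 and 0x28 are a common convention rather than something taken from the posted code.

```nasm
[bits 32]
PIC1_CMD  equ 0x20
PIC1_DATA equ 0x21
PIC2_CMD  equ 0xA0
PIC2_DATA equ 0xA1

remap_pic:                   ; hypothetical name; call with interrupts disabled
    mov al, 0x11             ; ICW1: begin initialization, cascade mode, ICW4 follows
    out PIC1_CMD, al
    out PIC2_CMD, al
    mov al, 0x20             ; ICW2: master IRQ 0-7 -> vectors 0x20-0x27
    out PIC1_DATA, al
    mov al, 0x28             ; ICW2: slave IRQ 8-15 -> vectors 0x28-0x2F
    out PIC2_DATA, al
    mov al, 0x04             ; ICW3 (master): a slave is attached on IRQ2
    out PIC1_DATA, al
    mov al, 0x02             ; ICW3 (slave): cascade identity 2
    out PIC2_DATA, al
    mov al, 0x01             ; ICW4: 8086/88 mode
    out PIC1_DATA, al
    out PIC2_DATA, al
    mov al, 0xFF             ; leave every IRQ masked until handlers exist
    out PIC1_DATA, al
    out PIC2_DATA, al
    ret
```

With the BIOS default mapping, IRQ 0-7 arrive on vectors 0x08-0x0F, which overlap the CPU exception vectors; moving them to 0x20 and above removes that ambiguity. Real hardware may need a short I/O delay between the writes.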

Hamster1800 · Post by **Hamster1800** » Thu Apr 17, 2008 9:10 pm

Okay, I will try to proceed with initializing IRQs and remapping the PIC, but I would like something cleared up first.

When I get the interrupt, bochs prints an error message fetch_raw_descriptor: GDT: index (207)40 > limit (17)
I know what the error message means, but is bochs figuring this out from the interrupt, or is it internally finding this alongside the interrupt? In the former case, could an interrupt "trick" bochs into thinking that it is a GPF when it is actually something else?

edfed · Post by **edfed** » Thu Apr 17, 2008 9:13 pm

page fault means trying to access a page that doesn't exist.
did you enable paging?
did grub set up a set of page tables?
why do you use grub to load your first DIY OS?
the best way to understand PM at BOOT is to do it without any external intervention.

bochs is very dumb (and slow), it doesn't handle interrupts correctly, my keyboard IRQ @ 21h is not executed at all with bochs, but on a real boot (from 386 to PIII), it works very well.

could an interrupt "trick" bochs into thinking that it is a GPF when it is actually something else?

yes.

neon · Post by **neon** » Thu Apr 17, 2008 9:28 pm

Hamster1800 wrote:When I get the interrupt, bochs prints an error message fetch_raw_descriptor: GDT: index (207)40 > limit (17)
I know what the error message means, but is bochs figuring this out from the interrupt, or is it internally finding this alongside the interrupt? In the former case, could an interrupt "trick" bochs into thinking that it is a GPF when it is actually something else?

Neither. The message comes from the normal processor checks: when the interrupt fires, the CPU reads the IDT entry for that vector and then looks up the code selector stored in that entry in the GDT. There is no trickery involved; it's standard processor behavior. Please see this tutorial, section How Interrupts Work: Detail, for information on what the processor does when an interrupt is generated.
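To make the lookup concrete, here is the layout of one 32-bit interrupt gate, sketched in NASM. HANDLER_ADDR and idt_gate_example are hypothetical names and values used only for illustration.

```nasm
[bits 32]
HANDLER_ADDR equ 0x00101000      ; hypothetical handler address, illustration only

idt_gate_example:
    dw HANDLER_ADDR & 0xFFFF     ; handler offset, bits 0-15
    dw 0x08                      ; code segment selector: the CPU looks this up in the GDT
    db 0                         ; reserved, must be zero
    db 10001110b                 ; present, ring 0, 32-bit interrupt gate
    dw HANDLER_ADDR >> 16        ; handler offset, bits 16-31
```

Assuming Bochs prints the selector index times 8 plus 7, then the index, then the GDT limit, all in hex, the message decodes as 0x207 = 0x40 * 8 + 7 against a limit of 0x17 = 3 * 8 - 1. That would be consistent with an uninitialized gate whose selector field holds something near 0x200 being checked against a three-entry GDT.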

bochs is very dumb (and slow), it doesn't handle interrupts correctly, my keyboard IRQ @ 21h is not executed at all with bochs, but on a real boot (from 386 to PIII), it works very well.

I personally have never had many problems with Bochs at all. (Besides the known timing issue, anyway.) If your code simply fails in an emulator, consider it a compatibility problem with your code, and don't blame the tools.