OSDev.org

Posted: **Sat Mar 21, 2020 12:30 pm**

Hello.
I'm working on a 64-bit kernel for the first time and I'm stuck on interrupts.

My problem is:

When I load my IDT and execute sti, my kernel continuously fires interrupt 13 (General Protection Fault).

I temporarily disabled the PIC and that went away. (I was thinking the problem might be that I haven't remapped the PIC yet.)

However, when I try to raise a software interrupt (using int $0x1, etc.) I again get the general protection fault. I read here that the GPF pushes RIP (the instruction pointer) to help figure out the faulty instruction. So I took this value and compared it with the objdump of my kernel. It points to some instruction in my isr_common_stub.

Here's how I set up the IDT:

%macro Fill_IDT_Entry 2
mov qword rax, %1Handler%2
mov word [rbx], ax
mov word [rbx+2], 0x08
mov word [rbx+4], 0b1000111100000000
shr qword rax, 16
mov word [rbx+6], ax
shr qword rax, 16
mov dword [rbx+8], eax
mov dword [rbx+12], 0
add rbx, 16
%endmacro

setup_interrupts:
    ; disable PIC
    mov al, 0xff
    out 0xa1, al
    out 0x21, al

    mov qword rbx, idt

    ; ISR
    Fill_IDT_Entry isr, 0
    Fill_IDT_Entry isr, 1
    Fill_IDT_Entry isr, 2
    Fill_IDT_Entry isr, 3
    Fill_IDT_Entry isr, 4
    Fill_IDT_Entry isr, 5
    Fill_IDT_Entry isr, 6
    Fill_IDT_Entry isr, 7
    Fill_IDT_Entry isr, 8
    Fill_IDT_Entry isr, 9
    Fill_IDT_Entry isr, 10
    Fill_IDT_Entry isr, 11
    Fill_IDT_Entry isr, 12
    Fill_IDT_Entry isr, 13
    Fill_IDT_Entry isr, 14
    Fill_IDT_Entry isr, 15
    ...

    lidt [idt.idt_pointer]

    ret

Here's my interrupt handling code:

%macro ISR_Handler 1
isrHandler%1:
    cli
    mov rdi, %1
    mov rsi, 0
    pop rdx
    jmp isr_common_stub
%endmacro

%macro ISR_Handler_ERR 1
isrHandler%1:
    cli
    mov rdi, %1
    pop rsi
    pop rdx
    jmp isr_common_stub
%endmacro

%macro pushallregisters 0
push rax
push rcx
push rdx
push rdi
push rsi
push r8
push r9
push r10
push r11
%endmacro

%macro popallregisters 0
pop r11
pop r10
pop r9
pop r8
pop rsi
pop rdi
pop rdx
pop rcx
pop rax
%endmacro

isr_common_stub:
    pushallregisters
    call kernel_isr_handler
    popallregisters
    sti
    iretq

ISR_Handler 0
ISR_Handler 1
ISR_Handler 2
ISR_Handler 3
ISR_Handler 4
ISR_Handler 5
ISR_Handler 6
ISR_Handler 7
ISR_Handler_ERR 8
ISR_Handler 9
ISR_Handler_ERR 10
ISR_Handler_ERR 11
ISR_Handler_ERR 12
ISR_Handler_ERR 13
ISR_Handler_ERR 14
ISR_Handler 15
...

Thank you.

Posted: **Sat Mar 21, 2020 12:53 pm**

I don't know exactly how to fix it, but the interrupt #13 (GPF) cannot be simply iretq-ed (because 64-bit, if it were 32-bit it would have been iretd). Once you got a double fault, you gotta do your best to alert the user and shut down.

Using a debugger or a disassembler (to figure out where your code crashes) could be pretty good.

On 32-bit (not 64-bit, but perhaps you can port it), I can find out the EIP where the program crashed (RIP in your case) by simply popping a 32-bit (64-bit in your case) element from the stack. You could also push the other registers to help you.

Posted: **Sat Mar 21, 2020 2:01 pm**

There are multiple things wrong with your code. Let's begin:

Your ISR handlers all start with CLI. This is not necessary if you install them as interrupt gates. And personally I never understood why trap gates were a thing. I have only ever used interrupt gates. In a preemptible kernel you may want to use STI to enable interrupts if conditions permit it, but in the first-level interrupt handler there is usually no point, and lots of things that can go wrong. Also STI before IRET is foolish, since IRET pops the flags off the stack, so there is only a minuscule amount of time before the interrupt flag is overwritten by whatever is on stack.
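The point about STI before IRET can be illustrated with a small model. This is only a sketch: `if_after_iretq` is a hypothetical helper, not part of anyone's kernel in this thread. It models the fact that IRETQ reloads RFLAGS from the stack image, so the interrupt flag after the return depends on the saved value and not on what STI or CLI did just before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 9 of RFLAGS is the interrupt-enable flag (IF). */
#define RFLAGS_IF (UINT64_C(1) << 9)

/* Hypothetical model of IRETQ's effect on IF: the saved RFLAGS image
 * popped from the stack replaces whatever the handler set beforehand. */
bool if_after_iretq(uint64_t saved_rflags, bool if_before_iretq)
{
    (void)if_before_iretq;  /* the pre-IRETQ state is simply overwritten */
    return (saved_rflags & RFLAGS_IF) != 0;
}
```

If the interrupted code was running with interrupts enabled, its saved RFLAGS has IF set and interrupts come back on by themselves after the return, which is why an explicit STI in the stub buys nothing.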

All of your interrupt handlers clobber registers. Before jumping to isr_common_stub, they overwrite RDI and RSI. Not good. You must ensure all registers return to the state they were in before the interrupt. Else you'll get bugs like you wouldn't believe.

Not remapping the PIC is going to be a problem. If it is initialized to PC standard, you will get 18 double faults per second as soon as you enable IRQ0. The aftermath of IBM misunderstanding Intel's spec back in 1980.
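For reference, a minimal sketch of the usual 8259 remap sequence in C. The `outb` here is a stand-in that records writes into a log so the sequence can be checked outside a kernel; in a real kernel it would wrap the OUT instruction, and many implementations also add a short I/O delay between writes. The names `pic_remap`, `port_log` and the choice of bases 0x20/0x28 are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC1_CMD  0x20
#define PIC1_DATA 0x21
#define PIC2_CMD  0xA0
#define PIC2_DATA 0xA1

/* Stand-in for the OUT instruction (hypothetical): logs each port write. */
struct port_write { uint16_t port; uint8_t value; };
struct port_write port_log[16];
size_t port_log_len = 0;

void outb(uint16_t port, uint8_t value)
{
    if (port_log_len < 16)
        port_log[port_log_len++] = (struct port_write){ port, value };
}

/* Move IRQ0-7 to master_base and IRQ8-15 to slave_base, then mask all. */
void pic_remap(uint8_t master_base, uint8_t slave_base)
{
    outb(PIC1_CMD, 0x11);         /* ICW1: start init, expect ICW4 */
    outb(PIC2_CMD, 0x11);
    outb(PIC1_DATA, master_base); /* ICW2: vector offset of the master */
    outb(PIC2_DATA, slave_base);  /* ICW2: vector offset of the slave */
    outb(PIC1_DATA, 0x04);        /* ICW3: slave attached to IRQ2 */
    outb(PIC2_DATA, 0x02);        /* ICW3: slave cascade identity */
    outb(PIC1_DATA, 0x01);        /* ICW4: 8086 mode */
    outb(PIC2_DATA, 0x01);
    outb(PIC1_DATA, 0xFF);        /* keep every IRQ masked for now */
    outb(PIC2_DATA, 0xFF);
}
```

Calling `pic_remap(0x20, 0x28)` would place the timer on vector 32 instead of vector 8, so it no longer collides with the double-fault vector.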

There is no need to disable the PIC while initializing the IDT. Before initializing the IDT, your interrupt flag should be zero, and that means the PIC can stand on its hands and waggle its feet and it won't impress the CPU. If the IF is not zero, then fix that post-haste, as you are not in a position to handle interrupts in that state. Also, you set the gate type to 15, which means "trap gate". Set it to 14 ("interrupt gate") to automatically clear the IF before even the first instruction is executed. As I said, I never understood why that feature (that being trap gates) even exists.
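The gate-type difference can be seen in the descriptor encoding. Below is a rough C sketch of a 64-bit IDT entry that mirrors the layout the Fill_IDT_Entry macro writes; `struct idt_entry` and `make_gate` are illustrative names, and the entry is built with the present bit set and DPL 0, as in the original macro.

```c
#include <assert.h>
#include <stdint.h>

#define GATE_INTERRUPT 0xE  /* CPU clears IF on entry */
#define GATE_TRAP      0xF  /* CPU leaves IF unchanged on entry */

/* 16-byte long-mode gate descriptor (field names are illustrative). */
struct idt_entry {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* interrupt stack table index, 0 = none */
    uint8_t  type_attr;    /* P | DPL | 0 | gate type */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
};
_Static_assert(sizeof(struct idt_entry) == 16, "IDT entry must be 16 bytes");

struct idt_entry make_gate(uint64_t handler, uint16_t selector, uint8_t type)
{
    struct idt_entry e;
    e.offset_low  = (uint16_t)(handler & 0xFFFF);
    e.selector    = selector;
    e.ist         = 0;
    e.type_attr   = (uint8_t)(0x80 | (type & 0x0F)); /* present, DPL 0 */
    e.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    e.offset_high = (uint32_t)(handler >> 32);
    e.reserved    = 0;
    return e;
}
```

With `GATE_TRAP` the attribute byte comes out as 0x8F, which is the high byte of the 0b1000111100000000 word in the macro; with `GATE_INTERRUPT` it becomes 0x8E.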

What else could it be? Is your TSS set up correctly? And the task register loaded correctly?

Posted: **Sat Mar 21, 2020 7:42 pm**

iProgramInCpp wrote: I don't know exactly how to fix it, but the interrupt #13 (GPF) cannot be simply iretq-ed (because 64-bit, if it were 32-bit it would have been iretd). Once you got a double fault, you gotta do your best to alert the user and shut down.

Great idea. Thank you.

Posted: **Sat Mar 21, 2020 7:53 pm**

nullplan wrote: There are multiple things wrong with your code. Let's begin:

Yep, now that you've pointed it out, I see how foolish STI before IRETQ is. I didn't think it through. I will look into interrupt gates.

And as for modifying registers, I was thinking that since I'm not using them before that point, modifying them is not a problem, BUT I FORGOT that interrupts can fire at any time and I don't know what the register values will be. This really is a major problem and I'm grateful you pointed it out.

Right. I will def look into PIC as soon as I get software interrupts working. (I hope it's possible to ignore PIC till then)

As for TSS and task register, honestly I don't know what they even mean. I'm gonna google now

Lastly, thank you soooo much for taking your time to write this lengthy reply.

Posted: **Sat Mar 21, 2020 11:03 pm**

So I fixed the register problems that nullplan pointed out, and now the GPF is gone.
Software interrupts are working as they should.

Though when I trigger a page fault on purpose, it first fires the page fault and then also fires a general protection fault. Is this normal, or should I look into it?

Posted: **Sun Mar 22, 2020 2:07 am**

It sounds as if you have a problem with your page-fault handler. Don't forget that it pushes an error code onto the stack. You should be able to handle a page fault without further exceptions being triggered.
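As a quick reference for which handlers need the error-code variant of the stub, here is a small sketch. The function name is made up for illustration, and the table only covers vectors 0 to 21 as I understand them from the Intel manual; anything beyond that should be checked against the manual for the CPU in question.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true for exception vectors (0..21) where the CPU pushes an
 * error code after the return frame. Illustrative, not exhaustive. */
bool vector_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8:   /* #DF double fault */
    case 10:  /* #TS invalid TSS */
    case 11:  /* #NP segment not present */
    case 12:  /* #SS stack-segment fault */
    case 13:  /* #GP general protection */
    case 14:  /* #PF page fault */
    case 17:  /* #AC alignment check */
    case 21:  /* #CP control protection */
        return true;
    default:
        return false;
    }
}
```

A software `int n` never pushes an error code, even for these vectors, so a handler that unconditionally pops one will misread the frame when the vector is raised that way.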

Posted: **Sun Mar 22, 2020 3:33 am**

iansjack wrote: It sounds as if you have a problem with your page-fault handler. Don't forget that it pushes an error code onto the stack. You should be able to handle a page fault without further exceptions being triggered.

Yep, I only get the general protection fault when raising an interrupt that pushes an error code.

I fixed it. The problem was that I was adding 4 to rsp to clean up the error code when it should have been 8.
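To see why the adjustment is 8 and not 4: in 64-bit mode every slot the CPU pushes on an exception is 8 bytes wide, including the error code. A sketch of the layout follows, with illustrative struct and function names; it assumes the stack pointer is taken at handler entry, before the stub pushes anything itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack contents at handler entry for an exception with an error code,
 * lowest address first (names are illustrative). */
struct interrupt_frame_with_error {
    uint64_t error_code;  /* 8 bytes in long mode, upper half is zero */
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Model of "add rsp, 8": step over the error code so that the stack
 * pointer addresses the saved RIP, which is what IRETQ expects. */
const uint64_t *skip_error_code(const uint64_t *rsp_at_entry)
{
    return rsp_at_entry + 1;  /* one 8-byte slot */
}
```

Adding only 4 leaves the stack pointer in the middle of the error-code slot, so IRETQ would pop a misaligned mix of values as RIP, CS and RFLAGS, and a general protection fault is a likely result.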

One more thing. When I raise a software interrupt (let's say divide by zero) I only get one interrupt, BUT when the hardware raises it (for example with something like int a = 5/0;) I get the same interrupt continuously.

Is this normal? How can I get it to stop?

Posted: **Sun Mar 22, 2020 4:31 am**

The divide-by-zero exception (along with several others) is classified as a "fault". This means that instead of pushing the address of the next instruction to the stack, as a software interrupt and most hardware interrupts do, it pushes the address of the faulting instruction. Thus on return the processor executes the same instruction again; if that still causes the fault, it will happen time after time. When you raise the divide-by-zero vector as a software interrupt (int 0), it acts as you would expect and pushes the address of the instruction following the int (there is no faulting instruction in that case).

The point is that faults are errors that can be handled. For example, a page fault may mean that the corresponding memory page resides in a swap file and needs to be reloaded. In that case the exception handler will correct the error, by loading the page from the disk, then return to execute the memory-access instruction again. I guess you can see that there are various ways that a divide-by-zero error could be handled - for example, put a value in the dividend register that represents "not-a-number", put a 1 in the divisor register and then retry the divide. That would return "not-a-number" to the calling routine. Or you might want to jump somewhere else entirely by adjusting the stored return address.

With some other exceptions, for example an overflow exception, you are more likely to want to make some correction and then continue without retrying the instruction that caused the exception. These are classified as "traps" and they push the address of the instruction following the one that caused the exception. (There are also "abort" exceptions, which pretty much mean "bang!" - the system cannot be sensibly recovered, so shut it down. These are generally hardware errors.)

The Intel Software Developer's Manual explains all this, and tells you which exceptions are in which category. It's well worth reading the chapter on Interrupt and Exception Handling.
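The classification can be summarised in a small lookup. This is a sketch under my reading of the manual for the commonly used vectors; the enum and function names are invented for illustration, and vectors not listed (including #DB, which can be a fault or a trap depending on the cause) fall into a catch-all.

```c
#include <assert.h>
#include <stdint.h>

enum exception_class { EXC_FAULT, EXC_TRAP, EXC_ABORT, EXC_OTHER };

/* Classify common exception vectors (illustrative, not exhaustive). */
enum exception_class classify_exception(unsigned vector)
{
    switch (vector) {
    case 3: case 4:                   /* #BP, #OF */
        return EXC_TRAP;
    case 8: case 18:                  /* #DF, #MC */
        return EXC_ABORT;
    case 0: case 5: case 6: case 7:   /* #DE, #BR, #UD, #NM */
    case 10: case 11: case 12:        /* #TS, #NP, #SS */
    case 13: case 14:                 /* #GP, #PF */
    case 16: case 17: case 19:        /* #MF, #AC, #XM */
        return EXC_FAULT;
    default:
        return EXC_OTHER;             /* #DB, NMI, reserved, not covered */
    }
}

/* RIP the CPU saves for an instruction at insn_addr of insn_len bytes:
 * faults point at the instruction itself, traps at the one after it.
 * Only meaningful for EXC_FAULT and EXC_TRAP. */
uint64_t saved_rip(enum exception_class cls, uint64_t insn_addr,
                   unsigned insn_len)
{
    return cls == EXC_TRAP ? insn_addr + insn_len : insn_addr;
}
```

For a divide instruction at 0x1000 the saved RIP is 0x1000 again, which is why returning without changing anything re-executes the division and faults forever.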

Posted: **Sun Mar 22, 2020 4:53 am**

iansjack wrote: The divide-by-zero exception (along with several others) is classified as a "fault". This means that instead of pushing the address of the next instruction to the stack, as a software interrupt and most hardware interrupts do, it pushes the address of the faulting instruction. Thus on return the processor executes the same instruction again; if that still causes the fault, it will happen time after time. When you raise the divide-by-zero vector as a software interrupt (int 0), it acts as you would expect and pushes the address of the instruction following the int (there is no faulting instruction in that case).

Thanks so much. Nicely explained. I will look into the Intel Software Developer's Manual.

Problem with interrupts
