Double Fault triggered after enabling hardware Interrupts when executing sti

dragon7307 · Post by **dragon7307** » Thu Jan 23, 2025 8:38 am

Hey,
I have undertaken the journey of developing an operating system (or rather a kernel) and am at the point where I have set up interrupts and enabled the PIC. However, when I execute sti to enable hardware-triggered interrupts, I receive a double fault. Actually, it depends: if I declare the handler on vector 8 in such a way that it pushes the error code itself (which I am aware it should not), then I get just a double fault. If I don't, and instead expect the CPU to push the error code, then I first get a double fault, followed by self-repeating page faults. I think those happen because the error code hasn't been pushed: the stub cleans up a value that should have been on the stack but wasn't, so the stack becomes unbalanced.
I should mention that my interrupt routine in general works fine, at least for software-triggered interrupts (executed with the int n instruction), but as soon as I execute sti my program breaks at one point or another with a double fault (or at least the exception is received on interrupt vector 8), but once again no error code is pushed.

I don't know if this might be an IRQ, but as far as I know 8 is not a standard IRQ vector.
I wrote above that I initialized the PIC, but whether or not the PIC is initialized beforehand does not affect the observed behaviour. As such, the issue doesn't seem PIC-related.

Has anyone a clue what this might be about?
Thanks already.

PS:

Here's the Github repo: https://github.com/draconware-dev/DraconOS/tree/dev
The dev branch contains the problematic code.

Uncomment sti in 'src/arch/x86/shared/kernel_start.asm' to see the issue in action.

Here's an example log from a test run in which I had registered interrupt vector 8 as a routine that expects an error code.

Code: Select all


// SMM: enter / SMM: after RSM a couple of times (these happen regardless of sti)

     0: v=08 e=0000 i=0 cpl=0 IP=0008:c01012c0 pc=c01012c0 SP=0010:c0007afc env->regs[R_EAX]=00000000
EAX=00000000 EBX=00023000 ECX=000000a1 EDX=00000000
ESI=0000be00 EDI=00102e00 EBP=c0007b00 ESP=c0007afc
EIP=c01012c0 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     c000069c 00000017
IDT=     c0100030 000007ff
CR0=80000011 CR2=00000000 CR3=00023000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000010 CCD=c0007af0 CCO=ADDL
EFER=0000000000000000
check_exception old: 0xffffffff new 0xe
     1: v=0e e=0000 i=0 cpl=0 IP=0008:c0100028 pc=c0100028 SP=0010:c0007af4 CR2=00000280
EAX=00000001 EBX=00023000 ECX=000003d5 EDX=000003d4
ESI=0000be00 EDI=00102e00 EBP=c0007b00 ESP=c0007af4
EIP=c0100028 EFL=00000292 [--S-A--] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     c000069c 00000017
IDT=     c0100030 000007ff
CR0=80000011 CR2=00000280 CR3=00023000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000008 CCD=c0007af4 CCO=ADDL
EFER=0000000000000000
check_exception old: 0xffffffff new 0xe

[many further and equivalent repetitions of Page fault]

If I push the error code myself, the same applies, except that the log ends after the double fault.

So effectively I'm trying to figure out why that first interrupt without error code on vector 8 occurs after executing sti.
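For reference, a small sketch of which exception vectors arrive with a CPU-pushed error code on classic 32-bit x86. It may help in telling a genuine double fault apart from something else landing on vector 8: neither an int n instruction nor a hardware IRQ delivered to one of these vectors pushes an error code, so a stub that unconditionally pops one will unbalance the stack. The function name is hypothetical, and the list covers only the long-standing vectors; newer CPUs define a few more.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the CPU itself pushes an error code when it raises this
 * exception. Only the classic vectors are listed (assumption: a plain
 * 32-bit protected-mode target without newer extensions). */
bool exception_pushes_error_code(uint8_t vector)
{
    switch (vector) {
    case 8:  /* #DF double fault (the error code is always 0) */
    case 10: /* #TS invalid TSS */
    case 11: /* #NP segment not present */
    case 12: /* #SS stack-segment fault */
    case 13: /* #GP general protection fault */
    case 14: /* #PF page fault */
    case 17: /* #AC alignment check */
        return true;
    default:
        return false;
    }
}
```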

Thanks a lot and appreciate any help

nullplan · Post by **nullplan** » Thu Jan 23, 2025 10:31 am

I would mask out all interrupts (except IRQ 2) when initializing the PIC pair. If the interrupt now doesn't happen, you know it is because of an external interrupt. Then the next question to look at is if the IDTR or the IDT itself are correct.
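A minimal sketch of the mask values this suggestion implies, assuming the usual wiring in which the slave PIC is cascaded through line 2 of the master. In the interrupt mask register a set bit means the line is masked. The helper names are hypothetical.

```c
#include <stdint.h>

#define PIC_CASCADE_IRQ 2 /* master line the slave PIC is wired to */

/* Build an IMR value in which every line is masked (bit = 1) except the
 * lines set in `enabled`, where bit n of `enabled` stands for line n. */
uint8_t pic_imr_from_enabled(uint8_t enabled)
{
    return (uint8_t)~enabled;
}

/* Master mask with only the cascade line left open: 0xFB. */
uint8_t pic_master_imr_cascade_only(void)
{
    return pic_imr_from_enabled((uint8_t)(1u << PIC_CASCADE_IRQ));
}
```

With these values, the end of PIC initialization would write 0xFB to the master's data port and 0xFF to the slave's data port.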

dragon7307 · Post by **dragon7307** » Thu Jan 23, 2025 10:42 am

Thanks for the reply. However, I have already masked all the interrupts. As for the correctness of the IDT and/or IDTR: is it possible they're wrong if software interrupts (int n) work?

nullplan · Post by **nullplan** » Thu Jan 23, 2025 11:12 am

dragon7307 wrote: ↑Thu Jan 23, 2025 10:42 am I have already masked all the interrupts.

I should clarify that I mean the PIC's IMR, not the CPU's interrupt flag. And the code you have published on github enables all interrupts except IRQ 2.

dragon7307 wrote: ↑Thu Jan 23, 2025 10:42 am is it possible they're wrong if software interrupts (int n) work?

No, then it should be correct. Of course, it is still possible for the 8th entry to be wrong when the 7th worked correctly.
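One way to check a single entry is to dump it and test its fields. The sketch below assumes the 32-bit protected-mode gate layout and a GCC-compatible compiler (for the packed attribute); the struct and function names are hypothetical and not taken from the repository.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 8-byte entry of a 32-bit protected-mode IDT. */
struct idt_gate {
    uint16_t offset_low;  /* handler address, bits 0..15 */
    uint16_t selector;    /* code segment selector */
    uint8_t  zero;        /* reserved, should be 0 */
    uint8_t  type_attr;   /* P | DPL | S=0 | gate type */
    uint16_t offset_high; /* handler address, bits 16..31 */
} __attribute__((packed));

/* Reassemble the handler address from its two halves. */
uint32_t idt_gate_offset(const struct idt_gate *g)
{
    return (uint32_t)g->offset_low | ((uint32_t)g->offset_high << 16);
}

/* True if the entry is present, is a 32-bit interrupt (0xE) or trap (0xF)
 * gate, and points into the expected kernel code segment. */
bool idt_gate_looks_valid(const struct idt_gate *g, uint16_t kernel_cs)
{
    uint8_t type = g->type_attr & 0x0F;
    bool present = (g->type_attr & 0x80) != 0;
    bool system  = (g->type_attr & 0x10) == 0;
    bool is_gate = (type == 0x0E) || (type == 0x0F);
    return present && system && is_gate &&
           g->selector == kernel_cs && g->zero == 0;
}
```

Running such a check over entry 8 and one entry known to work, then comparing the two handler addresses against the symbol map, would show whether that entry differs from the others.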

dragon7307 · Post by **dragon7307** » Thu Jan 23, 2025 11:31 am

I tried masking the PIC's registers, with no effect. This is taken straight from the wiki:

Code: Select all

outb(PIC1_DATA, 0xff);
outb(PIC2_DATA, 0xff);

My interrupts are all stubbed using a macro that calls a shared handler in C, so either all of them should work or none.

EDIT: This topic fixed my issue: viewtopic.php?p=350763#p350763

Octocontrabass · Post by **Octocontrabass** » Thu Jan 23, 2025 9:55 pm

You deleted or hid your code before I could take a look, but it sounds like the actual problem is that you were enabling interrupts with a STI instruction before you initialized the interrupt controllers. This is a common bug when copying code: there are several examples that have a STI instruction right after the LIDT instruction.
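To illustrate why the order matters: before the PICs are remapped, the BIOS defaults deliver IRQ0 to IRQ7 on vectors 0x08 to 0x0F, so the timer interrupt (IRQ0) arrives on vector 8, the same vector as the double fault exception, and without an error code. A minimal sketch of the mapping follows; the function and macro names are hypothetical.

```c
#include <stdint.h>

#define BIOS_PIC1_OFFSET 0x08 /* IRQ0..7  -> vectors 0x08..0x0F */
#define BIOS_PIC2_OFFSET 0x70 /* IRQ8..15 -> vectors 0x70..0x77 */

/* Vector on which the given IRQ line (0..15) is delivered for a given
 * pair of PIC vector offsets. */
uint8_t irq_to_vector(uint8_t irq, uint8_t pic1_offset, uint8_t pic2_offset)
{
    return (irq < 8) ? (uint8_t)(pic1_offset + irq)
                     : (uint8_t)(pic2_offset + (irq - 8));
}
```

With the BIOS defaults, irq_to_vector(0, BIOS_PIC1_OFFSET, BIOS_PIC2_OFFSET) gives 8; after a remap to 32 and 40, the same line is delivered on vector 32.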

dragon7307 · Post by **dragon7307** » Fri Jan 24, 2025 9:02 am

Well, I execute sti after initializing my interrupt controller (PIC), yet even with the new approach of masking all interrupts from the beginning, I get a page fault as soon as I unmask IRQ0. So this seems to have worked only temporarily. My only guess is that something is off with my PIC initialization code, though I wouldn't know what, as I followed the tutorial on your wiki.
As a reference, here is my assembly, which is the first thing that gets called when control is transferred to the kernel.

Code: Select all

[bits 32]
section .boot

extern kernel_main
extern kernel_premain

global _start
_start:

lidt [IDT]

; remove identity mapping
mov ebx, cr3
mov dword [ebx], 0
mov cr3, ebx

call kernel_premain
; mov al, 11111110b
; out 0x21, al
sti
call kernel_main
jmp $

; include IDT and ISRs
%defstr IDT_INC_PATH arch/ARCH/BITS/shared/idt.inc
%include IDT_INC_PATH

kernel_premain does something akin to this:

Code: Select all

pic8259_remap(32, 40);

and my pic implementation looks as follows:

Code: Select all

#include <io.h>
#include <pic8259.h>

void pic8259_remap(uint8_t pic1_offset, uint8_t pic2_offset)
{
    uint8_t pic1_mask;
    uint8_t pic2_mask;

    pic1_mask = inb(PIC1_DATA);
    pic2_mask = inb(PIC2_DATA);

    outb(PIC1_CMD, ICW1_INIT | ICW1_ICW4);
    io_wait();
    outb(PIC2_CMD, ICW1_INIT | ICW1_ICW4);
    io_wait();

    outb(PIC1_DATA, pic1_offset);
    io_wait();
    outb(PIC2_DATA, pic2_offset);
    io_wait();

    outb(PIC1_DATA, 4);
    io_wait();
    outb(PIC2_DATA, 2);
    io_wait();

    outb(PIC1_DATA, ICW4_8086);
    io_wait();
    outb(PIC2_DATA, ICW4_8086);
    io_wait();

    outb(PIC1_DATA, pic1_mask);
    outb(PIC2_DATA, pic2_mask);
}

void pic8259_disable()
{
    outb(PIC1_DATA, 0xFF);
    outb(PIC2_DATA, 0xFF);
}

void pic8259_eoi(uint8_t num)
{
    if(num >= 8)
    {
        outb(PIC2_CMD, 0x20);
    }

    outb(PIC1_CMD, 0x20);
}

void pic8259_mask(uint16_t port, uint8_t mask)
{
    uint8_t value = inb(port) | mask;
    outb(port, value);
}

void pic8259_unmask(uint16_t port, uint8_t mask)
{
    uint8_t value = inb(port) & ~mask;
    outb(port, value);
}

I have got an ISR registered for interrupts 32 to 48.
I should also mention that the only IRQs I can't unmask without getting a page fault are IRQ0 and IRQ4.
So doing

Code: Select all

mov al, 00010001b
out 0x21, al

doesn't break, but unmasking any more would.
Unless my pic initialization is in some way wrong, I'm very confused as to what's happening here.
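For readers decoding the mask above: in the PIC's mask register a set bit means the line is masked, so 00010001b keeps IRQ0 and IRQ4 masked and leaves IRQ1 to IRQ3 and IRQ5 to IRQ7 open. A small sketch of that decoding, with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if line `line` (0..7) is masked in the given IMR value.
 * In the 8259 mask register, 1 = masked and 0 = enabled. */
bool imr_line_masked(uint8_t imr, uint8_t line)
{
    return ((imr >> line) & 1u) != 0;
}
```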

Thanks ahead of time

Octocontrabass · Post by **Octocontrabass** » Fri Jan 24, 2025 11:18 am

dragon7307 wrote: ↑Fri Jan 24, 2025 9:02 am My only guess can be that something might be off with my pic initialization code, though I wouldn't know what as I followed the tutorial on your wiki.

The PIC can't direct IRQ0 to interrupt vector 0xE, so it sounds like your PIC initialization is correct and the problem is somewhere else.

dragon7307 wrote: ↑Fri Jan 24, 2025 9:02 am As a reference, here is my assembly that's the first thing that gets called when control is transferred to the kernel.

I don't have enough context to be able to tell if this code is correct. I see you're removing the identity mapping, but is this code executing from identity-mapped memory? Why are kernel_premain and kernel_main separate functions?

dragon7307 wrote: ↑Fri Jan 24, 2025 9:02 am and my pic implementation looks as follows:

Why does pic8259_remap restore the previous mask register values instead of discarding them and masking all interrupts? Why does pic8259_disable mask the "IRQ2" interrupt cascade line?

dragon7307 wrote: ↑Fri Jan 24, 2025 9:02 am I should also mention that the only IRQs I can't unmask without getting a Page Fault are IRQ0 and IRQ4.

The PIC also can't direct IRQ4 to interrupt vector 0xE, so it still sounds like the problem is not your PIC initialization code.
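The reasoning can be sketched as a reverse lookup: each PIC only ever delivers on the eight vectors starting at its programmed offset, so with offsets 32 and 40 the reachable vectors are 32 to 47, and 0xE is not among them. The function name below is hypothetical.

```c
#include <stdint.h>

/* Returns the IRQ line (0..15) that would produce `vector` for the given
 * PIC offsets, or -1 if no PIC line can be delivered on that vector. */
int vector_to_irq(uint8_t vector, uint8_t pic1_offset, uint8_t pic2_offset)
{
    if (vector >= pic1_offset && vector < pic1_offset + 8)
        return vector - pic1_offset;
    if (vector >= pic2_offset && vector < pic2_offset + 8)
        return vector - pic2_offset + 8;
    return -1;
}
```

Here vector_to_irq(0x0E, 32, 40) yields -1, which is why a page fault seen right after unmasking a line more likely comes from whatever the handler path does than from the PIC delivering the wrong vector.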

dragon7307 · Post by **dragon7307** » Sat Jan 25, 2025 6:10 am

After realizing that my outb function sometimes read the stack incorrectly, and after turning off optimizations, the problem went away, so it seems to have been caused by one of those two.
You're right in that the premature masking only delayed the problem.
Is there a way to mark a topic on this forum as closed/completed?

Octocontrabass · Post by **Octocontrabass** » Sat Jan 25, 2025 1:27 pm

You can edit the title of the first post, but it's not really solved if turning off optimizations makes the problem go away.

MichaelPetch · Post by **MichaelPetch** » Sat Jan 25, 2025 4:24 pm

I know this was mentioned earlier in this thread, but can you make your Github repo visible? While you provided a link in your post, it isn't available for the rest of us to view. I concur with the view that turning off optimizations isn't a fix. It is masking a problem, one which may be manifesting itself in other ways.
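The outb implementation in question was never posted, so the following is not the poster's code. It is only a sketch of one common way to write port I/O helpers with GCC or Clang extended inline assembly, where the compiler places the operands in registers itself and nothing is read from hard-coded stack offsets. That property is what makes this style behave the same with and without optimizations.

```c
#include <stdint.h>

/* The guard only keeps the snippet compilable on non-x86 hosts. */
#if defined(__i386__) || defined(__x86_64__)

/* Write one byte to an I/O port. "a" pins value to AL; "Nd" allows an
 * 8-bit immediate port number or DX. */
static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port) : "memory");
}

/* Read one byte from an I/O port. */
static inline uint8_t inb(uint16_t port)
{
    uint8_t value;
    __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port) : "memory");
    return value;
}

#endif
```

The argument order matches the calls shown earlier in the thread, for example outb(PIC1_DATA, 0xFF).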

OSDev.org
