Very lost on botched interrupts causing triple fault

austanss
Member
Posts: 377
Joined: Sun Oct 11, 2020 9:46 pm
Location: United States

Post by austanss »

This is the repo I will be referring to in this post:
https://github.com/novavita/Novix
(branch: work/rizet+pontaoski/keyboard-input)

I am contributing to an open source OS project, with some of my friends helping out. (not solo here, I didn't write everything)

When I arrived, they had numerous boot bugs, and I spent the last week resolving them. Fixed the multiboot header, and fixed the triple faults.

Or so I thought.

My "fix" for the triple faults involved adding a `cli` instruction before the idle loop, which runs after kernel_main returns. When the time came to write the keyboard driver, it was pointed out to me that this disables interrupts entirely, and once I removed it, the triple faults came back. I have done a lot of investigation into this.

Please refer to the repo, I don't want too much text on the screen.

In *.../i686/Boot.S*, you can see a `cli` instruction and an `sti` instruction. If I remove both lines, the triple faults stop, but the interrupts do not work. If `sti` is present at all, the OS triple faults. This seems like nonsensical behavior, so I concluded that the interrupt code is botched. I didn't write the interrupt code. The person who did has no idea what it does or what the problem may be (I suspect copypasta). I could try to make sense of it, but I've spent hours and have gotten nowhere, so I turned to this forum for help.
Screenshot_20201012_191400.png (23.75 KiB)
Skylight: https://github.com/austanss/skylight

I make stupid mistakes and my vision is terrible. Not a good combination.

NOTE: Never respond to my posts with "it's too hard".
Octocontrabass
Member
Posts: 5601
Joined: Mon Mar 25, 2013 7:01 pm

Re: Very lost on botched interrupts causing triple fault

Post by Octocontrabass »

Add exception handlers, have the exception handlers dump the CPU state at the time the exception occurs, and then use that information to locate the fault.

(Or use an emulator that will dump the CPU state at the time of the exceptions for you, such as QEMU with "-d int".)
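For the first option, a minimal sketch of what "dump the CPU state" could look like. This is a hosted C++ example so it can be compiled and tried outside the kernel; the `ExceptionFrame` struct, its field names, and `describeException` are hypothetical and not taken from the Novix repository. In a real kernel the line would go to the terminal or a serial port, and the handler would then halt.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical subset of the state an exception stub would save.
// Field names are illustrative, not the project's actual layout.
struct ExceptionFrame {
    std::uint32_t vector;     // exception number pushed by the stub
    std::uint32_t errorCode;  // error code pushed by the CPU, or 0 if none
    std::uint32_t eip;        // instruction pointer at the time of the fault
    std::uint32_t cs;         // code segment selector
    std::uint32_t eflags;     // flags register
};

// Formats the frame as one line, loosely imitating QEMU's "-d int" output.
std::string describeException(const ExceptionFrame& f) {
    char buf[96];
    std::snprintf(buf, sizeof buf,
                  "v=%02x e=%04x IP=%04x:%08x EFLAGS=%08x",
                  static_cast<unsigned>(f.vector),
                  static_cast<unsigned>(f.errorCode),
                  static_cast<unsigned>(f.cs),
                  static_cast<unsigned>(f.eip),
                  static_cast<unsigned>(f.eflags));
    return std::string(buf);
}
```

The point of printing the vector, error code, and CS:EIP together is that those three values are usually enough to tell which instruction faulted and why.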

Post by austanss »

Octocontrabass wrote:Add exception handlers, have the exception handlers dump the CPU state at the time the exception occurs, and then use that information to locate the fault.

(Or use an emulator that will dump the CPU state at the time of the exceptions for you, such as QEMU with "-d int".)
Screenshot_20201012_192216.png

Post by Octocontrabass »

Great, you have the log. So, which exceptions occur shortly before the triple fault? What is the CPU state at the time those exceptions occur?

Post by austanss »

Octocontrabass wrote:Great, you have the log. So, which exceptions occur shortly before the triple fault? What is the CPU state at the time those exceptions occur?
Paste: https://pastebin.com/utfPSLVr

Like I said, I have no idea how to read this.

Post by Octocontrabass »

Code: Select all

     0: v=08 e=0000 i=0 cpl=0 IP=0008:080492a2
Interrupt 8 (v=08) is triggered by external hardware (i=0) while your code is running at address 0x080492a2.

Code: Select all

check_exception old: 0xffffffff new 0xd
     1: v=0d e=0042
The CPU raises an exception. In this case it's #GP (v=0d) and the error code (e=0042) indicates a fault with the IDT entry 8.

Since you haven't implemented exception handlers, the subsequent exceptions are not interesting.

Should external hardware be raising interrupt 8?
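To show how e=0042 turns into "IDT entry 8", here is a small sketch that splits a selector-style error code into its fields. The struct and function names are made up for illustration; the bit layout (bit 0 external, bit 1 IDT, bit 2 table indicator, bits 3 to 15 index) is the one the CPU uses for #GP and the other selector-related exceptions.

```cpp
#include <cstdint>

// Decoded form of the selector-style error code pushed with #GP, #TS, #NP, #SS.
// Names are illustrative only.
struct SelectorError {
    bool external;        // bit 0: event originated outside the program
    bool idt;             // bit 1: index refers to a gate in the IDT
    bool ldt;             // bit 2: only meaningful when idt is false (1 = LDT, 0 = GDT)
    std::uint16_t index;  // bits 3..15: index into the selected table
};

constexpr SelectorError decodeSelectorError(std::uint32_t e) {
    return SelectorError{(e & 1u) != 0,
                         (e & 2u) != 0,
                         (e & 4u) != 0,
                         static_cast<std::uint16_t>((e >> 3) & 0x1FFFu)};
}

// 0x42 = 0b0100'0010: IDT bit set, index 8.
static_assert(decodeSelectorError(0x42).idt, "e=0042 refers to the IDT");
static_assert(decodeSelectorError(0x42).index == 8, "e=0042 refers to entry 8");
static_assert(!decodeSelectorError(0x42).external, "e=0042 has EXT clear");
```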

Post by austanss »

Octocontrabass wrote:

Code: Select all

     0: v=08 e=0000 i=0 cpl=0 IP=0008:080492a2
Interrupt 8 (v=08) is triggered by external hardware (i=0) while your code is running at address 0x080492a2.

Code: Select all

check_exception old: 0xffffffff new 0xd
     1: v=0d e=0042
The CPU raises an exception. In this case it's #GP (v=0d) and the error code (e=0042) indicates a fault with the IDT entry 8.

Since you haven't implemented exception handlers, the subsequent exceptions are not interesting.

Should external hardware be raising interrupt 8?
From reading the IRQ table, IRQ 8 is the CMOS clock... as I'm virtualizing in QEMU via the qemu command rather than a fully configured machine... I don't see why it would fire. As well, our interrupt handler doesn't do anything with IRQ 8...

from GDT.cpp

Code: Select all

struct Registers
{
        uint32_t ds;
        uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
        uint32_t interruptNumber, errorCode;
        uint32_t eip, cs, eflags, useresp, ss;
};


void ISRHandlerImpl(Registers& registers)
{
        if (registers.interruptNumber == 1 || registers.interruptNumber == 33) {
                auto keycode = inb(0x60);

                if (keycode < 0) {
                        return;
                }

                Terminal::instance->write(keycode);
                Terminal::instance->write("\n");
        }
}

void ISRHandler(Registers registers)
{
        outb(0xA0, 0x20);
        ISRHandlerImpl(registers);
        outb(0x20, 0x20);
}


Post by Octocontrabass »

rizxt wrote:From reading the IRQ table, IRQ 8 is the CMOS clock...
IRQ 8 is not necessarily mapped to interrupt 8. The mapping between IRQs and interrupts is determined by the interrupt controller. Did you configure the interrupt controller?

You probably don't want any IRQs mapped to interrupt 8, since the CPU uses interrupt 8 for one of its exceptions.
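As a rough sketch of what configuring the legacy 8259 PICs involves, the following writes the usual initialization sequence so that IRQ 0-7 and IRQ 8-15 land on vectors of your choosing. It assumes the standard PC port numbers (0x20/0x21 for the master, 0xA0/0xA1 for the slave). `remapPic`, `recordRemap`, and `PortLog` are hypothetical names, and the port writer is passed in so the sequence can be inspected on a host; in the kernel you would pass something that calls your real `outb`. Masking every IRQ at the end is one possible choice, not a requirement.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Host-side log of (port, value) writes, standing in for real port I/O.
using PortLog = std::vector<std::pair<std::uint16_t, std::uint8_t>>;

// Reinitializes both 8259 PICs so IRQ 0-7 start at masterBase and
// IRQ 8-15 start at slaveBase. `out` is any callable taking (port, value).
// Old hardware may need a short delay between writes; that is omitted here.
template <typename Out>
void remapPic(Out&& out, std::uint8_t masterBase, std::uint8_t slaveBase) {
    out(0x20, 0x11);        // ICW1: start initialization, ICW4 will follow
    out(0xA0, 0x11);
    out(0x21, masterBase);  // ICW2: vector offset for the master
    out(0xA1, slaveBase);   // ICW2: vector offset for the slave
    out(0x21, 0x04);        // ICW3: slave is attached to master IRQ 2
    out(0xA1, 0x02);        // ICW3: slave's cascade identity is 2
    out(0x21, 0x01);        // ICW4: 8086 mode
    out(0xA1, 0x01);
    out(0x21, 0xFF);        // mask all IRQs until handlers are installed
    out(0xA1, 0xFF);
}

// Runs the sequence against a recording writer and returns what was written.
PortLog recordRemap(std::uint8_t masterBase, std::uint8_t slaveBase) {
    PortLog log;
    remapPic([&log](std::uint16_t port, std::uint8_t value) {
                 log.emplace_back(port, value);
             },
             masterBase, slaveBase);
    return log;
}
```

Calling `recordRemap(0x20, 0x28)` corresponds to the common choice of putting the IRQs at vectors 32-47, just above the 32 vectors reserved for CPU exceptions.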

Post by austanss »

Octocontrabass wrote:
rizxt wrote:From reading the IRQ table, IRQ 8 is the CMOS clock...
IRQ 8 is not necessarily mapped to interrupt 8. The mapping between IRQs and interrupts is determined by the interrupt controller. Did you configure the interrupt controller?

You probably don't want any IRQs mapped to interrupt 8, since the CPU uses interrupt 8 for one of its exceptions.
We have not yet configured the interrupt controller.

Either way, we can see that I'm getting a triple fault due to lack of exception handling. Would I be able to resolve the issue purely based on exception handling? If so, could you give me links to resources on that? Took me a while to figure out the IDT and ISR stuff...

Post by Octocontrabass »

rizxt wrote:I'm not quite sure if the interrupt controller was configured.
You need to be sure. You can't handle interrupts if you don't configure the interrupt controller.
rizxt wrote:Either way, we can see that I'm getting a triple fault due to lack of exception handling. Would I be able to resolve the issue purely based on exception handling?
Probably not. You have hardware raising interrupt 8, and you won't be able to tell if interrupt 8 was caused by an IRQ or by a #DF exception. It'll be more helpful once you get the interrupt controller configured to not overlap with exceptions.
rizxt wrote:If so, could you give me links to resources on that? Took me a while to figure out the IDT and ISR stuff...
The best resources are the Intel and AMD manuals. However, there's a pretty good overview on the wiki.

Post by austanss »

Octocontrabass wrote:
rizxt wrote:I'm not quite sure if the interrupt controller was configured.
You need to be sure. You can't handle interrupts if you don't configure the interrupt controller.
I edited my post, to clarify, I haven't configured the interrupt controller.

Post by Octocontrabass »

Then you need to configure the interrupt controller.

Post by austanss »

End of story. Thank you! This explains why clearing interrupts prevented the triple fault. Thank you for explaining it so clearly to me!
MichaelPetch
Member
Posts: 799
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Post by MichaelPetch »

I don't know anything about anything but the fact there appears to be a Linux user space address is probably the result of your link options. I looked in your code and I see nothing that does such a mapping which tells me 0x080492a2 is likely bogus and still a result of bad linking. It really is as if your linker script isn't being picked up.

I suspect your triple fault isn't just a matter of not configuring the PIC to remap the interrupts but also because your interrupt routine address is bogus. I happened to amend the build options in Meson (I got it working after I went and pulled out the latest Meson builds, Meson 0.45 is not enough but your docs don't specify a minimum version). Wonder what would happen if you add `-fno-pic` in your main meson.build along with `-static`, and in your arch i686 directory in the meson.build you get rid of the weird `-z` option. The `-z` option is causing the `-T` option to be ignored, which tells the linker what linker script to use. When I made these changes it didn't triple fault, but the output of `-d int` with QEMU does show the timer is coming in on Interrupt 8 and the keyboard on Interrupt 9, since you aren't remapping the master PIC yet. The end result is that the keyboard and timer interrupts are running the wrong service routine, but it doesn't fault with the changes I have made.
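The timer-on-8 and keyboard-on-9 observation follows from the PIC vector bases. A tiny sketch, assuming the BIOS-era defaults of 0x08 for the master and 0x70 for the slave (`vectorForIrq` is a made-up helper name):

```cpp
#include <cstdint>

// Maps a legacy IRQ number (0-15) to the interrupt vector the CPU receives,
// given the vector bases programmed into the master and slave PICs.
constexpr std::uint8_t vectorForIrq(std::uint8_t irq,
                                    std::uint8_t masterBase,
                                    std::uint8_t slaveBase) {
    return static_cast<std::uint8_t>(irq < 8 ? masterBase + irq
                                             : slaveBase + (irq - 8));
}

// With the assumed defaults, the timer (IRQ 0) collides with #DF (vector 8)
// and the keyboard (IRQ 1) arrives on vector 9.
static_assert(vectorForIrq(0, 0x08, 0x70) == 8, "timer lands on vector 8");
static_assert(vectorForIrq(1, 0x08, 0x70) == 9, "keyboard lands on vector 9");

// After remapping the master to 0x20 and the slave to 0x28, the keyboard
// arrives on vector 33, which is the number the posted handler checks for.
static_assert(vectorForIrq(1, 0x20, 0x28) == 33, "keyboard lands on vector 33");
static_assert(vectorForIrq(8, 0x20, 0x28) == 40, "RTC lands on vector 40");
```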
Last edited by MichaelPetch on Mon Oct 12, 2020 8:16 pm, edited 2 times in total.

Post by austanss »

MichaelPetch wrote:Wonder what would happen if you change `-static` to `-fno-pic` in your main meson.build, and in your arch i686 directory in the meson.build you get rid of the weird `-z` option.
Easy question. Replacing "-static" with "-fno-pic" breaks the multiboot header, and getting rid of "-z" would have no effect, as clang already ignores "-z" and "-T" anyway. :mrgreen: