Very lost on botched interrupts causing triple fault
This is the repo I will be referring to in this post:
https://github.com/novavita/Novix
(branch: work/rizet+pontaoski/keyboard-input)
I am contributing to an open source OS project, with some of my friends helping out. (not solo here, I didn't write everything)
When I arrived, they had numerous boot bugs, and I spent the last week resolving them. Fixed the multiboot header, and fixed the triple faults.
Or so I thought.
My "fix" for the triple faults involved adding a `cli` instruction before the idling loop, which comes after execution of kernel_main. When time came to write the keyboard driver, this was pointed out to me, and once I removed it, triple faults. I have done a lot of investigation into this.
Please refer to the repo, I don't want too much text on the screen.
In *.../i686/Boot.S*, you can see a `cli` instruction and an `sti` instruction. Upon removal of these lines, the triple faults stop, but the interrupts do not work. If `sti` is present at all, the OS triple faults. This seems like nonsensical behavior, so I concluded that the interrupts are botched. I didn't write the interrupt code. The person who did has no idea what it does or what the problem may be (I suspect copypasta). I could try to make sense of this, but I've spent hours and have gotten nowhere, so I turned to this forum for help.
Skylight: https://github.com/austanss/skylight
I make stupid mistakes and my vision is terrible. Not a good combination.
NOTE: Never respond to my posts with "it's too hard".
-
- Member
- Posts: 5568
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Very lost on botched interrupts causing triple fault
Add exception handlers, have the exception handlers dump the CPU state at the time the exception occurs, and then use that information to locate the fault.
(Or use an emulator that will dump the CPU state at the time of the exceptions for you, such as QEMU with "-d int".)
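A minimal sketch of the "dump the CPU state" idea, assuming the handler receives a structure like the `Registers` layout posted later in this thread. `FormatExceptionDump` is a hypothetical helper, not something from the Novix repo, and it uses `snprintf` from a hosted C++ library; a freestanding kernel would need its own formatting and would write the result to its terminal instead.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Assumed to match the push order of the ISR stub (layout taken from the
// Registers struct shown later in the thread).
struct Registers {
    uint32_t ds;
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t interruptNumber, errorCode;
    uint32_t eip, cs, eflags, useresp, ss;
};

// Hypothetical helper: renders the state saved at the time of the exception
// in a form loosely modelled on QEMU's "-d int" output.
std::string FormatExceptionDump(const Registers& r) {
    char buf[256];
    std::snprintf(buf, sizeof buf,
                  "EXCEPTION v=%02x e=%04x EIP=%04x:%08x EFLAGS=%08x\n"
                  "EAX=%08x EBX=%08x ECX=%08x EDX=%08x\n"
                  "ESI=%08x EDI=%08x EBP=%08x ESP=%08x",
                  static_cast<unsigned>(r.interruptNumber),
                  static_cast<unsigned>(r.errorCode),
                  static_cast<unsigned>(r.cs),
                  static_cast<unsigned>(r.eip),
                  static_cast<unsigned>(r.eflags),
                  static_cast<unsigned>(r.eax), static_cast<unsigned>(r.ebx),
                  static_cast<unsigned>(r.ecx), static_cast<unsigned>(r.edx),
                  static_cast<unsigned>(r.esi), static_cast<unsigned>(r.edi),
                  static_cast<unsigned>(r.ebp), static_cast<unsigned>(r.esp));
    return std::string(buf);
}
```

An exception handler could print this string and then halt, so the faulting EIP and error code stay on screen instead of being lost to a reset.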
Re: Very lost on botched interrupts causing triple fault
Octocontrabass wrote: Add exception handlers, have the exception handlers dump the CPU state at the time the exception occurs, and then use that information to locate the fault.
(Or use an emulator that will dump the CPU state at the time of the exceptions for you, such as QEMU with "-d int".)
-
- Member
- Posts: 5568
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Very lost on botched interrupts causing triple fault
Great, you have the log. So, which exceptions occur shortly before the triple fault? What is the CPU state at the time those exceptions occur?
Re: Very lost on botched interrupts causing triple fault
Octocontrabass wrote: Great, you have the log. So, which exceptions occur shortly before the triple fault? What is the CPU state at the time those exceptions occur?
Paste: https://pastebin.com/utfPSLVr
Like I said, I have no idea how to read this.
-
- Member
- Posts: 5568
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Very lost on botched interrupts causing triple fault
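Code: Select all
0: v=08 e=0000 i=0 cpl=0 IP=0008:080492a2
Interrupt 8 (v=08) is triggered by external hardware (i=0) while your code is running at address 0x080492a2.
Code: Select all
check_exception old: 0xffffffff new 0xd
1: v=0d e=0042
The CPU raises an exception. In this case it's #GP (v=0d) and the error code (e=0042) indicates a fault with the IDT entry 8.
Since you haven't implemented exception handlers, the subsequent exceptions are not interesting.
Should external hardware be raising interrupt 8?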
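To show how an error code such as `e=0042` points at IDT entry 8, here is a small sketch that splits a selector-style error code into its fields, following the layout in the Intel manuals (bit 0 external, bit 1 IDT, bit 2 table indicator, bits 3 to 15 index). The struct and function names are made up for illustration.

```cpp
#include <cstdint>

// Fields of the selector-style error code pushed by #GP, #TS, #NP and #SS.
struct SelectorErrorCode {
    bool external;   // bit 0: fault happened while delivering an external event
    bool idt;        // bit 1: the index refers to a gate in the IDT
    bool ldt;        // bit 2: if idt is false, the index refers to the LDT, not the GDT
    uint16_t index;  // bits 3..15: index of the descriptor or gate
};

// Hypothetical helper that decodes the raw error code value.
constexpr SelectorErrorCode DecodeSelectorErrorCode(uint32_t code) {
    return SelectorErrorCode{
        (code & 0x1) != 0,
        (code & 0x2) != 0,
        (code & 0x4) != 0,
        static_cast<uint16_t>((code >> 3) & 0x1FFF),
    };
}
```

For `0x0042` the IDT bit is set and the index is `0x42 >> 3 = 8`, which is where the "fault with the IDT entry 8" reading comes from.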
Re: Very lost on botched interrupts causing triple fault
Octocontrabass wrote:
Code: Select all
0: v=08 e=0000 i=0 cpl=0 IP=0008:080492a2
Interrupt 8 (v=08) is triggered by external hardware (i=0) while your code is running at address 0x080492a2.
Code: Select all
check_exception old: 0xffffffff new 0xd
1: v=0d e=0042
The CPU raises an exception. In this case it's #GP (v=0d) and the error code (e=0042) indicates a fault with the IDT entry 8.
Since you haven't implemented exception handlers, the subsequent exceptions are not interesting.
Should external hardware be raising interrupt 8?
From reading the IRQ table, IRQ 8 is the CMOS clock... as I'm virtualizing in QEMU via the qemu command rather than a full configured machine... I don't see why. As well, our interrupt handler doesn't do anything with IRQ 8...
from GDT.cpp
Code: Select all
struct Registers
{
    uint32_t ds;
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t interruptNumber, errorCode;
    uint32_t eip, cs, eflags, useresp, ss;
};
void ISRHandlerImpl(Registers& registers)
{
    if (registers.interruptNumber == 1 || registers.interruptNumber == 33) {
        auto keycode = inb(0x60);
        if (keycode < 0) {
            return;
        }
        Terminal::instance->write(keycode);
        Terminal::instance->write("\n");
    }
}
void ISRHandler(Registers registers)
{
    outb(0xA0, 0x20);
    ISRHandlerImpl(registers);
    outb(0x20, 0x20);
}
-
- Member
- Posts: 5568
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Very lost on botched interrupts causing triple fault
rizxt wrote: From reading the IRQ table, IRQ 8 is the CMOS clock...
IRQ 8 is not necessarily mapped to interrupt 8. The mapping between IRQs and interrupts is determined by the interrupt controller. Did you configure the interrupt controller?
You probably don't want any IRQs mapped to interrupt 8, since the CPU uses interrupt 8 for one of its exceptions.
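As a rough sketch of what configuring the interrupt controller involves on a PC with the legacy 8259 PICs: the usual initialization sequence (ICW1 to ICW4) lets you choose the vector offsets so IRQs no longer land on the exception vectors. `RemapPic` is a hypothetical name; the port writer is passed in as a parameter so the sequence can be inspected outside a kernel, on the assumption that a kernel would pass its own `outb(port, value)`. The final mask values (only IRQ 1, the keyboard, enabled) are just an example choice.

```cpp
#include <cstdint>

// Standard I/O ports of the two 8259 PICs on PC-compatible hardware.
constexpr uint16_t kPic1Command = 0x20, kPic1Data = 0x21;
constexpr uint16_t kPic2Command = 0xA0, kPic2Data = 0xA1;

// Hypothetical helper. `out` is any callable taking (port, value).
// Offsets should be multiples of 8 and at least 0x20 to stay clear of
// the 32 vectors reserved for CPU exceptions.
template <typename Out>
void RemapPic(Out&& out, uint8_t masterOffset, uint8_t slaveOffset) {
    out(kPic1Command, uint8_t{0x11});  // ICW1: start init, ICW4 will follow
    out(kPic2Command, uint8_t{0x11});
    out(kPic1Data, masterOffset);      // ICW2: vector offset for IRQ 0-7
    out(kPic2Data, slaveOffset);       // ICW2: vector offset for IRQ 8-15
    out(kPic1Data, uint8_t{0x04});     // ICW3: slave PIC is wired to IRQ 2
    out(kPic2Data, uint8_t{0x02});     // ICW3: slave's cascade identity
    out(kPic1Data, uint8_t{0x01});     // ICW4: 8086/88 mode
    out(kPic2Data, uint8_t{0x01});
    out(kPic1Data, uint8_t{0xFD});     // mask: example choice, only IRQ 1 enabled
    out(kPic2Data, uint8_t{0xFF});     // mask: all slave IRQs disabled
}
```

In the kernel this might be called as `RemapPic(outb, 0x20, 0x28)` before `sti`, which would put the keyboard on vector 0x21 (33). Older hardware may also want a short I/O delay between writes, which this sketch leaves out.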
Re: Very lost on botched interrupts causing triple fault
Octocontrabass wrote:
rizxt wrote: From reading the IRQ table, IRQ 8 is the CMOS clock...
IRQ 8 is not necessarily mapped to interrupt 8. The mapping between IRQs and interrupts is determined by the interrupt controller. Did you configure the interrupt controller?
You probably don't want any IRQs mapped to interrupt 8, since the CPU uses interrupt 8 for one of its exceptions.
We have not yet configured the interrupt controller.
Either way, we can see that I'm getting a triple fault due to lack of exception handling. Would I be able to resolve the issue purely based on exception handling? If so, could you give me links to resources on that? Took me a while to figure out the IDT and ISR stuff...
-
- Member
- Posts: 5568
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Very lost on botched interrupts causing triple fault
rizxt wrote: I'm not quite sure if the interrupt controller was configured.
You need to be sure. You can't handle interrupts if you don't configure the interrupt controller.
rizxt wrote: Either way, we can see that I'm getting a triple fault due to lack of exception handling. Would I be able to resolve the issue purely based on exception handling?
Probably not. You have hardware raising interrupt 8, and you won't be able to tell if interrupt 8 was caused by an IRQ or by a #DF exception. It'll be more helpful once you get the interrupt controller configured to not overlap with exceptions.
rizxt wrote: If so, could you give me links to resources on that? Took me a while to figure out the IDT and ISR stuff...
The best resources are the Intel and AMD manuals. However, there's a pretty good overview on the wiki.
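The overlap problem can be illustrated with a small sketch that classifies a vector number given the PIC offsets. The names are invented for this example. With the offsets a PC BIOS normally leaves behind (master at 0x08, slave at 0x70), vector 8 is both #DF and IRQ 0; with offsets such as 0x20 and 0x28 the ambiguity goes away.

```cpp
#include <cstdint>

enum class VectorSource { CpuException, MasterPicIrq, SlavePicIrq, Ambiguous, Other };

// True if `vector` falls in the 8-entry window starting at `base`.
inline bool InPicWindow(uint8_t vector, uint8_t base) {
    return vector >= base && vector - base < 8;
}

// Hypothetical helper: which source could have raised `vector`?
inline VectorSource ClassifyVector(uint8_t vector, uint8_t masterOffset, uint8_t slaveOffset) {
    const bool exception = vector < 32;  // vectors 0-31 are reserved for CPU exceptions
    const bool master = InPicWindow(vector, masterOffset);
    const bool slave = InPicWindow(vector, slaveOffset);
    if (exception && (master || slave)) return VectorSource::Ambiguous;
    if (exception) return VectorSource::CpuException;
    if (master) return VectorSource::MasterPicIrq;
    if (slave) return VectorSource::SlavePicIrq;
    return VectorSource::Other;
}
```

With this, `ClassifyVector(8, 0x08, 0x70)` reports an ambiguous vector, while `ClassifyVector(8, 0x20, 0x28)` is a plain CPU exception and `ClassifyVector(0x21, 0x20, 0x28)` is a master PIC IRQ (the keyboard).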
Re: Very lost on botched interrupts causing triple fault
Octocontrabass wrote:
rizxt wrote: I'm not quite sure if the interrupt controller was configured.
You need to be sure. You can't handle interrupts if you don't configure the interrupt controller.
I edited my post, to clarify, I haven't configured the interrupt controller.
-
- Member
- Posts: 5568
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Very lost on botched interrupts causing triple fault
Then you need to configure the interrupt controller.
Re: Very lost on botched interrupts causing triple fault
End of story. Thank you! This explains why clearing interrupts prevented the triple fault. Thank you for explaining it so clearly to me!
-
- Member
- Posts: 797
- Joined: Fri Aug 26, 2016 1:41 pm
- Libera.chat IRC: mpetch
Re: Very lost on botched interrupts causing triple fault
I don't know anything about anything but the fact there appears to be a Linux user space address is probably the result of your link options. I looked in your code and I see nothing that does such a mapping which tells me 0x080492a2 is likely bogus and still a result of bad linking. It really is as if your linker script isn't being picked up.
I suspect your triple fault isn't just a matter of not configuring the PIC to remap the interrupts but also because your interrupt routine address is bogus. I happened to amend the build options in Meson (I got it working after I went and pulled the latest Meson build; Meson 0.45 is not enough, but your docs don't specify a minimum version). I wonder what would happen if you add `-fno-pic` in your main meson.build along with `-static`, and in the meson.build in your arch i686 directory you get rid of the weird `-z` option. The `-z` option is causing the `-T` option, which tells the linker what linker script to use, to be ignored. When I made these changes it didn't triple fault, but the output of `-d int` with QEMU does show the timer coming in on interrupt 8 and the keyboard on interrupt 9, since you aren't remapping the master PIC yet. The end result is that the keyboard and timer interrupts run the wrong service routine, but nothing faults with the changes I have made.
Last edited by MichaelPetch on Mon Oct 12, 2020 8:16 pm, edited 2 times in total.
Re: Very lost on botched interrupts causing triple fault
MichaelPetch wrote: Wonder what would happen if you change `-static` to `-fno-pic` in your main meson.build, and in your arch i686 directory in the meson.build you get rid of the weird `-z` option.
Easy question. Replacing "-static" with "-fno-pic" breaks the multiboot header, and getting rid of "-z" would have no effect, as clang already ignores "-z" and "-T" anyway.