OSDev.org

Posted: **Sat Sep 21, 2013 5:29 pm**

I have gotten interrupts online, but there is one problem: whenever an interrupt is invoked, the whole OS triple faults. My interrupt system is designed so that each interrupt has its own assembly handler stub, which pushes some processor data and the interrupt number onto the stack and then calls a C handler that accesses this data by means of a struct. That C handler then looks up the handler for the specific interrupt in an array indexed by the interrupt number (for example, the C handler for interrupt 32 is the entry at index 32). Here is the code for my assembly interrupt handler stubs, along with one stub:

Code: Select all

.intel_syntax noprefix
.extern generalinterrupthandler


.macro saveRegisters
	pushad
	push gs
	push fs
	push es
	push ds
.endm


.macro restoreRegisters 
	pop ds
	pop es
	pop fs
	pop gs
	popad
.endm


.macro PushData interruptnum
	saveRegisters
	push \interruptnum
.endm



.macro CleanUp
	add esp, 4
	restoreRegisters
.endm


.global interrupt0interrupthandler
.type interrupt0interrupthandler, @function
interrupt0interrupthandler:
 	cli
	PushData 0
	call generalinterrupthandler
	CleanUp
	sti
	iret

I must be cleaning up too much of the stack or too little; that is probably the only way I could be causing a triple fault. But how? I popped everything other than the interrupt number, which I cleaned up by adding four bytes to esp; nothing more, nothing less. Can someone help me?
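For readers following the design described in this post, here is a minimal C sketch of a frame struct and a table-based dispatcher that would match the push order of the stub above. The names `interrupt_frame`, `register_interrupt_handler`, `dispatch_interrupt` and `record_interrupt` are hypothetical and do not come from the original code. The sketch also assumes the dispatcher receives a pointer to the frame (which would need an extra `push esp` in the stub), that no privilege change occurred, and that the CPU pushed no error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame layout, lowest address first. This is the reverse of
 * the push order in the stub, because the stack grows downwards. */
struct interrupt_frame {
    uint32_t interruptnum;            /* push \interruptnum (pushed last)   */
    uint32_t ds, es, fs, gs;          /* saveRegisters: gs first, ds last   */
    uint32_t edi, esi, ebp, esp_saved,
             ebx, edx, ecx, eax;      /* pushad: eax first, edi last        */
    uint32_t eip, cs, eflags;         /* pushed by the CPU (assumption: no
                                         privilege change, no error code)   */
};

/* The layout must agree byte for byte with what the stub pushes. */
static_assert(offsetof(struct interrupt_frame, eip) == 52, "unexpected layout");
static_assert(sizeof(struct interrupt_frame) == 64, "unexpected size");

typedef void (*interrupt_handler_t)(struct interrupt_frame *frame);

#define INTERRUPT_COUNT 256u
static interrupt_handler_t handler_table[INTERRUPT_COUNT];

/* Install a handler for one vector; out-of-range vectors are ignored. */
void register_interrupt_handler(uint32_t num, interrupt_handler_t handler)
{
    if (num < INTERRUPT_COUNT)
        handler_table[num] = handler;
}

/* Look up and call the handler. Returns 1 if one ran, 0 otherwise. */
int dispatch_interrupt(struct interrupt_frame *frame)
{
    if (frame->interruptnum >= INTERRUPT_COUNT)
        return 0;
    interrupt_handler_t handler = handler_table[frame->interruptnum];
    if (handler == NULL)
        return 0;
    handler(frame);
    return 1;
}

/* Hypothetical example handler: just records which vector fired. */
static uint32_t last_seen_interrupt;
static void record_interrupt(struct interrupt_frame *frame)
{
    last_seen_interrupt = frame->interruptnum;
}
```

A kernel would call `register_interrupt_handler(32, record_interrupt)` during initialisation. The bounds and NULL checks mean that an unregistered vector is ignored rather than jumping through an empty table entry.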

Posted: **Sun Sep 22, 2013 2:16 am**

The idea of having interrupt vectors call a central routine, only to be followed by a switch statement or function table, is generally considered redundant unless you have a very specific purpose in mind.

A more natural way to do this is to have the linker resolve the functions, instead of resolving them at run time with a switch statement or function pointer table:

Code: Select all

align 16
_INT_00:
    pusha
    call    INT_00_IN_C
    popa
    iretd

void INT_00_IN_C (void) {
    kprintf ("INT00 : #DE Divide Error Exception.\n");
    ...
}

Furthermore, for some exceptions the CPU pushes additional information (an error code) on the stack, so you can't really have a single clean interface for all the combinations.

As for the triple fault, there isn't enough information to even guess at the cause, so general debugging skills apply: breakpoints, code elimination, etc.

Posted: **Sun Sep 22, 2013 9:17 am**

First off, the reason I liked the idea of a function pointer array better was because if I get to the point where I am dynamically loading drivers, it should be a nicer transition for me instead of having all of my interrupt handlers be resolved at link time. Second, sorry if I didn't provide enough info, but I have reduced my triple-fault problem area to these lines of code in my assembly handler stubs that have to do with stack manipulation:

Code: Select all

PushData InterruptNumber 
;Call the central handler 
CleanUp

Which expands into:

Code: Select all

pushad 
push gs
push fs
push es
push ds
push InterruptNumber ; should be a 32 bit number according to GAS... 
; Call central routine 
add esp, 4 ; should clean up the interrupt number
pop ds 
pop es 
pop fs 
pop gs 
popad

When I removed these pieces of code, it wouldn't triple-fault; with them in, it triple-faults. So I must again be cleaning too much of my stack or too little. The register clean up shouldn't be responsible, since I simply pushed them, then popped them, which leaves the interrupt number cleanup, which leaves these two lines of code responsible:

Code: Select all

push InterruptNumber 
add esp, 4

Adding to esp can't be causing the problem, since replacing the add with a pop instruction still causes a triple fault:

Code: Select all

push InterruptNumber 
pop eax

So now I am sort of stuck as to where I am going wrong...

Posted: **Sun Sep 22, 2013 12:16 pm**

Do you suppose a debugger could help you find out what is happening and what you are doing wrong? It's got to be worth a try.

Posted: **Sun Sep 22, 2013 8:28 pm**

Hi,

ScropTheOSAdventurer wrote:First off, the reason I liked the idea of a function pointer array better was because if I get to the point where I am dynamically loading drivers, it should be a nicer transition for me instead of having all of my interrupt handlers be resolved at link time.

For exception handlers, a common interrupt handler doesn't make sense, as different exception handlers have different requirements (e.g. double fault and NMI using TSS, page fault needing to save CR2 as soon as possible, etc).

Using a common interrupt handler doesn't make sense for "special" interrupts either (IPIs, the kernel's API, spurious IRQs, any timer IRQs that the scheduler relies on, the local APIC's thermal status and performance monitoring interrupts, etc).

For normal IRQs, a common interrupt handler (that takes care of interrupt sharing, sending EOI, etc) is a good idea; partly because the actual interrupt handlers have to be set up dynamically after the corresponding device driver tells you what IRQ priority it wants.

ScropTheOSAdventurer wrote:So now I am sort of stuck as to where I am going wrong...

Assuming you're testing this with software interrupts (and not IRQs that need EOI, or exceptions that may push their own extra error code); I suspect the problem has nothing to do with stack and is caused by messed up data segment registers.

For example, imagine that your boot loader loads a GDT where entry 0x0010 is a "flat 4 GiB data" descriptor and sets GS = 0x0010; then (later) your OS installs its own GDT where entry 0x0010 is something completely different, and it doesn't set/correct GS. Now GS is using values (base, limit, etc) from a descriptor that doesn't exist or isn't a sane data descriptor anymore. Everything would work fine like this (including the interrupt handler's "push gs" instruction and anything that uses GS), until your interrupt handler does "pop gs" and tries to load the old/invalid value back into GS (causing a general protection fault due to trying to load something invalid into GS).
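To make the failure mode above concrete, here is a rough C sketch of the checks involved in loading a selector into a data segment register. It is a simplified model, not a full description of the CPU's behaviour: it ignores privilege-level checks and the LDT, and the function name `selector_loadable_into_data_reg` is made up for illustration. It assumes the GDT is passed in as a raw byte array along with its limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bits of the access byte (byte 5 of each 8-byte descriptor). */
#define ACCESS_PRESENT    0x80u  /* P: segment is present                 */
#define ACCESS_CODE_DATA  0x10u  /* S: 1 = code/data, 0 = system segment  */
#define ACCESS_EXECUTABLE 0x08u  /* 1 = code segment, 0 = data segment    */
#define ACCESS_READABLE   0x02u  /* for code segments: readable           */

/* Simplified model: returns true if loading `selector` into DS/ES/FS/GS
 * would be accepted, false if it would fault. Privilege checks and the
 * LDT are deliberately not modelled. */
bool selector_loadable_into_data_reg(uint16_t selector,
                                     const uint8_t *gdt, uint16_t gdt_limit)
{
    assert(gdt != NULL);

    uint16_t index = selector >> 3;          /* bits 15..3: table index */
    bool uses_ldt = (selector & 0x4u) != 0;  /* bit 2: table indicator  */

    if (index == 0 && !uses_ldt)
        return true;     /* null selector may be loaded into data registers */
    if (uses_ldt)
        return false;    /* not modelled in this sketch */

    /* The whole 8-byte descriptor must lie within the GDT limit. */
    if ((uint32_t)index * 8u + 7u > gdt_limit)
        return false;

    uint8_t access = gdt[index * 8u + 5u];
    if (!(access & ACCESS_PRESENT))
        return false;    /* not present */
    if (!(access & ACCESS_CODE_DATA))
        return false;    /* system descriptor (TSS, gate, ...) */
    if ((access & ACCESS_EXECUTABLE) && !(access & ACCESS_READABLE))
        return false;    /* execute-only code segment */
    return true;
}
```

With a GDT whose entry at offset 0x10 has been replaced by, say, a TSS descriptor, the function reports that the stale value 0x0010 can no longer be reloaded, which is the situation the "pop gs" in the handler would run into.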

Cheers,

Brendan

Posted: **Wed Sep 25, 2013 2:41 pm**

To your comment about no need for a common interrupt, thanks; I'll change it to where only the IRQs go to it; I just saw a central C handler be used for ISRs in the osdever kernel tutorials, and I thought it wouldn't be that bad of an idea to use for ALL interrupts (I'm new to this). About corrupted data segment registers, I'll look into it. Thanks!

Posted: **Wed Sep 25, 2013 2:51 pm**

I don't think popping into GS is it. I added a panic function call to my general handler (and made it the first line), and the processor triple-faulted before the panic call happened, and thus before I popped into GS. Any other possibilities?

Posted: **Thu Sep 26, 2013 12:08 am**

Hi,

ScropTheOSAdventurer wrote:I don't think popping into GS is it. I added a panic function call to my general handler (and made it the first line), and the processor triple-faulted before the panic call happened, and thus before I popped into GS. Any other possibilities?

At the moment, the only information we have is that something somewhere causes an exception (and the CPU can't start the exception handler so it generates a double fault, and the CPU can't start the double fault exception handler so it generates a triple fault); and some code that looks OK. The problem/s could be anywhere; possibly including failing to load the GDT or IDT correctly, incorrect GDT or IDT entries, bad/wrong segment registers, bad stack (bad address, too small, etc), forgetting to enable A20, forgetting to reconfigure the PIC or IO APIC, a mistake in a page table, a bug in the boot loader (only loading 10 KiB of the kernel when the kernel is 16 KiB), memory management bugs (e.g. accidentally using a page of ROM as RAM), a random piece of code that (e.g.) uses an un-initialised pointer and trashes something, a mistake in the build scripts or makefiles or linker script or something, a bug in the compiler or assembler or linker or emulator (unlikely), etc.

Basically, we know nothing and the bug could be anything (and there may even be multiple bugs); and you need more information about what's going on. Did you try using something like Bochs debugger?

With Bochs alone (no debugger, just looking at the log) you should be able to determine which instruction caused the crash and if it was (e.g.) a general protection fault, a page fault, which values were in various registers when the triple fault occurred, etc. With Bochs debugger, you should be able to put a magic breakpoint ("xchg bx,bx") just before a software interrupt (e.g. "int 0x20") and single-step from the breakpoint up to the instruction that causes the problem; then examine the contents of registers, memory, the IDT, GDT, paging structures, etc (and double-check that they are all correct until you find something that isn't).

Cheers,

Brendan

Posted: **Thu Sep 26, 2013 8:28 am**

Brendan wrote:For exception handlers, a common interrupt handler doesn't make sense, as different exception handlers have different requirements (e.g. double fault and NMI using TSS, page fault needing to save CR2 as soon as possible, etc).

The common exception handler becomes exceedingly useful once you get into the realm of changing how your exceptions are handled, by reassigning handlers later at run time.

I personally use one for the sake of my Page Fault handler, Breakpoint handler, and Syscall handler.
(ie. I have a separate Page Fault handler for PAE Paging as opposed to legacy Paging).

So the use of a common handler stub is very conditional in my opinion.
It's design dependent.

ScropTheOSAdventurer wrote:I don't think popping into GS is it. I added a panic function call to my general handler (and made it the first line), and the processor triple-faulted before the panic call happened, and thus before I popped into GS. Any other possibilities?

I'm gonna wonder here, are you sure you set up the structure of your IDT entries correctly?
Remember that most C compilers need some extra encouragement to get the layout just right (i.e. __attribute__((packed))).
Also, an issue I ran into accidentally was that I got the lower and higher halves of the handler function offset backwards,
which basically ended up making my interrupts point into garbage memory.
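As an illustration of the two pitfalls mentioned above, here is a minimal sketch of a 32-bit IDT gate and IDT pointer in C. The struct and function names (`idt_gate`, `idt_pointer`, `make_idt_gate`) are hypothetical, and the sketch assumes GCC or Clang for `__attribute__((packed))`. The gate is 8 bytes even without packing, while the 6-byte pointer structure is where padding would otherwise creep in.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit protected-mode IDT entry (8 bytes). */
struct idt_gate {
    uint16_t offset_low;   /* handler address, bits 0..15  */
    uint16_t selector;     /* code segment selector         */
    uint8_t  zero;         /* reserved, must be 0           */
    uint8_t  type_attr;    /* present bit, DPL, gate type   */
    uint16_t offset_high;  /* handler address, bits 16..31 */
} __attribute__((packed));

/* Operand for lidt: 16-bit limit followed directly by a 32-bit base. */
struct idt_pointer {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

static_assert(sizeof(struct idt_gate) == 8, "IDT gate must be 8 bytes");
static_assert(sizeof(struct idt_pointer) == 6, "IDT pointer must be 6 bytes");

#define IDT_INTERRUPT_GATE_32 0x8Eu  /* present, DPL 0, 32-bit interrupt gate */

/* Build a gate, splitting the handler address into its low and high halves. */
struct idt_gate make_idt_gate(uint32_t handler, uint16_t code_selector,
                              uint8_t type_attr)
{
    struct idt_gate gate;
    gate.offset_low  = (uint16_t)(handler & 0xFFFFu);
    gate.selector    = code_selector;
    gate.zero        = 0;
    gate.type_attr   = type_attr;
    gate.offset_high = (uint16_t)(handler >> 16);
    return gate;
}
```

The static assertions catch a mis-sized structure at compile time, and keeping the split in one helper makes it harder to swap the two halves of the offset by accident.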

Posted: **Sat Sep 28, 2013 11:01 am**

Ok, I used the debugger, and I found the error messages that appear right before the triple fault (nice register dump). Here it is in an image attachment (sorry, I must be doing something wrong, because I can't get it to log to file, only to screen; was too impatient to fix it).

Posted: **Sat Sep 28, 2013 11:25 am**

I see ESP=3, which suggests the stack is seriously messed up.
I suggest you check what is at 10090c and 100474, using a tool like objdump or another disassembler.

Posted: **Sat Sep 28, 2013 11:54 am**

Ok, I'll continue debugging

Posted: **Sat Sep 28, 2013 7:53 pm**

This is going to sound so stupid (since I should be able to find it), but when I use objdump on my object files and use a search command to find 10090c, I can't find it anywhere. How am I messing this up (sorry, I've just never used objdump)?

Posted: **Sun Sep 29, 2013 12:06 am**

Use the -d switch to disassemble, for example:

Code: Select all

> x86_64-elf-objdump -d kernel64.bin

kernel64.bin:     file format elf64-x86-64

Disassembly of section .text:

ffffffff80101000 <.text>:
ffffffff80101000:	48 bf 00 90 10 00 00 	movabs $0x109000,%rdi
ffffffff80101007:	00 00 00 
ffffffff8010100a:	48 b9 00 00 12 00 00 	movabs $0x120000,%rcx
ffffffff80101011:	00 00 00 
ffffffff80101014:	48 29 f9             	sub    %rdi,%rcx
ffffffff80101017:	48 c1 e9 03          	shr    $0x3,%rcx
ffffffff8010101b:	31 c0                	xor    %eax,%eax
ffffffff8010101d:	f3 48 ab             	rep stos %rax,%es:(%rdi)
....

Then look for the address to see what it's doing there.

You mentioned that it faults only when interrupts are enabled, yet the symptom shows up only with a specific push and pop. Logically that pair should have no side effects, which suggests the stack may be in a bad state to begin with: pointing to a bogus location, overflowing, or hit by a re-entrancy issue.

Posted: **Sun Sep 29, 2013 7:25 am**

I understand how to disassemble, but I can't find a full address like 10090c. Instead, I get something like this:

Code: Select all

00000004 <panic>:
   4:   83 ec 1c                sub    $0x1c,%esp
   7:   8b 44 24 20             mov    0x20(%esp),%eax
   b:   89 44 24 04             mov    %eax,0x4(%esp)
   f:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  16:   e8 fc ff ff ff          call   17 <panic+0x13>
  1b:   fa                      cli
  1c:   f4                      hlt

So how am I supposed to find an address like 10090c in data like this?


Interrupt handlers corrupting stack?
