Long Mode Interrupt Handler Error Code Existence

Posted: Tue Apr 07, 2009 7:03 pm
by Gondolin
Okay, I've read both the AMD64 and Intel 64 manuals a few times, and both are extremely unclear about how to handle the error codes that may or may not be pushed during an exception transition. I don't mean how to use them; I mean how do you know that one has been pushed, and therefore should be used and needs to be popped before the iretq? There are obviously well-defined interrupt vectors that CAN have error codes pushed, but Intel's docs claim that the error codes are NOT pushed, regardless of vector number, if the interrupt is triggered externally or through int N. Specifically, I am referring to the end of 5.13 in Volume 3A of the Intel® 64 and IA-32 Architectures Software Developer’s Manual:
Error codes are not pushed on the stack for exceptions that are generated externally (with the INTR or LINT[1:0] pins) or the INT n instruction, even if an error code is normally produced for those exceptions.
This says to me that the error code presence is almost completely undefined from the point of view of the handler itself.

I assume no interrupt vector that CAN generate an error code can be triggered externally, which I believe is true, but isn't it very misleading that the Intel doc presents this as a special case in which the error code is not pushed?

There is no way to supply the error code yourself if you call "int N", because the CPU pushes the error code after (i.e. below) the return RIP.
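
To make the ordering concrete, here is a sketch of the two frames a 64-bit handler can see at entry, lowest address (where RSP points) first. It assumes a 64-bit handler, where the CPU always pushes SS:RSP; the struct names are hypothetical and only illustrate the layout described in the manuals:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame at handler entry when the vector pushes an error code.
 * In 64-bit mode the error code is padded to a full 8-byte slot. */
struct frame_with_error {
    uint64_t error_code; /* [RSP + 0]:  pushed last, below the return RIP */
    uint64_t rip;        /* [RSP + 8]  */
    uint64_t cs;         /* [RSP + 16] */
    uint64_t rflags;     /* [RSP + 24] */
    uint64_t rsp;        /* [RSP + 32] */
    uint64_t ss;         /* [RSP + 40] */
};

/* Frame at handler entry when no error code is pushed
 * (IRQs, INT n, and exceptions such as #DE or #UD). */
struct frame_without_error {
    uint64_t rip;        /* [RSP + 0]  */
    uint64_t cs;         /* [RSP + 8]  */
    uint64_t rflags;     /* [RSP + 16] */
    uint64_t rsp;        /* [RSP + 24] */
    uint64_t ss;         /* [RSP + 32] */
};

static_assert(offsetof(struct frame_with_error, rip) == 8, "RIP sits above the error code");
static_assert(offsetof(struct frame_without_error, rip) == 0, "RIP is at [RSP] without one");
static_assert(sizeof(struct frame_with_error) == 48, "six 8-byte slots");
static_assert(sizeof(struct frame_without_error) == 40, "five 8-byte slots");
```

A handler written for the first layout but entered through INT n gets the second one, so it treats RIP as the error code and CS as the return address.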

You can't assume where the return RIP is, so if you try to manually check the opcode of the instruction just before the one the return RIP points to and you guessed wrong (meaning you used the error code as the return RIP), you will probably triple fault.

The maximum value of the pushed error code is 0xFFFF since that covers the only defined non-zero fields for either normal codes or page fault codes. I suppose you could just never map the virtual memory pages that correspond to 0x0-0xFFFF and then check if [RSP + 8] <= 0xFFFF to guarantee error code detection. But stepping back, that is a terrible solution.
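
Read literally, the check above inspects the slot at [RSP + 8] at handler entry: if an error code was pushed, that slot holds the return RIP; otherwise it holds CS. A minimal sketch of that (admittedly terrible) heuristic, assuming nothing is ever mapped or executed below 0x10000; the function and constant names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: assumes virtual addresses 0x0-0xFFFF are never mapped, so
 * no return RIP can be <= 0xFFFF, while CS is always a 16-bit selector. */
#define LOW_UNMAPPED_LIMIT 0xFFFFu

/* 'top' is RSP at handler entry, viewed as 64-bit slots.
 * With an error code: top[0] = error code, top[1] = RIP.
 * Without one:        top[0] = RIP,        top[1] = CS.  */
static bool frame_has_error_code(const uint64_t *top)
{
    return top[1] > LOW_UNMAPPED_LIMIT; /* slot 1 is a RIP, not a selector */
}
```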

And finally, you can just say: never explicitly call an interrupt with "int N" that normally auto-generates an error code. Although I cannot find a statement like this in the docs, it seems to be almost reasonable.

Unless I'm misunderstanding part of the available documentation, is there really no good solution to this problem? How do you handle it?

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Tue Apr 07, 2009 8:37 pm
by mystran
Now, I don't know anything about long mode... but this seems general:

You know whether an exception pushes an error code or not by consulting the Intel (or AMD?) manuals. It is true that you do not get error codes for any IRQs, but you would typically map the IRQs to some (pick any) set of interrupt vectors that is not used by any of the exceptions.
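
The lookup from the manuals boils down to a small table. A sketch covering the classic set of vectors that push an error code (newer manual revisions add more, e.g. #CP on vector 21 for CET-capable CPUs, so check the current tables); the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the CPU pushes an error code when it raises this vector as an
 * exception. INT n or an IRQ on the same vector never pushes one. */
static bool vector_pushes_error_code(uint8_t vector)
{
    switch (vector) {
    case 8:  /* #DF double fault (error code is always 0) */
    case 10: /* #TS invalid TSS */
    case 11: /* #NP segment not present */
    case 12: /* #SS stack-segment fault */
    case 13: /* #GP general protection */
    case 14: /* #PF page fault */
    case 17: /* #AC alignment check (error code is always 0) */
        return true;
    default:
        return false;
    }
}
```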

For the old-school PIC there's http://wiki.osdev.org/PIC and for APICs see http://wiki.osdev.org/APIC
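
For the 8259A, moving the IRQs off the exception vectors is the standard ICW1-ICW4 initialization sequence. A sketch that only builds the list of port writes (so it can run on a host); the kernel would then feed each pair to its own OUTB wrapper, which is assumed here and not shown. Real code usually also saves and restores the interrupt masks and adds small I/O delays:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PIC1_CMD = 0x20, PIC1_DATA = 0x21, PIC2_CMD = 0xA0, PIC2_DATA = 0xA1 };

struct port_write { uint16_t port; uint8_t value; };

/* Fill 'seq' with the writes that remap IRQ 0-7 to base1..base1+7 and
 * IRQ 8-15 to base2..base2+7. Returns the number of writes. */
static size_t pic_remap_sequence(struct port_write seq[8], uint8_t base1, uint8_t base2)
{
    assert(base1 >= 0x20 && base2 >= 0x20); /* stay clear of 0x00-0x1F */
    size_t n = 0;
    seq[n++] = (struct port_write){PIC1_CMD, 0x11};  /* ICW1: init, ICW4 follows */
    seq[n++] = (struct port_write){PIC2_CMD, 0x11};
    seq[n++] = (struct port_write){PIC1_DATA, base1}; /* ICW2: vector offset */
    seq[n++] = (struct port_write){PIC2_DATA, base2};
    seq[n++] = (struct port_write){PIC1_DATA, 0x04};  /* ICW3: slave on IRQ2 */
    seq[n++] = (struct port_write){PIC2_DATA, 0x02};  /* ICW3: cascade identity */
    seq[n++] = (struct port_write){PIC1_DATA, 0x01};  /* ICW4: 8086 mode */
    seq[n++] = (struct port_write){PIC2_DATA, 0x01};
    return n;
}
```

With bases 0x20 and 0x28, the IRQs land right after the 32 reserved exception vectors.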

As for software interrupts (with INT), you can make the vectors used by the exceptions available only from ring 0 (gate DPL 0), which will cause a GPF instead if ring-3 code attempts to invoke them (except the special INT3 debugging thingie, IIRC, but there's no error code for that one, methinks).
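
The ring restriction lives in the DPL field of each 16-byte long-mode IDT gate. A sketch of building a gate, assuming a 64-bit interrupt gate (type 0xE) and no IST; `make_gate` is a hypothetical helper. DPL 0 on the exception vectors makes a ring-3 INT n raise #GP, while vector 3 would get DPL 3 so INT3 keeps working for debuggers:

```c
#include <assert.h>
#include <stdint.h>

/* Long-mode IDT gate descriptor, 16 bytes, no padding with this ordering. */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address bits 0-15 */
    uint16_t selector;     /* kernel code segment selector */
    uint8_t  ist;          /* bits 0-2: IST index, 0 = no stack switch via IST */
    uint8_t  type_attr;    /* P | DPL(2 bits) | 0 | type(4 bits) */
    uint16_t offset_mid;   /* bits 16-31 */
    uint32_t offset_high;  /* bits 32-63 */
    uint32_t reserved;
};
static_assert(sizeof(struct idt_gate64) == 16, "long-mode gates are 16 bytes");

#define GATE_PRESENT        0x80u
#define GATE_TYPE_INTERRUPT 0x0Eu

static struct idt_gate64 make_gate(uint64_t handler, uint16_t selector, uint8_t dpl)
{
    struct idt_gate64 g = {0};
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.type_attr   = (uint8_t)(GATE_PRESENT | ((dpl & 3u) << 5) | GATE_TYPE_INTERRUPT);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    return g;
}
```

So an exception gate encodes as 0x8E and a user-callable gate as 0xEE.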

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Tue Apr 07, 2009 11:32 pm
by Gondolin
Thanks for the reply, that clarifies a few things. It seems to me that the statement I quoted above from Intel's manual implies the following:

It is impossible for an interrupt handler routine that takes an error code when triggered as an exception to know whether it really had an error code pushed after the return RIP or not. Therefore you MUST avoid calling such an interrupt vector with a software interrupt (INT n) or external hardware interrupt (by making sure your IRQs are not mapped to any of these vectors). If this is done then these routines may safely assume the error code is present.

That is quite a lot to leave implied, I'd say. The AMD64 manuals don't mention the case at all. However, I think it makes sense to me now.

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 12:14 am
by Love4Boobies
I think the Intel manual is pretty clear on what that actually means. Think a bit about what an exception is. If you use INT to call that exception's handler, why would anyone generate an error code for you? The exact same thing applies when you map an IRQ over that exception. That would be a bad idea anyway, since you'd need to have the same handler handling both the exception and whatever else that IRQ is asking for. In other words, you only get error codes for normal exceptions, caused under normal conditions.

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 12:57 am
by Gondolin
Yes, I realize all of those things now. It would be much cleaner and less confusing, though, if a separate table were used for exceptions, one that you could not access through "INT n" or an external interrupt (because, as you say, neither of those options makes any sense). When I saw that all 256 IDT entries could be reached through INT n, I expected at least some orthogonality, which is not the case at all. However, there are a lot of things you could do to make the Intel architecture cleaner if it weren't for backwards compatibility.

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 1:02 am
by Love4Boobies
Well, it actually might make sense to use INT for calling those handlers sometimes. One obvious reason would be testing. I can't think of anything else right now but I'm sure that's just me.

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 1:58 am
by Gondolin
Hm, it is true you could use that access for testing, but that would mean you would have to rewrite the routine to handle a "test" error code passed through a different mechanism (a register, maybe?), and it would not be able to simultaneously handle actual exceptions (a no-win situation). If I wanted to test such an exception routine through the interrupt interface, I would just link it through another temporary interrupt vector. I really can't think of a reason to allow such access, but you are right, it can be viewed as flexibility.

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 2:10 am
by Love4Boobies
There's nothing stopping you from pushing your own error codes on the stack using PUSH.

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 11:35 am
by Gondolin
I believe there is something stopping you. The error code is pushed at the very end of the interrupt stack frame, which is not even the same stack as used by the task calling "INT n". Even if it were (which I think it can be if not in Long Mode) it is pushed after the return RIP and other saved state, so you can't just PUSH an error code before the "INT n" instruction. That is what I mean in that it is impossible to replicate the way the error code is handed to the handler during an exception using "INT n". Am I misunderstanding something because I don't see what you mean?

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 12:29 pm
by mystran
Gondolin wrote:I believe there is something stopping you. The error code is pushed at the very end of the interrupt stack frame, which is not even the same stack as used by the task calling "INT n". Even if it were (which I think it can be if not in Long Mode) it is pushed after the return RIP and other saved state, so you can't just PUSH an error code before the "INT n" instruction. That is what I mean in that it is impossible to replicate the way the error code is handed to the handler during an exception using "INT n". Am I misunderstanding something because I don't see what you mean?
Actually it is the same stack IFF you trigger the interrupt while you're in kernel (whether or not you trigger it by an "INT n" or by external source such as IRQ). You only get another stack when coming from user mode.

On the other hand, you can fake exceptions from kernel mode by doing the work manually: push the relevant stuff to the stack and JMP to the exception handler.
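
The part that matters when faking it is the push order, which must match what the CPU does for an exception with an error code (SS first, error code last). A host-runnable sketch that lays the frame out in a buffer; `push_fake_exception_frame` is hypothetical, and the final step (loading the returned pointer into RSP and JMPing to the handler) needs a few lines of assembly that are not shown:

```c
#include <assert.h>
#include <stdint.h>

/* Build, just below 'stack_top', the frame the CPU would push for an
 * exception with an error code. Returns the resulting stack pointer. */
static uint64_t *push_fake_exception_frame(uint64_t *stack_top,
                                           uint64_t ss, uint64_t rsp,
                                           uint64_t rflags, uint64_t cs,
                                           uint64_t rip, uint64_t error_code)
{
    uint64_t *sp = stack_top;
    *--sp = ss;          /* pushed first, ends up at the highest address */
    *--sp = rsp;
    *--sp = rflags;
    *--sp = cs;
    *--sp = rip;
    *--sp = error_code;  /* pushed last, so the handler sees it at [RSP] */
    return sp;
}
```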

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 10:01 pm
by JohnnyTheDon
Actually it is the same stack IFF you trigger the interrupt while you're in kernel (whether or not you trigger it by an "INT n" or by external source such as IRQ). You only get another stack when coming from user mode.
That is true unless you use an IST, which is a good idea for exceptions because it allows you to run them on a separate known-good stack and prevent double faults.
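
The IST pointers live in the 64-bit TSS, and a gate opts in by putting a non-zero index (1-7) in its IST field. A sketch of the TSS layout with a helper that points one IST slot at a dedicated stack, e.g. for the double fault handler; it uses the GCC/Clang `packed` attribute because RSP0 sits at offset 4, and the stack size and helper name are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS (104 bytes). ist[0] is IST1, ist[6] is IST7. */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];     /* RSP0-RSP2: loaded on a privilege-level change */
    uint64_t reserved1;
    uint64_t ist[7];     /* IST1-IST7: loaded when a gate's IST field is non-zero */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;
};
static_assert(sizeof(struct tss64) == 104, "TSS is 0x68 bytes");
static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 lives at offset 0x24");

/* Hypothetical dedicated stack for the double fault handler. */
static uint8_t double_fault_stack[4096] __attribute__((aligned(16)));

/* Point IST<index> at the top of the given stack (stacks grow down). */
static void tss_set_ist(struct tss64 *tss, unsigned index, void *stack_base, size_t size)
{
    assert(index >= 1 && index <= 7);
    tss->ist[index - 1] = (uint64_t)(uintptr_t)((uint8_t *)stack_base + size);
}
```

The #DF gate would then carry IST index 1, so the CPU switches to this stack even if the faulting stack is unusable.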

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Wed Apr 08, 2009 11:02 pm
by mystran
JohnnyTheDon wrote:
Actually it is the same stack IFF you trigger the interrupt while you're in kernel (whether or not you trigger it by an "INT n" or by external source such as IRQ). You only get another stack when coming from user mode.
That is true unless you use an IST, which is a good idea for exceptions because it allows you to run them on a separate known-good stack and prevent double faults.
Ah, right. I personally don't see the point of using a separate interrupt stack, except possibly for the double fault handler (say, if you wanna do a blue screen rather than just trigger the third fault), but I guess I'm just thinking in terms of my own kernel designs, which would be to never cause an exception from kernel code in the first place and to panic if you see one (not entirely true, though, as at least some generations of my kernel used to trigger page faults from the kernel too)... maybe that's just me. :)

Re: Long Mode Interrupt Handler Error Code Existence

Posted: Thu Apr 09, 2009 1:33 am
by Brendan
Hi,

Quick notes....

You can test exception handlers by causing an exception (and shouldn't need to use a software interrupt to test an exception handler).

Having a separate table for exceptions would be nice, but it's just as easy to split the IDT into separate sections (e.g. 0x00 to 0x1F for exceptions, and 0x20 to 0xFF for everything else).

You can detect if an IRQ caused the interrupt by testing the PIC or APIC's "In Service Register", but you'd need to worry about race conditions, etc (it's tricky).
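
For the local APIC, the In-Service Register is 256 bits spread over eight 32-bit registers at offsets 0x100, 0x110, ..., 0x170 from the APIC base (normally 0xFEE00000). A sketch of testing one vector's bit; the function name is hypothetical, and it does nothing about the races mentioned above (for example, the bit says an IRQ on that vector is being serviced, not that it is the reason this particular handler was entered):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LAPIC_ISR_OFFSET 0x100u /* first of eight ISR registers */

/* 'lapic_base' is the mapped local APIC register page. The ISR registers
 * are 16 bytes apart, i.e. every 4th uint32_t. */
static bool lapic_vector_in_service(volatile const uint32_t *lapic_base, uint8_t vector)
{
    volatile const uint32_t *isr = lapic_base + LAPIC_ISR_OFFSET / 4;
    uint32_t word = isr[(vector / 32) * 4];
    return (word >> (vector % 32)) & 1u;
}
```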

If all software only ever uses a few values for CS (e.g. one CPL=3 code segment and one CPL=0 code segment), then you might be able to search for CS on the stack to determine whether an error code is present. This would work unless EIP happens to be a valid value for CS, but for most OSs the first page is marked "not present" (to catch null pointers), so EIP is always at least 0x00001000 and valid values for CS could always be less than that.
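
A sketch of that CS search, assuming the OS only ever uses the two selectors below (the values 0x08 and 0x1B are hypothetical) and that page 0 is never executable, so a return RIP can never equal one of them. Without an error code CS sits in slot 1 at handler entry; with one, slot 1 holds RIP and CS moves to slot 2:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_CS 0x08u /* hypothetical CPL=0 code selector */
#define USER_CS   0x1Bu /* hypothetical CPL=3 code selector (0x18 | RPL 3) */

static bool is_known_cs(uint64_t v)
{
    return v == KERNEL_CS || v == USER_CS;
}

/* 'top' is RSP at handler entry, viewed as 64-bit slots. */
static bool frame_has_error_code_by_cs(const uint64_t *top)
{
    /* Slot 1 is CS only when nothing extra was pushed below RIP. */
    return !is_known_cs(top[1]) && is_known_cs(top[2]);
}
```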

If a kernel supports nested interrupts, then each task's kernel stack would need to be large (in case kernel code is interrupted by an IRQ that's interrupted by an IRQ that's interrupted by an IRQ, etc). In this case, if all IRQs use IST then each task's kernel stack could be much smaller (e.g. 10000 tasks at 1 KiB per task, rather than 10000 tasks at 8 KiB per task).

If the kernel is interruptible and supports nested IRQs, I have a theory that IST could be used for dynamic IRQ handler stacks. For example, have a small (1 KiB) kernel stack for each task/thread with additional stacks for IRQ handling, rather than a larger (8 KiB) kernel stack for each task/thread (in case the kernel is interrupted by an IRQ, which is interrupted by another IRQ, which is interrupted by another IRQ, etc). If there's a large number of tasks/threads, this could save a large amount of RAM (and it might also help a little with "IRQ stack cache locality").


Cheers,

Brendan