
Interrupt Handling

Posted: Thu Sep 18, 2014 7:51 pm
by Isaac
Hi,

I have a question about interrupt handling. Each gate in the IDT contains a pointer to a location in memory at which there is a handler for that interrupt. What should I write in these handlers?
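For reference, here is a minimal sketch of what "a gate containing a pointer to the handler" looks like in 32-bit protected mode. The helper name `make_gate` and the selector value used in the test are assumptions made for illustration; only the 8-byte gate layout and the 0x8E attribute byte come from the architecture itself.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte gate of a 32-bit protected-mode IDT. */
struct idt_gate {
    uint16_t offset_low;   /* handler address, bits 0..15 */
    uint16_t selector;     /* code segment selector the handler runs in */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* present bit, DPL and gate type */
    uint16_t offset_high;  /* handler address, bits 16..31 */
};

static_assert(sizeof(struct idt_gate) == 8, "an IDT gate is 8 bytes");

/* 0x8E = present, DPL 0, 32-bit interrupt gate. */
enum { GATE_INT32_KERNEL = 0x8E };

/* Hypothetical helper: split a handler address into the gate fields. */
static struct idt_gate make_gate(uint32_t handler, uint16_t selector,
                                 uint8_t type_attr)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.zero        = 0;
    g.type_attr   = type_attr;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

The "pointer" is simply the handler's address split across `offset_low` and `offset_high`; what the handler itself should do is the open question.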

Let's say for example, a divide-by-zero exception occurs, what should I do about it? How should my code handle the exception?

Re: Interrupt Handling

Posted: Thu Sep 18, 2014 8:04 pm
by neon
Whatever makes sense. It really depends on architecture design. The exceptions can be very tricky to write properly and should be done with care.

For the divide by zero, determine the currently executing process and raise the process divide by 0 signal. The process will either handle it or terminate execution. If there is an attached debugger, insert a breakpoint and send the exception request.

If the exception occurs in the kernel or a kernel mode driver or service, send the request to the kernel's trap method or raise the exception on the service or driver to give it time to fix the problem or terminate. The service or driver may issue a system call to panic the system if it is a critical service. This gives the system the opportunity to invoke the kernel mode debugger and log the event.

Re: Interrupt Handling

Posted: Thu Sep 18, 2014 10:18 pm
by KemyLand
It depends completely on your OS-model.

For example, if you think (or better, know) that the error is really critical, let's call Panic() 8)
But if the exception occurs in user-space, let's send a SIGKILL :roll:

In a #DE (Divide by Zero), you must first check if this happened in userspace. If true, get the currently running process and kill it, or if you don't want to kill a little innocent process that just made an error, just tell it to repair things :D (not recommended). Try to compile this in GNU/Linux and run it via Bash:

Code:

int main(int argc, char *argv[])
{
    return 1/0;
}
GCC will warn you, but anyway. Run it and you'll get a beautiful SIGFPE (Signal Floating-Point Exception) with core being dumped. That means, Linux likes option 1, and you should, too.

Returning to handling of a #DE, if it happens in kernel-space, you're in big trouble. You must first check if the offending thread is part of a service/driver or the core kernel itself. If it's the core kernel, Panic(). If it's a driver, signal it. NEVER kill a critical kernel thread: that will still end in a panic, but one with data loss. You must also COMPLETELY disable the scheduler and yield directly to that driver's signal handler. If the handler fails, we have lost the war, Panic(). If it returns successfully, good! You've just survived a kernel error! Now it's time to throw an oops to the entire system and re-enable the scheduler.
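The decision described above can be sketched roughly like this. It is only an illustration: the enum names and the `fault_in_driver` flag are made up, and how a kernel decides that the faulting code belongs to a driver depends on its own design. The one architectural fact used is that the low two bits of the saved CS selector give the privilege level of the interrupted code (3 means user mode).

```c
#include <stdint.h>

enum fault_action {
    ACTION_SIGNAL_PROCESS,  /* user space: deliver SIGFPE or kill the process */
    ACTION_SIGNAL_DRIVER,   /* kernel space, inside a driver or service */
    ACTION_PANIC            /* kernel space, core kernel */
};

/* The low two bits of the saved CS are the privilege level of the
 * interrupted code; 3 means it was running in user mode. */
static int came_from_user(uint16_t saved_cs)
{
    return (saved_cs & 3u) == 3u;
}

/* fault_in_driver is a hypothetical flag: the kernel would have to work
 * it out from the faulting address or the current thread. */
static enum fault_action divide_error_policy(uint16_t saved_cs,
                                             int fault_in_driver)
{
    if (came_from_user(saved_cs))
        return ACTION_SIGNAL_PROCESS;
    if (fault_in_driver)
        return ACTION_SIGNAL_DRIVER;
    return ACTION_PANIC;
}
```

If the driver's own handler then fails, the result is escalated to a panic as described above; that second step is left out of the sketch.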

Anyway, remember to read the wiki. ISR code can't be done in C(++)! And it should be done in assembly. You could do a full stub of assemblies and jump to C code :wink: . It is always vital to do a well-coded iret.

Re: Interrupt Handling

Posted: Fri Sep 19, 2014 12:41 am
by Brendan
Hi,
Isaac wrote:Each gate in the IDT contains a pointer to a location in memory at which there is a handler for that interrupt. What should I write in these handlers?
Exceptions

I'm going to assume you've got some sort of general purpose function/s to handle "code crashed and can't continue doing what it was doing". This general purpose function might decide to do a kernel panic or terminate the process (depending on who crashed); or to allow the code to continue doing something else (e.g. signals); and can include reporting the problem somehow (adding info to a log, sending an email with any info needed for debugging to the developers, doing a full core dump, etc) where the idea is to help people fix the bug. For software developers (rather than normal end users) it'd be really nice if you were able to launch a debugger, auto-attach it to the crashed process and tell the debugger about the problem.

The general purpose function/s to handle "code crashed and can't continue doing what it was doing" would be used by all exception handlers as a last resort (if nothing else can be done). Please note that this generic function may be used by more than just exception handlers - e.g. if the kernel detects that it has used a spinlock wrong, or if a process attempts to violate security (e.g. by calling kernel API with dodgy parameters or diddling with segment registers or something in a "seems malicious to me" way), etc.

For a simplified/generic summary:
  • Divide error #0, Overflow #4, Bound range exceeded #5. All these mean code did something silly and crashed - no recovery, go to generic crash handler.
  • Debug exception #1. This is mostly used by debuggers. For example, if one process (the process being debugged) causes a debug exception then you might need to send the details to a different process (the debugger).
  • NMI #2. This mostly indicates a hardware problem (maybe go directly to "kernel panic"?). However it's possible for an OS to use it for something else (e.g. a bad way of profiling kernel code, watchdog timer, etc). It's tricky to support properly for a variety of reasons (many potential problems due to its ability to interrupt anything at any time, including itself).
  • Breakpoint #3. This is mostly used by debuggers (see "Debug exception #1").
  • Invalid opcode #6. The general idea here is to emulate instructions that the CPU doesn't support, so that newer software works on older CPUs.
  • Device not available #7. If an FPU is present, this is mostly used for doing "postponed FPU/MMX/3DNow state" saving by the OS, to improve the performance of task switching (by not loading/saving FPU/MMX/3DNow state when it can be avoided). If an FPU is not present, this exception can be used for emulating an FPU in software.
  • Double fault #8. This means an exception occurred and the CPU couldn't start your exception handler. In some (rare) OSs this might not always be a crash (e.g. page fault handler couldn't be started because kernel stack was a "not present" page, so double fault handler fixes the problem by mapping a page there and everything can continue). For most OSs it is always a kernel crash (maybe go directly to "kernel panic").
  • Invalid TSS #10. Excluding extremely rare and bizarre OSs (similar to what's described in "Segment not present" but worse), this always means your kernel attempted to do a hardware task switch that wasn't possible (maybe go directly to "kernel panic").
  • Segment not present #11. For some (rare) OSs that use segmentation, this may be used for virtual memory management (e.g. if the segment isn't present, load the data needed from disk, change the GDT/LDT entry to "present" and continue).
  • Stack fault #12. For some (rare) OSs that use segmentation, this may be used for dynamically increasing stack size (e.g. allocate some more RAM, change stack segment limit to suit and continue).
  • General protection fault #13. Some OSs use this for various tricks (e.g. I don't allow CPL=3 code to use RDTSC directly and emulate it in the general protection fault handler instead, so that I can make it look like RDTSC returns "cycles this thread has consumed so far", which solves a lot of hassles for people using it to measure how fast a piece of code is). It also plays a significant role (for emulation) in your "virtual machine monitor" if you're using virtual8086 mode.
  • Page fault #14. This is almost always used for virtual memory management tricks; including "lazy TLB invalidation", allocation on demand, copy on write, swap space support, memory mapped files, etc.
  • Floating point error #16. This has various causes. One is "FPU register stack underflow/overflow" which can be used to extend the FPU (so that it looks like there are more than 8 FPU registers, and so that software doesn't have to worry about running out when calling functions).
  • Alignment check #17. This exception doesn't indicate a crash and is only used to identify potential performance problems. Maybe send the details to a profiler or add the details to some sort of "profiling log". If there is no profiler or anything then this exception should be disabled.
  • Machine check #18. This mostly indicates a hardware problem (it's like NMI on steroids). Unlike NMI you can determine the cause of the problem and (e.g.) inform the user or log the problem somehow, so that the user knows which piece of hardware needs to be replaced. In some cases recovery is possible (e.g. if the problem is that the CPU's L2 cache is faulty, then maybe you could disable any CPUs using that cache and continue running using other CPUs); but most OSs don't do things like that (and testing that it actually works properly is almost impossible if you do try). This exception can be left disabled - if you do that then you'll get a triple fault instead, and with no information to suggest otherwise the user will assume their hardware is fine and that your kernel is buggy and blame you.
  • SIMD floating point exception #19. This is like the "Floating point error #16" except for SSE/AVX and not FPU.
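As a rough illustration, the summary above could be kept as a lookup table that a common entry point consults before deciding what to do. The enum values and the `lookup_exception` helper are invented for this sketch, and the "typical use" column is a simplification of the list, not a rule.

```c
#include <stddef.h>

/* Simplified "what is this usually used for" categories (hypothetical). */
enum typical_use {
    USE_CRASH_HANDLER, USE_DEBUGGER, USE_HARDWARE_ERROR, USE_EMULATION,
    USE_FPU_STATE, USE_KERNEL_PANIC, USE_SEGMENTATION, USE_VIRTUAL_MEMORY,
    USE_FLOATING_POINT, USE_PROFILING
};

struct exception_info {
    unsigned vector;
    const char *name;
    enum typical_use use;
};

static const struct exception_info exception_table[] = {
    {  0, "Divide error",                  USE_CRASH_HANDLER  },
    {  1, "Debug exception",               USE_DEBUGGER       },
    {  2, "NMI",                           USE_HARDWARE_ERROR },
    {  3, "Breakpoint",                    USE_DEBUGGER       },
    {  4, "Overflow",                      USE_CRASH_HANDLER  },
    {  5, "Bound range exceeded",          USE_CRASH_HANDLER  },
    {  6, "Invalid opcode",                USE_EMULATION      },
    {  7, "Device not available",          USE_FPU_STATE      },
    {  8, "Double fault",                  USE_KERNEL_PANIC   },
    { 10, "Invalid TSS",                   USE_KERNEL_PANIC   },
    { 11, "Segment not present",           USE_SEGMENTATION   },
    { 12, "Stack fault",                   USE_SEGMENTATION   },
    { 13, "General protection fault",      USE_EMULATION      },
    { 14, "Page fault",                    USE_VIRTUAL_MEMORY },
    { 16, "Floating point error",          USE_FLOATING_POINT },
    { 17, "Alignment check",               USE_PROFILING      },
    { 18, "Machine check",                 USE_HARDWARE_ERROR },
    { 19, "SIMD floating point exception", USE_FLOATING_POINT },
};

/* Returns the table entry for a vector, or NULL if it is not listed. */
static const struct exception_info *lookup_exception(unsigned vector)
{
    size_t count = sizeof exception_table / sizeof exception_table[0];
    for (size_t i = 0; i < count; i++)
        if (exception_table[i].vector == vector)
            return &exception_table[i];
    return NULL;
}
```

Anything the specific handler can't deal with would still fall through to the generic "code crashed" function.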
IRQs

These are sent by PIC chips and/or IO APICs and are mostly used by device drivers. Support for IRQs (including EOIs, etc) and the way device drivers interact with the rest of the OS is a major topic on its own, and I won't attempt to go into all the little details here.

Other Interrupts

There are more types of interrupts that aren't exceptions or device IRQs. Here's some info on them:
  • Spurious IRQs. Both PIC chips and APICs may generate spurious IRQs; where the kernel is meant to ignore them. I'd recommend keeping track of them (e.g. with three "number of spurious IRQs" counters, for master PIC, slave PIC and APICs) because an excessive number of spurious IRQs can indicate something is dodgy (and people like statistics!).
  • IPIs (Inter-Processor-Interrupts). These are used for multi-CPU, where one CPU wants one or more other CPUs to do something. A common example is "multi-CPU TLB shootdown" (where one CPU changes page tables and has to tell other CPUs to invalidate the affected TLB entry). What you do to handle these depends on what you're using them for.
  • Thermal Sensor. This is sent by the local APIC when the CPU's temperature changes (goes above a "high threshold" or below a "low threshold"). It may be used to influence scheduling and power management decisions (e.g. if a CPU is hot, shift higher priority threads to other CPUs and decrease that CPU's clock speed, in an attempt to avoid thermal throttling that will cripple performance). It can also indicate hardware failures (e.g. if a CPU had very little load and still gets hot, then the fan may have failed and/or the heatsink may be clogged with dust).
  • Performance monitoring. This interrupt is also sent by local APIC. It's part of the CPU's performance monitoring support; and mostly you'd send details to a profiler. However it can be used for other things. One of my ideas is to use "micro-ops retired" for scheduling instead of time, so that scheduling can be "fair" when different CPUs are running at different speeds.
  • Local APIC timer, HPET, PIT, RTC. These IRQs are all typically used by the kernel itself for a number of things - keeping track of "wall clock time", measuring delays and time-outs, scheduling (maybe!), etc. Normally the kernel would use whatever timer/s are present to implement a "more abstracted" API, where everything else uses the API without caring what the underlying hardware actually is.
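For the spurious IRQ item, here is a minimal sketch of the usual check on the legacy PIC chips: a spurious IRQ shows up as IRQ 7 (master) or IRQ 15 (slave) with the matching bit clear in that chip's in-service register. The function and struct names are made up, and the in-service register values are passed in as plain parameters so the logic can be tried outside a kernel (a real handler would read them from the PICs).

```c
#include <stdint.h>

/* The three counters suggested above. */
struct spurious_stats {
    unsigned long master_pic;
    unsigned long slave_pic;
    unsigned long apic;   /* bumped from the APIC spurious-vector handler */
};

enum irq_kind { IRQ_REAL, IRQ_SPURIOUS_MASTER, IRQ_SPURIOUS_SLAVE };

/* master_isr and slave_isr are the in-service registers of the two PICs.
 * Bit 7 of each corresponds to IRQ 7 and IRQ 15 respectively. */
static enum irq_kind classify_pic_irq(unsigned irq, uint8_t master_isr,
                                      uint8_t slave_isr)
{
    if (irq == 7 && !(master_isr & 0x80))
        return IRQ_SPURIOUS_MASTER;   /* ignore it, send no EOI */
    if (irq == 15 && !(slave_isr & 0x80))
        return IRQ_SPURIOUS_SLAVE;    /* EOI to the master only */
    return IRQ_REAL;
}

static void count_spurious(struct spurious_stats *stats, enum irq_kind kind)
{
    if (kind == IRQ_SPURIOUS_MASTER)
        stats->master_pic++;
    else if (kind == IRQ_SPURIOUS_SLAVE)
        stats->slave_pic++;
}
```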

Cheers,

Brendan

Re: Interrupt Handling

Posted: Fri Sep 19, 2014 12:41 am
by neon
Anyway, remember to read the wiki. ISR code can't be done in C(++)! And it should be done in assembly. You could do a full stub of assemblies and jump to C code :wink: . It is always vital to do a well-coded iret.
ISRs can actually be written in C and C++ provided the build environment has support for naked functions or better system level extensions to the language. However it is recommended to write them in assembly language for portability between build environments that do not have these extensions.

In addition, the general consensus is against the use of a common ISR stub. ISRs do not have common ground, and writing a common stub creates more problems in the long run. The component that implements a specific ISR should be the only component responsible for the ISR and request the kernel or executive to install the given ISR. The ISRs must follow a well defined protocol that allows IRQ sharing. This should be enforced behind well defined operating system interfaces.

Having a common ISR stub tightly couples initial interrupt gates to a single ISR and is completely useless.

Re: Interrupt Handling

Posted: Fri Sep 19, 2014 1:24 am
by alexfru
KemyLand wrote:

Code:

int main(int argc, char *argv[])
{
    return 1/0;
}
GCC will warn you, but anyway. Run it and you'll get a beautiful SIGFPE (Signal Floating-Point Exception) with core being dumped. That means, Linux likes option 1, and you should, too.
That results in undefined behavior and you may not even get a division exception from code like the above. Using the -O2 or -O3 optimization option may eliminate this division. If you really want to cause one, use volatiles.

Compare this run and this run.

See the difference?

In the second case it's still undefined behavior, but the compiler is blinded and does the division because accesses to volatiles must be honored.
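The two linked runs are not reproduced here. As a rough illustration of the volatile approach (not the exact code from those runs), the division can be routed through volatile objects so the compiler has to perform it at run time; `divide_at_runtime` is a name made up for this sketch.

```c
/* Reads of volatile objects must actually happen, so the compiler cannot
 * fold or drop the division even with -O2 or -O3. */
static int divide_at_runtime(int numerator, int denominator)
{
    volatile int n = numerator;
    volatile int d = denominator;
    return n / d;
}
```

Calling `divide_at_runtime(1, 0)` is still undefined behavior as far as C is concerned; on x86 Linux it will typically execute the divide instruction and end in SIGFPE.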

Re: Interrupt Handling

Posted: Fri Sep 19, 2014 1:22 pm
by Gigasoft
An exception is not necessarily a crash, as some people seem to imply. An exception is just an unusual condition. Threads should be able to register exception handlers in some way. If you want to be advanced, you can try to make this compatible with the exception support in your favourite compiler, or if you are rolling your own compiler, feel free to be creative here. Not having any exception support beyond panicking or getting murderous is not too impressive. If the system does not handle an exception, then for a normal user mode application it should just pass the exception to the application. For a Virtual 8086 monitor, you would emulate the behaviour of the faulting instructions. If there is no registered exception handler, or there is a problem with performing the invocation, or the handler indicates failure, then you have a crash. A very common use for exceptions is dealing with application buffers in a system call or a driver. The system must be able to continue on and handle the error if the application passes an invalid pointer.

So, first of all, you perform any behaviour that should happen automatically behind the user's back (such as virtual memory, FPU state management), then you check if the exception should be passed to a debugger, and then you pass the exception to any registered exception handler for kernel mode or user mode, depending on what was executing. Another way of determining how to handle an exception, is to look at the address of the faulting instruction. This is done in the X86-64 ABI, for example.
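A minimal sketch of that ordering, with every name invented for illustration: each stage is a callback that returns true if it dealt with the exception, and the dispatcher tries them in the order described. The `demo_` callbacks only exist so the sketch can be exercised; a real system would plug in its pager, debugger interface and handler registry.

```c
#include <stdbool.h>
#include <stddef.h>

struct exception_frame {
    unsigned vector;     /* exception number */
    bool from_user;      /* true if user mode code was executing */
};

typedef bool (*exception_stage)(const struct exception_frame *frame);

struct exception_hooks {
    exception_stage transparent;     /* virtual memory, FPU state, ... */
    exception_stage debugger;        /* attached debugger, if any */
    exception_stage kernel_handler;  /* registered kernel mode handler */
    exception_stage user_handler;    /* registered user mode handler */
};

enum exception_result {
    HANDLED_TRANSPARENTLY, HANDLED_BY_DEBUGGER,
    HANDLED_BY_HANDLER, UNHANDLED_CRASH
};

static enum exception_result
dispatch_exception(const struct exception_hooks *hooks,
                   const struct exception_frame *frame)
{
    if (hooks->transparent && hooks->transparent(frame))
        return HANDLED_TRANSPARENTLY;
    if (hooks->debugger && hooks->debugger(frame))
        return HANDLED_BY_DEBUGGER;

    exception_stage registered =
        frame->from_user ? hooks->user_handler : hooks->kernel_handler;
    if (registered && registered(frame))
        return HANDLED_BY_HANDLER;

    return UNHANDLED_CRASH;   /* no handler, or the handler failed */
}

/* Demo callbacks (hypothetical). */
static bool demo_fix_page_fault(const struct exception_frame *frame)
{
    return frame->vector == 14;
}

static bool demo_accept(const struct exception_frame *frame)
{
    (void)frame;
    return true;
}
```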

A software interrupt should just copy its parameters, call the requested function and then return.

For IRQs, you'll at least need to pass a parameter to each driver which will handle the interrupt, so that the driver knows which device it is supposed to handle, in the case of multiple identical devices. Multiple devices may share the same interrupt line, so each device on that interrupt line must be checked. When you are handling an interrupt, the priority of the interrupted thread is completely irrelevant, so you need to make a note of the fact that a CPU is handling an interrupt. You could then disable thread switching completely, or use an alternate value for the current thread's priority instead of the original priority of the interrupted code. After handling all interrupts and before returning to normal code, you would then check if there is a more important thread to be run, which you didn't switch to during the interrupt because of its elevated priority.
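The per-device parameter and the shared-line check could look something like the sketch below. All names, the table sizes and the demo device are assumptions for illustration; a real kernel would add locking and the EOI handling.

```c
#include <stdbool.h>
#include <stddef.h>

#define IRQ_LINES            16
#define MAX_HANDLERS_PER_IRQ 4

/* A driver's handler gets its own device pointer back and returns true
 * if that device was really the one asking for attention. */
typedef bool (*irq_handler)(void *device);

struct irq_entry {
    irq_handler fn;
    void *device;
};

static struct irq_entry irq_table[IRQ_LINES][MAX_HANDLERS_PER_IRQ];

/* Drivers ask the kernel to attach them to a line; false means the
 * request was invalid or the line has no free slot. */
static bool irq_register(unsigned irq, irq_handler fn, void *device)
{
    if (irq >= IRQ_LINES || fn == NULL)
        return false;
    for (size_t i = 0; i < MAX_HANDLERS_PER_IRQ; i++) {
        if (irq_table[irq][i].fn == NULL) {
            irq_table[irq][i].fn = fn;
            irq_table[irq][i].device = device;
            return true;
        }
    }
    return false;
}

/* Every device on the line is checked; returns how many claimed it. */
static unsigned irq_dispatch(unsigned irq)
{
    unsigned claimed = 0;
    if (irq >= IRQ_LINES)
        return 0;
    for (size_t i = 0; i < MAX_HANDLERS_PER_IRQ; i++) {
        const struct irq_entry *e = &irq_table[irq][i];
        if (e->fn != NULL && e->fn(e->device))
            claimed++;
    }
    return claimed;
}

/* Demo device and handler (hypothetical). */
struct demo_device {
    bool pending;
    unsigned serviced;
};

static bool demo_device_irq(void *device)
{
    struct demo_device *dev = device;
    if (!dev->pending)
        return false;       /* not ours, let the next driver look */
    dev->pending = false;
    dev->serviced++;
    return true;
}
```

Two identical devices can share `demo_device_irq` on the same line because each registration carries its own `device` pointer.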

Re: Interrupt Handling

Posted: Fri Sep 19, 2014 1:38 pm
by Isaac
Exceptions 20-31 are reserved. What does that mean? How can I write code for reserved interrupts?

Re: Interrupt Handling

Posted: Fri Sep 19, 2014 2:02 pm
by Gigasoft
Reserved means that they are currently unused, but may be defined to have a meaning in the future. You should just ignore them, and avoid using these interrupt numbers for IRQs and software interrupts. You can leave these IDT entries blank.

Re: Interrupt Handling

Posted: Sat Sep 20, 2014 7:45 pm
by Isaac
Can somebody please post some example code in assembly that handles an exception?

Re: Interrupt Handling

Posted: Sat Sep 20, 2014 8:58 pm
by alexfru
Isaac wrote:Can somebody please post some example code in assembly that handles an exception?
Are you banned from google.com or the Internet?

Re: Interrupt Handling

Posted: Sun Sep 21, 2014 1:59 pm
by Isaac
I did some googling and still did not find what I want.

Re: Interrupt Handling

Posted: Sun Sep 21, 2014 5:07 pm
by neon
As previously mentioned, the specific details are operating system dependent. You should be able to implement what you need (Brendan gave an excellent overview) on your own and really need to drop the notion of using tutorials and existing source code to copy from. They do not always exist.

Re: Interrupt Handling

Posted: Mon Sep 22, 2014 10:30 am
by SpyderTL
Isaac wrote:Can somebody please post some example code in assembly that handles an exception?
Just to get you started, I'll tell you what (most) of my interrupt handlers are currently doing.

My interrupt handlers increment a counter (each interrupt from 0-255 has a 1 byte counter in memory). Then they call IRET to attempt to return to the code that was running.

In addition to that, the following interrupts have additional code:

All of my master PIC interrupts (32-39 in my case) write 0x20 to port 0x20 to notify the PIC that the interrupt has been handled.
All of my slave PIC interrupts (40-47 in my case) write 0x20 to both ports 0xa0 and 0x20 to notify both PICs that the interrupt has been handled.

Also, my timer interrupt handler (32 in my case) has additional code to handle task switching, but it is currently commented out. :)

All of my CPU exception interrupts (0-31) have the default behavior (incrementing a counter and IRET), but most of these will end up in an endless loop, or a triple fault if they are ever triggered. (I'm still working out how to handle these in my system...)

The bottom line is, in order for your system to remain stable, you must at least handle your master and slave PIC interrupts*, and have them notify the PIC that the interrupt has been handled, and then IRET back to the code that was interrupted.

* By default (as left by the BIOS), the master PIC's IRQs 0-7 are mapped to interrupt vectors 8-15 and the slave PIC's IRQs 8-15 to vectors 0x70-0x77, so you'll probably want to move them, at some point, so they don't conflict with the CPU exception interrupts (0-31).
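To make that concrete without real hardware, here is a rough sketch in C where port writes are only recorded in a log, so it can be run as an ordinary program. `mock_outb`, `on_interrupt` and the log are stand-ins invented for this example; in a kernel, `mock_outb` would be a real `out` instruction and `on_interrupt` would be reached from an assembly stub that ends in IRET. The remap sequence leaves out saving and restoring the interrupt masks and any I/O delays.

```c
#include <stddef.h>
#include <stdint.h>

#define PIC1_CMD  0x20
#define PIC1_DATA 0x21
#define PIC2_CMD  0xA0
#define PIC2_DATA 0xA1
#define PIC_EOI   0x20
#define PIC_BASE  32        /* first vector used for IRQs (32-47 here) */

/* Stand-in for the OUT instruction: writes are logged, not performed. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write port_log[32];
static size_t port_log_len;

static void mock_outb(uint16_t port, uint8_t value)
{
    if (port_log_len < sizeof port_log / sizeof port_log[0]) {
        port_log[port_log_len].port = port;
        port_log[port_log_len].value = value;
        port_log_len++;
    }
}

/* One 1-byte counter per interrupt vector. */
static uint8_t interrupt_count[256];

/* IRQs 8-15 come through the slave, so both chips need an EOI. */
static void pic_send_eoi(unsigned irq)
{
    if (irq >= 8)
        mock_outb(PIC2_CMD, PIC_EOI);
    mock_outb(PIC1_CMD, PIC_EOI);
}

/* Move the PIC vectors away from the CPU exceptions. */
static void pic_remap(uint8_t master_base, uint8_t slave_base)
{
    mock_outb(PIC1_CMD, 0x11);          /* ICW1: start init, expect ICW4 */
    mock_outb(PIC2_CMD, 0x11);
    mock_outb(PIC1_DATA, master_base);  /* ICW2: vector offsets */
    mock_outb(PIC2_DATA, slave_base);
    mock_outb(PIC1_DATA, 0x04);         /* ICW3: slave is on IRQ 2 */
    mock_outb(PIC2_DATA, 0x02);         /* ICW3: slave cascade identity */
    mock_outb(PIC1_DATA, 0x01);         /* ICW4: 8086 mode */
    mock_outb(PIC2_DATA, 0x01);
}

/* Default behavior: count the interrupt, and acknowledge PIC IRQs. */
static void on_interrupt(uint8_t vector)
{
    interrupt_count[vector]++;
    if (vector >= PIC_BASE && vector < PIC_BASE + 16)
        pic_send_eoi((unsigned)vector - PIC_BASE);
}
```

With this layout the startup code would call `pic_remap(32, 40)` once before enabling interrupts.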

Re: Interrupt Handling

Posted: Mon Oct 06, 2014 12:13 pm
by Isaac
Do I have to write code for IRQ2 (cascade for slave PIC)?