
Fatal Interrupts?

Posted: Mon Feb 16, 2009 1:17 pm
by Creature
I was wondering about processor interrupts; suppose you get a processor exception (like a page fault): when should it be considered fatal? What I mean is: if you get a page fault, should you shut down or restart your OS for safety reasons, or can you just 'warn' the user of the page fault and continue happily? I'm not really sure what to do, as a divide-by-zero exception seems pointless to reboot for, whilst something more serious like a page fault seems more or less too dangerous to continue from.

When is it best to continue, and when should I shut down?

Creature

Re: Fatal Interrupts?

Posted: Mon Feb 16, 2009 1:33 pm
by EQvan
This is a pretty wide-open question. Many kinds of faults are part of normal operation. A page fault, in particular, indicates an attempt to reference an area of virtual memory that isn't currently present in physical RAM, and they happen all the time on a busy system. Normally your application should never see a page fault -- the memory manager should handle it by swapping the desired page into RAM and resuming the faulting instruction.

In general, an error is "fatal" only if your code can't recover from it. If an arithmetic error corrupts your data in unknown ways, you may have to report this to the user and exit the program. If the error only corrupts the current activity, however, report that, and continue on.

Many types of software (embedded, real-time, or unattended systems, for example) can't report an error to a user. This kind of software must be "fault-tolerant", i.e., it must always be able to recover, even if, in the worst case, the recovery proceeds by way of a reboot. That really is a last resort, though -- imagine a flight-control device aboard an aircraft that just rebooted every time it wasn't sure of itself! Or a fire-alarm system, or even a soda machine.

I'll go out on a bit of a limb (because as soon as I say "never", someone will be sure to come up with an exception), but for an ordinary PC-based application program, there should never be a time when fault-handling results in a reboot.

Here's a link that will tell you more than you probably want to know about fault handling; you can find more by entering "fault handling" into Google.

http://www.eventhelix.com/RealtimeMantra/FaultHandling/

Re: Fatal Interrupts?

Posted: Mon Feb 16, 2009 1:42 pm
by teraflop
You should probably design your kernel such that processor exceptions can't happen in kernel mode. If you do get, say, a page fault, that probably means some important data structure got corrupted, and thus there's not much you can do anyway besides panic. (Bear in mind that there's a difference between exceptions, which are generated by the CPU, and interrupts, which come from external hardware and don't mean anything bad has happened.)

In user mode, you have a lot more options. You certainly don't need (or want) to panic because an application tries to dereference a null pointer; you can just terminate the offending process and move on. If it's a hardware interrupt, it's more reasonable to suspend the process, handle the interrupt, and resume where you left off.

For more flexibility, the Unix strategy is to translate exceptions into signals, and push them down to the application level. By default, exceptions are assumed to be unrecoverable, but an application that wants to handle page faults etc. can make a system call to register its own handler. In that case, you would want to switch the process's context to the handler and probably provide it with some way to resume if it wants to.

Re: Fatal Interrupts?

Posted: Tue Feb 17, 2009 9:33 am
by Creature
Thanks for the information. I was getting confused about whether or not to make things fatal. Perhaps I should just (for debugging purposes) print a message that something bad happened, so that I know something is wrong somewhere that needs fixing, instead of bringing down the entire OS for a single CPU exception.