Recovering from exceptions?

pcmattman · Post by **pcmattman** » Fri Feb 23, 2007 4:06 pm

My OS is working all right, but I have one problem: when an exception fires (e.g. a GPF), my OS basically crashes and there is nothing I can do about it. I want to be able to say to the user "Hey you, a GPF just occurred! What on earth were you doing?" and then give them the option to continue running the OS or just sit there staring at some trippy black-and-white text...

How exactly do I recover from an exception to be able to continue where the exception left off?

mathematician · Post by **mathematician** » Sat Feb 24, 2007 11:17 am

Not much point in asking the user what he thinks he is up to, because a GPF would almost certainly be the result of a bug in the program, and there is nothing he can do about that. Although a General Protection Exception comes complete with an error code, which in theory might allow the handler to work out what's wrong and put it right, in practice probably all you can do is display an error message and close down the application.

The Intel manual gives a list of 31 possible causes of a GPF, and if you wanted to try to recover from the exception, your handler would have to treat each one separately.
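As a rough illustration of what the error code mentioned above contains, here is a minimal C sketch that splits a selector-style error code into its fields (bit 0 is the external-event flag, bits 1-2 select the descriptor table, bits 3-15 are the index). The names `gpf_error`, `decode_gpf_error` and `table_name` are made up for this example and are not from anyone's kernel in this thread. Note that the error code only says which selector or vector was involved, not which of the many possible causes applied.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded form of a selector-style error code (hypothetical struct). */
struct gpf_error {
    int external;    /* bit 0: event came from outside the program */
    int table;       /* bits 1-2: 0 = GDT, 2 = LDT, 1 or 3 = IDT */
    unsigned index;  /* bits 3-15: selector index or interrupt vector */
};

static struct gpf_error decode_gpf_error(uint32_t code)
{
    struct gpf_error e;
    e.external = (int)(code & 1u);
    e.table = (int)((code >> 1) & 3u);
    e.index = (code >> 3) & 0x1FFFu;
    return e;
}

static const char *table_name(int table)
{
    switch (table) {
    case 0:  return "GDT";
    case 2:  return "LDT";
    default: return "IDT";  /* 1 and 3 both have the IDT bit set */
    }
}

/* Small self-check that can be run on a host machine. */
static void check_gpf_decoding(void)
{
    /* 0x102 = (32 << 3) | 2: IDT entry 32, not an external event. */
    struct gpf_error e = decode_gpf_error(0x102);
    assert(e.external == 0 && e.table == 1 && e.index == 32);
}
```

An error code of 0 means no particular selector was involved, so a handler would still have to inspect the faulting instruction to learn more.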

Otter · Post by **Otter** » Sat Feb 24, 2007 11:59 am

Does your exception handler work? Then of course you can do some text output and ask the user.

But if you want the interrupted program to continue at the instruction after the one that raised the exception, you have a problem. The return address you get is the address of the instruction that raised the exception, not the address of the instruction after it, and it's impossible to find the address of the next instruction unless you have a disassembler module that can determine the length of an instruction.

But there is another way: you can pass the exception to the exception handler of the process that raised it. Let's say the process uses "try...catch"; execution can then continue at the "catch" part.

pcmattman · Post by **pcmattman** » Sat Feb 24, 2007 3:51 pm

The exception handler works, that's not a problem.

I'm assuming that it is very difficult, then, to return to the code just after the exception - what if I modified the EIP passed to the exception?

pcmattman · Post by **pcmattman** » Sat Feb 24, 2007 4:21 pm

Hmmm... it worked! All I have to do is increment the EIP that is passed to the exception handler (struct regs* r) and then let the stub surrounding the handler take control.

GLneo · Post by **GLneo** » Sat Feb 24, 2007 6:05 pm

ummm by how much are you incrementing the EIP?

pcmattman · Post by **pcmattman** » Sat Feb 24, 2007 6:08 pm

By one, to skip the code that brought the exception along.

GLneo · Post by **GLneo** » Sat Feb 24, 2007 6:38 pm

But instructions have variable length. It may work for a "cli", but some instructions are several bytes long and you'll end up jumping into the middle of an instruction. Like Otter said, you'll need "a disassembler-module which is able to determine the length of an instruction".
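To make the variable-length point concrete, here is a small C sketch of a length lookup for a handful of hand-picked 32-bit-mode opcodes. It is nowhere near a real length disassembler (no prefixes, no ModRM decoding), and the names `insn_length` and `skip_faulting_insn` are invented for this example. It only shows that the amount to add to EIP depends on the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sample encodings (32-bit protected mode, no prefixes). */
static const uint8_t sample_cli[]   = { 0xFA };                         /* cli */
static const uint8_t sample_int80[] = { 0xCD, 0x80 };                   /* int 0x80 */
static const uint8_t sample_mov[]   = { 0xB8, 0x78, 0x56, 0x34, 0x12 }; /* mov eax, 0x12345678 */

/* Length of a few known opcodes; 0 means "not handled by this sketch". */
static size_t insn_length(const uint8_t *code)
{
    assert(code != NULL);
    switch (code[0]) {
    case 0xFA: return 1;  /* cli */
    case 0xFB: return 1;  /* sti */
    case 0xF4: return 1;  /* hlt */
    case 0xCD: return 2;  /* int imm8 */
    case 0xE4: return 2;  /* in al, imm8 */
    case 0xE6: return 2;  /* out imm8, al */
    case 0xB8: return 5;  /* mov eax, imm32 */
    default:   return 0;
    }
}

/* Advance *eip past the instruction at `code`.
 * Returns 1 on success, 0 (leaving *eip untouched) if the length is unknown. */
static int skip_faulting_insn(uint32_t *eip, const uint8_t *code)
{
    size_t len = insn_length(code);
    if (len == 0)
        return 0;
    *eip += (uint32_t)len;
    return 1;
}
```

With `sample_mov`, adding one to EIP would resume execution at the byte 0x78, in the middle of the immediate operand. Even with the correct length, skipping an instruction does not undo the reason it faulted.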

Brendan · Post by **Brendan** » Sat Feb 24, 2007 6:48 pm

Hi,

pcmattman wrote: By one, to skip the code that brought the exception along.

How do you know that the instruction is one byte long?

How do you know later code doesn't rely on things that were meant to be done by the instruction you've skipped?

Do you test to see if the code that was running at the time actually caused the exception? For example, a messed-up IDT descriptor for an IRQ can cause a general protection fault (where the code that was running before the CPU tried to start the IRQ handler has absolutely nothing to do with the problem).

In general, when a critical error occurs (i.e. an exception that isn't intended, or some sort of condition detected by the kernel that isn't right), you want to put some debugging information somewhere (to help developers fix their problems) and terminate whatever caused the problem.

The debugging information could mean displaying the values that were in registers *before* the instruction tried to execute (e.g. blue screen of death) or adding these details to the end of a system log, it could be a core dump, or it could be a nice little dialog box saying "This program crashed, do you want to automatically email diagnostic information to the developers of this program?".

You can even have different actions depending on what crashed - for example, a blue screen of death if the kernel crashed, a dialog box if an application crashed and some details added to a log if a device driver crashes.

Cheers,

Brendan
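The idea of choosing a different action depending on what crashed could be sketched in C roughly as follows. The `struct regs` layout here is a hypothetical cut-down frame (the real one depends on the interrupt stub), and `in_driver` stands in for whatever bookkeeping the kernel uses to know that driver code was running. Both are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical saved-register frame; only the fields used here. */
struct regs {
    uint32_t int_no, err_code;
    uint32_t eip, cs, eflags;
};

enum fault_action {
    ACTION_PANIC,         /* kernel crashed: dump registers and halt */
    ACTION_KILL_PROCESS,  /* application crashed: report and terminate it */
    ACTION_LOG_DRIVER     /* driver crashed: append details to a log */
};

/* Pick an action from the privilege level of the interrupted code.
 * The low two bits of the saved CS hold the privilege level (3 = user). */
static enum fault_action choose_action(const struct regs *r, int in_driver)
{
    assert(r != 0);
    if ((r->cs & 3u) == 3u)
        return ACTION_KILL_PROCESS;
    if (in_driver)
        return ACTION_LOG_DRIVER;
    return ACTION_PANIC;
}
```

This only decides the policy; as noted above, the interrupted code is not necessarily the culprit, so a real handler would also look at the error code before blaming it.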

Mikae · Post by **Mikae** » Sun Feb 25, 2007 5:34 am

I haven't tried it yet, but maybe it will work: what if you put a trap gate instead of an interrupt gate in the descriptor serving the GPF, for example? AFAIR, a trap gate differs from an interrupt gate only in the value of EIP saved: with a trap gate, the EIP of the instruction after the one that raised the exception is saved. But I don't know whether the CPU allows this. I think the difference between traps and interrupts is hardwired in the CPU by vector number (exceptions 1 and 3 are traps, the others are interrupts), so it doesn't matter what is in the descriptor serving the interrupt, but maybe I am wrong.

Combuster · Post by **Combuster** » Sun Feb 25, 2007 6:53 am

Exceptions are either traps or faults: traps store the EIP of the next instruction, faults store the EIP of the failing instruction.

Interrupt gates and trap gates work differently: interrupt gates clear IF upon entry, while trap gates keep IF as-is.

Conclusion: the IDT has no influence on whether the EIP before or after the faulty instruction is stored.
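The two independent properties described above can be written down as a pair of small C helpers. This is a sketch with invented names (`classify_exception`, `gate_clears_if`); the vector classification follows the usual x86 exception table, and vectors whose class depends on the cause or that are reserved are lumped into `EXC_OTHER`.

```c
#include <assert.h>
#include <stdint.h>

enum exc_class { EXC_FAULT, EXC_TRAP, EXC_ABORT, EXC_OTHER };

/* The class is fixed by the vector number, not by the IDT entry. */
static enum exc_class classify_exception(unsigned vector)
{
    switch (vector) {
    case 3:   /* #BP breakpoint */
    case 4:   /* #OF overflow */
        return EXC_TRAP;   /* saved EIP points past the instruction */
    case 8:   /* #DF double fault */
    case 18:  /* #MC machine check */
        return EXC_ABORT;  /* saved EIP is not reliable */
    case 0: case 5: case 6: case 7:
    case 10: case 11: case 12: case 13: case 14:
    case 16: case 17: case 19:
        return EXC_FAULT;  /* saved EIP points at the failing instruction */
    default:
        /* #DB (1) can be a fault or a trap depending on its cause;
         * 2 is NMI; the rest are reserved or not covered here. */
        return EXC_OTHER;
    }
}

/* The gate type (low 4 bits of the descriptor's type/attribute byte)
 * only decides whether IF is cleared on entry:
 * 0xE = 32-bit interrupt gate, 0xF = 32-bit trap gate. */
static int gate_clears_if(uint8_t type_attr)
{
    return (type_attr & 0x0Fu) == 0x0Eu;
}

static void check_classification(void)
{
    assert(classify_exception(13) == EXC_FAULT);  /* #GP is always a fault */
    assert(gate_clears_if(0x8E) && !gate_clears_if(0x8F));
}
```

So installing the GPF handler with a type/attribute byte of 0x8F instead of 0x8E would leave interrupts enabled on entry, but the saved EIP would still point at the failing instruction.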