OSDev.org

Posted: **Sat Jan 30, 2021 8:35 am**

I know that when a CPU makes a call, it pushes the (R)(E)IP register onto the stack.

When a CPU jumps to an ISR, it does that and pushes a few more pieces of data.

So, with non-fatal CPU exceptions, like Divide by Zero errors, (of course talking about userland here, if any exception arises from my kernel, it is a cause for concern), could you simply resolve the exception by moving the IP forward to the next instruction?

Any insight is appreciated!

Posted: **Sat Jan 30, 2021 9:17 am**

rizxt wrote:I know that when a CPU makes a call, it pushes the (R)(E)IP register onto the stack.

When a CPU jumps to an ISR, it does that and pushes a few more pieces of data.

So, with non-fatal CPU exceptions, like Divide by Zero errors, (of course talking about userland here, if any exception arises from my kernel, it is a cause for concern), could you simply resolve the exception by moving the IP forward to the next instruction?

Any insight is appreciated!

If you know what you're doing and it doesn't cause unwanted side effects, I suppose you can. My exception handler deals with invalid selector loads by loading a null selector instead; if it wasn't just a load and the selector is actually used, it will fault later at the point of use. Most invalid selector loads in multicore systems are just reloads that are never used, so this eliminates those exceptions without affecting real invalid usages. Actually, this is how Intel / AMD should have defined it in the first place.

OTOH, I decode the instruction and emulate the result by loading zero. Just skipping it would not work. In the case of divide by zero, you could place maxint in the result and then skip the instruction, something that should be relatively safe to do, but this also requires decoding the instruction.
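The "place maxint in the result and then skip" idea could look roughly like the sketch below, for the 32-bit unsigned DIV case only. The `trap_frame` layout and the function name are hypothetical, and `insn_len` is assumed to come from an instruction decoder. It is written as plain user-space C so it can be compiled outside a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical saved-register frame; a real kernel defines its own layout. */
struct trap_frame {
    uint64_t rax;
    uint64_t rdx;
    uint64_t rip;   /* for a #DE fault this points AT the faulting DIV */
};

/*
 * Emulate a faulting 32-bit unsigned DIV: saturate the quotient, clear the
 * remainder, then resume after the instruction. insn_len must come from a
 * decoder, because x86 instruction lengths vary.
 */
static void emulate_div32_fault(struct trap_frame *tf, size_t insn_len)
{
    assert(tf != NULL);
    tf->rax = UINT32_MAX;  /* quotient goes to EAX; 32-bit writes zero-extend */
    tf->rdx = 0;           /* remainder goes to EDX */
    tf->rip += insn_len;   /* step over the DIV */
}
```

Whether a saturated quotient is an acceptable answer is a policy decision for the OS; the program still computed something it did not ask for.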

I also use the exception handler to resolve syscalls by modifying the code in two steps.

Posted: **Sat Jan 30, 2021 10:27 am**

rizxt wrote:I know that when a CPU makes a call, it pushes the (R)(E)IP register onto the stack.

When a CPU jumps to an ISR, it does that and pushes a few more pieces of data.

So, with non-fatal CPU exceptions, like Divide by Zero errors, (of course talking about userland here, if any exception arises from my kernel, it is a cause for concern), could you simply resolve the exception by moving the IP forward to the next instruction?

Any insight is appreciated!

The only thing worse than an application crashing is an application NOT crashing when it should and providing incorrect data.

Divide by 0 is undefined, therefore your program causing it is invalid and needs to be fixed.

Posted: **Sat Jan 30, 2021 1:56 pm**

The CPU is a fixed function: if you feed it certain inputs while it is in a certain state, it will produce the expected output, unless there's a hardware bug.

Thus, inside an ISR, if you set the return address and execute iret, the CPU will continue from the address you set. This is how "software multitasking" works, and all mainline OSes use it all the time. The CPU couldn't care less; it is probably even optimized for such usage.

But you have to make the call as to whether altering program state like this in software is a good idea. Also keep in mind that x86 is trickier because it is a variable-length instruction set, so you might need to parse the machine code around the faulting PC to do this reliably.

An example of where you might want to do things like this:

Say you have a test app that handles a segfault by fixing a bad pointer. "Everything" works, except that GCC generates code that preloads the bad pointer into a register before using it, so the segfault handler can't fix the problem effectively: simply fixing the pointer in memory and returning would trigger another segfault right away.

In this case, the code that handles the handler's return can try to determine "Oh, this stupid segfault test app again" and actually back up the PC to before the instruction that preloads the bad pointer, so the fault can be fixed.

Sounds like an ugly patch? Yes, it truly is. So don't use this too liberally.

Posted: **Sat Jan 30, 2021 9:12 pm**

I really, really wouldn't do this, for two reasons: (1) x86 is a variable-length instruction set, so you have no idea how long an instruction is without decoding it (other than that it can be at most 15 bytes), and (2) this will just cause other issues in userland. If you skip instructions like this, you'll just end up producing undefined behavior, because you have no idea what the program is doing when that ISR is invoked. All you know is that it happened and it's your job to fix it somehow. Though this does make me wonder how other OSes handle a divide-by-zero error and other errors like that.

Posted: **Sat Jan 30, 2021 9:13 pm**

Isn't the div instruction always two operands??

Posted: **Sat Jan 30, 2021 9:14 pm**

Also, other OSes handle this by: throwing an exception, or throwing a wrench into the turning gears of a program, or just killing a program.

Posted: **Sat Jan 30, 2021 9:55 pm**

Hi,

The DIV instruction can still be 2-15 bytes depending on its encoding. You would need to decode it if the intent is to step over it. However... the result of integer division by zero is not infinity. It isn't 0. It is undefined. If you were to skip over it, the program would most probably just crash over and over again. It is a fatal error because it is, mathematically, not resolvable.
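To give a feel for what "decode it" involves, here is a rough sketch of a length decoder that only understands DIV/IDIV r/m (opcodes F6 and F7 with /6 or /7) and assumes 64-bit mode. The function names are made up for illustration; a real kernel would more likely use a full instruction decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero for legacy prefixes that may precede the opcode. */
static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2E: case 0x36: case 0x3E: /* ES/CS/SS/DS override */
    case 0x64: case 0x65:                       /* FS/GS override */
    case 0x66: case 0x67:                       /* operand/address size */
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK/REPNE/REP */
        return 1;
    default:
        return 0;
    }
}

/*
 * Length in bytes of a DIV/IDIV r/m instruction at code[0..avail), assuming
 * 64-bit mode. Returns 0 if the bytes are not DIV/IDIV or are truncated.
 */
static size_t div_insn_length(const uint8_t *code, size_t avail)
{
    size_t i = 0;

    assert(code != NULL);
    while (i < avail && is_legacy_prefix(code[i]))
        i++;
    if (i < avail && (code[i] & 0xF0) == 0x40)   /* optional REX prefix */
        i++;
    if (i + 2 > avail)
        return 0;                                /* need opcode + ModRM */

    uint8_t opcode = code[i++];
    uint8_t modrm  = code[i++];
    uint8_t mod = modrm >> 6, reg = (modrm >> 3) & 7, rm = modrm & 7;

    if ((opcode != 0xF6 && opcode != 0xF7) || (reg != 6 && reg != 7))
        return 0;                                /* /6 = DIV, /7 = IDIV */

    if (mod != 3) {                              /* memory operand */
        uint8_t base = rm;
        if (rm == 4) {                           /* SIB byte follows */
            if (i >= avail)
                return 0;
            base = code[i++] & 7;
        }
        if (mod == 1)
            i += 1;                              /* disp8 */
        else if (mod == 2 || (mod == 0 && base == 5))
            i += 4;                              /* disp32 */
    }
    return (i <= avail && i <= 15) ? i : 0;
}
```

For example, `F7 F1` (`div ecx`) decodes as 2 bytes, while `48 F7 F1` (`div rcx`) is 3 because of the REX prefix.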

OSes check for an active debugger and send a signal to it. Kernels and boot loaders can do this as well by catching exceptions and sending a request to an active debugger. The original cause should be corrected, not ignored.

Posted: **Sat Jan 30, 2021 9:59 pm**

Aight so kudos.

I understand division by zero should be resolved differently depending on scenario:

1) if it's kernel, crash
2) if it's userland code, kill it

Posted: **Sat Jan 30, 2021 11:10 pm**

Division by zero is the only sketchy exception on x86, where anything you said could apply. That is because most other architectures don't trap on division by zero, but return INT_MIN or just some undefined value and set an overflow flag. I mean, what are you going to do for an unresolvable page fault?

In the end, you will have to figure out how to deal with faulting processes. The most common approach is to somehow throw an exception into userspace. Each OS has a different mechanism for that, each with benefits and drawbacks. The POSIX way is to just use signals, SIGFPE in your case here. Send that signal to the process. If the signal is blocked or ignored, unblock it and restore the default handler. The default action is then either to terminate or to terminate with a core dump. This leaves room for a language runtime (e.g. Java) to handle exceptions on its own, with its own mechanism, without any prescriptions from the kernel.

Terminating the process should always be an overridable default.
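The "unblock it and restore the default handler" rule can be sketched on the kernel side roughly as follows. The `proc_signals` structure, the handler constants and `force_fault_signal` are all hypothetical names for illustration, and the signal number 8 for SIGFPE is the Linux value, assumed here.

```c
#include <assert.h>
#include <stdint.h>

#define SIGFPE_NUM 8           /* SIGFPE's number on Linux; an assumption */
#define NSIG_MAX   32

typedef void (*sig_handler_t)(int);
#define HANDLER_DEFAULT ((sig_handler_t)0)
#define HANDLER_IGNORE  ((sig_handler_t)1)

/* Hypothetical per-process signal state. */
struct proc_signals {
    uint32_t      blocked;             /* bit n set => signal n is blocked */
    uint32_t      pending;             /* bit n set => signal n is queued */
    sig_handler_t handler[NSIG_MAX];
};

/*
 * Deliver a signal caused by a CPU fault. A blocked or ignored fault signal
 * cannot be left as-is (the faulting instruction would just re-execute), so
 * it is unblocked and reset to the default action, which terminates.
 * A handler the process installed and did not block is left in place.
 */
static void force_fault_signal(struct proc_signals *ps, int sig)
{
    assert(ps != 0 && sig > 0 && sig < NSIG_MAX);
    uint32_t bit = UINT32_C(1) << sig;

    if ((ps->blocked & bit) || ps->handler[sig] == HANDLER_IGNORE) {
        ps->blocked &= ~bit;
        ps->handler[sig] = HANDLER_DEFAULT;
    }
    ps->pending |= bit;
}
```

A #DE handler would call something like `force_fault_signal(ps, SIGFPE_NUM)` and then return to the normal signal-delivery path.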

Posted: **Mon Feb 01, 2021 2:41 am**

nullplan wrote: In the end, you will have to figure out how to deal with faulting processes. The most common approach is to somehow throw an exception into userspace. Each OS has a different mechanism for that, each with benefits and drawbacks. The POSIX way is to just use signals, SIGFPE in your case here. Send that signal to the process. If the signal is blocked or ignored, unblock it and restore the default handler. The default action is then either to terminate or to terminate with a core dump. This leaves room for a language runtime (e.g. Java) to handle exceptions on its own, with its own mechanism, without any prescriptions from the kernel.

Terminating the process should always be an overridable default.

In general, it's the compiler's runtime library that determines how exception handling is supposed to work, not the OS. The application will typically hook exception vectors for its exception handlers, and the OS will then deliver exceptions to them as they happen. Signals are just one specific way of doing it. Win32 sets up exception vectors by taking over the exception handlers. Then there is usually a debugger API too, which a debugger can use to tell the OS how to deliver exceptions to the debugger.

Posted: **Mon Feb 01, 2021 2:47 am**

xeyes wrote: Say you have a test app that handles a segfault by fixing a bad pointer. "Everything" works, except that GCC generates code that preloads the bad pointer into a register before using it, so the segfault handler can't fix the problem effectively: simply fixing the pointer in memory and returning would trigger another segfault right away.

In this case, the code that handles the handler's return can try to determine "Oh, this stupid segfault test app again" and actually back up the PC to before the instruction that preloads the bad pointer, so the fault can be fixed.

Sounds like an ugly patch? Yes, it truly is. So don't use this too liberally.

I don't fix up corrupt flat segment registers in user space. That would mask problems with modifying them in a syscall and not restoring them. Actually, I only fix it when a segment register is popped from the stack or loaded from a register. If it's loaded from memory, I don't fix it. This makes it easier, since only a few opcodes are involved.

Posted: **Mon Feb 01, 2021 1:11 pm**

rizxt, I saw something similar, though somewhat different, in development versions of MS Windows Server 2003 x64. Those builds did not skip the faulting instruction; they patched it and then re-executed it (this time without causing the exception).

The exceptions were generated by a misaligned stack when an XMM instruction referenced memory through the stack pointer. The stack was expected to be 16-byte aligned, but when it was mistakenly misaligned (the RSP value ended in 8 rather than 0 in hexadecimal), executing the aligned form of the instruction raised an exception, at which point the OS patched the instruction from the aligned to the unaligned form, like from this:

movaps xmm11,[rsp+xxx0]

into this:

movups xmm11,[rsp+xxx0]

Later it disappeared from the OS kernel; it was there only temporarily, to speed up development, with the problems to be fixed later.
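At the byte level, such a patch is small: MOVAPS is encoded as 0F 28 (load) or 0F 29 (store), and MOVUPS as 0F 10 or 0F 11, with an optional REX prefix in front for registers like xmm11. The sketch below only illustrates that rewrite; the function name is made up, it is not how Windows implemented it, and a real kernel would also have to make the code page writable and keep other cores coherent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rewrite MOVAPS (0F 28 load / 0F 29 store) into MOVUPS (0F 10 / 0F 11) in
 * place. Returns 1 if the bytes were patched, 0 if they were not MOVAPS.
 */
static int patch_movaps_to_movups(uint8_t *code, size_t avail)
{
    size_t i = 0;

    assert(code != NULL);
    if (i < avail && (code[i] & 0xF0) == 0x40)  /* optional REX (xmm8-15) */
        i++;
    if (i + 2 > avail || code[i] != 0x0F)
        return 0;

    switch (code[i + 1]) {
    case 0x28: code[i + 1] = 0x10; return 1;    /* load:  movaps -> movups */
    case 0x29: code[i + 1] = 0x11; return 1;    /* store: movaps -> movups */
    default:   return 0;
    }
}
```

Only the second opcode byte changes, so the instruction keeps its length and the faulting address can simply be re-executed.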

Can you resolve CPU exceptions by skipping the instructions?
