Re: The cost of a system call

Posted: Sun Dec 30, 2012 2:48 pm
by Gigasoft
Brendan wrote:IST doesn't work though. The first NMI switches to a specific stack, and before the NMI handler can execute its first instruction a second NMI could occur and trash the return RIP of the first NMI handler. You can't do anything to prevent this in the NMI handler (because you didn't get a chance to execute a single instruction).
That's not right. When an NMI occurs, NMIs become disabled until the next IRET instruction is executed. The NMI handler should be written so as not to invoke further exceptions or enable maskable interrupts. If returning from an MCE inside an NMI is desired, which is usually not the case, you can check if the saved RSP is within the NMI stack, and then perform a manual return without using IRET.

If an NMI handler needs to handle nested exceptions, it can do the following. Point the IST pointer 40 bytes (the size of one 64-bit interrupt frame) before the end of the stack. At the start of the handler, set the IST index to 0. If the saved RSP is not within the NMI stack, move the exception frame 40 bytes ahead and adjust RSP. At the end, set the IST pointer equal to the current RSP and restore the IST index. If the saved RSP was within the NMI stack, the NMI handler should just return immediately. To prevent a stack overflow, the MCE handler should still check whether RSP is within the NMI stack.
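The "saved RSP within the NMI stack" test that this scheme relies on could look roughly like the sketch below. It is only an illustration: `rsp_in_nmi_stack`, `initial_ist_pointer` and the struct name are hypothetical, the stack is assumed to occupy the half-open range [stack_base, stack_base + stack_size), and the frame relocation itself (which needs assembly) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Frame the CPU pushes when delivering an interrupt in 64-bit mode
 * (NMI has no error code): five 8-byte slots, 40 bytes in total. */
struct interrupt_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};
static_assert(sizeof(struct interrupt_frame) == 40, "frame must be 40 bytes");

/* Hypothetical helper: true if saved_rsp lies inside the NMI stack.
 * Assumption: the stack occupies [stack_base, stack_base + stack_size).
 * Unsigned subtraction wraps when saved_rsp < stack_base, so one
 * comparison covers both bounds. */
bool rsp_in_nmi_stack(uint64_t saved_rsp, uint64_t stack_base,
                      uint64_t stack_size)
{
    return saved_rsp - stack_base < stack_size;
}

/* Hypothetical helper: the initial IST pointer, one frame (40 bytes)
 * before the end of the stack. */
uint64_t initial_ist_pointer(uint64_t stack_base, uint64_t stack_size)
{
    return stack_base + stack_size - sizeof(struct interrupt_frame);
}
```

The handler would call `rsp_in_nmi_stack` on the `rsp` field of the frame it was entered with: a true result means the NMI interrupted code that was already running on the NMI stack, which is the case where it should return immediately.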

Re: The cost of a system call

Posted: Sun Dec 30, 2012 4:49 pm
by Brendan
Hi,
Gigasoft wrote:
Brendan wrote:IST doesn't work though. The first NMI switches to a specific stack, and before the NMI handler can execute its first instruction a second NMI could occur and trash the return RIP of the first NMI handler. You can't do anything to prevent this in the NMI handler (because you didn't get a chance to execute a single instruction).
That's not right. When an NMI occurs, NMIs become disabled until the next IRET instruction is executed. The NMI handler should be written so as not to invoke further exceptions or enable maskable interrupts. If returning from an MCE inside an NMI is desired, which is usually not the case, you can check if the saved RSP is within the NMI stack, and then perform a manual return without using IRET.
An NMI is meant to block further NMIs until the next IRET instruction, but sadly SMIs (and machine check exceptions) aren't blocked, and the firmware's SMI handler (and/or the OS's MCE handler) can execute IRET, causing NMIs to become unblocked. This means the worst case is an NMI, followed by an SMI that unblocks NMIs, followed by a second NMI. Basically, you can't assume that an NMI will actually block further NMIs.

The only way to defend against this is to avoid IST (and task gates in protected mode) and use a normal interrupt gate. Of course this has its own problems (an NMI at exactly the wrong time, e.g. when the kernel is in the middle of doing a task switch and half the CPU's state is wrong), but these problems are at least "work-around-able".
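To make the worst-case ordering concrete (NMI, then a handler that executes IRET, then a second NMI), here is a toy model of the blocking behaviour as described in this post. It is a simplified sketch, not a description of real hardware: all names are hypothetical, a blocked NMI is simply dropped rather than held pending, and "foreign IRET" stands for an IRET executed by the SMI or MCE handler while the first NMI handler is still running.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: tracks whether NMIs are blocked and how many NMI
 * handlers have been entered but not yet finished. */
struct cpu_model {
    bool nmi_blocked; /* set on NMI delivery, cleared by any IRET   */
    int  nmi_depth;   /* NMI handlers currently in progress         */
    int  max_depth;   /* deepest nesting seen so far                */
};

/* An NMI arrives. If NMIs are blocked it is not delivered (a real CPU
 * would hold one pending; this model just drops it). */
void cpu_nmi(struct cpu_model *c)
{
    if (c->nmi_blocked)
        return;
    c->nmi_blocked = true;
    c->nmi_depth++;
    if (c->nmi_depth > c->max_depth)
        c->max_depth = c->nmi_depth;
}

/* An IRET executed by some other handler (SMI firmware or the MCE
 * handler): NMIs become unblocked although the NMI handler has not
 * finished. */
void cpu_foreign_iret(struct cpu_model *c)
{
    c->nmi_blocked = false;
}

/* The IRET at the end of an NMI handler. */
void cpu_nmi_return(struct cpu_model *c)
{
    assert(c->nmi_depth > 0);
    c->nmi_depth--;
    c->nmi_blocked = false;
}
```

Starting from a zeroed `struct cpu_model`, calling `cpu_nmi` twice leaves `max_depth` at 1, while `cpu_nmi`, `cpu_foreign_iret`, `cpu_nmi` raises it to 2. With IST, that second delivery would start at the same stack address as the first, which is the problem being described.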


Cheers,

Brendan