MP and the RESET# pin

Colonel Kernel · Post by **Colonel Kernel** » Fri Feb 11, 2005 12:41 am

This may be a dumb question... I think I know what the answer is, but I just want to be sure.

Let's say you have an MP system (dual Xeons, hyperthreaded P4, whatever). If you intentionally cause a triple fault on one processor, will it cause all processors to reset? IIRC, triple fault asserts RESET#... It would make sense to me if this signal is sent to all processors, not just the one that triple-faulted. Just want to be sure. It's hard to find such a specific statement in the manuals.

Also, if triple-faulting does indeed reset all processors, what about using the 8042 trick? They both assert RESET#, right...?

In case you're wondering, I'm thinking about how to do a kind of "uber-panic" that would happen if an exception is raised during "panic" itself.

Brendan · Post by **Brendan** » Fri Feb 11, 2005 1:24 am

Hi,

Colonel Kernel wrote: This may be a dumb question... I think I know what the answer is, but I just want to be sure.

IMHO it's not a dumb question - I've been wondering about CPU reset and triple faults myself..

Colonel Kernel wrote: Let's say you have an MP system (dual Xeons, hyperthreaded P4, whatever). If you intentionally cause a triple fault on one processor, will it cause all processors to reset? IIRC, triple fault asserts RESET#... It would make sense to me if this signal is sent to all processors, not just the one that triple-faulted. Just want to be sure. It's hard to find such a specific statement in the manuals.

Also, if triple-faulting does indeed reset all processors, what about using the 8042 trick? They both assert RESET#, right...?

I don't think you'll find this in Intel manuals as it depends on how the motherboard manufacturer wanted to wire things up. This is much the same as how the Intel CPU manuals don't specify if a single CPU will reset after a triple fault as the actual resetting is done by external electronics - the chip halts without it.
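
For reference, a common way to provoke a triple fault on purpose is to load an IDT descriptor with a zero limit and then raise any interrupt: the interrupt, the resulting general-protection fault, and the double fault all fail to find a gate. A minimal sketch, assuming ring 0 and GCC-style inline assembly; the names `idt_descriptor`, `empty_idt_descriptor` and `force_triple_fault` are made up for illustration, and what happens after the CPU shuts down is still up to the motherboard:

```c
#include <assert.h>
#include <stdint.h>

/* Operand layout expected by LIDT: a 16-bit limit followed by the linear
 * base address (32 bits in protected mode, 64 bits in long mode). */
struct idt_descriptor {
    uint16_t  limit;
    uintptr_t base;
} __attribute__((packed));

static_assert(sizeof(struct idt_descriptor) == sizeof(uint16_t) + sizeof(uintptr_t),
              "LIDT operand must not contain padding");

/* A descriptor with limit 0 is too small to hold even one gate. */
static inline struct idt_descriptor empty_idt_descriptor(void)
{
    struct idt_descriptor d = { .limit = 0, .base = 0 };
    return d;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Hypothetical helper: only meaningful at ring 0 (LIDT is privileged). */
static inline void force_triple_fault(void)
{
    struct idt_descriptor d = empty_idt_descriptor();

    __asm__ volatile ("lidt %0" : : "m"(d));  /* install the empty IDT */
    __asm__ volatile ("int3");                /* any interrupt now cannot be delivered */

    for (;;)                                  /* not expected to be reached */
        __asm__ volatile ("hlt");
}
#endif
```

Whether the resulting shutdown turns into a reset of one CPU, all CPUs, or nothing at all is exactly the board-specific part discussed here.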

For both of my dual-CPU servers a triple fault on any CPU does reset the entire computer, but I'm wondering if different computers handle triple faults differently. A server designed for fault tolerance might be able to use one CPU to restart another CPU if it triple faults, without bringing the whole computer down (i.e. a suitably designed OS would be able to recover). Another multi-CPU computer might wire the CPUs to warning LEDs, so that if one CPU triple faults it halts and a light comes on (and nothing resets).

In any case I think it's safe to assume that an 8042 reset would reset all CPUs, as it's a deliberate thing.

Cheers,

Brendan

Pype.Clicker · Post by **Pype.Clicker** » Fri Feb 11, 2005 2:52 am

<warning>purely speculative</warning>

I'd say that 3xfaulting one CPU should reset the whole computer. If things have been going *that* bad on one CPU, there is little chance the environment is still reliable on the other CPUs.

If that's not the way things happen, i'd expect the local APIC of other CPUs to receive an Inter-Processor Interrupt to notify it of the other CPU's death, so the final answer would be in the local APIC datasheets ...

bubach · Post by **bubach** » Fri Feb 11, 2005 6:42 am

This might seem like an even dumber question but, don't you guys have more important things to worry about?

Personally I can't even begin to speculate how far down the priority list multiprocessor support would go...

Brendan · Post by **Brendan** » Fri Feb 11, 2005 7:45 am

Hi,

Pype.Clicker wrote: I'd say that 3xfaulting one CPU should reset the whole computer. If things have been going *that* bad on one CPU, there is little chance the environment is still reliable on the other CPUs.

On my OS something as simple as messing up kernel's ESP can cause a triple fault, but it's not something that'd be fatal. If the other CPU/s detected the triple fault they could terminate the thread/process that was running and try to continue (or at least attempt to do a soft shutdown - flushing data to disk, etc).

Pype.Clicker wrote: If that's not the way things happen, i'd expect the local APIC of other CPUs to receive an Inter-Processor Interrupt to notify it of the other CPU's death, so the final answer would be in the local APIC datasheets ...

...or the motherboard's NMI connections, or in the machine check MSRs after a machine check exception, or buried deep within motherboard-specific firmware/ACPI code, or BIOS code invoked via the system management interrupt.

Due to lack of available facts, I'm mostly using Intel's System Programmer's Guide as a basis for my guesses. Most specifically, the paragraph in section 5.14, "EXCEPTION AND INTERRUPT REFERENCE" in the part that discusses double faults:

If another exception occurs while attempting to call the double-fault handler, the processor enters shutdown mode. This mode is similar to the state following execution of an HLT instruction. In this mode the processor stops executing instructions until an NMI interrupt, SMI interrupt, hardware reset, or INIT# is received. The processor generates a special bus cycle to indicate that it has entered shutdown mode. Software designers may need to be aware of the response of hardware when it goes into shutdown mode. For example, hardware may turn on an indicator light on the front panel, generate an NMI interrupt to record diagnostic information, invoke reset initialization, generate an INIT initialization, or generate an SMI. If any events are pending during shutdown, they will be handled after an wake event from shutdown is processed (for example, A20M# interrupts).

IMHO for the purpose of resetting the computer it's safer to just use the 8042 reset.
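
A minimal sketch of the 8042 reset, assuming the conventional PC keyboard-controller interface (status/command register at port 0x64, command 0xFE pulsing the reset line). The `port_io` hooks, `kbc_request_reset` and the `fake_*` helpers are hypothetical names used so the logic can be tried off-target; in a kernel the hooks would wrap the `in`/`out` instructions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KBC_STATUS_PORT     0x64u  /* read: status register, write: command register */
#define KBC_STATUS_IBF      0x02u  /* input buffer full: controller is still busy */
#define KBC_CMD_PULSE_RESET 0xFEu  /* pulse the output line wired to the reset logic */

/* Hypothetical port-I/O hooks (assumption: supplied by the kernel). */
struct port_io {
    uint8_t (*inb)(uint16_t port);
    void    (*outb)(uint16_t port, uint8_t value);
};

/* Polls the status register until the controller can accept a command, then
 * writes the reset command. Returns false if the controller stayed busy for
 * max_polls consecutive reads, in which case nothing is written. */
bool kbc_request_reset(const struct port_io *io, unsigned max_polls)
{
    assert(io != NULL && io->inb != NULL && io->outb != NULL);

    for (unsigned i = 0; i < max_polls; i++) {
        if ((io->inb(KBC_STATUS_PORT) & KBC_STATUS_IBF) == 0) {
            io->outb(KBC_STATUS_PORT, KBC_CMD_PULSE_RESET);
            return true;
        }
    }
    return false;
}

/* Fake controller for host-side experiments: busy for fake_busy_polls reads. */
static unsigned fake_busy_polls;
static unsigned fake_writes;
static uint16_t fake_last_port;
static uint8_t  fake_last_value;

static uint8_t fake_inb(uint16_t port)
{
    (void)port;
    if (fake_busy_polls > 0) {
        fake_busy_polls--;
        return KBC_STATUS_IBF;
    }
    return 0;
}

static void fake_outb(uint16_t port, uint8_t value)
{
    fake_last_port = port;
    fake_last_value = value;
    fake_writes++;
}

static const struct port_io fake_io = { fake_inb, fake_outb };
```

On real hardware a successful write is normally followed by a reset shortly afterwards; a caller would typically fall back to something else (a triple fault, or a `hlt` loop) if the function returns false or the reset never arrives.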

I'm more interested in finding out if there's any computers/motherboards that don't reset all CPUs when one triple faults, and what I'd need to do to detect it.

Cheers,

Brendan

Candy · Post by **Candy** » Fri Feb 11, 2005 8:10 am

Brendan wrote:
If another exception occurs while attempting to call the double-fault handler, the processor enters shutdown mode. This mode is similar to the state following execution of an HLT instruction. In this mode the processor stops executing instructions until an NMI interrupt, SMI interrupt, hardware reset, or INIT# is received. The processor generates a special bus cycle to indicate that it has entered shutdown mode. Software designers may need to be aware of the response of hardware when it goes into shutdown mode. For example, hardware may turn on an indicator light on the front panel, generate an NMI interrupt to record diagnostic information, invoke reset initialization, generate an INIT initialization, or generate an SMI. If any events are pending during shutdown, they will be handled after an wake event from shutdown is processed (for example, A20M# interrupts).

Would there be a way to connect an LED on my computer to this internal bit, so I could have a triple-faulted light on my computer? That'd kind of help in OS debugging to distinguish between infinite loops and triple faults...

Colonel Kernel · Post by **Colonel Kernel** » Fri Feb 11, 2005 10:42 am

bubach wrote: This might seem like an even dumber question but, don't you guys have more important things to worry about?
Personally I can't even begin to speculate how far down the priority list multiprocessor support would go...

I guess I'm just getting paranoid. I'd like to be able to handle really egregious failures in a predictable way.

What spooks me is that there is a lot of code in my equivalent of panic() that could cause exceptions (at least in theory). Rather than rely on panic() to not fail (code & pray is *not* failsafe IMO), I'd like to set things up so that if an exception happens while running within panic(), the machine will halt/reboot/whatever in a predictable manner. Then I thought about the MP case and realized that I didn't know for sure how it would work.
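
One way to get that predictability is a re-entrancy guard: the first entry into panic runs the full reporting path, and any later entry (for example from an exception handler that fires during reporting, or from another CPU) skips straight to a minimal last-resort action. A sketch under those assumptions, using C11 atomics; `panic_hooks`, `kernel_panic` and the `demo_*` helpers are hypothetical names, not from any particular kernel:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical hooks supplied by the kernel. */
struct panic_hooks {
    void (*report)(const char *msg);  /* full diagnostics path, may itself fault */
    void (*last_resort)(void);        /* minimal path, e.g. request a reset, then halt */
};

static const struct panic_hooks *installed_hooks;
static atomic_int panic_entries;      /* how many times panic was entered, on any CPU */

void panic_install_hooks(const struct panic_hooks *hooks)
{
    assert(hooks != NULL && hooks->report != NULL && hooks->last_resort != NULL);
    installed_hooks = hooks;
}

void kernel_panic(const char *msg)
{
    assert(installed_hooks != NULL);

    /* atomic_fetch_add returns the previous value, so only the very first
     * entrant sees 0; everyone else is a nested or concurrent panic. */
    if (atomic_fetch_add(&panic_entries, 1) != 0) {
        installed_hooks->last_resort();
        return;  /* only reached in the demo; a real last_resort never returns */
    }

    installed_hooks->report(msg);
    installed_hooks->last_resort();
}

/* Demo hooks: the report path "faults" by re-entering kernel_panic. */
static int demo_reports;
static int demo_resets;

static void demo_last_resort(void)
{
    demo_resets++;
}

static void demo_faulty_report(const char *msg)
{
    (void)msg;
    demo_reports++;
    kernel_panic("fault inside panic");  /* stands in for an exception handler */
}

static const struct panic_hooks demo_hooks = { demo_faulty_report, demo_last_resort };
```

In this sketch the exception handlers are assumed to route into `kernel_panic` again, so the guard is what turns a fault inside panic into the last-resort path. Because the demo hook returns, the demo reaches `last_resort` twice; on real hardware the first call would already have reset or halted the machine.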

Paranoia can be hard on the brain...

OSDev.org
