This may be a dumb question... I think I know what the answer is, but I just want to be sure.
Let's say you have an MP system (dual Xeons, hyperthreaded P4, whatever). If you intentionally cause a triple fault on one processor, will it cause all processors to reset? IIRC, triple fault asserts RESET#... It would make sense to me if this signal is sent to all processors, not just the one that triple-faulted. Just want to be sure. It's hard to find such a specific statement in the manuals.
Also, if triple-faulting does indeed reset all processors, what about using the 8042 trick? They both assert RESET#, right...?
In case you're wondering, I'm thinking about how to do a kind of "uber-panic" that would happen if an exception is raised during "panic" itself.
MP and the RESET# pin
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
Top three reasons why my OS project died:
- Too much overtime at work
- Got married
- My brain got stuck in an infinite loop while trying to design the memory manager
Re:MP and the RESET# pin
Hi,
Colonel Kernel wrote: This may be a dumb question... I think I know what the answer is, but I just want to be sure.
IMHO it's not a dumb question - I've been wondering about CPU reset and triple faults myself.
Colonel Kernel wrote: Let's say you have an MP system (dual Xeons, hyperthreaded P4, whatever). If you intentionally cause a triple fault on one processor, will it cause all processors to reset? IIRC, triple fault asserts RESET#... It would make sense to me if this signal is sent to all processors, not just the one that triple-faulted. Just want to be sure. It's hard to find such a specific statement in the manuals. Also, if triple-faulting does indeed reset all processors, what about using the 8042 trick? They both assert RESET#, right...?
I don't think you'll find this in the Intel manuals, as it depends on how the motherboard manufacturer wired things up. In the same way, the Intel CPU manuals don't specify whether a single CPU will reset after a triple fault: the actual reset is done by external electronics, and without them the chip simply halts.
For both of my dual-CPU servers a triple fault on any CPU does reset the entire computer, but I'm wondering if different computers handle triple faults differently. A server designed for fault tolerance might be able to use one CPU to restart another CPU if it triple faults, without bringing the whole computer down (i.e. a suitably designed OS would be able to recover). Another multi-CPU computer might wire the CPUs to warning LEDs, so that if one CPU triple faults it halts and a light comes on (and nothing resets).
In any case I think it's safe to assume that an 8042 reset would reset all CPUs, as it's a deliberate thing.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
Re:MP and the RESET# pin
<warning>purely speculative</warning>
I'd say that 3xfaulting one CPU should reset the whole computer. If things have been going *that* bad on one CPU, there is little chance the environment is still reliable on the other CPUs.
If that's not the way things happen, I'd expect the local APIC of the other CPUs to receive an Inter-Processor Interrupt notifying them of the other CPU's death, so the final answer would be in the local APIC datasheets...
Re:MP and the RESET# pin
This might seem like an even dumber question, but don't you guys have more important things to worry about?
Personally I can't even begin to speculate how far down the priority list multiprocessor support would go...
Re:MP and the RESET# pin
Hi,
Pype.Clicker wrote: I'd say that 3xfaulting one CPU should reset the whole computer. If things have been going *that* bad on one CPU, there is little chance the environment is still reliable on the other CPUs.
On my OS something as simple as messing up the kernel's ESP can cause a triple fault, but it's not something that'd be fatal. If the other CPU/s detected the triple fault they could terminate the thread/process that was running and try to continue (or at least attempt a soft shutdown - flushing data to disk, etc).
Pype.Clicker wrote: If that's not the way things happen, I'd expect the local APIC of the other CPUs to receive an Inter-Processor Interrupt notifying them of the other CPU's death, so the final answer would be in the local APIC datasheets...
...or the motherboard's NMI connections, or in the machine check MSRs after a machine check exception, or buried deep within motherboard-specific firmware/ACPI code, or BIOS code invoked via the system management interrupt.
Due to lack of available facts, I'm mostly using Intel's System Programmer's Guide as a basis for my guesses. Most specifically, the paragraph in section 5.14, "EXCEPTION AND INTERRUPT REFERENCE", in the part that discusses double faults:
"If another exception occurs while attempting to call the double-fault handler, the processor enters shutdown mode. This mode is similar to the state following execution of an HLT instruction. In this mode the processor stops executing instructions until an NMI interrupt, SMI interrupt, hardware reset, or INIT# is received. The processor generates a special bus cycle to indicate that it has entered shutdown mode. Software designers may need to be aware of the response of hardware when it goes into shutdown mode. For example, hardware may turn on an indicator light on the front panel, generate an NMI interrupt to record diagnostic information, invoke reset initialization, generate an INIT initialization, or generate an SMI. If any events are pending during shutdown, they will be handled after an wake event from shutdown is processed (for example, A20M# interrupts)."
IMHO for the purpose of resetting the computer it's safer to just use the 8042 reset.
I'm more interested in finding out if there are any computers/motherboards that don't reset all CPUs when one triple faults, and what I'd need to do to detect it.
Cheers,
Brendan
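For reference, the 8042 reset mentioned above could look roughly like the C sketch below. The port number (0x64), the "input buffer full" status bit (0x02) and the pulse-reset command (0xFE) are the standard PC keyboard controller values; everything else (the `port_io` struct, `kbc_pulse_reset`, the poll limit and the fake controller) is made up for illustration so the sequence can be tried in user space. Whether the resulting reset reaches every CPU still depends on the motherboard, as discussed in this thread.

```c
#include <stdint.h>

/* Port I/O is passed in so the sequence can be exercised outside a kernel.
 * In a real kernel these two hooks would wrap the IN and OUT instructions. */
struct port_io {
    uint8_t (*in8)(uint16_t port);
    void    (*out8)(uint16_t port, uint8_t value);
};

#define KBC_PORT        0x64  /* 8042 status (read) / command (write) port */
#define KBC_STATUS_IBF  0x02  /* input buffer full: controller is still busy */
#define KBC_PULSE_RESET 0xFE  /* pulse output line 0, the reset line on PCs */

/* Wait until the controller accepts a command, then ask it to pulse reset.
 * Returns 0 once the command was written, -1 if the controller stayed busy
 * for max_polls status reads. */
int kbc_pulse_reset(const struct port_io *io, unsigned max_polls)
{
    while (max_polls-- > 0) {
        if ((io->in8(KBC_PORT) & KBC_STATUS_IBF) == 0) {
            io->out8(KBC_PORT, KBC_PULSE_RESET);
            return 0;
        }
    }
    return -1;
}

/* Fake controller, purely an assumption for demonstration: it reports
 * "busy" for fake_busy_polls status reads and records the last command. */
static unsigned fake_busy_polls;
static uint8_t  fake_last_cmd;

static uint8_t fake_in8(uint16_t port)
{
    if (port == KBC_PORT && fake_busy_polls > 0) {
        fake_busy_polls--;
        return KBC_STATUS_IBF;
    }
    return 0;
}

static void fake_out8(uint16_t port, uint8_t value)
{
    if (port == KBC_PORT)
        fake_last_cmd = value;
}

static const struct port_io fake_io = { fake_in8, fake_out8 };
```

On real hardware the machine should reset shortly after the write, so a caller would normally follow a successful return with a halt loop, and treat -1 as the cue to try some other reset method.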
Re:MP and the RESET# pin
Brendan wrote: If another exception occurs while attempting to call the double-fault handler, the processor enters shutdown mode. This mode is similar to the state following execution of an HLT instruction. In this mode the processor stops executing instructions until an NMI interrupt, SMI interrupt, hardware reset, or INIT# is received. The processor generates a special bus cycle to indicate that it has entered shutdown mode. Software designers may need to be aware of the response of hardware when it goes into shutdown mode. For example, hardware may turn on an indicator light on the front panel, generate an NMI interrupt to record diagnostic information, invoke reset initialization, generate an INIT initialization, or generate an SMI. If any events are pending during shutdown, they will be handled after an wake event from shutdown is processed (for example, A20M# interrupts).
Would there be a way to connect an LED on my computer to this internal bit, so I could have a triple-faulted light on my computer? That'd kind of help in OS debugging to distinguish between infinite loops and triple faults...
- Colonel Kernel
- Member
- Posts: 1437
- Joined: Tue Oct 17, 2006 6:06 pm
- Location: Vancouver, BC, Canada
Re:MP and the RESET# pin
bubach wrote: This might seem like an even dumber question, but don't you guys have more important things to worry about? Personally I can't even begin to speculate how far down the priority list multiprocessor support would go...
I guess I'm just getting paranoid. I'd like to be able to handle really egregious failures in a predictable way.
What spooks me is that there is a lot of code in my equivalent of panic() that could cause exceptions (at least in theory). Rather than rely on panic() to not fail (code & pray is *not* failsafe IMO), I'd like to set things up so that if an exception happens while running within panic(), the machine will halt/reboot/whatever in a predictable manner. Then I thought about the MP case and realized that I didn't know for sure how it would work.
Paranoia can be hard on the brain...
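One way the "exception inside panic()" case could be caught is a re-entrancy flag checked at the very top of panic(). This is only a minimal sketch using C11 atomics; `panic_enter`, `panic_stage`, `hard_reset()` and `dump_diagnostics()` are hypothetical names, not taken from any real kernel.

```c
#include <stdatomic.h>

enum panic_stage {
    PANIC_FIRST,   /* first entry: reasonable to attempt full diagnostics */
    PANIC_NESTED   /* panic() was already running: take the minimal path */
};

/* Set by the first caller and never cleared afterwards. */
static atomic_flag panic_entered = ATOMIC_FLAG_INIT;

/* Atomically records that a panic is in progress and reports whether this
 * call is the first one or a re-entry. */
enum panic_stage panic_enter(void)
{
    return atomic_flag_test_and_set(&panic_entered) ? PANIC_NESTED
                                                    : PANIC_FIRST;
}

/*
 * Intended use (hard_reset() and dump_diagnostics() are hypothetical):
 *
 *   void panic(const char *msg)
 *   {
 *       if (panic_enter() == PANIC_NESTED)
 *           hard_reset();          // the "uber-panic": no more fancy code
 *       dump_diagnostics(msg);     // may fault and land back in panic()
 *       hard_reset();
 *   }
 */
```

Note that a single flag shared by all CPUs also sends a second CPU that panics at the same time down the nested path. If that matters, a per-CPU flag would presumably be needed to tell "this CPU faulted inside panic()" apart from "another CPU is already panicking".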