8259A vector base reset hangs the system
Posted: Sun Nov 23, 2014 3:23 am
by InsoReiges
Hi,
I have a problem that I am not sure how to diagnose further. I am writing a piece of real-mode code that resets the 8259A PIC vector bases and then calls INT13h. Sometimes, after the reset, INT13h stops working and the system hangs completely on an ASUS P77 motherboard. I've spent a lot of time trying to diagnose this and managed to isolate the following test case, which reproduces the problem 100% of the time:
while(1) {
1. We are in real mode. PIC vector bases are configured to 0x08, 0x70.
2. Read PIC mask registers and save them
3. Reset PIC vector bases to 0x08, 0x70, i.e. do not actually change anything, just touch both PICs
4. Write PIC mask registers from saved values.
5. Call int13h to read a sector from disk
}
As you can see no RM/PM mode switches are happening and PICs are not really being reconfigured, just reset to the same configuration as they are currently already in.
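For readers following along, steps 2 to 4 of the loop can be sketched roughly as below. This is not the poster's code; it is the conventional ICW1..ICW4 sequence for a cascaded pair of 8259As at the standard PC ports, written as host-side C99 so it can be exercised without hardware. The `port_io` hooks, `pic_reset_same_bases` and the `mock_*` helpers are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PC/AT I/O ports of the master and slave 8259A. */
#define PIC1_CMD  0x20
#define PIC1_DATA 0x21
#define PIC2_CMD  0xA0
#define PIC2_DATA 0xA1

/* Hypothetical port-I/O hooks; on real hardware they would wrap IN and OUT. */
typedef struct {
    uint8_t (*inb)(uint16_t port);
    void    (*outb)(uint16_t port, uint8_t value);
} port_io;

/* Steps 2-4 of the loop: save masks, resend ICW1..ICW4, restore masks. */
static void pic_reset_same_bases(const port_io *io,
                                 uint8_t master_base, uint8_t slave_base)
{
    uint8_t mask1 = io->inb(PIC1_DATA);  /* step 2: read the mask registers */
    uint8_t mask2 = io->inb(PIC2_DATA);

    io->outb(PIC1_CMD,  0x11);           /* ICW1: edge-triggered, cascade, ICW4 follows */
    io->outb(PIC2_CMD,  0x11);
    io->outb(PIC1_DATA, master_base);    /* ICW2: vector base */
    io->outb(PIC2_DATA, slave_base);
    io->outb(PIC1_DATA, 0x04);           /* ICW3: slave attached to IR2 */
    io->outb(PIC2_DATA, 0x02);           /* ICW3: slave cascade identity 2 */
    io->outb(PIC1_DATA, 0x01);           /* ICW4: 8086/8088 mode */
    io->outb(PIC2_DATA, 0x01);

    io->outb(PIC1_DATA, mask1);          /* step 4: restore the mask registers */
    io->outb(PIC2_DATA, mask2);
}

/* Recording mock so the write sequence can be inspected without hardware. */
typedef struct { uint16_t port; uint8_t value; } port_write;
static port_write mock_log[16];
static size_t     mock_count;
static uint8_t    mock_imr[2];           /* [0] = master, [1] = slave */

static uint8_t mock_inb(uint16_t port)
{
    return mock_imr[port == PIC2_DATA ? 1 : 0];
}

static void mock_outb(uint16_t port, uint8_t value)
{
    assert(mock_count < sizeof mock_log / sizeof mock_log[0]);
    mock_log[mock_count].port  = port;
    mock_log[mock_count].value = value;
    mock_count++;
}
```

Calling `pic_reset_same_bases(&io, 0x08, 0x70)` with `io` pointing at the mock hooks records ten port writes. Real code would replace the hooks with actual port I/O and would normally run the whole sequence with interrupts disabled; some implementations also insert a short I/O delay between writes.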
After several hundred iterations INT13h returns ETIMEOUT, and soon after that the system hangs completely. The sector number being read is different every time. If I omit step 3, everything works properly. If I manage to repeat step 3 before the system hangs, everything goes back to normal for a while, until it happens again hundreds of attempts later.
I have double-checked my PIC reset code, compared it to the Linux kernel and the OSDev examples, and even disassembled the BIOS on this MB to see how it's done there, and I am pretty sure I've done it properly. So what the hell is going on? Faulty hardware? Any ideas on how I can diagnose this further?
Re: 8259A vector base reset hangs the system
Posted: Sun Nov 23, 2014 4:02 am
by DLBuunk
Just to be sure, is the interrupt flag cleared during step 3? Or are timer IRQs masked in another way?
The only thing I can think of is that IRQ0 comes in during the reset and messes it up somehow.
Re: 8259A vector base reset hangs the system
Posted: Sun Nov 23, 2014 4:18 am
by InsoReiges
DLBuunk wrote:Just to be sure, is the interrupt flag cleared during step 3? Or are timer IRQs masked in another way?
The only thing I can think of is that IRQ0 comes in during the reset and messes it up somehow.
Yes, the interrupt flag is cleared at the start of the PIC reset and restored afterwards. IRQs are not masked on the PIC, however.
What I will try next is to hook IRQ0 and some others, log whether they have been triggered, and then call the original handler. I suspect that the PICs are somehow put into an internal error state during the reset and IRQs stop firing, hence the timeout error from INT13h. I will also try disabling NMI, but I haven't seen anyone do that and really have no idea how that might be a problem. I will also try to trace the int13h call to see which error code path actually makes it return an error. This will take some time, however, and I was hoping somebody on osdev might have seen this problem before.
Also, as far as I know this happens only on this single MB. However, I don't have a lot of test configs, and all of them have the same AMI BIOS on them, though the hardware differs.
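Besides hooking handlers, the PICs themselves can be asked what they think is pending or in service. A minimal sketch, assuming the standard 8259A OCW3 read commands (0x0A selects the IRR, 0x0B selects the ISR, then read the command port); the `port_io` hooks and the `fake_*` helpers are hypothetical and exist only so the snippet runs off-target.

```c
#include <stdint.h>

#define PIC1_CMD      0x20
#define PIC2_CMD      0xA0
#define OCW3_READ_IRR 0x0A   /* next read of the command port returns IRR */
#define OCW3_READ_ISR 0x0B   /* next read of the command port returns ISR */

/* Hypothetical port-I/O hooks; on real hardware they would wrap IN and OUT. */
typedef struct {
    uint8_t (*inb)(uint16_t port);
    void    (*outb)(uint16_t port, uint8_t value);
} port_io;

/* Returns the slave's register in the high byte, the master's in the low byte. */
static uint16_t pic_read_reg(const port_io *io, uint8_t ocw3)
{
    uint8_t lo, hi;
    io->outb(PIC1_CMD, ocw3);
    io->outb(PIC2_CMD, ocw3);
    lo = io->inb(PIC1_CMD);
    hi = io->inb(PIC2_CMD);
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

static uint16_t pic_read_irr(const port_io *io) { return pic_read_reg(io, OCW3_READ_IRR); }
static uint16_t pic_read_isr(const port_io *io) { return pic_read_reg(io, OCW3_READ_ISR); }

/* Tiny fake of the two chips, just enough to exercise the helpers above. */
typedef struct { uint8_t irr, isr, selected; } fake_pic;
static fake_pic fake_master, fake_slave;

static void fake_outb(uint16_t port, uint8_t value)
{
    fake_pic *p = (port == PIC1_CMD) ? &fake_master : &fake_slave;
    if (value == OCW3_READ_IRR || value == OCW3_READ_ISR)
        p->selected = value;
}

static uint8_t fake_inb(uint16_t port)
{
    const fake_pic *p = (port == PIC1_CMD) ? &fake_master : &fake_slave;
    return (p->selected == OCW3_READ_ISR) ? p->isr : p->irr;
}
```

Logging `pic_read_irr()` and `pic_read_isr()` just before and just after the reset, and again when INT13h times out, would show whether a request was pending when the PIC was reinitialised and whether anything is still marked in service afterwards.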
Re: 8259A vector base reset hangs the system
Posted: Sun Nov 23, 2014 4:25 am
by DLBuunk
Can you try masking all IRQs in the PIC before the reset procedure? Clearing the interrupt flag doesn't prevent incoming IRQs from going through the PIC.
And I don't see how NMI can influence any of this, since it is completely separate from the PIC.
Re: 8259A vector base reset hangs the system
Posted: Sun Nov 23, 2014 4:27 am
by InsoReiges
DLBuunk wrote:Can you try masking all IRQs in the PIC before the reset procedure? Clearing the interrupt flag doesn't prevent incoming IRQs from going through the PIC.
And I don't see how NMI can influence any of this, since it is completely separate from the PIC.
I will try that. I don't see how NMI can influence this either, it's just that I am out of ideas.
Re: 8259A vector base reset hangs the system
Posted: Mon Nov 24, 2014 6:08 am
by InsoReiges
DLBuunk wrote:Can you try masking all IRQs in the PIC before the reset procedure? Clearing the interrupt flag doesn't prevent incoming IRQs from going through the PIC.
And I don't see how NMI can influence any of this, since it is completely separate from the PIC.
Masking all IRQs on both PICs before the reset didn't help; the problem persists.
Re: 8259A vector base reset hangs the system
Posted: Tue Nov 25, 2014 4:25 am
by InsoReiges
I've managed to find another motherboard (ASUS P7P55D with the Intel P55 chipset) and the effect reproduces on it as well. Other configurations work as intended. I've tried looking up errata for the P55 chipset and/or the motherboard, but found nothing.
Re: 8259A vector base reset hangs the system
Posted: Tue Nov 25, 2014 7:31 am
by Schol-R-LEA
Do you have the code available to be viewed anywhere (e.g., GitHub, Sourceforge, Google Code), or if not, could you Pastebin the relevant section? It may be useful to see the actual code itself.
Also, are the majority of the systems you have tested this code on older models, ones which would not have had an APIC/IOAPIC combination? I am wondering if it is some odd interaction with the IOAPIC somehow that is at fault (which shouldn't be the case, as it shouldn't be active so long as the PIC is enabled, but you never know). It's a highly unlikely possibility, but one I thought I'd mention.
Re: 8259A vector base reset hangs the system
Posted: Tue Nov 25, 2014 8:09 am
by InsoReiges
Schol-R-LEA wrote:Do you have the code available to be viewed anywhere (e.g., GitHub, Sourceforge, Google Code), or if not, could you Pastebin the relevant section? It may be useful to see the actual code itself.
Also, are the majority of the systems you have tested this code on older models, ones which would not have had an APIC/IOAPIC combination? I am wondering if it is some odd interaction with the IOAPIC somehow that is at fault (which shouldn't be the case, as it shouldn't be active so long as the PIC is enabled, but you never know). It's a highly unlikely possibility, but one I thought I'd mention.
No, I think all tested systems have an APIC/IOAPIC, but I guess the legacy BIOS leaves them disabled at boot time.
About public code: I think that can be done, but it will take some time.
Re: 8259A vector base reset hangs the system
Posted: Wed Nov 26, 2014 11:38 pm
by InsoReiges
Ok guys, here's the test code. Compiled with MSVC 1.52, runs in x86 real mode directly after booting from MBR. Sample log output also attached
https://gist.github.com/anonymous/1a0a0025100a1a21b5c9
Re: 8259A vector base reset hangs the system
Posted: Thu Nov 27, 2014 2:31 am
by Brendan
Hi,
Um, that's extremely racy. Don't do it.
In general, only one piece of code can handle a device at a time (and it makes no sense to have interrupts enabled in protected mode if the BIOS is still responsible for handling the devices).
Also, each time you mess with the PIC you have to assume IRQs may have been lost (and that all devices may now be stuck in a "waiting for attention forever" state), and therefore you must reset/reinitialise all of the devices.
Cheers,
Brendan
Re: 8259A vector base reset hangs the system
Posted: Thu Nov 27, 2014 3:08 am
by InsoReiges
Brendan wrote:Hi,
Um, that's extremely racey. Don't do it.
In general,
only one piece of code can handle a device at a time (and it makes no sense to have interrupts enabled in protected mode if the BIOS is still responsible for handling the devices).
Also, each time you mess with the PIC you have to assume IRQs may have been lost (and that all devices may now be stuck in a "waiting for attention forever" state), and therefore you must reset/reinitialise all of the devices.
Cheers,
Brendan
Sorry, I don't get it. What exactly is extremely racy there?
Also, what was that about protected mode? There are no protected mode switches in this code.
Yes, IRQs might have been lost, but so what? How does that explain the int 13h timeout, especially since it works without problems on all the other hardware I tested this on? I've disassembled the int13h handler on this BIOS, and as far as I can see it uses PIT channel 2 for timeouts.
Don't get me wrong, I honestly don't understand the issues you've raised. Can you be more specific?
Re: 8259A vector base reset hangs the system
Posted: Thu Nov 27, 2014 3:17 am
by Brendan
Hi,
Brendan wrote:Um, that's extremely racey. Don't do it.
Let me see if I can explain this properly.
Imagine a device as a piece of hardware with 2 states:
Device State 1: Nothing important happening (quiescent)
- Something happens. Device sends an interrupt to the PIC chip and enters a "wait for service" state. Device goes to "Device State 2".
Device State 2: Device is waiting for attention.
- Device receives attention from the CPU. Device goes back to "Device State 1".
Note: The PIT is a little unique, as it's the only device that doesn't have a "Device State 2".
Now imagine the PIC has 3 states:
PIC State 1: Nothing important happening (quiescent - "Interrupt Received Register" clear, "In Service Register" clear)
- PIC receives an interrupt from a device, and sets the corresponding bit in its "Interrupt Received Register". PIC moves to "PIC State 2".
PIC State 2: PIC is waiting to deliver an IRQ to the CPU.
- When the CPU's "interrupt enable flag" is set, and the PIC chip's "In Service Register" says no higher priority IRQs are in service; the PIC sends the new interrupt to the CPU, clears the bit in the "Interrupt Received Register" and sets that bit in its "In Service Register". PIC moves to "PIC State 3".
PIC State 3: PIC is waiting for either EOI from CPU or another IRQ from a device
- If the CPU sends EOI, then PIC clears the highest-priority set bit in its "In Service Register"; then:
  - If the "Interrupt Received Register" is clear and the "In Service Register" is clear, PIC moves back to "PIC State 1"
  - If the "In Service Register" is not clear, PIC stays in "PIC State 3"
  - Otherwise, if the "Interrupt Received Register" is not clear, PIC moves back to "PIC State 2"
- If the PIC receives another IRQ from a device, then PIC sets the corresponding bit in its "Interrupt Received Register"; then:
  - If the new IRQ is higher priority, PIC moves to "PIC State 2"
  - If the new IRQ is lower priority, PIC stays in "PIC State 3"
Normally the order in which things happen means that the device's state and the PIC's state remain synchronised. When you reconfigure the PIC you're forcing it to go directly to "PIC State 1". When this happens, the "Interrupt Received Register" is cleared and any IRQs that were waiting there get lost, causing their devices to become stuck in "Device State 2".
Note: I've simplified this a lot by ignoring the device masking/unmasking. This is partly because the documentation doesn't clearly say when the mask is applied (e.g. if the mask prevents the bit in the IRR from being set; or if the mask causes the bit in the IRR to be ignored). I've also ignored the fact that (from the master PIC's perspective) the slave PIC is a device, and therefore the slave PIC can also get stuck in "Device State 2".
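The simplified model above can be played out in a few lines of C. This is only a toy rendering of the description in this post (a single PIC, no masking, reinitialisation modelled as clearing both registers as stated above), not a faithful 8259A emulation; all type and function names are hypothetical, and IRQ 6 is an arbitrary line chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t irr;    /* "Interrupt Received Register": latched, undelivered requests */
    uint8_t isr;    /* "In Service Register": delivered, waiting for EOI */
} pic_model;

typedef struct {
    bool waiting;   /* true = "Device State 2" (waiting for attention) */
    int  irq;       /* which PIC input the device is wired to */
} device_model;

/* Device State 1 -> 2: the device signals the PIC, which latches the request. */
static void device_raise(device_model *d, pic_model *p)
{
    d->waiting = true;
    p->irr = (uint8_t)(p->irr | (1u << d->irq));
}

/* PIC State 2 -> 3: deliver the highest-priority pending IRQ, or return -1.
 * A lower IRQ number means a higher priority. */
static int pic_deliver(pic_model *p, bool cpu_interrupts_enabled)
{
    int irq;
    if (!cpu_interrupts_enabled)
        return -1;
    for (irq = 0; irq < 8; irq++) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (p->isr & bit)
            return -1;                       /* equal or higher priority in service */
        if (p->irr & bit) {
            p->irr = (uint8_t)(p->irr & ~bit);
            p->isr = (uint8_t)(p->isr | bit);
            return irq;
        }
    }
    return -1;
}

/* EOI: clear the highest-priority bit that is in service. */
static void pic_eoi(pic_model *p)
{
    int irq;
    for (irq = 0; irq < 8; irq++) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (p->isr & bit) {
            p->isr = (uint8_t)(p->isr & ~bit);
            return;
        }
    }
}

/* Reconfiguring the PIC forces it straight back to "PIC State 1". */
static void pic_reinit(pic_model *p)
{
    p->irr = 0;
    p->isr = 0;
}

/* Returns true if the device ends up stuck waiting while the PIC is idle. */
static bool lost_irq_scenario(bool reinit_while_pending)
{
    pic_model    pic  = { 0, 0 };
    device_model disk = { false, 6 };
    int irq;

    device_raise(&disk, &pic);               /* request arrives while IF is clear */
    if (reinit_while_pending)
        pic_reinit(&pic);                    /* the reset wipes the latched request */

    irq = pic_deliver(&pic, true);           /* IF set again */
    if (irq == disk.irq) {
        disk.waiting = false;                /* handler services the device */
        pic_eoi(&pic);
    }
    return disk.waiting && pic.irr == 0 && pic.isr == 0;
}
```

In this model `lost_irq_scenario(false)` ends with the device serviced, while `lost_irq_scenario(true)` ends with the PIC quiescent and the device still waiting, which is the desynchronised state described above.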
EDIT: Fixed some mistakes in the original version!
Cheers,
Brendan
Re: 8259A vector base reset hangs the system
Posted: Thu Nov 27, 2014 3:29 am
by InsoReiges
Thanks Brendan, a couple of clarifications:
When the CPU's "interrupt flag" is clear, and the PIC chip's "In Service Register" says no higher priority IRQs are in service; the PIC sends the new interrupt to the CPU, clears the bit in the "Interrupt Received Register" and sets that bit in its "In Service Register". PIC moves to "PIC State 3".
You probably mean "When the CPU's "interrupt flag" is set", not clear?
If the "Interrupt Received Register" is clear, PIC moves back to "PIC state 1"
If the "Interrupt Received Register" is clear, PIC moves back to "PIC state 2"
Both lines say "if the IRR is clear". I imagine one of them should say "not clear"?
Re: 8259A vector base reset hangs the system
Posted: Thu Nov 27, 2014 3:32 am
by Brendan
InsoReiges wrote:Brendan wrote:Um, that's extremely racey. Don't do it.
In general,
only one piece of code can handle a device at a time (and it makes no sense to have interrupts enabled in protected mode if the BIOS is still responsible for handling the devices).
Also, each time you mess with the PIC you have to assume IRQs may have been lost (and that all devices may now be stuck in a "waiting for attention forever" state), and therefore you must reset/reinitialise all of the devices.
Sorry, i don't get it. What exactly is extremely racey there?
I expected that - see previous post.
InsoReiges wrote:Also what was that about protected mode? There are no protected mode switches in this code.
If there's no protected mode, why are you diddling with the PIC that the BIOS is relying on in the first place? This is like throwing shrapnel into a car's engine while it's running.
InsoReiges wrote:Yes, IRQs might have been lost, so what, how does that explain int 13h timeout especially since it works without problem on any other hardware i tested this on?
The BIOS (int 0x13) issues a command to the disk controller. When the command completes the disk controller sends an IRQ. When the IRQ arrives the BIOS gathers results. If the IRQ never arrives, the BIOS gets tired of waiting and returns a time-out (instead of waiting forever itself).
This might be caused by screwing up the PIC and leaving the disk controller in a "waiting for attention" state. It might also be caused by screwing up the master PIC and leaving the slave PIC in a "waiting for attention" state. The latter is probably more likely (due to the BIOS "int 0x13" being a synchronous API).
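The wait-then-give-up pattern described here can be sketched as follows. This is not taken from any actual BIOS; the completion flag, the tick callback and every name in the snippet are hypothetical, and the "ticks" stand in for whatever time source the BIOS really uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DISK_OK = 0, DISK_TIMEOUT = 1 } disk_status;

/* Hypothetical completion flag that the disk IRQ handler would set. */
typedef struct {
    volatile bool irq_seen;
} disk_wait;

/* Wait for the completion IRQ for at most max_ticks polls.
 * The tick callback stands in for time passing (and hardware activity). */
static disk_status wait_for_disk_irq(disk_wait *w, uint32_t max_ticks,
                                     void (*tick)(disk_wait *, uint32_t))
{
    uint32_t t;
    for (t = 0; t < max_ticks; t++) {
        if (tick != NULL)
            tick(w, t);
        if (w->irq_seen) {
            w->irq_seen = false;   /* consume the completion */
            return DISK_OK;
        }
    }
    return DISK_TIMEOUT;           /* the IRQ never arrived: report a time-out */
}

/* Two simulated worlds: the IRQ is delivered, or it was lost in the PIC. */
static void irq_arrives_at_tick_3(disk_wait *w, uint32_t t)
{
    if (t == 3)
        w->irq_seen = true;
}

static void irq_never_arrives(disk_wait *w, uint32_t t)
{
    (void)w;
    (void)t;
}
```

With `irq_arrives_at_tick_3` the wait returns `DISK_OK`; with `irq_never_arrives` it runs out of ticks and returns `DISK_TIMEOUT`, which is how a request lost in the PIC would surface to the caller as a time-out error.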
Cheers,
Brendan