8259A vector base reset hangs the system
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
8259A vector base reset hangs the system
Hi,
I have the following problem that I am not sure how to diagnose further. I am writing a piece of real-mode code in which I reset the 8259A PIC vector bases and then call INT13h. Sometimes, after the reset, INT13h stops working and the system hangs completely on an ASUS P77 motherboard. I've spent a lot of time trying to diagnose this and managed to isolate the following test case, which reproduces the problem 100% of the time:
while(1) {
1. We are in real mode. PIC vector bases are configured to 0x08, 0x70.
2. Read the PIC mask registers and save them.
3. Reset the PIC vector bases to 0x08, 0x70, i.e. do not actually change anything, just touch both PICs.
4. Write the PIC mask registers back from the saved values.
5. Call INT13h to read a sector from disk.
}
As you can see, no RM/PM mode switches are happening and the PICs are not really being reconfigured, just reset to the same configuration they are already in.
After several hundred iterations INT13h returns ETIMEOUT, and soon afterwards the system hangs completely. The sector number being read is different every time. If I omit step 3, everything works properly. If I manage to repeat step 3 before the system hangs, everything goes back to normal for a while, until it happens again hundreds of attempts later.
I have double-checked my PIC reset code, compared it to the Linux kernel and the OSDev examples, and even disassembled the BIOS on this MB to see how it's done there, and I am pretty sure I've done it properly. So what the hell is going on? Faulty hardware? Any ideas on how I can diagnose this further?
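For reference, steps 2 to 4 of the loop can be sketched roughly as below. This is not the poster's code: it uses the standard PC/AT port numbers and the usual ICW1..ICW4 values, and `port_in`/`port_out` are hypothetical stand-ins for real port I/O, backed here by a tiny fake PIC (`FakePic`, also made up) so that the sequence can be run and inspected on any host. On real hardware the sequence would run with interrupts disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PC/AT 8259A port addresses. */
#define PIC1_CMD  0x20
#define PIC1_DATA 0x21
#define PIC2_CMD  0xA0
#define PIC2_DATA 0xA1

/* Hypothetical fake PIC, just enough to follow the init sequence. */
typedef struct {
    uint8_t imr;       /* interrupt mask register */
    uint8_t vector;    /* vector base taken from ICW2 */
    int     icw_left;  /* ICWs still expected on the data port */
} FakePic;

/* Example starting state: bases 0x08/0x70, arbitrary mask values. */
static FakePic pics[2] = { { 0xB8, 0x08, 0 }, { 0x8F, 0x70, 0 } };

static FakePic *pic_for(uint16_t port)
{
    return (port & 0x80) ? &pics[1] : &pics[0];
}

static void port_out(uint16_t port, uint8_t val)
{
    FakePic *p = pic_for(port);
    if ((port & 1) == 0) {              /* command port */
        if (val & 0x10) {               /* ICW1 starts initialisation */
            p->imr = 0;                 /* ICW1 clears the mask register */
            p->icw_left = 3;            /* ICW2, ICW3, ICW4 follow */
        }
    } else if (p->icw_left > 0) {       /* data port during init */
        if (p->icw_left == 3)
            p->vector = val & 0xF8;     /* ICW2: vector base */
        p->icw_left--;
    } else {
        p->imr = val;                   /* data port otherwise: OCW1 (mask) */
    }
}

static uint8_t port_in(uint16_t port)
{
    assert((port & 1) != 0);            /* only mask reads are modelled */
    return pic_for(port)->imr;
}

/* Steps 2-4: save masks, re-run the init sequence, restore masks. */
static void pic_reset_vectors(uint8_t master_base, uint8_t slave_base)
{
    uint8_t m1 = port_in(PIC1_DATA);
    uint8_t m2 = port_in(PIC2_DATA);

    port_out(PIC1_CMD, 0x11);           /* ICW1: edge, cascade, ICW4 needed */
    port_out(PIC2_CMD, 0x11);
    port_out(PIC1_DATA, master_base);   /* ICW2: vector base */
    port_out(PIC2_DATA, slave_base);
    port_out(PIC1_DATA, 0x04);          /* ICW3: slave attached to IR2 */
    port_out(PIC2_DATA, 0x02);          /* ICW3: slave cascade identity 2 */
    port_out(PIC1_DATA, 0x01);          /* ICW4: 8086 mode */
    port_out(PIC2_DATA, 0x01);

    port_out(PIC1_DATA, m1);            /* restore saved masks */
    port_out(PIC2_DATA, m2);
}
```

Calling `pic_reset_vectors(0x08, 0x70)` on the fake leaves masks and vector bases as they were, which matches the "do not actually change anything" intent of step 3. What the fake cannot show is whatever internal state a real PIC discards when it sees ICW1.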
Re: 8259A vector base reset hangs the system
Just to be sure, is the interrupt flag cleared during step 3? Or are timer IRQs masked in another way?
The only thing I can think of is that IRQ0 comes in during the reset and messes it up somehow.
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
Re: 8259A vector base reset hangs the system
DLBuunk wrote: Just to be sure, is the interrupt flag cleared during step 3? Or are timer IRQs masked in another way? The only thing I can think of is that IRQ0 comes in during the reset and messes it up somehow.
Yes, the interrupt flag is cleared at the start of the PIC reset and restored afterwards. IRQs are not masked on the PIC, however.
What I will attempt is to hook IRQ0 and some others, log whether they have been triggered, and call the original handler. I suspect that the PICs are somehow put into an internal error state during the reset and IRQs stop firing, hence the timeout error from INT13h. I will also try to disable NMI, but I haven't seen anyone do that and really have no idea how it might be a problem. I will also try to trace the INT13h call to see which error code path actually makes it return an error. This will take some time, however, and I was hoping somebody on OSDev might have seen this problem before.
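The "hook, log, call original" idea can be sketched as follows. This is a host-side illustration only, with made-up names throughout: `vector_table` stands in for the real-mode interrupt vector entries (IRQ0 at vector 0x08 and IRQ14 at 0x76 with the bases used in this thread), and `fake_bios_timer` stands in for the BIOS handler being chained to. In a real-mode build the thunks would be interrupt handlers installed with whatever mechanism the compiler provides.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*handler_t)(void);

#define NUM_IRQS 16

static handler_t vector_table[NUM_IRQS];      /* stand-in for IVT entries */
static handler_t original_handler[NUM_IRQS];  /* saved previous handlers */
static unsigned long irq_hits[NUM_IRQS];      /* per-IRQ hit counters */

/* Stand-in for the BIOS timer handler that the hook chains to. */
static unsigned long bios_timer_calls;
static void fake_bios_timer(void)
{
    bios_timer_calls++;
}

/* One small thunk per hooked IRQ: count the hit, then chain. */
#define DEFINE_HOOK(n)                        \
    static void hook_irq##n(void)             \
    {                                         \
        irq_hits[n]++;                        \
        if (original_handler[n] != NULL)      \
            original_handler[n]();            \
    }

DEFINE_HOOK(0)   /* timer */
DEFINE_HOOK(14)  /* primary disk controller on legacy setups */

/* Remember the current handler and put the hook in its place. */
static void install_hook(int irq, handler_t hook)
{
    assert(irq >= 0 && irq < NUM_IRQS);
    original_handler[irq] = vector_table[irq];
    vector_table[irq] = hook;
}
```

After `install_hook(0, hook_irq0)`, comparing `irq_hits[0]` before and after a failing INT13h call would show whether the timer IRQ is still being delivered at that point.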
Also, as far as I know, this happens only on this single MB. However, I don't have a lot of test configs, and all of them have the same AMI BIOS, although the hardware differs.
Last edited by InsoReiges on Sun Nov 23, 2014 4:37 am, edited 2 times in total.
Re: 8259A vector base reset hangs the system
Can you try masking all IRQs in the PIC before the reset procedure? Clearing the interrupt flag doesn't prevent incoming IRQs from going through the PIC.
And I don't see how NMI can influence any of this, since it is completely separate from the PIC.
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
Re: 8259A vector base reset hangs the system
DLBuunk wrote: Can you try masking all IRQs in the PIC before the reset procedure? Clearing the interrupt flag doesn't prevent incoming IRQs from going through the PIC. And I don't see how NMI can influence any of this, since it is completely separate from the PIC.
I will try that. I don't see how NMI can influence this either, it's just that I am out of ideas.
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
Re: 8259A vector base reset hangs the system
DLBuunk wrote: Can you try masking all IRQs in the PIC before the reset procedure? Clearing the interrupt flag doesn't prevent incoming IRQs from going through the PIC. And I don't see how NMI can influence any of this, since it is completely separate from the PIC.
Masking all IRQs on both PICs before the reset didn't work; the problem still persists.
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
Re: 8259A vector base reset hangs the system
I've managed to find another motherboard (ASUS P7P55D with the Intel P55 chipset), and the effect is reproduced on it. Other configurations work as intended. I've tried looking up errata for the P55 chipset and/or the motherboard, but found nothing.
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: 8259A vector base reset hangs the system
Do you have the code available to be viewed anywhere (e.g., GitHub, Sourceforge, Google Code), or if not, could you Pastebin the relevant section? It may be useful to see the actual code itself.
Also, are the majority of the systems you have tested this code on older models, ones which would not have had an APIC/IOAPIC combination? I am wondering if it is some odd interaction with the IOAPIC somehow that is at fault (which shouldn't be the case, as it shouldn't be active so long as the PIC is enabled, but you never know). It's a highly unlikely possibility, but one I thought I'd mention.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
Re: 8259A vector base reset hangs the system
Schol-R-LEA wrote: Do you have the code available to be viewed anywhere (e.g., GitHub, Sourceforge, Google Code), or if not, could you Pastebin the relevant section? It may be useful to see the actual code itself. Also, are the majority of the systems you have tested this code on older models, ones which would not have had an APIC/IOAPIC combination? I am wondering if it is some odd interaction with the IOAPIC somehow that is at fault (which shouldn't be the case, as it shouldn't be active so long as the PIC is enabled, but you never know). It's a highly unlikely possibility, but one I thought I'd mention.
No, I think all the tested systems have an APIC/IOAPIC, but I guess the legacy BIOS leaves them disabled at boot time.
About public code: I think that can be done, but it will take some time.
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
Re: 8259A vector base reset hangs the system
Ok guys, here's the test code. Compiled with MSVC 1.52, runs in x86 real mode directly after booting from MBR. Sample log output also attached
https://gist.github.com/anonymous/1a0a0025100a1a21b5c9
Re: 8259A vector base reset hangs the system
Hi,
InsoReiges wrote: Ok guys, here's the test code. Compiled with MSVC 1.52, runs in x86 real mode directly after booting from MBR. Sample log output also attached https://gist.github.com/anonymous/1a0a0025100a1a21b5c9
Um, that's extremely racy. Don't do it.
In general, only one piece of code can handle a device at a time (and it makes no sense to have interrupts enabled in protected mode if the BIOS is still responsible for handling the devices).
Also, each time you mess with the PIC you have to assume IRQs may have been lost (and that all devices may now be stuck in a "waiting for attention forever" state), and therefore you must reset/reinitialise all of the devices.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
Re: 8259A vector base reset hangs the system
Brendan wrote: Um, that's extremely racy. Don't do it. In general, only one piece of code can handle a device at a time (and it makes no sense to have interrupts enabled in protected mode if the BIOS is still responsible for handling the devices). Also, each time you mess with the PIC you have to assume IRQs may have been lost (and that all devices may now be stuck in a "waiting for attention forever" state), and therefore you must reset/reinitialise all of the devices.
Sorry, I don't get it. What exactly is extremely racy there?
Also, what was that about protected mode? There are no protected mode switches in this code.
Yes, IRQs might have been lost, but so what? How does that explain the int 13h timeout, especially since it works without problems on all the other hardware I tested this on? I've disassembled the int 13h handler in this BIOS, and as far as I can see it uses PIT channel 2 for timeouts.
Don't get me wrong, I honestly don't understand the issues you've raised. Can you be more specific?
Re: 8259A vector base reset hangs the system
Hi,
Brendan wrote: Um, that's extremely racy. Don't do it.
Let me see if I can explain this properly.
Imagine a device as a piece of hardware with 2 states:
- Device State 1: Nothing important happening (quiescent)
  - Something happens. Device sends an interrupt to the PIC chip and enters a "wait for service" state. Device goes to "Device State 2".
- Device State 2: Waiting for service
  - Device receives attention from the CPU. Device goes back to "Device State 1".
Now imagine the PIC has 3 states:
- PIC State 1: Nothing important happening (quiescent - "Interrupt Request Register" clear, "In Service Register" clear)
  - PIC receives an interrupt from a device, and sets the corresponding bit in its "Interrupt Request Register". PIC moves to "PIC State 2".
- PIC State 2: Interrupt pending (bit set in the "Interrupt Request Register", not yet sent to the CPU)
  - When the CPU's "interrupt enable flag" is set, and the PIC chip's "In Service Register" says no higher priority IRQs are in service; the PIC sends the new interrupt to the CPU, clears the bit in the "Interrupt Request Register" and sets that bit in its "In Service Register". PIC moves to "PIC State 3".
- PIC State 3: Interrupt in service (bit set in the "In Service Register")
  - If the CPU sends EOI, then PIC clears the highest priority set bit in its "In Service Register"; then:
    - If the "Interrupt Request Register" is clear and the "In Service Register" is clear, PIC moves back to "PIC State 1"
    - If the "In Service Register" is not clear, PIC moves back to "PIC State 3"
    - Otherwise, if the "Interrupt Request Register" is not clear, PIC moves back to "PIC State 2"
  - If the PIC receives another IRQ from a device, then PIC sets the corresponding bit in its "Interrupt Request Register"; then:
    - If the new IRQ is higher priority, PIC moves to "PIC State 2"
    - If the new IRQ is lower priority, PIC stays in "PIC State 3"
Note: I've simplified this a lot by ignoring the device masking/unmasking. This is partly because the documentation doesn't clearly say when the mask is applied (e.g. if the mask prevents the bit in the IRR from being set; or if the mask causes the bit in the IRR to be ignored). I've also ignored the fact that (from the master PIC's perspective) the slave PIC is a device, and therefore the slave PIC can also get stuck in "Device State 2".
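The PIC side of this description can be turned into a small model to experiment with. The sketch below only illustrates the simplified state machine from this post, not the real chip: it covers a single PIC, ignores masking just as the post does, treats IRQ0 as the highest priority, and the names `PicModel`, `pic_raise`, `pic_ack`, `pic_eoi` and `pic_state` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t irr;  /* requests latched but not yet sent to the CPU */
    uint8_t isr;  /* requests sent to the CPU, waiting for EOI */
} PicModel;

/* Index of the lowest set bit (IRQ0 = highest priority), or 8 if none. */
static int highest_priority(uint8_t bits)
{
    int i;
    for (i = 0; i < 8; i++)
        if (bits & (1u << i))
            return i;
    return 8;
}

/* A device raises its line: latch the request. */
static void pic_raise(PicModel *p, int irq)
{
    assert(irq >= 0 && irq < 8);
    p->irr |= (uint8_t)(1u << irq);
}

/* CPU (interrupts enabled) acknowledges. Returns the IRQ delivered,
   or -1 if nothing is pending or it is blocked by an IRQ in service. */
static int pic_ack(PicModel *p)
{
    int req = highest_priority(p->irr);
    if (req == 8 || req >= highest_priority(p->isr))
        return -1;
    p->irr &= (uint8_t)~(1u << req);
    p->isr |= (uint8_t)(1u << req);
    return req;
}

/* Non-specific EOI: clear the highest priority in-service bit. */
static void pic_eoi(PicModel *p)
{
    int s = highest_priority(p->isr);
    if (s < 8)
        p->isr &= (uint8_t)~(1u << s);
}

/* State numbering as in the post: 1 idle, 2 pending, 3 in service. */
static int pic_state(const PicModel *p)
{
    if (highest_priority(p->irr) < highest_priority(p->isr))
        return 2;   /* a deliverable request is waiting */
    if (p->isr != 0)
        return 3;   /* something in service, nothing deliverable */
    return 1;
}
```

For example, raising IRQ3 on an idle model moves it from state 1 to state 2; `pic_ack` then returns 3 and moves it to state 3, and `pic_eoi` brings it back to state 1.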
EDIT: Fixed some mistakes in the original version!
Cheers,
Brendan
- InsoReiges
- Posts: 22
- Joined: Wed Jul 04, 2012 8:08 am
Re: 8259A vector base reset hangs the system
Thanks Brendan, a couple of clarifications:
Brendan wrote: When the CPU's "interrupt flag" is clear, and the PIC chip's "In Service Register" says no higher priority IRQs are in service; the PIC sends the new interrupt to the CPU, clears the bit in the "Interrupt Request Register" and sets that bit in its "In Service Register". PIC moves to "PIC State 3".
You probably mean when the CPU's "interrupt flag" is set, not clear?
Brendan wrote: If the "Interrupt Request Register" is clear, PIC moves back to "PIC state 1". If the "Interrupt Request Register" is clear, PIC moves back to "PIC state 2".
Both lines say the IRR is clear. I imagine one of them should say "not clear"?
Re: 8259A vector base reset hangs the system
InsoReiges wrote: Sorry, i don't get it. What exactly is extremely racey there?
I expected that - see previous post.
InsoReiges wrote: Also what was that about protected mode? There are no protected mode switches in this code.
If there's no protected mode, why are you diddling with the PIC that the BIOS is relying on in the first place? This is like throwing shrapnel into a car's engine while it's running.
InsoReiges wrote: Yes, IRQs might have been lost, so what, how does that explain int 13h timeout especially since it works without problem on any other hardware i tested this on?
The BIOS (int 0x13) issues a command to the disk controller. When the command completes the disk controller sends an IRQ. When the IRQ arrives the BIOS gathers results. If the IRQ never arrives, the BIOS gets tired of waiting and returns a time-out (instead of waiting forever itself).
This might be caused by screwing up the PIC and leaving the disk controller in a "waiting for attention" state. It might also be caused by screwing up the master PIC and leaving the slave PIC in a "waiting for attention" state. The latter is probably more likely (due to the BIOS "int 0x13" being a synchronous API).
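The mechanism described here (request raised, request forgotten, BIOS gives up) can be shown with a toy model. Everything in it is hypothetical: `DiskPath` and the function names are invented, and the key behaviour, that re-initialising the PIC forgets a latched request while the device keeps waiting, is an assumption made for illustration. It does not claim to be what actually happens on the boards in this thread.

```c
#include <assert.h>

enum { DISK_OK = 0, DISK_TIMEOUT = 1 };

typedef struct {
    int device_waiting;  /* device raised its IRQ and waits for service */
    int pic_latched;     /* PIC still holds that request */
    int irq_seen;        /* set by the modelled IRQ handler */
} DiskPath;

/* Command finishes: the device raises its line once, then waits.
   While it is waiting there is no new edge, so nothing new is latched. */
static void device_complete(DiskPath *d)
{
    if (!d->device_waiting) {
        d->device_waiting = 1;
        d->pic_latched = 1;
    }
}

/* ASSUMPTION for illustration: re-initialising the PIC forgets the
   latched request while the device keeps waiting. */
static void pic_reinit(DiskPath *d)
{
    d->pic_latched = 0;
}

/* One step with interrupts enabled: a latched request reaches the
   handler, which services the device. */
static void deliver_pending_irq(DiskPath *d)
{
    if (d->pic_latched) {
        d->pic_latched = 0;
        d->device_waiting = 0;
        d->irq_seen = 1;
    }
}

/* Modelled BIOS wait: poll for the IRQ, give up after max_ticks. */
static int bios_wait_for_irq(DiskPath *d, int max_ticks)
{
    int t;
    assert(max_ticks > 0);
    for (t = 0; t < max_ticks; t++) {
        deliver_pending_irq(d);
        if (d->irq_seen) {
            d->irq_seen = 0;
            return DISK_OK;
        }
    }
    return DISK_TIMEOUT;
}
```

In this model a `pic_reinit` between `device_complete` and `bios_wait_for_irq` produces a time-out, and later completions keep timing out because the device never leaves its waiting state. That is the "stuck waiting for attention forever" situation, and it is why the advice is to reset the devices after touching the PIC.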
Cheers,
Brendan