
Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 3:38 am
by Brendan
Hi,
InsoReiges wrote:Thanks Brendan, a couple of clarifications:
When the CPU's "interrupt flag" is clear, and the PIC chip's "In Service Register" says no higher priority IRQs are in service; the PIC sends the new interrupt to the CPU, clears the bit in the "Interrupt Received Register" and sets that bit in its "In Service Register". PIC moves to "PIC State 3".
You probably mean "When the CPU's "interrupt flag" is set", not clear?
Sorry - yes ("when the CPU's "interrupt enable flag" is set").
InsoReiges wrote:
If the "Interrupt Received Register" is clear, PIC moves back to "PIC state 1"
If the "Interrupt Received Register" is clear, PIC moves back to "PIC state 2"
Both lines say "if the IRR is clear". I imagine one of them should say "not clear"? :)
Heh - yes. :oops:

I'll edit the post to correct it (otherwise it won't make much sense).


Cheers,

Brendan

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 3:43 am
by InsoReiges
Brendan wrote:
InsoReiges wrote:Also what was that about protected mode? There are no protected mode switches in this code.
If there's no protected mode, why are you diddling with the PIC that the BIOS is relying on in the first place? This is like throwing shrapnel into a car's engine while it's running. :roll:
Yeah, you are right, this is a test that reproduces a problem when switching between RM and PM. Basically i have a piece of code that runs in PM but has no device drivers. When entering PM i change PIC bases to catch exceptions and disable all device interrupts. When i need to read from a disk i return to RM, reconfigure PIC to BIOS values and call int 13h. This code is a result of elimination testing - i've removed all PM stuff and managed to reproduce a problem with this tight loop in RM.
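For readers following along, the PIC reconfiguration being described can be sketched roughly as below. This is not the poster's code (which is not shown in the thread); it is a minimal illustration of the usual 8259A initialisation sequence, assuming the conventional BIOS bases of 0x08 (master) and 0x70 (slave). The `outb` function here is a hypothetical stand-in that only logs writes, and the I/O delays real hardware may need are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 8259A port addresses on PC-compatible hardware. */
enum { PIC1_CMD = 0x20, PIC1_DATA = 0x21, PIC2_CMD = 0xA0, PIC2_DATA = 0xA1 };

/* Hypothetical port-write hook: real code would execute an OUT instruction.
   Here writes are only logged so the sequence can be inspected. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write pic_log[16];
static size_t pic_log_len;

static void outb(uint16_t port, uint8_t value)
{
    assert(pic_log_len < sizeof pic_log / sizeof pic_log[0]);
    pic_log[pic_log_len].port = port;
    pic_log[pic_log_len].value = value;
    pic_log_len++;
}

/* Re-initialise both PICs with new vector bases, then write the masks. */
static void pic_set_bases(uint8_t master_base, uint8_t slave_base,
                          uint8_t master_mask, uint8_t slave_mask)
{
    outb(PIC1_CMD, 0x11);          /* ICW1: edge triggered, cascade, ICW4 needed */
    outb(PIC2_CMD, 0x11);
    outb(PIC1_DATA, master_base);  /* ICW2: vector base */
    outb(PIC2_DATA, slave_base);
    outb(PIC1_DATA, 0x04);         /* ICW3: slave attached to master IRQ2 */
    outb(PIC2_DATA, 0x02);         /* ICW3: slave cascade identity 2 */
    outb(PIC1_DATA, 0x01);         /* ICW4: 8086 mode */
    outb(PIC2_DATA, 0x01);
    outb(PIC1_DATA, master_mask);  /* OCW1: interrupt mask */
    outb(PIC2_DATA, slave_mask);
}
```

Returning to the BIOS layout would then be something like `pic_set_bases(0x08, 0x70, saved_master_mask, saved_slave_mask)`, where the saved masks are whatever was read from ports 0x21 and 0xA1 before the first remap (an assumption about how the caller keeps them).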
Brendan wrote:
InsoReiges wrote:Yes, IRQs might have been lost, so what, how does that explain int 13h timeout especially since it works without problem on any other hardware i tested this on?
The BIOS (int 0x13) issues a command to the disk controller. When the command completes the disk controller sends an IRQ. When the IRQ arrives the BIOS gathers results. If the IRQ never arrives, the BIOS gets tired of waiting and returns a time-out (instead of waiting forever itself).

This might be caused by screwing up the PIC and leaving the disk controller in a "waiting for attention" state. It might also be caused by screwing up the master PIC and leaving the slave PIC in a "waiting for attention" state. The latter is probably more likely (due to the BIOS "int 0x13" being a synchronous API).
Yes, but as far as i know (from BIOS code on this MB) int 13h is synchronous when it works with the disk. It does indeed wait for an IRQ14 to happen but it won't return until it does. What you are saying is that the disk controller might send an interrupt outside the usual int 13h code path? Well, that can be tested by hooking RM IDT and logging fired IRQs to a serial port.
Brendan wrote: Cheers,
Brendan
Cheers, mate!

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 3:52 am
by InsoReiges
I will be able to trace some IRQs on Monday, but how do you suppose this can be done correctly anyway? I mean i want to work in protected mode with a reconfigured PIC to catch exceptions and traps, but since i don't have any device drivers i also want to return to real mode to ask the BIOS to work with the boot disk for me, and hence have to reset the PIC back to the BIOS bases.

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 4:04 am
by Brendan
Hi,
InsoReiges wrote:
Brendan wrote:If there's no protected mode, why are you diddling with the PIC that the BIOS is relying on in the first place? This is like throwing shrapnel into a car's engine while it's running. :roll:
Yeah, you are right, this is a test that reproduces a problem when switching between RM and PM. Basically i have a piece of code that runs in PM but has no device drivers. When entering PM i change PIC bases to catch exceptions and disable all device interrupts. When i need to read from a disk i return to RM, reconfigure PIC to BIOS values and call int 13h. This code is a result of elimination testing - i've removed all PM stuff and managed to reproduce a problem with this tight loop in RM.
Then the solution is simple - just disable (postpone) IRQs while in protected mode by leaving the CPU's interrupt enable flag clear. In this case all interrupts must be exceptions.

Alternatively (if you must enable IRQs in protected mode for some reason) you can read the PIC's "In Service Register" to determine if an interrupt was an IRQ or an exception. Note: This doesn't work for the master PIC's "spurious IRQ" (IRQ 7), which is interrupt 0x0F (vector 15) with the BIOS bases. This isn't a problem because exception vector 15 is reserved anyway.
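A minimal sketch of that In Service Register check, assuming the PIC is left at the BIOS base so IRQs and exceptions share vectors. The `sim_pic` structure and its two accessors are hypothetical test doubles standing in for port 0x20; on real hardware they would be an OUT of OCW3 (0x0B) followed by an IN from the same port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated master PIC, standing in for port 0x20 (hypothetical test double). */
struct sim_pic {
    uint8_t isr;       /* In Service Register */
    uint8_t irr;       /* Interrupt Request Register */
    bool    read_isr;  /* selected by OCW3: true = ISR, false = IRR */
};

static void pic_write_cmd(struct sim_pic *pic, uint8_t value)
{
    /* OCW3 has bit 3 set and bit 4 clear; with the "read register" bit set,
       0x0B selects the ISR and 0x0A selects the IRR. */
    if ((value & 0x18) == 0x08 && (value & 0x02))
        pic->read_isr = (value & 0x01) != 0;
}

static uint8_t pic_read_cmd(const struct sim_pic *pic)
{
    return pic->read_isr ? pic->isr : pic->irr;
}

/* Returns true if `vector` was raised by master IRQ line (vector - base),
   false if it must have been an exception (a spurious IRQ 7 also reports
   false, because it never sets its In Service bit). */
static bool vector_is_irq(struct sim_pic *pic, uint8_t vector, uint8_t base)
{
    assert(vector >= base && vector < base + 8);
    pic_write_cmd(pic, 0x0B);                /* OCW3: next read returns ISR */
    uint8_t isr = pic_read_cmd(pic);
    return (isr & (1u << (vector - base))) != 0;
}
```

A handler installed at vector 0x08 could call `vector_is_irq(&pic, 0x08, 0x08)` and treat the interrupt as the timer IRQ when it returns true, or as a double fault otherwise.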

A while ago, I wrote some example code that does this (which also lets you use BIOS IRQs and BIOS functions directly from protected mode).
InsoReiges wrote:
Brendan wrote:The BIOS (int 0x13) issues a command to the disk controller. When the command completes the disk controller sends an IRQ. When the IRQ arrives the BIOS gathers results. If the IRQ never arrives, the BIOS gets tired of waiting and returns a time-out (instead of waiting forever itself).

This might be caused by screwing up the PIC and leaving the disk controller in a "waiting for attention" state. It might also be caused by screwing up the master PIC and leaving the slave PIC in a "waiting for attention" state. The latter is probably more likely (due to the BIOS "int 0x13" being a synchronous API).
Yes, but as far as i know (from BIOS code on this MB) int 13h is synchronous when it works with the disk. It does indeed wait for an IRQ14 to happen but it won't return until it does. What you are saying is that the disk might send an interrupt outside the usual int 13h code path? Well, that can be tested by hooking RM IDT and logging fired IRQs to a serial port.
I'd expect that while waiting for the disk controller's IRQ the BIOS does something like "while( (now < timeout) && (completed == false) ) { hlt; }" (where the PIT's IRQ handler does "now++", and the disk driver's IRQ does "completed = true;").
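To make the time-out behaviour concrete, here is a small simulation of that loop. It is only a sketch of the pattern described above, not actual BIOS code: `hlt_simulated` is a hypothetical stand-in for HLT plus the two IRQ handlers, and each wake-up is assumed to be one timer tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State the two (hypothetical) IRQ handlers would update. */
static volatile uint32_t now;
static volatile bool completed;

/* Simulated HLT: each wake-up is one timer tick. The disk IRQ arrives at
   tick `disk_irq_tick`, or never if the IRQ is not being delivered. */
static void hlt_simulated(uint32_t disk_irq_tick, bool irq_delivered)
{
    now++;                                    /* timer IRQ handler: now++ */
    if (irq_delivered && now == disk_irq_tick)
        completed = true;                     /* disk IRQ handler */
}

/* Returns true on completion, false on time-out. */
static bool wait_for_disk(uint32_t timeout, uint32_t disk_irq_tick,
                          bool irq_delivered)
{
    assert(timeout > 0);
    now = 0;
    completed = false;
    while (now < timeout && !completed)
        hlt_simulated(disk_irq_tick, irq_delivered);
    return completed;
}
```

With `irq_delivered` set to false the loop runs until `now` reaches `timeout` and reports a time-out, which is the symptom seen from int 0x13 when the completion IRQ is lost.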


Cheers,

Brendan

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 4:16 am
by InsoReiges
Brendan wrote:Hi,
InsoReiges wrote:
Brendan wrote:If there's no protected mode, why are you diddling with the PIC that the BIOS is relying on in the first place? This is like throwing shrapnel into a car's engine while it's running. :roll:
Yeah, you are right, this is a test that reproduces a problem when switching between RM and PM. Basically i have a piece of code that runs in PM but has no device drivers. When entering PM i change PIC bases to catch exceptions and disable all device interrupts. When i need to read from a disk i return to RM, reconfigure PIC to BIOS values and call int 13h. This code is a result of elimination testing - i've removed all PM stuff and managed to reproduce a problem with this tight loop in RM.
Then the solution is simple - just disable (postpone) IRQs while in protected mode by leaving the CPU's interrupt enable flag clear. In this case all interrupts must be exceptions.

Alternatively (if you must enable IRQs in protected mode for some reason) you can read the PIC's "In Service Register" to determine if an interrupt was an IRQ or an exception. Note: This doesn't work for the master PIC's "spurious IRQ" (IRQ 7), which is interrupt 0x0F (vector 15) with the BIOS bases. This isn't a problem because exception vector 15 is reserved anyway.

A while ago, I wrote some example code that does this (which also lets you use BIOS IRQs and BIOS functions directly from protected mode).
Yes, i actually need timer interrupt while in PM. The way i do this now is put 0xFFFE mask on master and slave PICs so that only IRQ0 is enabled. Thanks for the code sample, i will take a look at it.
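As a rough illustration of the 0xFFFE mask mentioned here: treating it as a 16-bit value, the low byte goes to the master's mask register (port 0x21) and the high byte to the slave's (port 0xA1). The helper names below are hypothetical and the code only does the bit arithmetic, no port I/O.

```c
#include <assert.h>
#include <stdint.h>

struct pic_masks { uint8_t master; uint8_t slave; };

/* Bit n of irq_mask set = IRQ n masked (disabled). */
static struct pic_masks split_irq_mask(uint16_t irq_mask)
{
    struct pic_masks m;
    m.master = (uint8_t)(irq_mask & 0xFF);  /* IRQ0-7, written to port 0x21 */
    m.slave  = (uint8_t)(irq_mask >> 8);    /* IRQ8-15, written to port 0xA1 */
    return m;
}

/* Returns 1 if the given IRQ can reach the CPU under this mask. */
static int irq_enabled(uint16_t irq_mask, unsigned irq)
{
    assert(irq < 16);
    /* A slave IRQ also needs the cascade line (IRQ2) unmasked on the master. */
    if (irq >= 8 && (irq_mask & (1u << 2)))
        return 0;
    return (irq_mask & (1u << irq)) == 0;
}
```

For 0xFFFE this gives 0xFE for the master and 0xFF for the slave, so only IRQ0 gets through. Note that unmasking a slave line such as IRQ14 is not enough on its own; IRQ2 on the master has to be unmasked as well.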
Brendan wrote:
InsoReiges wrote:
Brendan wrote:The BIOS (int 0x13) issues a command to the disk controller. When the command completes the disk controller sends an IRQ. When the IRQ arrives the BIOS gathers results. If the IRQ never arrives, the BIOS gets tired of waiting and returns a time-out (instead of waiting forever itself).

This might be caused by screwing up the PIC and leaving the disk controller in a "waiting for attention" state. It might also be caused by screwing up the master PIC and leaving the slave PIC in a "waiting for attention" state. The latter is probably more likely (due to the BIOS "int 0x13" being a synchronous API).
Yes, but as far as i know (from BIOS code on this MB) int 13h is synchronous when it works with the disk. It does indeed wait for an IRQ14 to happen but it won't return until it does. What you are saying is that the disk might send an interrupt outside the usual int 13h code path? Well, that can be tested by hooking RM IDT and logging fired IRQs to a serial port.
I'd expect that while waiting for the disk controller's IRQ the BIOS does something like "while( (now < timeout) && (completed == false) ) { hlt; }" (where the PIT's IRQ handler does "now++", and the disk driver's IRQ does "completed = true;").
You are correct, BIOS disasm shows roughly this algorithm. Except that the BIOS uses PIT channel 2, which is not reported through IRQ0 but through some I/O port i can't remember now (61h i think). However, it should not be a problem if the disk controller and BIOS only use IRQ14 this way, i.e. to report int 13h request completion, should it? I mean the disk controller would have to trigger IRQ14 at some other time besides inside the int 13h handler. Otherwise there is nothing for the controller to wait for that could be screwed up by resetting the PICs.

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 4:22 am
by alexfru
InsoReiges wrote: Basically i have a piece of code that runs in PM but has no device drivers. When entering PM i change PIC bases to catch exceptions and disable all device interrupts. When i need to read from a disk i return to RM, reconfigure PIC to BIOS values and call int 13h. This code is a result of elimination testing - i've removed all PM stuff and managed to reproduce a problem with this tight loop in RM.
It would be best to use v86 if you rely on the BIOS to perform I/O. Understandably, it will require quite a bit of new code, but no interrupts will go missing and you'll be able to do something useful while the BIOS is waiting for I/O completion.

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 4:24 am
by InsoReiges
alexfru wrote:
InsoReiges wrote: Basically i have a piece of code that runs in PM but has no device drivers. When entering PM i change PIC bases to catch exceptions and disable all device interrupts. When i need to read from a disk i return to RM, reconfigure PIC to BIOS values and call int 13h. This code is a result of elimination testing - i've removed all PM stuff and managed to reproduce a problem with this tight loop in RM.
It would be best to use v86 if you rely on the BIOS to perform I/O. Understandably, it will require quite a bit of new code, but no interrupts will go missing and you'll be able to do something useful while the BIOS is waiting for I/O completion.
Well according to this: http://wiki.osdev.org/Virtual_8086_Mode using VM86 for disk access is dodgy.

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 4:43 am
by alexfru
InsoReiges wrote:
alexfru wrote:
InsoReiges wrote: Basically i have a piece of code that runs in PM but has no device drivers. When entering PM i change PIC bases to catch exceptions and disable all device interrupts. When i need to read from a disk i return to RM, reconfigure PIC to BIOS values and call int 13h. This code is a result of elimination testing - i've removed all PM stuff and managed to reproduce a problem with this tight loop in RM.
It would be best to use v86 if you rely on the BIOS to perform I/O. Understandably, it will require quite a bit of new code, but no interrupts will go missing and you'll be able to do something useful while the BIOS is waiting for I/O completion.
Well according to this: http://wiki.osdev.org/Virtual_8086_Mode using VM86 for disk access is dodgy.
If you make a very simple implementation, you may run into issues.

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 4:50 am
by Schol-R-LEA
InsoReiges wrote: Basically i have a piece of code that runs in PM but has no device drivers. When entering PM i change PIC bases to catch exceptions and disable all device interrupts. When i need to read from a disk i return to RM, reconfigure PIC to BIOS values and call int 13h.
Two questions: one, are all of the required disk accesses reads, and would it be possible to perform them all during the RM build-up (along with retrieving things like memory size and video capabilities) prior to switching to PM initially, and two, are there any technical limitations that would prevent you from incorporating at least a minimal disk driver into your PM code? Either of these solutions would eliminate the need to switch back to RM, hopefully. I expect you've considered both of these options already, but it may be useful to review the reasons why they would or wouldn't be suitable solutions with us so we can brainstorm together.

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 5:58 am
by InsoReiges
Schol-R-LEA wrote:
InsoReiges wrote: Basically i have a piece of code that runs in PM but has no device drivers. When entering PM i change PIC bases to catch exceptions and disable all device interrupts. When i need to read from a disk i return to RM, reconfigure PIC to BIOS values and call int 13h.
Two questions: one, are all of the required disk accesses reads, and would it be possible to perform them all during the RM build-up (along with retrieving things like memory size and video capabilities) prior to switching to PM initially, and two, are there any technical limitations that would prevent you from incorporating at least a minimal disk driver into your PM code? Either of these solutions would eliminate the need to switch back to RM, hopefully. I expect you've considered both of these options already, but it may be useful to review the reasons why they would or wouldn't be suitable solutions with us so we can brainstorm together.
Some of them are writes, and no, i can't read everything beforehand during RM buildup. The amount of data can be huge, several GB.
A PM disk driver can be implemented but i am simply not there yet, maybe in the future. Besides, i've peeked around some boot loader code and NT loader for example uses the same approach - they run in PM, switching to RM for disk access and keyboard input. But i think they just disable interrupts completely when in PM and i couldn't find any PIC-related code so i assume they don't reconfigure it either.

Re: 8259A vector base reset hangs the system

Posted: Thu Nov 27, 2014 6:24 am
by Schol-R-LEA
InsoReiges wrote:Some of them are writes, and no, i can't read everything beforehand during RM buildup. The amount of data can be huge, several GB.
Gigabytes? Seriously? That's going to present a real problem, as the BIOS reads are limited to low memory. Assuming you could get your whole disk read code into the real mode HMA, you'd have a maximum of 640K to use as buffer space for reading. That means each GB of data will require a minimum of 1639 RM read/PM transfer cycles, with ten BIOS reads in each cycle because it can only read a 64K segment at a time. That is going to be very slow.
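The arithmetic behind those figures can be checked with a few lines of C. This is just a worked example of the numbers quoted above (a 640 KiB buffer and 64 KiB per BIOS read); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define KIB 1024ull
#define GIB (1024ull * 1024ull * 1024ull)

/* Ceiling division for unsigned values. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/* RM read / PM transfer cycles needed to move `total_bytes` through a
   low-memory buffer of `buffer_bytes`. */
static uint64_t transfer_cycles(uint64_t total_bytes, uint64_t buffer_bytes)
{
    return div_round_up(total_bytes, buffer_bytes);
}

/* BIOS reads needed to fill the buffer once, at `max_read_bytes` per call. */
static uint64_t reads_per_cycle(uint64_t buffer_bytes, uint64_t max_read_bytes)
{
    return div_round_up(buffer_bytes, max_read_bytes);
}
```

`transfer_cycles(GIB, 640 * KIB)` comes to 1639 (1 GiB / 640 KiB is about 1638.4, rounded up) and `reads_per_cycle(640 * KIB, 64 * KIB)` comes to 10, matching the figures in the post. In practice the usable buffer is smaller than 640 KiB, so the real count would be higher.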

How is it that you are dealing with such a large volume of data in the kernel startup? That seems very unusual.
InsoReiges wrote:PM disk driver can be implemented but i am simply not there yet, maybe in the future.
Fair enough. No one can do everything at once. However, given the massive volume of data involved, I would make writing an efficient PM disk driver a priority.
InsoReiges wrote:Besides, i've peeked around some boot loader code and NT loader for example uses the same approach - they run in PM, switching to RM for disk access and keyboard input. But i think they just disable interrupts completely when in PM and i couldn't find any PIC-related code so i assume they don't reconfigure it either.
True, but they aren't dealing with the volume of data you are.

Re: 8259A vector base reset hangs the system

Posted: Tue Dec 30, 2014 7:26 am
by InsoReiges
Hi guys,

I thought i should tell you how that whole thing ended. I've coded up a real mode GDB stub and traced the BIOS on this MB. It turned out that after some PIC resets the cascade stopped working: IRQs from the master were still happening, but not from the slave. No slave IRQs means no IRQ14, which this BIOS uses to update the "disk task completed" BDA variable. No BDA flag means an INT 13h read timeout. Besides IRQ14, only IRQ0 is used periodically on this MB.
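For anyone wondering why a dead cascade takes out the disk: IRQ8-15 live on the slave PIC and only reach the CPU through the master's IRQ2 input. A small sketch of the mapping, assuming the conventional BIOS bases (0x08 master, 0x70 slave); the `irq_route` structure is a made-up name for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct irq_route {
    int      on_slave;  /* 1 if the IRQ is delivered via the cascade (IRQ2) */
    unsigned line;      /* input line on its own PIC, 0-7 */
    uint8_t  vector;    /* interrupt vector the CPU receives */
};

/* Map an ISA IRQ number (0-15) to its PIC, line and vector. */
static struct irq_route route_irq(unsigned irq, uint8_t master_base,
                                  uint8_t slave_base)
{
    struct irq_route r;
    assert(irq < 16);
    r.on_slave = irq >= 8;
    r.line = irq & 7u;
    r.vector = (uint8_t)((r.on_slave ? slave_base : master_base) + r.line);
    return r;
}
```

`route_irq(14, 0x08, 0x70)` gives slave line 6 and vector 0x76, while `route_irq(0, 0x08, 0x70)` stays on the master at vector 0x08. That is consistent with the timer still ticking while the disk completion interrupt never shows up.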

We've contacted ASUS to help us with this and, surprisingly, they seem to have acknowledged that as a HW/SMM/AMT bug. Unfortunately they also said that since this MB is EOL they will probably not deal with that.

After writing some more tests i've managed to write a workaround which reduces timeouts to 1 out of 1.000.000 attempts. I've also optimized / reduced our disk load significantly, and together with the workaround everything seems to be working fine for now :)