Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Wed Oct 05, 2016 6:44 pm
by CelestialMechanic
Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

I am working on an operating system. My host platform is eComStation 1.2 (an OS/2 distribution), my tool-chain is Open Watcom 1.2. I use bochs 2.2.6 on my development machine, and I also have VirtualBox 4.x on a Linux machine and VirtualBox 5.0 on a Windows 8.0 machine that I can sneakernet to.

My attempt at an operating system has finally reached the point where I have started to handle IRQs, the first being the PIT timer, IRQ0. My test program wrote the system time over and over to verify that it was being handled properly by the code I had then. At the moment the only handling I do of exceptions is to display a BSOD and halt. The test would run for an unpredictable amount of time and then error out with a page fault.

Here is what the BSOD told me: the error was occurring at one of two instructions in the WriteHexString routine. Examination of the assembly code showed that these instructions accessed memory at [ecx + esi]. The BSOD showed F81024xx in ECX and F8100000 in ESI; added together modulo 2^32, these give the F02024xx found in CR2. According to the assembly, the compiler had chosen to use ECX for a pointer and ESI as the count (a strange choice), and somehow the count was being contaminated. The results were the same on all the emulators I tried.
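The arithmetic behind that CR2 value can be checked with a short C sketch. It assumes the ESI value was F8100000 (eight hex digits), and the function name is made up for illustration; the "xx" digits of ECX are unknown, so any stand-in such as 00 has to be supplied by the caller.

```c
#include <stdint.h>

/* Effective address of [base + index] in 32-bit protected mode.
   The sum wraps modulo 2^32, which the cast back to uint32_t makes explicit.
   effective_address is a hypothetical helper, not part of the poster's code. */
uint32_t effective_address(uint32_t base, uint32_t index)
{
    return (uint32_t)(base + index);
}
```

With 00 standing in for the unknown low byte, effective_address(0xF8102400, 0xF8100000) gives 0xF0202400: the carry out of bit 31 is lost, which is why the faulting address starts with F020 rather than 1F020.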

My original version of the IRQ0 handler used PUSHA at the beginning and POPA at the end. I changed these into explicit pushes and pops of the registers and the problem went away.

Is there some reason that PUSHA and POPA are not reliable in protected mode and particularly in an interrupt handler? I'm well aware that the 80386 had problems with POPA and a standard work-around was to add a NOP after POPA. I tried that and it didn't work here. It is strange that it behaves the same under different emulators and host OSes.

Does anyone have any insight into this? All I can say is, "no wonder PUSHA and POPA are deprecated in 64-bit mode!".

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Wed Oct 05, 2016 6:58 pm
by alexfru
Are those pushad and popad actually?

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Wed Oct 05, 2016 7:32 pm
by Brendan
Hi,
CelestialMechanic wrote:My original version of the IRQ0 handler used PUSHA at the beginning and POPA at the end. I changed these into explicit pushes and pops of the registers and the problem went away.

Is there some reason that PUSHA and POPA are not reliable in protected mode and particularly in an interrupt handler? I'm well aware that the 80386 had problems with POPA and a standard work-around was to add a NOP after POPA. I tried that and it didn't work here. It is strange that it behaves the same under different emulators and host OSes.

Does anyone have any insight into this? All I can say is, "no wonder PUSHA and POPA are deprecated in 64-bit mode!".
If alexfru's suggestion (16-bit "pushaw" instead of 32-bit "pushad", where highest 16-bits of all registers aren't saved/restored) wasn't the problem; then I'd assume that when you replaced "pushad/popad" you adjusted other things to suit the different stack layout and fixed the real problem by accident. For example, maybe you were using "mov [esp+8*4],eax" to access a local variable (and it was wrong) and you had to change it to "mov [esp+5*4],eax" (and got it right), so the problem disappeared.
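To make the "pushaw vs. pushad" point concrete, here is a minimal C model of what a register holds after a 16-bit save/restore pair. It is only a sketch: the function and parameter names are invented for illustration, and it models a single register rather than the real instructions.

```c
#include <stdint.h>

/* Value of a 32-bit register after a 16-bit save/restore pair.
   Only bits 0..15 were saved at entry, so only they are restored; bits
   16..31 keep whatever the handler (or the C code it called) left there.
   Hypothetical helper for illustration only. */
uint32_t after_16bit_restore(uint32_t value_left_by_handler,
                             uint32_t value_at_entry)
{
    uint16_t saved = (uint16_t)value_at_entry;   /* low half only */
    return (value_left_by_handler & 0xFFFF0000u) | saved;
}
```

For instance, if the interrupted code had 0x00000010 in a register and the handler left 0xABCD1234 there, the interrupted code resumes with 0xABCD0010: the low half looks right and the high half is garbage.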


Cheers,

Brendan

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Wed Oct 05, 2016 7:54 pm
by CelestialMechanic
alexfru wrote:Are those pushad and popad actually?
Doesn't matter. They have the same encoding. The D/B bit of the code-segment descriptor in the GDT determines the default operand size, and hence the size of the pushes and pops. I did not see any prefixes on PUSHA and POPA, and by this time I'm only using 32-bit segments.

I have to add that this seemed to be a POM (Phase Of the Moon) sort of phenomenon -- I had no idea whether it would run for five minutes or fifty seconds before the BSOD.

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Wed Oct 05, 2016 8:05 pm
by CelestialMechanic
Brendan wrote:Hi,
{Snip!}
If alexfru's suggestion (16-bit "pushaw" instead of 32-bit "pushad", where highest 16-bits of all registers aren't saved/restored) wasn't the problem; then I'd assume that when you replaced "pushad/popad" you adjusted other things to suit the different stack layout and fixed the real problem by accident. For example, maybe you were using "mov [esp+8*4],eax" to access a local variable (and it was wrong) and you had to change it to "mov [esp+5*4],eax" (and got it right), so the problem disappeared.

Cheers,

Brendan
I did not adjust anything; the C-compiler still compiled as before. My only adjustment was to change PUSHA/POPA into explicit PUSHes and POPs in the proper sequence. My handler never accessed the stack, it just saved and (supposedly) restored everything that had been pushed onto it. And then there's the seemingly random length of time before the failure.

The exception handlers (such as they are, they just display the BSOD and halt for now) are just fine because I never used PUSHA/POPA.

At any rate, I have a nice timer system working that uses both the RTC and the PIT. The RTC interrupt occurs once every second, after the update, and writes out the RTC time and date. The PIT interrupts about 100 times per second and updates the hundredths-of-a-second part. I have the system time and date in BCD, where it will make an almost readable time stamp for my messaging system.

Celestial Mechanic

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Wed Oct 05, 2016 9:25 pm
by alexfru
If you're running in 32-bit mode and you see no prefixes before pusha/popa in the compiled code, then the problem is elsewhere. One thing though, popa does not pop (e)sp.

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Wed Oct 05, 2016 10:29 pm
by Brendan
Hi,
CelestialMechanic wrote:
Brendan wrote:Hi,
{Snip!}
If alexfru's suggestion (16-bit "pushaw" instead of 32-bit "pushad", where highest 16-bits of all registers aren't saved/restored) wasn't the problem; then I'd assume that when you replaced "pushad/popad" you adjusted other things to suit the different stack layout and fixed the real problem by accident. For example, maybe you were using "mov [esp+8*4],eax" to access a local variable (and it was wrong) and you had to change it to "mov [esp+5*4],eax" (and got it right), so the problem disappeared.
I did not adjust anything; the C-compiler still compiled as before. My only adjustment was to change PUSHA/POPA into explicit PUSHes and POPs in the proper sequence. My handler never accessed the stack, it just saved and (supposedly) restored everything that had been pushed onto it. And then there's the seemingly random length of time before the failure.

The exception handlers (such as they are, they just display the BSOD and halt for now) are just fine because I never used PUSHA/POPA.
As far as I can tell (i.e. if it wasn't "pushaw vs. pushad") the problem was never PUSHA/POPA. Maybe it's a random/uninitialised/dodgy pointer, or the way you're calling (e.g.) some C code from your assembly stub, or something else. In any case, I'd also expect that whatever actually caused the problem originally is still there and will cause more problems later (e.g. if/when anything else changes).
CelestialMechanic wrote:At any rate, I have a nice timer system working that uses both the RTC and the PIT. The RTC interrupt occurs once every second, after the update, and writes out the RTC time and date. The PIT interrupts about 100 times per second and updates the hundredths-of-a-second part. I have the system time and date in BCD, where it will make an almost readable time stamp for my messaging system.
On a scale from 1 to 10, where 1 means it's known to be unstable, 2 means it seems to work by accident, 8 means it's acceptable, 9 means it's acceptable and has been in use/tested for years, and 10 means it's been formally proven correct, you're saying that currently you have a nice timer system that is somewhere between 1 and 2 on that scale (that's not even close to being acceptable)?


Cheers,

Brendan

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Thu Oct 06, 2016 6:53 pm
by Sik
alexfru wrote:One thing though, popa does not pop (e)sp.
End result would end up being the same though, wouldn't it? (・ω・`) (unless you decided to modify the address that held the old SP value expecting it to get used after POPA)
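A small C model of the 32-bit forms shows why the skipped ESP slot makes no difference in the normal case. This is a sketch under simplifying assumptions: memory is an array of 32-bit words, ESP is a word index rather than a byte address, and the Regs type and function names are invented for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
} Regs;

/* PUSHAD pushes EAX, ECX, EDX, EBX, the ESP value from before the
   instruction, EBP, ESI and EDI, in that order. */
void pushad(Regs *r, uint32_t *mem)
{
    uint32_t old_esp = r->esp;
    mem[--r->esp] = r->eax;
    mem[--r->esp] = r->ecx;
    mem[--r->esp] = r->edx;
    mem[--r->esp] = r->ebx;
    mem[--r->esp] = old_esp;
    mem[--r->esp] = r->ebp;
    mem[--r->esp] = r->esi;
    mem[--r->esp] = r->edi;
}

/* POPAD pops in reverse order, but the saved ESP slot is stepped over
   rather than loaded. ESP still ends up where it started because eight
   slots are consumed in total. */
void popad(Regs *r, const uint32_t *mem)
{
    r->edi = mem[r->esp++];
    r->esi = mem[r->esp++];
    r->ebp = mem[r->esp++];
    r->esp++;                 /* skip the saved ESP value */
    r->ebx = mem[r->esp++];
    r->edx = mem[r->esp++];
    r->ecx = mem[r->esp++];
    r->eax = mem[r->esp++];
}
```

In this model, overwriting the saved ESP slot between pushad and popad has no effect on the final ESP, which is the one case where the behaviour differs from a plain "pop esp".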

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Fri Oct 07, 2016 7:01 pm
by azblue
I kinda wanna continue Brendan's thought: Right now you have a problem which will probably be very easy to properly fix. Currently, it appears to have magically fixed itself -- which in reality likely means it's buried.

If you continue to work like this, somewhere down the line you'll have a dozen or more hidden bugs that'll cause your OS to crash randomly, and at that point it'll be nearly impossible to track down.

The bug is not fixed until you thoroughly understand why it works now and why it didn't work before.

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Sat Oct 08, 2016 2:47 am
by onlyonemac
Disable all other interrupts except the timer. See if it's still crashing "randomly".

Also, is your "system time" a real clock time or just a "number of ticks since startup" time?
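For the first suggestion, here is a rough C sketch of the mask values involved. It assumes the legacy 8259A PIC pair (not the APIC), where a set bit in the mask register disables that line and the masks are written to ports 0x21 (master) and 0xA1 (slave). The PicMasks type and function name are made up for illustration, and the port writes themselves are left out because they depend on your own port I/O routine.

```c
#include <stdint.h>

typedef struct {
    uint8_t master;   /* value for the master PIC mask register (port 0x21) */
    uint8_t slave;    /* value for the slave PIC mask register (port 0xA1)  */
} PicMasks;

/* Mask values that leave exactly one IRQ (0..15) enabled.
   A set bit masks (disables) a line. IRQs 8..15 arrive through the
   cascade on IRQ2, so IRQ2 must stay unmasked on the master for them. */
PicMasks masks_for_single_irq(unsigned irq)
{
    PicMasks m = { 0xFF, 0xFF };          /* start with everything masked */
    if (irq < 8) {
        m.master = (uint8_t)~(1u << irq);
    } else if (irq < 16) {
        m.master = (uint8_t)~(1u << 2);   /* keep the cascade line open */
        m.slave  = (uint8_t)~(1u << (irq - 8));
    }
    return m;
}
```

For the timer alone, masks_for_single_irq(0) gives 0xFE for the master and 0xFF for the slave.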

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Mon Oct 10, 2016 6:47 pm
by CelestialMechanic
onlyonemac wrote:Disable all other interrupts except the timer. See if it's still crashing "randomly".

Also, is your "system time" a real clock time or just a "number of ticks since startup" time?
IRQ0 was the only interrupt I was handling at the time of the random crashes. I now have both IRQ0 and IRQ8 running.

My system time is now taken from the RTC once a second upon completion of the update and augmented with the PIT timer every hundredth of a second, also expressed in BCD. I also have number of ticks (in binary) since startup for both PIT and RTC.
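For readers unfamiliar with BCD, a minimal C sketch of the packed-BCD conversions follows. The function names are invented for illustration and are not taken from my kernel; both functions assume values in the range 0 to 99.

```c
#include <stdint.h>

/* Packed BCD keeps one decimal digit per nibble, e.g. 0x59 means 59. */
uint8_t bcd_to_binary(uint8_t bcd)
{
    return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0F));
}

/* Inverse conversion; only meaningful for values 0..99. */
uint8_t binary_to_bcd(uint8_t value)
{
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}
```

The appeal of keeping the time in BCD is that a hex dump of it already reads like the decimal time, so a hex-printing routine doubles as a time stamp printer.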

I have since run the program for two hours in Bochs on my development machine and three hours bare metal on an ancient 486 machine and it operated perfectly, displaying the system time repeatedly. The 486 machine has 8 megs of RAM as reported by INT 15H, AH=88H, operates at 33 MHz, and uses the "fast" method of keyboard control of A20. One peculiarity is that it gives the wrong day of the week for a 21st century date. Could it be that the 20th century RTC chips had no provision for the day of the week algorithm beyond 1999? After all, the "century" register was most definitely an afterthought placed at register 32H (on most systems).

Still, I think that azblue has some good points about the origin and apparent "solution" of my problem. As I was tracking down the crashes, I also decided to fix the flickering of the display. For some reason I had my WriteHexString function place a space and a NUL character after the last hex character. What was I thinking? Let the caller take responsibility for any punctuation/separator characters, delimit with a NUL only. When I look at the code that the compiler emitted, the access using [ecx+esi] is gone.

This suggests to me that I should now test using PUSHA/POPA in IRQ0 (since it happens 100 times per second) in order to find out whether it was really to blame.

I wish to thank all for their suggestions; I'll let you know how my experiment turns out.

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Tue Oct 11, 2016 1:17 am
by Octocontrabass
CelestialMechanic wrote:This suggests to me that I should now test using PUSHA/POPA in IRQ0 (since it happens 100 times per second) in order to find out whether it was really to blame.
The problem has absolutely nothing to do with PUSHA/POPA. It's very likely that you're doing something else wrong in your interrupt handler, but it's hard to tell without seeing any of your code.

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Tue Oct 11, 2016 1:57 am
by MichaelFarthing
CelestialMechanic wrote: One peculiarity is that it gives the wrong day of the week for a 21st century date. Could it be that the 20th century RTC chips had no provision for the day of the week algorithm beyond 1999? After all, the "century" register was most definitely an afterthought placed at register 32H (on most systems).
I once had a machine with a similar problem. It may well not be an end of century problem but a leap day problem. The rule for leap years is every year divisible by 4, (except those divisible by 100, (except those divisible by 400)). Some machines seemed unaware of the last exception, which applied to the year 2000 and which, using the rule above, was a leap year. My machine sailed through the millennium bug with no problems but then got itself in a pickle the following March.
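Written out in C, the rule above looks like this. It is just a sketch of the Gregorian rule as stated; the function name is my own, and I am not claiming this is what any particular RTC or BIOS does.

```c
/* Gregorian leap-year rule: divisible by 4, except century years,
   except century years divisible by 400. Returns 1 for a leap year. */
int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}
```

A machine that stops at the "divisible by 100" exception treats 2000 as a common year, skips 29 February 2000, and is a day out from March 2000 onwards.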

Re: Is it Safe to Use PUSHA/POPA in Interrupt Handlers?

Posted: Wed Oct 12, 2016 7:59 pm
by CelestialMechanic
Octocontrabass wrote:
CelestialMechanic wrote:This suggests to me that I should now test using PUSHA/POPA in IRQ0 (since it happens 100 times per second) in order to find out whether it was really to blame.
The problem has absolutely nothing to do with PUSHA/POPA. It's very likely that you're doing something else wrong in your interrupt handler, but it's hard to tell without seeing any of your code.
I have since made the test: I replaced the various PUSHes/POPs with PUSHA/POPA in both my IRQ0 and IRQ8 handlers and everything is OK. I agree that PUSHA/POPA was probably not the problem.