OSDev.org

Posted: **Sat Jan 27, 2018 11:01 pm**

Are there circumstances in which an interrupt can fire even if it is gated? Of course not, right?

I have been writing a driver for the CMOS AIP in x86 and began observing strange interrupt behavior once I started doing CMOS-related things that required me to adjust the state of interrupts during certain blocks of code. Specifically, I wanted to disable _only_ the CMOS interrupt (irq #8) while getting the cached hardware time from a global structure (so the handler would not also try to modify it). It was then that I started getting int15s firing for an unknown reason. You'll notice that I registered a few interrupts in the prints with a PIC vector base of 32.

Since it only began once I started executing my code to read the CMOS registers, I started there. Eventually I narrowed it down to this small bit of offending code:

Code:

void cmos_get_time(struct cmos_rtc_time *time)
{
    if(NULL != time){
        printk("\r   \r@");
        plat.irq_disable(8);
        printk("@");
        memcpy(time, &g_time_hw, sizeof(struct cmos_rtc_time));
        plat.irq_enable(8);
        printk("@");

        ...
    }
}

Note that I strategically placed the printk calls so I could see exactly where the int15 was occurring. I've let multiple simultaneous QEMU sessions run my kernel for a while now and the int15 always arrives just after the irq_disable and before the printk (which does not use/adjust interrupts). It is also worth noting that the arrival of the spurious interrupt is non-deterministic. It appears to be a race condition of sorts, as sometimes it can take several minutes to happen or it may take only a few seconds. Here is a screenshot of the QEMU video:

[Attachment: spurious_int.png]

My int15 handler just prints the number of times the interrupt has occurred along with the current mask, ISR, and IRR registers. (Note that the ISR and IRR registers are read from both the master and slave at once and are returned as a 16-bit value). As shown, only IRQs 0, 1, and 2 are enabled and the ISR register shows IRQ2/pin2 (on the master) fired. Of course this indicates an IRQ came from the slave, which is reflected in the IRR register. In this case, the IRR register shows that IRQ 8 is pending. Why would the IDT entry for IRQ 15 execute in this case? Am I improperly handling the case where an interrupt is coming from the slave PIC? There is no instant where int15 is unmasked. I've tried inserting asserts to catch this case. My irq_disable function looks like this:

Code:

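/* Write the 16-bit IRQ mask: low byte to the master PIC, high byte to the slave. */
void pic8259_setmask(uint16_t mask)
{
    pic_outb(PIC8259_MASTER_DATA, mask & 0xff);
    pic_outb(PIC8259_SLAVE_DATA, mask >> 8);
}

/* Read both mask registers back as one 16-bit value (slave in the high byte). */
uint16_t pic8259_getmask(void)
{
    return (pic_inb(PIC8259_SLAVE_DATA) << 8) | pic_inb(PIC8259_MASTER_DATA);
}

/* Mask a single IRQ line (0-15) with a read-modify-write of the mask registers. */
void pic8259_disable_irq(uint8_t irq)
{
    pic8259_setmask(pic8259_getmask() | (1 << irq));
}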

Should the PIC be cleared of all pending interrupts when changing the mask? Unless I missed something in the 8259 datasheet I did not come across anything relevant to this problem.

Posted: **Sun Jan 28, 2018 11:15 am**

Altogether my code can be found here if anyone wants to take a look. The spurious interrupt handler (installed at IDT index 47) does not execute if I do not call the cmos_get_time function. Then, as stated above, the interrupt only occurs if I disable irq #8. Namely, if I comment out the disabling/enabling of interrupts within this function then a spurious interrupt never executes.

Posted: **Mon Jan 29, 2018 12:21 am**

Hi,

This is perfectly normal, and caused by a race condition in hardware. Essentially:

1. PIC chip sees there's an IRQ, and raises its INTR line to tell CPU that there's an IRQ
2. CPU masks the IRQ (your "plat.irq_disable(8);")
3. CPU sees that the PIC chip raised its INTR line and responds by asking PIC chip for an interrupt vector
4. PIC chip sees that the IRQ no longer exists (because it was masked) but must give CPU an interrupt vector, so it gives the CPU a "spurious IRQ"
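
One common way to cope with this in the IRQ 15 handler is to ask the slave PIC whether IRQ 15 is really in service before treating it as a device interrupt. The following is only a sketch, not the original poster's handler: `pic_irq15_is_real`, the `fake_*` variables, and the stand-in `pic_outb`/`pic_inb` bodies are hypothetical and exist so the logic can be exercised outside a kernel. The port numbers and the OCW3/EOI command values are the standard 8259 ones.

```c
#include <assert.h>
#include <stdint.h>

#define PIC_MASTER_CMD 0x20
#define PIC_SLAVE_CMD  0xA0
#define PIC_READ_ISR   0x0B  /* OCW3: next read of the command port returns the ISR */
#define PIC_EOI        0x20  /* OCW2: non-specific end of interrupt */

/* Fake port layer (assumption): a real kernel would use its own outb/inb. */
uint8_t  fake_slave_isr;     /* what the simulated slave reports as in service */
unsigned fake_master_eois;   /* EOIs the simulated master received */
unsigned fake_slave_eois;    /* EOIs the simulated slave received */

void pic_outb(uint16_t port, uint8_t val)
{
    if (val != PIC_EOI)
        return;              /* only EOIs are tracked by the simulation */
    if (port == PIC_MASTER_CMD)
        fake_master_eois++;
    else if (port == PIC_SLAVE_CMD)
        fake_slave_eois++;
}

uint8_t pic_inb(uint16_t port)
{
    return port == PIC_SLAVE_CMD ? fake_slave_isr : 0;
}

/*
 * Call at the top of the IRQ 15 handler.
 * Returns 1 if IRQ 15 is genuinely in service (the caller services the
 * device and then sends EOI to both PICs as usual), 0 if it was spurious.
 */
int pic_irq15_is_real(unsigned *spurious_count)
{
    assert(spurious_count != 0);

    pic_outb(PIC_SLAVE_CMD, PIC_READ_ISR);
    uint8_t slave_isr = pic_inb(PIC_SLAVE_CMD);

    if (slave_isr & 0x80)
        return 1;            /* bit 7 of the slave ISR is IRQ 15 */

    /* Spurious: the slave has nothing in service, so it gets no EOI.
     * The master did latch its cascade input (IRQ 2), so it still needs one. */
    (*spurious_count)++;
    pic_outb(PIC_MASTER_CMD, PIC_EOI);
    return 0;
}
```

The master-only EOI matches the screenshot described earlier in the thread, where the master's ISR shows IRQ 2 in service even though the slave delivered a spurious vector.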

There's a few performance problems here too.

Mostly; you can expect that all legacy IO ports (IO ports in the range 0x0000 to 0x03FF that correspond to "formerly ISA" devices, like the PIC chips, CMOS/RTC, etc) will take 1 microsecond for the CPU to access. For modern CPUs (running at "several GHz" speeds) 1 microsecond adds up to several thousand cycles, which is an extremely large amount of time.

This means that it would be significantly faster to just use "CLI" and "STI" to postpone all IRQs to avoid the extremely large amount of time that your "plat.irq_disable(8); then plat.irq_enable(8);" will cost. For your code (where you don't cache the PIC's current mask in RAM to avoid one IO port read) this is literally "a fraction of a cycle" vs. "2 * (2 * several thousand cycles)".
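
If per-IRQ masking is still wanted somewhere, the port read can at least be avoided by keeping a shadow copy of the mask in RAM. This is a rough sketch under assumptions: the `_cached` function names, `pic_mask_cache`, and the `fake_*` port layer are made up for illustration, the cache is assumed to start in sync with hardware (all IRQs masked after PIC initialisation), and callers are assumed to serialise access themselves. It does not remove the spurious IRQ race; it only drops the slow read and skips writes to a PIC whose byte did not change.

```c
#include <assert.h>
#include <stdint.h>

#define PIC8259_MASTER_DATA 0x21
#define PIC8259_SLAVE_DATA  0xA1

/* Fake port layer (assumption) that records what reaches the "hardware". */
unsigned fake_port_writes;
uint8_t  fake_master_imr = 0xFF;
uint8_t  fake_slave_imr  = 0xFF;

void pic_outb(uint16_t port, uint8_t val)
{
    fake_port_writes++;
    if (port == PIC8259_MASTER_DATA)
        fake_master_imr = val;
    else if (port == PIC8259_SLAVE_DATA)
        fake_slave_imr = val;
}

/* Shadow of both mask registers: slave in the high byte, master in the low. */
static uint16_t pic_mask_cache = 0xFFFF;

void pic8259_setmask_cached(uint16_t mask)
{
    uint16_t changed = mask ^ pic_mask_cache;

    if (changed & 0x00FF)
        pic_outb(PIC8259_MASTER_DATA, (uint8_t)(mask & 0xFF));
    if (changed & 0xFF00)
        pic_outb(PIC8259_SLAVE_DATA, (uint8_t)(mask >> 8));
    pic_mask_cache = mask;
}

void pic8259_disable_irq_cached(uint8_t irq)
{
    assert(irq < 16);
    pic8259_setmask_cached(pic_mask_cache | (uint16_t)(1u << irq));
}

void pic8259_enable_irq_cached(uint8_t irq)
{
    assert(irq < 16);
    pic8259_setmask_cached(pic_mask_cache & (uint16_t)~(1u << irq));
}
```

With this shape a disable/enable pair for IRQ 8 costs two port writes instead of two reads plus four writes, which is still far slower than a CLI/STI pair.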

It also means that actually accessing the RTC/CMOS chip's IO ports each time you want to determine the current time (e.g. every time the file system wants a timestamp, possibly hundreds of times per second) is incredibly slow (mostly due to the IO ports accesses). Standard practice is to get the current time from the RTC once during boot (or even better, from firmware in a more generic way) and store it in RAM somewhere, then (after boot) either use the RTC's "update IRQ" to increment the time stored in RAM or use a much better (lower overhead, higher precision) timer (PIT, HPET, local APIC, ...) to keep track of time instead.

Also note that the IO port accesses are not the only reason why getting the time directly from the RTC is slow. You have to make sure that there isn't an "update in progress" that will cause problems (and it costs up to 1 second to ensure that an update won't begin while you're in the middle of reading RTC registers). In addition; all sane software keeps time in a simple standardized integer format (e.g. maybe something like "nanoseconds since 1970, UTC") so that (if/when it's being displayed, which might be many years after it was stored) it can be converted to whichever time-zone the end user happens to want. Unfortunately the RTC gives "broken down time", in BCD, with no leap second support, often as "local time" (without saying which time zone it is), and occasionally with "has it or hasn't it been adjusted for daylight savings?" ambiguity. This means that converting what the RTC gives you into what software actually wants is very complicated, and this conversion is relatively expensive all by itself.
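
To give a feel for the conversion, here is a minimal sketch that turns BCD RTC fields into seconds since 1970. It assumes a lot: the RTC is in BCD, 24-hour mode and already set to UTC, the century is supplied by the caller, and leap seconds are ignored. The struct and function names are hypothetical, and the day count uses the widely known "days from civil" arithmetic rather than anything from the original poster's kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Raw register values as read from the RTC (BCD, 24-hour mode assumed). */
struct rtc_bcd_time {
    uint8_t sec, min, hour, day, month, year;
};

static unsigned bcd_to_bin(uint8_t bcd)
{
    return (bcd >> 4) * 10u + (bcd & 0x0Fu);
}

/* Days between 1970-01-01 and y-m-d in the proleptic Gregorian calendar. */
int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
    y -= (m <= 2);                               /* treat Jan/Feb as months 11/12 of the previous year */
    int64_t  era = (y >= 0 ? y : y - 399) / 400; /* 400-year cycle number */
    unsigned yoe = (unsigned)(y - era * 400);    /* year within the cycle, 0..399 */
    unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (int64_t)doe - 719468;
}

/* Convert BCD RTC fields to seconds since 1970-01-01 00:00:00 (UTC assumed). */
int64_t rtc_bcd_to_unix(const struct rtc_bcd_time *t, unsigned century)
{
    assert(t != 0);

    unsigned year  = century * 100 + bcd_to_bin(t->year);
    unsigned month = bcd_to_bin(t->month);
    unsigned day   = bcd_to_bin(t->day);
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);

    int64_t days = days_from_civil((int64_t)year, month, day);
    return days * 86400
         + (int64_t)bcd_to_bin(t->hour) * 3600
         + (int64_t)bcd_to_bin(t->min) * 60
         + (int64_t)bcd_to_bin(t->sec);
}
```

A real implementation would also have to handle binary mode, 12-hour mode, a local-time RTC and the century register, which is where most of the complexity mentioned above comes from.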

Cheers,

Brendan

Posted: **Fri Feb 02, 2018 3:41 pm**

Thanks for the reply. This being considered "normal" behavior is a perfectly suitable answer for me. I was concerned there may be an issue due to a misunderstanding of how spurious interrupts were triggered. I was under the impression that such an interrupt was fired when: the PIC had an IRQ line asserted, it informed the CPU via its INTR line, but the PIC's IRQ line became unasserted before the CPU was able to acknowledge the original interrupt. This case is an example race condition in which the hardware changes state faster than the core can respond.

However, you have mentioned a case that I had not considered: a race condition in which the IRQ line is asserted and the CPU masks off the IRQ just after the PIC has asserted the CPU's INTR line. Thus, the PIC must handle a state in which the CPU is acknowledging the asserted INTR line but the IRQ that caused it is now masked off. From the PIC's point of view this state is inconsistent (i.e. with the IRQ masked, it would not have driven the INTR line). Therefore, the PIC responds with its lowest-priority IRQ, which on the slave is IRQ 15.

Posted: **Wed Feb 07, 2018 7:50 am**

Brendan wrote: There's a few performance problems here too.

This means that it would be significantly faster to just use "CLI" and "STI" to postpone all IRQs to avoid the extremely large amount of time that your "plat.irq_disable(8); then plat.irq_enable(8);" will cost. For your code (where you don't cache the PIC's current mask in RAM to avoid one IO port read) this is literally "a fraction of a cycle" vs. "2 * (2 * several thousand cycles)".

Thanks for pointing this out. I'll work on optimizing that code. I originally had plans to cache the state of the mask, but early interrupt problems must have pushed the thought to the wayside.

Brendan wrote: It also means that actually accessing the RTC/CMOS chip's IO ports each time you want to determine the current time (e.g. every time the file system wants a timestamp, possibly hundreds of times per second) is incredibly slow (mostly due to the IO ports accesses). Standard practice is to get the current time from the RTC once during boot (or even better, from firmware in a more generic way) and store it in RAM somewhere, then (after boot) either use the RTC's "update IRQ" to increment the time stored in RAM or use a much better (lower overhead, higher precision) timer (PIT, HPET, local APIC, ...) to keep track of time instead.

Great thought. Currently I was using only the update interrupt, but obviously it makes sense to use this handler to update the in-memory version of time. Also, how can time be retrieved from firmware? I thought about using the PIT, for example, but was unsure about the idea of consuming chipset timers for the sole purpose of a once-per-second tick to keep track of time when these are needed for more important things like scheduling. Of course, we do need a timer to help keep track of time. My thought was to use a less important timer for this purpose. However, perhaps one of the more high-precision timers can be used to also keep higher precision time (e.g. milliseconds or nanoseconds) that can be converted as necessary.

Brendan wrote: Also note that the IO port accesses are not the only reason why getting the time directly from the RTC is slow. You have to make sure that there isn't an "update in progress" that will cause problems (and it costs up to 1 second to ensure that an update won't begin while you're in the middle of reading RTC registers).

This is why I opted to use the "update finished" interrupt to avoid having to contend with various status flags.

Brendan wrote: In addition; all sane software keeps time in a simple standardized integer format (e.g. maybe something like "nanoseconds since 1970, UTC") so that (if/when it's being displayed, which might be many years after it was stored) it can be converted to whichever time-zone the end user happens to want. Unfortunately the RTC gives "broken down time", in BCD, with no leap second support, often as "local time" (without saying which time zone it is), and occasionally with "has it or hasn't it been adjusted for daylight savings?" ambiguity. This means that converting what the RTC gives you into what software actually wants is very complicated, and this conversion is relatively expensive all by itself.

I read the wiki article on time and recognized how difficult this problem can be. I certainly need to revisit how time is implemented in the kernel.

Posted: **Wed Feb 07, 2018 2:42 pm**

Hi,

pragmatic wrote:
Brendan wrote: It also means that actually accessing the RTC/CMOS chip's IO ports each time you want to determine the current time (e.g. every time the file system wants a timestamp, possibly hundreds of times per second) is incredibly slow (mostly due to the IO ports accesses). Standard practice is to get the current time from the RTC once during boot (or even better, from firmware in a more generic way) and store it in RAM somewhere, then (after boot) either use the RTC's "update IRQ" to increment the time stored in RAM or use a much better (lower overhead, higher precision) timer (PIT, HPET, local APIC, ...) to keep track of time instead.
Great thought. Currently I was using only the update interrupt, but obviously it makes sense to use this handler to update the in-memory version of time. Also, how can time be retrieved from firmware?

For BIOS, you'd use "int 0x1A, ah = 0x02 - get real time clock time" (which only provides similar information to the RTC itself). For UEFI you'd use "GetTime()" which is capable of telling you additional information (nanoseconds, which time zone it is) that BIOS and RTC lack.

pragmatic wrote: I thought about using the PIT, for example, but was unsure about the idea of consuming chipset timers for the sole purpose of a once-per-second tick to keep track of time when these are needed for more important things like scheduling. Of course, we do need a timer to help keep track of time.

If you define "counter" as something that counts and "timer" as something capable of generating an IRQ; then you only need a counter to keep track of time (ACPI's power management counter, HPET's "main count", CPU's TSC) and don't need a timer or IRQ. For things like scheduling and "nanosleep" it's the opposite - you don't need a counter but do need a timer.

For modern hardware the best choice is to use the TSC as a counter, and then use the local APIC's timer (in "TSC deadline mode") as a timer. For ancient hardware the best choice is probably to use the RTC's periodic IRQ (set to maybe 1 kHz/1 ms) to implement a counter in RAM, then use PIT as a timer.
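
As an illustration of the "counter in RAM" option, the sketch below counts RTC periodic IRQs and converts the count to nanoseconds. The RTC's periodic rates are powers of two (32768 >> (rate - 1) Hz for rate values 3 to 15), so the closest setting to 1 kHz is rate 6, which gives 1024 Hz. All function and variable names are hypothetical, and the handler omits the parts a real kernel needs (reading RTC status register C and sending EOI).

```c
#include <assert.h>
#include <stdint.h>

/* Frequency selected by the rate bits of RTC status register A. */
uint32_t rtc_periodic_hz(unsigned rate)
{
    assert(rate >= 3 && rate <= 15);
    return 32768u >> (rate - 1);
}

/* Convert a tick count to nanoseconds without overflowing on large counts. */
uint64_t ticks_to_ns(uint64_t ticks, uint32_t hz)
{
    assert(hz != 0);
    return ticks / hz * 1000000000ull + ticks % hz * 1000000000ull / hz;
}

static volatile uint64_t rtc_ticks;   /* incremented once per periodic IRQ */
static uint32_t rtc_hz = 1024;        /* assumes the RTC was programmed with rate 6 */

/* Body of the periodic IRQ handler. A real handler must also read status
 * register C (or no further IRQs arrive) and send EOI to both PICs. */
void rtc_periodic_irq(void)
{
    rtc_ticks++;
}

/* Time since the counter was started, in nanoseconds. On a 32-bit CPU the
 * 64-bit read is not atomic, so a kernel would need to guard it. */
uint64_t rtc_counter_ns(void)
{
    return ticks_to_ns(rtc_ticks, rtc_hz);
}
```

At 1024 Hz each tick is 976562.5 ns, so the precision of this counter is a little under a millisecond.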

However don't forget that there's a whole lot of other options in between (ACPI's PM counter, HPET, other local APIC timer modes) and that (if you want) you can use RTC as a timer or use PIT as a counter; so (unless you're limiting the OS to "only modern hardware" or "only ancient hardware") it's best to use abstractions (e.g. every counter provides a "get time since boot in nanoseconds" function and every timer provides a "notify me when N nanoseconds have passed" function), with auto-detection and "selection logic" (e.g. that sets some function pointers for the actual "get time since boot in nanoseconds" function that was chosen) to decide the counter to use to keep track of time and to decide the timer/s to use for IRQs/notifications. That way the OS can automatically use the best time source/s that the computer (and OS) happens to support (and you can easily add support for other time sources whenever you like).
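
The counter half of that abstraction might look roughly like the following. This is a sketch only: `struct counter_source`, `counter_select`, the rating numbers, and the fake probe and read functions are all invented for illustration, and a real kernel would replace them with actual TSC, HPET and RTC drivers. The timer half ("notify me when N nanoseconds have passed") would follow the same pattern with a different function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One candidate time source, described by function pointers. */
struct counter_source {
    const char *name;
    int rating;                       /* higher means preferred */
    int (*probe)(void);               /* nonzero if usable on this machine */
    uint64_t (*ns_since_boot)(void);  /* "get time since boot in nanoseconds" */
};

/* Selection logic: pick the best-rated source whose probe succeeds. */
const struct counter_source *
counter_select(const struct counter_source *srcs, size_t n)
{
    const struct counter_source *best = NULL;

    for (size_t i = 0; i < n; i++) {
        assert(srcs[i].probe && srcs[i].ns_since_boot);
        if (!srcs[i].probe())
            continue;
        if (best == NULL || srcs[i].rating > best->rating)
            best = &srcs[i];
    }
    return best;
}

/* Stand-in drivers so the sketch runs without hardware. */
static int probe_absent(void)  { return 0; }
static int probe_present(void) { return 1; }
static uint64_t fake_tsc_ns(void)  { return 123; }
static uint64_t fake_hpet_ns(void) { return 456; }
static uint64_t fake_rtc_ns(void)  { return 789; }

const struct counter_source demo_sources[] = {
    { "tsc",          300, probe_absent,  fake_tsc_ns  },
    { "hpet",         200, probe_present, fake_hpet_ns },
    { "rtc-periodic", 100, probe_present, fake_rtc_ns  },
};
```

At boot the kernel would call `counter_select` once and store the result, so every later "what time is it" request is a single indirect call. In this demo table the TSC entry fails its probe, so the HPET entry is chosen.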

Cheers,

Brendan

[SOLVED] Spurious interrupt firing after disabling another