Hi,
alix wrote:I have a simple question, if a kernel keeps track of time ticks using IRQ0 interrupts. Shouldn't disabling interrupts and re-enabling it again messes up the kernel timer?
There are two cases here. If the kernel disables IRQs for too long (longer than the time between timer ticks) then timer IRQs can be lost: while one IRQ is still pending, any second, third, fourth, etc. IRQ from the timer is ignored. This causes drift. For example, if it happens a lot you might end up thinking that 40 seconds passed when 60 seconds actually passed.
The other case is IRQ latency causing timer IRQ jitter, which causes precision loss. For example, if you increase a "milliseconds since the epoch" counter every 2 milliseconds exactly, then that counter could be precise to within +/- 1 millisecond. However, if IRQs are disabled (postponed) when the IRQ should happen, then the counter isn't updated at exactly the right time (it's only updated later, when interrupts are enabled again and the postponed IRQ can occur), and the counter might only be accurate to within +/- 1.5 milliseconds.
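To make that concrete, here's a minimal sketch of the kind of counter being described. The names (milliseconds_since_epoch, timer_irq_handler, outb) and the 2 ms PIT period are assumptions for illustration, not code from any particular kernel:

Code:

#include <stdint.h>

#define PIT_PERIOD_MS 2                          /* assumed tick period */

extern void outb(uint16_t port, uint8_t value);  /* assumed helper, (port, value) order */

volatile uint64_t milliseconds_since_epoch = 0;  /* hypothetical counter */

/* Called by the IRQ0 interrupt stub on every timer tick. */
void timer_irq_handler(void)
{
    /* The counter only moves in 2 ms steps, so at best it's precise to about
       +/- 1 ms. If this IRQ was postponed by CLI, the update happens late and
       the error grows by however long interrupts were disabled. */
    milliseconds_since_epoch += PIT_PERIOD_MS;

    outb(0x20, 0x20);   /* EOI to the master PIC */
}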
Basically, as long as you don't disable IRQs for so long that you lose IRQs and cause drift, the best case precision of your counter would be "-(time_between_IRQs/2) to +(time_between_IRQs/2 + worst_case_length_of_time_IRQs_disabled)"; which is close enough to (but not strictly equivalent to) "+/- (time_between_IRQs/2 + worst_case_length_of_time_IRQs_disabled)".
A lot of things use elapsed time (e.g. "end_time - start_time"). For this case the error caused by precision loss occurs twice. For example, you might read "start_time" immediately before the counter is incremented and get a "start_time" value with a worst case error of "+(time_between_IRQs/2 + worst_case_length_of_time_IRQs_disabled)"; and then read the "end_time" immediately after the counter is incremented and get a value with a worst case error of "-(time_between_IRQs/2)"; and then subtract these values and end up with an elapsed time that has a worst case error of "-(time_between_IRQs/2) - (time_between_IRQs/2 + worst_case_length_of_time_IRQs_disabled)" or "-(time_between_IRQs + worst_case_length_of_time_IRQs_disabled)".
In a similar way you can read "start_time" immediately after the counter is incremented and get a "start_time" value with a worst case error of "-(time_between_IRQs/2)"; and then read the "end_time" immediately before the counter is incremented and get a value with a worst case error of "+(time_between_IRQs/2 + worst_case_length_of_time_IRQs_disabled)"; and then subtract these values and end up with an elapsed time that has a worst case error of "+(time_between_IRQs/2 + worst_case_length_of_time_IRQs_disabled) - -(time_between_IRQs/2)" or "+(time_between_IRQs + worst_case_length_of_time_IRQs_disabled)".
This means that, for elapsed time, precision is actually "-(time_between_IRQs + worst_case_length_of_time_IRQs_disabled) to +(time_between_IRQs + worst_case_length_of_time_IRQs_disabled)"; which is equivalent to "+/- (time_between_IRQs + worst_case_length_of_time_IRQs_disabled)".
For example, if an IRQ is meant to occur every 1 ms and the worst case length of time IRQs are disabled is 0.25 ms; then your elapsed time calculations may be 1.25 ms too short or 1.25 ms too long.
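Just to make the arithmetic explicit, here's a tiny user-space sketch that plugs those example numbers into the formulas above:

Code:

#include <stdio.h>

int main(void)
{
    double time_between_irqs = 1.0;          /* ms, from the example above */
    double worst_case_irqs_disabled = 0.25;  /* ms */

    /* Precision of a single reading of the counter */
    double single_low  = -(time_between_irqs / 2);
    double single_high = time_between_irqs / 2 + worst_case_irqs_disabled;

    /* Precision of an elapsed time ("end_time - start_time") calculation */
    double elapsed_error = time_between_irqs + worst_case_irqs_disabled;

    printf("single reading: %+.2f ms to %+.2f ms\n", single_low, single_high);
    printf("elapsed time:   +/- %.2f ms\n", elapsed_error);  /* prints 1.25 */
    return 0;
}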
Now; normally an OS makes guarantees, like "a delay will never be shorter than requested (but may be longer)". To provide this guarantee when the actual delay could end up 1.25 ms shorter than the measured delay, you need to add an extra 1.25 ms to the requested delay just in case. This means that (for this case), if software asks for a delay of 2 ms it may actually get a delay that is between 2 ms and 4.5 ms (rather than somewhere between 0.75 ms and 3.25 ms, which is what it could get without the extra 1.25 ms).
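As a sketch of how that guarantee could be provided (sleep_ms, milliseconds_since_epoch and the constants are invented for illustration, with the 0.25 ms rounded up to a whole tick; a real kernel would block the task rather than spin):

Code:

#include <stdint.h>

#define TIME_BETWEEN_IRQS_MS    1   /* assumed tick period */
#define WORST_CASE_IRQS_OFF_MS  1   /* assumed; 0.25 ms rounded up to a whole tick */

extern volatile uint64_t milliseconds_since_epoch;  /* hypothetical tick counter */

/* Never returns before at least "ms" milliseconds have really passed
   (but may return later). */
void sleep_ms(uint64_t ms)
{
    /* Pad the request by the worst case error of an elapsed time calculation,
       so the real delay can't end up shorter than requested. */
    uint64_t padded = ms + TIME_BETWEEN_IRQS_MS + WORST_CASE_IRQS_OFF_MS;
    uint64_t start = milliseconds_since_epoch;

    while (milliseconds_since_epoch - start < padded) {
        /* spin (a real kernel would block the task or HLT here) */
    }
}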
Of course an OS shouldn't disable IRQs for very long at all, and for timers like the PIT the time between IRQs is very large relative to the worst case length of time IRQs are disabled; which means that the precision loss caused by IRQ latency/jitter is negligible and can be ignored (the time between IRQs dominates precision). However; if you start looking at things like using the local APIC timer in "TSC deadline" mode (where the precision loss caused by the timer itself may be as little as 1 nanosecond for delays longer than some minimum length), the precision loss caused by IRQ latency/jitter can be several orders of magnitude higher than the precision loss caused by the timer hardware itself. What this means is that, to provide extremely precise timing, interrupt latency/jitter can be extremely important.
alix wrote:Shouldn't the solution be to have a mask to disable all interrupts except IRQ0? Is there is a problem in critical sections (where we usually disable interrupt) to have IRQ0 enabled?
This might sound confusing at first; but "CLI" causes interrupts to be postponed (e.g. postponed until you do STI) and does not cause IRQs to be ignored/lost.
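For example (GCC-style inline assembly, purely to illustrate the point):

Code:

static inline void interrupts_off(void) { __asm__ volatile ("cli"); }
static inline void interrupts_on(void)  { __asm__ volatile ("sti"); }

void touch_data_shared_with_irq_handler(void)
{
    interrupts_off();   /* an IRQ0 that fires now is held pending... */

    /* ...modify data that the timer IRQ handler also touches... */

    interrupts_on();    /* ...and delivered here, not lost, so the tick count
                           still ends up correct (as long as this section is
                           short enough that a second tick can't arrive). */
}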
If you mask an IRQ and it occurs, then it is ignored/lost (and not just postponed). Masking all IRQs except IRQ0 will (eventually) cause IRQs to be lost, which will cause devices to stop working, which will cause the OS to grind to a halt (e.g. all processes waiting for devices, and all devices waiting for their ignored/lost IRQs to be handled).
For APICs there is a "Task Priority Register" which can be used to postpone (not ignore) lower priority IRQs. Sadly this doesn't work for PIC chips.
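A rough sketch of what using the TPR looks like; the memory mapped local APIC base used here is just the usual default (a real kernel would get it from the IA32_APIC_BASE MSR), and this ignores the CR8 alternative on 64-bit:

Code:

#include <stdint.h>

#define LAPIC_BASE 0xFEE00000u   /* assumed default; read IA32_APIC_BASE in real code */
#define LAPIC_TPR  0x80          /* Task Priority Register offset */

static inline void lapic_write(uint32_t reg, uint32_t value)
{
    *(volatile uint32_t *)(uintptr_t)(LAPIC_BASE + reg) = value;
}

/* IRQs whose priority class (interrupt vector >> 4) is less than or equal to
   "priority_class" are held pending in the local APIC (not lost) until the
   TPR is lowered again. */
void set_task_priority(uint8_t priority_class)
{
    lapic_write(LAPIC_TPR, (uint32_t)priority_class << 4);
}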
You can also do it yourself in software. For example; design your IRQ handlers so that they test whether a "postpone IRQs" flag is set, and if it is, set a corresponding "IRQ was postponed" flag and exit (without running the real IRQ handler). When you clear the "postpone IRQs" flag you check all of those "IRQ was postponed" flags and retry any IRQ handlers that need it. In this case you'd set your own "postpone IRQs" flag instead of doing "CLI", and might never need to use "CLI" at all.
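A very rough sketch of that idea (everything here, including the names, is invented for illustration; note that a real version has to be careful about the race where an IRQ is postponed at the same moment the flag is being cleared):

Code:

#include <stdbool.h>

#define NUM_IRQS 16

static volatile bool postpone_irqs = false;         /* the "postpone IRQs" flag */
static volatile bool irq_was_postponed[NUM_IRQS];   /* one "was postponed" flag per IRQ */

extern void real_irq_handler(int irq);              /* hypothetical: the real work */

/* Called by every IRQ stub (after sending the EOI). */
void irq_dispatch(int irq)
{
    if (postpone_irqs) {
        irq_was_postponed[irq] = true;   /* remember it and exit early */
        return;
    }
    real_irq_handler(irq);
}

/* Used around critical sections instead of CLI/STI. */
void begin_critical(void)
{
    postpone_irqs = true;
}

void end_critical(void)
{
    postpone_irqs = false;

    /* Retry any handlers that were skipped while the flag was set. */
    for (int irq = 0; irq < NUM_IRQS; irq++) {
        if (irq_was_postponed[irq]) {
            irq_was_postponed[irq] = false;
            real_irq_handler(irq);
        }
    }
}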
Cheers,
Brendan