Using APIC timer as a "system board" timer when HPET fails

gerryg400 · Post by **gerryg400** » Wed Dec 21, 2011 6:07 am

Rdos, have you tried Eclipse on Windows? It has good SVN support.

Combuster · Post by **Combuster** » Wed Dec 21, 2011 6:48 am

There is no good version control integration in the primary tools I use.

So what? It's not like any commercial IDE I have ever seen is exactly good at it...

rdos · Post by **rdos** » Thu Dec 22, 2011 4:54 am

The merge operation is (almost) complete. The problem with the branch code is that it uses the time stamp counter (a per-core resource) to keep real time. The optimal solution is to use the HPET clock to get elapsed time, even if the HPET doesn't support MSI interrupts. If the HPET doesn't exist at all, the code should revert to the PIT (the speaker channel) for elapsed time. When the HPET doesn't support MSI, and thus is potentially really buggy with its interrupts, the local APIC timer would be used for timers. The local APIC timer could also be used for timers under special circumstances when the HPET is set up for main timers.

Edit: Now all three modes of operation work (shared APIC timer + PIT for elapsed time, shared APIC timer + HPET for elapsed time, and APIC timer for preemption + HPET timers + HPET for elapsed time). A fourth mode (PIT channel 0 for timers & preemption + PIT channel 2 for elapsed time) also works when no APIC is available, on older/low-end systems with only a PIC. Maybe I should search for the HPET on these systems as well and use it for elapsed time instead of PIT channel 2?
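The four combinations listed above can be summarised as a small decision function. This is a hedged sketch, not RDOS code: the `timer_hw` fields and the mode names are hypothetical, and it only encodes the rules stated in these two posts (no APIC means PIT only; no HPET means PIT channel 2 for elapsed time; HPET comparators are only trusted for timers when MSI delivery is supported).

```c
#include <stdbool.h>

/* Hypothetical summary of what was detected at boot. */
struct timer_hw {
    bool has_apic;   /* local APIC present (APIC driver loaded) */
    bool has_hpet;   /* HPET main counter present and running */
    bool hpet_msi;   /* HPET comparators support MSI delivery */
};

enum timer_mode {
    MODE_PIT_ONLY,          /* PIT ch0: timers + preemption, PIT ch2: elapsed time */
    MODE_APIC_SHARED_PIT,   /* shared APIC timer, PIT ch2 for elapsed time */
    MODE_APIC_SHARED_HPET,  /* shared APIC timer, HPET main counter for elapsed time */
    MODE_APIC_PREEMPT_HPET  /* APIC timer: preemption; HPET: timers + elapsed time */
};

enum timer_mode select_timer_mode(const struct timer_hw *hw)
{
    if (!hw->has_apic)
        return MODE_PIT_ONLY;           /* PIC-only system */
    if (!hw->has_hpet)
        return MODE_APIC_SHARED_PIT;
    if (!hw->hpet_msi)
        return MODE_APIC_SHARED_HPET;   /* counter is fine, IRQs are not trusted */
    return MODE_APIC_PREEMPT_HPET;
}
```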

gravaera · Post by **gravaera** » Thu Dec 22, 2011 7:42 pm

These rdos threads...they always deliver jajajaja

Anyway, dropping in to say that git es #1, es best VCS, always commit, never lose data...from start staging always up, I get so much work done with git, es the best!

Brendan · Post by **Brendan** » Thu Dec 22, 2011 9:24 pm

Hi,

rdos wrote:Edit: Now all three modes of operation work (shared APIC timer + PIT for elapsed time, shared APIC timer + HPET for elapsed time, and APIC timer for preemption + HPET timers + HPET for elapsed time). A fourth mode (PIT channel 0 for timers & preemption + PIT channel 2 for elapsed time) also works when no APIC is available, on older/low-end systems with only a PIC. Maybe I should search for the HPET on these systems as well and use it for elapsed time instead of PIT channel 2?

You shouldn't really be limited to specific permutations.

During boot or kernel initialisation you should "discover" timers and counters, and create a list containing the capabilities and characteristics for each of them (e.g. if you can read the count, how precise it is, how much overhead is involved, if it can generate a fixed frequency IRQ, if it can be used for a "one shot" IRQ, if it is affected by S2 and S3 sleep states, etc). For each of the timers you'd also have function pointers for "back-end code" to do various things (set it up, read the count, configure it for fixed frequency IRQs, set the next "one shot" IRQ, etc).

Later during boot or kernel initialisation, you should use the list to determine which timers and counters are most suitable for which roles. If the kernel decides that the third timer in the list is good for "something", then it'd use the function pointers for the third timer in the list for "something". The kernel wouldn't need to know or care what the timer actually is (the function pointers act as an effective abstraction layer).
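A minimal sketch of the list described above, assuming hypothetical names throughout (`struct timesource`, the `CAP_*` flags and `pick_timesource()` are illustrations, not any real kernel's API). Each entry carries capability flags plus function pointers for its back-end, and the selection code only looks at the flags; the ranking policy here (highest count frequency, then cheapest read) is an arbitrary example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capability flags for one discovered timer or counter (hypothetical). */
enum {
    CAP_READ_COUNT  = 1u << 0,  /* current count can be read */
    CAP_FIXED_IRQ   = 1u << 1,  /* can generate a fixed-frequency IRQ */
    CAP_ONE_SHOT    = 1u << 2,  /* can generate a "one shot" IRQ */
    CAP_PER_CPU     = 1u << 3,  /* one instance per CPU */
    CAP_SURVIVES_S2 = 1u << 4   /* not affected by the S2 sleep state */
};

/* One entry in the list built during boot; the back-end hides behind the pointers. */
struct timesource {
    const char *name;
    uint32_t caps;
    uint64_t freq_hz;    /* count rate, used as a rough measure of precision */
    uint32_t read_cost;  /* rough relative overhead of reading the count */
    uint64_t (*read_count)(void *priv);
    void (*set_one_shot)(void *priv, uint64_t ticks);
    void *priv;
};

/* Best entry having all `required` caps: highest frequency, then cheapest read. */
const struct timesource *pick_timesource(const struct timesource *list, size_t n,
                                         uint32_t required)
{
    const struct timesource *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const struct timesource *t = &list[i];
        if ((t->caps & required) != required)
            continue;   /* missing a hard requirement */
        if (best == NULL || t->freq_hz > best->freq_hz ||
            (t->freq_hz == best->freq_hz && t->read_cost < best->read_cost))
            best = t;
    }
    return best;        /* NULL if nothing fits the role */
}
```

Once a role has picked an entry, the kernel calls through it (e.g. `t->read_count(t->priv)`) without knowing whether it is an HPET, a PIT or something else.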

For example, you might search the list to find the best counter to use for the master "real time" counter (e.g. something that lets you read the count, isn't affected by sleep states and doesn't need to be able to generate an IRQ at all - maybe HPET main counter or the ACPI timer, or maybe PIT or RTC if the corresponding back-end does some extra work); or you might search the list for per-CPU scheduler timers (something that supports "one shot" IRQs, that may be affected by sleep states, may not allow the count to be read, may not support "fixed frequency", etc - possibly local APIC timers, possibly one or more HPET timers, possibly PIT).

If a manufacturer invents a new type of timer, then you just add it to your list and let the kernel decide what to use it for. If you decide you want to add a new feature to the kernel (watchdog timer?) then you just search the list for a timer to suit the new role. If you find out that something doesn't work right on a specific chipset (e.g. maybe the ACPI counter is broken), then you might add a work-around in the code that creates the list of timers/counters (e.g. if chipset is XYZ then don't add the ACPI counter to the list). In all these cases the OS would automatically adjust.
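The chipset work-around mentioned above could live in the code that builds the list. A sketch assuming a hypothetical quirk table keyed by chipset and timer name; "XYZ" is the placeholder chipset from the post, not a real entry, and how the chipset is identified (e.g. by PCI IDs) is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A chipset quirk: never register timer `timer_name` on chipset `chipset`. */
struct timer_quirk {
    const char *chipset;
    const char *timer_name;
};

/* Example entry only: "XYZ" and "acpi_pm" are placeholder names. */
static const struct timer_quirk quirks[] = {
    { "XYZ", "acpi_pm" },
};

/* Called while building the list; false means "don't add this timer". */
bool timer_allowed(const char *chipset, const char *timer_name)
{
    for (size_t i = 0; i < sizeof quirks / sizeof quirks[0]; i++) {
        if (strcmp(quirks[i].chipset, chipset) == 0 &&
            strcmp(quirks[i].timer_name, timer_name) == 0)
            return false;
    }
    return true;
}
```

Because roles are assigned from whatever ends up in the list, dropping an entry here is enough for the rest of the kernel to pick something else.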

Cheers,

Brendan

rdos · Post by **rdos** » Fri Dec 23, 2011 12:32 am

Brendan wrote:You shouldn't really be limited to specific permutations.

During boot or kernel initialisation you should "discover" timers and counters, and create a list containing the capabilities and characteristics for each of them (e.g. if you can read the count, how precise it is, how much overhead is involved, if it can generate a fixed frequency IRQ, if it can be used for a "one shot" IRQ, if it is affected by S2 and S3 sleep states, etc). For each of the timers you'd also have function pointers for "back-end code" to do various things (set it up, read the count, configure it for fixed frequency IRQs, set the next "one shot" IRQ, etc).

That sounds way too complicated. There currently are no other timers in mainstream PCs, so why make it more difficult than it is? With only HPET, PIT and APIC timer to choose between, there is no need to create lists or anything, and the usage rules can be coded into the APIC device driver when that is available, and into the PIC device driver when that is used.

Additionally, I've looked at the ACPI tables for several machines now, and they are not reliable with regard to HPET or PIT. Most BIOS developers probably just copy them, and do not investigate how they are configured before copying the configurations.

Brendan · Post by **Brendan** » Fri Dec 23, 2011 1:47 am

Hi,

rdos wrote:That sounds way too complicated. There currently are no other timers in mainstream PCs, so why make it more difficult than it is? With only HPET, PIT and APIC timer to choose between...

Erm:

TSC
Local APIC timer (possibly including "TSC deadline mode")
Performance monitoring counters and/or IRQs
HPET main counter
HPET comparators
ACPI's "power management timer" (32-bit or 24-bit) counter
PIT channel 0
PIT channel 2
RTC periodic and/or update IRQ
Watchdog timer/s (e.g. the "WDAT" and "WDRT" ACPI tables)

Most of these may or may not exist; and most may or may not support different features when they do exist.

Then there's roles:

Some sort of counter to measure real time (need accuracy, precision would be nice, per-CPU would be nice, don't need an IRQ)
Some sort of timer to wake sleeping tasks (need accuracy, precision would be nice, per-CPU would be nice, need an IRQ)
Some sort of counter to measure how much time each task has used (precision would be nice, per-CPU would be really nice, low overhead would be nice, don't need an IRQ, don't really need accuracy)
Some sort of timer that the scheduler can use to know when a task has used all of the time it was given (some precision would be nice, accuracy doesn't matter much, don't need to be able to read the current count, do need an IRQ, "one shot" IRQ would be nice)
Some sort of timer to keep track of power management (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ, "one shot" IRQ would be nice)
(optional) Some sort of timer to use for "poor man's profiling" (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ and something capable of generating an NMI would be nice, "one shot" IRQ would be nice for pseudo-random delays)
(optional) Some sort of timer to use for a watchdog (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ and something capable of generating an NMI would be nice, fixed frequency is fine)
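The roles above differ mainly in which capabilities are hard requirements and which are only "would be nice". One hypothetical way to encode that (all names are illustrative, and the flags simplify the criteria listed) is a pair of bitmasks per role and a score that rejects timers missing a requirement and otherwise counts the extras:

```c
#include <stddef.h>
#include <stdint.h>

enum {
    CAP_READ_COUNT = 1u << 0,
    CAP_IRQ        = 1u << 1,
    CAP_ONE_SHOT   = 1u << 2,
    CAP_PER_CPU    = 1u << 3,
    CAP_ACCURATE   = 1u << 4,
    CAP_PRECISE    = 1u << 5,
    CAP_NMI        = 1u << 6
};

/* A role: hard requirements plus "would be nice" extras (hypothetical encoding). */
struct role {
    const char *name;
    uint32_t need;
    uint32_t nice;
};

static const struct role roles[] = {
    { "real time counter",    CAP_READ_COUNT | CAP_ACCURATE, CAP_PRECISE | CAP_PER_CPU },
    { "wake sleeping tasks",  CAP_IRQ | CAP_ACCURATE,        CAP_PRECISE | CAP_PER_CPU },
    { "task time accounting", CAP_READ_COUNT,                CAP_PRECISE | CAP_PER_CPU },
    { "scheduler time slice", CAP_IRQ,                       CAP_PRECISE | CAP_ONE_SHOT },
    { "power management",     CAP_IRQ,                       CAP_ONE_SHOT },
    { "profiling",            CAP_IRQ,                       CAP_NMI | CAP_ONE_SHOT },
    { "watchdog",             CAP_IRQ,                       CAP_NMI },
};

/* -1 if `caps` can't fill the role, otherwise the number of nice-to-have caps present. */
int role_score(const struct role *r, uint32_t caps)
{
    if ((caps & r->need) != r->need)
        return -1;
    int score = 0;
    for (uint32_t extra = caps & r->nice; extra != 0; extra &= extra - 1)
        score++;   /* clear the lowest set bit each time */
    return score;
}
```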

rdos wrote:Additionally, I've looked at the ACPI tables for several machines now, and they are not reliable with regard to HPET or PIT. Most BIOS developers probably just copy them, and do not investigate how they are configured before copying the configurations.

In which way are the ACPI tables not accurate (and why don't other OSs like Windows and Linux have problems)?

Cheers,

Brendan

rdos · Post by **rdos** » Fri Dec 23, 2011 2:21 am

Brendan wrote: TSC

Highly unreliable on older CPUs (when it is present), and per-core, which makes it useless for keeping elapsed time. Has no IRQ.

Brendan wrote: Local APIC timer (possibly including "TSC deadline mode")

Affected by some types of power-management, but pretty useful for preemption. Not good for high precision timing due to possible effects of power-management.

Brendan wrote: Performance monitoring counters and/or IRQs

Better left free for performance measurement.

Brendan wrote: HPET main counter

The best alternative for measuring elapsed time.

Brendan wrote: HPET comparators

The best alternative for high-precision timers. When they work. When the HPET doesn't support MSI-delivery it seems like it often malfunctions. The configuration information returned on some motherboards is incorrect regarding IRQ routings. The ACPI tables are not always correct either. Some report IRQs when they don't work, while some report no IRQ, but they still work.

Brendan wrote: ACPI's "power management timer" (32-bit or 24-bit) counter

HPET?

AFAIK, there is no guarantee in ACPI that this is not the HPET, or some channel on the HPET.

Brendan wrote: PIT channel 0
PIT channel 2

Both can be used for elapsed time or high-precision (us) timers.

Brendan wrote: RTC periodic and/or update IRQ

Can be used to synchronize elapsed time with real time. Not useful for anything else.

Brendan wrote: Watchdog timer/s (e.g. the "WDAT" and "WDRT" ACPI tables)

These are better left out of this.

Brendan wrote: Some sort of counter to measure real time (need accuracy, precision would be nice, per-CPU would be nice, don't need an IRQ)

No, it should absolutely not be per-CPU, but per-system. That means one of the PIT channels or the HPET. TSC doesn't work, as it is both affected by power management and per-CPU. I have tried to use TSC for elapsed time, and it doesn't work. There is no reliable way to synchronize time between cores, especially not when TSCs start ticking at different frequencies when power management "kicks in".

Brendan wrote: Some sort of timer to wake sleeping tasks (need accuracy, precision would be nice, per-CPU would be nice, need an IRQ)

This is what I refer to as "timers". This can be the APIC timer, PIT channel 0 or an HPET comparator. The APIC timer is per-CPU, so if it is used, timers need to be per-CPU. When using PIT channel 0 or HPET comparators, timers would be per-system. It might be possible to use combinations if both the APIC timer and PIT channel 0 / an HPET comparator are available.

Brendan wrote: Some sort of counter to measure how much time each task has used (precision would be nice, per-CPU would be really nice, low overhead would be nice, don't need an IRQ, don't really need accuracy)

I use elapsed time for this. When a task is started, the elapsed counter is saved; when a new task is scheduled, elapsed time is read again and the saved value is subtracted from it. There is no need for a separate hardware resource for this.
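The accounting described here amounts to two hooks in the context switch. A sketch with hypothetical names, assuming `now` is the system-wide, monotonic elapsed-time count:

```c
#include <stdint.h>

/* Hypothetical per-task bookkeeping. */
struct task {
    uint64_t start_tics;  /* elapsed-time reading when the task was switched in */
    uint64_t used_tics;   /* total CPU time consumed so far */
};

void task_switch_in(struct task *t, uint64_t now)
{
    t->start_tics = now;
}

void task_switch_out(struct task *t, uint64_t now)
{
    /* elapsed time only increases, so now >= start_tics */
    t->used_tics += now - t->start_tics;
}
```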

Brendan wrote: Some sort of timer that the scheduler can use to know when a task has used all of the time it was given (some precision would be nice, accuracy doesn't matter much, don't need to be able to read the current count, do need an IRQ, "one shot" IRQ would be nice)

The APIC timer, if available, is most suitable for this. If timers also use the APIC timer, the timeouts need to be combined, but this works. If the APIC timer is not available, the HPET or PIT channel 0 can be used (most often combined with the timer function). The most efficient allocation is to use the APIC timer for preemption and an HPET comparator for timers.
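Sharing one one-shot APIC timer between preemption and timers comes down to always arming it for the earlier of the two deadlines and checking both when it fires. A sketch assuming both deadlines are absolute values in the same units as `now` (converting to the APIC timer's initial-count units is left out); the names are hypothetical:

```c
#include <stdint.h>

#define NO_DEADLINE UINT64_MAX   /* nothing pending */

/* Arm the shared timer for whichever absolute deadline comes first. */
uint64_t next_apic_deadline(uint64_t preempt_deadline, uint64_t timer_deadline)
{
    return preempt_deadline < timer_deadline ? preempt_deadline : timer_deadline;
}

/* Relative one-shot count for an absolute deadline; 0 means "fire immediately". */
uint64_t one_shot_count(uint64_t deadline, uint64_t now)
{
    return deadline > now ? deadline - now : 0;
}

/* On the IRQ, decide which of the two expired (both may have). */
enum { EXPIRED_PREEMPT = 1, EXPIRED_TIMER = 2 };

int expired(uint64_t preempt_deadline, uint64_t timer_deadline, uint64_t now)
{
    int r = 0;
    if (now >= preempt_deadline)
        r |= EXPIRED_PREEMPT;
    if (now >= timer_deadline)
        r |= EXPIRED_TIMER;
    return r;
}
```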

Brendan wrote: Some sort of timer to keep track of power management (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ, "one shot" IRQ would be nice)

I'd use a normal timer (as above) for this. It doesn't need its own hardware resource.

Brendan wrote: (optional) Some sort of timer to use for "poor man's profiling" (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ and something capable of generating an NMI would be nice, "one shot" IRQ would be nice for pseudo-random delays)

This is more or less also the normal timers I have.

Brendan wrote: (optional) Some sort of timer to use for a watchdog (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ and something capable of generating an NMI would be nice, fixed frequency is fine)

Same as above. This is a normal timer.

Given the complex selection rules above, I doubt it is possible to write something generic that selects the best resources. Additionally, for such an algorithm to work, several variables need to be known:

1. Does the hardware resource work?
2. Does the hardware resource trigger the IRQs it is supposed to trigger?
3. Does power-management affect frequencies?

These can at best be tested, but I'm not sure how to test whether power management will affect frequencies. If a resource is per-system and legacy, it probably won't be affected by power management, but these are probabilities, not parameters that are easily fed into an algorithm.

rdos · Post by **rdos** » Fri Dec 23, 2011 3:07 am

Brendan wrote:In which way are the ACPI tables not accurate (and why don't other OSs like Windows and Linux have problems)?

On one particular machine, my 2-core AMD Athlon, the _CRS for _MEM is broken and makes ACPICA malfunction (it tries to output "ACPI_STACK_UNDERFLOW", but in the process of doing this makes several random writes via uninitialized pointers). This same machine also reports that the HPET supports IRQ 0 and IRQ 8 (which don't work). The comparator itself reports that any interrupt routing is supported (obviously not correct). RDOS only works when the APIC timer is used both for preemption and timers. Neither the HPET nor the PIT has functional IRQs. The HPET can be used for elapsed time though, because the counter is running. It is the IRQ mappings that are wrong. The HPET doesn't support MSI delivery. Additionally, this machine frequently hangs during reboots, and acts strangely. It has Windows XP installed.

Brendan · Post by **Brendan** » Fri Dec 23, 2011 5:12 am

Hi,

rdos wrote:
Brendan wrote: TSC
Highly unreliable on older CPUs (when it is present), and per-core, which makes it useless for keeping elapsed time. Has no IRQ.

Insanely awesome (extremely low overhead and extremely high precision) on recent CPUs though. You'd want to detect TSC capabilities and use TSC if it's suitable.

rdos wrote:
Brendan wrote: Local APIC timer (possibly including "TSC deadline mode")
Affected by some types of power-management, but pretty useful for preemption. Not good for high precision timing due to possible effects of power-management.

Very nice if it's not affected by power management (but still very useful even when it is affected by power management). You'd want to detect local APIC timer capabilities and use it if it's suitable.

rdos wrote:
Brendan wrote: Performance monitoring counters and/or IRQs
Better left free for performance measurement.

It's not like the performance monitoring stuff is actually used for performance monitoring most of the time anyway. If there's no other choice, I'd rather use performance monitoring counters for timing than be unable to boot. You'd want to detect performance monitoring capabilities and use it if it's more suitable than something else.

rdos wrote:
Brendan wrote: HPET main counter
The best alternative for measuring elapsed time.

Second best (TSC on recent CPUs is much better).

rdos wrote:
Brendan wrote: HPET comparators
The best alternative for high-precision timers. When they work. When the HPET doesn't support MSI-delivery it seems like it often malfunctions. The configuration information returned on some motherboards is incorrect regarding IRQ routings. The ACPI tables are not always correct either. Some report IRQs when they don't work, while some report no IRQ, but they still work.

Probably second best (local APIC is better for high-precision timers). In cases where the local APIC timer gets messed up due to sleep states, you'd still want to use local APIC timers, but when the CPU enters a sleep state migrate the work to HPET until the CPU comes back out of the sleep state. You'd want to detect HPET capabilities and use it if it's more suitable than something else (whether it's "use it on its own" or "use it as a backup when CPUs are in sleep states").

rdos wrote:
Brendan wrote: ACPI's "power management timer" (32-bit or 24-bit) counter
HPET?

AFAIK, there is no guarantee in ACPI that this is not the HPET, or some channel on the HPET.

ACPI's "power management timer" is *not* HPET. It's a counter that is increased at a rate of about 3.5795 MHz. HPET typically runs at 10 MHz or more, which makes it separate. Of course it's possible (likely even) that some chipsets have a central 14.31818 MHz clock that is used to drive HPET directly, used via a "divide by 4" to drive ACPI's counter, and used via a "divide by 12" to drive the PIT (but that does not mean "HPET = ACPI's counter = PIT" - they're still all separate devices with separate control logic and capabilities).
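The divide-by-4 and divide-by-12 relationships can be checked numerically. This sketch assumes the reference clock is exactly 315/22 MHz (about 14.31818 MHz), the usual nominal value for this oscillator:

```c
/* Nominal chipset reference clock: 315/22 MHz, about 14.31818 MHz. */
static const double ref_clock_hz = 315e6 / 22.0;

/* ACPI power management timer: reference clock divided by 4 (~3.5795 MHz). */
double acpi_pm_hz(void) { return ref_clock_hz / 4.0; }

/* PIT input clock: reference clock divided by 12 (~1.19318 MHz). */
double pit_hz(void) { return ref_clock_hz / 12.0; }

/* Length of one PIT tic in nanoseconds (~838 ns). */
double pit_tick_ns(void) { return 1e9 / pit_hz(); }
```

The last function gives the PIT tic period that comes up again later in the thread.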

rdos wrote:
Brendan wrote: PIT channel 0
PIT channel 2
Both can be used for elapsed time or high-precision (us) timers.

Yes, but they're slow and ugly (e.g. "legacy IO port" accesses to read the current count); and for channel 2 the thing can roll over several times without you knowing.

rdos wrote:
Brendan wrote: RTC periodic and/or update IRQ
Can be used to synchronize elapsed time with real time. Not useful for anything else.

For old systems (where you've only got PIT and RTC and nothing else), you'd want to use PIT for the scheduler's timer (in "one shot" mode) and RTC for everything else.

rdos wrote:
Brendan wrote: Watchdog timer/s (e.g. the "WDAT" and "WDRT" ACPI tables)
These are better left out of this.

Why? Is your OS a general purpose desktop thing that doesn't have to care if it locks up completely due to a hardware fault (rather than some sort of embedded system that might be used for banking)?

rdos wrote:
Brendan wrote: Some sort of counter to measure real time (need accuracy, precision would be nice, per-CPU would be nice, don't need an IRQ)
No, it should absolutely not be per-CPU, but per-system. That means one of the PIT channels or the HPET. TSC doesn't work, as it is both affected by power management and per-CPU. I have tried to use TSC for elapsed time, and it doesn't work. There is no reliable way to synchronize time between cores, especially not when TSCs start ticking at different frequencies when power management "kicks in".

Don't be silly - on recent systems (where TSC is guaranteed to run at a fixed frequency - e.g. the "TSC invariant" CPUID flag) TSC would be perfect for this (but synchronised to the RTC occasionally). For situations where TSC ticks at different frequencies on different CPUs, just synchronise more often to ensure that the TSC is always within an acceptable amount of error.
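Checking for an invariant TSC is a single CPUID query: both Intel and AMD report it in leaf 0x80000007, EDX bit 8. A sketch using the `__get_cpuid()` helper from GCC/Clang's `<cpuid.h>` (so it assumes one of those compilers), with the bit test split out so it can be checked without x86 hardware:

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define CPUID_APM_LEAF          0x80000007u  /* "Advanced Power Management" leaf */
#define CPUID_EDX_INVARIANT_TSC (1u << 8)

bool edx_says_invariant_tsc(uint32_t edx)
{
    return (edx & CPUID_EDX_INVARIANT_TSC) != 0;
}

/* False on non-x86 builds or when the CPU doesn't implement the leaf. */
bool cpu_has_invariant_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(CPUID_APM_LEAF, &eax, &ebx, &ecx, &edx))
        return false;   /* leaf beyond the CPU's maximum extended leaf */
    return edx_says_invariant_tsc(edx);
#else
    return false;
#endif
}
```

Even with the flag set, the occasional resynchronisation against the RTC mentioned above is still needed to correct drift.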

rdos wrote:
Brendan wrote: Some sort of timer to wake sleeping tasks (need accuracy, precision would be nice, per-CPU would be nice, need an IRQ)
This is what I refer to as "timers". This can be the APIC timer, PIT channel 0 or an HPET comparator. The APIC timer is per-CPU, so if it is used, timers need to be per-CPU. When using PIT channel 0 or HPET comparators, timers would be per-system. It might be possible to use combinations if both the APIC timer and PIT channel 0 / an HPET comparator are available.

Sadly, Linux does something like this too - take a high precision timer like the local APIC timer, and use it as a general purpose timing thing so that you can bury it under millions of networking timeouts (that have no need for high precision). It's stupid because there's always a minimum amount of time between delays, and when there's too many things using the same timer you have to group things together to avoid the "minimum time between delays" problem. For example, if "foo" should happen in 1000 ns and "bar" should happen in 1234 ns, then you can't set up a 234 ns delay and have to bunch them together, and "foo" ends up happening 234 ns too late. Things that don't need such high precision should use a completely different timer to avoid screwing up the precision for things that do need high precision.
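To make the grouping problem concrete, here is a sketch of the merging described above, assuming a hypothetical minimum reprogramming interval `min_gap` (not a figure for any specific timer). Deadlines closer than `min_gap` to the start of a group fire together at the group's last deadline, so nothing fires early but earlier members fire late:

```c
#include <stddef.h>
#include <stdint.h>

/* deadline[] must be sorted ascending (ns). fire_at[i] receives the time
 * deadline i actually fires under this grouping. */
void coalesce(const uint64_t *deadline, size_t n, uint64_t min_gap, uint64_t *fire_at)
{
    size_t i = 0;
    while (i < n) {
        size_t j = i + 1;
        /* extend the group while the next deadline is too close to its start */
        while (j < n && deadline[j] - deadline[i] < min_gap)
            j++;
        for (size_t k = i; k < j; k++)
            fire_at[k] = deadline[j - 1];   /* whole group waits for its last member */
        i = j;
    }
}
```

With deadlines of 1000 ns and 1234 ns and any `min_gap` above 234 ns, both fire at 1234 ns, i.e. "foo" is 234 ns late, as in the example.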

rdos wrote:
Brendan wrote: Some sort of counter to measure how much time each task has used (precision would be nice, per-CPU would be really nice, low overhead would be nice, don't need an IRQ, don't really need accuracy)
I use elapsed time for this. When a task is started, the elapsed counter is saved; when a new task is scheduled, elapsed time is read again and the saved value is subtracted from it. There is no need for a separate hardware resource for this.

What is "elapsed time"? I'd use TSC for this if I could (and fall back to HPET if TSC can't be used, and fall back to ACPI's counter if both HPET and TSC can't be used).

rdos wrote:
Brendan wrote: Some sort of timer that the scheduler can use to know when a task has used all of the time it was given (some precision would be nice, accuracy doesn't matter much, don't need to be able to read the current count, do need an IRQ, "one shot" IRQ would be nice)
The APIC timer, if available, is most suitable for this. If timers also use the APIC timer, the timeouts need to be combined, but this works. If the APIC timer is not available, the HPET or PIT channel 0 can be used (most often combined with the timer function). The most efficient allocation is to use the APIC timer for preemption and an HPET comparator for timers.

The most efficient way would be using performance monitoring counters for the scheduler, local APIC timer for high precision "sleep()", and HPET or PIT or TSC for low precision timing (e.g. network packet timeouts).

rdos wrote:
Brendan wrote: Some sort of timer to keep track of power management (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ, "one shot" IRQ would be nice)
I'd use a normal timer (as above) for this. It doesn't need its own hardware resource.

Same "jack of all trades" problem (screwing up high precision timing by using it for low precision timing).

rdos wrote:
Brendan wrote: (optional) Some sort of timer to use for "poor man's profiling" (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ and something capable of generating an NMI would be nice, "one shot" IRQ would be nice for pseudo-random delays)
This is more or less also the normal timers I have.

Your normal "generic timer" stuff uses NMI? Sounds seriously painful to me.

rdos wrote:
Brendan wrote: (optional) Some sort of timer to use for a watchdog (don't really need precision or accuracy, don't need to be able to read the current count, do need an IRQ and something capable of generating an NMI would be nice, fixed frequency is fine)
Same as above. This is a normal timer.

Yes - same as above (seriously flawed).

rdos wrote:Given the complex selection rules as of above, I doubt it is possible to write something generic that selects the best resources.

Given the complex selection rules above, I doubt it's possible to avoid writing something that selects the best resources.

The only other alternative that I can think of is telling unsuspecting end users "I'm not smart enough to figure it out, and even though you probably know less than me, I'm making you solve my design failure via compile time idiocy".

rdos wrote:Additionally, for such an algorithm to work there is a need to know several variables:

1. Does the hardware resource work?
2. Does the hardware resource trigger the IRQs it is supposed to trigger?
3. Does power-management affect frequencies?

These can at best be tested, but I'm not sure how to test whether power management will affect frequencies. If a resource is per-system and legacy, it probably won't be affected by power management, but these are probabilities, not parameters that are easily fed into an algorithm.

The conservative way would be to assume the answer to all those questions is "no" unless you know otherwise. For S4 (hibernate) and S3 (suspend) you can assume all timers lose their state (as you're effectively turning everything off, except RAM for S3) and reinitialise your timing (e.g. starting from getting the new time and date from the RTC) when you come out of S3/S4. For S2 only the CPUs are turned off, so you should never need to worry about PIT, RTC, HPET, ACPI counter (and only have to worry about TSC and local APIC). For TSC and local APIC behaviour, you can use CPUID (either the "TSC invariant" flag, or the "vendor/model"). That probably solves 99% of the problems, and the remaining problems can easily be handled with some special case work-arounds if/when they occur.
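The conservative rules in this paragraph fit in a small table. A sketch with hypothetical flag names; it only encodes what the post says (S3/S4: treat every timer as lost; S2: only the TSC and local APIC), treats any other state conservatively, and leaves refining the S2 case with CPUID information to the caller:

```c
#include <stdint.h>

enum {
    T_TSC     = 1u << 0,
    T_LAPIC   = 1u << 1,
    T_PIT     = 1u << 2,
    T_RTC     = 1u << 3,
    T_HPET    = 1u << 4,
    T_ACPI_PM = 1u << 5
};
#define T_ALL (T_TSC | T_LAPIC | T_PIT | T_RTC | T_HPET | T_ACPI_PM)

enum sleep_state { SLEEP_S2 = 2, SLEEP_S3 = 3, SLEEP_S4 = 4 };

/* Timers that must be treated as having lost their state after waking up. */
uint32_t timers_to_reinit(enum sleep_state s)
{
    switch (s) {
    case SLEEP_S2:
        return T_TSC | T_LAPIC;   /* only the CPUs were off */
    case SLEEP_S3:
    case SLEEP_S4:
        return T_ALL;             /* everything off: restart timing from the RTC */
    }
    return T_ALL;                 /* unknown state: be conservative */
}
```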

Cheers,

Brendan

Brendan · Post by **Brendan** » Fri Dec 23, 2011 5:20 am

Hi,

rdos wrote:
Brendan wrote:In which way are the ACPI tables not accurate (and why don't other OSs like Windows and Linux have problems)?
On one particular machine, my 2-core AMD Athlon, the _CRS for _MEM is broken and makes ACPICA malfunction (it tries to output "ACPI_STACK_UNDERFLOW", but in the process of doing this makes several random writes via uninitialized pointers). This same machine also reports that the HPET supports IRQ 0 and IRQ 8 (which don't work). The comparator itself reports that any interrupt routing is supported (obviously not correct). RDOS only works when the APIC timer is used both for preemption and timers. Neither the HPET nor the PIT has functional IRQs. The HPET can be used for elapsed time though, because the counter is running. It is the IRQ mappings that are wrong. The HPET doesn't support MSI delivery. Additionally, this machine frequently hangs during reboots, and acts strangely. It has Windows XP installed.

So, it could be a mistake in the firmware's AML (but Windows XP doesn't have any problem), it could be a bug in ACPICA (would be nice to try Linux on the machine to rule that out), or it could be a bug in your "just slapped ACPICA and SMP support in last month and still having lots of problems with everything" code?

I guess you're right - a mistake in the firmware's AML is the most likely cause...

Cheers,

Brendan

rdos · Post by **rdos** » Fri Dec 23, 2011 1:18 pm

Brendan wrote:Insanely awesome (extremely low overhead and extremely high precision) on recent CPUs though. You'd want to detect TSC capabilities and use TSC if it's suitable.

The TSC is only useful if you peg threads to cores, and even then you destroy the high precision if you have long IRQs or other code running with interrupts disabled for extended periods of time. In a normal multitasking OS, this means you won't get higher precision with the TSC than your worst interrupt latency. If the thread is then migrated to a new core and reads its TSC, it will behave erratically if the TSCs are not well synchronized. You need frequent IPIs to synchronize, preferably with NMI delivery to minimize latency.

Brendan wrote:It's not like the performance monitoring stuff is actually used for performance monitoring most of the time anyway. If there's no other choice, I'd rather use performance monitoring counters for timing than be unable to boot. You'd want to detect performance monitoring capabilities and use it if it's more suitable than something else.

Agreed, but I think there are other (better) choices. At least I know of no platform that doesn't have either PIT or HPET / APIC timer, but it is possible such might exist.

Brendan wrote:Yes, but they're slow and ugly (e.g. "legacy IO port" accesses to read the current count); and for channel 2 the thing can roll over several times without you knowing.

Not in my design. The preemption timer is set to 1ms, so there is no chance that channel 2 will roll over.
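A sketch of how the 16-bit channel 2 count can be folded into a 64-bit elapsed-time value, assuming (hypothetically) that channel 2 runs as a free-running down-counter with a 65536-tic period and is read on every 1 ms preemption tick; the port I/O needed to latch and read the counter is left out:

```c
#include <stdint.h>

/* Hypothetical elapsed-time state driven by PIT channel 2. */
struct pit_elapsed {
    uint64_t tics;   /* 64-bit elapsed tics */
    uint16_t last;   /* previous raw counter value */
};

/* Fold a new raw reading into the 64-bit count. Correct only if called at
 * least once per 65536 tics (about 55 ms at 1.193 MHz). */
uint64_t pit_elapsed_update(struct pit_elapsed *e, uint16_t raw)
{
    uint16_t delta = (uint16_t)(e->last - raw);  /* down-counter, wraps mod 65536 */
    e->tics += delta;
    e->last = raw;
    return e->tics;
}
```

Reading every 1 ms keeps each delta far below one wrap period, which is why a missed rollover can't happen in this setup.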

Brendan wrote:For old systems (where you've only got PIT and RTC and nothing else), you'd want to use PIT for the scheduler's timer (in "one shot" mode) and RTC for everything else.

No. You would use PIT channel 0 for timers / preemption and PIT channel 2 for elapsed time. The RTC is just too slow to be useful when the system tic is 1/1.193 us (about 0.84 us). That means you cannot use the legacy speaker, but it is not very useful anyway.

Brendan wrote:Why? Is your OS a general purpose desktop thing that doesn't have to care if it locks up completely due to a hardware fault (rather than some sort of embedded system that might be used for banking)?

I have a very sophisticated software watchdog timer that in practise takes care of all software-related faults. It is installed in the production release, and will reboot on any fault, including kernel panics. In practise, this is all I need. I have yet to encounter a situation in production where this is not enough. We built a dedicated hardware watchdog, but it caused more problems than it solved, so we no longer use it.

Brendan wrote:Sadly, Linux does something like this too - take a high precision timer like the local APIC timer, and use it as a general purpose timing thing so that you can bury it under millions of networking timeouts (that have no need for high precision). It's stupid because there's always a minimum amount of time between delays, and when there's too many things using the same timer you have to group things together to avoid the "minimum time between delays" problem. For example, if "foo" should happen in 1000 ns and "bar" should happen in 1234 ns, then you can't setup a 234 ns delay and have to bunch them together, and "foo" ends up happening 234 ns too late. Things that don't need such high precision should use a completely different timer to avoid screwing up the precision for things that do need high precision.

I think the major difference between RDOS and Linux is how ISRs and timer callbacks are coded. In RDOS, you should keep both ISRs and timer callbacks short. A typical ISR and/or timer callback only consists of a signal to wake a server thread, along with clearing some interrupt conditions. User apps cannot use timers directly at all (they are kernel-only and run as ISRs). Since timer callbacks are generally shorter than the overall interrupt latency, mixing high- and low-precision timers is not a problem. You would not gain anything by using separate hardware for high-precision timers, as it is the interrupt latency that determines response times, not the resolution of the timer. In order to get ns resolution for timed events, it is necessary to run on a dedicated core without preemption and interrupt load.

Brendan wrote:What is "elapsed time"? I'd use TSC for this if I could (and fall back to HPET if TSC can't be used, and fall back to ACPI's counter if both HPET and TSC can't be used).

Elapsed time is how many tics have elapsed since the system started (simply explained, but not entirely true). A tic is one period of the PIT, which is convenient since 2^32 tics is (roughly) one hour. I thus use 8 bytes to represent elapsed time. Elapsed time is also related to real time: to convert between elapsed time and real time you simply add an offset, which can be changed by setting real time. Elapsed time cannot be changed, but instead is guaranteed to increase monotonically. When timed waits are used, or timers are started, they use elapsed time. This also means that if you want a timer to fire exactly once per second, you simply add the number of tics in a second to the previous timeout, and start a timer. In order to have sub-microsecond resolution for real time, the RTC update interrupt is used to synchronize the update of the RTC with the real-time counter. At boot-up, the setting in the RTC is loaded as elapsed time, so normally the difference between elapsed time and real time is zero.
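The elapsed/real time split described here can be sketched as follows. Names are hypothetical, and `TICS_PER_SECOND` assumes the nominal PIT rate; the post doesn't give the exact constant RDOS uses:

```c
#include <stdint.h>

/* Assumed tic rate: the PIT input clock rounded to whole Hz (hypothetical). */
#define TICS_PER_SECOND 1193182u

struct clock {
    uint64_t elapsed;      /* monotonic tics since boot; never set directly */
    uint64_t real_offset;  /* real = elapsed + offset (mod 2^64), so "negative" offsets work */
};

uint64_t real_time(const struct clock *c)
{
    return c->elapsed + c->real_offset;
}

/* Setting real time only changes the offset; elapsed time stays monotonic. */
void set_real_time(struct clock *c, uint64_t new_real)
{
    c->real_offset = new_real - c->elapsed;
}

/* Periodic timer: step from the previous deadline, not from when the callback
 * ran, so interrupt latency doesn't accumulate into drift. */
uint64_t next_periodic_deadline(uint64_t prev_deadline)
{
    return prev_deadline + TICS_PER_SECOND;
}
```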

Brendan wrote:Same "jack of all trades" problem (screwing up high precision timing by using it for low precision timing).

Not true since timer callbacks are ISRs, and are only allowed to clear some hardware conditions and signal a thread.

Brendan wrote:Your normal "generic timer" stuff uses NMI? Sounds seriously painful to me.

No. I have reserved NMI for the crash debugger. When the scheduler hits a fatal error, it sends an NMI to all other cores to freeze them, regardless of whether they have interrupts enabled. Thus, NMI is not available for timers.
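The post doesn't show how the NMI is sent, but one common way on x86 is a single IPI through the local APIC's Interrupt Command Register using the "all excluding self" destination shorthand. This sketch only computes the ICR low-dword value from the field layout in the Intel SDM (vector bits 0-7, delivery mode bits 8-10, level bit 14, shorthand bits 18-19); a kernel would write it to the ICR low register (offset 0x300 in xAPIC MMIO mode). The helper name `icr_low` is hypothetical.

```python
DELIVERY_MODE_NMI = 0b100
LEVEL_ASSERT = 1
SHORTHAND_ALL_EXCLUDING_SELF = 0b11


def icr_low(vector=0, delivery_mode=0, level=LEVEL_ASSERT, shorthand=0):
    """Build the low dword of the local APIC ICR from its fields."""
    return ((vector & 0xFF)
            | (delivery_mode & 0b111) << 8
            | (level & 1) << 14
            | (shorthand & 0b11) << 18)


# NMI to every other core; the vector field is ignored for NMI delivery.
FREEZE_OTHERS_ICR = icr_low(delivery_mode=DELIVERY_MODE_NMI,
                            shorthand=SHORTHAND_ALL_EXCLUDING_SELF)
print(hex(FREEZE_OTHERS_ICR))  # 0xc4400
```

Because NMI cannot be masked with `cli`, this reaches cores that are spinning with interrupts disabled, which is exactly the property the crash debugger needs.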

Brendan wrote:The only other alternative that I can think of is telling unsuspecting end users "I'm not smart enough to figure it out, and even though you probably know less than me, I'm making you solve my design failure via. compile time idiocy".

Currently, users need to specify which device drivers to load for their hardware, but I could change this if necessary, so that the available hardware is detected and the configuration file is created automatically. The kernel image is not built with compile-time switches or by linking modules. It is created with a command-line tool that writes an image file containing the separately compiled device-driver files, along with ordinary files, settings and autostarts. This file can even be built by software, as is done when we update the OS remotely.

So, if the user loads the PIC device driver, it will look for PITs for timers / elapsed time, as those are commonly found on older hardware without an APIC. If the user loads the APIC device driver, it makes different choices, selecting between the APIC timer, HPET or PIT. The only choice users have is which interrupt controller is available.
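The selection logic described above, combined with the four modes of operation listed earlier in the thread, can be sketched as a small decision function. This is a hypothetical model: the function name, the boolean inputs and the returned role names are illustrative, not RDOS's driver interface.

```python
def select_timer_sources(controller, hpet_present=False, hpet_has_msi=False):
    """Hypothetical model of choosing timer hardware from the loaded
    interrupt-controller driver ("PIC" or "APIC") and HPET capabilities."""
    if controller == "PIC":
        # Older/low-end systems without an APIC: PIT only.
        return {"timers": "PIT channel 0", "preemption": "PIT channel 0",
                "elapsed": "PIT channel 2"}
    if controller != "APIC":
        raise ValueError(f"unknown interrupt controller: {controller!r}")
    if not hpet_present:
        # Shared APIC timer; elapsed time from the PIT speaker channel.
        return {"timers": "APIC timer", "preemption": "APIC timer",
                "elapsed": "PIT channel 2"}
    if not hpet_has_msi:
        # HPET interrupts are not trusted without MSI: use it only as a counter.
        return {"timers": "APIC timer", "preemption": "APIC timer",
                "elapsed": "HPET"}
    return {"timers": "HPET", "preemption": "APIC timer", "elapsed": "HPET"}


print(select_timer_sources("APIC", hpet_present=True))
```

Keeping the decision in one place makes it easy to add the case raised earlier in the thread, such as using an HPET for elapsed time on PIC-only systems, without touching the drivers themselves.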

rdos · Post by **rdos** » Fri Dec 23, 2011 1:44 pm

Brendan wrote:So, it could be a mistake in the firmware's AML (but Windows XP doesn't have any problem), it could be a bug in ACPICA (would be nice to try Linux on the machine to rule that out), or it could be a bug in your "just slapped ACPICA and SMP support in last month and still having lots of problems with everything" code?

I guess you're right - a mistake in the firmware's AML is the most likely cause...

You forget that I can debug ACPICA at source level as if it were an ordinary application. Besides, there are no SMP issues here: the ACPI initialization runs in a single thread (and, when done at boot time, only on the BSP). I know exactly where and why it faults. All fields of Walkstate except Next (which contains junk) are NULL, and the status code is 13 (AE_STACK_UNDERFLOW).

To answer the question about Linux: no, it doesn't work. When I boot the Mandriva 2011 live DVD, it simply locks up. That doesn't tell whether it is the video BIOS problem (the mode-switch code faults in V86 mode on RDOS) or something related to ACPI.

rdos · Post by **rdos** » Fri Dec 23, 2011 4:55 pm

ACPICA seems to trash its own environment so badly when reading _CRS that I cannot find a fix for it. The only workaround that works 100% is to exclude objects containing "MEM" from being evaluated.

After this is done, the device-list looks like this for the 2-core AMD machine:

\_SB_.PCI0
    IO: 0CF8-0CFF

\_SB_.PCI0.LPC0.PMIO
    IO: 0B10-0B1F
    IO: 0B00-0B0F
    IO: 4210-4217
    IO: 4000-40FE
    IO: 0CD4-0CDF
    IO: 0CD2-0CD3
    IO: 0CD0-0CD1
    IO: 0C6F-0C6F
    IO: 0C6C-0C6D
    IO: 0C50-0C52
    IO: 0C14-0C14
    IO: 0C00-0C01
    IO: 04D6-04D6
    IO: 040B-040B
    IO: 0228-022F
    IO: 4100-411F

\_SB_.PCI0.LPC0.LNKA
    IRQ: 3,  sharable,  high level 

\_SB_.PCI0.LPC0.LNKB
    IRQ: 11,  sharable,  high level 

\_SB_.PCI0.LPC0.LNKC
    IRQ: 5,  sharable,  high level 

\_SB_.PCI0.LPC0.LNKD
    IRQ: 10,  sharable,  high level 

\_SB_.PCI0.LPC0.LNKE
    IRQ: 0,  sharable,  high level 

\_SB_.PCI0.LPC0.LNKF
    IRQ: 0,  sharable,  high level 

\_SB_.PCI0.LPC0.LNK0
    IRQ: 11,  sharable,  high level 

\_SB_.PCI0.LPC0.LNK1
    IRQ: 0,  sharable,  high level 

\_SB_.PCI0.LPC0.PIC_
    IO: 00A0-00A1
    IO: 0020-0021
    IRQ: 2,  exclusive,  edge 

\_SB_.PCI0.LPC0.DMA1
    IO: 00C0-00DF
    IO: 0094-009F
    IO: 0080-0090
    IO: 0000-000F

\_SB_.PCI0.LPC0.TMR_
    IO: 0040-0043

\_SB_.PCI0.LPC0.HPET
    Mem: FED00000-FED003FF
    IRQ: 8,  exclusive,  edge 
    IRQ: 0,  exclusive,  edge 

\_SB_.PCI0.LPC0.RTC_
    IO: 0070-0073

\_SB_.PCI0.LPC0.SPKR
    IO: 0061-0061

\_SB_.PCI0.LPC0.COPR
    IO: 00F0-00FF
    IRQ: 13,  exclusive,  edge 

\_SB_.PCI0.SYSR
    IO: 0220-0225
    IO: 04D0-04D1
    IO: 00E0-00EF
    IO: 00A2-00BF
    IO: 0091-0093
    IO: 0074-007F
    IO: 0065-006F
    IO: 0062-0063
    IO: 0044-005F
    IO: 0022-003F
    IO: 0010-001F

\_SB_.PCI0.FDC0
    IO: 03F7-03F7
    IO: 03F0-03F5
    IRQ: 6,  exclusive,  edge 

\_SB_.PCI0.UAR1
    IO: 03F8-03FF
    IRQ: 4,  exclusive,  edge 

\_SB_.PCI0.UAR2
    IO: 0000-0007
    IRQ: 0,  exclusive,  edge 

\_SB_.PCI0.LPT1
    IO: 0378-037F
    IRQ: 7,  exclusive,  edge 

\_SB_.PCI0.ECP1
    IO: 0778-077B
    IO: 0378-037F
    IRQ: 7,  exclusive,  edge 

\_SB_.PCI0.PS2M
    IRQ: 12,  exclusive,  edge 

\_SB_.PCI0.PS2K
    IO: 0064-0064
    IO: 0060-0060
    IRQ: 1,  exclusive,  edge 

\_SB_.PCI0.PSMR
    IO: 0064-0064
    IO: 0060-0060

\_SB_.PCI0.EXPL
    Mem: E0000000-EFFFFFFF

As can be seen, it reports that the PIT exists, but doesn't have any interrupts. It also reports that HPET has both IRQ 0 and 8.
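The listing above is regular enough to check mechanically. This is a sketch, assuming exactly the format shown (an unindented device path followed by indented `IO:`, `Mem:` and `IRQ:` lines); the function names are hypothetical. The IRQ 0 and IRQ 8 reported for the HPET are likely the lines its timers 0 and 1 take over in legacy replacement mode (replacing the PIT and RTC interrupts), which would explain why TMR_ lists no IRQ.

```python
from collections import defaultdict


def parse_resources(dump):
    """Parse the 'device path, then indented IO/Mem/IRQ lines' listing into
    {device: {"IO": [(lo, hi)], "Mem": [(lo, hi)], "IRQ": [n]}}."""
    devices = {}
    current = None
    for line in dump.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():          # unindented line = ACPI device path
            current = line.strip()
            devices[current] = defaultdict(list)
            continue
        kind, _, rest = line.strip().partition(":")
        if kind in ("IO", "Mem"):
            lo, hi = (int(x, 16) for x in rest.strip().split("-"))
            devices[current][kind].append((lo, hi))
        elif kind == "IRQ":
            devices[current]["IRQ"].append(int(rest.split(",")[0]))
    return devices


def irq_claims(devices):
    """Map each IRQ number to the devices whose _CRS lists it."""
    claims = defaultdict(list)
    for name, res in devices.items():
        for irq in res.get("IRQ", []):
            claims[irq].append(name)
    return dict(claims)


SAMPLE = r"""\_SB_.PCI0.LPC0.TMR_
    IO: 0040-0043

\_SB_.PCI0.LPC0.HPET
    Mem: FED00000-FED003FF
    IRQ: 8,  exclusive,  edge
    IRQ: 0,  exclusive,  edge
"""
devs = parse_resources(SAMPLE)
print(devs[r"\_SB_.PCI0.LPC0.TMR_"]["IRQ"])  # -> []
print(devs[r"\_SB_.PCI0.LPC0.HPET"]["IRQ"])  # -> [8, 0]
```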

Cognition · Post by **Cognition** » Fri Dec 23, 2011 5:00 pm

The walkstate being nulled or containing junk is a pretty good indicator that there's a bug in either ACPICA or the OS-specific code it depends on; it's highly unlikely that the AML code would mess up the interpreter's internal state like that. Even if it did, you should be able to track it through the OS hooks ACPICA depends on. If you can get any other major OS running on the platform, you can probably dump the DSDT and SSDTs (if there are any) and disassemble them with iasl, just to make sure the AML code is sane.
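Before disassembling a dumped table, a quick first check is that the dump itself is intact. This sketch assumes the raw table bytes are already available (how they were dumped is not specified here) and uses the standard 36-byte ACPI table header: the length field at offset 4 must match the table size, and all bytes, including the checksum byte at offset 9, must sum to zero modulo 256. It only detects corruption; it says nothing about whether the AML logic is correct, which is what the iasl disassembly is for. The function name is hypothetical.

```python
import struct


def check_acpi_table(blob):
    """Basic integrity checks on a dumped ACPI table (e.g. DSDT or SSDT).
    Returns (signature, checksum_ok)."""
    if len(blob) < 36:
        raise ValueError("shorter than the 36-byte ACPI table header")
    signature = blob[0:4].decode("ascii", errors="replace")
    (length,) = struct.unpack_from("<I", blob, 4)   # little-endian Length field
    if length != len(blob):
        raise ValueError(f"{signature}: header says {length} bytes, got {len(blob)}")
    return signature, sum(blob) % 256 == 0
```

A table that fails this check should be re-dumped before its contents are blamed for anything.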

OSDev.org
