What cores do when they are idle

rdos · Post by **rdos** » Sat Dec 10, 2011 8:02 am

I have had some success in getting 4 out of 6 cores booting on my AMD now, but I run into the same issue as on my 4-core AMD. I cannot just let all cores execute "hlt" when idle, because then the APIC timer will not fire as expected. I also have somewhat strange timing on my 2-core portable when running our terminal application.

The problem seems easy to define:

When the null loop for four cores looks like this, the APIC timer malfunctions:

```asm
null_loop:
    hlt                 ; sleep until the next interrupt arrives
    jmp null_loop       ; then halt again
```

However, if I add a fifth core that does this, everything works as expected:

```asm
    cli                 ; interrupts off: this core never takes an IRQ
stopl:
    jmp stopl           ; busy-loop forever, so this core never halts
```

So when I start a fifth core and let it busy-loop with interrupts disabled, the APIC timers on the other cores suddenly work as expected.

On older PCs, using "hlt" when idle conserves power, but on modern PCs (at least with AMD CPUs) it seems to interfere with power management and the APIC timer. I'm quite reluctant to change the null loop to a pure busy loop, or perhaps to use the pause instruction, but how else would I get timely interrupts from the APIC timer?

Another potential way to solve this would be to turn off cores when load is low, and also prevent them from being woken up by interrupts, but regardless of this, it seems that at least one core must busy-loop when idle for the APIC timer to work properly.

A third solution might be to let one core (probably the BSP) maintain all timers and use the legacy timer. This is not an attractive option, but at least the legacy timer always keeps its frequency, and if only one core keeps time, time and timer management become simpler. Additionally, in this scenario the BSP should not receive device interrupts; it should only run ordinary code and handle timers. Other cores could send IPIs to the BSP when timers are started and stopped so that it reloads its timers. Also, since the legacy timer is a global resource, the current time could be read from any core, provided spinlocks are used to protect the timer.

A related question is how to manage an "interrupt on lowest priority" scheme when some cores are sleeping or excluded. Is this possible to do with destination specifications, or is it better done by setting TPR to the maximum value on cores that should not be interrupted?

Brendan · Post by **Brendan** » Sat Dec 10, 2011 8:40 am

Hi,

There's a lot of different factors here - it would be nice to know which AMD CPU and which chipset, and if the same problem occurs when ACPICA is disabled (or enabled).

It might be possible that you're having problems with AMD's "C1E" (or "C1 Enhanced") state. For some AMD CPUs and some motherboards, when all CPUs are doing HLT (in the normal HLT/C1 state) the firmware shifts them to the "C1E" state and shuts down more of the physical CPU to save more power, including shutting down the local APIC timer's clock. Best way to tell would be to dig into the datasheet for the specific CPU (and find out if C1E applies to your CPU, and which MSR to use to detect if the BIOS/firmware has enabled C1E or not).

Cheers,

Brendan

rdos · Post by **rdos** » Sat Dec 10, 2011 9:08 am

Brendan wrote: There's a lot of different factors here - it would be nice to know which AMD CPU and which chipset, and if the same problem occurs when ACPICA is disabled (or enabled).

The problem exists regardless of whether the ACPICA driver is loaded (and enabled). I know because I already had to add an idle loop without hlt on my 4-core AMD, before I had the ACPICA driver.

My current motherboard is an Asus M5A99X EVO with a Phenom II X6 1055T CPU. The 4-core machine has a Gigabyte 880GA-UD3H motherboard and an Athlon II X4 CPU.

Interestingly, this also explains why my current computer initially appeared to hang when TPR was set to maximum on each core. The CPU then entered a suspended state and stopped the APIC timers that were programmed to fire. Because other interrupts were locked out, nothing more happened.

Brendan wrote: It might be possible that you're having problems with AMD's "C1E" (or "C1 Enhanced") state. For some AMD CPUs and some motherboards, when all CPUs are doing HLT (in the normal HLT/C1 state) the firmware shifts them to the "C1E" state and shuts down more of the physical CPU to save more power, including shutting down the local APIC timer's clock. Best way to tell would be to dig into the datasheet for the specific CPU (and find out if C1E applies to your CPU, and which MSR to use to detect if the BIOS/firmware has enabled C1E or not).

I'd prefer something that works without involving chipset specifics. The more I think about it, the more sense it makes to keep time with a motherboard resource rather than an erratic per-CPU resource. The only alternative to the PIT seems to be the HPET. What speaks in favor of using the PIT is that the resolution of timers is defined by the resolution of the PIT, so using faster timers won't add any significant improvement. The major problem with the PIT is its kludgy 8-bit I/O interface, but I suspect all recent motherboards implement it in PCI space rather than in the slow AT-bus space.

The only thing the APIC timer seems to be perfect for is as a preemption timer: when it is used for preemption, the core is executing code rather than sitting in hlt, so the timer will not be stopped. Preemption is also per core, which makes the per-core APIC timer ideal for it.
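If the APIC timer is used only as a per-core preemption timer, arming it for each thread comes down to converting a time slice into an initial count. A minimal sketch, assuming the timer's input frequency has already been calibrated at boot against a fixed-frequency source such as the PIT or HPET; `apic_initial_count` is a hypothetical helper, and the actual write to the local APIC's initial-count register is not shown.

```c
#include <stdint.h>

/* Hypothetical helper: convert a time slice into a local APIC timer
 * initial count. Assumes apic_timer_hz (the timer's input clock) was
 * calibrated at boot and `divide` matches the divide configuration
 * register setting (1, 2, 4, ... 128). The intermediate product stays
 * within 64 bits for rates around 1 GHz and slices up to about 1 s. */
uint32_t apic_initial_count(uint64_t apic_timer_hz, uint32_t divide,
                            uint64_t slice_ns)
{
    uint64_t ticks = (apic_timer_hz / divide) * slice_ns / 1000000000ULL;

    if (ticks == 0)
        ticks = 1;              /* writing 0 would stop the timer instead */
    if (ticks > UINT32_MAX)
        ticks = UINT32_MAX;     /* the initial-count register is 32 bits wide */
    return (uint32_t)ticks;
}
```

For example, a 200 MHz timer clock with divide-by-16 and a 10 ms slice gives an initial count of 125000.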

I don't like doing a major rearrangement of the scheduler again, but it seems to be necessary in order to solve the timing issues.

rdos · Post by **rdos** » Sat Dec 10, 2011 10:05 am

Looking at the code, I've decided that it is no longer a major redesign (with major complications) to use the APIC timer only for the scheduler timeout, while using a motherboard resource (PIT or HPET) for timers. These procedures are now well isolated from the rest of the code after the last major rebuild of the scheduler. Additionally, a frequent operation, stop_timer, is quite complicated today, as it needs to lock the timers on all cores while searching for the timer to stop. This becomes much easier with a single list of timers programmed into a motherboard resource. In that case, both starting and stopping a timer only needs to acquire a single timer spinlock.
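A single global timer list guarded by one spinlock could look roughly like the sketch below. It is not the RDOS code: `struct timer`, `timer_list`, `start_timer`, and the lock helpers are hypothetical names (only stop_timer appears in the post). The list is kept sorted by absolute expiry, so its head is always the next deadline to program into the motherboard timer, and a kernel would typically also disable interrupts while holding the lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer node; all timers live in one global list. */
struct timer {
    uint64_t expires;           /* absolute expiry, in global counter ticks */
    struct timer *next;
};

static struct timer *timer_list = NULL;        /* earliest expiry first */
static atomic_flag timer_lock = ATOMIC_FLAG_INIT;

static void lock_timers(void)
{
    /* Spin until acquired; a kernel would also disable interrupts here. */
    while (atomic_flag_test_and_set_explicit(&timer_lock, memory_order_acquire))
        ;
}

static void unlock_timers(void)
{
    atomic_flag_clear_explicit(&timer_lock, memory_order_release);
}

/* Insert in expiry order. Returns 1 if the timer became the new head,
 * meaning the hardware one-shot must be reprogrammed by the caller. */
int start_timer(struct timer *t)
{
    lock_timers();
    struct timer **pp = &timer_list;
    while (*pp && (*pp)->expires <= t->expires)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
    int new_head = (timer_list == t);
    unlock_timers();
    return new_head;
}

/* Unlink a timer. Only the one global lock is taken; there is no
 * per-core search. Returns 1 if the timer was found. */
int stop_timer(struct timer *t)
{
    lock_timers();
    int found = 0;
    for (struct timer **pp = &timer_list; *pp; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            found = 1;
            break;
        }
    }
    unlock_timers();
    return found;
}
```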

On second thought, it might not even be necessary to let one core do this; a "lowest priority core" approach would work quite well. That would also mean there is no need to send IPIs.

At the same time, it would make sense to also switch the time-keeping from TSC to a motherboard resource (PIT channel 2 or HPET). The same issues that apply to the APIC timer apply to TSC (it is not reliable under aggressive power-management), and additionally, using a per-core resource would need complex IPI synchronization to keep time synchronized.

The TSC counter could be used to calculate a thread's elapsed time for the same reason that APIC timer could be used as a preemption timer. The TSC counter will be reliable as long as the core is executing code.

The new approach of separating the scheduling timer from the threads' elapsed-time accounting actually makes the scheduler more efficient. The current scheduler needs to check timers and update real time on each reschedule, but this would no longer be necessary. The only thing the scheduler needs to do is read the TSC when it starts a new thread and, when it switches threads, read the TSC again and record the difference as elapsed time. Then it would program the APIC timer with a new period as it schedules the next thread. The APIC timer interrupt, and any blocking call, could also always assume that preemption should be done, so there would be no need to check for this.
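The accounting described above amounts to charging the outgoing thread with the counter delta at each switch. A minimal sketch with hypothetical `struct thread`, `struct core_sched`, and `switch_thread` names; `now` stands for whatever counter the scheduler reads (the TSC in this plan, or global system time), and arming the APIC timer for the next thread is only indicated by a comment.

```c
#include <stdint.h>

/* Hypothetical per-thread accounting record. */
struct thread {
    uint64_t elapsed;       /* accumulated run time, in counter ticks */
    uint64_t started_at;    /* counter value when last dispatched */
};

/* Hypothetical per-core scheduler state (one instance per core). */
struct core_sched {
    struct thread *current;
};

/* Charge the outgoing thread for the time since it was dispatched,
 * then stamp the incoming thread with the current counter value. */
void switch_thread(struct core_sched *cs, struct thread *next, uint64_t now)
{
    if (cs->current)
        cs->current->elapsed += now - cs->current->started_at;
    next->started_at = now;
    cs->current = next;
    /* A real scheduler would now arm the APIC timer for next's time slice. */
}
```

No timer lists are touched on this path, which is where the efficiency gain comes from.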

rdos · Post by **rdos** » Sat Dec 10, 2011 12:18 pm

OK, so timers are back in a global memory area (task_sel), and it works on single-CPU systems with the legacy PIC driver. SMP support is broken, but I'll eventually make that work again.

Next, I'll try to serve the PIT interrupt through the IO-APIC. This requires some more code rearranging.

BTW, this is when version control is essential. If this venture eventually turns really bad, I can revert all or most of it back to a stable state.

Brendan · Post by **Brendan** » Sat Dec 10, 2011 12:31 pm

Hi,

rdos wrote: The new approach of separating the scheduling timer from the threads' elapsed-time accounting actually makes the scheduler more efficient. The current scheduler needs to check timers and update real time on each reschedule, but this would no longer be necessary. The only thing the scheduler needs to do is read the TSC when it starts a new thread and, when it switches threads, read the TSC again and record the difference as elapsed time. Then it would program the APIC timer with a new period as it schedules the next thread. The APIC timer interrupt, and any blocking call, could also always assume that preemption should be done, so there would be no need to check for this.

That sounds good to me.

rdos wrote: BTW, this is when version control is essential. If this venture eventually turns really bad, I can revert all or most of it back to a stable state.

BTW this is where version control is detrimental. If you knew you couldn't revert back at all, you would've been more careful designing it in the first place, and would've saved yourself a lot of work rewriting it twice (once for the original SMP support, and again to fix problems).

Cheers,

Brendan

rdos · Post by **rdos** » Sat Dec 10, 2011 1:09 pm

Brendan wrote: BTW this is where version control is detrimental. If you knew you couldn't revert back at all, you would've been more careful designing it in the first place, and would've saved yourself a lot of work rewriting it twice (once for the original SMP support, and again to fix problems).

But the original design would have worked if AMD had made the TSC and APIC timer operate in the correct way. I had no idea that they would suddenly stop the clock with their power-management code.

BTW, now I also have a functional system using the PIT timer via APIC.

And now system time is also global, and the accumulated times for threads work. In fact, it was not necessary to use the time stamp counter at all, since it is just as simple to read out system time and take the difference. I also had to add spinlocks around reading the PIT timer and updating system time, since these can now be accessed by multiple cores.

Solar · Post by **Solar** » Sat Dec 10, 2011 3:18 pm

rdos wrote: But the original design would have worked if AMD had made the TSC and APIC timer operate in the correct way.

You always manage to crack me up. Your design is always correct, it's just the chip designers that get it wrong.

Rusky · Post by **Rusky** » Sat Dec 10, 2011 4:35 pm

If you're good enough, it's impossible for initial research of the system to miss incompatibilities or other such details.

rdos · Post by **rdos** » Sat Dec 10, 2011 4:48 pm

Whatever.

The new design has now been fully implemented, and it works extremely well. No more strange timing issues in my test programs, and I even managed to boot the 5th core on the AMD machine. Some other issues also seem to have disappeared; for example, the Realtek Ethernet controller now works every time. I still cannot boot the 6th core, but that is probably a different issue (the CPU triple-faults when I try).

What remains is to use the HPET when it is available, as this should improve performance a bit.

rdos · Post by **rdos** » Sun Dec 11, 2011 2:44 am

OK, so the new design works perfectly well on one modern PC (the one that didn't work before), but it malfunctions on an older AMD Athlon and on a Geode. I get no logs on either of them; they just malfunction.

It's at times like these that all the 12 incremental stages (+ the diffs between them) are essential. I'll start by checking out the last stable version and verifying that it works, and then step forward until it breaks.

Edit: A very subtle register overwrite caused the problem on the Geode. As for the Athlon, I suspect it doesn't have fully compatible PIT hardware. I need to look into the HPET.

rdos · Post by **rdos** » Sun Dec 11, 2011 3:37 pm

I've done some reading on the HPET, and it seems perfect for both global types of timers. It has a free-running counter (32 or 64 bits, but I'll fix it to 32 bits), which is perfect for recording elapsed time. Then there are a number of comparators for generating interrupts. If an HPET is available, it should support at least one comparator, and one comparator is all that is needed to create the one-shot timer support that timers in RDOS require. In order not to miss interrupts, I think it is enough to read the counter after programming the expire count, and if the counter has already passed it, call the expire sys-call. Maybe the programmed count should also be read back to make sure that caching is not affecting the result.
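The arm-then-check idea can be sketched as follows. The register accessors are hypothetical stand-ins (plain variables here so the logic runs on a host; on real hardware they would be uncached MMIO accesses to the HPET main counter and one comparator), and the wraparound-safe comparison assumes the deadline is less than 2^31 ticks away.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the HPET main counter (32-bit mode) and one
 * comparator. Real code would use uncached MMIO accesses instead. */
static volatile uint32_t hpet_counter;
static volatile uint32_t hpet_comparator;

static uint32_t read_counter(void)           { return hpet_counter; }
static void     write_comparator(uint32_t v) { hpet_comparator = v; }
static uint32_t read_comparator(void)        { return hpet_comparator; }

/* True if `now` has reached or passed `deadline` on a wrapping 32-bit
 * counter, assuming the two values are less than 2^31 ticks apart. */
static bool deadline_passed(uint32_t now, uint32_t deadline)
{
    return (uint32_t)(now - deadline) < 0x80000000u;
}

/* Program a one-shot expiry. Returns true if the interrupt can be relied
 * on; false means the deadline had already been reached when the
 * comparator was written, so the caller should run the expiry code itself. */
bool arm_oneshot(uint32_t deadline)
{
    write_comparator(deadline);
    (void)read_comparator();    /* read back, as suggested, so the write has landed */
    return !deadline_passed(read_counter(), deadline);
}
```

When the check reports a miss, the comparator may still match later (or at the same instant), so the expiry path should tolerate a duplicate or late interrupt for a deadline it has already handled.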

rdos · Post by **rdos** » Mon Dec 12, 2011 9:16 am

They claim that the HPET could replace RTC ints, but I cannot see how this would be possible. The primary use of RTC ints is to provide real-time with higher precision, or synchronize real-time with a higher precision counter, but this is simply not possible as the HPET is not synchronized with the real-time clock update in the RTC. I use the RTC int with a 2Hz clock to synchronize the RTC with PIT/HPET.

A central issue is also if the RTC or the PIT/HPET has the best stability. I don't know about the HPET, but the PIT is typically not as stable as the RTC, which means it is the RTC that needs to stabilize the PIT.
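Letting the RTC stabilize the PIT or HPET essentially means measuring the counter's frequency against the RTC's periodic interrupt. A sketch, assuming a 2 Hz RTC interrupt whose handler samples a 64-bit PIT/HPET tick count; `struct rtc_calib`, `rtc_tick`, and `measured_hz` are hypothetical names. Averaging over many RTC periods hides the interrupt-latency jitter of individual samples.

```c
#include <stdint.h>

#define RTC_IRQ_HZ 2u   /* RTC periodic interrupt rate used for calibration */

/* Hypothetical calibration state, updated from the RTC interrupt handler. */
struct rtc_calib {
    int      started;
    uint64_t first_sample;  /* PIT/HPET tick count at the first RTC interrupt */
    uint64_t last_sample;   /* tick count at the most recent RTC interrupt */
    uint32_t intervals;     /* RTC periods elapsed between the two samples */
};

/* Call from the RTC interrupt with the current PIT/HPET tick count. */
void rtc_tick(struct rtc_calib *c, uint64_t counter_now)
{
    if (!c->started) {
        c->started = 1;
        c->first_sample = counter_now;
    } else {
        c->intervals++;
    }
    c->last_sample = counter_now;
}

/* Counter frequency as measured against the RTC, in Hz (0 until at least
 * one full RTC period has been observed). */
uint64_t measured_hz(const struct rtc_calib *c)
{
    if (c->intervals == 0)
        return 0;
    return (c->last_sample - c->first_sample) * RTC_IRQ_HZ / c->intervals;
}
```

The measured rate could then be compared with the nominal one (1193182 Hz for the PIT) to derive a correction for the counter.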

Brendan · Post by **Brendan** » Mon Dec 12, 2011 10:36 am

Hi,

rdos wrote: They claim that the HPET could replace RTC ints, but I cannot see how this would be possible. The primary use of RTC ints is to provide real-time with higher precision, or synchronize real-time with a higher precision counter, but this is simply not possible as the HPET is not synchronized with the real-time clock update in the RTC. I use the RTC int with a 2Hz clock to synchronize the RTC with PIT/HPET.

The primary use of the RTC is to keep track of time when the computer is turned off.

For precision, HPET is way better than PIT, and PIT is better than the RTC.

For accuracy, both PIT and RTC are about the same - mostly it depends on the motherboard, and both are typically accurate to about +/- 2 seconds per day; but I've seen systems where the RTC loses up to 5 minutes per day. The minimum recommendation for HPET is about +/- 43 seconds per day (which seems really bad), but that is only the minimum recommendation, and I'd expect HPET is typically a lot better than that (and probably as good as PIT or RTC).
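Expressed as a fractional frequency error in parts per million (ppm), these figures are easier to compare; the 43 s/day figure works out to roughly 500 ppm (0.05 %):

$$
\begin{aligned}
\text{drift [ppm]} &= \frac{\text{error per day [s]}}{86\,400\ \text{s}} \times 10^{6} \\
2\ \text{s/day} &\approx 23\ \text{ppm} \\
43\ \text{s/day} &\approx 498\ \text{ppm} \approx 500\ \text{ppm}\ (0.05\,\%) \\
300\ \text{s/day}\ (5\ \text{min}) &\approx 3472\ \text{ppm}
\end{aligned}
$$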

rdos wrote: A central issue is also if the RTC or the PIT/HPET has the best stability.

They all have long term drift and there's no way to guess which timer will be better/worse. Use NTP and/or allow the user to compensate for drift manually; and make sure your timing code is able to add or remove a little bias to compensate for long term drift. For example, if you set the timer to 1000 Hz then you might want to add 1.00000003 ms or 0.99999997 ms to the current time each IRQ.
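Brendan's 1.00000003 ms per IRQ is a +30 parts-per-billion correction on a 1 ms period. Adding that to a plain millisecond or nanosecond counter would round the correction away, so one option is to keep a sub-nanosecond remainder. A sketch under those assumptions; `struct sys_clock`, `clock_set_rate`, and `clock_tick` are hypothetical names, and the tick length is stored in femtoseconds.

```c
#include <stdint.h>

#define FS_PER_NS 1000000ULL    /* femtoseconds per nanosecond */

/* Hypothetical system clock: nanoseconds plus a sub-nanosecond remainder
 * kept in femtoseconds, so a bias of a few parts per billion per tick
 * accumulates instead of being rounded away. */
struct sys_clock {
    uint64_t ns;        /* current time in nanoseconds */
    uint64_t rem_fs;    /* fractional nanosecond, always < FS_PER_NS */
    uint64_t tick_fs;   /* time credited per timer IRQ, in femtoseconds */
};

/* Set the per-IRQ increment to the nominal period plus a correction in
 * parts per billion (positive if the timer runs slow). Assumes
 * nominal_fs * |ppb| fits in 63 bits, e.g. periods up to 10 ms with
 * corrections up to 100 ppm. */
void clock_set_rate(struct sys_clock *c, uint64_t nominal_fs, int64_t ppb)
{
    int64_t delta = (int64_t)nominal_fs * ppb / 1000000000LL;
    c->tick_fs = (uint64_t)((int64_t)nominal_fs + delta);
}

/* Timer IRQ body: advance the clock by one biased period. */
void clock_tick(struct sys_clock *c)
{
    c->ns += c->tick_fs / FS_PER_NS;
    c->rem_fs += c->tick_fs % FS_PER_NS;
    if (c->rem_fs >= FS_PER_NS) {
        c->rem_fs -= FS_PER_NS;
        c->ns++;
    }
}
```

With a 1000 Hz timer and +30 ppb, 1000 ticks advance the clock by exactly 1000000030 ns, i.e. 30 ns per second of extra credit.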

Cheers,

Brendan

rdos · Post by **rdos** » Mon Dec 12, 2011 1:08 pm

Brendan wrote: For accuracy, both PIT and RTC are about the same - mostly it depends on the motherboard, and both are typically accurate to about +/- 2 seconds per day; but I've seen systems where the RTC loses up to 5 minutes per day. The minimum recommendation for HPET is about +/- 43 seconds per day (which seems really bad), but that is only the minimum recommendation, and I'd expect HPET is typically a lot better than that (and probably as good as PIT or RTC).

A problem with the HPET is that it hasn't got a fixed frequency, rather the period is given in femtoseconds in a 32-bit integer. The HPET on this machine is a little above 14MHz, and it is almost (but not exactly) 12 times faster than the PIT. That is also a factor in the accuracy possible to achieve.
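Because the period comes in femtoseconds, conversions can be done in integer arithmetic. A minimal sketch: per the HPET specification, the period (COUNTER_CLK_PERIOD) sits in bits 63:32 of the General Capabilities and ID register, which is assumed to be read elsewhere; the function names are hypothetical.

```c
#include <stdint.h>

#define FS_PER_SECOND 1000000000000000ULL   /* 10^15 */

/* COUNTER_CLK_PERIOD: bits 63:32 of the HPET General Capabilities and ID
 * register (the 64-bit register read itself is not shown). */
uint32_t hpet_period_fs(uint64_t caps)
{
    return (uint32_t)(caps >> 32);
}

/* Nominal HPET frequency in Hz, rounded down (0 for an invalid period). */
uint64_t hpet_hz(uint32_t period_fs)
{
    return period_fs ? FS_PER_SECOND / period_fs : 0;
}

/* Convert a 32-bit tick count to nanoseconds. Both factors are at most
 * 32 bits wide, so the product cannot overflow 64 bits. */
uint64_t hpet_ticks_to_ns(uint32_t ticks, uint32_t period_fs)
{
    return (uint64_t)ticks * period_fs / 1000000ULL;
}
```

For instance, a period of 69841279 fs gives 14318179 Hz, slightly below 12 times the PIT's nominal 1193182 Hz (which would be 14318184 Hz), so a fixed 12:1 ratio would not be exact.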

Brendan wrote: They all have long term drift and there's no way to guess which timer will be better/worse. Use NTP and/or allow the user to compensate for drift manually; and make sure your timing code is able to add or remove a little bias to compensate for long term drift. For example, if you set the timer to 1000 Hz then you might want to add 1.00000003 ms or 0.99999997 ms to the current time each IRQ.

I have support for setting the clock from NTP, as well as keeping it synchronized. That is the best solution, provided you have a wired Internet connection such as ADSL.

OSDev.org