Lockless 32-bit timer to 64-bit counter

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
OSwhatever
Member
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Lockless 32-bit timer to 64-bit counter

Post by OSwhatever »

There are many systems that only provide a HW 32-bit free running counter but internally in my OS the timer type is 64-bits. Of course this is not a problem when you can detect the wrap around case. In in order avoid problems with preemption I now lock the timer reading so that there is no condition where several CPUs tries updated that hsb part at the same time or it was preempted in that part. So basically it looks like this:

Code: Select all

SavedIrqStatus_t saveStatus = m_lock.LockDisableInterrupts();

uint32_t lsb = 0xffffffff - m_timerBase.Input32(TIMER1VALUE);

if(lsb < m_lastLsbVal)
{
        m_timerHsb += 1;
}

m_lastLsbVal = lsb;

m_lock.UnlockRestoreInterrupts(saveStatus);
This works but it has the problem that it needs to lock every time you need to read the time which happens not too seldom. I've noticed that this lock degrades performance quite a lot in some test cases. In some test cases, user processes will read the time continuously and they start to compete with the kernel scheduler that also needs to read the time and the result is severe degradation. If I remove the lock, the system behaves much better. So what I need is to find out a way to handle the wrap around case in a lockless way, but I can't really find a way to get it right. Do you know a good way?
User avatar
Nessphoro
Member
Member
Posts: 308
Joined: Sat Apr 30, 2011 12:50 am

Re: Lockless 32-bit timer to 64-bit counter

Post by Nessphoro »

Well, dedicate one core to be the timer (I mean it is the only core that will update the counter) and then you can just use LOCK to get your ticks in a deterministic manner.
User avatar
gravaera
Member
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Re: Lockless 32-bit timer to 64-bit counter

Post by gravaera »

Yo:

You can investigate multiple reader locks, or using atomic operations to read the time, and atomic increment to update the time, etc, etc. However you solve the issue of handling wrap-around is up to you, though.

--Peace out,
gravaera
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
rdos
Member
Member
Posts: 3306
Joined: Wed Oct 01, 2008 1:55 pm

Re: Lockless 32-bit timer to 64-bit counter

Post by rdos »

Maybe you could save the last reading (per core), and then add the difference to the last reading to a (per core) current time instead? This assumes cores read the timer often enough.

Besides, do you really need to disable interrupts for this operation? Wouldn't a typical spinlock (without disabling interrupts) work as well (provided you don't read the time from IRQs)?

Also, not all computers have a global 32-bit timer. Some older computers only have PIT-timer, while some others only have the per-core APIC timer.
OSwhatever
Member
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: Lockless 32-bit timer to 64-bit counter

Post by OSwhatever »

gravaera wrote:Yo:

You can investigate multiple reader locks, or using atomic operations to read the time, and atomic increment to update the time, etc, etc. However you solve the issue of handling wrap-around is up to you, though.

--Peace out,
gravaera
I've played around with atomic operation but I never get right, there is always a loophole somewhere. It's two values we have to play around with here, the last lsb value and the hsb value.
OSwhatever
Member
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: Lockless 32-bit timer to 64-bit counter

Post by OSwhatever »

rdos wrote:Maybe you could save the last reading (per core), and then add the difference to the last reading to a (per core) current time instead? This assumes cores read the timer often enough.

Besides, do you really need to disable interrupts for this operation? Wouldn't a typical spinlock (without disabling interrupts) work as well (provided you don't read the time from IRQs)?

Also, not all computers have a global 32-bit timer. Some older computers only have PIT-timer, while some others only have the per-core APIC timer.
Absolutely you have to disable the interrupt, that's is something basically have to do for almost everything in the kernel, disable interrupts+lock. Otherwise you will end up in deadlock situations because you might use these in the next thread or in the interrupt.
rdos
Member
Member
Posts: 3306
Joined: Wed Oct 01, 2008 1:55 pm

Re: Lockless 32-bit timer to 64-bit counter

Post by rdos »

OSwhatever wrote:
rdos wrote:Maybe you could save the last reading (per core), and then add the difference to the last reading to a (per core) current time instead? This assumes cores read the timer often enough.

Besides, do you really need to disable interrupts for this operation? Wouldn't a typical spinlock (without disabling interrupts) work as well (provided you don't read the time from IRQs)?

Also, not all computers have a global 32-bit timer. Some older computers only have PIT-timer, while some others only have the per-core APIC timer.
Absolutely you have to disable the interrupt, that's is something basically have to do for almost everything in the kernel, disable interrupts+lock. Otherwise you will end up in deadlock situations because you might use these in the next thread or in the interrupt.
Yes, I forgot. If the operation cannot be accessed in scheduler or in IRQs, a better option is a semaphore, but this is not applicable in this case as it is used in scheduler.

But I'm still puzzled how this simple operation could affect performance so much? Maybe the code in your C-statements is not very optimal? And why save the IRQ-state? Can't you just use cli in the request and sti in release part? You really shouldn't support nesting spinlocks anyway!
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Lockless 32-bit timer to 64-bit counter

Post by Combuster »

Typically systems implement a compare-and-swap in bus width which means you can update such 64-bit counters at will, make sure there's only one thread actually applying the wraparound, and any readers can just as well grab the value in whole.

if there's no atomic 64-bit operation, you'll probably get to resort to things like the snapshot algorithm (which is wait-free, but at a price).
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Lockless 32-bit timer to 64-bit counter

Post by Brendan »

Hi,
rdos wrote:Maybe you could save the last reading (per core), and then add the difference to the last reading to a (per core) current time instead? This assumes cores read the timer often enough.
This is the right way to do it - no locks needed anywhere (but some sort of timer IRQ used to make sure the counter is read often enough). You can calculate the maximum amount of time between counter reads easily enough - "2**32/counter_frequency" or "2**32 * time_between_counter_updates". For example, if the hardware updates the counter every 100 ns (or at a rate of 10 MHz) then you'd have to make sure the counter is read at least once every 42949672960 us (or about every 7 minutes).

If an IRQ handler could read the time, or if a task switch can occur while the time is being read, then you would have to disable IRQs (e.g. "pushfd; cli; ...; popfd"), but there's no need for anything more expensive than that.
OSwhatever wrote:Also, not all computers have a global 32-bit timer. Some older computers only have PIT-timer, while some others only have the per-core APIC timer.
In that case you use something radically different (none of the code for "global 32-bit counter" needs to be the same as any of the code for "PIT", or any of the code for "TSC", or...).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
OSwhatever
Member
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: Lockless 32-bit timer to 64-bit counter

Post by OSwhatever »

Brendan wrote:Hi,

This is the right way to do it - no locks needed anywhere (but some sort of timer IRQ used to make sure the counter is read often enough). You can calculate the maximum amount of time between counter reads easily enough - "2**32/counter_frequency" or "2**32 * time_between_counter_updates". For example, if the hardware updates the counter every 100 ns (or at a rate of 10 MHz) then you'd have to make sure the counter is read at least once every 42949672960 us (or about every 7 minutes).

If an IRQ handler could read the time, or if a task switch can occur while the time is being read, then you would have to disable IRQs (e.g. "pushfd; cli; ...; popfd"), but there's no need for anything more expensive than that.

In that case you use something radically different (none of the code for "global 32-bit counter" needs to be the same as any of the code for "PIT", or any of the code for "TSC", or...).


Cheers,

Brendan
At a first glance this is the best solution as it only alters CPU private variables and has no locking. However, as it is CPU private data it also means that I have to disable the interrupts while I retrieve the CPU specific data and updating the counter otherwise there can lead to errors due to preemption and also that the thread can migrate to another CPU. I played around with 64-bit cmpxchg and I finally made lockless version that I think should hold water.

In both cases you need to visit the timer often enough so that you don't miss a wrap, though.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Lockless 32-bit timer to 64-bit counter

Post by Brendan »

Hi,
OSwhatever wrote:
Brendan wrote:This is the right way to do it - no locks needed anywhere (but some sort of timer IRQ used to make sure the counter is read often enough). You can calculate the maximum amount of time between counter reads easily enough - "2**32/counter_frequency" or "2**32 * time_between_counter_updates". For example, if the hardware updates the counter every 100 ns (or at a rate of 10 MHz) then you'd have to make sure the counter is read at least once every 42949672960 us (or about every 7 minutes).

If an IRQ handler could read the time, or if a task switch can occur while the time is being read, then you would have to disable IRQs (e.g. "pushfd; cli; ...; popfd"), but there's no need for anything more expensive than that.

In that case you use something radically different (none of the code for "global 32-bit counter" needs to be the same as any of the code for "PIT", or any of the code for "TSC", or...).
At a first glance this is the best solution as it only alters CPU private variables and has no locking. However, as it is CPU private data it also means that I have to disable the interrupts while I retrieve the CPU specific data and updating the counter otherwise there can lead to errors due to preemption and also that the thread can migrate to another CPU. I played around with 64-bit cmpxchg and I finally made lockless version that I think should hold water.

In both cases you need to visit the timer often enough so that you don't miss a wrap, though.
Imagine 2 CPUs both trying to "CMPXCHG" the same location. For "best case" the first CPU fetches the cache line, reads some data from it, then makes the cache line "exclusive" and modifies that cache line; then the second CPU does the same thing. For "worst case" the second CPU fetches the cache line and reads some data from it, but then the cache line is evicted (due to the first CPU making it "exclusive" and modifying it), then the second CPU tries to do "CMPXCHG" and has to fetch the cache line a second time, and sees that data was modified and has to retry.

Now imagine 16 CPUs that all do "CMPXCHG" frequently - that poor little cache line is going to be bouncing all over the place. The time taken to transfer the cache line between CPUs (plus the retries needed when other CPUs have modified it between the original read and the CMPXCHG) adds up to a scalability problem. The more CPUs you've got the worse it gets.

For single-CPU, disabling (and then enabling) IRQs is probably just as fast as CMPXCHG; but there's no scalability problems at all, so (for multi-CPU) it's much much better.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
xenos
Member
Member
Posts: 1121
Joined: Thu Aug 11, 2005 11:00 pm
Libera.chat IRC: xenos1984
Location: Tartu, Estonia
Contact:

Re: Lockless 32-bit timer to 64-bit counter

Post by xenos »

Brendan wrote:Imagine 2 CPUs both trying to "CMPXCHG" the same location.
Is it really the same location? I thought this method uses one counter per CPU?
Programmers' Hardware Database // GitHub user: xenos1984; OS project: NOS
Post Reply