
Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 8:20 am
by johnsa
Hi,

What would be the best way to implement high resolution delays in a kernel?

I originally had my scheduler running at 1000 Hz, so I could use that to perform an n-ms delay. But suppose we want to perform sub-millisecond delays, like 50us or something. The two options I can think of are: have the scheduler run at a much higher rate, say 20 kHz, so it can manage multiples of 50us, and have it do the rest of its work only every 20th call (in the 20 kHz case).
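That first option amounts to a simple tick divider in the interrupt handler. A minimal sketch (the `schedule()` stub here is a hypothetical stand-in for the real scheduler entry point):

```c
#include <stdint.h>

static volatile uint64_t ticks_50us;   /* one tick per 50us interrupt */
static uint64_t schedule_calls;        /* stand-in for the real scheduler */

static void schedule(void) { schedule_calls++; }

/* Called by the 20 kHz timer interrupt: always bump the fine-grained
   tick count, but only run the scheduler every 20th tick (i.e. 1 kHz). */
static void timer_irq(void)
{
    ticks_50us++;
    if (ticks_50us % 20 == 0)
        schedule();
}
```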

Or the other option would be to use rdtsc and the core frequency to measure a 50us period (this assumes a consistent frequency, i.e. no SpeedStep), and then this would act more as an app-level function than a kernel delay function.
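The rdtsc variant might look something like this. It's only a sketch: it assumes `tsc_hz` was calibrated at boot (e.g. against the PIT) and, as noted, that the TSC rate is constant:

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), _mm_pause() */

/* Busy-wait for `us` microseconds using the TSC.
   tsc_hz is the calibrated TSC frequency in Hz. */
static void tsc_delay_us(uint64_t tsc_hz, uint64_t us)
{
    uint64_t ticks = tsc_hz / 1000000 * us;
    uint64_t start = __rdtsc();
    while (__rdtsc() - start < ticks)
        _mm_pause();     /* be polite to the other logical CPU */
}
```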

I'm not considering the HPET/LAPIC for the second..

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 8:28 am
by yemista
What about just putting the process to sleep? This might not be too precise, though, because you cannot guarantee when it will be updated, and that can throw off the timing if it is not awakened fast enough. If that's not good enough, you could always set another timer; that way you don't have to touch the scheduler. The timer can have 3 different rates.

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 8:44 am
by skyking
Also you could run the scheduler at variable time steps.

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 11:04 am
by johnsa
Weirdly, in Bochs the PIT has no problem going to 20 kHz, but in QEMU it seems to almost stop... not sure if anyone's tried running faster timers under QEMU?

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 11:37 am
by Hyperdrive
Hi,
johnsa wrote:to have the scheduler run at a much higher rate, say 20khz, so we can use it to manage multiples of 50us and it only does the rest of it's work every 20 calls (in the case of 20khz).
Some time back there was the idea to raise the Linux kernel's HZ constant from 100 to 1000 (not sure in which kernel version this happened). IIRC there was some debate about whether that was too high a frequency, and they concluded that 1000 Hz would be okay. You are planning an even 20 times higher frequency. If a 10x increase (from 100 to 1000 Hz) is debatable, then a 200x increase (from 100 Hz to 20 kHz) would definitely not be accepted.

Just think about it for a second: most of the times the interrupt fires, you have nothing to do at all; you just keep track of time so that you can measure sub-millisecond delays. That's a big overhead for little use. How often do you use such delays? Is a general 20 kHz timer frequency, with its considerable performance impact, justified? I doubt it.

--TS

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 11:51 am
by johnsa
I agree; I personally don't see the point in keeping the "scheduler" running at that sort of rate... during normal code it'll just cause a major loss of performance.
1000 Hz it is, then. So the question becomes: if the kernel scheduler is already running and you need to time, say, 50us, what's the best approach? Suddenly boost the scheduler rate just for that, or leave it well alone and use something else like rdtsc?

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 12:15 pm
by earlz
Let's see..
There is hardcore CPU timing (ha!),
or there is my little idea: using the RTC for the scheduler, leaving the PIT open for free use. This has the problem that a lot of modern computers (especially Dells) don't properly implement the RTC. The RTC works and can be polled and such, but when you try to make it generate an interrupt, it either fails (no interrupt) or it runs extremely slowly no matter what rate you set (like an interrupt every 5 seconds or so).

There is the HPET and all that gunk you can consider, too.

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 1:00 pm
by johnsa
I've had issues with the RTC before too; I use it just to get the system date/time and sync it back up... that's where I'm going to leave the RTC, for compatibility's sake.
I'll assume the PIT will always run the scheduler, unless the LAPIC is possibly used instead. The HPET I'd keep as the general-purpose high-performance timer for app use... I just need something for device driver code, really, that can do 50/100us delays... as such it will always be sequential and non-interruptible... hmmm

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 1:16 pm
by earlz
johnsa wrote:I've had issues with the RTC before too; I use it just to get the system date/time and sync it back up... that's where I'm going to leave the RTC, for compatibility's sake.
I'll assume the PIT will always run the scheduler, unless the LAPIC is possibly used instead. The HPET I'd keep as the general-purpose high-performance timer for app use... I just need something for device driver code, really, that can do 50/100us delays... as such it will always be sequential and non-interruptible... hmmm
What driver do you think actually needs such precise timers?

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 1:19 pm
by skyking
The question for the delay is whether you will busy-wait or put the process to sleep. In the latter case the scheduler has to be able to wake the process up again after the requested time has passed, or the delay will not be the requested length.

By using mode 0 (interrupt on terminal count) on channel 0 and mode 2 (rate generator) on channel 2, multiple timers can be simulated: reprogram channel 0 to generate an interrupt when the next timer expires, and use channel 2 to avoid accumulating errors while doing so. When a timer interrupt occurs, the actions required for the corresponding timer(s) are performed, and if this calls for preempting a task (for example the one that got to run during the 50us), that can be done.
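A sketch of the arithmetic for re-arming channel 0 in mode 0 (the PIT input clock is 1193182 Hz; the actual port 0x43/0x40 `outb` sequence is omitted here):

```c
#include <stdint.h>

#define PIT_HZ 1193182u   /* PIT input clock, ~1.193182 MHz */

/* Given the time until the nearest pending timer expires (in
   microseconds), compute the count to load into PIT channel 0 in
   mode 0 (interrupt on terminal count). Waits longer than one full
   16-bit count are capped; the IRQ handler re-arms for the rest. */
static uint16_t pit_oneshot_count(uint64_t us_until_deadline)
{
    uint64_t ticks = us_until_deadline * PIT_HZ / 1000000u;
    if (ticks == 0)
        ticks = 1;        /* never program a zero-length count */
    if (ticks > 65535)
        ticks = 65535;
    return (uint16_t)ticks;
}
```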

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 1:50 pm
by bewing
The way modern-day code works (e.g. POSIX), many system calls are expected to block. A thread can only run some number of instructions before it hits a blocking system call and stops. Even on a 5-year-old CPU, you can run 6 million instructions in one millisecond. What do you really think the odds are that a normal everyday app will run 6 million instructions before it hits a blocking system call and has to stop anyway? Now compare that probability to the odds of it running only 300,000 instructions before it blocks on a system call. I think the two are nearly identical. I doubt that most apps even make it through 50,000 operations before blocking. This is something that needs to be determined experimentally; you cannot just guess the right answer. But if typical apps run fewer than 300,000 instructions before blocking and task switching ANYWAY, then you can run your clock at 20 kHz and suffer no penalty at all.

Alternatively, you have two choices. One: get a piece of hardware to send you a one-shot interrupt with reasonable accuracy (you really need an APIC for this; forget about the PIT).

Or, two: if you want to wait for a period shorter than a scheduler timeslice, your only choice is to pick a timer that ticks reasonably fast and poll it until it reaches the value you want.
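A sketch of that polling approach; `read_count` is a hypothetical callback that latches and reads a free-running 16-bit down-counter (e.g. PIT channel 2 in mode 2), and the mod-65536 subtraction handles the counter wrapping as long as you poll faster than one full period:

```c
#include <stdint.h>

/* Spin until `ticks` counter ticks have elapsed, by repeatedly
   reading a 16-bit down-counter. Returns the total ticks observed. */
static uint32_t poll_delay(uint16_t (*read_count)(void), uint32_t ticks)
{
    uint32_t elapsed = 0;
    uint16_t last = read_count();
    while (elapsed < ticks) {
        uint16_t now = read_count();
        elapsed += (uint16_t)(last - now);  /* down-counter: old minus new */
        last = now;
    }
    return elapsed;
}

/* Simulated counter standing in for a real latched hardware read;
   it counts down by 7 each read and wraps like the real thing. */
static uint16_t sim_count = 100;
static uint16_t sim_read(void) { return sim_count -= 7; }
```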

(Unless you want a REALLY short delay, in which case you can make the very rough estimate that an I/O port read takes 100 nanoseconds.)

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 9:29 pm
by Brendan
Hi,
johnsa wrote:What would be the best way to implement high resolution delays in a kernel?
For accurate delays, the scheduler might run other tasks for a while (until the delay is almost expired) and then busy wait. For example, imagine you want a delay of 100012345 ns and the IRQ that's used for waking up sleeping tasks might occur every 8 ms; so you could sleep for 96000000 ns then busy wait for the remaining 4012345 ns.
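With the numbers above, the split might be computed like this (a sketch; it assumes sleeping for whole IRQ periods never overshoots, and a more cautious version would sleep one tick less and spin longer):

```c
#include <stdint.h>

/* Split a requested delay into a coarse sleep (whole wakeup-IRQ
   periods) and a fine busy-wait remainder. tick_ns is the period of
   the IRQ that wakes sleeping tasks (e.g. 8000000 ns for 8 ms). */
static void split_delay(uint64_t delay_ns, uint64_t tick_ns,
                        uint64_t *sleep_ns, uint64_t *spin_ns)
{
    *sleep_ns = (delay_ns / tick_ns) * tick_ns;  /* whole ticks only */
    *spin_ns  = delay_ns - *sleep_ns;            /* busy-wait the rest */
}
```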

The best way depends on what the hardware supports. Different timers may or may not be present, RDTSC might not provide reliable timing, etc.

The best way is to determine which timers are present and the characteristics of those timers during boot, and then use the best timer for each purpose. I'd recommend starting by writing a list of all things your OS will need to use timers and timing for. For e.g.:
  • Something used by the scheduler to determine when a task has used all of its time slice and pre-empt it.
  • Something to keep track of "wall clock" time (preferably in UTC or something else that isn't affected by time zones, etc).
  • Something to keep track of time used by tasks
  • Something to wake up tasks that sleep, for coarse-grained delays
  • Something to use with a busy-wait, for fine-grained delays
From this list you'd combine some things. For example, the timer used for fine-grained delays could also be used for keeping track of the time used by each task.

Then, during boot you'd find out which timers are present and perhaps build a list of details for each timer - if it supports "periodic IRQ", if it supports "one shot IRQ", how much overhead is involved, how accurate it is (drift), how precise it is, etc. The possible timers include:
  • PIT
  • Local APIC timers
  • HPET
  • RTC
  • ACPI counter
  • TSC
Then (for e.g.) for each purpose you can give each timer a score and choose the timer with the best score.
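As a toy example of that scoring idea (the fields and weights here are entirely hypothetical, scoring only the fine-grained busy-wait role):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-timer record filled in during boot detection. */
struct timer_info {
    const char *name;
    bool     has_periodic_irq;
    bool     has_oneshot_irq;
    uint64_t counter_hz;     /* counting frequency (resolution) */
    uint32_t access_cost;    /* rough cycles to read it once */
};

/* Score a timer for the busy-wait role: resolution and cheap reads
   matter; IRQ capability is irrelevant since we only poll. */
static int score_for_busywait(const struct timer_info *t)
{
    if (t->counter_hz < 1000000)   /* need at least ~1us resolution */
        return -1;
    int score = 1;
    if (t->counter_hz >= 10000000) score += 2;  /* sub-100ns ticks */
    if (t->access_cost < 100)      score += 2;  /* no slow port I/O */
    return score;
}
```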

You'd also want to use a small abstraction layer, so that the code for all timers use the same interface, and so that the kernel can do something like "call [selected_timer + function_number * 4]" without caring which timer it's actually using for what.
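In C, that abstraction layer might be a small table of function pointers rather than a computed jump. A sketch, where the stub backend stands in for whichever real driver (HPET, local APIC, ...) won the scoring at boot:

```c
#include <stdint.h>

/* Hypothetical common interface every timer driver implements. */
struct timer_ops {
    uint64_t (*read_ns)(void);            /* monotonic time in ns */
    int      (*arm_oneshot)(uint64_t ns); /* IRQ after ns; <0 if unsupported */
};

/* Stub backend standing in for a real hardware driver. */
static uint64_t stub_now_ns;
static uint64_t stub_read_ns(void)            { return stub_now_ns; }
static int      stub_arm_oneshot(uint64_t ns) { (void)ns; return 0; }

static const struct timer_ops stub_timer = { stub_read_ns, stub_arm_oneshot };

/* The kernel calls through this pointer without knowing the hardware. */
static const struct timer_ops *system_timer = &stub_timer;
```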

Note 1: for some CPUs the TSC doesn't run at a fixed frequency, but you can use the "thermal sensor" IRQ (in the local APIC) to determine when the TSC speed changes; and therefore you could still use the TSC for precise timing by doing something like "virtual_ticks += (current_TSC_count - last_TSC_count) * scaling_factor" (where the scaling factor is changed whenever the TSC speed changes).
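That bookkeeping could be sketched like this (hypothetical names; the key point is folding in the old-rate delta before switching the scaling factor):

```c
#include <stdint.h>

static uint64_t virtual_ticks;           /* frequency-independent time */
static uint64_t last_tsc;
static uint64_t scale_num = 1, scale_den = 1;  /* current TSC ratio */

/* Fold the raw TSC delta into virtual_ticks at the current scale. */
static void tsc_fold(uint64_t current_tsc)
{
    virtual_ticks += (current_tsc - last_tsc) * scale_num / scale_den;
    last_tsc = current_tsc;
}

/* Called from the thermal-sensor IRQ when the TSC rate changes:
   account for the time spent at the old rate before switching. */
static void tsc_speed_changed(uint64_t current_tsc,
                              uint64_t num, uint64_t den)
{
    tsc_fold(current_tsc);
    scale_num = num;
    scale_den = den;
}
```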

Note 2: There are a few situations where the PIT might not be usable - the PIT might be emulated by the HPET (the emulated PIT disappears once you configure the HPET), or it might be present but its IRQ might not be routed to the I/O APIC.


Cheers,

Brendan

Re: Higher Precision timing/delays in kernel

Posted: Fri Apr 24, 2009 10:07 pm
by Love4Boobies
Brendan wrote:Hi,
johnsa wrote:What would be the best way to implement high resolution delays in a kernel?
For accurate delays, the scheduler might run other tasks for a while (until the delay is almost expired) and then busy wait. For example, imagine you want a delay of 100012345 ns and the IRQ that's used for waking up sleeping tasks might occur every 8 ms; so you could sleep for 96000000 ns then busy wait for the remaining 4012345 ns.
If the dispatcher is able to deduce that such a situation is coming, it should temporarily increase the quantum to 100012345 ns so that the next slice is bigger for that process. Fair or unfair to the other processes, it would still be a bit more effective. Using the same (increased) quantum for all processes once and then returning to the initial one (to remedy the fairness issue) is a bad idea, since another such situation may occur and the quantum would need to be increased yet again.