
Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Sun Dec 25, 2011 5:30 am
by Brendan
Hi,
rdos wrote:
Brendan wrote:And meaning that your broken code could attempt 1.193 million IRQs per second under the right conditions and screw things up (setting PIT count to 1 is never a good idea).
If somebody is stupid enough to start a new timer each tic, yes. :mrgreen:
The problem isn't one thread running on one CPU that asks for time delays that are too close together. The problem is thousands of threads running in hundreds of processes on tens of CPUs all wanting time delays that happen to result in "not enough time between IRQs", that leads to an IRQ being missed and the new count never being set, and thousands of threads locking up because they're waiting for something that will never happen; followed by a bug report for "unpredictable lock up under load" that's impossible to debug.
rdos wrote:
Brendan wrote:In a good OS (possibly not your OS) IRQs are rarely disabled for more than about 100 cycles. On an old slow 500 MHz CPU that works out to about 200 ns of "jitter" caused by interrupt latency. The minimum precision of the PIT is about 838 ns which is 4 times higher than "jitter" caused by interrupt latency. Interrupt latency is negligible in a good OS (when running on CPU/s that were made in this century).
I'm pretty sure that interrupt latencies in RDOS on new CPUs are several orders of magnitude lower than the PIT tic, which means that a sustained rate of 1.193 million PIT interrupts per second would be possible. Late PIT ISRs are only an issue on older CPUs.
A sustained rate of 1.193 million PIT interrupts per second should be theoretically possible. Unfortunately everything I've read says that such high frequencies aren't sustainable in practice. I'm not sure if it works on some chipsets and not others, or works on none of them. If I was planning to attempt high frequencies with the PIT, I'd test it on a range of computers to determine the maximum frequency the computers I have can handle and then halve it just in case.
rdos wrote:
Brendan wrote:Unless you're using HPET's main counter or TSC (where there's no downside), there's no point increasing overhead (to increase effective precision, not the "storage precision") for real time when nobody cares about that extra precision anyway.
I could give you some examples when it is useful. If you log network traffic, you could also log the real-time marks with us precision. This is useful since you both want to see the real-time when things happened, and the time interval between packets. It is not important if real-time is accurate down to us, but it is important that the difference between packets is accurate down to us.
Sounds like you only need to read real time once for that (e.g. "Log started on 20/11/2011 at 12:34") and then use elapsed time for everything else. It'd be easier to calculate the difference between packets if you don't need to worry about us/seconds/minutes wrapping around back to zero.
rdos wrote:Another example is synchronization with NTP-servers, that have much better resolution than seconds or milliseconds. In order to take advantage of NTP, you need far better precision than milliseconds.
Wikipedia says "NTP can usually maintain time to within tens of milliseconds over the public Internet,[1] and can achieve 1 millisecond accuracy in local area networks under ideal conditions.". You'd need much better precision than milliseconds for fine granularity drift adjustment (e.g. the "drift adjustment in 0.000000195156391 ns increments" from the example code I posted earlier), but not for real time itself.
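The "fine granularity drift adjustment" idea can be sketched in C (a sketch only; the names and the parts-per-billion interface are my assumptions, not anything from the thread): instead of stepping the clock when NTP reports drift, you scale the fixed-point increment that each timer IRQ adds, so time stays monotonic and the correction is spread smoothly.

```c
#include <stdint.h>

/* Hypothetical sketch: elapsed time kept as 32.32 fixed-point tics.
   NTP drift correction tweaks the per-IRQ increment instead of
   stepping the clock, so time never jumps and stays monotonic. */
typedef struct {
    uint64_t time_fp;      /* elapsed time, 32.32 fixed-point tics */
    uint64_t step_fp;      /* amount added per timer IRQ, 32.32    */
} clock_state;

/* Called from the timer IRQ: advance by the (possibly adjusted) step. */
static void clock_tick(clock_state *c) { c->time_fp += c->step_fp; }

/* NTP says we run fast/slow by `ppb` parts per billion: scale the step.
   Split the multiply to avoid 64-bit overflow without 128-bit math. */
static void clock_adjust_ppb(clock_state *c, int64_t ppb)
{
    int64_t delta = (int64_t)(c->step_fp / 1000000000ull) * ppb
                  + (int64_t)(c->step_fp % 1000000000ull) * ppb / 1000000000ll;
    c->step_fp = (uint64_t)((int64_t)c->step_fp + delta);
}
```

With a step of one tic per IRQ, a +500,000,000 ppb (i.e. +50%) adjustment makes the step 1.5 tics; real corrections would of course be a few ppm at most.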
rdos wrote:
Brendan wrote:
rdos wrote:The only overhead that is needed to synchronize real time with elapsed time is two RTC ints per second. That would adjust for the drift, and doesn't cost any significant overhead even on a 386 processor.
I'm talking about the overhead of keeping track of real time; and you're talking about the overhead of synchronising real time with elapsed time.
Because you suggested keeping track of real time with an ISR.

Here is how I read real time (no overhead or IRQs involved):

Code: Select all

get_time    PROC far
    GetSystemTime
    add eax,cs:time_diff
    adc edx,cs:time_diff+4
    retf32
get_time   ENDP
Erm.

Here's the source code for the best kernel that anyone could ever possibly write:

Code: Select all

    someUnknownMacro
    ret

Cheers,

Brendan

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Sun Dec 25, 2011 6:27 am
by rdos
Brendan wrote:The problem isn't one thread running on one CPU that asks for time delays that are too close together. The problem is thousands of threads running in hundreds of processes on tens of CPUs all wanting time delays that happen to result in "not enough time between IRQs", that leads to an IRQ being missed and the new count never being set, and thousands of threads locking up because they're waiting for something that will never happen; followed by a bug report for "unpredictable lock up under load" that's impossible to debug.
I don't support that environment, so no problem. Thousands of threads in hundreds of processes would use up the GDT, so I don't support that. :mrgreen:
Brendan wrote:
rdos wrote:Here is how I read real time (no overhead or IRQs involved):

Code: Select all

get_time    PROC far
    GetSystemTime
    add eax,cs:time_diff
    adc edx,cs:time_diff+4
    retf32
get_time   ENDP
Erm.

Here's the source code for the best kernel that anyone could ever possibly write:

Code: Select all

    someUnknownMacro
    ret
GetSystemTime is a syscall. It reads elapsed time. You saw it above in the timer code. :mrgreen:

Anyhow, here is how it looks with the PIT, channel 0:

Code: Select all

get_system_time  Proc
    push ds
;
    mov ax,SEG data
    mov ds,ax

gstSpinLock:    
    mov ax,ds:time_spinlock
    or ax,ax
    je gstGet
;
    sti
    pause
    jmp gstSpinLock

gstGet:
    cli
    inc ax
    xchg ax,ds:time_spinlock
    or ax,ax
    jne gstSpinLock
;
    mov al,0
    out TIMER_CONTROL,al           ; latch count
    jmp short $+2
    in al,TIMER0                          ; read lsb
    mov ah,al
    jmp short $+2
    in al,TIMER0                          ; read msb
    xchg al,ah
    mov dx,ax
    xchg ax,ds:clock_tics
    sub ax,dx
    movzx eax,ax
    add ds:system_time,eax
    adc ds:system_time+4,0
;    
    mov eax,ds:system_time
    mov edx,ds:system_time+4
;    
    mov ds:time_spinlock,0
    sti
    pop ds
    retf32
get_system_time Endp
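The latch-and-accumulate trick in that routine can be restated in C (a sketch; the function and variable names are mine, not RDOS's, and the port I/O is factored out):

```c
#include <stdint.h>

/* Sketch of the PIT path above.  The PIT counts DOWN, so the elapsed
   tics since the last call is prev - now; the unsigned 16-bit
   subtraction handles the reload wrap-around for free, as long as
   successive calls are less than one full PIT period apart. */
uint64_t pit_accumulate(uint64_t system_time, uint16_t *prev, uint16_t now)
{
    uint16_t elapsed = (uint16_t)(*prev - now);  /* down-counter delta */
    *prev = now;                                 /* like xchg ax,clock_tics */
    return system_time + elapsed;                /* 64-bit running total */
}
```

The caller would latch the count (the `out TIMER_CONTROL` / two `in TIMER0` sequence above) and pass the result in as `now`.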

For HPET counter it looks like:

Code: Select all


get_system_time Proc
    push ds
    push es
    push ecx
;
    mov ax,SEG data
    mov ds,ax

ghtSpinLock:    
    mov ax,ds:time_spinlock
    or ax,ax
    je ghtGet
;
    sti
    pause
    jmp ghtSpinLock

ghtGet:
    cli
    inc ax
    xchg ax,ds:time_spinlock
    or ax,ax
    jne ghtSpinLock
;
    mov es,ds:hpet_sel
    mov eax,es:hpet_count
    mov edx,eax
    xchg edx,ds:prev_hpet
    sub eax,edx
    mul ds:hpet_factor                ; HPET rate
    add eax,ds:hpet_guard           ; 32-bit guard (could be used to increase resolution)
    adc edx,0
;
    mov ecx,31F5C4EDh               ; conversion factor to tics
    div ecx
    mov ds:hpet_guard,edx
    add ds:system_time,eax
    adc ds:system_time+4,0
;    
    mov eax,ds:system_time
    mov edx,ds:system_time+4
;    
    mov ds:time_spinlock,0
    sti
;
    pop ecx
    pop es
    pop ds
    retf32
get_system_time Endp
But, just maybe, you wanted to know what the GetSystemTime macro is? :D

Code: Select all


get_system_time_nr                      = 76

UserGate16	MACRO gate_nr
    db 67h
    db 66h
    db 9Ah
    dd gate_nr
    dw 1
		ENDM

UserGate32	MACRO gate_nr
    db 3Eh
    db 67h
    db 9Ah
    dd gate_nr
    dw 3
		ENDM

UserGate        MACRO gate_nr
IF (size $) EQ 0FF02h
    UserGate16 gate_nr
ELSE
    UserGate32 gate_nr
ENDIF
		ENDM

; OUT EDX:EAX       System time
GetSystemTime   MACRO
    UserGate get_system_time_nr
		ENDM


Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Sun Dec 25, 2011 8:40 am
by Brendan
Hi,
rdos wrote:
Brendan wrote:The problem isn't one thread running on one CPU that asks for time delays that are too close together. The problem is thousands of threads running in hundreds of processes on tens of CPUs all wanting time delays that happen to result in "not enough time between IRQs", that leads to an IRQ being missed and the new count never being set, and thousands of threads locking up because they're waiting for something that will never happen; followed by a bug report for "unpredictable lock up under load" that's impossible to debug.
I don't support that environment, so no problem. Thousands of threads in hundreds of processes would use up the GDT, so I don't support that. :mrgreen:
Ok, you might not have this problem until you fix the "OS uses segments but doesn't use an LDT for each process" problem. Then again there's no guarantee you won't - it'd only take a few threads that ask for time delays that happen to be too close together.

Essentially you're relying on one problem (lack of GDT entries) to minimise the chance that another problem (lack of sane minimum PIT delay) will occur.
rdos wrote:
rdos wrote:Here is how I read real time (no overhead or IRQs involved):
GetSystemTime is a syscall. It reads elapsed time. You saw it above in the timer code. :mrgreen:

Anyhow, here is how it looks with the PIT, channel 0:

Code: Select all

get_system_time  Proc
    push ds
;
    mov ax,SEG data
    mov ds,ax

gstSpinLock:    
    mov ax,ds:time_spinlock
    or ax,ax
    je gstGet
;
    sti
    pause
    jmp gstSpinLock

gstGet:
    cli
    inc ax
    xchg ax,ds:time_spinlock
    or ax,ax
    jne gstSpinLock
;
    mov al,0
    out TIMER_CONTROL,al           ; latch count
    jmp short $+2
    in al,TIMER0                          ; read lsb
    mov ah,al
    jmp short $+2
    in al,TIMER0                          ; read msb
    xchg al,ah
    mov dx,ax
    xchg ax,ds:clock_tics
    sub ax,dx
    movzx eax,ax
    add ds:system_time,eax
    adc ds:system_time+4,0
;    
    mov eax,ds:system_time
    mov edx,ds:system_time+4
;    
    mov ds:time_spinlock,0
    sti
    pop ds
    retf32
get_system_time Endp

Your idea of no overhead is a spinlock that has an "impossible to estimate worst case time to acquire" followed by three legacy IO port accesses that can cost around 1 us each?

Note: For a spinlock like that, it's theoretically possible (although extremely unlikely) for as few as 2 CPUs to repeatedly acquire the lock and prevent a third CPU from acquiring it for extremely long amounts of time.
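The starvation scenario in that note is a known property of plain test-and-set spinlocks; the standard fix (not something either poster's code uses) is a ticket lock, which serves CPUs strictly in arrival order. A minimal C11 sketch, with illustrative names:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Ticket lock: each CPU takes a ticket, then spins until its number is
   served.  No CPU can be overtaken, so two CPUs cannot starve a third
   the way they can with a plain xchg-based spinlock. */
typedef struct {
    _Atomic uint32_t next_ticket;   /* next ticket to hand out   */
    _Atomic uint32_t now_serving;   /* ticket currently admitted */
} ticket_lock;

static void ticket_lock_acquire(ticket_lock *l)
{
    uint32_t t = atomic_fetch_add(&l->next_ticket, 1);
    while (atomic_load(&l->now_serving) != t)
        ;   /* spin (a pause instruction would go here) */
}

static void ticket_lock_release(ticket_lock *l)
{
    atomic_fetch_add(&l->now_serving, 1);
}
```

The cost is one extra shared counter; the benefit is a bounded, FIFO worst-case acquire time.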


Cheers,

Brendan

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Sun Dec 25, 2011 9:19 am
by rdos
Brendan wrote:Ok, you might not have this problem until you fix the "OS uses segments but doesn't use an LDT for each process" problem. Then again there's no guarantee you won't - it'd only take a few threads that ask for time delays that happen to be too close together.

Essentially you're relying on one problem (lack of GDT entries) to minimise the chance that another problem (lack of sane minimum PIT delay) will occur.
I have LDTs per process, but when the application uses a flat memory model, that doesn't help much. As I wrote, I don't support typical server configurations that need hundreds of processes and/or thousands of threads.
Brendan wrote:Your idea of no overhead is a spinlock that has an "impossible to estimate worst case time to acquire" followed by three legacy IO port accesses that can cost around 1 us each?
First, it won't cost 1 us on any modern hardware, as the PIT is implemented on the PCI bus by the LPC device. Second, on really old hardware where the PIT is implemented on the AT bus, there is no SMP, so the spinlock is never busy.

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Sun Dec 25, 2011 9:42 am
by Brendan
Hi,
rdos wrote:
Brendan wrote:Ok, you might not have this problem until you fix the "OS uses segments but doesn't use an LDT for each process" problem. Then again there's no guarantee you won't - it'd only take a few threads that ask for time delays that happen to be too close together.

Essentially you're relying on one problem (lack of GDT entries) to minimise the chance that another problem (lack of sane minimum PIT delay) will occur.
I have LDTs per process, but when the application uses a flat memory model, that doesn't help much. As I wrote, I don't support typical server configurations that need hundreds of processes and/or thousands of threads.
Then again there's no guarantee you won't - it'd only take a few threads that ask for time delays that happen to be too close together.
rdos wrote:
Brendan wrote:Your idea of no overhead is a spinlock that has an "impossible to estimate worst case time to acquire" followed by three legacy IO port accesses that can cost around 1 us each?
First, it won't cost 1 us on any modern hardware, as the PIT is implemented on the PCI bus by the LPC device. Second, on really old hardware where the PIT is implemented on the AT bus, there is no SMP, so the spinlock is never busy.
First, the "LPC device" is indirectly connected to a PCI bus via a low pin count connection, which is as slow as an ISA bus; and IO port accesses can cost around 1 us each. Ironically, I've seen people (attempt to) rely on this for very short delays (e.g. for a 123 us delay, do 123 IO port writes) because they thought it was "safe" after testing it on several computers. I wouldn't consider relying on "1 us per IO port access" safe, but I wouldn't consider "1 us per IO port access" unlikely either.

Second, did you notice that the HPET code has a spinlock too?


Cheers,

Brendan

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Sun Dec 25, 2011 10:45 am
by rdos
Brendan wrote:Then again there's no guarantee you won't - it'd only take a few threads that ask for time delays that happen to be too close together.
But that is still not a problem. It works just fine. There is no need to define a large minimum delay of say 119 tics or anything like that. There is no need for the system to perform well if users ask for 1 tic delays, or if thousands of threads ask for 1ms delays at the same time. If this happens temporarily it might affect performance temporarily, but it won't have any adverse effect beyond that.
Brendan wrote:First, the "LPC device" is indirectly connected to a PCI bus via a low pin count connection, which is as slow as an ISA bus; and IO port accesses can cost around 1 us each. Ironically, I've seen people (attempt to) rely on this for very short delays (e.g. for a 123 us delay, do 123 IO port writes) because they thought it was "safe" after testing it on several computers. I wouldn't consider relying on "1 us per IO port access" safe, but I wouldn't consider "1 us per IO port access" unlikely either.
OK, but the same applies to the PIC. If there is no APIC, or no HPET, there is no other way for the OS to cope other than to use what is available, even if it is slow.
Brendan wrote:Second, did you notice that the HPET code has a spinlock too?
Of course. That is an absolute requirement if reliable elapsed time is implemented, and also if real time uses biased elapsed time. It is a major design decision in RDOS that there should be timers and elapsed time with sub-microsecond resolution. If that costs a little, then so be it. Using paging and segmentation also costs, but for similar reasons it is regarded as absolutely necessary, regardless of the cost.

Besides, the GetSystemTime syscall might be called maybe a few thousand times per second per core. If the locked region is worst case (3 us), the chance of lock contention is perhaps 1% on a dual-core CPU. In the HPET version it is much smaller than that. For an HPET version with 16 cores the chance of lock contention is probably still less than 1%. I don't find that problematic at all.

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Sun Dec 25, 2011 8:11 pm
by Brendan
Hi,
rdos wrote:
Brendan wrote:Then again there's no guarantee you won't - it'd only take a few threads that ask for time delays that happen to be too close together.
But that is still not a problem. It works just fine. There is no need to define a large minimum delay of say 119 tics or anything like that. There is no need for the system to perform well if users ask for 1 tic delays, or if thousands of threads ask for 1ms delays at the same time. If this happens temporarily it might affect performance temporarily, but it won't have any adverse effect beyond that.
If this happens temporarily (for PIT), you miss an IRQ, never set the count for the next IRQ, and all your generic timers stop working after that. If random failure isn't an adverse effect, then I guess I'll assume it's a design goal.
rdos wrote:
Brendan wrote:First, the "LPC device" is indirectly connected to a PCI bus via a low pin count connection, which is as slow as an ISA bus; and IO port accesses can cost around 1 us each. Ironically, I've seen people (attempt to) rely on this for very short delays (e.g. for a 123 us delay, do 123 IO port writes) because they thought it was "safe" after testing it on several computers. I wouldn't consider relying on "1 us per IO port access" safe, but I wouldn't consider "1 us per IO port access" unlikely either.
OK, but the same applies to the PIC. If there is no APIC, or no HPET, there is no other way for the OS to cope other than to use what is available, even if it is slow.
Why not provide an alternative "GetSystemTime_Fast" that is less precise (e.g. seconds only)? That way software that doesn't care about sub-microsecond precision doesn't have to have the extra overhead of reading the PIT's count (when the PIT is being used).
rdos wrote:
Brendan wrote:Second, did you notice that the HPET code has a spinlock too?
Of course. That is an absolute requirement if reliable elapsed time is implemented, and also if real time uses biased elapsed time.
You should remove the spinlock and just use atomic 64-bit reads/writes (no chance of lock contention is better than a small chance of lock contention).
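The lock-free alternative can be sketched in C11 (a sketch under the assumption of a full 64-bit HPET main counter; the names here are illustrative, not RDOS's):

```c
#include <stdatomic.h>
#include <stdint.h>

/* With a full 64-bit HPET main counter there is nothing to lock: the
   only shared state a reader needs is a base value recorded at boot,
   and an aligned atomic 64-bit load never tears.  (On 32-bit x86 the
   equivalent would be cmpxchg8b; on x86-64 a plain aligned load.) */
static _Atomic uint64_t hpet_boot_base;

static void set_boot_base(uint64_t counter_at_boot)
{
    atomic_store_explicit(&hpet_boot_base, counter_at_boot,
                          memory_order_relaxed);
}

/* Pure function of the counter readout: no spinlock, no contention. */
static uint64_t elapsed_counts(uint64_t hpet_main_counter)
{
    return hpet_main_counter
         - atomic_load_explicit(&hpet_boot_base, memory_order_relaxed);
}
```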

You shouldn't hard-code the conversion factor "31F5C4EDh" for HPET, as it can be different for different chipsets.

You could multiply by "2^32 * 1/31F5C4EDh" instead of dividing.
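A sketch of that multiply-by-reciprocal idea in C (using the divisor from the HPET code above; `__uint128_t` is a GCC/Clang extension standing in for the widening multiply, which on 32-bit x86 would be a few mul/adc steps):

```c
#include <stdint.h>

/* Replace the slow div with a multiply by a precomputed reciprocal.
   With R = floor(2^64 / D) + 1, (x * R) >> 64 equals x / D exactly
   for every x < 2^64 / D, which easily covers the 32-bit differences
   the accumulation loop produces.  R is computed once at boot. */
#define FS_PER_TIC 0x31F5C4EDull    /* fs-to-tics divisor from above */

static uint64_t recip(void)
{
    return (uint64_t)((((__uint128_t)1 << 64) / FS_PER_TIC) + 1);
}

static uint64_t fs_to_tics(uint64_t fs)
{
    return (uint64_t)(((__uint128_t)fs * recip()) >> 64);
}
```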

The way you do the conversion and lose precision, then add the less precise amount of time to elapsed time causes long term drift; in the same way that "1.25 + 1.25 + 1.25 + 1.25" is not the same as "1 + 1 + 1 + 1" and 5 ms isn't the same as 4 ms.

You should be keeping track of elapsed time with higher precision anyway, so that callers can use higher precision if/when it is available (would also solve the previous problem).

If you're using elapsed time as the basis for real time, then you should have fine grained drift adjustment for elapsed time.


Cheers,

Brendan

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Mon Dec 26, 2011 3:29 am
by rdos
Brendan wrote:If this happens temporarily (for PIT), you miss an IRQ, never set the count for the next IRQ, and all your generic timers stop working after that. If random failure isn't an adverse effect, then I guess I'll assume it's a design goal.
Unless there are bugs in PITs that make them not trigger IRQs with short delays, I still fail to see how this could happen. The PIT IRQ is edge-triggered, and would be retained until a CPU handles it and sends EOI.
Brendan wrote:Why not provide an alternative "GetSystemTime_Fast" that is less precise (e.g. seconds only)? That way software that doesn't care about sub-microsecond precision doesn't have to have the extra overhead of reading the PIT's count (when the PIT is being used).
Might be a good idea. All it would need to do is read the 64-bit counter, as this is updated regularly by the scheduler anyway. OTOH, threads that only need one-second resolution for real time will not call the function very often.
Brendan wrote:You should remove the spinlock and just use atomic 64-bit reads/writes (no chance of lock contention is better than a small chance of lock contention).
I suppose this would be possible if the chipset implements the full 64-bit HPET counter, as it won't roll around.
Brendan wrote:You shouldn't hard-code the conversion factor "31F5C4EDh" for HPET, as it can be different for different chipsets.
That is not the conversion factor for the specific counter, but how the smallest time interval in the HPET specification is converted into tics, so it won't differ between chipsets. I first multiply by the chipset-specific factor to get the count in a standardized format (femtoseconds).
Brendan wrote:You could multiply by "2^32 * 1/31F5C4EDh" instead of dividing.

The way you do the conversion and lose precision, then add the less precise amount of time to elapsed time causes long term drift; in the same way that "1.25 + 1.25 + 1.25 + 1.25" is not the same as "1 + 1 + 1 + 1" and 5 ms isn't the same as 4 ms.
I use the rest from the division and save it as a remainder, which I then add to the next readout difference. The way to get extended precision is to take this remainder and multiply it with 2^32 / 31F5C4EDh. So there should be no loss of precision.
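That remainder-carrying scheme can be restated in C (a sketch; `hpet_guard` in the code above plays the role of the remainder, and the struct names here are mine):

```c
#include <stdint.h>

/* The division remainder is kept and folded into the next difference,
   so no femtoseconds are ever dropped and the accumulated total
   cannot drift, no matter how the readouts are spaced. */
#define FS_PER_TIC 0x31F5C4EDull    /* fs-to-tics divisor from above */

typedef struct {
    uint64_t tics;      /* accumulated system time            */
    uint64_t rem_fs;    /* carry: femtoseconds short of a tic */
} fs_accum;

static void fs_accum_add(fs_accum *a, uint64_t delta_fs)
{
    uint64_t total = a->rem_fs + delta_fs;
    a->tics  += total / FS_PER_TIC;     /* whole tics            */
    a->rem_fs = total % FS_PER_TIC;     /* saved for next update */
}
```

Two half-tic updates yield exactly one tic with nothing lost, which is precisely the property a plain truncating division would break.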

However, if I change logic to atomically readout the 64-bit counter, then the conversion logic must also be changed. I then need to use some base-time (determined at boot-time) that I subtract from, and then do the conversion. The conversion logic then probably needs updating so it can handle large differences, which I suspect it presently cannot. The current logic relies on small differences that are accumulated.

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Tue Jan 03, 2012 3:28 pm
by rdos
I found a very interesting erratum from AMD (http://support.amd.com/us/Processor_TechDocs/33610.pdf). The last erratum (#400) claims that the APIC timer will not trigger interrupts when the processor is in C1e or C3 state. This erratum is for AMD Athlon, but I suspect it is also true for newer processors. That means the APIC timer generally cannot be used as a source for timers if C1e mode is enabled and the processor executes hlt instructions. The APIC timer might be OK for preemption though (you wouldn't execute hlt in a normal task).

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Tue Jan 17, 2012 6:14 am
by rdos
Brendan wrote:If this happens temporarily (for PIT), you miss an IRQ, never set the count for the next IRQ, and all your generic timers stop working after that. If random failure isn't an adverse effect, then I guess I'll assume it's a design goal.
I think I know what this "PIT bug" is now. After optimizing the IRQ handlers, I started to get really strange errors where the system clock advances only a few 100 ms per 10 s, and everything that is timed starts to move really slowly. After doing some remote debugging and inspecting various system variables while in this state, I conclude:

1. There is no thread hogging the CPU.
2. The system time advances at a very low rate, but when there is interrupt activity, it speeds up considerably.
3. The reload code is run with interrupts disabled, so that should not be an issue. Secondly, there should always be an even number of loads, so this would correct itself, which it doesn't do.

The problem seems to be that the LSB and MSB get mixed up. When I change the reload code to this, the bug disappears:

Code: Select all

    push ax
    mov ax,30h                     ; channel 0, lobyte/hibyte, mode 0
    out TIMER_CONTROL,al           ; re-arm access mode before loading
    pop ax
    jmp short $+2                  ; IO delay
;
    out TIMER0,al                  ; write lsb
    xchg al,ah
    jmp short $+2                  ; IO delay
    out TIMER0,al                  ; write msb
By setting the current mode of the counter before loading the expiry time, it seems to work every time.

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Tue Jan 17, 2012 4:52 pm
by Brendan
Hi,
rdos wrote:
Brendan wrote:If this happens temporarily (for PIT), you miss an IRQ, never set the count for the next IRQ, and all your generic timers stop working after that. If random failure isn't an adverse effect, then I guess I'll assume it's a design goal.
I think I know what this "PIT bug" is now. After optimizing the IRQ handlers, I started to get really strange errors where the system clock advances only a few 100 ms per 10 s, and everything that is timed starts to move really slowly. After doing some remote debugging and inspecting various system variables while in this state, I conclude:

1. There is no thread hogging the CPU.
2. The system time advances at a very low rate, but when there is interrupt activity, it speeds up considerably.
3. The reload code is run with interrupts disabled, so that should not be an issue. Secondly, there should always be an even number of loads, so this would correct itself, which it doesn't do.
That might be the problem I was thinking of. If you miss the timer IRQ and don't set a new count then you wouldn't get any timer IRQs again; unless your scheduler happens to set a new count during a task switch that was caused by something else and restart the timer. It might be a completely different problem too though. ;)
rdos wrote:The problem seems to be that the LSB and MSB get mixed up.
That shouldn't be possible (as long as interrupts are correctly disabled when you set the timer count), and you shouldn't need to touch "TIMER_CONTROL" to set the new count.

Also don't forget that those out instructions are expensive (around 1 us per "out"). To reduce overhead you could put the timer into "high-byte only" mode so that you can set a new count with only one out instruction. This would mean you end up with something like "high byte * 214 us" granularity for the IRQ.


Cheers,

Brendan

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Wed Jan 18, 2012 3:04 am
by rdos
Brendan wrote: That shouldn't be possible (as long as interrupts are correctly disabled when you set the timer count), and you shouldn't need to touch "TIMER_CONTROL" to set the new count.
I didn't think so either, yet it happened 3-4 times yesterday with the new, faster IRQ stubs. In fact, it happened every time within 30 minutes. When I added the TIMER_CONTROL lines yesterday, this no longer happens, and the system still runs today.
Brendan wrote: Also don't forget that those out instructions are expensive (around 1 us per "out"). To reduce overhead you could put the timer into "high-byte only" mode so that you can set a new count with only one out instruction. This would mean you end up with something like "high byte * 214 us" granularity for the IRQ.
More likely I would set it in "low-byte only" mode in that case, and program FFh if the count was too large.

Re: Using APIC timer as a "system board" timer when HPET fai

Posted: Tue Apr 10, 2012 5:11 am
by rdos
Having reread this discussion twice now, I think Brendan might have a point about low-resolution timeouts, like typical packet timeouts in network drivers and the like. Since the clock is always updated at least every 1 ms (because of the preemption timeout), it would be possible to support timeouts that require no better than 1 ms resolution by simply reading the value of system time. Timeouts could be checked in the task-loading routine (which is guaranteed to be called at least every 1 ms), and thus low-resolution timers would never need to load the timer hardware, which would improve performance of time-critical timers, as well as of interrupts in general.
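That scheme can be sketched in C (a sketch only; the fixed-size array, the names, and the one-shot semantics are illustrative assumptions, not RDOS's actual structures):

```c
#include <stdint.h>
#include <stddef.h>

/* Coarse (>= 1 ms) timeouts never program the timer hardware; the
   task-loading routine, which runs at least every 1 ms anyway, sweeps
   the list and fires whatever has expired. */
#define MAX_COARSE_TIMERS 8

typedef struct {
    uint64_t expiry;        /* in system-time tics; 0 = slot unused */
    void (*fire)(void);     /* timeout callback                     */
} coarse_timer;

static coarse_timer coarse_timers[MAX_COARSE_TIMERS];

/* Called from the scheduler's task-loading path with current time. */
static void sweep_coarse_timers(uint64_t now)
{
    for (size_t i = 0; i < MAX_COARSE_TIMERS; i++) {
        if (coarse_timers[i].expiry && coarse_timers[i].expiry <= now) {
            coarse_timers[i].expiry = 0;    /* one-shot: free the slot */
            coarse_timers[i].fire();
        }
    }
}

/* tiny demo hook */
static int fired_count;
static void count_fire(void) { fired_count++; }
```

Worst-case lateness is one scheduling interval (1 ms here), which is exactly the resolution these timeouts asked for, and the hot timer-IRQ path never sees them.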