Contemplating the use of hardware timers: how many timers is enough? I think this can be done in hardware. If the timers had a pre-scaler, they would not all have to operate in parallel. For instance, with a pre-scale of 32, 32 parallel hardware timers could be turned into 1024 timers by using dedicated RAM to keep track of the counts and sequencing through the timers across the 32 pre-scaler counts. A larger pre-scale and more RAM would allow more timers. Would a timer per thread be enough?
Pre-scaling could affect the timer resolution, but would still allow high resolution.
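A small software model may make the multiplexing idea concrete. This is only a sketch of the scheme described above (32 physical units, pre-scale of 32, counts and callbacks held in RAM); all names and the callback interface are invented for illustration:

```c
/* Hypothetical simulation of time-multiplexed hardware timers:
 * 32 physical down-counters, pre-scaled by 32, service 1024
 * virtual timers held in RAM. On each base-clock tick, the
 * pre-scaler phase selects one virtual slot per physical unit. */
#include <stdint.h>

#define PHYS      32                  /* parallel hardware timers   */
#define PRESCALE  32                  /* phases per physical timer  */
#define NTIMERS   (PHYS * PRESCALE)   /* 1024 virtual timers        */

typedef void (*timer_cb)(int id);

static uint32_t count_ram[NTIMERS];   /* per-timer down-counts      */
static timer_cb cb_ram[NTIMERS];      /* per-timer callbacks        */
static uint32_t phase;                /* current pre-scaler phase   */

void timer_arm(int id, uint32_t ticks, timer_cb cb)
{
    count_ram[id] = ticks;            /* counted in pre-scaled ticks */
    cb_ram[id]   = cb;
}

/* One base-clock tick: each physical timer touches exactly one
 * virtual slot, so every virtual timer is serviced once per
 * PRESCALE base ticks -- hence the resolution trade-off. */
void base_tick(void)
{
    for (int unit = 0; unit < PHYS; unit++) {
        int id = unit * PRESCALE + phase;
        if (count_ram[id] && --count_ram[id] == 0 && cb_ram[id])
            cb_ram[id](id);           /* timer expired: fire */
    }
    phase = (phase + 1) % PRESCALE;
}
```

With a 100 MHz base clock this services each virtual timer every 320 ns, so microsecond timeouts are still comfortably representable.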
Hardware timer per thread?
Re: Hardware timer per thread?
I think you are too much influenced by the BIOS timer, which is implemented with the PIT. That kind of timer is not good for anything: you get one tick every 1/18.2 seconds, yet the PIT has microsecond resolution. For timers, you either need one per system or one per processor core, the latter being optimal. Then you keep a list of active timers per system or per core. The API would take a timeout (in whatever resolution you desire, like PIT ticks or nanoseconds) and a callback. The OS then starts the timer (if the request is at the head of the list) by programming the hardware timer. When the timer IRQ happens, the callback is called (in the IRQ context). This provides microsecond resolution using the PIT.
Then there is also another use case: you want to know elapsed time. This is best implemented by having a free-running counter/timer per system, like the HPET, but the PIT can be used too. You read the timer in the API and then add to an accumulated count. No IRQ is required, as the accumulated count can be updated in the scheduler to avoid timer/counter wrap. This too can be implemented with microsecond resolution (with the PIT), or nanosecond with better timers.
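The scheme described above (a sorted list of active timers, the one hardware timer per core armed with the head deadline, callbacks run from the IRQ) might be sketched like this in C; `hw_arm()` is a stub for the real hardware programming, and all names are invented:

```c
/* Minimal per-core timer list: earliest deadline at the head,
 * one hardware comparator armed with that deadline. */
#include <stdint.h>
#include <stddef.h>

typedef void (*timer_cb)(void *arg);

struct sw_timer {
    uint64_t deadline;            /* absolute expiry time        */
    timer_cb cb;
    void *arg;
    struct sw_timer *next;
};

static struct sw_timer *head;     /* sorted list, earliest first */

/* Stub: would program the hardware timer for this deadline.    */
static void hw_arm(uint64_t deadline) { (void)deadline; }

/* demo callback: counts expiries through its arg pointer       */
static void count_cb(void *arg) { (*(int *)arg)++; }

void timer_start(struct sw_timer *t)
{
    struct sw_timer **pp = &head;
    while (*pp && (*pp)->deadline <= t->deadline)
        pp = &(*pp)->next;        /* find sorted position        */
    t->next = *pp;
    *pp = t;
    if (head == t)                /* new earliest deadline       */
        hw_arm(t->deadline);
}

/* Called from the timer IRQ with the current time. */
void timer_irq(uint64_t now)
{
    while (head && head->deadline <= now) {
        struct sw_timer *t = head;
        head = t->next;
        t->cb(t->arg);            /* run callback in IRQ context */
    }
    if (head)
        hw_arm(head->deadline);   /* re-arm for next deadline    */
}
```

Insertion is O(n) here; a production kernel would likely prefer a heap or timer wheel, but the arm-the-head logic is the same.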
Re: Hardware timer per thread?
I was thinking of timers with microsecond or better resolution. For instance, with a 100 MHz base clock, the clock can be divided (pre-scaled) by 100 and still give microsecond resolution.
The appeal of having a large number of timers is that there are no lists to manage. The OS might not need to manage the timers in the same manner at all.
It might be possible to have a callback directly into the app, with the app managing the timer: Sleep(timeout, callback), TimeThis(signal, callback). The OS would still need to be present, though.
I have been trying to move some of the OS functions into hardware. It may help to get rid of the timer lists.
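Purely for illustration, here is one shape such an app-managed interface could take, assuming each software thread owned a memory-mapped down-counter. Every name, the register layout, and the Sleep signature are invented; the post above only floats the idea:

```c
/* Hypothetical per-thread hardware timer that an application
 * could arm itself, with no OS timer list involved. */
#include <stdint.h>
#include <stddef.h>

typedef void (*timer_cb)(void);

struct hw_timer {                 /* one per software thread    */
    volatile uint32_t count;      /* down-counter; 0 = idle     */
    timer_cb cb;                  /* called when count hits 0   */
};

/* Sleep(timeout, callback): arm this thread's own timer.      */
void Sleep(struct hw_timer *t, uint32_t timeout, timer_cb cb)
{
    t->cb = cb;
    t->count = timeout;
}

/* What the hardware would do on each (pre-scaled) tick.       */
void hw_tick(struct hw_timer *t)
{
    if (t->count && --t->count == 0 && t->cb)
        t->cb();                  /* expiry: call into the app  */
}
```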
Re: Hardware timer per thread?
Have you guys ever heard of the LAPIC timer? It is a timer that runs on each thread separately, and runs as fast as the TSC. It has also been the standard way of interrupting a core in a time-based way for decades now.
Carpe diem!
Re: Hardware timer per thread?
Yes, I read about the LAPIC a while ago. I think there is only one LAPIC timer per hardware thread (processing core / CPU). I was referring to a timer per software thread; I should have been more specific. This is a custom (hobby) system with no LAPIC available. The hardware is all on one FPGA chip, and the CPU / chipset varies; the current rendition is for a 68010 CPU.
Re: Hardware timer per thread?
I reasoned based on existing x86 hardware. Then you need to have a software API that doesn't depend on the hardware it runs on. If the LAPIC is available, it's the best choice.
If you are creating an FPGA solution (like I've done too), then you can have any number of timers. Just create the clocks you need for your application, either in hardware (Verilog/VHDL) or export them to processor implementations.
Still, I don't see the utility of this. In the FPGA, you just add a register and count down to create a "timer".
In my old design, everything is Verilog, and data is exported via PCIe to a standard PC. This implementation doesn't use timers; instead, it operates with IRQs delivered via PCIe. The analysis application does use timers, but those are standard OS timers.
In my new FPGA, using a Xilinx Zynq core, I have a real-time dual-core ARM processor. My idea is that it will poll hardware registers implemented in Verilog and will not have any timers either; it will be entirely event-based. There is also a quad-core ARM processor where I will run AMD's (Xilinx's) Linux implementation. It will not have any dedicated timers from the FPGA, but will rather operate like a typical memory-mapped device. The real-time processor will create memory-mapped requests that are then served under Linux.
For this application, microsecond timers are not enough, not even nanosecond. Instead, I will use a 64-bit free-running counter that is incremented at 500 MHz and also has eight "sub-ticks", since the ADC runs at 4 GHz. Samples will be tagged with this counter, and when they are analysed the timing information will refer to the sampling counter.
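As a concrete check of the arithmetic: at 500 MHz one tick is 2 ns, and eight sub-ticks from the 4 GHz ADC clock give 250 ps steps. A tiny helper (names invented) converting a tagged sample to picoseconds:

```c
/* Sketch of the sample-timestamp scheme described above: a
 * 64-bit counter at 500 MHz (2 ns/tick) plus 3 bits of
 * sub-tick from the 4 GHz ADC clock (250 ps/sub-tick). */
#include <stdint.h>

#define TICK_PS    2000u   /* 1 / 500 MHz in picoseconds */
#define SUBTICK_PS  250u   /* 1 / 4 GHz  in picoseconds  */

/* Convert a tagged sample time to picoseconds since reset. */
static inline uint64_t sample_time_ps(uint64_t count, unsigned subtick)
{
    return count * TICK_PS + (subtick & 7u) * SUBTICK_PS;
}
```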
Re: Hardware timer per thread?
Earlier versions of the LAPIC timer were quite problematic, given that their frequency depended on the CPU core frequency. On those CPUs, the LAPIC timer cannot be used even if it is present. Also, for x86 OSes, you need to consider hardware that has only the PIT.
Re: Hardware timer per thread?
That seems like it would be too many timers (or not enough software threads).
What's your plan for a thread that wants two timers? What are you going to do with all the extra timers when most threads don't use them?
Re: Hardware timer per thread?
Octo already asked the "but why?" question. Let me expand upon it: You can just collect all things that need timeouts into a priority queue sorted by deadline. Then in the timer interrupt, you dequeue all elements whose deadlines have passed (that is, you dequeue-min until the minimum element has a deadline in the future) and execute their callback functions. That way, more than one timer interrupt source is just unnecessary, although it might be nice to spread the load.
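A minimal sketch of that priority queue, as a binary min-heap keyed on deadline, together with the dequeue-min loop that would run in the timer interrupt (all names illustrative, fixed capacity for brevity):

```c
/* Deadline priority queue: binary min-heap on `when`, plus the
 * interrupt-time loop that runs every expired callback. */
#include <stdint.h>
#include <stddef.h>

#define MAXT 64

struct deadline { uint64_t when; void (*cb)(void *); void *arg; };

static struct deadline heap[MAXT];
static int nheap;

static void swap_dl(int i, int j)
{ struct deadline t = heap[i]; heap[i] = heap[j]; heap[j] = t; }

void pq_push(struct deadline d)
{
    int i = nheap++;
    heap[i] = d;
    while (i && heap[(i - 1) / 2].when > heap[i].when) {
        swap_dl(i, (i - 1) / 2);          /* sift up */
        i = (i - 1) / 2;
    }
}

static void sift_down(void)
{
    int i = 0;
    for (;;) {
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < nheap && heap[l].when < heap[m].when) m = l;
        if (r < nheap && heap[r].when < heap[m].when) m = r;
        if (m == i) break;
        swap_dl(i, m);
        i = m;
    }
}

/* Timer interrupt: dequeue-min until the earliest deadline is
 * in the future, running each expired callback. */
void timer_tick(uint64_t now)
{
    while (nheap && heap[0].when <= now) {
        struct deadline d = heap[0];
        heap[0] = heap[--nheap];          /* dequeue-min */
        sift_down();
        if (d.cb)
            d.cb(d.arg);
    }
}
```

After the loop, the hardware timer would be re-armed with `heap[0].when`, so only one interrupt source is ever needed.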
Oh, and try to maintain the system time from hardware counters like the TSC or such, that way you don't need to count timer interrupts.
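The accumulated-count idea from earlier in the thread fits here: a narrow free-running counter can be extended to 64 bits as long as it is sampled often enough (e.g. from the scheduler tick) that it cannot wrap twice between updates. A sketch, with the raw hardware read left to the caller:

```c
/* Extend a 32-bit free-running counter to a 64-bit system
 * clock. Modular subtraction absorbs at most one wrap of the
 * raw counter between successive calls. */
#include <stdint.h>

static uint64_t accum;          /* 64-bit system time in ticks */
static uint32_t last;           /* last raw counter reading    */

/* Call with each new raw reading; returns accumulated time.   */
uint64_t clock_update(uint32_t raw)
{
    accum += (uint32_t)(raw - last);   /* delta, mod 2^32      */
    last = raw;
    return accum;
}
```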
Right, in that case the LAPIC timer cannot be used to interrupt at a certain real time in the future. However, the LAPIC timer can still be used as scheduler tick, because if a task is active, the CPU isn't sleeping, and so neither is the TSC. And on multi-core systems, using the LAPIC timer for this is nicer than the PIT, because the latter leads to a bucket chain (where the PIT interrupts some core which then has to send an IPI to the correct core). It's not awful, just not as nice as it could be.
Well, if you want to be technical about it, x86 is only the CPU architecture, and nothing obligates the machine in question to follow ISA PC guidelines. See for example the PlayStation 4, an x86 computer that is so definitely not an ISA PC that they had to massively patch the Linux kernel to get it to run on there. That was also because Sony made some boneheaded decisions, but I distinctly remember the lack of a PIT being one thing they had to address.
Which is to say, you also need to consider the case where you don't have a PIT and need some other known-frequency timer source to calibrate the TSC.
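The calibration itself is just scaling: let a reference timer of known frequency advance a known number of ticks, measure the TSC delta over the same interval, and divide. A sketch with the hardware reads left as comments (all names invented; beware overflow of the product for very large inputs):

```c
/* TSC calibration against any known-frequency reference timer.
 * The measurement would look like:
 *
 *   ref0 = read_ref(); tsc0 = rdtsc();
 *   while (read_ref() - ref0 < ref_ticks) ;   // known interval
 *   tsc_delta = rdtsc() - tsc0;
 *
 * This helper does only the scaling arithmetic. */
#include <stdint.h>

uint64_t tsc_hz(uint64_t tsc_delta, uint64_t ref_ticks, uint64_t ref_hz)
{
    /* elapsed seconds = ref_ticks / ref_hz, so:
     * tsc_hz = tsc_delta / elapsed = tsc_delta * ref_hz / ref_ticks */
    return tsc_delta * ref_hz / ref_ticks;
}
```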
Carpe diem!