I hope that many of you have implemented a PIT driver.
Would you mind checking something for me: if you latch the PIT counter and read it twice in a row while running in QEMU, how large a difference do you get between the two readings? I get a difference of about 50-100 ticks, which corresponds to a lot of instructions.
Is this due to a bug, or to a delay in reading the PIT counter after latching in QEMU?
Reading the PIT
Re: Reading the PIT
Hi,
ShukantPal wrote: Would you mind rather checking that if you latch the PIT counter and read it, twice in a row, then how much difference you get while running in QEMU. I get a difference of about 50-100 ticks - which means a lot of instructions.
Is this due to a bug - or due to delay in reading the PIT counter after latching in QEMU?
I don't know about QEMU, but on real hardware you can expect each read or write to a "legacy" IO port to cost about 1 microsecond (due to ancient ISA bus timing), so the "latch, then read, then read" sequence will cost about 3 microseconds (or around 9000 cycles for a 3 GHz CPU).
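To make the "latch, then read, then read" sequence concrete, here is a minimal sketch in C. The port numbers (0x40, 0x43) and the 1193182 Hz input clock are the standard PC values; everything prefixed `sim_`, and the names `port_out8`, `port_in8`, `pit_read_latched`, `pit_ticks_between` and `pit_ticks_to_ns`, are hypothetical and only exist so the sketch can run outside a kernel. It assumes channel 0 is programmed for "lobyte/hibyte" access with a reload value of 65536.

```c
#include <stdint.h>

#define PIT_CH0_DATA 0x40      /* channel 0 data port */
#define PIT_COMMAND  0x43      /* mode/command register */
#define PIT_HZ       1193182u  /* PIT input clock in Hz */

/* Stand-in for hardware (an assumption for illustration): a fake
 * channel 0 that counts down by sim_ticks_per_access on every port
 * access and only supports latched reads. In a kernel, port_out8 and
 * port_in8 would instead wrap the x86 `out` and `in` instructions. */
static uint16_t sim_count;
static uint16_t sim_latch;
static int      sim_high_next;
static uint16_t sim_ticks_per_access = 1;

static void port_out8(uint16_t port, uint8_t value)
{
    sim_count = (uint16_t)(sim_count - sim_ticks_per_access);
    if (port == PIT_COMMAND && (value & 0xF0) == 0x00) {
        sim_latch = sim_count;   /* latch command for channel 0 */
        sim_high_next = 0;
    }
}

static uint8_t port_in8(uint16_t port)
{
    (void)port;
    sim_count = (uint16_t)(sim_count - sim_ticks_per_access);
    uint8_t byte = sim_high_next ? (uint8_t)(sim_latch >> 8)
                                 : (uint8_t)(sim_latch & 0xFF);
    sim_high_next = !sim_high_next;
    return byte;
}

/* Latch channel 0, then read low byte and high byte: three port
 * accesses in total. */
static uint16_t pit_read_latched(void)
{
    port_out8(PIT_COMMAND, 0x00);
    uint8_t lo = port_in8(PIT_CH0_DATA);
    uint8_t hi = port_in8(PIT_CH0_DATA);
    return (uint16_t)((hi << 8) | lo);
}

/* The PIT counts down, so elapsed ticks = first - second. The 16-bit
 * cast absorbs one wrap-around when the reload value is 65536. */
static uint16_t pit_ticks_between(uint16_t first, uint16_t second)
{
    return (uint16_t)(first - second);
}

/* Convert PIT ticks to nanoseconds (rounded down). */
static uint64_t pit_ticks_to_ns(uint32_t ticks)
{
    return (uint64_t)ticks * 1000000000u / PIT_HZ;
}
```

With these conversions, a gap of 50-100 ticks between two back-to-back reads works out to roughly 42-84 microseconds, whereas 3 microseconds of port accesses on real hardware would be only about 3.6 ticks.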
Note that you can set the access mode to "hibyte only" or "lobyte only", so that you can read the current count with a single read without issuing the latch command (making it around 3 times faster to read the count). For example, if the PIT was set to "4660.867 Hz, lobyte only" (the PIT's 1.193182 MHz input clock divided by 256) then your IRQ handler could increment a 32-bit "hibyte variable" in RAM, and you could combine that with the lobyte read from the PIT (re-checking the "hibyte variable" in RAM afterwards to handle roll-over) to get a 40-bit counter value with the same (~838 nanosecond) precision. However, for each IRQ you'll probably send an EOI to the PIC chip and that will cost about 1 microsecond too, so this "lobyte only" scheme can be even more expensive if you read the counter rarely (and could be a lot more efficient if you read the counter extremely often). As another example, if you use "hibyte only" (and maybe set the PIT frequency to 18.2 Hz) it'd be similar (you could combine an IRQ handler's variable with the PIT's hibyte to get a larger counter), but you'd have a lot less IRQ overhead (because there are far fewer IRQs) and a lot worse precision (~214.55 microseconds, i.e. 256 ticks, instead of ~838 nanoseconds).
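A rough sketch of the "lobyte only" scheme could look like the following. All names here (`pit_irq_count`, `pit_irq_handler`, `pit_read_lobyte`, `pit_ticks_40bit`, `sim_lobyte`) are hypothetical, and `sim_lobyte` is just a stand-in for a single `in` from port 0x40. It assumes the channel counts down by one per tick with a period of 256 (so the lobyte reads 0, 255, 254, ... 1 within each period) and that the IRQ handler runs promptly after each wrap; if the IRQ is delayed (for example because interrupts are disabled), the re-check alone is not enough to prevent a stale value.

```c
#include <stdint.h>

/* "hibyte variable" in RAM: incremented once per PIT period (256 ticks). */
static volatile uint32_t pit_irq_count;

/* Stand-in for hardware (assumption): in a kernel this would be one
 * `in` from port 0x40 with channel 0 in "lobyte only" access mode. */
static uint8_t sim_lobyte;
static uint8_t pit_read_lobyte(void)
{
    return sim_lobyte;
}

/* Hypothetical IRQ0 handler body; a real one would also send an EOI
 * to the PIC, which costs another port write. */
static void pit_irq_handler(void)
{
    pit_irq_count++;
}

/* Combine the RAM counter with the PIT's low byte into one tick count. */
static uint64_t pit_ticks_40bit(void)
{
    uint32_t hi, hi_again;
    uint8_t lo;

    /* Re-read the RAM counter after the port read; if it changed, an
     * IRQ arrived in between (roll-over), so try again. */
    do {
        hi = pit_irq_count;
        lo = pit_read_lobyte();
        hi_again = pit_irq_count;
    } while (hi != hi_again);

    /* The PIT counts down, so ticks elapsed in this period are
     * 256 - count, which is (0 - lo) modulo 256. */
    uint8_t elapsed = (uint8_t)(0u - lo);
    return ((uint64_t)hi << 8) | elapsed;
}
```

The common path costs one port read instead of three port accesses, at the price of an IRQ (plus EOI) every 256 ticks.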
Alternatively (if the computer supports them), the ACPI counter, HPET's main counter, the CPU's local APIC timer's counter, and the CPU's TSC (if it's fixed frequency) can all be accessed with a lot less overhead (because none of them use legacy IO ports), and all of these will give you a lot better precision than the PIT. Ideally you'd only use the PIT on ancient computers that don't support any of the better alternatives; on those computers the cost of legacy IO port accesses (in "CPU cycles") will be a lot lower (e.g. 3 microseconds is only 75 cycles for an old 25 MHz CPU).
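The cycle figures quoted here are plain unit conversions from the ~1 microsecond per port access estimate. A small helper (the name `us_to_cycles` is hypothetical) makes the comparison between an old and a modern CPU easy to reproduce:

```c
#include <stdint.h>

/* Number of CPU cycles that pass during `microseconds` at `cpu_hz`.
 * This only converts units; the ~1 microsecond per legacy port access
 * figure itself is an estimate, not a measurement. */
static uint64_t us_to_cycles(uint64_t microseconds, uint64_t cpu_hz)
{
    return microseconds * cpu_hz / 1000000u;
}
```

For the three-access latched read, `us_to_cycles(3, 25000000)` gives 75 cycles, while `us_to_cycles(3, 3000000000ULL)` gives 9000 cycles.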
Cheers,
Brendan