nullplan wrote:1000 ticks != 1000 instructions. Especially if there is something as costly as an interrupt transition in there. All that IDT dereferencing and frame pushing doesn't come for free.
True, but if you have a 40 MHz i386, you also don't have a deep pipeline ;D On newer CPUs (than the i386), the instruction budget will rapidly go up, while the cost of cache misses, pipeline flushes, etc. will rise as well.
A modern CPU at ~1 GHz will be more than capable of handling 40k IRQs per second, even without priorities. I think Managarm peaks at more than 40k IRQs per second during boot (most are block device IRQs, timer IRQs, cross-CPU ping IPIs, shootdown IPIs and self-IPIs to trigger rescheduling, certainly not UART). EDIT: I also have the data to back it up, rescheduling IRQs per second after boot:
Korona wrote:If we take the 40k IRQs per second for granted (which is already insane) and you assume that your code runs on an ancient i386 with 40 MHz, you still have 1000 instructions to handle one FIFO IRQ
I counted no more than 400 instructions. Interrupt entry from ring 3 takes about 100 cycles, IRET back to ring 3 takes another 100 cycles, and all other instructions take at least 2 cycles each. Wait states and DRAM refresh will also eat into your budget, but you need benchmarks to count those.
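As a rough back-of-the-envelope using the figures above (estimates, not benchmarks):

    40,000,000 cycles/s / 40,000 IRQs/s = 1,000 cycles per IRQ
    1,000 - 100 (entry from ring 3) - 100 (IRET) = 800 cycles for the handler itself
    800 cycles / 2 cycles per instruction = 400 instructions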
But on a 40 MHz 386, you won't have that many IRQs since the UARTs will have bigger FIFOs.
It's not enough that the CPU can handle 40k IRQs per second. Most CPUs, including the very old 40 MHz 386, can handle this. The issue is more that it must ALWAYS be able to handle an IRQ within a set time limit (interrupt latency), since otherwise you will miss a character on the serial port and cause resends (in the best case) or lost data (in the worst).
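To put a number on that deadline (assuming 115,200 baud and 8N1 framing, i.e. 10 bits per character):

    115,200 bits/s / 10 bits per character = 11,520 characters/s
    1 s / 11,520 ≈ 87 µs between characters

With a single-byte receive buffer, roughly 87 µs is the worst-case interrupt latency you can tolerate before an incoming character is overwritten.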
The issue of interrupt latency is a complex one which includes how the operating system is written (does it run with interrupts disabled for extended periods of time?), firmware that uses SMI excessively, other interrupts with higher priority, and similar.
Also, switching between rings has not really improved much on modern CPUs, since segmentation has been neglected and a ring switch still requires reloading several segment registers. Actually, on some modern Intel CPUs (Atom), ring switches take considerably longer than on somewhat older ones.
Korona wrote:I second nullplan's remarks about actually building a system and measuring -
So would I, if I had a machine old enough to make the test meaningful. Unfortunately, the oldest suitable machines (by which I mean 32-bit x86) go for a small fortune on eBay and are far too expensive for me.
Korona wrote:the claim that you need interrupt priorities at all to keep latency down is not supported by any data at this point.
On the contrary, it's a problem which has been known about for years, albeit one that is less recognised now. I've read many times of communication programs which had to reprogram the master 8259 to prioritise the COM port interrupts or they could not support high baud rates.
However, I do think I've found a way to set up custom IRQ priorities which, if it works, looks like it would do the job and more. See viewtopic.php?f=1&t=43035
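For reference, the reprogramming trick itself is only a single port write. A minimal sketch, assuming an outb() port-I/O helper and COM1 on IRQ4 (the 8259's OCW2 "set priority" command makes the named level the lowest priority, so the next level becomes the highest). This is just an illustration of the mechanism, not a claim that it is the right design:

    #include <stdint.h>

    /* Port-I/O helper; substitute your own outb(). */
    static inline void outb(uint16_t port, uint8_t val)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    #define PIC1_CMD 0x20   /* master 8259 command port (OCW2/OCW3) */

    /* OCW2 "set priority" command: 0xC0 | L makes IRQ L the lowest
     * priority, so IRQ (L+1) mod 8 becomes the highest. Setting IRQ3
     * lowest therefore makes IRQ4 (COM1) the top priority. */
    static void prioritize_com1(void)
    {
        outb(PIC1_CMD, 0xC0 | 3);
    }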
Korona wrote:If we take the 40k IRQs per second for granted (which is already insane) and you assume that your code runs on an ancient i386 with 40 MHz, you still have 1000 instructions to handle one FIFO IRQ
I counted no more than 400 instructions. Interrupt entry from ring 3 takes about 100 cycles, IRET back to ring 3 takes another 100 cycles, and all other instructions take at least 2 cycles each. Wait states and DRAM refresh will also eat into your budget, but you need benchmarks to count those.
But on a 40 MHz 386, you won't have that many IRQs since the UARTs will have bigger FIFOs.
Are you saying that UARTs in 386-class machines have a FIFO? AIUI the situation is something like the following (a detection sketch follows the list).
the 8250 UART - has no FIFO
16450 - also no FIFO
16550 - has a FIFO, but it's buggy and effectively unusable
16550A - and later have usable 16-byte FIFOs
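The usual way to tell these apart at runtime is to try to enable the FIFO and then look at IIR bits 6-7. A minimal probe sketch, assuming inb()/outb() helpers and the standard COM port register layout:

    #include <stdint.h>

    /* Port-I/O helpers; substitute your own inb()/outb(). */
    static inline void outb(uint16_t port, uint8_t val)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }
    static inline uint8_t inb(uint16_t port)
    {
        uint8_t val;
        __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
        return val;
    }

    #define COM1     0x3F8
    #define UART_FCR 2       /* FIFO control register (write-only)   */
    #define UART_IIR 2       /* interrupt identification (read-only) */

    /* Returns 2 for a usable FIFO (16550A or later), 1 for a FIFO that is
     * present but defective (16550), 0 for no FIFO at all (8250/16450). */
    static int uart_fifo_probe(uint16_t base)
    {
        /* Enable the FIFO, clear both FIFOs, 14-byte receive trigger. */
        outb(base + UART_FCR, 0xC7);
        uint8_t iir = inb(base + UART_IIR);

        if ((iir & 0xC0) == 0xC0)   /* bits 7:6 == 11: FIFO works */
            return 2;
        if (iir & 0x80)             /* bit 7 only: broken FIFO    */
            return 1;
        return 0;                   /* no FIFO                    */
    }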
rdos wrote:It's not enough that the CPU can handle 40k IRQs per second. Most CPUs, including the very old 40 MHz 386, can handle this. The issue is more that it must ALWAYS be able to handle an IRQ within a set time limit (interrupt latency), since otherwise you will miss a character on the serial port and cause resends (in the best case) or lost data (in the worst).
The issue of interrupt latency is a complex one which includes how the operating system is written (does it run with interrupts disabled for extended periods of time?), firmware that uses SMI excessively, other interrupts with higher priority, and similar.
Great comments! As mentioned in another reply, it's simply not feasible to buy such old machines as they cost too much, but they are still out there, alive and kicking. If one wants to support them as well as new machines, then it's really important to minimise interrupt latency, especially for certain hardware.
JamesHarris wrote:On the contrary, it's a problem which has been known about for years, albeit one that is less recognised now. I've read many times of communication programs which had to reprogram the master 8259 to prioritise the COM port interrupts or they could not support high baud rates.
I doubt that well-designed programs need to mess with IRQ priority at all (at least not if the OS takes high IRQ volumes into account -- the situation will likely be different on Windows 3.1 than on an OS that is designed from scratch). Before you need to do that, just switch from IRQ handling to polling.
IRQs only have an advantage over polling if either the IRQ rate is not easily predictable (but in your scenario, we know that we want to read from the UART every few microseconds), or if we can save power (which we can't if we're running on almost 100% CPU utilization anyway).
The same strategy (polling instead of IRQs if you know that the I/O latency is going to be low) is also exploited by modern NVMe stacks.
Note that x86 is about the only architecture that supports IRQ priorities (= nested IRQs) at all. Other archs (e.g., ARM) use a single vector for all IRQs. The myriad of embedded ARM devices that run in real-time applications does not seem to require nested IRQs.
Korona wrote:I doubt that well-designed programs need to mess with IRQ priority at all (at least not if the OS takes high IRQ volumes into account -- the situation will likely be different on Windows 3.1 than on an OS that is designed from scratch). Before you need to do that, just switch from IRQ handling to polling.
A pretty strange claim. If you use the 8259, it means you are on a single-core system, and if you poll the port, then that is all your system will do.
Also, the issue has nothing to do with how well designed a program is. Whether it works or not depends on how the OS itself is designed (and not just the serial port IRQ), and on how the firmware behaves. I wouldn't be sure that a more modern OS would handle it better. Modern OSes are bloated and not optimized for handling a huge number of IRQs.
My OS has pretty low interrupt latencies, but it will still miss IRQs on a serial port when running at 115,200 baud on an Intel Atom processor. Not all of them, but a few, like one in 1,000 or so.
Of course, you don't do "while(true) poll_uart();". You poll it every once in a while, e.g., you poll all running UARTs at an appropriate interval and run your application code for the remainder of the time.
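Something like the following structure, for instance (my own illustration; do_some_work() and handle_byte() are hypothetical placeholders, and the key constraint is that each work slice finishes before the UART buffer can fill):

    #include <stdint.h>

    /* Port-read helper; substitute your own inb(). */
    static inline uint8_t inb(uint16_t port)
    {
        uint8_t val;
        __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
        return val;
    }

    #define COM1     0x3F8
    #define UART_RBR 0       /* receive buffer register (read) */
    #define UART_LSR 5       /* line status register           */
    #define LSR_DR   0x01    /* data ready                     */

    /* Hypothetical application hooks. */
    extern void do_some_work(void);   /* bounded slice, must return quickly */
    extern void handle_byte(uint8_t b);

    void main_loop(void)
    {
        for (;;) {
            /* Drain whatever the UART has buffered right now. */
            while (inb(COM1 + UART_LSR) & LSR_DR)
                handle_byte(inb(COM1 + UART_RBR));

            /* Run application code for a slice short enough that we get
             * back to the UART before its receive buffer can overrun. */
            do_some_work();
        }
    }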
Korona wrote:Of course, you don't do "while(true) poll_uart();". You poll it every once in a while, e.g., you poll all running UARTs at an appropriate interval and run your application code for the remainder of the time.
I don't think that works. It's called cooperative multitasking, and there is no guarantee whatsoever of how often the UART poll happens. It depends on how long each of the "cooperative" tasks wants to keep the CPU. With the IRQ approach, the UART is at least polled each time the CPU accepts an interrupt, which is basically whenever interrupts are enabled.
What you describe is how a microcontroller program might be designed, but there is no good reason to write PC programs that way.
Korona wrote:Of course, you don't do "while(true) poll_uart();". You poll it every once in a while, e.g., you poll all running UARTs at an appropriate interval and run your application code for the remainder of the time.
It's not as simple as that because in the receive direction some of the older UARTs will buffer only one byte. If the host CPU does not read that byte before the next one arrives then it will be lost. Therefore the OS has to guarantee that it will get back to the UART in the required time interval every single time with no exceptions.
You could do that with polling if your OS were single threaded but then the OS would do nothing else while waiting for UART data.
At 57,600 baud you'd receive about 5,760 bytes per second. If you were using polling you'd have to check the UART at least that many times per second without exception. How would you ensure such a polling frequency in a multitasking OS unless you had an interrupt keeping time? And if you had an interrupt then you'd be back to square one!
Even if you are using an interrupt, you have reduced the number of interrupts by a factor of 8. Even the oldest i386 will be able to serve 5k IRQs per second.
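Concretely (assuming a receive FIFO trigger level of 8 bytes):

    5,760 bytes/s / 8 bytes per IRQ = 720 IRQs/s

which is far below even the un-batched rate of 5,760 IRQs per second.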
What you describe is how a microcontroller program might be designed, but there is no good reason to write PC programs that way.
You are right that this programming model is more often found on MCUs than on PCs. However, if you're handling 40k interrupts per second on a 40 MHz CPU, you are not running a true multi-tasking workload anyway: you have one task (namely the UART processing one) that you allow to starve all other tasks. Thus, it does make sense to apply techniques from embedded RTOSes here. On the other hand, if you do have a CPU that can do meaningful work between serving these 40k interrupts, this issue goes away entirely.