Opinions on scheduler (C++ coroutines and NodeJS)

rdos
Member
Posts: 3247
Joined: Wed Oct 01, 2008 1:55 pm

Re: Opinions on scheduler (C++ coroutines and NodeJS)

Post by rdos »

bellezzasolo wrote:
rdos wrote:So you have the IDT per CPU core? That's an interesting idea, but you would need some common exceptions to be shared.

Given that IRQs are a limited resource, and some modern devices want a lot of them, having the IDT per core might better support such devices. However, that means the IRQ setup code must know which core the IRQ is supposed to occur on, and that core of course must be running. I usually default all IRQs to the BSP, and only start additional cores when load goes up. I then move both IRQs and server procedures to other cores when they use a lot of CPU time.

A possible strategy to keep IRQs dynamic is to allocate them as private to the BSP, and when they are moved to another core, deallocate them on the BSP and reallocate them on the new core.

The Intel i210 network chip provides an interesting receive queue scheme where 5 different IRQs can be used, and in the optimal case they should be allocated on different cores.
Yeah, the crucial thing there was to support a per-core APIC timer, which is used for preemption. That said, the IRQ setup code doesn't need to know in most cases; all that is handled by the PCI MSI(-X) layer and interrupt dispatcher. That still needs fleshing out: currently, IRQs are assigned to the CPU running the initialisation procedure, which is the BSP. The idea though is that the OS handles load balancing and such; all the driver has to do is assign a particular function and parameter to a particular PCI MSI. Generally the parameter is an opaque class pointer, and the function is a static that just casts the class and calls a member function to handle the interrupt.

How that interoperates with user applications requesting resources will be a little fun, but there we go.
Not all hardware supports MSI or MSI-X, but moving interrupts between cores is a lot easier with MSI and MSI-X. Currently, I have no devices operating with MSI-X, so I only move IRQs between cores for MSI-based devices.

However, using MSI-X on the Intel i210 is compelling, and would require the ability to move MSI-X IRQs. OTOH, since MSI-X allows each IRQ to be individually configured, a better solution might be to allocate one MSI-X IRQ and start one receive queue with one receive thread. Then, as load increases, another receive queue is enabled, its IRQ is tied to another core, and another receive thread is started. In this case the IRQ number could be the same, with the difference being the target CPU core. I suspect this would cause problems with my scheduler code, though. Or maybe this is a good strategy for a majority of MSI-X-capable devices.

I'd rather not go back to a situation where the device decides which core the IRQs are tied to. I want this to be automatic, based on load. But I could create an MSI-X mechanism where some MSI-X slots can use the same IRQ number on several different cores. This would not require per-core IDTs though, which I think I want to avoid (too large a risk of breaking things).
nullplan
Member
Posts: 1744
Joined: Wed Aug 30, 2017 8:24 am

Re: Opinions on scheduler (C++ coroutines and NodeJS)

Post by nullplan »

rdos wrote:
Octocontrabass wrote:I don't see why it wouldn't work. Actually, it should be easier than other devices since they're always edge-triggered and can't raise a new IRQ until the driver acknowledges them.
When a new one is raised (because another character arrives), the old one is lost if the port has no FIFO. So, the issue is that with a high enough baud rate and no (or a small) FIFO, characters will be lost if they are not handled fast enough. The fastest handling, of course, is buffering characters in an IRQ handler written in assembly.
The highest baud rate of the ISA serial port is 115200. With the typical settings of 8/N/1, you require 10 bit times to transmit a single character, meaning a single character takes 1/11520 seconds to transmit (which comes out at roughly 85µs). A 1GHz CPU (bear in mind those are 20 years old now) has 85000 cycles in that time. That really is more than enough to take the interrupt, handle it, unblock the serial port driver, switch to it, and read out the character. Unless the serial driver were somehow not the most important thing to run in that situation, in which case I would question the design choices in the scheduler that led to that being the case.

So I highly doubt that the kernel thread method is too slow for any device of any relevance today.
Carpe diem!
rdos
Member
Posts: 3247
Joined: Wed Oct 01, 2008 1:55 pm

Re: Opinions on scheduler (C++ coroutines and NodeJS)

Post by rdos »

nullplan wrote:
rdos wrote:
Octocontrabass wrote:I don't see why it wouldn't work. Actually, it should be easier than other devices since they're always edge-triggered and can't raise a new IRQ until the driver acknowledges them.
When a new one is raised (because another character arrives), the old one is lost if the port has no FIFO. So, the issue is that with a high enough baud rate and no (or a small) FIFO, characters will be lost if they are not handled fast enough. The fastest handling, of course, is buffering characters in an IRQ handler written in assembly.
The highest baud rate of the ISA serial port is 115200. With the typical settings of 8/N/1, you require 10 bit times to transmit a single character, meaning a single character takes 1/11520 seconds to transmit (which comes out at roughly 85µs). A 1GHz CPU (bear in mind those are 20 years old now) has 85000 cycles in that time. That really is more than enough to take the interrupt, handle it, unblock the serial port driver, switch to it, and read out the character. Unless the serial driver were somehow not the most important thing to run in that situation, in which case I would question the design choices in the scheduler that led to that being the case.

So I highly doubt that the kernel thread method is too slow for any device of any relevance today.
You forgot several factors above.

First, you need to add the longest time interrupts can be disabled (interrupt latency). In a real-time kernel this is an important parameter, but in many more ordinary kernels this time can be very long. For instance, if you implement spinlocks with cli, then a CPU can spin on a lock for a long time. The scheduler and task switcher are almost guaranteed to need interrupts disabled for at least part of a task switch. Some IRQs might run with interrupts disabled (and if you enable interrupts in the IRQ handler, then you need to protect against scheduling). How you handle exceptions can further delay the response: for instance, does the page fault handler run with interrupts disabled? A poorly designed SMP scheduler can further increase interrupt response times if it needs many spinlocks or has too many resources under contention, like shared task queues. A poor design could also have the IRQ happen on one core while the server thread runs on another, needing an IPI to activate the server.

Second, the BIOS might use SMI to emulate devices, and if this is poorly done, it can "kill" interrupt latency. This can be minimized by not relying on emulated devices (like keyboard emulation), but other than that, it's just an unknown that can create problems.

Third, I don't think serial port server threads should have the highest priority in the system. Things like playing sound must have higher priority to avoid clicks in the playback. I think network server threads are additional candidates for high priority, particularly if the chip turns off on buffer overrun (like the RTL chips do).

Fourth, my lower-end CPUs run at only a few hundred MHz, so I'd divide the time by 5 or so. The original 386 only ran at 25 MHz or so.

In fact, even if I think I have fairly good interrupt latency, handling 115000 baud is a problem even when the IRQ buffers characters and there is no FIFO.

So, if you want to prove that your kernel can handle 115200 baud with no FIFO, you first need to subtract the worst-case interrupt response time from the 85µs. If you know the interrupt latency, you probably have a real-time kernel; otherwise, you probably don't know it, and you need to resort to statistics saying you will lose a character with some probability.

Also, a cycle doesn't mean an instruction. For instance, if the IRQ happens in user mode, it can take hundreds of cycles before the processor has switched to kernel mode and is executing the IRQ code.
Octocontrabass
Member
Posts: 5449
Joined: Mon Mar 25, 2013 7:01 pm

Re: Opinions on scheduler (C++ coroutines and NodeJS)

Post by Octocontrabass »

rdos wrote:In fact, even if I think I have fairly good interrupt latency, handling 115000 baud is a problem even when the IRQ buffers characters and there is no FIFO.
That's why nowadays every serial port has at least a 16-byte FIFO. (Even the serial port in your 25MHz 386 might have one.) A 16-byte FIFO allows for about 1ms of interrupt latency even with a baud rate of 115200, which might be part of the reason why that number comes up so often in serial devices...
rdos
Member
Posts: 3247
Joined: Wed Oct 01, 2008 1:55 pm

Re: Opinions on scheduler (C++ coroutines and NodeJS)

Post by rdos »

Octocontrabass wrote:
rdos wrote:In fact, even if I think I have fairly good interrupt latency, handling 115000 baud is a problem even when the IRQ buffers characters and there is no FIFO.
That's why nowadays every serial port has at least a 16-byte FIFO. (Even the serial port in your 25MHz 386 might have one.) A 16-byte FIFO allows for about 1ms of interrupt latency even with a baud rate of 115200, which might be part of the reason why that number comes up so often in serial devices...
That's mostly true, but there is no standard for how to enable it. Some hardware will have it enabled by default, while others have some "custom" way of enabling it.
bellezzasolo
Member
Posts: 110
Joined: Sun Feb 20, 2011 2:01 pm

Re: Opinions on scheduler (C++ coroutines and NodeJS)

Post by bellezzasolo »

rdos wrote:
Octocontrabass wrote:
rdos wrote:In fact, even if I think I have fairly good interrupt latency, handling 115000 baud is a problem even when the IRQ buffers characters and there is no FIFO.
That's why nowadays every serial port has at least a 16-byte FIFO. (Even the serial port in your 25MHz 386 might have one.) A 16-byte FIFO allows for about 1ms of interrupt latency even with a baud rate of 115200, which might be part of the reason why that number comes up so often in serial devices...
That's mostly true, but there is no standard for how to enable it. Some hardware will have it enabled by default, while others have some "custom" way of enabling it.
Even then, it's up to the driver. For legacy PIO hardware like that, reading the byte into a dedicated buffer in the handler itself shouldn't be a problem. It's not like you're trying to memcpy() a page.

The lower half of the handler can then deal with sending the data to the consumer.
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS
rdos
Member
Posts: 3247
Joined: Wed Oct 01, 2008 1:55 pm

Re: Opinions on scheduler (C++ coroutines and NodeJS)

Post by rdos »

bellezzasolo wrote: Even then, it's up to the driver. For legacy PIO hardware like that, reading the byte into a dedicated buffer in the handler itself shouldn't be a problem. It's not like you're trying to memcpy() a page.

The lower half of the handler can then deal with sending the data to the consumer.
You can compare it to a TCP/IP socket. The TCP/IP stack will not give you back raw network messages, and there is no reason why a serial port should give you data character by character either. Both are best implemented with a syscall that returns the currently accumulated data. How the data is collected is up to the OS kernel to decide. My implementation of serial ports does not have kernel server threads; rather, each port has a buffer (size passed with open) and a spinlock. The IRQ then fills the buffer, and a "read" syscall fetches data for the application. The TCP/IP socket does not have a kernel server thread either; rather, it is filled with packets by a network server thread.