Mixing hard-realtime with an ordinary multitasking OS
I'm still in the planning / thinking stage about this. What I would want is a way for a realtime thread to make itself run exclusively (and with no IRQs) on a core, so that deadlines can be guaranteed. However, I would also want this thread to be able to use any syscall without blocking more than is necessary.
To make a thread run exclusively on a core is really not much of a problem. All that is required is to stop the scheduler from using the core, and to reschedule any threads queued on it to the system queues (where they eventually end up on other cores). IRQs can easily be blocked by setting the core's priority level to the highest possible; IRQs are then directed to other cores. In my design, I always set "lowest priority delivery" for all IRQs in the IO-APIC and MSI control.
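A minimal sketch of that sequence in C, assuming hypothetical kernel helpers scheduler_detach_core() and requeue_local_threads() for the two scheduler steps; the local APIC pointer is left settable so the logic can be exercised outside a kernel (on real hardware it would be the APIC MMIO base, 0xFEE00000 by default):

```c
/* Sketch: isolate the current core for a hard-realtime thread.
 * Raising the local APIC Task Priority Register (TPR, offset 0x80
 * from the APIC base) to the maximum makes "lowest priority delivery"
 * route all IRQs to other cores. Helper names are hypothetical. */
#include <stdint.h>

#define LAPIC_TPR_INDEX (0x80 / 4)   /* TPR offset as a 32-bit word index */

/* Settable for off-target testing; a kernel would map the APIC MMIO base. */
static volatile uint32_t *lapic;

static void scheduler_detach_core(void) { /* stop scheduling on this core */ }
static void requeue_local_threads(void) { /* move queued threads to system queues */ }

void enter_exclusive_core(void)
{
    scheduler_detach_core();
    requeue_local_threads();
    lapic[LAPIC_TPR_INDEX] = 0xFF;   /* highest task priority: all IRQs go elsewhere */
}
```

Leaving exclusive mode would simply restore the TPR and re-attach the core to the scheduler.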
The problem is to provide syscalls that will not block, and to do so transparently. A majority of the syscalls can block, even if it is not obvious, mostly because they are synchronized. I would absolutely not want to put "if realtime thread do this, else do that" into the entry points of every syscall! What I would want is synchronization primitives that favor realtime threads, and that use spinlocks to synchronize.
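A sketch of such a primitive, using C11 atomics; thread_is_realtime() is a hypothetical per-thread flag. The idea is that realtime threads spin (they own their core and must never block), while ordinary threads yield between attempts so they hold a realtime thread up for at most one critical section:

```c
/* Sketch of a realtime-favoring spinlock. Realtime threads busy-wait;
 * ordinary threads yield the CPU between attempts. thread_is_realtime()
 * is a hypothetical query, stubbed here so the code is self-contained. */
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

typedef struct { atomic_flag locked; } rt_spinlock;

#define RT_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static bool thread_is_realtime(void) { return false; }  /* hypothetical */

void rt_spin_lock(rt_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        if (!thread_is_realtime())
            sched_yield();   /* ordinary threads give way instead of spinning hard */
    }
}

void rt_spin_unlock(rt_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The essential property is that the lock never makes a realtime thread sleep: worst-case hold time is bounded by the longest critical section, not by scheduler latency.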
I've come up with one possible solution for this with almost zero overhead. All syscalls start as an invalid call (call 0002:0000nnnn / call 0003:0000nnnn, where n is a gate number). The first time one of these executes, it is patched to a real call to the destination. For user-to-kernel calls, a call gate is patched in, while for kernel-to-kernel calls, the real destination is patched in. So: if realtime threads used their own GDT, in which scheduler-related syscalls had their own call-gate mappings, and if scheduler-related syscalls were not patched to their real destinations in the kernel but instead used a call gate just like user-to-kernel calls already do, then a transparent non-blocking syscall interface for realtime threads could be implemented by writing special handlers only for the syscalls directly connected to the scheduler. That is only a dozen or so, instead of the roughly 500 syscalls in total.
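The patch-on-first-call scheme can be illustrated by analogy (a function-pointer table standing in for the patched call instructions and call gates; the syscall name and gate numbering are made up for illustration):

```c
/* Analogy for patch-on-first-call dispatch: every table entry starts
 * out unresolved ("invalid call"); the first call through an entry
 * patches in the real destination, so later calls go direct. Swapping
 * the table corresponds to reloading a different GDT with LGDT. */
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

static long sys_getpid_impl(long arg) { (void)arg; return 42; }  /* illustrative */

static syscall_fn syscall_table[8];

/* Maps a gate number to its real handler; hypothetical lookup. */
static syscall_fn resolve_gate(size_t n)
{
    return (n == 0) ? sys_getpid_impl : NULL;
}

static long dispatch(size_t n, long arg)
{
    if (syscall_table[n] == NULL)            /* first use: still unresolved */
        syscall_table[n] = resolve_gate(n);  /* patch in the destination */
    return syscall_table[n](arg);            /* subsequent calls go direct */
}
```

The realtime variant of the scheme would simply keep scheduler-related entries permanently routed through the indirection (the call gate) instead of ever patching them direct.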
Then, as a thread transitions into a realtime thread, an LGDT instruction would switch the whole syscall interface. When the thread exits realtime state, the ordinary per-core GDT would be reloaded, and scheduling on the core would restart.
Additionally, when a thread hits a fault, the fault handler would always exit realtime state for the thread. When it is resumed, it could once more enter realtime state. That way, realtime threads could also be debugged normally with an application debugger or with the kernel debugger. Realtime threads like these could be used to drive fast AD and DA converters, or to drive parallel hardware where microsecond precision is required: things that cannot be supported otherwise.
The only problem I can see is that I need to localize and remove any code that modifies GDT entries without going through the descriptor management interface. Then, the descriptor management interface needs to update both GDTs, in order to keep the realtime GDT identical to the ordinary GDT except for the syscall gates.
Re: Mixing hard-realtime with an ordinary multitasking OS
Study the Solaris kernel; they have done that. They use several priority levels: on top there are real-time queues, and at the bottom time-shared ones. I won't say it's easy to read, but it's definitely better than the Linux source.
Re: Mixing hard-realtime with an ordinary multitasking OS
Hi,
turdus wrote: Study Solaris kernel, they have done that. They use several priority levels, on top there are real-time queues, and at the bottom time shared ones. I won't say it's easy to read, but definitely better than Linux source.

As far as I know, Solaris is only soft real-time, not hard real-time.

Hard real-time is very difficult to do, makes the OS slower (it reduces rare "worst case" times but increases common "average case" times), and is never really needed for general-purpose OSs (for the very rare situations where hard real-time is needed, just offload whatever needs to be "hard real-time" to a dedicated ASIC). For all these reasons, no "ordinary multi-tasking OS" bothers with hard real-time.

Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Mixing hard-realtime with an ordinary multitasking OS
Brendan wrote: Hard real-time is very difficult to do, makes the OS slower (reduces rare "worst case" times but increases common "average case" times).

It depends on the scope. It is not necessary to guarantee that syscalls have deterministic service times. My design would only guarantee that a piece of independent code has guaranteed response times. This can handle polling of devices that change state quickly, and it can handle AD and DA converters running at high speed. If the code for instance needs to save samples to disk, a soft-realtime thread could do this independently of the sampling code by using a shared buffer.
Re: Mixing hard-realtime with an ordinary multitasking OS
turdus wrote: Study Solaris kernel, they have done that. They use several priority levels, on top there are real-time queues, and at the bottom time shared ones. I won't say it's easy to read, but definitely better than Linux source.

Such a design cannot handle AD / DA converters, or polling of fast signals. AD / DA converters especially need a fixed sampling period, something that cannot be achieved with soft realtime only.
Re: Mixing hard-realtime with an ordinary multitasking OS
Hi,
rdos wrote: Such a design cannot handle AD / DA converters, or polling of fast signals. Especially AD / DA converters need a fixed period, something that cannot be achieved with soft realtime only.

It's very easy to find "AD/DA converter" PCI cards for sample rates up to about 1 MHz. Just slap one into a boring old WinXP box and let the card's buffering and bus mastering stuff do all the work.

If you need hard real-time to do this in software (on 80x86), then you're doing it wrong and you will get pounded by unpredictable delays caused by SMM/SMI.

Cheers,

Brendan
Re: Mixing hard-realtime with an ordinary multitasking OS
Brendan wrote: It's very easy to find "AD/DA converter" PCI cards for sample rates up to about 1 MHz. Just slap one into a boring old WinXP box and let the card's buffering and bus mastering stuff do all the work.

Yes, of course, but you cannot construct real-time filters when bus-mastering large chunks of AD / DA data. Another example is decoding unusual signals, for instance a parallel card-reader clock & data signal that can easily reach tens of kHz if the user pulls the card quickly. Or measuring the frequency of voltage-to-frequency converters. All of these things can be implemented with dedicated hardware (PIC controllers or signal-processor cards), or even PCI cards, but for small-series production, avoiding that is more cost-effective. Also, as multicore computers become commonplace, using one or more cores for dedicated tasks becomes feasible, as it won't increase system cost like a dedicated controller would.
However, the main advantage of having the control loop on the x86, as opposed to a PIC controller or a signal-processor card, is that it is easy to debug, involves no x86-to-PIC / signal-processor communication link, and is never hampered by an inability to upgrade the control loop.
Brendan wrote: If you need hard real-time to do this in software (on 80x86), then you're doing it wrong and you will get pounded by unpredictable delays caused by SMM/SMI.

Nope. Intel's ACPI driver, which I use, disables SMM/SMI and replaces it with an ordinary IRQ that will never execute on a realtime core.
Re: Mixing hard-realtime with an ordinary multitasking OS
Hi,
rdos wrote: Yes, of course, but you cannot construct real-time filters when bus mastering large chunks of AD / DA data. Another example is to decode unusual signals, for instance a parallel card-reader clock & data signal that can easily reach tens of kHz if the user pulls the card quickly. Or to measure frequency of voltage-to-frequency converters. All of these things can be implemented with dedicated hardware (PIC controllers or signal processor cards), or even PCI-cards, but for small series production avoiding that is more cost-effective. Also, as multicore computers become commonplace, using one or more cores for dedicated tasks becomes feasible as it won't increase system cost like a dedicated controller would.

Just how did you think you're going to get the data into the CPU to begin with? I don't think Intel/AMD CPUs have an adequately isolated analogue input, and I don't think some hacked-together hardware connected to a legacy parallel port would handle the data rate, so I'm guessing you're going to need either an AD/DA card or a data-capture card anyway.
rdos wrote: Nope. Intel's ACPI driver, which I use, disables SMM/SMI and replaces it with an ordinary IRQ that will never execute on a realtime core.

Enabling ACPI does cause some things (e.g. power management events) to be delivered as SCI (where they're under the OS's control) rather than SMI; however "some" doesn't imply "all". Get hold of the datasheet for any modern Intel chipset (especially those designed for workstation/server, with QuickPath and/or ECC) and search the datasheet for "SMI". Do this for a few chipsets, then decide if you really want your company to make guarantees (without even knowing which motherboards/chipsets the end user might be using, and without having an extensive evaluation program and a whitelist/blacklist of supported motherboards).
Cheers,
Brendan
Re: Mixing hard-realtime with an ordinary multitasking OS
Brendan wrote: Just how did you think you're going to get the data into the CPU to begin with? I don't think Intel/AMD CPUs have an adequately isolated analogue input, and I don't think some hacked together hardware connected to a legacy parallel port would handle the data rate, so I'm guessing you're going to need either an AD/DA card or a data capture card anyway.

Many embedded / PC/104 cards come with GPIOs on the PCI or LPC bus. These can be connected to serial AD/DA converters, can be used to decode clock & data signals for magnetic card readers, and can also be used to measure frequency. And even if these are not present, or are too simple for the task, one can construct a much simpler PC/104 card without PIC controllers or digital signal processors.
Brendan wrote: Enabling ACPI does cause some things (e.g. power management events) to be delivered as SCI (where they're under the OS's control) rather than SMI; however "some" doesn't imply "all". Get hold of the datasheet for any modern Intel chipset (especially those designed for workstation/server, with QuickPath and/or ECC) and search the datasheet for "SMI". Do this for a few chipsets, then decide if you really want your company to make guarantees (without even knowing which motherboards/chipsets the end user might be using, and without having an extensive evaluation program and a whitelist/blacklist of supported motherboards).

That does not sound good. But OTOH, isn't SMI delivered to the bootstrap core? In that case, a rule could be to never use the BSP as the realtime core.
- Owen
Re: Mixing hard-realtime with an ordinary multitasking OS
SMIs can be delivered to any core. For example, an I/O port write may trigger an SMI, or the chipset may trigger one on its own on any core (for example, the keyboard controller in USB emulation mode would likely trigger an SMI on a USB packet, which would then trigger a virtual keyboard IRQ).
Re: Mixing hard-realtime with an ordinary multitasking OS
Owen wrote: SMIs can be delivered to any core. For example, an I/O port write may trigger an SMI, or the chipset may trigger one on its own on any core (for example, the keyboard controller in USB emulation mode would likely trigger an SMI on a USB packet, which would then trigger a virtual keyboard IRQ).

Irrelevant. Realtime threads cannot use USB or keyboards (emulated or not), and the USB / keyboard IRQ is delivered to non-realtime cores only. Besides, RDOS doesn't use emulated keyboards; it has a native USB keyboard driver. It doesn't use emulated IDE either; it has a native AHCI driver.

Besides, the BIOS starts up with only the BSP activated, so I doubt it would send SMIs to halted cores, which it would in fact have to do in your scenario before those cores are started by the OS.
- Combuster
Re: Mixing hard-realtime with an ordinary multitasking OS
rdos wrote: Irrelevant.

Learn to read. SMI != IRQ
Re: Mixing hard-realtime with an ordinary multitasking OS
Combuster wrote: Learn to read. SMI != IRQ

Both are handled by the APIC, and both have a core that they are delivered to (or use "lowest priority" delivery mode). In fact, any IRQ line can be set to be delivered as an SMI, just as IPIs can be delivered as SMIs. So, I think it is not me who needs to learn to read.
Basically, if the BIOS sets up the chipset to deliver SMIs on certain events, it would need to either:
1. Deliver to a specific core (which would be the BSP), or
2. Deliver to the lowest-priority core (which would not be the realtime core).
There are no other options that can be inferred from the Intel documents.
My guess is that Intel would use MSI for delivering these events, and then the rules for MSI messages apply (and these are either to a specific core or to the lowest-priority core).
Additionally, the keyboard emulation cannot work with SMIs on USB packets as described above, because it is the OS that owns the USB controller (including the keyboard port) and that sets up handlers and physical addresses for USB. It could potentially work like that if the OS doesn't have a USB driver, but not otherwise. An SMI handler and the OS cannot both own the USB controller.
- Combuster
Re: Mixing hard-realtime with an ordinary multitasking OS
rdos wrote: 1. Deliver to a specific core (which would be the BSP)

You don't know that. On thermal issues you might even want to interrupt each core to throttle them. Or deliberately select the last core for performance reasons.
- gravaera
Re: Mixing hard-realtime with an ordinary multitasking OS
Firmware is, sadly, written by programmers. Firmware programmers have as many problems interpreting specs as you do, and they're just as unwilling to read them as you are; and even assuming they put in their most genuine effort, they are still vulnerable to errors of misunderstanding, etc. You can't make assumptions about the sanity of firmware. Assume only as much as the OS-pertinent parts of the architectural specification allow you to.
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.