OSDev.org

Posted: **Thu Jan 18, 2007 6:48 pm**

How are interrupts handled robustly in an OS that allows a driver to service them, if the driver is not privileged (i.e. part of the kernel and/or running at a privileged level)?

I'm thinking of exokernels - do they trade off robustness for the ability to access / service interrupts? (I'm not developing an exokernel, per se, but some of the concepts I'd like to use are similar.)

The problem I'm having is that I'd like non-privileged services (essentially, user-level apps) to register for / request interrupts from the kernel. As part of the request, the service would provide the ISR for the interrupt; assuming the request is granted, the requested interrupt could be enabled / disabled by the service, and all interrupts would be handled by the provided ISR.

Assuming many interrupts are for HW devices needing servicing (usually in a timely fashion), and to prevent race conditions, corruption, etc., interrupts are usually disabled for at least a portion of the ISR. Sometimes all other interrupts can be re-enabled immediately and only the interrupt being serviced needs to remain disabled. (There are many variations on this theme.)

So, it's easy enough to switch stacks when the interrupt is received.
It's possible to adjust the priv level when executing the ISR (i.e., the true ISR is a privileged wrapper and calls the user-level "ISR", but the wrapper adjusts and restores the priv level around the call).

What's difficult is:
a) guaranteeing that the provided ISR reenables all other HW interrupts in a timely fashion? Otherwise, the system would come to a screeching halt. I'm thinking a "non-maskable watchdog" timer interrupt / monitor could be used - the timer would be set when the interrupt was dispatched to the user-level ISR, and then, when this timer interrupt occurs, it would reenable interrupts (STI), even if the ISR hasn't yet done so. There would need to be some notification back to the service that this occurred, so it knows of a possible error/corruption situation. And even keeping the serviced interrupt masked indefinitely may not be acceptable, as some interrupts can be shared(?).
b) doing the above quickly, without requiring a full task switch (using TSS on IA32) - otherwise, all interrupts (not just the timer) essentially become a preemptive trigger to reschedule the executing context (which it really is, but we want to keep this lightweight). Some HW interrupts occur fast enough that we don't want a full task switch.
c) Each service will also be allowed to access I/O ports and/or physical memory addresses that it registered for via the OS. So when the interrupt occurs, the privileged level ISR will have to set the appropriate permissions to allow those ports / memory regions to be accessed. I think I know how I could do this, but it's cumbersome.
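A minimal user-space simulation of the watchdog idea in (a), assuming a per-dispatch deadline measured in timer ticks. The names (`isr_dispatch`, `dispatch_begin`, `isr_reenable`, `watchdog_fire`) are hypothetical, and `cpu_if` only stands in for the CPU's interrupt flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-dispatch record for one user-level ISR invocation. */
struct isr_dispatch {
    bool     in_isr;    /* user-level ISR is running with interrupts off */
    uint64_t deadline;  /* tick by which it must have re-enabled them */
    bool     overrun;   /* set when the watchdog had to step in */
};

/* Simulated CPU interrupt flag; stands in for CLI/STI. */
static bool cpu_if = true;

/* Privileged wrapper: called just before it calls the user-level ISR. */
static void dispatch_begin(struct isr_dispatch *d, uint64_t now, uint64_t budget)
{
    cpu_if = false;
    d->in_isr = true;
    d->deadline = now + budget;
    d->overrun = false;
}

/* The ISR re-enables interrupts itself (e.g. via a kernel call). */
static void isr_reenable(struct isr_dispatch *d)
{
    d->in_isr = false;
    cpu_if = true;
}

/* Watchdog handler. On real hardware this must be driven by an NMI source,
   since a maskable timer IRQ cannot fire while interrupts are disabled. */
static void watchdog_fire(struct isr_dispatch *d, uint64_t now)
{
    if (d->in_isr && now >= d->deadline) {
        d->in_isr = false;
        d->overrun = true;   /* report to the service later: possible corruption */
        cpu_if = true;       /* forced re-enable (e.g. set IF in the saved EFLAGS) */
    }
}
```

The `overrun` flag is the piece that would be reported back to the service so it knows its state may be inconsistent.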

All this to ask...
Am I thinking about this correctly?
What are alternatives that I should consider?
What do common OS implementations do?

Thanks in advance for any feedback on this!

Posted: **Fri Jan 19, 2007 2:25 am**

I guess I'll implement this in a similar fashion to the L4 microkernel: The kernel receives all interrupts (it provides the ISRs) and immediately forwards them as messages to a dedicated interrupt service thread (a userspace thread that listens for those interrupt messages).

The kernel does everything concerning the interrupt flag and interrupt controller; the service thread is part of the device driver and does everything concerning the device itself (i.e., acknowledging the interrupt to the device). The messaging part must be very lightweight and low-latency, of course, and the service thread must have an appropriately high priority, so it can't be starved by other applications.

youngmj wrote:
b) doing the above quickly, without requiring a full task switch (using TSS on IA32) - otherwise, all interrupts (not just the timer) essentially become a preemptive trigger to reschedule the executing context (which it really is, but we want to keep this lightweight). Some HW interrupts occur fast enough that we don't want a full task switch.

I guess this depends on how heavyweight a task switch is in your OS.

youngmj wrote:
c) Each service will also be allowed to access I/O ports and/or physical memory addresses that it registered for via the OS. So when the interrupt occurs, the privileged level ISR will have to set the appropriate permissions to allow those ports / memory regions to be accessed. I think I know how I could do this, but it's cumbersome.

If the userspace service is a process of its own, allowing access to physical memory regions shouldn't be much of a problem (just map them into its address space), and allowing access to I/O ports can be done using the TSS's I/O permission bitmap (this adds a bit more overhead, since TSSs or bitmaps have to be switched around during task switches...)

But these things don't need to be done every time an interrupt occurs - establishing the mappings and I/O bitmap once (when the process is granted access to the resources, during startup for example) should be enough.
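As a sketch of setting up I/O permissions once at grant time, this is roughly how an x86 TSS I/O permission bitmap could be filled in. The helper names are hypothetical; the layout (one bit per port, a clear bit allows access, a set bit makes IN/OUT raise #GP when CPL > IOPL, plus a trailing all-ones byte) follows the Intel SDM:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IO_PORTS        65536u
#define IO_BITMAP_BYTES (IO_PORTS / 8u)

/* One bit per port: SET = denied, CLEAR = allowed. The SDM requires an
   extra byte with all bits set after the last byte of the map. */
struct io_bitmap {
    uint8_t bits[IO_BITMAP_BYTES + 1];
};

static void io_bitmap_init(struct io_bitmap *m)
{
    memset(m->bits, 0xFF, sizeof m->bits);   /* deny everything */
}

/* Grant [first, first + count) once, when the driver is given the ports. */
static void io_bitmap_grant(struct io_bitmap *m, uint16_t first, uint32_t count)
{
    assert((uint32_t)first + count <= IO_PORTS);
    for (uint32_t p = first; p < (uint32_t)first + count; p++)
        m->bits[p / 8] &= (uint8_t)~(1u << (p % 8));
}

static bool io_bitmap_allowed(const struct io_bitmap *m, uint16_t port)
{
    return (m->bits[port / 8] & (1u << (port % 8))) == 0;
}
```

After the grant, nothing changes per interrupt; only the TSS (or its I/O map base) has to follow the task on a task switch.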

It could be that I got you wrong, however. Just correct me in that case!

cheers
joe

Posted: **Fri Jan 19, 2007 2:40 am**

I've just read the L4 docs and wanted to add something to the above:

When the L4 kernel receives a hardware interrupt, it masks the interrupt at the interrupt controller and immediately acknowledges it. This way, the system can immediately receive other interrupts, but the same device is prevented from flooding the system with interrupts (since the interrupt is not yet acknowledged to the device). Then it sends a message to the device driver and waits for a response. As soon as the device driver has serviced the interrupt, it sends a response asking to re-enable the IRQ, and the kernel then unmasks it in the interrupt controller.
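A user-space simulation of that flow, assuming an 8259 PIC. This is not L4's code or API: `kernel_irq_entry`, `driver_irq_done` and the `pending` flag are hypothetical, and the PIC's mask register and EOI are only simulated:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated 8259 state: a set bit in pic_imr masks that IRQ line. */
static uint16_t pic_imr;
static int      eoi_count;

/* Hypothetical per-IRQ binding to a driver thread. */
struct irq_binding {
    bool pending;   /* message posted, driver has not replied yet */
};

static struct irq_binding irqs[16];

static void pic_mask(int irq)   { pic_imr |= (uint16_t)(1u << irq); }
static void pic_unmask(int irq) { pic_imr &= (uint16_t)~(1u << irq); }
/* Real code: write 0x20 to port 0x20 (and also to 0xA0 for IRQs 8-15). */
static void pic_eoi(void)       { eoi_count++; }

/* Kernel IRQ entry: mask the line, acknowledge the controller at once,
   then hand the interrupt to the driver as a message. */
static void kernel_irq_entry(int irq)
{
    assert(irq >= 0 && irq < 16);
    pic_mask(irq);
    pic_eoi();
    irqs[irq].pending = true;   /* stands in for sending the IPC message */
}

/* Driver's reply after it has serviced (and acknowledged) its device. */
static void driver_irq_done(int irq)
{
    assert(irq >= 0 && irq < 16 && irqs[irq].pending);
    irqs[irq].pending = false;
    pic_unmask(irq);
}
```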

I think this system is pretty secure; there is just one problem: when multiple devices share the same IRQ line and one of them triggers an interrupt, all the others have to wait (since the IRQ is masked until the device driver has serviced it). So a device driver should really hurry up in servicing interrupt messages, as there could be others waiting on the line. It's also a security/stability issue. However, I can't think of a better design other than disallowing interrupt sharing.

cheers
Joe

Posted: **Fri Jan 19, 2007 3:18 am**

What I intend to do in my kernel is to allow code uploads for ISRs:
- a driver provides a kernel-level ISR
- the kernel checks whether it's 100% safe
- it then merges it into the kernel itself and has it executed without the overhead of stack/task switches.
This has the side-effect of having the data ready to be used by a program without the driver receiving CPU time.

But whether it's in common use, I don't know.

Posted: **Fri Jan 19, 2007 10:56 am**

Thanks for the responses - definitely a lot to think about!

I'm wondering, in L4, is there actually a service thread per driver, or is there a single service thread that is used in common? The problem with the former is the number of threads, and the problem with the latter is that, if the interrupt handler is misbehaved (stack overflow, corruption, etc.), the stability of unrelated drivers may be compromised (because one handler may be interrupted by another according to some priority scheme, but both still utilize the same stack).

The L4 approach of masking the particular interrupt and then re-enabling interrupts in general is along the lines of what I was thinking. And, yes, it does seem to make interrupt sharing problematic.

allowing access to physical memory regions shouldn't be that much of a problem (just map them into the address space), allowing access to IO ports can be done using the TSS

yeah, that's where I was headed...

Combuster wrote:
- the kernel checks whether it's 100% safe

Combuster - How do you intend to do this? I'm not aware of anything that could be done to guarantee this - other than have a list of "known good" drivers and checking signatures on load. This scheme sounds similar to a monolithic kernel with privileged "plug-ins" - the driver code is privileged (via promotion) and essentially becomes part of the kernel itself.

This is not a bad approach, but probably not where I'm headed. The rub for me is "who will verify and maintain the list of known good drivers?" It's an awfully restrictive model for a general OS. Each administrator must have the time and knowledge to review and validate drivers (which is also error-prone), and this often degrades to a "try it and see" approach, which lets problems arising from subtle interactions and infrequent "misbehaviors" creep into the system. I do think, though, that this is a very good way to go for an embedded project and/or a hobby OS, or if the OS developer wants to take on the "official" task of authorizing / validating drivers.

More thoughts on system stability...
Of course, the kernel / scheduler would reserve the PIT interrupt. It may be necessary to have the kernel reserve other "dangerous" interrupts (such as APIC timer) to avoid a destabilizing "flood" of interrupts (intentional or otherwise).

I'm thinking I may want to put a mechanism whereby the kernel can monitor / throttle how often an interrupt is dispatched (and, if necessary, kill "offensive" tasks). The exact frequency and "cycle time" (service time / period between dispatches) thresholds allowed could be set by some configuration (maintained by the "admin" of the system).
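One simple way to implement such a throttle is a fixed-window counter per IRQ, sketched below assuming the kernel has a monotonic tick count. The struct and its limits are hypothetical; the limits would come from the admin's configuration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-IRQ throttle: at most max_per_window dispatches in each
   fixed window of window_ticks ticks. */
struct irq_throttle {
    uint64_t window_start;
    uint64_t window_ticks;
    uint32_t count;
    uint32_t max_per_window;
};

/* Returns true if the interrupt may be dispatched to the driver now, false
   if the driver has exceeded its budget (the kernel could then keep the line
   masked, or kill the offending task). */
static bool irq_throttle_allow(struct irq_throttle *t, uint64_t now)
{
    if (now - t->window_start >= t->window_ticks) {
        t->window_start = now;   /* start a new window */
        t->count = 0;
    }
    if (t->count >= t->max_per_window)
        return false;
    t->count++;
    return true;
}
```

A sliding window or token bucket would smooth out bursts at window boundaries; the fixed window is just the simplest to reason about.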

Finally - am I correct in thinking that CPU traps must be handled on the offending thread itself, since the offending instruction is (usually) restarted? I suppose the actual handling could be "forwarded" to another servicing thread, but this must be synchronous, as the situation must be handled prior to "returning" from the trap (thus restarting the instruction).

Thanks again for the discussion!

Posted: **Fri Jan 19, 2007 2:33 pm**

youngmj wrote:
If the interrupt handler is misbehaved (stack overflow, corruption, etc.), the stability of unrelated drivers may be compromised (because one handler may be interrupted by another according to some priority scheme, but both still utilize the same stack).

You probably mean something like this?

Code:

Kernel received interrupt (mouse)
Mouse driver is informed via IPC

Nested interrupt is received (timer)
{
    Timer driver is notified
    Driver crashes
    ....
}

Mouse driver unmasks its interrupt

Unfortunately I don't recall exactly how L4 handles this situation, but from what I can tell you could solve the problem by making the driver notification an asynchronous message. If the kernel doesn't have to wait for the driver to return, there's no need to keep a stack frame, and a crashed handler wouldn't cause any problems for unrelated devices. Once the driver has finished with its ISR it just issues a second asynchronous message to unmask the IRQ line.

Remote procedure call (synchronous):

Code:

kernel --call-> driver mouse -------o
                                    |
kernel --call-> driver timer --o    |
kernel <--ret-- driver timer --o    |
                                    |
kernel <--ret-- driver mouse -------o

IPC Message (asynchronous):

Code:

kernel --msg--> driver mouse -------o
kernel --msg--> driver timer --o    |
                               |    |
kernel <--msg-- driver mouse -------o    
                               |
kernel <--msg-- driver timer --o
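One way to make the kernel-to-driver notification asynchronous is a per-driver word of pending bits that the kernel sets with an atomic OR and the driver fetches and clears with an atomic exchange. This is a hypothetical sketch using C11 atomics, not L4's mechanism:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-driver notification word: bit n set = IRQ n is pending.
   The kernel only ORs a bit in and returns, so it never waits on the driver
   and keeps no stack frame for it. */
struct notify_word {
    _Atomic uint32_t pending;
};

/* Kernel side: called from the IRQ stub (after masking the line). */
static void kernel_notify(struct notify_word *w, unsigned irq)
{
    assert(irq < 32);
    atomic_fetch_or(&w->pending, UINT32_C(1) << irq);
    /* A real kernel would also wake the driver thread if it is blocked. */
}

/* Driver side: grab and clear all pending bits in one step. */
static uint32_t driver_take_pending(struct notify_word *w)
{
    return atomic_exchange(&w->pending, 0);
}
```

The driver's later "unmask" request can be a non-blocking call in the same spirit, so neither side holds a stack frame waiting for the other.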

youngmj wrote:
Combuster - How do you intend to do this? I'm not aware of anything that could be done to guarantee this - other than have a list of "known good" drivers and checking signatures on load.

Provided that the drivers are supplied in a safe language, the system can analyse the bytecode to verify its correctness. You might want to have a look at the Singularity project.

youngmj wrote:
Am I correct in thinking that CPU traps must be handled on the offending thread itself, since the offending instruction is (usually) restarted?

Since traps aren't generated for no reason, there's no point in returning to a task before the problem has been handled.

What exactly did you have in mind ?

regards,
gaf

Posted: **Fri Jan 19, 2007 3:22 pm**

Hi,

youngmj wrote: How are interrupts handled robustly in an OS that allows a driver to service them, if the driver is not privileged (i.e. part of the kernel and/or running at a privileged level)?

IMHO this is more complicated than it seems at first....

First, you've got a choice - let the hardware handle IRQ priorities, or use your scheduler to handle the priority of (user-level) interrupt handling code. For the PIC chips the IRQ priorities are messed up (so I'd use my scheduler's "thread priorities" instead), but for I/O APICs the priorities are good/usable.

To use your scheduler's priorities, your kernel would mask the IRQ in the PIC (or I/O APIC) and send an EOI straight away. When the interrupt handling code completes your kernel unmasks the IRQ.

To use the PIC's (or I/O APIC's) IRQ priorities you only send an EOI to the PIC/APIC when the interrupt handling code completes. In this case the IRQ handling code's "scheduler priority" should correspond to the PIC/APIC's IRQ priority.

The whole idea here is to prevent a high priority interrupt handler from being delayed by a lower priority IRQ.

For the kernel's IRQ handlers (after they mask the IRQ and send the EOI, if you're not using hardware IRQ priorities) you need to send some IPC to *each* piece of code that requested it (as IRQs can be shared) while incrementing an "IRQ handlers in progress" counter. When each piece of code calls the kernel's "Interrupt done" function, the kernel decrements the "IRQ handlers in progress" counter, and if the counter reaches zero it either sends the EOI (for hardware IRQ priorities) or unmasks the IRQ (for software IRQ priorities).
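The counter scheme can be sketched as follows, with the controller simulated and the policy (deferred EOI for hardware priorities, immediate EOI plus masking for software priorities) chosen per line. All names are hypothetical and the IPC itself is omitted:

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated controller state for one IRQ line. */
static bool line_masked;
static int  eoi_sent;

enum irq_policy { HW_PRIORITY, SW_PRIORITY };

/* Hypothetical bookkeeping for one (possibly shared) IRQ line. */
struct shared_irq {
    enum irq_policy policy;
    int handlers;      /* drivers registered on this line */
    int in_progress;   /* "IRQ handlers in progress" counter */
};

/* Last handler finished: release the line according to the policy. */
static void irq_complete(struct shared_irq *s)
{
    if (s->policy == HW_PRIORITY)
        eoi_sent++;            /* deferred EOI lets lower priorities in */
    else
        line_masked = false;   /* unmask the line */
}

static void kernel_irq(struct shared_irq *s)
{
    if (s->policy == SW_PRIORITY) {
        line_masked = true;    /* mask in the PIC / I/O APIC ... */
        eoi_sent++;            /* ... and send the EOI straight away */
    }
    s->in_progress = s->handlers;   /* one IPC per registered driver */
    if (s->in_progress == 0)
        irq_complete(s);            /* nobody to wait for */
}

/* The kernel's "Interrupt done" call, made once by each notified driver. */
static void interrupt_done(struct shared_irq *s)
{
    assert(s->in_progress > 0);
    if (--s->in_progress == 0)
        irq_complete(s);
}
```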

This seems simple enough, but there are two problems.

What if an IRQ handler fails to respond? In this case device drivers that share the same IRQ can lock up (and if you're using hardware IRQ priorities, all lower priority devices can lock up too). To prevent this you'll need a timer - if the IRQ handler doesn't respond within a certain length of time then kill it.
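A minimal sketch of that timeout, assuming each notified driver gets its own deadline in timer ticks and the software-priority policy (line masked until all drivers are done). All names are hypothetical; killing the task is reduced to marking it dead, and the same `driver_died` path is where the kernel could complete the IRQ for a driver that terminates on its own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SHARERS 4

/* Hypothetical state for one shared IRQ line and its drivers. */
struct sharer {
    bool     alive;
    bool     outstanding;   /* notified, has not called "Interrupt done" */
    uint64_t deadline;
};

struct shared_line {
    struct sharer d[MAX_SHARERS];
    int  n;
    int  in_progress;
    bool masked;
};

/* "Interrupt done" for driver i; safe to call twice for the same IRQ. */
static void interrupt_done(struct shared_line *l, int i)
{
    if (!l->d[i].outstanding)
        return;
    l->d[i].outstanding = false;
    if (--l->in_progress == 0)
        l->masked = false;              /* last one: unmask the IRQ */
}

static void line_fire(struct shared_line *l, uint64_t now, uint64_t budget)
{
    l->masked = true;
    l->in_progress = 0;
    for (int i = 0; i < l->n; i++) {
        if (!l->d[i].alive)
            continue;
        l->d[i].outstanding = true;
        l->d[i].deadline = now + budget;
        l->in_progress++;
    }
    if (l->in_progress == 0)
        l->masked = false;
}

/* Driver killed or terminated: the kernel completes the IRQ on its behalf. */
static void driver_died(struct shared_line *l, int i)
{
    l->d[i].alive = false;
    interrupt_done(l, i);
}

/* Timer tick: kill any handler that missed its deadline. */
static void line_tick(struct shared_line *l, uint64_t now)
{
    for (int i = 0; i < l->n; i++)
        if (l->d[i].outstanding && now >= l->d[i].deadline)
            driver_died(l, i);
}
```

This does nothing about a level-triggered device that keeps asserting its line once it is unmasked again.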

The second problem is what happens if a (user-level) device driver crashes or terminates itself when an IRQ is pending? In this case the kernel needs to be able to detect this and call the "Interrupt done" itself, so that other device drivers, etc don't lock up. This doesn't solve the problem in all cases though - if the hardware is generating a level triggered interrupt, then it will repeatedly generate that interrupt until the device is serviced by the device driver (i.e. it can cause an "interrupt flood").

To prevent this you'd need a way to either disable the device entirely or force it to de-assert its IRQ. The problem here is that it'd need to be generic code in the kernel that knows nothing about the device itself. For PCI this may be possible through the Command register in the device's configuration space: bit 10 is an "Interrupt Disable" bit, and setting it to one stops the device from asserting its INTx# line (writing a zero to the whole register clears this bit, so it would have to be set separately). This bit was added in PCI version 2.3 (it isn't in PCI version 2.2, but is in PCI version 3.0), and I'm not sure how an older PCI card behaves when a zero is written to the Command register. PCI version 2.2 states that writing a zero to this register "logically isolates the device from the bus for all accesses except configuration accesses", but I don't know if this includes de-asserting any currently asserted interrupt signals.

IMHO it's a good idea to write a zero to a PCI device's Command register when the device driver is terminated (either voluntary termination or because it crashed) anyway, as this (should) also stop any ongoing bus-master transactions (for PCI cards that support it).
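For reference, here is a sketch of computing a Command register value for quiescing a device, plus the configuration mechanism #1 address used to reach it. The register layout (Command at offset 0x04; bits 0-2 enable I/O space, memory space and bus mastering; bit 10 is Interrupt Disable, added in PCI 2.3) is from the PCI specification; the function names are hypothetical and the actual port I/O is left out. Unlike writing a plain zero, this variant keeps the other bits and sets Interrupt Disable; that bit is reserved on pre-2.3 devices, so assuming it is harmless to set there is exactly that, an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI configuration header: Command register at offset 0x04. */
#define PCI_COMMAND            0x04
#define PCI_CMD_IO_SPACE       (1u << 0)
#define PCI_CMD_MEMORY_SPACE   (1u << 1)
#define PCI_CMD_BUS_MASTER     (1u << 2)
#define PCI_CMD_INTX_DISABLE   (1u << 10)   /* PCI 2.3 and later */

/* Configuration mechanism #1: value written to port 0xCF8 before
   accessing the configuration data window at port 0xCFC. */
static uint32_t pci_cfg_address(uint8_t bus, uint8_t dev, uint8_t func, uint8_t offset)
{
    assert(dev < 32 && func < 8);
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11)
         | ((uint32_t)func << 8) | (offset & 0xFCu);
}

/* Command value for a device whose driver died: no I/O or memory decoding,
   no bus mastering, and INTx masked. */
static uint16_t pci_quiesce_command(uint16_t old)
{
    uint16_t cmd = old;
    cmd &= (uint16_t)~(PCI_CMD_IO_SPACE | PCI_CMD_MEMORY_SPACE | PCI_CMD_BUS_MASTER);
    cmd |= PCI_CMD_INTX_DISABLE;
    return cmd;
}
```

The 16-bit value would be written with a word-sized access at port 0xCFC, so that the Status register in the upper half of the dword (whose error bits are write-one-to-clear) is left alone. A cautious kernel could read the Command register back to see whether bit 10 actually stuck.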

ISA devices typically aren't designed for interrupt sharing (or level triggered interrupts), and therefore don't cause a problem for unexpected device driver termination.

Lastly, for shared IRQs it may be a good idea for the device driver to tell the kernel whether or not its device was responsible for generating the IRQ. That way the kernel can send an "IRQ received" message to one device driver and see if it was the cause, then send it to the next device driver, and so on, until one of the device drivers says its device generated the IRQ. In this case it'd also be good to keep track of how often each device driver was responsible for generating an IRQ, so that you can optimize the order in which the "IRQ received" messages are sent to device drivers. For example, if a 100 Mbit/s Ethernet card is responsible for 10 interrupts per second and is sharing an IRQ with a sound card that is responsible for 3 interrupts per second, then you'd want to send the "IRQ received" message to the Ethernet card's driver first (and only send it to the sound card's driver if the Ethernet card wasn't responsible).
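The ordering idea could look like this: drivers are asked one at a time, most frequent owner first, and the one that claims the IRQ is moved forward as its count grows. The struct is hypothetical, and `device_pending` stands in for the driver's reply after checking its device's status register:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SHARERS 8

/* Hypothetical driver entry on a shared IRQ line. */
struct sharer {
    const char   *name;
    bool          device_pending;   /* driver's "it was my device" reply */
    unsigned long hits;             /* how often this driver owned the IRQ */
};

struct shared_line {
    struct sharer s[MAX_SHARERS];
    int n;
};

/* Ask drivers in order; stop at the first that claims the IRQ. Returns its
   (possibly new) index, or -1 if nobody claimed it. */
static int dispatch_shared(struct shared_line *l)
{
    for (int i = 0; i < l->n; i++) {
        if (!l->s[i].device_pending)
            continue;
        l->s[i].hits++;
        /* Bubble it forward while it outranks its predecessor, so the list
           stays sorted by hits without a full re-sort. */
        int j = i;
        while (j > 0 && l->s[j].hits > l->s[j - 1].hits) {
            struct sharer tmp = l->s[j];
            l->s[j] = l->s[j - 1];
            l->s[j - 1] = tmp;
            j--;
        }
        return j;
    }
    return -1;
}
```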

As an extension on this idea, for an "N-CPU" system it might be a good idea to send the "IRQ received message" to N different device drivers at the same time, so their IRQ handlers can run in parallel on different CPUs to reduce interrupt latency. Of course this might also be a bad idea because you'd be pre-empting whatever is running (i.e. doing a thread switch from the currently running thread to the IRQ handling thread). This becomes a compromise between IRQ latency and "CPU intensive workload throughput".

Cheers,

Brendan

Posted: **Fri Jan 19, 2007 5:11 pm**

youngmj wrote:
- the kernel checks whether it's 100% safe
Combuster - How do you intend to do this? I'm not aware of anything that could be done to guarantee this - other than have a list of "known good" drivers and checking signatures on load. This scheme sounds similar to a monolithic kernel with privileged "plug-ins" - the driver code is privileged (via promotion) and essentially becomes part of the kernel itself.

This is not a bad approach, but probably not where I'm headed. The rub for me is "who will verify and maintain the list of known good drivers?" It's an awfully restrictive model for a general OS. Each administrator must have the time and knowledge to review and validate drivers (which is also error-prone), and this often degrades to a "try it and see" approach, which lets problems arising from subtle interactions and infrequent "misbehaviors" creep into the system. I do think, though, that this is a very good way to go for an embedded project and/or a hobby OS, or if the OS developer wants to take on the "official" task of authorizing / validating drivers.

The ease depends on how far you want to take things. I saw the suggestion of the Singularity approach, but that'd be too complicated for a preferably small kernel like mine. Nonetheless it might be interesting; in fact, I borrowed the idea of running safe code in kernel space from there. The bottom line is: the checker must never return false positives (bad programs that are judged safe). How many false negatives you accept is up to you and the effort you want to spend on it.

What I intend to do is disassemble the uploaded code and assert the following properties:
- the opcodes used are limited to: MOV (register and immediate operands only), ADD, ADC, SUB, SBB, OR, AND, XOR, NEG, NOT, IN, OUT, STOS (obviously subject to change depending on the support you want to provide)
- for INs and OUTs, the port is constant (practical implementation: there's a MOV DX, constant and DX is unmodified between the MOV and the IN/OUT)
- for INs and OUTs, the port involved is owned by the calling driver
- there are no memory accesses except STOSx
- EDI is unmodified (except implicitly by STOS)

If these are satisfied, the code is safe:
- it won't mess with resources it shouldn't touch
- it will terminate (due to the absence of jumps)

The kernel will then count the number of bytes written using STOS, as well as the code size, and submit them to some sanity checks (code size, bytes written, available memory) to make sure the handler doesn't hold up interrupts too long or use too much memory (safety does not guarantee usefulness).
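To make the rules concrete, here is a hypothetical checker over an already-decoded instruction list. A real implementation would need an x86 decoder to produce this form; the enums, `struct insn` and `verify_handler` are illustrative only, and MOV is allowed an immediate source because `MOV DX, constant` is required for IN/OUT:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-decoded form (AL/AX/EAX map to R_EAX, DX/EDX to R_EDX). */
enum op   { OP_MOV, OP_ADD, OP_ADC, OP_SUB, OP_SBB, OP_OR, OP_AND, OP_XOR,
            OP_NEG, OP_NOT, OP_IN, OP_OUT, OP_STOSB, OP_OTHER };
enum reg  { R_NONE, R_EAX, R_EDX, R_EDI, R_OTHER };
enum kind { K_NONE, K_REG, K_IMM, K_MEM };

struct operand    { enum kind kind; enum reg reg; uint32_t imm; };
struct insn       { enum op op; struct operand dst, src; };
struct port_range { uint16_t first, count; };

static bool port_owned(const struct port_range *r, int nr, uint32_t port)
{
    for (int i = 0; i < nr; i++)
        if (port >= r[i].first && port < (uint32_t)r[i].first + r[i].count)
            return true;
    return false;
}

/* No jumps are whitelisted, so every instruction runs exactly once and
   *stos_bytes is exact. */
static bool verify_handler(const struct insn *code, int n,
                           const struct port_range *owned, int nowned,
                           unsigned *stos_bytes)
{
    bool dx_known = false;   /* DX holds a constant from "MOV DX, imm" */
    uint32_t dx_value = 0;

    *stos_bytes = 0;
    for (int i = 0; i < n; i++) {
        const struct insn *ins = &code[i];
        if (ins->op == OP_OTHER)
            return false;                      /* not whitelisted (incl. jumps) */
        if (ins->dst.kind == K_MEM || ins->src.kind == K_MEM)
            return false;                      /* only STOS may touch memory */
        switch (ins->op) {
        case OP_STOSB:
            (*stos_bytes)++;
            break;
        case OP_IN:
        case OP_OUT: {
            const struct operand *port = (ins->op == OP_IN) ? &ins->src : &ins->dst;
            const struct operand *acc  = (ins->op == OP_IN) ? &ins->dst : &ins->src;
            if (port->kind != K_REG || port->reg != R_EDX)
                return false;                  /* DX form only */
            if (acc->kind != K_REG || acc->reg != R_EAX)
                return false;                  /* data goes through the accumulator */
            if (!dx_known || !port_owned(owned, nowned, dx_value))
                return false;                  /* constant port owned by the driver */
            break;
        }
        default:
            if (ins->dst.kind != K_REG || ins->dst.reg == R_EDI)
                return false;                  /* EDI changes only via STOS */
            if (ins->dst.reg == R_EDX) {
                dx_known = (ins->op == OP_MOV && ins->src.kind == K_IMM);
                dx_value = ins->src.imm;
            }
            break;
        }
    }
    return true;
}
```

The sanity checks on code size and `stos_bytes` against the buffer size would then run on the result.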

If it passes, the kernel will create store buffers, add entry and exit code (to load EDI with the buffer address, and to return from the interrupt after completion), and register the interrupt handler for operation.

As a proof of concept, I could submit the following snippet:

Code:

MOV DX, KBC_PORT
IN AL, DX
STOSB

which would obviously be pretty safe to run in kernel space (you can check the properties above as an exercise for the reader; as long as the caller is the keyboard driver, they'll hold)

The kernel would change it into this:

Code:

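PUSHAD
CLD                          ; STOS must advance EDI upwards
MOV EDI, [buffer_address]
; original driver code
   MOV DX, KBC_PORT
   IN AL, DX
   STOSB
CMP EDI, [buffer_end]
JB .notreset                 ; wrap once EDI reaches (or passes) the end
MOV EDI, [buffer_start]
.notreset:
MOV [buffer_address], EDI
MOV AL, PIC_EOI              ; OUT cannot take an immediate data value,
OUT 0x20, AL                 ; so the EOI goes through AL
POPAD
IRET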

It would then point the IDT entry for the IRQ at this.

Note that even though you can isolate processes to not touch the things they are not supposed to, you can't practically impose that limit on all devices without resorting to something along the lines of WHQL.

Questions about interrupt handling in an OS (ISRs)