How to map MSIs and IOAPIC ISRs to real int numbers

rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

How to map MSIs and IOAPIC ISRs to real int numbers

Post by rdos »

This seems to be a complicated issue. First, there are quite a few PCI devices that can request up to 16 (or is it 32?) contiguous interrupts. These are best allocated in some way. Currently, I have an allocator for MSI ints that can allocate a number of contiguous MSIs out of a pool of 64 possible MSIs. For the IO-APIC, I simply reserve 24 ints.

Problem: My 6-core AMD phenom computer has two IO-APICs (according to ACPI APIC table), and according to Windows, it uses the second IO-APIC since the Realtek network-card has ISR #40. That breaks my int allocation since I only have 24 ints available at this position.

OTOH, the IO-APIC can specify the physical int number per interrupt slot, so there is really no need for 24 contiguous ints per IO-APIC. It would be possible to allocate an int when an IO-APIC entry is created. That would mean that there should be a single pool of available ints that can be allocated in groups of 1, 2, 4, 8, 16 or 32 entries per call. The first ints are reserved by Intel/AMD, and then I reserve int 66, int 67, int 9A and possibly some more, but that would still mean it would be possible to create a pool with at least 200 ints for allocation.

In order to convert global IRQ numbers (which ACPI uses) to physical int number, a table could be used.

There would also be a need to keep a vector of available IO-APICs. Maybe this could be combined with the global IRQ table since each IO-APIC has a base global IRQ-number.

Maybe something like this (in C syntax):

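Code:

struct TGlobalIntEntry
{
    short int IoApicSel;         /* selects the IO-APIC for this global IRQ */
    unsigned char PhysicalInt;   /* physical int number, 0..255 */
    char TriggerMode;            /* edge or level trigger mode */
};

#define MAX_GLOBAL_INTS   128

/* Indexed by global IRQ number (as used by ACPI). */
struct TGlobalIntEntry GlobalIntArr[MAX_GLOBAL_INTS];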
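Since each IO-APIC has a base global IRQ number, finding the IO-APIC and input pin for a global IRQ is a range check. The following is only a sketch of that lookup; `io_apic_desc`, its fields and `gsi_to_io_apic` are hypothetical names, and the descriptors are assumed to have been filled in from the ACPI APIC table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one IO-APIC, filled in from the ACPI APIC table. */
struct io_apic_desc {
    uint32_t gsi_base;   /* first global IRQ handled by this IO-APIC */
    uint32_t pin_count;  /* number of inputs (redirection entries) */
};

/* Returns the index of the IO-APIC that owns `gsi` and stores the input pin
 * in *pin, or returns -1 if no IO-APIC covers that global IRQ. */
static int gsi_to_io_apic(const struct io_apic_desc *apics, size_t count,
                          uint32_t gsi, uint32_t *pin)
{
    for (size_t i = 0; i < count; i++) {
        if (gsi >= apics[i].gsi_base &&
            gsi - apics[i].gsi_base < apics[i].pin_count) {
            *pin = gsi - apics[i].gsi_base;
            return (int)i;
        }
    }
    return -1;
}
```

With two IO-APICs at global IRQ bases 0 and 24, global IRQ 40 would resolve to the second IO-APIC, pin 16.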
I have also discovered (on the same 6-core AMD), that using rdmsr with entry 0x1b to get the local APIC base is not reliable. The only reliable method seems to be to use the ACPI APIC table.
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by gerryg400 »

Rdos, you do speak C. I knew you'd come around !! :wink:
If a trainstation is where trains stop, what is a workstation ?
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by Brendan »

Hi,
rdos wrote:This seems to be a complicated issue. First, there are quite a few PCI devices that can request up to 16 (or is it 32?) contiguous interrupts.
For MSI, you tell the device what value to write and which address to write it. The device itself doesn't know or care what the value is or what the address is (partly because PCI and MSI are platform independent and not intended for 80x86 alone). Therefore, it's impossible for any PCI device to require contiguous interrupt numbers and impossible for the device to know if its interrupts are contiguous or not.
rdos wrote:OTOH, the IO-APIC can specify the physical int number per interrupt slot, so there is really no need for 24 contiguous ints per IO-APIC.
The interrupt priority scheme used by the APICs implies that interrupts should be allocated dynamically. There are 256 interrupt vectors, and the first 32 are reserved/used by exceptions. That leaves a pool of 224 interrupts, arranged as 14 priority groups with 16 interrupts per priority.
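On x86 local APICs the priority class of a vector is its upper four bits, so vectors 0x20 to 0xFF fall into classes 2 through 15. A small sketch of that arithmetic follows; the helper and macro names are made up for illustration.

```c
#include <stdint.h>

#define FIRST_USABLE_VECTOR 0x20u   /* vectors 0x00..0x1F are CPU exceptions */
#define VECTORS_PER_CLASS   16u

/* Priority class of a vector as seen by the local APIC: its upper 4 bits. */
static unsigned vector_priority_class(uint8_t vector)
{
    return vector >> 4;
}

/* First vector belonging to a priority class (0..15). */
static uint8_t class_first_vector(unsigned cls)
{
    return (uint8_t)(cls * VECTORS_PER_CLASS);
}

/* Number of priority classes left for devices after the exception vectors. */
static unsigned usable_class_count(void)
{
    return (256u - FIRST_USABLE_VECTOR) / VECTORS_PER_CLASS;
}
```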

First, you want to allocate a few very low priority interrupts for spurious IRQs - one for the APIC's spurious IRQ and 2 for the PIC chip's spurious IRQs (note: you can't mask/prevent the PIC's spurious IRQs even when all PIC IRQs are masked). You'll probably also want a few interrupts for the local APIC (local APIC timer, performance monitoring, etc); plus maybe one for the kernel API (e.g. "int 0x80"), plus some for IPIs. All of these would use fixed/"hard-coded" interrupt numbers scattered all over the place (various interrupt priorities). After this, any interrupt number that is left over is free for dynamic allocation.

For the remainder; the interrupt a device uses should depend on how much the device is affected by IRQ latency. A device that needs low latency (e.g. ethernet) should ask for a high priority interrupt, and a device that doesn't need low latency (e.g. floppy) should ask for a low priority interrupt; and the kernel/OS should find the interrupt number that is closest to the desired priority. A device that has multiple IRQs asks for interrupts one at a time (where the priority requested for each interrupt may be different).
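One possible shape for such a "closest priority" allocator is sketched below. It assumes a simple per-vector flag array and searches outward from the requested priority class; `vector_used`, `alloc_in_class` and `alloc_vector_near` are hypothetical names, and preferring the higher class on a tie is an arbitrary choice.

```c
#include <stdbool.h>

#define VECTOR_COUNT 256
#define FIRST_CLASS  2    /* classes 0 and 1 (vectors 0..31) are exceptions */
#define LAST_CLASS   15

/* Hypothetical bookkeeping: true means the vector is taken. Fixed vectors
 * (spurious, timer, IPIs, kernel API) would be marked used at boot. */
static bool vector_used[VECTOR_COUNT];

/* Takes the first free vector in one priority class, or returns -1. */
static int alloc_in_class(int cls)
{
    for (int v = cls * 16; v < cls * 16 + 16; v++) {
        if (!vector_used[v]) {
            vector_used[v] = true;
            return v;
        }
    }
    return -1;
}

/* Allocates one vector as close as possible to the requested class, trying
 * distance 0, then one class above and below, and so on. Returns -1 if
 * every device vector is taken. */
static int alloc_vector_near(int wanted_class)
{
    if (wanted_class < FIRST_CLASS) wanted_class = FIRST_CLASS;
    if (wanted_class > LAST_CLASS)  wanted_class = LAST_CLASS;

    for (int dist = 0; dist <= LAST_CLASS - FIRST_CLASS; dist++) {
        int hi = wanted_class + dist;
        int lo = wanted_class - dist;
        int v;

        if (hi <= LAST_CLASS && (v = alloc_in_class(hi)) >= 0)
            return v;
        if (dist > 0 && lo >= FIRST_CLASS && (v = alloc_in_class(lo)) >= 0)
            return v;
    }
    return -1;
}
```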


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by Brendan »

Hi,

My apologies. I looked into it, and it seems PCI *does* have a "multiple message" capability, where a device can request up to 32 contiguous interrupt vectors (and the OS can give it anywhere from 1 interrupt vector to the number of vectors the device requested). :oops:

I guess this means you need a function to dynamically allocate a group of interrupt vectors at a certain priority; but the function can find a group at a different priority (or "closest available priority"), and can refuse and only allocate a smaller group of interrupt vectors (e.g. "You asked for 16 contiguous interrupt vectors, but I could only find 4 contiguous interrupt vectors, so that's all you're getting").
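A rough sketch of such a function is shown below. It leaves priority out to stay short, assumes a simple per-vector flag array, and halves the request until an aligned free group is found. All names are hypothetical.

```c
#include <stdbool.h>

#define VECTOR_COUNT 256
#define FIRST_VECTOR 32   /* 32-aligned, so stepping by a power of two keeps alignment */

/* Hypothetical bookkeeping: true means the vector is taken. */
static bool group_vector_used[VECTOR_COUNT];

/* True if `count` is 1, 2, 4, 8, 16 or 32. */
static bool group_size_valid(int count)
{
    return count >= 1 && count <= 32 && (count & (count - 1)) == 0;
}

/* True if `count` vectors starting at `base` are all free. */
static bool range_free(int base, int count)
{
    for (int v = base; v < base + count; v++)
        if (group_vector_used[v])
            return false;
    return true;
}

/* Allocates exactly `count` contiguous vectors aligned on a `count`
 * boundary. Returns the base vector, or -1 if no such group is free. */
static int alloc_group_exact(int count)
{
    if (!group_size_valid(count))
        return -1;
    for (int base = FIRST_VECTOR; base + count <= VECTOR_COUNT; base += count) {
        if (range_free(base, count)) {
            for (int v = base; v < base + count; v++)
                group_vector_used[v] = true;
            return base;
        }
    }
    return -1;
}

/* Asks for `wanted` vectors but halves the request until something fits.
 * Stores the granted size in *granted; returns the base vector or -1. */
static int alloc_group_best_effort(int wanted, int *granted)
{
    *granted = 0;
    if (!group_size_valid(wanted))
        return -1;
    for (int count = wanted; count >= 1; count /= 2) {
        int base = alloc_group_exact(count);
        if (base >= 0) {
            *granted = count;
            return base;
        }
    }
    return -1;
}
```

A driver that cannot cope with a smaller group could call `alloc_group_exact` directly instead.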


Cheers,

Brendan
rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by rdos »

Brendan wrote:For MSI, you tell the device what value to write and which address to write it. The device itself doesn't know or care what the value is or what the address is (partly because PCI and MSI are platform independent and not intended for 80x86 alone). Therefore, it's impossible for any PCI device to require contiguous interrupt numbers and impossible for the device to know if its interrupts are contiguous or not.
Well, that is wrong. Part of the MSI specification is the number of requested (contiguous) ints. A device (for example AHCI) might request one int per port, and then the OS would allocate a number of contiguous ints with the correct alignment, and write this to PCI. If, for instance, the OS allocates 8 ints, it must align them on 8-int boundaries. The device will then use the three lowest bits of the int number to signal which int it wants to raise. If 16 contiguous ints are allocated, the device will use the 4 lowest bits to signal the exact int, and so on. This is portable and not only useful on x86.
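The alignment rule can be written down as a few bit operations. This is only an illustrative sketch with made-up helper names. The one detail taken from the PCI MSI capability is that the number of enabled messages is encoded as its base-2 logarithm in the Multiple Message Enable field.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `count` is a legal multiple-message size (1, 2, 4, 8, 16 or 32). */
static bool msi_count_valid(unsigned count)
{
    return count >= 1 && count <= 32 && (count & (count - 1)) == 0;
}

/* True if `base` can serve as the base int for `count` messages: the low
 * log2(count) bits must be zero because the device replaces them. */
static bool msi_base_aligned(uint8_t base, unsigned count)
{
    return msi_count_valid(count) && (base & (count - 1)) == 0;
}

/* Int number the CPU sees when the device raises message number `index`. */
static uint8_t msi_vector_for(uint8_t base, unsigned count, unsigned index)
{
    return (uint8_t)(base | (index & (count - 1)));
}

/* log2(count), the encoding used by the Multiple Message Enable field. */
static unsigned msi_mme_encoding(unsigned count)
{
    unsigned shift = 0;
    while ((1u << shift) < count)
        shift++;
    return shift;
}
```

For example, with 8 ints allocated at base 0x48, message 5 arrives as int 0x4D, while base 0x44 would be rejected because it is not on an 8-int boundary.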

And this is extremely useful for AHCI, as having one int per port, rather than device, reduces decoding in the int-handler, and also allows multiple ports to have pending ints on different cores.

So I would say that the ability to allocate 2, 4, 8, 16 and perhaps 32 contiguous ints with the correct alignment for MSI is an essential feature.
Brendan wrote:The interrupt priority scheme used by the APICs implies that interrupts should be allocated dynamically. There are 256 interrupt vectors, and the first 32 are reserved/used by exceptions. That leaves a pool of 224 interrupts, arranged as 14 priority groups with 16 interrupts per priority.

First, you want to allocate a few very low priority interrupts for spurious IRQs - one for the APIC's spurious IRQ and 2 for the PIC chip's spurious IRQs (note: you can't mask/prevent the PIC's spurious IRQs even when all PIC IRQs are masked). You'll probably also want a few interrupts for the local APIC (local APIC timer, performance monitoring, etc); plus maybe one for the kernel API (e.g. "int 0x80"), plus some for IPIs. All of these would use fixed/"hard-coded" interrupt numbers scattered all over the place (various interrupt priorities). After this, any interrupt number that is left over is free for dynamic allocation.
You have a point here. Using dynamic allocation would also solve the issue with priorities. IOW, requested priority should be a parameter when allocating interrupt numbers.
rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by rdos »

Brendan wrote:I guess this means you need a function to dynamically allocate a group of interrupt vectors at a certain priority; but the function can find a group at a different priority (or "closest available priority"), and can refuse and only allocate a smaller group of interrupt vectors (e.g. "You asked for 16 contiguous interrupt vectors, but I could only find 4 contiguous interrupt vectors, so that's all you're getting").
Again, using AHCI as an example, it is not useful to give the device 4 vectors if it wants 16. Either it needs one per port (often 16), or the OS should revert to one (one int per device). The AHCI spec does deal with fewer ints, but I don't find this worth the trouble, as some chips might be broken in this respect. What this means is that if the OS allocates 16 ints on behalf of a device, it would either get 16, or the function should fail. If it fails for the AHCI device, the device would then try to allocate a single int instead, and if this also fails, fail installation (or use timers).
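The all-or-one policy could look roughly like the sketch below. The allocator here is a deliberately trivial stand-in (`alloc_exact` and `pool_largest_free_group` are hypothetical) so that the fallback logic can be read on its own. A real allocator would track individual ints.

```c
/* Stand-in for the real allocator (hypothetical): it can satisfy any request
 * of up to `pool_largest_free_group` ints and always hands out base 0x40. */
static int pool_largest_free_group = 32;

static int alloc_exact(int count)
{
    return (count >= 1 && count <= pool_largest_free_group) ? 0x40 : -1;
}

struct irq_grant {
    int base;   /* first int, or -1 if nothing was allocated */
    int count;  /* ints granted: the full per-port count, 1, or 0 */
};

/* Tries one int per port first, then a single int for the whole device.
 * A result with count == 0 means the caller should fail installation
 * (or fall back to timers). */
static struct irq_grant ahci_request_ints(int per_port_count)
{
    struct irq_grant grant = { -1, 0 };
    int base = alloc_exact(per_port_count);

    if (base >= 0) {
        grant.base = base;
        grant.count = per_port_count;
        return grant;
    }

    base = alloc_exact(1);
    if (base >= 0) {
        grant.base = base;
        grant.count = 1;
    }
    return grant;
}
```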

Additionally, if the AHCI-device supports 16 ports, it would typically ask for 16 ints (in order to use one int per port). However, in most computers, only 1 or 2 ports are actually used for something. What I do in that case is that I free the ints that are related to unused ports so other devices can allocate these.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by Brendan »

Hi,
rdos wrote:
Brendan wrote:I guess this means you need a function to dynamically allocate a group of interrupt vectors at a certain priority; but the function can find a group at a different priority (or "closest available priority"), and can refuse and only allocate a smaller group of interrupt vectors (e.g. "You asked for 16 contiguous interrupt vectors, but I could only find 4 contiguous interrupt vectors, so that's all you're getting").
Again, using AHCI as an example, it is not useful to give the device 4 vectors if it wants 16. Either it needs one per port (often 16), or the OS should revert to one (one int per device). The AHCI spec does deal with fewer ints, but I don't find this worth the trouble, as some chips might be broken in this respect. What this means is that if the OS allocates 16 ints on behalf of a device, it would either get 16, or the function should fail. If it fails for the AHCI device, the device would then try to allocate a single int instead, and if this also fails, fail installation (or use timers).
I'd assume it'd work fine. If the device wants 16 IRQs and you give it 4, then the device driver would need to check each port that might be sharing that interrupt (where checking 4 possible sources of that interrupt is going to be faster than only having one interrupt and checking all 16 possible sources).

If you're worried about broken/faulty hardware, maybe you could provide an alternative "alloc_range_of_interrupts_or_nothing()" function that device drivers for broken/faulty devices can use instead.
rdos wrote:Additionally, if the AHCI-device supports 16 ports, it would typically ask for 16 ints (in order to use one int per port). However, in most computers, only 1 or 2 ports are actually used for something. What I do in that case is that I free the ints that are related to unused ports so other devices can allocate these.
That seems very dodgy for things like SATA controllers that naturally support hot-plug. Just because a channel isn't in use when the OS boots doesn't mean it won't be needed later.


Cheers,

Brendan
rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by rdos »

Brendan wrote:I'd assume it'd work fine. If the device wants 16 IRQs and you give it 4, then the device driver would need to check each port that might be sharing that interrupt (where checking 4 possible sources of that interrupt is going to be faster than only having one interrupt and checking all 16 possible sources).

If you're worried about broken/faulty hardware, maybe you could provide an alternative "alloc_range_of_interrupts_or_nothing()" function that device drivers for broken/faulty devices can use instead.
I'm also worried about the device-driver itself not working. It is hard enough to verify that the one-or-all approach works on all AHCI implementations, and to add the complexity of half-baked allocations doesn't sound like a good idea. Not receiving 16 ints must be considered an exceptional case that basically never happens, so therefore there is no need to add code for it that might not even work properly.
Brendan wrote:That seems very dodgy for things like SATA controllers that naturally support hot-plug. Just because a channel isn't in use when the OS boots doesn't mean it won't be needed later.
I should have used the term "implemented". There is a ports implemented (PI) register in AHCI. Often, AHCI chips support a lot of ports, but then the surrounding chips only implement a few of them. This is often the case in portables.
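As a sketch of how the PI register could drive this: each set bit in PI marks an implemented port, so the ints for the remaining message numbers are candidates for release. The helper names are hypothetical, and the code assumes that message number n belongs to port n.

```c
#include <stdint.h>

/* Number of ports marked as implemented in an AHCI PI register value. */
static unsigned ahci_ports_implemented(uint32_t pi)
{
    unsigned n = 0;
    for (; pi != 0; pi &= pi - 1)   /* clear the lowest set bit each round */
        n++;
    return n;
}

/* Returns a bitmask of message numbers (0..count-1) whose port is not
 * implemented. Assumes message n belongs to port n, so the ints at
 * base + n for those bits could be handed back to the allocator. */
static uint32_t ahci_unused_messages(uint32_t pi, unsigned count)
{
    uint32_t all = (count >= 32) ? 0xFFFFFFFFu : ((1u << count) - 1u);
    return all & ~pi;
}
```

With PI = 0x3 and 16 ints allocated, messages 2 through 15 (mask 0xFFFC) would be unused.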
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by Brendan »

Hi,
rdos wrote:I'm also worried about the device-driver itself not working. It is hard enough to verify that the one-or-all approach works on all AHCI implementations, and to add the complexity of half-baked allocations doesn't sound like a good idea. Not receiving 16 ints must be considered an exceptional case that basically never happens, so therefore there is no need to add code for it that might not even work properly.
I'd also be worried about running out of interrupt numbers on larger machines (servers). When there's only about 210 interrupts left over for devices to use; if each device can consume as many as 16 (or 32) interrupts then you could run out of interrupts very fast, especially when you take alignment restrictions and interrupt priorities into account.

If a device driver developer can test that "max. number of interrupts allocated" works, then it should be easy for them to test if half of that works (or a quarter, or...).
rdos wrote:
Brendan wrote:That seems very dodgy for things like SATA controllers that naturally support hot-plug. Just because a channel isn't in use when the OS boots doesn't mean it won't be needed later.
I should have used the term "implemented". There is a ports implemented (PI) register in AHCI. Often, AHCI chips support a lot of ports, but then the surrounding chips only implement a few of them. This is often the case in portables.
Ah - that makes more sense. I'd be tempted to do a warning in that case ("WARNING: Scumbag device trying to hog interrupts for no reason due to lazy manufacturer")... :)


Cheers,

Brendan
rdos
Member
Posts: 3276
Joined: Wed Oct 01, 2008 1:55 pm

Re: How to map MSIs and IOAPIC ISRs to real int numbers

Post by rdos »

Brendan wrote:I'd also be worried about running out of interrupt numbers on larger machines (servers). When there's only about 210 interrupts left over for devices to use; if each device can consume as many as 16 (or 32) interrupts then you could run out of interrupts very fast, especially when you take alignment restrictions and interrupt priorities into account.

If a device driver developer can test that "max. number of interrupts allocated" works, then it should be easy for them to test if half of that works (or a quarter, or...).
Yes, that could be a problem. Some complex scheme would be needed in order to address this. For instance, each device should first be required to inform the OS about how many ints it would need. Then the OS could check whether it can satisfy these needs, and if not, it would cut down the number of ints for some devices, possibly in a negotiation process. However, for a non-server OS, I'd not implement this as it would be overkill. 200 ints would be enough for any environment running RDOS. :D