I/O-APIC init

FlashBurn
Member
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

I/O-APIC init

Post by FlashBurn »

I've got as far as starting up all CPUs and parsing the MPS or ACPI tables, but I'm having problems with the I/O APIC.

First, if I want to use lowest priority mode for delivering interrupts, what do I have to write into the destination field of the interrupt's entry in the I/O APIC? And how do I have to initialise the local APICs so that interrupts go to the CPU with the lowest priority? I mean, is it enough to change the value in the Task Priority Register, and what do I have to write into the Arbitration and Processor Priority Registers?
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: I/O-APIC init

Post by Brendan »

Hi,
FlashBurn wrote:First, if I want to use lowest priority mode for delivering interrupts, what do I have to write into the destination field of the interrupt's entry in the I/O APIC?
The default state of the I/O APIC is "everything disabled". If you're using the I/O APIC at all then you need to configure each I/O APIC input (IRQ source) that you use (and make sure that all I/O APIC inputs you don't use are "disabled"), including telling the I/O APIC if the signal is level triggered or edge triggered, active high or active low, which interrupt vector (IDT entry) corresponds to the I/O APIC input, how to deliver the interrupt to the CPU, etc.

The information you need to set up each I/O APIC input is provided by the MP specification tables and/or the ACPI tables. For some old computers you may also need to mess with the IMCR to tell the chipset that the devices' IRQ lines should be routed to the I/O APIC/s and not the PICs (see Intel's MP specification for details). For newer computers (anything new enough to support ACPI) you won't need to mess with the IMCR.
FlashBurn wrote:And how do I have to initialise the local APICs so that interrupts go to the CPU with the lowest priority? I mean, is it enough to change the value in the Task Priority Register, and what do I have to write into the Arbitration and Processor Priority Registers?
The Arbitration Priority Register (which doesn't exist on Pentium 4 and later CPUs) and the Processor Priority Register are read-only - you don't need to set them to anything (the CPU/chipset handles their values).

Mostly, to get lowest priority delivery mode to work right you need to:
  • use suitable priorities for all interrupts (where an interrupt's priority is determined by which interrupt vector it uses)
  • set the Task Priority Register (in each CPU) according to how important the code running on the CPU is
This means that (for e.g.) when the CPU is executing a high priority process/thread you might set the TPR to "CPU at medium priority", when the CPU is executing a low priority process/thread you might set the TPR to "CPU at low priority", and when the CPU has been put into a sleep state or power saving state you might want to set the TPR to "CPU at high priority". You might also consider boosting the TPR value to "CPU at fairly high priority" while the CPU is holding one or more re-entrancy locks in kernel code, to try to reduce lock contention (but don't forget that changing the TPR value adds a little overhead).

The tricky part with the TPR is that if an IRQ occurs and the priority of all target CPUs is too high, then the IRQ won't be delivered until a CPU drops to a "low enough" priority (which may never happen if all CPUs are in a sleep state, for e.g.). Usually this isn't what you want - you might want the IRQ to be delivered to the lowest priority CPU, even if all CPUs are set to "high priority". Because of this I only use the lowest 5 bits of the TPR - e.g. values 0x00 (lowest priority) to 0x1F (highest priority I use). The first 32 interrupt vectors are reserved for exceptions, so the priority of any IRQ or IPI has to be higher than the highest TPR value I use, and therefore all IRQs and IPIs are always delivered as soon as possible.
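
The reasoning above can be checked with a little arithmetic. This is a minimal sketch, assuming the usual local APIC rule that an interrupt is only held back when its priority class (vector bits 7:4) is not above the priority class of the TPR (TPR bits 7:4), and ignoring interrupts that are already in service. The function names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class: the upper 4 bits of a vector or of a TPR value. */
static uint8_t priority_class(uint8_t value)
{
    return (uint8_t)(value >> 4);
}

/* True if an interrupt with this vector is not held back by the given TPR
 * value (interrupts already in service are ignored in this sketch). */
static bool vector_passes_tpr(uint8_t vector, uint8_t tpr)
{
    return priority_class(vector) > priority_class(tpr);
}

/* Worked examples for the scheme described above. */
static void tpr_examples(void)
{
    /* Lowest usable IRQ vector against the highest TPR value in use. */
    assert(vector_passes_tpr(0x20, 0x1F));
    /* A TPR of 0x20 would start holding back vectors 0x20-0x2F. */
    assert(!vector_passes_tpr(0x2F, 0x20));
}
```

With TPR values limited to 0x00-0x1F the TPR never leaves priority classes 0 and 1, so every vector from 0x20 upwards always passes.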

There are different ways to handle the CPU priorities though. For example, for a real-time OS you might want to prevent/postpone low priority IRQs when the CPU is running a very high priority process/thread.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: I/O-APIC init

Post by FlashBurn »

Thanks, you also answered questions I forgot to ask ;) But one question is still unanswered: what do I have to write into the destination field of the interrupt's entry in the I/O APIC? Or doesn't it matter if I use lowest priority?

At the moment I use ACPI, and if that isn't available I use the MP tables (I also handle the IMCR bit). The only thing is that I don't know where to get the PCI routing from (which PCI interrupt goes to which pin of the I/O APIC or PIC).

What happens if all cpus have the same priority?

Edit::

It would also be interesting to know whether I should use physical or logical destination mode.

Edit2::

I think I got it. I set the LDR (logical destination register) of the local APIC to the APIC ID of the CPU, and the DFR (destination format register) is set to 0xff000000 (default). In the I/O APIC I have to use logical destination mode and set the destination field to all 1's.

I haven't tested whether it still works when I change the TPR of the CPUs, but I do get an interrupt from the I/O APIC.
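
One thing worth double-checking in this setup: in the flat model the logical APIC ID (LDR bits 31:24) is ANDed bit by bit with the destination field, so an LDR built from a raw APIC ID of 0 would never match any destination. The DFR's reset value is also all 1's (bits 31:28 = 1111b select the flat model, the remaining bits are reserved). A minimal sketch, assuming the common "one bit per CPU" convention (which limits the flat model to 8 CPUs); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DFR_FLAT_MODEL 0xFFFFFFFFu  /* bits 31:28 = 1111b selects the flat model */

/* LDR value for the n-th CPU: one bit per CPU in bits 31:24 (8 CPUs at most). */
static uint32_t flat_ldr_for_cpu(unsigned cpu_index)
{
    assert(cpu_index < 8);
    return (1u << cpu_index) << 24;
}

/* Flat-model match: the destination field ANDed with the logical APIC ID. */
static bool flat_dest_matches(uint32_t ldr, uint8_t destination)
{
    return ((ldr >> 24) & destination) != 0;
}
```

With this convention a destination field of 0xFF selects every CPU as a candidate for lowest priority delivery, and a narrower mask restricts the candidates to a subset.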

Edit3::

So I hope this code is finished. Because I also have only 32 priorities (0 - 0x1f), I set the TPR to the priority of the current thread, and if a CPU is idle I set the TPR to 32, so that all interrupts go to the lowest priority CPU that is not idle. If all CPUs are idle the interrupt will still be delivered, because my IRQs start at 0xD0.

Re: I/O-APIC init

Post by Brendan »

Hi,
FlashBurn wrote:Thanks, you also answered questions I forgot to ask ;) But one question is still unanswered: what do I have to write into the destination field of the interrupt's entry in the I/O APIC? Or doesn't it matter if I use lowest priority?
  • Destination Field = the logical destination (bits that are set or clear depend on what you want and how the local APIC's DFR and LDR are setup)
  • Interrupt Mask = must be clear to enable the interrupt
  • Trigger Mode = level or edge triggered, depends on what the MP specification tables or ACPI tables say for the interrupt
  • Interrupt Input Pin Polarity = active high or active low, depends on what the MP specification tables or ACPI tables say for the interrupt
  • Destination Mode = logical destination mode (physical destination mode can't be used for "send to lowest priority CPU")
  • Delivery Mode = 001b ("Lowest Priority")
  • Interrupt Vector = which IDT entry the IRQ uses, also determines IRQ priority (e.g. "priority = vector & 0xF0")
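
The fields listed above can be packed into a 64-bit redirection table entry roughly like this. It is only a sketch, assuming the usual I/O APIC entry layout (vector in bits 7:0, delivery mode in bits 10:8, destination mode in bit 11, polarity in bit 13, trigger mode in bit 15, mask in bit 16, destination in bits 63:56); the struct and function names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum delivery_mode { DELIVERY_FIXED = 0, DELIVERY_LOWEST_PRIORITY = 1 };

struct redir_config {
    uint8_t vector;          /* IDT entry, also determines the IRQ priority */
    uint8_t delivery_mode;   /* 001b = lowest priority */
    bool    logical_dest;    /* must be true for lowest priority delivery */
    bool    active_low;      /* polarity, from the MP/ACPI tables */
    bool    level_triggered; /* trigger mode, from the MP/ACPI tables */
    bool    masked;          /* true = interrupt disabled */
    uint8_t destination;     /* logical destination (depends on DFR/LDR setup) */
};

/* Pack the configuration into the 64-bit entry (low dword in bits 31:0). */
static uint64_t make_redir_entry(const struct redir_config *cfg)
{
    uint64_t entry = 0;

    assert(cfg->vector >= 0x10);     /* vectors 0x00-0x0F are not valid here */
    assert(cfg->delivery_mode <= 7);

    entry |= cfg->vector;
    entry |= (uint64_t)cfg->delivery_mode << 8;
    entry |= (uint64_t)cfg->logical_dest << 11;
    entry |= (uint64_t)cfg->active_low << 13;
    entry |= (uint64_t)cfg->level_triggered << 15;
    entry |= (uint64_t)cfg->masked << 16;
    entry |= (uint64_t)cfg->destination << 56;
    return entry;
}
```

The two halves of the result would then be written to the input's pair of redirection registers, with the mask bit cleared last so the input is only enabled once the rest of the entry is in place.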
For determining the Interrupt Vector, I'd recommend dynamically allocating IDT entries with something like "vector = alloc_IDT_entry_for_IRQ(desired_priority)". The kernel/device manager/device driver would ask for an appropriate IRQ priority for the device, and the kernel would find the closest IDT entry that can be used (which may not be the desired priority - if there's 17 devices/IRQs that want the same IRQ priority then at least one of them will miss out and end up with a slightly higher or lower priority).
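
A rough sketch of what such an allocator might look like, assuming the priority is expressed as a priority class (the upper 4 bits of the vector, 16 vectors per class) and that classes 0 and 1 are left to exceptions. The search order (the desired class first, then the nearest classes, higher before lower) is an arbitrary choice for illustration, and vectors a real kernel would reserve (spurious interrupt, IPIs, etc) are not handled:

```c
#include <assert.h>
#include <stdbool.h>

#define FIRST_IRQ_CLASS 2   /* classes 0-1 (vectors 0x00-0x1F) are for exceptions */
#define LAST_IRQ_CLASS  15

static bool vector_in_use[256];

/* Take the first free vector in one priority class, or return -1. */
static int alloc_in_class(int cls)
{
    for (int v = cls * 16; v < cls * 16 + 16; v++) {
        if (!vector_in_use[v]) {
            vector_in_use[v] = true;
            return v;
        }
    }
    return -1;
}

/* Return a free vector as close as possible to the desired priority class,
 * or -1 if every IRQ vector is taken. */
static int alloc_IDT_entry_for_IRQ(int desired_class)
{
    assert(desired_class >= FIRST_IRQ_CLASS && desired_class <= LAST_IRQ_CLASS);

    for (int distance = 0; distance <= LAST_IRQ_CLASS - FIRST_IRQ_CLASS; distance++) {
        int above = desired_class + distance;
        int below = desired_class - distance;
        int v;

        if (above <= LAST_IRQ_CLASS && (v = alloc_in_class(above)) >= 0)
            return v;
        if (distance > 0 && below >= FIRST_IRQ_CLASS && (v = alloc_in_class(below)) >= 0)
            return v;
    }
    return -1;
}
```

With this search order, the first 16 requests for class 0xD get vectors 0xD0 to 0xDF and the 17th ends up one class higher, at 0xE0.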
FlashBurn wrote:The only thing is that I don't know where to get the PCI routing from (which PCI interrupt goes to which pin of the I/O APIC or PIC).
For Intel's Multi-processor specification, see "Section 4.3.4 - I/O Interrupt Assignment Entries" and "Appendix D.3 - I/O Interrupt Assignment Entries for PCI Devices". For PCI devices, each "I/O Interrupt Assignment Entry" will tell you which PCI interrupt line (A, B, C, D) for which device (bus:device:function) is connected to which I/O APIC input, and if it is edge triggered or level triggered, and if it's active high or active low. There will be an "I/O Interrupt Assignment Entry" for every interrupt for every PCI device (plus probably 16 more entries for the IRQs that come from the ISA bus).

For ACPI (version 4.0), see "Section 5.2.12.5 - Interrupt Source Override Structure" (in the "Multiple APIC Description Table" table). This gives you the same information that the MP specification gives you; except that it assumes that the 16 IRQs from the ISA bus are mapped directly to the first 16 I/O APIC inputs (e.g. I/O APIC input #0 is IRQ0/PIT, I/O APIC input #1 is IRQ1/keyboard, etc) and will only tell you about interrupts that aren't the same as this assumed mapping. For example, if IRQ0 (the PIT) is connected to I/O APIC input #0 then ACPI won't mention it, but if IRQ0 is connected to something else then there will be an "Interrupt Source Override Structure" for it.
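
The "identity mapping unless overridden" rule for the ISA IRQs can be sketched like this. The struct is a simplified, hypothetical in-memory form of the override entries (the real ACPI structure also carries bus and polarity/trigger flags), and the names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified copy of one "Interrupt Source Override" entry. */
struct irq_override {
    uint8_t  isa_irq;       /* ISA IRQ number (0-15) */
    uint32_t global_input;  /* global I/O APIC input it is really wired to */
};

/* Map an ISA IRQ to its global I/O APIC input: use the override if there
 * is one, otherwise assume IRQ n is wired to input n. */
static uint32_t isa_irq_to_global_input(const struct irq_override *overrides,
                                        size_t count, uint8_t isa_irq)
{
    assert(isa_irq < 16);

    for (size_t i = 0; i < count; i++) {
        if (overrides[i].isa_irq == isa_irq)
            return overrides[i].global_input;
    }
    return isa_irq;
}
```

For example, with a single override that moves IRQ0 to input 2, IRQ0 maps to input 2 while IRQ1 still maps to input 1.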

It's probably also a good idea to determine which PCI devices are capable of using MSI ("Message Signalled Interrupts") and set those devices up first (to avoid PCI interrupt sharing, etc). In this case, for any device that is configured for MSI you'd ignore the details supplied by ACPI/MP spec.
FlashBurn wrote:What happens if all cpus have the same priority?
In that case, only one CPU receives the interrupt (but you can't easily predict which CPU it will be).
FlashBurn wrote:So I hope this code is finished. Because I also have only 32 priorities (0 - 0x1f), I set the TPR to the priority of the current thread, and if a CPU is idle I set the TPR to 32, so that all interrupts go to the lowest priority CPU that is not idle. If all CPUs are idle the interrupt will still be delivered, because my IRQs start at 0xD0.
There's different "power saving" states. In general more aggressive (deeper) power saving states use less power, but it takes longer for the CPU to enter and leave deeper power saving states (higher latency). HLT (and MONITOR/MWAIT) is the lightest sleep state with the lowest latency, and in this case it's probably better to use "TPR = 0x00". Using higher values for TPR (e.g. "TPR = 0x1f") makes more sense for deeper sleep states with much higher latency.

Of course power management is a very complex topic - decisions probably need to take lots of things into account; including the temperature of the CPUs, if the computer is running from mains power or battery (including laptops and servers with UPS), the user's preferences (e.g. if the user wants max. performance or energy savings or quiet operation/low CPU fan speeds), etc.


Cheers,

Brendan

Re: I/O-APIC init

Post by FlashBurn »

This whole I/O APIC thing is a mess :(

I will only support one I/O APIC, because more than one only causes trouble.

So if I rely on the MPS, I get even more problems when my code runs on a PC with PCI Express, because there every slot is its own bus and my code only recognizes one PCI bus.

So the PCI bus has 4 interrupts, A to D, and an actual I/O APIC has 24 input pins. Does this mean that 4 input pins are only for MSI capable PCI devices?
Owen
Member
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: I/O-APIC init

Post by Owen »

MSI capable devices send interrupts directly to the local APIC (By performing a bus master write).

The four interrupt lines are for LSI (Level Signalled Interrupt) devices, which don't support MSI.

Re: I/O-APIC init

Post by FlashBurn »

Owen wrote:MSI capable devices send interrupts directly to the local APIC (By performing a bus master write).

The four interrupt lines are for LSI (Level Signalled Interrupt) devices, which don't support MSI.
You didn't understand me. I mean an I/O APIC has 24 input pins, minus 16 ISA interrupts, minus 4 PCI interrupts (A to D), which leaves 4 input pins. And these 4 input lines are for MSIs, aren't they?

Re: I/O-APIC init

Post by Owen »

No. MSI doesn't touch the IO APIC or interrupt lines: It's a signal (By bus mastering) direct to one of the CPU's local APICs. On my machine, six of the 8 lines above the ISA set are used:

Code:

 16:       9007     501440   IO-APIC-fasteoi   nvidia               
 18:          0          3   IO-APIC-fasteoi   ohci1394             
 20:        263      46125   IO-APIC-fasteoi   ohci_hcd:usb2        
 21:          0          3   IO-APIC-fasteoi   ehci_hcd:usb1        
 22:      11349    2540568   IO-APIC-fasteoi   sata_nv, oss_hdaudio0
 23:       5728     407718   IO-APIC-fasteoi   sata_nv
(The nvidia entry in particular is surprising: I'd expect the GPU, as a PCI-E device, to use MSI; I'm also surprised that on-chipset devices are sharing. A possibility is that OSS doesn't support MSI, but I think that unlikely. I'll have to investigate on my OpenSolaris box, when I find its interrupt list.)

My NIC is using MSI, however:

Code:

 25:      11031     774633   PCI-MSI-edge      eth0
Note that it does NOT pass through the IO APIC.

OT: Does anyone know if the first number on each line of /proc/interrupts is the vector Linux assigns? Or is it just some form of "logical interrupt identifier"? I notice that some internal ones have no listed number, for example:

Code:

LOC:    1187122    1269938   Local timer interrupts
TLB:      10733      10171   TLB shootdowns

Re: I/O-APIC init

Post by Brendan »

Hi,
FlashBurn wrote:This whole I/O APIC thing is a mess :(

I will only support one I/O APIC, because more than one only causes trouble.
Why is it a mess?

If there's more than one I/O APIC, then it's very likely that there are devices connected to the additional I/O APICs, which means your OS will be unable to support those devices.

For an example, one of the computers I have here has a pair of I/O APICs with 16 inputs per I/O APIC. ISA IRQs are connected to the first I/O APIC, and PCI IRQs are connected to the second I/O APIC. If you don't support the second I/O APIC then the SCSI controller, sound, ethernet, etc all won't work, and only legacy ISA devices (floppy, parallel and serial ports, and PS/2 keyboard/mouse) will work.

FlashBurn wrote:So if I rely on the MPS, I get even more problems when my code runs on a PC with PCI Express, because there every slot is its own bus and my code only recognizes one PCI bus.
Your code needs to support more than one PCI bus. Otherwise it won't work properly on most (all?) computers made in the last 10 years.

Your code should also support computers with more than one PCI host controller...
FlashBurn wrote:So the PCI bus has 4 interrupts, A to D, and an actual I/O APIC has 24 input pins. Does this mean that 4 input pins are only for MSI capable PCI devices?
Each PCI host controller (and all PCI buses connected via that PCI host controller - through PCI to PCI bridges, etc) has 4 PCI interrupt lines. Computers with multiple PCI host controllers can share the same interrupt lines or have separate interrupt lines (e.g. A to D on first controller, E to H on second controller). In some cases, devices built into the chipset may use additional interrupt lines that aren't "standard PCI".

An actual I/O APIC has "n" inputs. For most chipsets there's 24 inputs but nothing says there can't be 32 separate I/O APICs with 1 input each, or one I/O APIC with 128 inputs, or anything else. In theory you could even have mixed sized I/O APICs - for e.g. the first I/O APIC might have 16 inputs and the second I/O APIC might have 8 inputs, and the third I/O APIC might have 11 inputs.

To cope with this the specifications use "global input numbering". For example, if there's 2 I/O APICs with 16 inputs each, then "global input number 27" would be the input 11 on the second I/O APIC.
If there's three I/O APICs with 16, 8 and 4 inputs respectively; then "global input number 27" would be input 3 on the third I/O APIC.
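
The two examples above can be reproduced with a small lookup. This is a sketch that assumes the first global input number of each I/O APIC is already known (for ACPI it comes from the I/O APIC entries in the MADT); the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ioapic_info {
    uint32_t first_global_input;  /* global number of this I/O APIC's input 0 */
    uint32_t input_count;         /* number of inputs on this I/O APIC */
};

/* Find which I/O APIC, and which input on it, a global input number refers to.
 * Returns false if no I/O APIC covers that number. */
static bool find_ioapic_input(const struct ioapic_info *ioapics, size_t count,
                              uint32_t global_input,
                              size_t *ioapic_index, uint32_t *pin)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t first = ioapics[i].first_global_input;

        if (global_input >= first && global_input - first < ioapics[i].input_count) {
            *ioapic_index = i;
            *pin = global_input - first;
            return true;
        }
    }
    return false;
}
```

Indexes here are zero-based, so "input 11 on the second I/O APIC" comes back as index 1, pin 11.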

It's fairly common for the first 16 I/O APIC inputs to be used for legacy ISA IRQs, and for I/O APIC inputs above 16 to be used for PCI IRQs. However, there's no rules that say this is how it *must* be - you have to assume that any interrupt (from any device) can be routed to any I/O APIC input.

For MSI, the device sends its IRQ directly - it doesn't use a normal PCI interrupt line and doesn't use an I/O APIC input. You tell the device what to send to the local APIC/s (destination, delivery mode, interrupt vector, etc); and it's like the IRQ bypasses the I/O APIC completely.
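
As an illustration of "what to send", here is a sketch of the address/data pair that would be programmed into a device's MSI capability for lowest priority delivery. It assumes the x86 MSI message format (address 0xFEExxxxx with the destination ID in bits 19:12, redirection hint in bit 3 and destination mode in bit 2; data with the vector in bits 7:0 and the delivery mode in bits 10:8); the names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

struct msi_message {
    uint32_t address;  /* value for the MSI message address register */
    uint16_t data;     /* value for the MSI message data register */
};

/* Build an edge triggered, lowest priority MSI message for a logical
 * destination. */
static struct msi_message make_msi_lowest_priority(uint8_t logical_dest, uint8_t vector)
{
    struct msi_message msg;

    assert(vector >= 0x10);  /* vectors 0x00-0x0F are not valid here */

    msg.address = 0xFEE00000u
                | ((uint32_t)logical_dest << 12)  /* destination ID */
                | (1u << 3)                       /* redirection hint */
                | (1u << 2);                      /* logical destination mode */
    msg.data = (uint16_t)(vector | (1u << 8));    /* delivery mode 001b */
    return msg;
}
```

Writing these values into the device's PCI configuration space (and setting the MSI enable bit) is not shown here.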


Cheers,

Brendan

Re: I/O-APIC init

Post by FlashBurn »

So maybe I have to change how my loader and kernel work, but I hope to get a better idea of how to solve the problems with initialising the I/O APIC.

How and when do you initialise the I/O APIC? (I'm doing it in my loader.) And how do you solve the problem of a variable number of interrupt handlers? I have hardcoded 24 interrupt handlers for the I/O APIC in my kernel, but how could I make this variable?

For now I will create my own data structure that records which ISA IRQ goes to which global interrupt and which PCI device goes to which global interrupt.
Cognition
Member
Member
Posts: 191
Joined: Tue Apr 15, 2008 6:37 pm
Location: Gotham, Batmanistan

Re: I/O-APIC init

Post by Cognition »

A tad off topic here, but I'm curious about the _PRT object that sits in the ACPI namespace under a root bridge. Is this a legacy interface at this point or one that exists specifically for reconfiguring the interrupt resources being mapped to a PCI device?
Reserved for OEM use.

Re: I/O-APIC init

Post by Brendan »

Hi,
FlashBurn wrote:How and when do you initialise the I/O APIC? (I'm doing it in my loader.) And how do you solve the problem of a variable number of interrupt handlers? I have hardcoded 24 interrupt handlers for the I/O APIC in my kernel, but how could I make this variable?
I'd do basic I/O APIC init when the kernel is being initialised. This is mostly just setting all I/O APIC inputs to "masked" though.
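
A sketch of that "mask everything" step, assuming the usual I/O APIC register layout (version register at index 0x01 with the highest redirection entry number in bits 23:16, redirection entries starting at index 0x10 with two registers per input, mask in bit 16 of the low register). Register access goes through callbacks so the loop can be tried without hardware; on a real machine they would select the register through IOREGSEL (offset 0x00) and access it through IOWIN (offset 0x10). All names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_REG_VERSION    0x01
#define IOAPIC_REG_REDIR_BASE 0x10
#define IOAPIC_REDIR_MASKED   (1u << 16)

typedef uint32_t (*ioapic_read_fn)(void *ctx, uint8_t reg);
typedef void (*ioapic_write_fn)(void *ctx, uint8_t reg, uint32_t value);

/* Mask every input of one I/O APIC and return the number of inputs. */
static unsigned ioapic_mask_all(ioapic_read_fn read, ioapic_write_fn write, void *ctx)
{
    unsigned inputs = ((read(ctx, IOAPIC_REG_VERSION) >> 16) & 0xFF) + 1;

    assert(inputs <= 120);  /* more would not fit in the 8-bit register index */

    for (unsigned i = 0; i < inputs; i++) {
        uint8_t low = (uint8_t)(IOAPIC_REG_REDIR_BASE + 2 * i);

        /* Keep the other fields of the entry, only set the mask bit. */
        write(ctx, low, read(ctx, low) | IOAPIC_REDIR_MASKED);
    }
    return inputs;
}

/* Fake register file standing in for real hardware when testing. */
struct fake_ioapic {
    uint32_t regs[256];
};

static uint32_t fake_read(void *ctx, uint8_t reg)
{
    return ((struct fake_ioapic *)ctx)->regs[reg];
}

static void fake_write(void *ctx, uint8_t reg, uint32_t value)
{
    ((struct fake_ioapic *)ctx)->regs[reg] = value;
}
```

Reading the input count from the version register instead of assuming 24 also keeps this step independent of how many inputs each I/O APIC actually has.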

After that, I'd set up each IRQ (allocate and install the IDT entry and configure the I/O APIC input) when the corresponding device driver is started (and mask the IRQ if the corresponding device driver is terminated). Configuring the I/O APIC inputs once during boot would make it harder to support "hot-plug PCI" later.
FlashBurn wrote:For now I will create my own data structure that records which ISA IRQ goes to which global interrupt and which PCI device goes to which global interrupt.
Sounds like a good idea, just don't forget that the same IRQ can be shared by multiple devices (a simple "device_ID = interruptTable[IO_APIC_input_number]" array won't work) and the same device can have multiple interrupts (a simple "IO_APIC_input_number = interruptTable[device_ID]" array won't work either). ;)
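
One way to get a structure that allows both cases is a list of bindings per I/O APIC input, where the same device may appear in more than one list. This is only a sketch of that idea, not a description of any particular kernel; the names and the fixed table size are assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GLOBAL_INPUTS 256

/* One (device, I/O APIC input) pair. */
struct irq_binding {
    struct irq_binding *next;
    int device_id;
    bool (*handler)(int device_id);  /* returns true if the device raised the IRQ */
};

static struct irq_binding *bindings[MAX_GLOBAL_INPUTS];

/* Attach a device to an input; several devices may share one input. */
static void irq_bind(uint32_t global_input, struct irq_binding *binding)
{
    assert(global_input < MAX_GLOBAL_INPUTS);
    binding->next = bindings[global_input];
    bindings[global_input] = binding;
}

/* Number of devices currently sharing an input. */
static unsigned irq_binding_count(uint32_t global_input)
{
    unsigned count = 0;

    assert(global_input < MAX_GLOBAL_INPUTS);
    for (struct irq_binding *b = bindings[global_input]; b != NULL; b = b->next)
        count++;
    return count;
}

/* Ask every device on a shared input; returns how many claimed the IRQ. */
static unsigned irq_dispatch(uint32_t global_input)
{
    unsigned claimed = 0;

    assert(global_input < MAX_GLOBAL_INPUTS);
    for (struct irq_binding *b = bindings[global_input]; b != NULL; b = b->next) {
        if (b->handler != NULL && b->handler(b->device_id))
            claimed++;
    }
    return claimed;
}
```

A device with two interrupts simply gets two bindings, one on each input, and a shared input has one binding per device.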


Cheers,

Brendan

Re: I/O-APIC init

Post by Owen »

FlashBurn wrote:So maybe I have to change how my loader and kernel work, but I hope to get a better idea of how to solve the problems with initialising the I/O APIC.

How and when do you initialise the I/O APIC? (I'm doing it in my loader.) And how do you solve the problem of a variable number of interrupt handlers? I have hardcoded 24 interrupt handlers for the I/O APIC in my kernel, but how could I make this variable?

For now I will create my own data structure that records which ISA IRQ goes to which global interrupt and which PCI device goes to which global interrupt.
Hardcode 256 interrupt handlers. Yes, it's wasteful, but your root interrupt handlers should be small anyway.

Re: I/O-APIC init

Post by FlashBurn »

Owen wrote:Hardcode 256 interrupt handlers. Yes, it's wasteful, but your root interrupt handlers should be small anyway.
I also thought about doing it this way, but I try to keep my kernel's memory use as small as I can.

At the moment I will try to have one general IRQ handler, and then copy the code (and modify the addresses of the variables it uses) for every IRQ.
Brendan wrote:Sounds like a good idea, just don't forget that the same IRQ can be shared by multiple devices (a simple "device_ID = interruptTable[IO_APIC_input_number]" array won't work) and the same device can have multiple interrupts (a simple "IO_APIC_input_number = interruptTable[device_ID]" array won't work either). ;)
Oh, so it could happen that there is more than one entry in the MPS for one device?!

I just took a look at the PCI specs, and every device's PCI configuration space says which interrupt pin (or was it the PCI INT#?) the IRQ uses, so I only need to know which PCI bus the device is on. Will this work? Then I only need to store which I/O APIC (and input pin) every PCI interrupt (for every bus) goes to. Something like this:

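Code:

#include <stdint.h>

/* Routing of the four PCI interrupt pins (INTA#-INTD#) of one bus. */
struct ioapicPCIbus_t {
    uint8_t busID;        /* PCI bus number */
    uint8_t ioapicID;     /* I/O APIC the bus's interrupts are wired to */
    uint8_t inputPin[4];  /* I/O APIC input for INTA#..INTD# */
};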
Or could there be more than 4 PCI ints on a bus?

Edit::

@Brendan

Could you tell me at which addresses the 2 I/O APICs are on the PC that has two? I'm asking because I need to know whether I can assume that every I/O APIC is at a 4 KB aligned address and that one I/O APIC "uses" a whole 4 KB page.
Last edited by FlashBurn on Tue Nov 24, 2009 5:22 pm, edited 1 time in total.

Re: I/O-APIC init

Post by Owen »

The bus is limited to 4 interrupt pins. Things can get confusing, though, because the pins "cycle" between devices to minimize sharing.
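
One common form of this "cycling" is the conventional swizzle applied to devices behind a PCI-to-PCI bridge, where the pin seen on the parent bus is rotated by the device number. A minimal sketch, assuming that convention (routing on the root bus is board-specific and still has to come from the MP/ACPI tables); the function name is made up:

```c
#include <assert.h>

/* pin: 0 = INTA#, 1 = INTB#, 2 = INTC#, 3 = INTD#; device: 0-31.
 * Returns the pin on the bridge's parent bus that the device's pin is
 * conventionally routed to. */
static unsigned pci_swizzle_pin(unsigned device, unsigned pin)
{
    assert(device < 32 && pin < 4);
    return (device + pin) % 4;
}
```

So INTA# of device 0 stays INTA#, while INTA# of device 1 shows up as INTB# on the parent bus, which spreads single-function devices across all four lines.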