
MP Spec tables

Posted: Sun Jan 13, 2008 9:36 am
by snooky
Hi everyone,

I'm trying to add SMP support to an x86_64 OS and I'm new to SMP. So far I can find and parse the MPS tables, but some things confuse me. I hope someone can help clear them up.

(1) Qemu's BIOS seems to provide broken tables. It doesn't report a PCI bus, but of course there is one, with devices that raise interrupts. It also reports that we are in Virtual Wire Mode (no IMCR present), but no interrupt assignment of type ExtINT to the IOAPIC is listed. This must be wrong for Virtual Wire Mode. So, how do I cope with such tables?

(2) Now on a "real" system, I get tables that make sense. There are two PCI buses listed and I can get the device numbers for the interrupts from the tables. No problem so far. But how do I find that device on the PCI bus? Is the bus ID the BIOS provides in the MPS tables the bus number that I use for PCI addressing? (In this single case it would work; is this assumption generally true?)

(3) I search for the MP Floating Pointer Structure in the EBDA, then in 639K-640K, and then in the BIOS ROM. Linux seems to search 0K-1K first and the EBDA only if all others fail, and then reports that tables in the EBDA are unsafe. That doesn't seem compliant with the MP Spec, but okay, we live in an imperfect world. Is the Linux way the state of the art, or is it ... well ... Linux?

(4) Am I correct that I have to program the Local APICs/IOAPIC(s) according to what the MPS tables tell me? My approach at the moment is:
* disable the 8259As
* program the IOAPIC(s), all interrupts masked
* program the local APIC
* set IMCR, if I need to
* unmask the interrupts in the IOAPIC that I can handle for now (e.g. no PCI devices...)

(5) Which vectors should I assign to the IRQs/pins? Of course there are design aspects that should guide me (network cards need a high priority, or so). But is there some general recommendation of what to do or, more importantly, what not to do (apart from mapping all the IRQs to consecutive vector numbers)?

Thank you in advance!

Re: MP Spec tables

Posted: Sun Jan 13, 2008 11:04 am
by Brendan
Hi,
snooky wrote:(1) Qemu's BIOS seems to provide broken tables. It doesn't report a PCI bus, but of course there is one, with devices that raise interrupts. It also reports that we are in Virtual Wire Mode (no IMCR present), but no interrupt assignment of type ExtINT to the IOAPIC is listed. This must be wrong for Virtual Wire Mode. So, how do I cope with such tables?
Are you sure that Qemu doesn't have a "local APIC interrupt assignment entry" that says that the PIC's ExtINT is connected to a local APIC input?

I honestly don't know (I don't use Qemu) - if it does then that's OK, but if it doesn't Qemu is broken. Even if Qemu is broken, you shouldn't really need ExtINT anyway (I assume you'd use the I/O APIC/s and stop using PICs for SMP).
snooky wrote:(2) Now on a "real" system, I get tables that make sense. There are two PCI buses listed and I can get the device numbers for the interrupts from the tables. No problem so far. But how do I find that device on the PCI bus? Is the bus ID the BIOS provides in the MPS tables the bus number that I use for PCI addressing? (In this single case it would work; is this assumption generally true?)
For "I/O Interrupt Assignment Entries" for PCI devices, the "source bus ID" field will tell you which PCI bus number (that can be used for PCI configuration space addressing). The "source bus IRQ" field has a special encoding (see section D.3) that tells you the device number on that PCI bus and which PCI IRQ line (A, B, C or D).
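As a rough illustration of that encoding: for a PCI source bus, bits 0-1 of the "source bus IRQ" byte select the interrupt pin (INTA# to INTD#) and bits 2-6 hold the device number; bit 7 is reserved. A minimal sketch in C follows; the struct and function names are made up for this example and don't come from any existing kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded form of the "source bus IRQ" byte of an I/O interrupt
 * assignment entry whose source bus is PCI (names are hypothetical). */
struct pci_irq_source {
    uint8_t device;  /* PCI device number, 0..31 (bits 2-6) */
    uint8_t pin;     /* 0 = INTA#, 1 = INTB#, 2 = INTC#, 3 = INTD# (bits 0-1) */
};

static struct pci_irq_source decode_pci_source_irq(uint8_t source_bus_irq)
{
    struct pci_irq_source s;

    s.device = (source_bus_irq >> 2) & 0x1F;  /* bit 7 is reserved and dropped */
    s.pin = source_bus_irq & 0x03;
    assert(s.device < 32 && s.pin < 4);       /* guaranteed by the masks above */
    return s;
}
```

For example, a value of 0x0D decodes to device 3, pin INTB#. Together with the "source bus ID" as the bus number, that gives you the bus/device pair for configuration space accesses (the function number is not encoded, because the interrupt pin is shared by all functions of the device).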
snooky wrote:(3) I search for the MP Floating Pointer Structure in the EBDA, then in 639K-640K, and then in the BIOS ROM. Linux seems to search 0K-1K first and the EBDA only if all others fail, and then reports that tables in the EBDA are unsafe. That doesn't seem compliant with the MP Spec, but okay, we live in an imperfect world. Is the Linux way the state of the art, or is it ... well ... Linux?
From the Linux source code:

Code:

void __init find_smp_config (void)
{
	unsigned int address;

	/*
	 * FIXME: Linux assumes you have 640K of base ram..
	 * this continues the error...
	 *
	 * 1) Scan the bottom 1K for a signature
	 * 2) Scan the top 1K of base RAM
	 * 3) Scan the 64K of bios
	 */
	if (smp_scan_config(0x0,0x400) ||
		smp_scan_config(639*0x400,0x400) ||
			smp_scan_config(0xF0000,0x10000))
		return;
	/*
	 * If it is an SMP machine we should know now, unless the
	 * configuration is in an EISA/MCA bus machine with an
	 * extended bios data area.
	 *
	 * there is a real-mode segmented pointer pointing to the
	 * 4K EBDA area at 0x40E, calculate and scan it here.
	 *
	 * NOTE! There are Linux loaders that will corrupt the EBDA
	 * area, and as such this kind of SMP config may be less
	 * trustworthy, simply because the SMP table may have been
	 * stomped on during early boot. These loaders are buggy and
	 * should be fixed.
	 *
	 * MP1.4 SPEC states to only scan first 1K of 4K EBDA.
	 */

	address = get_bios_ebda();
	if (address)
		smp_scan_config(address, 0x400);
}
I think the comments alone rule out "state of the art"... ;)

I have no idea why Linux checks from 0x00000000 to 0x00000400 - this space is the real mode IVT and isn't mentioned by Intel's MP specification. I'd guess that once upon a time someone thought it might be a good idea, and nobody has wondered why it's there since (or if they did wonder they didn't want to break anything and left it in "just in case").
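For comparison, the search order in the MP specification is: the first 1 KiB of the EBDA, then the last 1 KiB of base memory (e.g. 639K-640K) if there is no EBDA, then the BIOS ROM from 0xF0000 to 0xFFFFF. Here is a minimal sketch of the per-region scan, assuming the region is already mapped and starts on a 16-byte boundary; `mp_scan_region` is a made-up name, not the Linux function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_FPS_SIZE 16u  /* the floating pointer structure is one 16-byte paragraph */

/* Scan region[0..len) for a valid MP floating pointer structure.
 * Returns its byte offset within the region, or -1 if none is found. */
static long mp_scan_region(const uint8_t *region, size_t len)
{
    assert(region != NULL);

    /* The structure is always aligned on a 16-byte boundary. */
    for (size_t off = 0; off + MP_FPS_SIZE <= len; off += 16) {
        const uint8_t *p = region + off;
        uint8_t sum = 0;

        if (memcmp(p, "_MP_", 4) != 0)
            continue;
        if (p[8] != 1)               /* length field, in 16-byte units */
            continue;
        for (size_t i = 0; i < MP_FPS_SIZE; i++)
            sum = (uint8_t)(sum + p[i]);
        if (sum == 0)                /* all bytes must sum to 0 modulo 256 */
            return (long)off;
    }
    return -1;
}
```

You'd call this once per region in the order above and stop at the first hit. Checking the length byte and the checksum (not only the signature) helps to reject stray "_MP_" strings.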
snooky wrote:(4) Am I correct that I have to program the Local APICs/IOAPIC(s) according to what the MPS tables tell me?
Technically you should program the local APIC's inputs and the I/O APIC inputs to what the MPS tables tell you. In theory the BIOS has probably already programmed the local APIC's inputs.
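To show what "programming an I/O APIC input from the tables" can look like, here's a sketch that turns the flags of an I/O interrupt assignment entry into a 64-bit redirection table entry that is still masked. It makes some assumptions: "conforms to bus" is resolved as edge/active-high for ISA and level/active-low for PCI (EISA is ignored), delivery is fixed with a physical destination, and the function name and parameters are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_ACTIVE_LOW (1ull << 13)  /* polarity: 1 = active low */
#define IOAPIC_LEVEL      (1ull << 15)  /* trigger mode: 1 = level */
#define IOAPIC_MASKED     (1ull << 16)  /* 1 = interrupt masked */

/* mp_flags: bits 0-1 = polarity (PO), bits 2-3 = trigger mode (EL),
 * where 0 = conforms to bus, 1 = active high / edge, 3 = active low / level. */
static uint64_t ioapic_redir_from_mp(uint16_t mp_flags, int bus_is_pci,
                                     uint8_t vector, uint8_t dest_apic_id)
{
    unsigned po = mp_flags & 0x3;
    unsigned el = (mp_flags >> 2) & 0x3;
    uint64_t entry = vector;  /* delivery mode 000 (fixed), physical destination */

    assert(vector >= 0x20);   /* 0x00-0x1F are reserved for CPU exceptions */

    /* Assumed bus defaults: PCI is level/active low, ISA is edge/active high. */
    if (po == 3 || (po == 0 && bus_is_pci))
        entry |= IOAPIC_ACTIVE_LOW;
    if (el == 3 || (el == 0 && bus_is_pci))
        entry |= IOAPIC_LEVEL;

    entry |= IOAPIC_MASKED;                  /* unmask once a handler exists */
    entry |= (uint64_t)dest_apic_id << 56;   /* destination field, bits 56-63 */
    return entry;
}
```

The result would be written as two 32-bit halves to the redirection table registers of the pin named in the entry's "destination I/O APIC INTIN#" field, and the mask bit cleared later when a driver is ready.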

Also, there are slight differences for ACPI machines where the chipset's SMI/SCI is routed to an I/O APIC input (where this interrupt won't be mentioned by MPS tables because it's only used if/when the OS enables "ACPI mode"). There can also be other minor things not mentioned by the MPS tables that you can ignore (e.g. one of the chipset docs I've read mentioned a "TCO interrupt").
snooky wrote:(5) Which vectors should I assign to the IRQs/pins? Of course there are design aspects that should guide me (network cards need a high priority, or so). But is there some general recommendation of what to do or, more importantly, what not to do (apart from mapping all the IRQs to consecutive vector numbers)?
The only good advice I can give is to spread the interrupt vectors out as much as you can. The local APIC takes an interrupt's priority class from the upper 4 bits of its vector, so bunching them all together and (e.g.) using vectors 0x80 to 0x9F would only give you 2 effective IRQ priorities, while spreading them out and using vectors 0x20 to 0xFE gives you 14 effective IRQ priorities.
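To make the arithmetic concrete: the priority class of an interrupt is bits 7-4 of its vector, so the number of usable priorities in a contiguous vector range is easy to compute. A tiny sketch (the function names are only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* The local APIC's priority class is the upper 4 bits of the vector. */
static unsigned priority_class(uint8_t vector)
{
    return vector >> 4;
}

/* Number of distinct priority classes covered by first..last (inclusive). */
static unsigned count_priority_classes(uint8_t first, uint8_t last)
{
    assert(first <= last);
    return priority_class(last) - priority_class(first) + 1;
}
```

With this, `count_priority_classes(0x80, 0x9F)` is 2 and `count_priority_classes(0x20, 0xFE)` is 14. Vectors within the same class are not prioritised against each other by the task priority mechanism, which is why spreading them out matters.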

Other than that it depends on what the hardware is and how the OS uses it. For example, for my previous kernel I made the RTC/CMOS IRQ the highest priority IRQ because I used it to keep track of the system time.

IMHO, except for special devices (e.g. the scheduler's timer) an OS should dynamically assign the interrupt vectors/priorities during boot. For example, as the device manager finds the devices or as the device drivers are started, you'd decide which interrupt priority the device/s should use (depending on device type/s perhaps) and then search for a free IDT entry that is closest to the interrupt priority you want.
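A rough sketch of that kind of allocator, assuming vectors 0x20 to 0xFE are available for IRQs (0xFF is left out here on the assumption that it's used as the spurious interrupt vector) and that priority is expressed as the vector's upper 4 bits. All names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIRST_IRQ_VECTOR 0x20
#define LAST_IRQ_VECTOR  0xFE  /* assumption: 0xFF is the spurious vector */

static bool vector_used[256];

/* Allocate a free vector whose priority class (vector >> 4) is as close as
 * possible to wanted_class. On a tie the lower class is tried first.
 * Returns the vector, or -1 if every IRQ vector is taken. */
static int alloc_vector_near(unsigned wanted_class)
{
    assert(wanted_class >= 2 && wanted_class <= 15);

    for (int dist = 0; dist < 14; dist++) {
        int candidates[2] = { (int)wanted_class - dist, (int)wanted_class + dist };

        for (int c = 0; c < 2; c++) {
            int cls = candidates[c];

            if (cls < 2 || cls > 15)
                continue;
            for (int v = cls * 16; v < cls * 16 + 16; v++) {
                if (v < FIRST_IRQ_VECTOR || v > LAST_IRQ_VECTOR)
                    continue;
                if (!vector_used[v]) {
                    vector_used[v] = true;
                    return v;
                }
            }
        }
    }
    return -1;
}
```

A driver asking for class 8 would get 0x80, the next one 0x81, and once class 8 is full the allocator falls back to class 7, then class 9, and so on. Whether to prefer the lower or the higher neighbour is a policy choice; the sketch picks the lower one arbitrarily.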

Also IMHO, it'd be even better if the IRQ priorities could be adjusted while the OS is running (e.g. either automatically adjusted by a device manager or performance optimizing tool or manually adjusted by a system administrator), especially if the IRQ priorities could be adjusted without the device driver itself knowing or caring that its interrupt vector/priority was changed.

Lastly, don't forget about (eventually) supporting MSI ("Message Signaled Interrupts"), where the PCI device sends the interrupt message itself (including delivery mode, interrupt vector, etc.) and bypasses the I/O APIC inputs completely. This means that even if the I/O APIC only has 20 inputs, you could also have 20 PCI devices using MSI and need a total of 40 IDT entries for IRQs.
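For reference, on x86 the MSI message is just a memory write to the local APIC address range: the address carries the destination APIC ID and the data carries the vector and delivery mode. A minimal sketch of how the two values could be built for the simplest case (fixed delivery, edge triggered, physical destination); the function names are made up, and writing the values into the device's MSI capability is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Message address: 0xFEE in bits 31-20, destination APIC ID in bits 19-12.
 * Redirection hint and destination mode (bits 3 and 2) are left at 0. */
static uint32_t msi_address(uint8_t dest_apic_id)
{
    return 0xFEE00000u | ((uint32_t)dest_apic_id << 12);
}

/* Message data: vector in bits 7-0, delivery mode 000 (fixed) in bits 10-8,
 * trigger mode 0 (edge) in bit 15. */
static uint16_t msi_data(uint8_t vector)
{
    assert(vector >= 0x20);  /* 0x00-0x1F are reserved for CPU exceptions */
    return vector;
}
```

Because the vector is part of the message, each MSI source needs its own IDT entry regardless of how many I/O APIC inputs exist.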


Cheers,

Brendan

Re: MP Spec tables

Posted: Sun Jan 13, 2008 1:06 pm
by snooky
Brendan wrote:
snooky wrote:(1) Qemu's BIOS seems to provide broken tables. It doesn't report a PCI bus, but of course there is one, with devices that raise interrupts. It also reports that we are in Virtual Wire Mode (no IMCR present), but no interrupt assignment of type ExtINT to the IOAPIC is listed. This must be wrong for Virtual Wire Mode. So, how do I cope with such tables?
Are you sure that Qemu doesn't have a "local APIC interrupt assignment entry" that says that the PIC's ExtINT is connected to a local APIC input?
There aren't "local APIC interrupt assignment" entries.

From rombios32.c of the Bochs BIOS, which Qemu uses:

Code:

    q = mp_config_table;
    putstr(&q, "PCMP"); /* "PCMP signature */
    putle16(&q, 0); /* table length (patched later) */
    putb(&q, 4); /* spec rev */
    putb(&q, 0); /* checksum (patched later) */
#ifdef BX_QEMU
    putstr(&q, "QEMUCPU "); /* OEM id */
#else
    putstr(&q, "BOCHSCPU");
#endif
    putstr(&q, "0.1         "); /* vendor id */
    putle32(&q, 0); /* OEM table ptr */
    putle16(&q, 0); /* OEM table size */
    putle16(&q, smp_cpus + 18); /* entry count */
    putle32(&q, 0xfee00000); /* local APIC addr */
    putle16(&q, 0); /* ext table length */
    putb(&q, 0); /* ext table checksum */
    putb(&q, 0); /* reserved */
The "smp_cpus + 18" is because there are entries for each cpu, 1 bus entry for ISA, 1 IOAPIC entry and 16 I/O interrupt entries for the 16 ISA interrupts, which are 1:1 mapped (ISA-IRQ 0 -> pin 0, ISA-IRQ 1 -> pin 1, and so on).
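One way to double-check a breakdown like this at run time is to walk the base table and count the entries per type. In MP spec 1.4 the header is 44 bytes, processor entries (type 0) are 20 bytes, and bus, I/O APIC, I/O interrupt and local interrupt entries (types 1 to 4) are 8 bytes each. The following is only a sketch: the function and enum names are invented, and it skips the table checksum for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MP_PROCESSOR, MP_BUS, MP_IOAPIC, MP_IO_INT, MP_LOCAL_INT, MP_ENTRY_TYPES };

#define MP_HEADER_SIZE 44u

/* Count base-table entries per type. avail is the number of readable bytes
 * at table. Returns 0 on success, -1 if the table looks malformed. */
static int mp_count_entries(const uint8_t *table, size_t avail,
                            unsigned counts[MP_ENTRY_TYPES])
{
    assert(table != NULL && counts != NULL);
    memset(counts, 0, MP_ENTRY_TYPES * sizeof counts[0]);

    if (avail < MP_HEADER_SIZE || memcmp(table, "PCMP", 4) != 0)
        return -1;

    size_t length = table[4] | ((size_t)table[5] << 8);        /* base table length */
    unsigned entries = table[34] | ((unsigned)table[35] << 8); /* entry count */
    if (length < MP_HEADER_SIZE || length > avail)
        return -1;

    size_t off = MP_HEADER_SIZE;
    for (unsigned i = 0; i < entries; i++) {
        if (off >= length)
            return -1;
        uint8_t type = table[off];          /* first byte of every entry */
        if (type >= MP_ENTRY_TYPES)
            return -1;
        size_t size = (type == MP_PROCESSOR) ? 20 : 8;
        if (off + size > length)
            return -1;
        counts[type]++;
        off += size;
    }
    return 0;
}
```

For a table built as described above with a single CPU, this would report 1 processor, 1 bus, 1 I/O APIC, 16 I/O interrupt entries and 0 local interrupt entries.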

By the way... is something wrong with Qemu? Why is Bochs mostly used (at least by members of this forum, as it seems to me)?
Brendan wrote: Even if Qemu is broken, you shouldn't really need ExtINT anyway (I assume you'd use the I/O APIC/s and stop using PICs for SMP).
Yes, I want to switch to Symmetric Mode. But - can I trust the MPS tables? Well, I have to (I guess), unless I want to parse the ACPI tables (which could be broken, too).

So the remaining problem is that the BIOS of Qemu/Bochs doesn't report a PCI bus with interrupts coming from it.

So long
snooky

Re: MP Spec tables

Posted: Mon Jan 14, 2008 1:11 am
by Brendan
Hi,
snooky wrote:The "smp_cpus + 18" is because there are entries for each cpu, 1 bus entry for ISA, 1 IOAPIC entry and 16 I/O interrupt entries for the 16 ISA interrupts, which are 1:1 mapped (ISA-IRQ 0 -> pin 0, ISA-IRQ 1 -> pin 1, and so on).
That sounds broken to me - no ExtINT entry means it's impossible to use "mixed mode".
snooky wrote:By the way... is something wrong with Qemu? Why is Bochs mostly used (at least by members of this forum, as it seems to me)?
I'm not sure about other people, but I use Bochs because the debugger is very useful (more powerful than Qemu's debugger) and because Bochs can be configured to emulate a wide variety of CPUs.

Bochs is also slower, which makes it easier to find parts of my code that need to be improved (an advantage I don't get with real hardware or Qemu, which can be too fast for bad code to be noticeable)... ;)
snooky wrote:
Brendan wrote:Even if Qemu is broken, you shouldn't really need ExtINT anyway (I assume you'd use the I/O APIC/s and stop using PICs for SMP).
Yes, I want to switch to Symmetric Mode. But - can I trust the MPS tables? Well, I have to (I guess), unless I want to parse the ACPI tables (which could be broken, too).
You can mostly trust the MPS tables, but don't expect IRQ 2 to be connected to anything even though it's listed...

I'm not sure how broken the ACPI tables are in Qemu or Bochs.
snooky wrote:So the remaining problem is that the BIOS of Qemu/Bochs doesn't report a PCI bus with interrupts coming from it.
It's unfortunate, but perhaps:

Code:

    /* Known emulator BIOS OEM IDs (8 bytes, space padded, not NUL terminated) */
    if( ( strncmp( MPS_header->OEMID, "BOCHSCPU", 8 ) == 0 ) ||
        ( strncmp( MPS_header->OEMID, "QEMUCPU ", 8 ) == 0 ) ) {
        MPS_handle_messed_up_emulator_buses();
    } else {
        MPS_handle_sane_buses();
    }
Cheers,

Brendan

Re: MP Spec tables

Posted: Mon Jan 14, 2008 11:48 am
by snooky
Hi,
Brendan wrote:
snooky wrote:The "smp_cpus + 18" is because there are entries for each cpu, 1 bus entry for ISA, 1 IOAPIC entry and 16 I/O interrupt entries for the 16 ISA interrupts, which are 1:1 mapped (ISA-IRQ 0 -> pin 0, ISA-IRQ 1 -> pin 1, and so on).
That sounds broken to me - no ExtINT entry means it's impossible to use "mixed mode".
I don't care, because I don't want to use "mixed mode" anyway. Frankly speaking, I don't see any pros for such a configuration.

Somewhere I found a post on some forum (sorry, I don't have a URL, because I can't find it anymore). It said something like: "The Qemu/Bochs BIOS is correct, they are simply doing hardware 'wrong'. They always put the PCI interrupts into a routing network that makes ISA interrupts out of them."

If so, the tables aren't wrong for the I/O Interrupt Assignment entries. But it is strange that there aren't Local Interrupt Assignment entries.

In general: is it valid (as defined by the MP Spec) for a table to have no Local Interrupt Assignment entries? I found nothing about that there. Maybe I overlooked it?


I have (for now?!) a last question about "APIC configuring". If I understand the whole thing, the Local APICs have a LINT0 and a LINT1 pin, and they have a connection to a bus for interrupt controller communication (ICC). Am I right that the LINT0/LINT1 pins are unused in Symmetric Mode, because all interrupt signaling comes over the ICC? Okay, at least I need one of LINT0/LINT1 for NMI.

Many thanks for your time!