
Bring up more processors on real hardware

Posted: Fri Feb 03, 2012 4:15 am
by TryHarder
Hey guys,
I have an issue that might need your help. I'm writing an OS, and development has almost reached a milestone: long mode, userspace, interrupts, system calls, preemptive scheduling, and multiprocessing. The last one is what I'd like to discuss:
CPUs are detected by parsing the MP tables according to Intel's MP spec, and they are booted via the LAPICs. This works fine on QEMU, which boots up to 16 CPUs without trouble. However, when I ran the same image on real hardware, it got stuck booting the 5th CPU. So far we know that:
1) QEMU emulates the CPUs as if each had its own socket (16 CPUs, 16 chips; each CPU is single-core).
2) My real hardware is 2 sockets with quad-core CPUs (Intel Xeon).
I think this is where the trouble comes from.
So maybe you can suggest something I'm missing before I dive into the Linux source and try to figure out what's wrong with my code.
Thanks

Re: Bring up more processors on real hardware

Posted: Fri Feb 03, 2012 4:35 am
by Combuster
Did you try using the right APIC ID (instead of counting from 0)?
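A minimal sketch of what "the right APIC ID" means in practice, assuming the layout from Intel's MP specification (processor entries are type 0 and 20 bytes long, with the local APIC ID in byte 1 and the CPU flags in byte 3; all other base-table entries are 8 bytes). The function name `mp_collect_lapic_ids` and its parameters are made up for illustration and are not taken from the original poster's code:

```c
#include <stddef.h>
#include <stdint.h>

#define MP_ENTRY_PROCESSOR  0u     /* entry type of a processor entry */
#define MP_PROC_ENTRY_SIZE  20u    /* processor entries are 20 bytes */
#define MP_OTHER_ENTRY_SIZE 8u     /* bus, I/O APIC and interrupt entries */
#define MP_CPU_FLAG_ENABLED 0x01u  /* bit 0 of the CPU flags byte */

/* Walk `entry_count` base-table entries starting at `entries` and store the
 * local APIC ID of every enabled processor in `ids`.
 * Returns the number of IDs stored (at most `max_ids`). */
static size_t mp_collect_lapic_ids(const uint8_t *entries, size_t entry_count,
                                   uint8_t *ids, size_t max_ids)
{
    size_t found = 0;

    for (size_t i = 0; i < entry_count; i++) {
        if (entries[0] == MP_ENTRY_PROCESSOR) {
            uint8_t lapic_id  = entries[1];
            uint8_t cpu_flags = entries[3];

            if ((cpu_flags & MP_CPU_FLAG_ENABLED) && found < max_ids)
                ids[found++] = lapic_id;
            entries += MP_PROC_ENTRY_SIZE;
        } else {
            entries += MP_OTHER_ENTRY_SIZE;
        }
    }
    return found;
}
```

The values in `ids[]` are what should go into the IPI destination field, not a loop counter running from 0 to n-1. On real multi-core hardware the IDs may well be non-contiguous, whereas an emulator can happen to number them 0, 1, 2, ... which would hide the bug.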

Re: Bring up more processors on real hardware

Posted: Fri Feb 03, 2012 5:10 am
by TryHarder
Combuster wrote:Did you try using the right APIC ID (instead of counting from 0)?
I think so. The LAPIC ID is taken from the MPPROC entry type. I compared the IDs that my OS uses with the ones reported by Linux in /proc/cpuinfo. My OS found 8 MPPROC entries with IDs that correspond to the first 8 entries of /proc/cpuinfo. The last 8 entries of cpuinfo are irrelevant (I guess) because they come from hyper-threading, which I'm not supporting yet.
Interesting observation: x86info reports different IDs, but warns that:
WARNING: Detected SMP, but unable to access cpuid driver.
Used Uniprocessor CPU routines. Results inaccurate.

Re: Bring up more processors on real hardware

Posted: Fri Feb 03, 2012 5:36 pm
by Brendan
Hi,
TryHarder wrote:So far we know that:
1) QEMU emulates the CPUs as if each had its own socket (16 CPUs, 16 chips; each CPU is single-core).
2) My real hardware is 2 sockets with quad-core CPUs (Intel Xeon).
3) MP specification tables only list one CPU per core, and when hyper-threading is involved there's 2 logical CPUs per core
4) Some modern machines don't have valid MP specification tables, and either have no table at all or a dummy table (e.g. with almost nothing in it) - a modern OS should use ACPI tables, and only use MP specification tables if ACPI isn't present
5) Qemu doesn't care about timing during the AP CPU startup sequence, while real hardware does care about timing
6) It's hard to debug code you haven't seen - you end up making lots of random guesses :)
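To make point 5 concrete, here is a rough sketch of the INIT-SIPI-SIPI sequence with the delays given in Intel's MP specification (10 ms after INIT, 200 microseconds after each SIPI). It only builds the list of ICR values and delays; actually writing them to the local APIC (ICR high at offset 0x310, then ICR low at offset 0x300) and waiting on a real timer is left to the kernel. The names `struct ipi_step` and `build_ap_startup` are hypothetical, and the sketch assumes xAPIC (memory-mapped) mode and a 4 KiB-aligned trampoline below 1 MiB:

```c
#include <stdint.h>

#define ICR_INIT    0x00004500u  /* delivery mode INIT, level assert */
#define ICR_STARTUP 0x00004600u  /* delivery mode Start-Up, level assert */

/* One step of the sequence: values for the two ICR dwords, plus how long
 * to wait after sending before moving on to the next step. */
struct ipi_step {
    uint32_t icr_high;  /* destination APIC ID in bits 24-31 */
    uint32_t icr_low;   /* writing this dword sends the IPI */
    uint32_t delay_us;  /* delay after sending, in microseconds */
};

/* Fill `steps` with INIT, SIPI, SIPI for the CPU with the given APIC ID.
 * `trampoline_phys` is the physical address of the AP entry code.
 * Returns the number of steps written (always 3). */
static unsigned build_ap_startup(uint8_t apic_id, uint32_t trampoline_phys,
                                 struct ipi_step steps[3])
{
    uint32_t dest   = (uint32_t)apic_id << 24;
    uint32_t vector = (trampoline_phys >> 12) & 0xFFu;  /* page number */

    steps[0] = (struct ipi_step){ dest, ICR_INIT, 10000u };
    steps[1] = (struct ipi_step){ dest, ICR_STARTUP | vector, 200u };
    steps[2] = (struct ipi_step){ dest, ICR_STARTUP | vector, 200u };
    return 3;
}
```

An emulator will usually start the AP even if all three delays are zero, so a missing or miscalibrated delay routine tends to show up only on real hardware. A real implementation would normally also poll the delivery status bit (bit 12 of ICR low) after each send, which this sketch leaves out.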


Cheers,

Brendan

Re: Bring up more processors on real hardware

Posted: Sat Feb 04, 2012 6:55 am
by TryHarder
First, thanks for the reply.
I think I've managed to bring all of them up now. There was a problem with the LAPIC ID <-> CPU mapping (thanks Combuster for the clue about where to look).
Brendan wrote:5) Qemu doesn't care about timing during the AP CPU startup sequence, while real hardware does care about timing
Can you elaborate on this, please? For now I've used some waiting while booting, according to Intel's spec. In particular, I think there might be some problems with accessing the APICs via the memory-mapped region. I mean that a simple loop that polls the ID on the same CPU sometimes yields strange results:

Code:

while(1) { printf("Lapic id: %d\n", lapic[ID] >> 24); }

Code:

Lapic id: 2
Lapic id: 2
Lapic id: 2
Lapic id: 2
Lapic id: 1 <<< This id is reported only by Linux ACPI and relates to hyper-threading. Should not appear at all
Lapic id: 2
Seems like a race to me, but the loop is running on a single CPU. Maybe I should wait a few microseconds before reading again?
And is 'inb 0x84' still good for short delays?
Thanks in advance.

Re: Bring up more processors on real hardware

Posted: Sat Feb 04, 2012 8:07 am
by Brendan
Hi,
TryHarder wrote:
Brendan wrote:5) Qemu doesn't care about timing during the AP CPU startup sequence, while real hardware does care about timing
Can you elaborate here please?
For both Bochs and Qemu, you don't need any delays at all for the "INIT-SIPI-SIPI" sequence. However, for Qemu (but not Bochs) if you have a time-out (to detect when a CPU fails to start correctly) you need a relatively long delay (e.g. 15 ms or more) so that you don't accidentally think the CPU failed to start up when it actually did; even though this "relatively long" delay can be much shorter on real hardware (or Bochs).
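The time-out described above could look roughly like the following sketch. It assumes the AP sets a shared flag once it is running and that the kernel has some microsecond delay routine; `wait_for_ap`, `delay_us_stub` and `AP_START_TIMEOUT_US` are illustrative names only, and the stub merely stands in for a real calibrated delay:

```c
#include <stdbool.h>
#include <stdint.h>

/* 15 ms: generous enough for Qemu; real hardware and Bochs need far less. */
#define AP_START_TIMEOUT_US 15000u

/* Placeholder: a real kernel would wait on a calibrated timer here. */
static void delay_us_stub(uint32_t us) { (void)us; }

/* Poll `*started_flag` (set to non-zero by the AP once it is running) every
 * `poll_us` microseconds. Returns true if the flag was seen set, or false
 * once roughly `timeout_us` microseconds have passed without it. */
static bool wait_for_ap(volatile const uint32_t *started_flag,
                        uint32_t timeout_us, uint32_t poll_us,
                        void (*delay_us)(uint32_t))
{
    uint32_t waited = 0;

    if (poll_us == 0)
        poll_us = 1;  /* avoid looping forever without making progress */

    while (*started_flag == 0) {
        if (waited >= timeout_us)
            return false;
        delay_us(poll_us);
        waited += poll_us;
    }
    return true;
}
```

A call such as `wait_for_ap(&ap_started, AP_START_TIMEOUT_US, 100, delay_us_stub)` returns as soon as the AP reports in, so the long time-out only costs time when a CPU really fails to start.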
TryHarder wrote:For now I've used some waiting while booting, according to Intel's spec. In particular, I think there might be some problems with accessing the APICs via the memory-mapped region. I mean that a simple loop that polls the ID on the same CPU sometimes yields strange results:

Code:

while(1) { printf("Lapic id: %d\n", lapic[ID] >> 24); }

Code:

Lapic id: 2
Lapic id: 2
Lapic id: 2
Lapic id: 2
Lapic id: 1 <<< This id is reported only by Linux ACPI and relates to hyper-threading. Should not appear at all
Lapic id: 2
I'd assume that is an entirely different problem. For example, it could just be something dodgy in your "kprintf()" code (or maybe an IRQ handler that isn't saving/restoring all registers it uses).
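One way to test that guess is to take the print routine out of the loop: read the ID register many times, count how often the value differs from the first read, and print only the final count. The sketch below assumes xAPIC mode, where the local APIC ID register sits at byte offset 0x20 with the ID in bits 24-31; `lapic_read_id` and `count_id_mismatches` are hypothetical helper names:

```c
#include <stdint.h>

#define LAPIC_ID_REG 0x20u  /* byte offset of the local APIC ID register */

/* Read the local APIC ID through a volatile pointer so that the compiler
 * performs a real load on every call. */
static uint32_t lapic_read_id(volatile const uint32_t *lapic_base)
{
    return lapic_base[LAPIC_ID_REG / 4] >> 24;
}

/* Read the ID `reads` times without printing anything and return how many
 * of the later reads differ from the first one. */
static unsigned count_id_mismatches(volatile const uint32_t *lapic_base,
                                    unsigned reads)
{
    uint32_t first = lapic_read_id(lapic_base);
    unsigned mismatches = 0;

    for (unsigned i = 1; i < reads; i++) {
        if (lapic_read_id(lapic_base) != first)
            mismatches++;
    }
    return mismatches;
}
```

If this count stays at zero while the printing loop still shows the occasional stray ID, the register reads are fine and the fault is more likely in the print path or in an interrupt handler that clobbers a register.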
TryHarder wrote:And is 'inb 0x84' still good for short delays?
"inb 0x84" was never good for short delays.


Cheers,

Brendan