Bring up more processors on real hardware

TryHarder
Member
Posts: 28
Joined: Mon Aug 22, 2011 12:45 am

Post by TryHarder »

Hey guys,
I have an issue that might need some of your help. I'm writing an OS and development has almost reached a milestone: long mode, userspace, interrupts, system calls, preemptive scheduling, and multiprocessing. The last one is what I'd like to discuss:
CPUs are detected by parsing the MP tables according to Intel's MP spec, and they are booted via their LAPICs. It works fine on QEMU, which boots up to 16 CPUs without trouble. However, when I ran the same image on real hardware, it got stuck booting the 5th CPU. So far we know that:
1) QEMU emulates the CPUs as if each had its own socket (16 CPUs, 16 chips; each CPU is single-core).
2) My real hardware has 2 sockets with quad-core CPUs (Intel Xeon).
I think this is where the trouble comes from.
So maybe you can suggest something I'm missing before I dive into the Linux source and try to figure out what's wrong with my code.
Thanks
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Bring up more processors on real hardware

Post by Combuster »

Did you try using the right APIC ID (instead of counting from 0)?
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie

Re: Bring up more processors on real hardware

Post by TryHarder »

Combuster wrote:Did you try using the right APIC ID (instead of counting from 0)?
I think so. The LAPIC ID is taken from the MPPROC entry type. I compared the IDs that my OS uses with the ones reported by Linux in /proc/cpuinfo. My OS found 8 MPPROC entries with IDs that correspond to the first 8 entries of /proc/cpuinfo. The last 8 entries of cpuinfo are irrelevant (I guess) because they come from hyper-threading, which I'm not supporting yet.
Interesting observation: x86info reports different IDs, but warns that:
WARNING: Detected SMP, but unable to access cpuid driver.
Used Uniprocessor CPU routines. Results inaccurate.
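For comparison, here is a minimal sketch of collecting LAPIC IDs from the MP configuration table. It assumes the table has already been located and is readable as a byte buffer; the function name `mp_collect_apic_ids` and the constants are made up for illustration and are not taken from the code discussed in this thread. The IDs come from the processor entries themselves, so they need not be contiguous or start at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_HEADER_SIZE 44u    /* fixed part of the MP configuration table */
#define MP_PROC_SIZE   20u    /* entry type 0: processor */
#define MP_OTHER_SIZE  8u     /* entry types 1..4: bus, I/O APIC, interrupts */
#define MP_CPU_ENABLED 0x01u  /* CPU flags bit 0 (EN) */

/* Hypothetical helper: stores the local APIC ID of every enabled processor
 * entry and returns how many were stored, or -1 if the table looks malformed. */
int mp_collect_apic_ids(const uint8_t *table, size_t len,
                        uint8_t *ids, size_t max_ids)
{
    assert(table != NULL && ids != NULL);
    if (len < MP_HEADER_SIZE || memcmp(table, "PCMP", 4) != 0)
        return -1;

    /* Base table length and entry count are little-endian 16-bit fields. */
    size_t base_len = (size_t)table[4] | ((size_t)table[5] << 8);
    size_t entries  = (size_t)table[34] | ((size_t)table[35] << 8);
    if (base_len < MP_HEADER_SIZE || base_len > len)
        return -1;

    /* All bytes of the base table must sum to zero (mod 256). */
    uint8_t sum = 0;
    for (size_t i = 0; i < base_len; i++)
        sum = (uint8_t)(sum + table[i]);
    if (sum != 0)
        return -1;

    size_t off = MP_HEADER_SIZE;
    size_t found = 0;
    for (size_t i = 0; i < entries; i++) {
        if (off >= base_len)
            return -1;
        uint8_t type = table[off];
        if (type > 4)
            return -1;
        size_t size = (type == 0) ? MP_PROC_SIZE : MP_OTHER_SIZE;
        if (off + size > base_len)
            return -1;
        /* Byte 1 is the local APIC ID, byte 3 holds the CPU flags. */
        if (type == 0 && (table[off + 3] & MP_CPU_ENABLED) && found < max_ids)
            ids[found++] = table[off + 1];
        off += size;
    }
    return (int)found;
}
```

The returned IDs, not the loop index, are what would go into the destination field when sending IPIs.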
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Bring up more processors on real hardware

Post by Brendan »

Hi,
TryHarder wrote:So far we know that:
1) QEMU emulates the CPUs as if each had its own socket (16 CPUs, 16 chips; each CPU is single-core).
2) My real hardware has 2 sockets with quad-core CPUs (Intel Xeon).
3) MP specification tables only list one CPU per core, and when hyper-threading is involved there are 2 logical CPUs per core
4) Some modern machines don't have valid MP specification tables, and either have no table at all or a dummy table (e.g. with almost nothing in it) - a modern OS should use ACPI tables, and only use MP specification tables if ACPI isn't present
5) Qemu doesn't care about timing during the AP CPU startup sequence, while real hardware does care about timing
6) It's hard to debug code you haven't seen - you end up making lots of random guesses :)
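Regarding point 4, here is a rough sketch of pulling local APIC IDs out of the ACPI MADT (the table with signature "APIC"). It assumes the MADT has already been found via the RSDT/XSDT and is readable as a byte buffer, and it only handles the 8-bit "Processor Local APIC" entries (type 0), not x2APIC entries. The name `madt_collect_apic_ids` is hypothetical. Unlike the MP tables described in point 3, the MADT has an entry for each logical CPU, so hyper-threads show up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MADT_ENTRIES_OFFSET 44u   /* 36-byte SDT header + LAPIC address + flags */
#define MADT_TYPE_LAPIC     0u    /* Processor Local APIC entry */
#define MADT_LAPIC_ENABLED  0x01u /* flags bit 0 */

static uint32_t madt_rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: stores the APIC ID of every enabled Local APIC entry
 * and returns how many were stored, or -1 if the table looks malformed. */
int madt_collect_apic_ids(const uint8_t *madt, size_t len,
                          uint8_t *ids, size_t max_ids)
{
    assert(madt != NULL && ids != NULL);
    if (len < MADT_ENTRIES_OFFSET || memcmp(madt, "APIC", 4) != 0)
        return -1;

    size_t table_len = madt_rd32(madt + 4);
    if (table_len < MADT_ENTRIES_OFFSET || table_len > len)
        return -1;

    /* All bytes of an ACPI table must sum to zero (mod 256). */
    uint8_t sum = 0;
    for (size_t i = 0; i < table_len; i++)
        sum = (uint8_t)(sum + madt[i]);
    if (sum != 0)
        return -1;

    size_t off = MADT_ENTRIES_OFFSET;
    size_t found = 0;
    while (off + 2 <= table_len) {
        uint8_t type = madt[off];
        uint8_t elen = madt[off + 1];   /* every entry carries its own length */
        if (elen < 2 || off + elen > table_len)
            return -1;
        /* Type 0 layout: type, length, ACPI processor UID, APIC ID, flags (32 bit). */
        if (type == MADT_TYPE_LAPIC && elen >= 8 &&
            (madt[off + 4] & MADT_LAPIC_ENABLED) && found < max_ids)
            ids[found++] = madt[off + 3];
        off += elen;
    }
    return (int)found;
}
```

Entries of unknown type are skipped using their length byte, which is what lets the same loop survive newer MADT revisions.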


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Re: Bring up more processors on real hardware

Post by TryHarder »

First, thanks for reply.
I think that I've managed to bring up all of them now. There was a problem with the LAPIC ID <-> CPU mapping (thanks Combuster for the clue where to look).
Brendan wrote:5) Qemu doesn't care about timing during the AP CPU startup sequence, while real hardware does care about timing
Can you elaborate on this, please? For now I wait during AP startup as Intel's spec describes. In particular, I suspect there may be a problem with accessing the APIC through its memory-mapped region: a simple loop that reads the ID on the same CPU sometimes yields strange results:

Code:

while(1) { printf("Lapic id: %d\n", lapic[ID] >> 24); }

Output:

Lapic id: 2
Lapic id: 2
Lapic id: 2
Lapic id: 2
Lapic id: 1 <<< This id is reported only by Linux ACPI and relates to hyper-threading. Should not appear at all
Lapic id: 2
It looks like a race to me, but the loop is running on a single CPU. Maybe I should wait a few microseconds before reading again?
And is 'inb 0x84' still good for short delays?
Thanks in advance.

Re: Bring up more processors on real hardware

Post by Brendan »

Hi,
TryHarder wrote:
Brendan wrote:5) Qemu doesn't care about timing during the AP CPU startup sequence, while real hardware does care about timing
Can you elaborate here please?
For both Bochs and Qemu, you don't need any delays at all for the "INIT-SIPI-SIPI" sequence. However, for Qemu (but not Bochs) if you have a time-out (to detect when a CPU fails to start correctly) you need a relatively long delay (e.g. 15 ms or more) so that you don't accidentally think the CPU failed to start up when it actually did; even though this "relatively long" delay can be much shorter on real hardware (or Bochs).
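As a rough illustration of where the delays go on real hardware, here is a sketch of the "INIT-SIPI-SIPI" sequence expressed as data. It assumes xAPIC mode (8-bit APIC IDs, ICR at LAPIC offsets 0x300/0x310) and uses the delays Intel's MP specification recommends: 10 ms after INIT and 200 us after each SIPI. The `ipi_step` struct and `build_ap_startup` are hypothetical names, and the real-mode trampoline address is whatever the OS has chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICR_INIT    0x00000500u  /* delivery mode 101b: INIT */
#define ICR_STARTUP 0x00000600u  /* delivery mode 110b: Start-up (SIPI) */
#define ICR_ASSERT  0x00004000u  /* level = assert */

struct ipi_step {
    uint32_t icr_high;  /* destination APIC ID in bits 24..31 (offset 0x310) */
    uint32_t icr_low;   /* writing this (offset 0x300) sends the IPI */
    uint32_t delay_us;  /* how long to wait after sending */
};

/* Fills in the three steps for one AP and returns the step count.
 * apic_id must be the ID reported by the MP/ACPI tables, not a CPU index. */
size_t build_ap_startup(uint8_t apic_id, uint32_t trampoline_phys,
                        struct ipi_step steps[3])
{
    /* The SIPI vector is the page number of the real-mode entry point,
     * so the trampoline must be 4 KiB aligned and below 1 MiB. */
    assert((trampoline_phys & 0xFFFu) == 0 && trampoline_phys < 0x100000u);

    uint32_t dest   = (uint32_t)apic_id << 24;
    uint32_t vector = trampoline_phys >> 12;

    steps[0] = (struct ipi_step){ dest, ICR_INIT | ICR_ASSERT, 10000u };
    steps[1] = (struct ipi_step){ dest, ICR_STARTUP | ICR_ASSERT | vector, 200u };
    steps[2] = steps[1];  /* second SIPI, same vector and delay */
    return 3;
}
```

The code that executes these steps would write `icr_high` first, then `icr_low`, wait for the delivery status bit (bit 12 of the low ICR word) to clear, and then wait `delay_us`. A time-out for "the AP never checked in" is a separate, longer wait on top of this.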
TryHarder wrote:For now I wait during AP startup as Intel's spec describes. In particular, I suspect there may be a problem with accessing the APIC through its memory-mapped region: a simple loop that reads the ID on the same CPU sometimes yields strange results:

Code:

while(1) { printf("Lapic id: %d\n", lapic[ID] >> 24); }

Output:

Lapic id: 2
Lapic id: 2
Lapic id: 2
Lapic id: 2
Lapic id: 1 <<< This id is reported only by Linux ACPI and relates to hyper-threading. Should not appear at all
Lapic id: 2
I'd assume that is an entirely different problem. For example, it could just be something dodgy in your "kprintf()" code (or maybe an IRQ handler that isn't saving/restoring all registers it uses).
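One way to test that assumption is to take printing out of the picture: read the ID many times into a counter and print only once at the end. The sketch below assumes `lapic_base` points at the memory-mapped local APIC (xAPIC mode, ID register at byte offset 0x20, ID in bits 24..31); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LAPIC_REG_ID 0x20u  /* byte offset of the local APIC ID register */

/* LAPIC registers are 32 bits wide; volatile forces a real load every time. */
static inline uint8_t lapic_read_id(const volatile uint32_t *lapic_base)
{
    return (uint8_t)(lapic_base[LAPIC_REG_ID / 4] >> 24);
}

/* Reads the ID `samples` times and counts how many reads differ from the
 * first one. No printing happens inside the loop. */
size_t lapic_id_mismatches(const volatile uint32_t *lapic_base, size_t samples)
{
    assert(lapic_base != NULL);
    uint8_t first = lapic_read_id(lapic_base);
    size_t mismatches = 0;
    for (size_t i = 1; i < samples; i++) {
        if (lapic_read_id(lapic_base) != first)
            mismatches++;
    }
    return mismatches;
}
```

If this returns 0 while the printing loop still shows an occasional wrong ID, the register read is probably fine and the print path or an interrupt handler is the more likely culprit.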
TryHarder wrote:And is 'inb 0x84' still good for short delays?
"inb 0x84" was never good for short delays.


Cheers,

Brendan