Problem with booting AP in SMP

limp
Member
Posts: 90
Joined: Fri Jun 12, 2009 7:18 am

Problem with booting AP in SMP

Post by limp »

Hi all,

I am adding SMP support to my OS and I have a problem starting up my APs. Initially, I am using very simple startup code which just writes 0xFF to a memory location (0x1000) that was set to 0x00 beforehand (before setting up SMP). However, after transmitting the SIPI, the code doesn't seem to be executed, since the value at location 0x1000 remains 0. I am also checking the Error Status Register and the delivery status bit in the Interrupt Command Register, and they're both 0, meaning that there was no error transmitting the SIPI and that no IPI is pending.
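The two checks described above only tell you that the local APIC accepted and sent the IPI; they say nothing about whether the AP then executed the trampoline correctly. A minimal C sketch of those checks, assuming xAPIC MMIO mode and hypothetical lapic_write/lapic_read hooks supplied by the kernel. Intel documents the ESR as a register that must be written (any value) just before it is read, so the write comes first.

```c
#include <stdbool.h>
#include <stdint.h>

/* xAPIC register offsets from the local APIC base (MMIO mode assumed) */
#define LAPIC_ESR      0x280u
#define LAPIC_ICR_LOW  0x300u  /* caller reads the ICR low dword from here */

#define ICR_SEND_PENDING      (1u << 12) /* delivery status: 1 = still pending */
#define ESR_SEND_ACCEPT_ERR   (1u << 2)
#define ESR_SEND_ILLEGAL_VEC  (1u << 5)
#define ESR_ILLEGAL_REG_ADDR  (1u << 7)

/* Hypothetical MMIO accessors provided by the kernel */
typedef void (*lapic_write_fn)(uint32_t reg, uint32_t value);
typedef uint32_t (*lapic_read_fn)(uint32_t reg);

/* Write the ESR (0 here) so the register is updated, then read it. */
uint32_t lapic_read_esr(lapic_write_fn wr, lapic_read_fn rd)
{
    wr(LAPIC_ESR, 0);
    return rd(LAPIC_ESR);
}

/* True if the ICR no longer shows a pending send and the ESR reports none
   of the errors checked here. This means the IPI left the sending CPU,
   not that the AP executed anything. */
bool ipi_send_looks_ok(uint32_t icr_low, uint32_t esr)
{
    const uint32_t errors =
        ESR_SEND_ACCEPT_ERR | ESR_SEND_ILLEGAL_VEC | ESR_ILLEGAL_REG_ADDR;
    return (icr_low & ICR_SEND_PENDING) == 0 && (esr & errors) == 0;
}
```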

The AP startup code that I am using is the following:

Code:

.globl  AP_startup_start
.globl  AP_startup_end

.equ	TRAMPOLINE_MAGIC,	0x4D415254

AP_startup_start:

	/* magic */
	.long   TRAMPOLINE_MAGIC

	cli

	xor %ax, %ax
	mov %ax, %cs
	mov %ax, %ds
	mov %ax, %ss
	mov 0x100, %sp

	movb $0xFF,%al
	movb %al,(0x1000)

AP_startup_end:
Could someone suggest something or give a hint on what may be causing the problem?

Thanks in advance!
cyr1x
Member
Posts: 207
Joined: Tue Aug 21, 2007 1:41 am
Location: Germany

Re: Problem with booting AP in SMP

Post by cyr1x »

1. The code must be copied to an address that is a multiple of 0x1000 and below 1MiB.
2. Why do you put the "magic value" at the beginning of the code? The APs will execute that "code"!
3. Use Bochs!
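To illustrate point 1: the SIPI carries an 8-bit vector VV, and the AP starts in real mode at CS = VV00h, IP = 0, i.e. at physical address VV000h. That is why the code has to sit on a 4 KiB boundary below 1 MiB. A small C sketch of that arithmetic (the function names are made up for illustration); 0x7000 is the address the later code in this thread assumes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert the trampoline's physical address into a SIPI vector.
   Fails if the address is not 4 KiB aligned or not below 1 MiB. */
bool sipi_vector_for(uint32_t trampoline_phys, uint8_t *vector_out)
{
    if (trampoline_phys & 0xFFFu)        /* not on a 4 KiB boundary */
        return false;
    if (trampoline_phys >= 0x100000u)    /* not below 1 MiB */
        return false;
    *vector_out = (uint8_t)(trampoline_phys >> 12);
    return true;
}

/* Real-mode CS the AP starts with for a given vector (IP is 0). */
uint16_t ap_start_cs(uint8_t vector)
{
    return (uint16_t)(vector << 8);
}
```

For a trampoline at 0x7000 this gives vector 0x07, and the AP starts at 0700:0000, which is linear address 0x0700 * 16 = 0x7000.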
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Problem with booting AP in SMP

Post by Brendan »

Hi,

Continuing Cyr1x's list:
4. Put a "JMP $" or something at the end (just before "AP_startup_end:") so the AP doesn't execute garbage if/when it does start.
5. Check to make sure your assembler is generating 16-bit code and not 32-bit code (with GAS, that means the .code16 directive).
6. If it still doesn't work (and Bochs didn't help for some reason), post the code that sends the "INIT-SIPI-SIPI" sequence (including time delays, etc).
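For comparison when posting that code, here is a minimal sketch of a targeted (not broadcast) INIT-SIPI-SIPI sequence using the delays from the Intel MP specification's startup algorithm (10 ms after INIT, 200 µs after each SIPI). It assumes xAPIC mode with the ICR at offsets 0x300/0x310 from the local APIC base; the write, read and delay_us hooks are hypothetical and would be provided by your kernel.

```c
#include <stdint.h>

/* xAPIC (MMIO) register offsets from the local APIC base */
#define LAPIC_ICR_LOW   0x300u
#define LAPIC_ICR_HIGH  0x310u

#define ICR_DELIVERY_INIT     (5u << 8)   /* delivery mode 101b */
#define ICR_DELIVERY_STARTUP  (6u << 8)   /* delivery mode 110b */
#define ICR_SEND_PENDING      (1u << 12)  /* delivery status */
#define ICR_LEVEL_ASSERT      (1u << 14)

/* Hypothetical hooks supplied by the kernel */
typedef void (*lapic_write_fn)(uint32_t reg, uint32_t value);
typedef uint32_t (*lapic_read_fn)(uint32_t reg);
typedef void (*delay_us_fn)(uint32_t microseconds);

/* Destination APIC ID goes in bits 24-31 of the ICR high dword */
uint32_t icr_high_for(uint8_t apic_id) { return (uint32_t)apic_id << 24; }
uint32_t icr_init_low(void)            { return ICR_DELIVERY_INIT | ICR_LEVEL_ASSERT; }
uint32_t icr_sipi_low(uint8_t vector)  { return ICR_DELIVERY_STARTUP | ICR_LEVEL_ASSERT | vector; }

/* Write the destination first: writing the low dword is what sends the IPI. */
void send_ipi(lapic_write_fn wr, lapic_read_fn rd, uint8_t apic_id, uint32_t low)
{
    wr(LAPIC_ICR_HIGH, icr_high_for(apic_id));
    wr(LAPIC_ICR_LOW, low);
    while (rd(LAPIC_ICR_LOW) & ICR_SEND_PENDING)
        ; /* no time-out here; a real kernel would want one */
}

/* INIT, 10 ms, SIPI, 200 us, SIPI, 200 us, sent to one APIC ID only */
void start_ap(lapic_write_fn wr, lapic_read_fn rd, delay_us_fn delay_us,
              uint8_t apic_id, uint8_t vector)
{
    send_ipi(wr, rd, apic_id, icr_init_low());
    delay_us(10000);
    send_ipi(wr, rd, apic_id, icr_sipi_low(vector));
    delay_us(200);
    send_ipi(wr, rd, apic_id, icr_sipi_low(vector));
    delay_us(200);
}
```

With the trampoline at 0x7000, the vector passed to start_ap would be 0x07.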


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Problem with booting AP in SMP

Post by Combuster »

adding to that:
7. you can't move to CS
3. use bochs
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
IanSeyler
Member
Posts: 326
Joined: Mon Jul 28, 2008 9:46 am
Location: Ontario, Canada

Re: Problem with booting AP in SMP

Post by IanSeyler »

Check the code here:

http://www.cs.usfca.edu/~cruse/cs630f06/smphello.s

I used this while I was writing my SMP init code.
BareMetal OS - http://www.returninfinity.com/
Mono-tasking 64-bit OS for x86-64 based computers, written entirely in Assembly
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Problem with booting AP in SMP

Post by Brendan »

Hi,
iseyler wrote:Check the code here:

http://www.cs.usfca.edu/~cruse/cs630f06/smphello.s

I used this while I was writing my SMP init code.
There are a few problems with that code:
  • You shouldn't assume the local APIC/s are at 0xFEE00000. It's likely they are, but this isn't guaranteed. Parse the MP specification tables and/or ACPI tables instead.
  • You shouldn't assume that the computer isn't single-CPU (they still exist - I bought a new computer last week with VIA's newest "Nano" CPU in it - it's VIA's first 64-bit CPU, but I guess they haven't figured out how to do dual core yet).
  • You shouldn't broadcast the INIT-SIPI-SIPI sequence, because it can wake faulty CPUs that have been disabled (failed their BIST/Built In Self Test), and it defeats the user's ability to disable hyper-threading in the BIOS (which is often done on NetBurst CPUs, as hyper-threading can make performance worse).
  • If you combine the last 2 things, it means you need to parse the MP specification tables and/or ACPI tables to find out the local APIC ID for any CPUs that are present, and send the INIT-SIPI-SIPI sequence to each CPU that you know exists (one by one). This also means that you can detect when a CPU fails to respond to the INIT-SIPI-SIPI sequence with a time-out (which isn't a bad idea).
  • You don't need a 200 ms delay after sending the second SIPI.
  • For some CPUs, you don't need the second SIPI at all; and the AP CPU can start executing code before you've sent the second SIPI. For example (for this code), the AP could receive the first SIPI, then execute the "lock incw n_cpu" instruction, then receive the second SIPI, then execute the "lock incw n_cpu" instruction again; and you'll think there's 7 CPUs when there's only 4. To avoid problems caused by this it's best to have some sort of "wait until BSP sends the second SIPI and clears a flag" code very close to the beginning of the AP startup code; or to use some other synchronization. If you're smart, you can avoid the second SIPI and some of the 200 us delay after the first SIPI on some systems.
  • I wouldn't rely on PIT timer 2. If the user holds their finger on a key then the BIOS keyboard buffer will fill up, and the BIOS will use this timer to generate a beep (which could stuff up the delay, or worse). You might think this is unlikely, but I've kept my finger on the DELETE key for ages before (trying to get into the BIOS setup screen, and not realizing the correct key is actually F2). I'd also be worried that the BIOS is using SMM and HPET to emulate the timer, or that the BIOS is using SMM to emulate the keyboard controller (for USB keyboard/mouse), and wouldn't assume that this emulation isn't extremely dodgy.
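The approach in the points above can be sketched in C as a walk over the ACPI MADT: it picks up the local APIC address (including a 64-bit Local APIC Address Override entry, if present) and the APIC IDs of processors whose Enabled flag is set, so disabled CPUs are never woken. This is a sketch under assumptions: the table is already mapped, its checksum has already been verified, and x2APIC entries (type 9) and the MP specification tables are not handled. The BSP would then send INIT-SIPI-SIPI to each ID except its own, one CPU at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MADT_LAPIC_ADDR_OFF  36  /* right after the 36-byte ACPI SDT header */
#define MADT_ENTRIES_OFF     44  /* after LAPIC address (4) and flags (4) */
#define MAX_CPUS             64

struct madt_info {
    uint64_t lapic_base;          /* physical address of the local APICs */
    uint8_t  apic_ids[MAX_CPUS];  /* APIC IDs of enabled processors only */
    size_t   cpu_count;
};

/* Little-endian reads that don't depend on alignment or host byte order */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd64(const uint8_t *p)
{
    return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

/* Returns 0 on success, -1 if the table looks malformed. */
int madt_parse(const uint8_t *madt, struct madt_info *out)
{
    if (memcmp(madt, "APIC", 4) != 0)
        return -1;
    uint32_t length = rd32(madt + 4);
    if (length < MADT_ENTRIES_OFF)
        return -1;

    out->lapic_base = rd32(madt + MADT_LAPIC_ADDR_OFF);
    out->cpu_count = 0;

    for (uint32_t off = MADT_ENTRIES_OFF; off + 2 <= length;) {
        uint8_t type = madt[off];
        uint8_t len = madt[off + 1];
        if (len < 2 || off + len > length)
            return -1;
        if (type == 0 && len >= 8) {
            /* Processor Local APIC: [2] ACPI UID, [3] APIC ID, [4..7] flags */
            uint32_t flags = rd32(madt + off + 4);
            if ((flags & 1u) && out->cpu_count < MAX_CPUS)   /* bit 0: Enabled */
                out->apic_ids[out->cpu_count++] = madt[off + 3];
        } else if (type == 5 && len >= 12) {
            /* Local APIC Address Override: 64-bit address at [4..11] */
            out->lapic_base = rd64(madt + off + 4);
        }
        off += len;
    }
    return 0;
}
```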

Cheers,

Brendan
cyr1x
Member
Posts: 207
Joined: Tue Aug 21, 2007 1:41 am
Location: Germany

Re: Problem with booting AP in SMP

Post by cyr1x »

Brendan wrote: [*]You shouldn't broadcast the INIT-SIPI-SIPI sequence, ...
I wonder why Intel do exactly this in their manuals. Are they just lazy or is there something ...?
Or is that code only meant for the BIOS?
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Problem with booting AP in SMP

Post by Owen »

Brendan wrote:
  • You shouldn't assume the local APIC/s are at 0xFEE00000. It's likely they are, but this isn't guaranteed. Parse the MP specification tables and/or ACPI tables instead.
Any reason not to just read the APIC base MSR?

Not that it's relevant to me - I remap all my APICs to a determined section of physical address space (The very top of whatever the processor supports) anyway - though I am assuming that all long mode capable CPUs support doing this.
geppyfx
Member
Posts: 87
Joined: Tue Apr 28, 2009 4:58 pm

Re: Problem with booting AP in SMP

Post by geppyfx »

Brendan wrote:and send the INIT-SIPI-SIPI sequence to each CPU that you know exists (one by one). This also means that you can detect when a CPU fails to respond to the INIT-SIPI-SIPI sequence with a time-out (which isn't a bad idea).
I am wondering whether anything in the CPU or on a multi-socket motherboard would prevent correct delivery of INIT and SIPI to the processors if I send INIT->CPU1, INIT->CPU2, SIPI->CPU1, INIT->CPU3, SIPI->CPU2, SIPI->CPU3?
I think I am OK with the code to detect which x86-64 CPUs started and which didn't.
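On the detection side, one option is a per-CPU flag indexed by APIC ID that each AP sets itself and the BSP polls with a time-out, so the order in which the IPIs were sent doesn't matter for detection. Using an atomic exchange also means an AP that runs the marking code twice (for example after being restarted by the second SIPI, as Brendan describes) is only counted once. This is a different technique from Brendan's "wait until the BSP clears a flag" suggestion: it only makes the counting idempotent and does not stop the second SIPI from restarting an AP mid-initialization. ap_mark_started, bsp_wait_for_ap and the delay_us hook are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_APIC_IDS 256

/* One flag per (xAPIC) APIC ID; static storage starts out all false. */
static atomic_bool ap_started[MAX_APIC_IDS];
static atomic_uint cpus_online = 1;   /* counts the BSP */

/* AP side, called early by each AP with its own APIC ID.
   atomic_exchange returns the old value, so a second call for the
   same AP does not increment the counter again. */
void ap_mark_started(uint8_t apic_id)
{
    if (!atomic_exchange(&ap_started[apic_id], true))
        atomic_fetch_add(&cpus_online, 1);
}

/* BSP side: poll one AP's flag for up to timeout_us microseconds.
   delay_us is a hypothetical busy-wait hook; it is only called while
   the flag is still clear. Returns false if the AP never checked in. */
bool bsp_wait_for_ap(uint8_t apic_id, uint32_t timeout_us,
                     void (*delay_us)(uint32_t))
{
    for (uint32_t waited = 0; waited < timeout_us; waited += 10) {
        if (atomic_load(&ap_started[apic_id]))
            return true;
        delay_us(10);
    }
    return atomic_load(&ap_started[apic_id]);
}
```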
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Problem with booting AP in SMP

Post by Brendan »

Hi,
cyr1x wrote:
Brendan wrote: [*]You shouldn't broadcast the INIT-SIPI-SIPI sequence, ...
I wonder why Intel do exactly this in their manuals. Are they just lazy or is there something ...?
Or is that code only meant for the BIOS?
AFAIK Intel's example in the Software Developer's Manual is for BIOS developers.

Intel's Multi-processor Specification has "Appendix B - Operating System Programming Guidelines", which doesn't directly state "You must not broadcast", but does have at least one clue (bold highlighting is mine):
Intel MP Spec, B.4 Application Processor Startup wrote:If the MP configuration table does not exist on an MP-compliant system, the system must be of default configuration type. The MP specification requires local APIC IDs to be numbered sequentially, starting at zero for all default configurations. As a result, the BSP can determine the AP’s local APIC ID in default, two-processor configurations by reading its own local APIC ID. Since there are only two possible local APIC IDs in this case, zero and one, when the APIC ID of the BSP is one, the APIC ID of the AP is zero, and vice versa. This is important, because a BSP cannot start up an AP unless it already knows the local APIC ID.
You don't need to know an AP CPU's APIC ID if you broadcast, but Intel says an OS must know an AP CPU's APIC ID...

Of course this document was last updated in 1997, which is several years before Intel introduced hyper-threading.

Owen wrote:
Brendan wrote:
  • You shouldn't assume the local APIC/s are at 0xFEE00000. It's likely they are, but this isn't guaranteed. Parse the MP specification tables and/or ACPI tables instead.
Any reason not to just read the APIC base MSR?
The APIC base MSR would be fine, except that it's a Model Specific Register. Basically it's not supported on 80486 or Pentium CPUs (and only became an "Architectural MSR" with P6). If you check that the CPU is a P6 or later Intel CPU first then it should be fine. For other CPU manufacturers (especially smaller ones like Cyrix, SiS, VIA, etc) you'll probably crash on any CPU; although maybe that can be avoided by checking the CPUID feature flags (on CPUs that support CPUID).

To be honest, you need to parse the MP Specification tables and/or ACPI tables anyway (to get local APIC IDs), and reading the local APIC base MSR sounds like more hassle than it's worth to me.
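If you do want to read the MSR, here is a sketch of the CPUID guard hinted at above plus the decoding of IA32_APIC_BASE (MSR 0x1B): check CPUID leaf 1 EDX for the MSR (bit 5) and APIC (bit 9) feature flags first, then take bit 8 (BSP), bit 11 (global enable) and the 4 KiB aligned base. It assumes the caller has already executed CPUID and RDMSR and knows the physical address width; it does not help on CPUs without CPUID at all.

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR  0x1Bu          /* read with RDMSR (not shown) */
#define APIC_BASE_BSP       (1ull << 8)    /* this CPU is the BSP */
#define APIC_BASE_ENABLE    (1ull << 11)   /* APIC global enable */

#define CPUID1_EDX_MSR      (1u << 5)      /* RDMSR/WRMSR supported */
#define CPUID1_EDX_APIC     (1u << 9)      /* on-chip local APIC present */

struct apic_base {
    uint64_t phys_base;
    bool     is_bsp;
    bool     enabled;
};

/* cpuid1_edx: EDX from CPUID with EAX = 1, obtained by the caller */
bool can_read_apic_base_msr(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID1_EDX_MSR) && (cpuid1_edx & CPUID1_EDX_APIC);
}

/* msr: raw IA32_APIC_BASE value; maxphyaddr: physical address width in bits,
   assumed to be known by the caller (e.g. from CPUID leaf 0x80000008). */
struct apic_base decode_apic_base(uint64_t msr, unsigned maxphyaddr)
{
    uint64_t width_mask = maxphyaddr >= 64 ? ~0ull : (1ull << maxphyaddr) - 1;
    struct apic_base b = {
        .phys_base = msr & width_mask & ~0xFFFull,  /* base is 4 KiB aligned */
        .is_bsp    = (msr & APIC_BASE_BSP) != 0,
        .enabled   = (msr & APIC_BASE_ENABLE) != 0,
    };
    return b;
}
```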
Owen wrote:Not that it's relevant to me - I remap all my APICs to a determined section of physical address space (The very top of whatever the processor supports) anyway - though I am assuming that all long mode capable CPUs support doing this.
You remap the local APICs simply because you think you can, or are you trying to make sure that PCI MSI (Message Signalled Interrupts) won't work for 32-bit PCI cards?


Cheers,

Brendan
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Problem with booting AP in SMP

Post by Owen »

Brendan wrote:You remap the local APICs simply because you think you can, or are you trying to make sure that PCI MSI (Message Signalled Interrupts) won't work for 32-bit PCI cards?
Grr. I hadn't realised PCI MSI required LAPIC access; I kind of assumed it was handled more inside the PCI controller. Fortunately, I don't suppose there's any reason I need access to all of the CPUs' APICs; it's just handy to have.
cyr1x
Member
Posts: 207
Joined: Tue Aug 21, 2007 1:41 am
Location: Germany

Re: Problem with booting AP in SMP

Post by cyr1x »

One CPU cannot access the LAPIC of a second CPU, if you meant that.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Problem with booting AP in SMP

Post by Owen »

It was a while ago, but I remember the Intel manuals saying this was the case if they were remapped. Perhaps my memory is faulty. I'm not sure about on AMD platforms; I'll have to recheck my AMD manuals.
limp
Member
Posts: 90
Joined: Fri Jun 12, 2009 7:18 am

Re: Problem with booting AP in SMP

Post by limp »

cyr1x wrote:2. Why do you put the "magic value" at the beginning of the code? The APs will execute that "code"!
Oops!!
Combuster wrote:7. you can't move to CS
Yeah, I know, silly mistake!

Firstly, I would like to thank you all for your help so far, I really appreciate it!

So after fixing these two errors, the code is running properly. After that, I am trying to load a temp GDT, switch to protected mode and jump to a 32-bit segment using ljmp. From there, I am doing a

Code:

   movb $0x55,%al
   movb %al,(0x1000)
to make sure that I am actually jumping there. However, the value at 0x1000 stays at 0xFF and never changes to 0x55, which suggests that the jump isn't succeeding.

I have put my GDT at 0x15004 to 0x1501B (3 * 8 bytes) and I am doing something like this:

Code:

.globl  AP_startup_start
.globl  AP_startup_end
.globl AP_flush

.text
.code16
AP_startup_start:
	cli

	xor %ax, %ax
	mov %ax, %ds
	mov %ax, %ss
	mov 0x100, %sp

	movb $0xFF,%al
	movb %al,(0x1000)

	/* Load temporary GDT */

       /* Move Limit (16-bits) to 0x1501C (Limit is (3*8) -1)) */
	movl $0x1501C, %eax
	movw $0x17, (%eax)

       /* Move Base (32-bits) to 0x15020 */
	movl $0x15020, %eax
	movl $0x15004, (%eax)

       /* Load from Limit  */
	movl $0x1501C, %eax
	lgdt (%eax)

	/* Switch to Protected Mode */
	mov %cr0,%eax
	inc %eax
	mov %eax,%cr0

	/* Jmp to AP_flush */
	ljmp $0x8, $(0x7000 + AP_flush - AP_startup_start)

.code32
AP_flush:
	mov $0x10, %eax
	mov %eax, %ds
	mov %eax, %es
	mov %eax, %fs
	mov %eax, %gs
	mov %eax, %ss

	/*For debugging*/
	movb $0x55,%al
	movb %al,(0x1000)
AP_startup_end:
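Two details in the code above are worth checking. First, LGDT reads a 6-byte pseudo-descriptor: the 16-bit limit followed immediately by the base, so with the limit at 0x1501C the base has to be at 0x1501E, not 0x15020. Second, as far as I know a freshly started AP runs in real mode with 64 KiB segment limits, so using offsets such as 0x1501C with DS = 0 should fault before PE is ever set. The helpers below (hypothetical names) encode a flat segment descriptor and pack the GDTR; with base 0, limit 0xFFFFF and flags 0xC they produce the familiar 0x00CF9A000000FFFF (code) and 0x00CF92000000FFFF (data) entries.

```c
#include <stdint.h>

/* Pack one 8-byte legacy segment descriptor.
   access: P/DPL/S/type byte (0x9A = ring-0 code, 0x92 = ring-0 data)
   flags:  G, D/B, L, AVL nibble (0xC = 4 KiB granularity, 32-bit) */
uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);            /* limit 15:0  -> bits 0-15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;     /* base 23:0   -> bits 16-39 */
    d |= (uint64_t)access << 40;                 /* access      -> bits 40-47 */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48; /* limit 19:16 -> bits 48-51 */
    d |= (uint64_t)(flags & 0xFu) << 52;         /* flags       -> bits 52-55 */
    d |= (uint64_t)(base >> 24) << 56;           /* base 31:24  -> bits 56-63 */
    return d;
}

/* The 6-byte operand LGDT reads: 16-bit limit, then the 32-bit base
   immediately after it, with no padding in between. */
void pack_gdtr(uint8_t out[6], uint32_t gdt_base, uint16_t gdt_limit)
{
    out[0] = (uint8_t)(gdt_limit & 0xFF);
    out[1] = (uint8_t)(gdt_limit >> 8);
    for (int i = 0; i < 4; i++)
        out[2 + i] = (uint8_t)(gdt_base >> (8 * i));
}
```

For the GDT at 0x15004 with limit 3 * 8 - 1 = 0x17, pack_gdtr yields the bytes 17 00 04 50 01 00.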
frank
Member
Posts: 729
Joined: Sat Dec 30, 2006 2:31 pm
Location: East Coast, USA

Re: Problem with booting AP in SMP

Post by frank »

I don't see where you set up the temporary GDT. I would use the already-booted processor to build a GDT and GDT_LOC at fixed addresses in memory, then just do an lgdt fixed_address. The way the code is set up now, you would have complications if you started more than one processor at a time. It might also be a good idea to 16-byte align structures in memory that the CPU has to access.
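Frank's suggestion could look roughly like this on the BSP side: copy the trampoline into its low-memory page, then place a ready-made GDT and GDTR at fixed, 16-byte aligned offsets in the same page, so each AP only has to execute lgdt on a fixed address and the APs never write to shared data. The layout (page at 0x7000, GDT at offset 0xF00, GDTR at offset 0xF20) and the function name are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the low-memory trampoline page */
#define TRAMPOLINE_PHYS  0x7000u
#define TRAMP_GDT_OFF    0xF00u   /* 16-byte aligned, 3 entries = 24 bytes */
#define TRAMP_GDTR_OFF   0xF20u   /* 16-byte aligned, after the GDT */

static void put64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));   /* little-endian */
}

/* page: the 4 KiB trampoline page as the BSP sees it.
   Returns -1 if the code would overlap the GDT area. */
int prepare_trampoline(uint8_t *page, const uint8_t *code, size_t code_len)
{
    if (code_len > TRAMP_GDT_OFF)
        return -1;
    memcpy(page, code, code_len);

    uint8_t *gdt = page + TRAMP_GDT_OFF;
    put64(gdt + 0,  0);                       /* null descriptor */
    put64(gdt + 8,  0x00CF9A000000FFFFull);   /* selector 0x08: flat 32-bit code */
    put64(gdt + 16, 0x00CF92000000FFFFull);   /* selector 0x10: flat 32-bit data */

    /* GDTR: 16-bit limit, then the 32-bit physical base of the GDT */
    uint8_t *gdtr = page + TRAMP_GDTR_OFF;
    uint32_t gdt_phys = TRAMPOLINE_PHYS + TRAMP_GDT_OFF;
    uint16_t limit = 3 * 8 - 1;
    gdtr[0] = (uint8_t)(limit & 0xFF);
    gdtr[1] = (uint8_t)(limit >> 8);
    for (int i = 0; i < 4; i++)
        gdtr[2 + i] = (uint8_t)(gdt_phys >> (8 * i));
    return 0;
}
```

With DS = 0, the trampoline could then load the GDT with something like lgdtl 0x7F20, an offset that stays within the 64 KiB real-mode limit.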