I troubled in X2APIC and MP system

Js2xxx
Member
Posts: 48
Joined: Sat Dec 31, 2016 1:43 am
Libera.chat IRC: wrgq
Location: China

Post by Js2xxx »

My OS loader enters long mode, and I have enabled x2APIC (correctly, as far as I can tell). I run the loader in VMware with the number of CPUs set to 2. I then sent INIT-SIPI-SIPI to the other processor, but it had no effect: nothing showed up on the screen.
The code that sends the IPIs looks like this:

Code: Select all

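    mov   ecx, IA32_X2APIC_ICR   ; ICR MSR (I'm sure the MSR address is right)
    mov   edx, 0                 ; destination field (not used with a shorthand)
    mov   eax, 0xC4500           ; INIT, level assert, shorthand "all excluding self"
    wrmsr
    nop
    nop
    nop
    mov   eax, 0xC4690           ; SIPI, vector 0x90, shorthand "all excluding self"
    wrmsr
    nop
    nop
    wrmsr                        ; second SIPI (EAX/EDX unchanged)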
Doing steadfastly, or doing nil.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Post by Brendan »

Hi,

Broadcasting the INIT-SIPI-SIPI sequence is a huge mistake. There are supposed to be delays between the IPIs, and you have none (on modern CPUs a NOP only consumes space, not time; and even if it did consume time, it wouldn't be anywhere near long enough). There should also be a time-out for the last SIPI that causes some sort of error message (e.g. "CPU #123 failed to start").
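
To make the "broadcast" point concrete, here is a small sketch in C (not from the original post; the names `icr_init`, `icr_sipi` and `icr_shorthand` are made up for illustration). It builds ICR values that target one specific CPU instead of using the "all excluding self" shorthand. In x2APIC mode the ICR is a single 64-bit MSR, with the destination x2APIC ID in bits 63:32 (the EDX half of the WRMSR).

```c
#include <stdint.h>

/* Field encodings of the local APIC Interrupt Command Register (ICR). */
enum {
    ICR_MODE_INIT    = 5u << 8,   /* delivery mode 101b: INIT               */
    ICR_MODE_STARTUP = 6u << 8,   /* delivery mode 110b: Start-up (SIPI)    */
    ICR_LEVEL_ASSERT = 1u << 14,  /* level = assert                         */
    ICR_ALL_BUT_SELF = 3u << 18   /* shorthand 11b: all excluding self      */
};

/* INIT IPI aimed at one CPU: no shorthand, destination ID in bits 63:32. */
static uint64_t icr_init(uint32_t apic_id)
{
    return ((uint64_t)apic_id << 32) | ICR_LEVEL_ASSERT | ICR_MODE_INIT;
}

/* Start-up IPI aimed at one CPU; the vector selects the 4 KiB start page. */
static uint64_t icr_sipi(uint32_t apic_id, uint8_t vector)
{
    return ((uint64_t)apic_id << 32) | ICR_LEVEL_ASSERT | ICR_MODE_STARTUP | vector;
}

/* Extract the destination-shorthand field (bits 19:18) of an ICR value. */
static unsigned icr_shorthand(uint64_t icr)
{
    return (unsigned)((icr >> 18) & 3u);
}
```

With these helpers, `icr_shorthand(0xC4500)` gives 3 (the broadcast used in the original code), while `icr_init(1)` gives 0x0000000100004500, i.e. EDX = 1 and EAX = 0x4500 for the WRMSR.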

I'd also recommend deviating slightly from Intel's sequence by putting a time-out on the first SIPI too. The idea is that you only send the second SIPI if the time-out for the first SIPI expires, so that if the CPU starts on the first SIPI you skip the second one.
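
The sequence described above can be sketched as a small decision table in C. This is only an illustration: the names `ap_step`, `ap_wait_us` and `ap_next_step` are hypothetical, the 10 ms delay after INIT is the one from Intel's sequence, and the two SIPI time-outs (1 ms and 100 ms) are assumed values that you would tune yourself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Steps of the per-CPU start-up sequence, plus the two final outcomes. */
enum ap_step { AP_INIT, AP_SIPI1, AP_SIPI2, AP_STARTED, AP_FAILED };

/* How long to wait (polling the "CPU started" flag) after performing a step. */
static uint32_t ap_wait_us(enum ap_step step)
{
    switch (step) {
    case AP_INIT:  return 10000;   /* 10 ms after INIT, as in Intel's sequence */
    case AP_SIPI1: return 1000;    /* assumed time-out for the first SIPI      */
    case AP_SIPI2: return 100000;  /* assumed time-out for the second SIPI     */
    default:       return 0;       /* final states: nothing to wait for        */
    }
}

/* Given the step just performed and whether the AP reported in during the
   wait, decide what to do next. */
static enum ap_step ap_next_step(enum ap_step step, bool ap_started)
{
    switch (step) {
    case AP_INIT:  return AP_SIPI1;
    case AP_SIPI1: return ap_started ? AP_STARTED : AP_SIPI2;  /* skip 2nd SIPI if up */
    case AP_SIPI2: return ap_started ? AP_STARTED : AP_FAILED; /* report an error     */
    default:       return step;                                /* final states stay   */
    }
}
```

The BSP would loop: perform the current step (send the IPI), wait up to `ap_wait_us(step)` while checking whether the CPU started, then call `ap_next_step()`. Reaching `AP_FAILED` is where the "CPU #123 failed to start" message would go.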
Js2xxx wrote: My OS loader enters long mode, and I have enabled x2APIC (correctly, as far as I can tell). I run the loader in VMware with the number of CPUs set to 2. I then sent INIT-SIPI-SIPI to the other processor, but it had no effect: nothing showed up on the screen.
How do you know if the CPUs started or not? For example, what if the CPUs did start but crashed soon after starting?


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by Js2xxx »

Dear Sir or Madam,
I've fixed that now. I then placed the bytes 'EA 00 01 00 90' (which encode 'jmp 9000h:0100h') at address 0x90000, which is the APs' start-up address. The code that initializes the APs is at 9000h:0100h. But the CPU resets: the APs receive the SIPI, but fail to execute the initialization code. Is something wrong? Or should I load a separate module for the APs to execute?
Thanks for any help.

Post by Brendan »

Hi,
Js2xxx wrote: I've fixed that now. I then placed the bytes 'EA 00 01 00 90' (which encode 'jmp 9000h:0100h') at address 0x90000, which is the APs' start-up address. The code that initializes the APs is at 9000h:0100h. But the CPU resets: the APs receive the SIPI, but fail to execute the initialization code. Is something wrong? Or should I load a separate module for the APs to execute?
Thanks for any help.
In real mode, 9000h:0100h is not (32-bit physical) address 0x00090000; it's 0x00090100 (segment * 16 + offset).
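
As a quick check of that arithmetic, here is a minimal C sketch (the function names are made up for illustration). It computes the physical address of a real-mode segment:offset pair and decodes the five bytes quoted above.

```c
#include <stdint.h>

/* Physical address of a real-mode segment:offset pair. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Decode a 5-byte real-mode far jump: opcode EA, then the 16-bit offset and
   the 16-bit segment, both little-endian. Returns the physical target, or
   0xFFFFFFFF if the first byte is not EA. */
static uint32_t far_jmp_target(const uint8_t code[5])
{
    if (code[0] != 0xEA)
        return 0xFFFFFFFFu;
    uint16_t offset  = (uint16_t)(code[1] | (code[2] << 8));
    uint16_t segment = (uint16_t)(code[3] | (code[4] << 8));
    return real_mode_phys(segment, offset);
}
```

For the bytes EA 00 01 00 90 this yields 0x00090100, while `real_mode_phys(0x9000, 0)` yields 0x00090000.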

Typically you put some "trampoline code and data" at an address that can be put in the "vector" field of the SIPI where that code is designed for "CS = address >> 4". For example, you might use "vector = 0x90", copy your "trampoline code and data" to 0x00090000, and design that "trampoline code and data" to run with "CS = 0x9000". You don't need any JMP.
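
The relationship between the trampoline address, the SIPI vector and the starting CS can be sketched like this in C (hypothetical helper names, not part of any real API). The AP starts executing at CS:IP = (vector << 8):0000, so the trampoline has to sit on a 4 KiB boundary below 1 MiB.

```c
#include <stdint.h>

/* SIPI vector for a trampoline at physical address phys.
   Returns -1 if the address is not 4 KiB aligned or not below 1 MiB. */
static int sipi_vector_for(uint32_t phys)
{
    if (phys >= 0x100000u || (phys & 0xFFFu) != 0)
        return -1;
    return (int)(phys >> 12);
}

/* CS value the AP starts with for a given SIPI vector (IP starts at 0). */
static uint16_t sipi_start_cs(uint8_t vector)
{
    return (uint16_t)(vector << 8);
}
```

For a trampoline copied to 0x00090000 this gives vector 0x90 and CS = 0x9000, matching the example above; an address such as 0x00090100 cannot be expressed as a vector at all.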

Typically that "trampoline code and data" might do something like (for plain 32-bit paging, untested):

Code: Select all

    mov byte [cs:0x0800],1    ;Set flag to tell BSP that the CPU did start

.wait:
    cmp byte [cs:0x0800],2    ;Has BSP acknowledged that the CPU has started?
    jne .wait                 ; no, wait until it does

    mov eax,0x80000001        ;eax = value for CR0
    mov bx,0x0010             ;bx = value for data segments
    xor cx,cx                 ;cx = zero
    mov edx,[cs:0x0804]       ;edx = value to load into CR3
    mov esp,[cs:0x0808]       ;Set ESP to whatever address BSP allocated for this CPU's stack

    lgdt [cs:0x0120]          ;Load GDT
    mov cr3,edx               ;Load page directory
    mov cr0,eax               ;Enable protected mode and paging
    mov ss,bx                 ;Set SS to "big flat data"
    mov ds,bx                 ;Set DS to "big flat data"
    mov es,bx                 ;Set ES to "big flat data"
    mov fs,cx                 ;Set FS to "NULL"
    mov gs,cx                 ;Set GS to "NULL"

    jmp far [dword cs:0x0810] ;Jump to kernel's entry point
The BSP would prepare the data in the trampoline's code (including clearing the flag used for synchronisation, allocating a stack for the CPU to use, etc); then it'd send the INIT-SIPI-SIPI to the AP CPU while monitoring the flag used for synchronisation (so it knows if/when the AP CPU starts).
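
A rough C sketch of the BSP side of that preparation, using the offsets from the (untested) trampoline above. The function name and parameters are hypothetical, and the layout of the entry point at 0x0810 (a 32-bit offset followed by a 16-bit selector) is an assumption about what the far jump expects.

```c
#include <stdint.h>

/* Offsets of the data fields inside the trampoline page (from the code above). */
enum {
    TRAMP_FLAG  = 0x0800,  /* 1-byte synchronisation flag             */
    TRAMP_CR3   = 0x0804,  /* 32-bit value to load into CR3           */
    TRAMP_ESP   = 0x0808,  /* 32-bit stack pointer for this CPU       */
    TRAMP_ENTRY = 0x0810   /* far pointer to the kernel's entry point */
};

/* Store a 32-bit value in little-endian order, whatever the host byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill in the per-CPU data before sending INIT-SIPI-SIPI to the AP. */
static void trampoline_prepare(uint8_t *page, uint32_t cr3, uint32_t stack_top,
                               uint32_t entry, uint16_t code_selector)
{
    page[TRAMP_FLAG] = 0;                    /* clear the synchronisation flag */
    put_le32(page + TRAMP_CR3, cr3);
    put_le32(page + TRAMP_ESP, stack_top);
    put_le32(page + TRAMP_ENTRY, entry);     /* assumed layout: offset first...  */
    page[TRAMP_ENTRY + 4] = (uint8_t)(code_selector & 0xFF);  /* ...then selector */
    page[TRAMP_ENTRY + 5] = (uint8_t)(code_selector >> 8);
}
```

Here `page` would point at the copy of the trampoline (e.g. the 4 KiB at 0x00090000), and the BSP would call this once per AP before starting it.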

Note that the value for CR3 may be a special value that is only used by trampolines (where the trampoline's code and data is identity mapped).

After the AP CPU enters the kernel, code in the kernel might set the flag used for synchronisation to 3 to tell the BSP that the trampoline is no longer being used by the AP CPU, and allow the BSP to recycle the trampoline for the next CPU or free any temporary stuff (e.g. the page directory it created to identity map the trampoline and the page used for the trampoline itself).
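
The flag values used above (cleared, 1, 2, 3) form a small handshake. Here is a minimal C sketch of one polling step on the BSP side; the names are made up, and treating 0 as "idle" is an assumption based on the BSP clearing the flag beforehand.

```c
#include <stdint.h>

/* Values of the synchronisation flag at offset 0x0800 in the trampoline. */
enum {
    TRAMP_IDLE       = 0,  /* cleared by the BSP before starting the AP */
    TRAMP_AP_STARTED = 1,  /* set by the AP when it begins executing    */
    TRAMP_BSP_ACKED  = 2,  /* set by the BSP to let the AP continue     */
    TRAMP_AP_DONE    = 3   /* set by the kernel: trampoline is free     */
};

/* One polling step on the BSP. Returns 1 when the trampoline (and any
   temporary page directory) can be recycled or freed, 0 otherwise. */
static int bsp_poll_flag(volatile uint8_t *flag)
{
    switch (*flag) {
    case TRAMP_AP_STARTED:
        *flag = TRAMP_BSP_ACKED;   /* releases the AP from its .wait loop */
        return 0;
    case TRAMP_AP_DONE:
        *flag = TRAMP_IDLE;        /* ready for the next CPU */
        return 1;
    default:
        return 0;                  /* nothing new yet */
    }
}
```

The BSP would call this repeatedly (with a time-out) after sending the IPIs; seeing the flag become 1 is also what tells it that the CPU actually started.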

Also note that keeping the data in the trampoline (and using CS to access it) allows you to have 2 or more independent "trampoline pages" and start 2 or more CPUs at the same time (without them getting the wrong stack, or interfering with each other's synchronisation flag, etc).


Cheers,

Brendan

Post by Js2xxx »

In the end, I solved the problem by loading a separate module for the APs. Thanks anyway.