SMP Trampoline Template

Post by Ethin »

I'm trying to create an SMP trampoline template that I can use to bootstrap my other processors. Thus far I've modified the code from the entering long mode article to use my paging table address from CR3 in the BSP as well as the SIDT and SGDT instructions to get the address for my IDT and GDT. Is this code correct?

Code:

%define CODE_SEG     0x0008
%define DATA_SEG     0x0010
ALIGN 4
SwitchToLongMode:
push di
mov ecx, 0x1000
xor eax, eax
cld
rep stosd
pop di
mov al, 0xff
out 0xa1, al
out 0x21, al
nop
nop
lidt [idtaddr]
mov eax, 0xa0
mov cr4, eax
mov edx, cr3_addr_from_bsp
mov cr3, edx
mov ecx, 0xc0000080
rdmsr
or eax, 0x00000100
wrmsr
mov ebx, cr0
or ebx, 0x80000001
mov cr0, ebx
lgdt [gdt_from_bsp]

[BITS 64]
LongMode:
mov ax, DATA_SEG
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
Where idtaddr, gdt_from_bsp, and cr3_addr_from_bsp are the values from SIDT, SGDT, and CR3 from the BSP. The problem is that I'm not really sure where to jump to to flip an atomic counter in my kernel, for example. One idea I had to solve this was to use the ICR register to send an interrupt to the BSP to let it know "Hey, this processor is done initializing". Also, should I set up the APIC in this code, or do that later?
Edit: Fixed a typo

Post by bzt »

Ethin wrote:I'm trying to create an SMP trampoline template that I can use to bootstrap my other processors.
There's example assembly in the Intel manual showing how to do exactly that. See also this thread.
Ethin wrote:The problem is that I'm not really sure where to jump to to flip an atomic counter in my kernel, for example. One idea I had to solve this was to use the ICR register to send an interrupt to the BSP to let it know "Hey, this processor is done initializing".
You should probably do this on the BSP:

Code:

static volatile int cntcpu = 1; /* start with one, because we are on BSP */
/* start up the APs here: send INIT, SIPI, etc. */
while(cntcpu < numcpu); /* spin loop, wait for APs */
On each AP, as soon as the core is ready to receive work from the BSP:

Code:

atomic_inc(cntcpu);
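
A minimal sketch of the counter idea above, written with C11 atomics rather than a plain volatile int. The names cpus_online, ap_report_ready and bsp_wait_for_aps are made up for illustration, and the bounded spin count is an assumption so that a missing AP cannot hang the BSP forever:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Starts at 1 because the BSP is already running. */
static atomic_uint cpus_online = 1;

/* Called exactly once by each AP when it is ready to take work. */
void ap_report_ready(void)
{
    atomic_fetch_add_explicit(&cpus_online, 1, memory_order_release);
}

/* Called by the BSP after sending the IPIs. Polls up to max_spins times
 * and returns true once at least `expected` CPUs have checked in. */
bool bsp_wait_for_aps(unsigned expected, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&cpus_online, memory_order_acquire) >= expected)
            return true;
    }
    return false;
}
```

In a real kernel the spin bound would more likely be a timer-based timeout, and the BSP could log which CPUs are missing when the call returns false.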
Ethin wrote:Also, should I set up the APIC in this code, or do that later?
Probably before. You'll need the local APIC to send the SIPI. You must also set up the APIC on each core (the I/O APIC needs one global initialization, but every core has its own local APIC, which is why it's called "local").

Cheers,
bzt

Post by Octocontrabass »

Ethin wrote:Thus far I've modified the code from the entering long mode article to use my paging table address from CR3 in the BSP as well as the SIDT and SGDT instructions to get the address for my IDT and GDT.
Keep in mind you can't load the upper 32 bits of those registers before you enter long mode. This will be a problem if CR3 or the GDTR point to an address above 4GiB.
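
One way to catch this early is to have the BSP check the values before it writes them into the trampoline. A small sketch, assuming a hypothetical helper name fits_below_4gib:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the whole range [base, base + size) lies below 4 GiB, i.e. it
 * can be reached with the 32-bit loads available before long mode. */
bool fits_below_4gib(uint64_t base, uint64_t size)
{
    const uint64_t limit = UINT64_C(1) << 32;
    return size <= limit && base <= limit - size;
}
```

The BSP could call this with the PML4 address from CR3 and a size of 4096, and with the GDT base from SGDT and a size of limit + 1, and refuse to start APs (or copy the structures lower) if either check fails.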

You may want to initialize some other registers with the same values used by the BSP, such as CR4.

Code:

SwitchToLongMode:
push di
What values do SS, SP, and DI have before this instruction?

Code:

rep stosd
Why are you zeroing memory?

Code:

out 0xa1, al
out 0x21, al
You should have already disabled the PICs, you don't need to do it again every time you start an AP.

Code:

lidt [idtaddr]
The purpose of this instruction in the code you copied it from is to load a zero-length IDT to ensure any uncaught exceptions or unexpected NMIs cause a triple fault. You should wait until you're in long mode to load the final IDT.

Code:

mov cr0, ebx
lgdt [gdt_from_bsp]
The MOV to CR0 that sets bit 0 (CR0.PE) should be immediately followed by a far JMP to set CS. That means you must move the LGDT instruction above the MOV to CR0, and restore the far JMP you deleted. (The code you copied from has these instructions in the wrong order too.)
Ethin wrote:The problem is that I'm not really sure where to jump to to flip an atomic counter in my kernel, for example.
You could jump to the code that flips the atomic counter. :P Actually, Intel says you need to send two STARTUP IPIs, so I'd recommend having the AP spin waiting for the BSP to set a variable or something before jumping to the code that flips the atomic counter to make sure it doesn't get set twice.
Ethin wrote:One idea I had to solve this was to use the ICR register to send an interrupt to the BSP to let it know "Hey, this processor is done initializing".
That is also an option.

A third option would be to create a bitmap and have each AP set a bit to indicate it has started successfully. That way, if an AP fails to start, you know exactly which one has failed and can inform the user.
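
A rough sketch of the bitmap option, assuming at most 64 CPUs and a CPU index that the BSP hands to each AP (for example through the trampoline's data area). All names here are hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>

/* One bit per CPU index; this sketch assumes at most 64 CPUs. */
static _Atomic uint64_t started_mask;

/* Called by an AP once it is up. Setting the same bit twice is harmless,
 * so running this code a second time cannot double-count a CPU. */
void ap_mark_started(unsigned cpu_index)
{
    if (cpu_index < 64)
        atomic_fetch_or(&started_mask, UINT64_C(1) << cpu_index);
}

/* Called by the BSP after its timeout: returns a mask of the CPUs (out of
 * the first num_cpus) that never checked in. */
uint64_t cpus_not_started(unsigned num_cpus)
{
    uint64_t expected = (num_cpus >= 64) ? UINT64_MAX
                                         : (UINT64_C(1) << num_cpus) - 1;
    return expected & ~atomic_load(&started_mask);
}
```

Unlike a counter, the result tells you which CPUs are missing, so the BSP can report them individually.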
Ethin wrote:Also, should I set up the APIC in this code, or do that later?
It doesn't matter when you set up each AP's local APIC, as long as it gets done before you start scheduling processes on the AP.

Post by kzinti »

You should also add some comments to the code, and some empty lines between unrelated blocks of logic. This will help people who want to read your code and help you.

Post by nexos »

You need to use a far jump to enter long mode.
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg

Post by Ethin »

bzt wrote:There's an example Assembly in the Intel manual on how to do exactly that. See also this thread.
Where is the example assembly? The Intel manuals don't seem to have it. I've looked in sections 8.4 ("Multiple-Processor (MP) Initialization") and 9.10 ("Initialization and Mode Switching Example"). Section 9.10 has an assembly listing, but I don't think it's designed for APs, only for the BSP (though I might be wrong).
Thanks, Octocontrabass, kzinti, and nexos. I don't know what would be in those registers before the push di instruction, and I hesitate to presume that they would be cleared (we're talking about firmware here, after all). I assume the rep stosd is unnecessary. Thanks for the other pointers. This is the first time I've tackled this so this is interesting.

Post by Octocontrabass »

Ethin wrote:I don't know what would be in those registers before the push di instruction, and I hesitate to presume that they would be cleared (we're talking about firmware here, after all).
If you manage to find an ancient 486 or <75MHz Pentium multiprocessor system, you can't assume anything about initial register values because the AP starts executing BIOS code and it's up to you to tell the BIOS to run your code. On anything newer than that, the IPIs you send will set the registers to default values. Either way, it's a bad idea to use the stack without setting it up first, since you could be overwriting something important.
Ethin wrote:I assume the rep stosd is unnecessary.
Your assumption is correct. The comments in the original code should explain its purpose there.

Post by xeyes »

Ethin wrote:Is this code correct?
Why don't you give it a try?

IIRC AP boot code was the most triple fault prone part I've encountered (maybe I haven't reached other really difficult parts yet) and it doesn't help that Intel designed the system to work in such a way that any AP triple faulting brings every other CPU down with it.

One thing I'd really look out for: assembler not generating the instruction you want, esp. around mode changes.

Some other things I did (and may not be necessary) for your consideration:

1. Execute CLI as soon as possible, unless you like the fun of "handling" interrupts that might hit at any time when nothing is set up yet

2. Initialize the segment selectors

3. Enter protected mode and enable caches in CR0 (maybe not needed if you have UEFI? Not sure, as I have no experience with that)

4. Plan for a boot lock and let each AP grab it ASAP, unless you are 200% sure that all structures and routines along your boot path are already thread-safe

5. Get the local APIC ID early, so you can take a look and safely disable (hlt loop) extra CPUs that you don't need. For example, if you've set up a fixed-size array for per-CPU data and you notice that the system has more CPUs than expected (i.e. more than that array can hold). This might be especially necessary if you broadcast the SIPI.
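
For the boot lock in the list above, something as simple as a test-and-set spinlock would do. This is only a sketch using C11 atomic_flag; the function names are made up:

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag boot_lock = ATOMIC_FLAG_INIT;

/* Returns true if the caller now holds the lock. */
bool boot_lock_try_acquire(void)
{
    return !atomic_flag_test_and_set_explicit(&boot_lock, memory_order_acquire);
}

/* Spins until the lock is free. A real kernel would likely execute the
 * PAUSE instruction inside the loop. */
void boot_lock_acquire(void)
{
    while (!boot_lock_try_acquire()) {
        /* busy-wait */
    }
}

void boot_lock_release(void)
{
    atomic_flag_clear_explicit(&boot_lock, memory_order_release);
}
```

Each AP would take the lock right after reaching C code and release it once it no longer touches shared boot-time structures.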

Also some hacks I've done against the manual and better judgement:

1. One SIPI broadcast only, it works well on the machines I tried, and makes the difficult to do 200us delay unnecessary

2. Copy the trampoline myself as grub won't load something into the 1st MB

Good luck and have fun :D

Post by kzinti »

I sure had a lot of fun tracking down the causes of triple faults when implementing SMP startup trampolines.

I will point you to my code and hope it can help you... This is probably the most annoying thing I worked on so far on my OS (or was it IPC? I am not sure...):

32 bits: https://github.com/kiznit/rainbow-os/bl ... ia32/smp.S
64 bits: https://github.com/kiznit/rainbow-os/bl ... 6_64/smp.S
Code using the trampoline to start APs: https://github.com/kiznit/rainbow-os/bl ... 86/smp.cpp

Post by bzt »

Ethin wrote:Where is the example assembly? The intel manuals don't seem to have it
Granted, in the latest 253668-072US (May 2020) version Intel has removed the example code (along with lots of useful documentation and code examples too), but you can still find the old version 253668-026US (Feb 2008) online pretty easily. I did a simple "intel mp initialization example" search and it was on the first page.

See Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1, section 7.5.4 MP Initialization Example

It explains in very great detail what steps are required on the BSP as well as on the APs, including local APIC setup among other things.

Also take a look at
https://wiki.osdev.org/Symmetric_Multiprocessing
http://www.osdever.net/tutorials/view/m ... -explained
http://www.uruk.org/mps/

Hope this helps.

Cheers,
bzt

ps: if your websearch engine did not show this link, then you've faced the deepest hell of modern internet plagued with destructive information bubbles. Try deleting all cookies, use private mode, change browser identification string and change IP if you can. Or try to train the search AI that you need technical documents, good luck with that.

Post by nullplan »

Ah, trampoline code. I've made mine as self-contained as possible. It must be copied into the low 1MB anyway, and being self-contained means multiple CPUs can be initialized at the same time.

Intel recommends two startup IPIs because of old hardware. I kept to an idea by Brendan from back in the day: A data word in the trampoline itself is used to communicate readiness to the booting CPU. The BSP can periodically check back in if the bit has come on, and can thus perform other tasks instead of waiting for APs. Here's the simple version:

Code:

.section .rodata,"a"
.code16
.global trampoline
.global trampoline_end
/* assumption: Starts with CS:IP = xx00:0000, where xx is startup IPI number */
trampoline:
  jmp real_start
.align 4
cr3val: .long 0 /*CR3 for this core (must be below 4GB) */
kcode: .quad 0 /* where to go after entering long mode */
kstack: .quad 0 /* kernel stack pointer to load before going there */
kgsval: .quad 0 /* GS value */
commword: .long 0 /* communication word */
gdt: .word 0, gdt_end - gdt - 1
  .long gdt - trampoline
  .quad 0x00cf9a000000ffff
  .quad 0x00cf92000000ffff
  .quad 0x00af9a000000ffff
  .quad 0x00af92000000ffff
gdt_end:
f32ptr: .long prot_start - trampoline, 0x8
f64ptr: .long long_start - trampoline, 0x18
real_start:
/* set DS to avoid CS prefixes all over the place */
  movw %cs, %bx
  movw %bx, %ds
/* signal readiness */
  orl $1, commword - trampoline
/* no wait here (no need for it) */
/* relocate pointers */
  shll $4, %ebx  /* base address in ebx */
  addl %ebx, gdt - trampoline + 4 /* absolute GDT base address */
  addl %ebx, f32ptr - trampoline  /* absolute protected mode base address */
  addl %ebx, f64ptr - trampoline  /* absolute long mode base address */

  lidt gdt - trampoline /* load 0-length IDT to crash this processor should anything happen */
  lgdt gdt - trampoline + 2 /* GDT pointer is folded into first GDT entry. */

/* enable protected mode */
  movl %cr0, %eax
  btsl $0, %eax
  movl %eax, %cr0
/* jump to 32-bit protected mode */
  ljmpl *(f32ptr - trampoline)

.code32
prot_start:
/* initialize data segment registers */
  movw $0x10, %ax
  movw %ax, %ds
  movw %ax, %es
  movw %ax, %fs
  movw %ax, %gs
  movw %ax, %ss
  leal 0x1000(%ebx), %esp /* also a stack at the end of the trampoline. */
  pushl $0 /* clear all flags */
  popfl
/* Enter long mode: Enable PAE */
  movl %cr4, %eax
  btsl $5, %eax
  movl %eax, %cr4
/* Load CR3 */
  movl cr3val - trampoline(%ebx), %eax
  movl %eax, %cr3
/* Enable long mode */
  movl $0xc0000080, %ecx
  rdmsr
  btsl $8, %eax
  wrmsr
/* Enable paging */
  movl %cr0, %eax
  btsl $31, %eax
  movl %eax, %cr0
/* jump to long mode */
  ljmpl *f64ptr - trampoline(%ebx)

.code64
long_start:
/* initialize data segments again (just to be sure, it probably can't hurt) */
  movw $0x20, %ax
  movw %ax, %ds
  movw %ax, %es
  movw %ax, %fs
  movw %ax, %gs
  movw %ax, %ss
/* clear upper half of RBX */
  orl %ebx, %ebx
/* load GS base */
  movl $0xc0000101, %ecx
  movl kgsval - trampoline(%rbx), %eax
  movl kgsval - trampoline + 4(%rbx), %edx
  wrmsr
/* Load kernel stack */
  movq kstack - trampoline(%rbx), %rsp
  xorl %ebp, %ebp
  movq kcode - trampoline(%rbx), %rax
  movq %rbx, %rdi /* pass trampoline base as first arg. */
  callq *%rax
1:
  cli
  hlt
  jmp 1b
trampoline_end:
The debug version additionally has exception handlers for all three modes, telling the BSP when an exception has happened, and where, and in what mode. That is a lot of repetitive code, and not always worth it, since mode change and IDT change cannot be made atomic. Anyway, this trampoline is self-contained, so anyone should be able to use it. Yes, it is tailored to my needs, but should be easy to expand. In my case, the kernel is in C, so after setting up a stack, nothing more is needed to call into C code. It is not expected that that routine would ever return, but I added a safety net, just in case. The run-time memory allocation for the trampoline is 4kB anyway, since start vectors can only be placed on 4kB aligned addresses.

The start-up code in the BSP is quite simple: Allocate three pages for kernel stack and map them contiguously in kernel space. Calculate the address of a struct cpu at the top of that range, initialize it. Allocate a page in the 1MB zone. Copy the trampoline there. Allocate a page in the 4GB zone. Copy the PML4 there, and ensure its first entry equals its 128th entry (this is because the first half of kernel-space is a linear mapping of all RAM, so this ensures in the simplest possible way that the trampoline will be identity mapped when it runs). Fill in the data at the start of the trampoline (kernel GS will be the base of the struct cpu, kernel stack will be the same, except aligned to 16 bytes downward, kernel code will be the address of a noreturn function).
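
To illustrate the "fill in the data at the start of the trampoline" step, here is a sketch of what the C side could look like. The struct layout is inferred from the listing above (a jump padded to 4 bytes, then cr3val, kcode, kstack, kgsval and commword); the struct and function names are hypothetical and not taken from the actual kernel:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirror of the data words at the start of the trampoline. */
struct trampoline_header {
    uint8_t  jmp[4];    /* "jmp real_start" plus padding from .align 4 */
    uint32_t cr3val;    /* CR3 for this core, must be below 4 GiB */
    uint64_t kcode;     /* where to go after entering long mode */
    uint64_t kstack;    /* kernel stack pointer */
    uint64_t kgsval;    /* GS base */
    uint32_t commword;  /* communication word, set by the AP */
};

_Static_assert(offsetof(struct trampoline_header, cr3val) == 4, "cr3val");
_Static_assert(offsetof(struct trampoline_header, kcode) == 8, "kcode");
_Static_assert(offsetof(struct trampoline_header, kstack) == 16, "kstack");
_Static_assert(offsetof(struct trampoline_header, kgsval) == 24, "kgsval");
_Static_assert(offsetof(struct trampoline_header, commword) == 32, "commword");

/* Copies the trampoline image into its low-memory page and fills in the
 * per-CPU values. `page` must point at a 4 KiB page and image_size must
 * not exceed 4096. */
void trampoline_prepare(void *page, const void *image, size_t image_size,
                        uint32_t cr3, uint64_t entry,
                        uint64_t stack_top, uint64_t gs_base)
{
    struct trampoline_header hdr;

    memcpy(page, image, image_size);
    memcpy(&hdr, page, sizeof hdr);          /* keep the jmp bytes */
    hdr.cr3val   = cr3;
    hdr.kcode    = entry;
    hdr.kstack   = stack_top & ~UINT64_C(15); /* align down to 16 bytes */
    hdr.kgsval   = gs_base;
    hdr.commword = 0;
    /* Write back only up to the end of commword so the GDT that follows
     * in the image is left untouched. */
    memcpy(page, &hdr,
           offsetof(struct trampoline_header, commword) + sizeof hdr.commword);
}
```

Taking cr3 as a uint32_t makes the "must be below 4GB" requirement part of the function signature.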

Then send an INIT IPI, wait a little (the spec says how much), send a startup IPI, wait for the commword to become 1. If this times out, send a second startup IPI and yield to the scheduler. After a long time out (several seconds), if the CPU still has not set the commword, send another INIT IPI (in case the CPU did start running, and is running amok somewhere else), free all the memory and log a failure. As soon as commword is observed to be 1, all the memory in the kernel stack, the other CPU's PML4, and the trampoline, all belongs to the other CPU.
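
The sequence above can be sketched as a small state machine that the starting CPU steps periodically, which is what lets it do other work in between. Everything here is illustrative: the names are invented, and the delay constants are assumptions (10 ms after INIT is the commonly cited value; the other two are placeholders):

```c
#include <stdbool.h>
#include <stdint.h>

enum ap_phase  { AP_IDLE, AP_INIT_SENT, AP_SIPI1_SENT, AP_SIPI2_SENT,
                 AP_ONLINE, AP_FAILED };
enum ap_action { ACT_WAIT, ACT_SEND_INIT, ACT_SEND_SIPI, ACT_DONE,
                 ACT_GIVE_UP };

#define INIT_SETTLE_US   10000u   /* assumed wait after the INIT IPI */
#define SIPI_RETRY_US     1000u   /* assumed wait before the second SIPI */
#define GIVE_UP_US     5000000u   /* assumed long timeout ("several seconds") */

struct ap_boot {
    enum ap_phase phase;
    uint64_t deadline_us;
};

/* Advances the state machine. The caller performs the returned action:
 * on ACT_GIVE_UP it sends another INIT IPI, frees the memory and logs the
 * failure; on ACT_DONE the memory belongs to the new CPU. The caller
 * stops stepping after ACT_DONE or ACT_GIVE_UP. */
enum ap_action ap_boot_step(struct ap_boot *b, bool commword_set,
                            uint64_t now_us)
{
    switch (b->phase) {
    case AP_IDLE:
        b->phase = AP_INIT_SENT;
        b->deadline_us = now_us + INIT_SETTLE_US;
        return ACT_SEND_INIT;
    case AP_INIT_SENT:
        if (now_us < b->deadline_us)
            return ACT_WAIT;
        b->phase = AP_SIPI1_SENT;
        b->deadline_us = now_us + SIPI_RETRY_US;
        return ACT_SEND_SIPI;
    case AP_SIPI1_SENT:
    case AP_SIPI2_SENT:
        if (commword_set) {
            b->phase = AP_ONLINE;
            return ACT_DONE;
        }
        if (now_us < b->deadline_us)
            return ACT_WAIT;
        if (b->phase == AP_SIPI1_SENT) {
            b->phase = AP_SIPI2_SENT;
            b->deadline_us = now_us + GIVE_UP_US;
            return ACT_SEND_SIPI;
        }
        b->phase = AP_FAILED;
        return ACT_GIVE_UP;
    case AP_ONLINE:
        return ACT_DONE;
    default:
        return ACT_GIVE_UP;
    }
}
```

Because the function only looks at the current time and the communication word, it can be called from a scheduler task as easily as from a busy loop.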

The AP landing pad will then load the real GDT, IDT, and TSS, initialize CR0 and CR4 to their final values, load whatever MSRs are still needed, free the trampoline page (low memory is precious, after all), clear the low half of the PML4, announce its presence to the scheduler (which, among other things, involves a fetch-and-add on a global variable, setting the logical CPU number), then run the scheduler in infinite loop. Therefore it will never return, as required.
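
The fetch-and-add that hands out logical CPU numbers could look roughly like this. MAX_CPUS, the starting value and the function name are assumptions made for the sketch (it assumes the BSP is logical CPU 0 and that per-CPU data lives in a fixed-size array):

```c
#include <stdatomic.h>

#define MAX_CPUS 8  /* hypothetical size of the per-CPU array */

/* Next free logical CPU number; 0 is assumed to belong to the BSP. */
static atomic_uint next_cpu_number = 1;

/* Returns this CPU's logical number, or -1 if the per-CPU array is full,
 * in which case the caller should park the CPU in a hlt loop. */
int claim_cpu_number(void)
{
    unsigned n = atomic_fetch_add(&next_cpu_number, 1);
    return n < MAX_CPUS ? (int)n : -1;
}
```

Since every caller gets a distinct value from the fetch-and-add, no lock is needed even when several APs arrive at the same time.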

This code is so self-contained that it can be run in multiple threads, on multiple CPUs. I don't necessarily need the BSP to do all the booting. So before reading the MADT, I initialize the scheduler (which is possible after initializing the memory managers), and then I just queue up tasks to start each CPU I find in there. Then the APs can join in starting other APs as soon as they themselves are ready.
Carpe diem!

Post by bzt »

nullplan wrote:being self-contained means multiple CPUs can be initialized at the same time.
That's not what self-contained means, just sayin'. Nice code btw, but I don't understand why you do all those insane address calculations in run-time when you copy this code to a fixed location.

Here's another example where I've done those address calculations by hand, so the code is a lot simpler (if GAS supported multiple ORG directives like fasm does, this wouldn't be needed). The only complexity in this code is that it sets up CR3, CS, DS on the APs to the same value as the UEFI used on the BSP (unknown at compilation-time), hence the extra variables. The code that copies this trampoline below 1 MB and sets the variables is here. I start cores here, which is a direct rewrite of the Intel manual's Asm example in C.

I still haven't got over the fact that Intel modified its System Programming Guide (order number 253668) and removed all the code examples... This is such a d*ck move. It's like they're saying "no, you're not supposed to learn OSDev".

Cheers,
bzt

Post by nexos »

Trampoline code was the worst thing I have ever developed. It is a big pain to do right. Of course, my problems were also caused by a GCC bug that was "optimizing" memcpy and memset, but I had so many problems getting the trampoline code correct. Here is the solution I came up with:

Code:

; ApStart.asm - contains AP startup code
; Distributed with NexKe, licensed under the MIT license
; See LICENSE

section .text

global rmodeStartAp
global rmodeEndAp

bits 16

rmodeStartAp:
    cli
    cld
    mov ax, 0
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov eax, cr4
    or eax, 1 << 5              ; Set PAE bit
    mov cr4, eax

    mov ecx, 0xB000
    mov cr3, ecx

    mov ecx, 0xC0000080
    rdmsr
    or eax, 1 << 8
    or eax, 1 << 11
    wrmsr

    mov eax, cr0
    or eax, 0x80000001
    mov cr0, eax

    lgdt [0x8000]
    jmp dword 0x08:0xA000

    cli
    hlt
rmodeEndAp:

bits 64

global lmodeStartAp
global lmodeEndAp

lmodeStartAp:
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax

    mov rax, cr0
    and ax, 0xFFFB              ; Clear coprocessor emulation (CR0.EM)
    or ax, 0x2                  ; Set coprocessor monitoring (CR0.MP)
    mov cr0, rax

    mov rax, cr4
    or ax, 3 << 9               ; Set CR4.OSFXSR and CR4.OSXMMEXCPT at the same time
    mov cr4, rax

    mov rsp, qword [0xC000]
    mov rbp, 0
    push 0x08
    push HalApEntry
    mov byte [0xC008], 1
    retfq
lmodeEndAp:

extern HalApEntry
Of course, half of the troubles were in the C code that set up the trampoline code, so here is the relevant portion of that:

Code:

extern VOID* rmodeStartAp;
extern VOID* rmodeEndAp;

extern VOID* lmodeStartAp;
extern VOID* lmodeEndAp;

volatile INT apInitDone = 0;

VOID HalStartAp(BYTE id)
{
    volatile QWORD* stackLoc = (volatile QWORD*)0xC000; /* volatile: the AP writes the flag at 0xC008 */
    QWORD stackAddr = (QWORD)KeHeapAllocate(5000);
    stackAddr += 0x1000;
    *stackLoc = stackAddr;

    ++stackLoc;
    *stackLoc = 0;

    HalSendIPI(id, IPI_DSH_DEST, IPI_INIT, 0);
    HalWaitEarly(10);

    HalSendIPI(id, IPI_DSH_DEST, IPI_STARTUP, (0x9000 >> 12));
    HalWaitEarly(5);
    while(*stackLoc == 0);
}

VOID CreateCpuStruct()
{
    CPU_DATA* data = (CPU_DATA*)KeHeapAllocate(sizeof(CPU_DATA));
    HalWriteMsr(0xC0000102, (QWORD)data);
    HalWriteMsr(0xC0000101, (QWORD)data);
    data->numTicks = 0;
    data->apicId = smpInfo->lapics[HalGetCPU()];
    data->selfPtr = data;
    HalCreateTss();
}

CPU_DATA* HalGetCpuData()
{
    QWORD data = 0;
    asm volatile("movq %%gs:0x0, %0" : "=r" (data) : : "memory");
    return (CPU_DATA*)data;
}

VOID HalStartAps()
{
    HalCopyGdt();
    CreateCpuStruct();
    KeEnable();
    QWORD rsize = (QWORD)&rmodeEndAp - (QWORD)&rmodeStartAp;
    CopyMemory((VOID*)PHYS_BASE + 0x9000, &rmodeStartAp, rsize);

    QWORD lsize = (QWORD)&lmodeEndAp - (QWORD)&lmodeStartAp;
    CopyMemory((VOID*)PHYS_BASE + 0xA000, &lmodeStartAp, lsize);

    QWORD dirAddr = (QWORD)HalGetDir();
    dirAddr += PHYS_BASE;
    CopyMemory((VOID*)PHYS_BASE + 0xB000, (VOID*)dirAddr, 0x1000);

    for(DWORD i = 1; i < smpInfo->numCpus; i++)
    {
        HalStartAp(smpInfo->lapics[i]);
    }
    while(!apInitDone);
    HalUnmapLow();
}

VOID HalApEntry()
{
    HalLoadGdtAp();
    HalLoadIdtAp();
    HalSwitchDir((QWORD)HalGetDir());
    CreateCpuStruct();
    HalLapicInitAp();
    if(HalGetCPU() == smpInfo->numCpus - 1)
    {
        apInitDone = 1;
    }
    SchedApInit();
    for(;;);
}

Post by nullplan »

bzt wrote:That's not what self-contained means, just sayin'. Nice code btw, but I don't understand why you do all those insane address calculations in run-time when you copy this code to a fixed location.
Because I don't copy the code to a fixed location. The code is copied once for every AP that starts up, and can be located at any page boundary in low memory (due to the limitations of the startup IPI only allowing the thing to start there). This might be wasteful, but it is the only way to duplicate the data section at the start, and that allows me to assume that GS is correctly set up by the time the code reaches my AP landing pad. So then I can differentiate CPUs in the code by looking up the structure saved there.

Failure to allocate a page in low memory is treated specially by the AP startup task: It will yield to the scheduler and try again once the 1MB zone allocator signals something was freed there. It's the only such allocation in the kernel so far, and only because there is a grand total of 256 pages possible due to x86, and more likely only 160 pages due to PC compatibility, and at least two of those are going to be reserved for IVT, BDA, and EBDA. But then, I have so far not had a CPU with 158 cores.
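
The page arithmetic behind those numbers is simple enough to show. A STARTUP IPI carries an 8-bit vector VV, and the AP begins executing at physical address VV << 12 with CS:IP = VV00:0000, which is why only 256 page-aligned locations exist (1 MiB / 4 KiB), and only 160 of them below 640 KiB. The helper names below are hypothetical:

```c
#include <stdint.h>

/* Returns the STARTUP IPI vector for a trampoline at physical address
 * phys, or -1 if the address cannot be expressed as a vector. */
int sipi_vector_for(uint32_t phys)
{
    if (phys & 0xFFFu)
        return -1;              /* must be 4 KiB aligned */
    if (phys >= 0x100000u)
        return -1;              /* must be below 1 MiB */
    return (int)(phys >> 12);
}

/* Real-mode CS the AP starts with for a given vector (IP starts at 0). */
uint16_t sipi_start_cs(uint8_t vector)
{
    return (uint16_t)(vector << 8);
}
```

For example, a trampoline copied to 0x9000 is started with vector 9, and the AP begins at 0900:0000.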
bzt wrote:Here's another example where I've done those address calculations by hand, so the code is a lot simpler (if GAS supported multiple ORG directives like fasm does, this wouldn't be needed).
Well, GAS doesn't support multiple org directives, but it does support assignment to the location counter. But that is only the local location counter. GAS always assembles into object files, so the location counter is always relative to some section, and references to other sections might as well reference other files (that's how they are treated, anyway). You can also define symbols, and those will even show up in the object file at the end.
bzt wrote:The only complexity in this code is that it sets up CR3, CS, DS on the APs to the same value as the UEFI used on the BSP (unknown at compilation-time), hence the extra variables.
But why? By the time you start the APs, UEFI is long forgotten, and you could just be loading those registers with values that make sense for your OS.
bzt wrote:I still haven't got over the fact that Intel modified its System Programming Guide (order number 253668) and removed all the code examples... This is such a d*ck move. It's like they're saying "no, you're not supposed to learn OSDev".
Maybe they realized that in software development, all roads lead to StackOverflow, and people will just be copying whatever stuff you give them. They do still have code examples, but shorter ones, ones that only work when you actually do what the comments only suggest, and then there is this one example where they tell you how to start your BIOS.
nexos wrote:Trampoline code was the worst thing I have ever developed. It is a big pain to do right. Of course, my problems were also caused by a GCC bug that was "optimizing" memcpy and memset, but I had so many problems getting the trampoline code correct.
Yeah, like bzt, you are hardcoding some addresses. You made your assembly simpler, but your C code more complicated. You need the address 0x8000 to be free for a GDT, and 0xA000 to be free for the long mode trampoline. Add to that that you are out-of-spec and I can see how this would impede you. Directly entering 64-bit mode from real mode is not part of the specification. And it doesn't actually save you that much memory, anyway.

Post by nexos »

nullplan wrote:Yeah, like bzt, you are hardcoding some addresses. You made your assembly simpler, but your C code more complicated. You need the address 0x8000 to be free for a GDT, and 0xA000 to be free for the long mode trampoline. Add to that that you are out-of-spec and I can see how this would impede you. Directly entering 64-bit mode from real mode is not part of the specification. And it doesn't actually save you that much memory, anyway.
I thought about all that myself :) . That was from my last kernel; in my current one, I plan on making my trampoline code much better.