[SOLVED] AP CPUs do not started up after SIPI IPI
Posted: Sat Aug 22, 2020 11:06 am
by iman
Hi.
I have initialized the APIC for the BSP and am now trying to bring up the AP CPUs.
The trampoline code is copied to physical address 0x8000 and looks as follows:
Code: Select all
extern gdt_pointer
extern idt_pointer
global trampoline_code
trampoline_code:
BITS 16
cli
xor ax, ax
mov ds, ax
a32 lidt [idt_pointer]
a32 lgdt [gdt_pointer]
mov eax, cr0
or eax, 1
mov cr0, eax
jmp DWORD 0x08:prot_mode
prot_mode:
BITS 32
mov ax, 0x10
mov ds, ax
mov es, ax
mov ss, ax
mov fs, ax
mov gs, ax
mov esp, @pm_stack
; this is for debugging ------------------------------------------------------- printing a letter to the screen
mov eax, 0xB8000
mov BYTE [eax], 'O'
add eax, 1
mov BYTE [eax], 0x02
; this is for debugging -------------------------------------------------------
hlt
section .data
space times 4096 db 0
@pm_stack:
The gdt_pointer and idt_pointer are the same pointers that were set up for the BSP during boot.
I don't get any error, but I also don't see the expected letter on the screen, so I can't tell whether the AP came up.
Two questions:
- How can I check whether the AP received and responded to the INIT and SIPI IPIs without a problem?
- What might be wrong with the code above?
Best.
Iman.
Re: AP CPUs do not started up after SIPI IPI
Posted: Sat Aug 22, 2020 11:39 am
by PeterX
I tried your GitHub LiBOS repo, but I couldn't find the file in which you copy the trampoline code or the file in which you set up SMP. Maybe I'm just too stupid...
What you posted here is pretty normal code (and not SMP-specific), and I personally don't see any bugs (maybe someone else does?) [EDIT: Maybe the HLT instruction should be wrapped in an endless loop?]
And a very small problem I have with your repository:
A LibOS is a library operating system, meaning it has only a minimal (exo)kernel and implements the typical kernel functionality in libraries. So maybe a rename would avoid confusion? Like LBOS or LiBoOS? But if you don't agree, never mind and keep the name.
And something else: I like to see that someone else develops an Assembler. So I'm not the only one messing with programming languages and development tools while developing an OS. I know there are many others who do so, but it's still nice to see that here.
Greetings
Peter
Re: AP CPUs do not started up after SIPI IPI
Posted: Sat Aug 22, 2020 11:57 am
by iman
PeterX wrote:I like to see that someone else develops an Assembler.
Yes. I did it twice. The first time, I started to write a minimal x86 assembler in C with mnemonics and machine code very similar to NASM's. The second time, I did it from nothing: only pen and paper and the machine code specifications, in order to bootstrap the assembler. I never finished the second one, but someday I will.
Regarding your attempt to find the SMP setup and the trampoline copying in my repo, you are right. They are not there yet. I finish something and then upload it.
The memcpy takes the address of the trampoline code and copies the specified number of bytes to physical address 0x8000.
As for the LiBOS name, I am aware of what you mentioned.
I could not come up with anything better, but I will give it another try.
Best.
Iman.
Re: AP CPUs do not started up after SIPI IPI
Posted: Sat Aug 22, 2020 12:08 pm
by mkfree
Hello, I did the initialization of the CPUs and it works correctly:
A quick question: when you assembled the trampoline code, did you tell the assembler that it will run at address 0x80000?
I did it in the following way:
Code: Select all
[BITS 16]
[ORG 0x01000]
CLI
LGDT [GDT_TAB]; ONLY FOR CHANGE TO 32-BIT PROTECTED MODE
MOV EAX, CR0; SWITCH TO PROTECTED MODE
OR EAX, 0x1
MOV CR0, EAX
JMP SHORT $ + 2; INTEL RECOMMENDED EMPTY INSTRUCTION LIST
JMP 0x8: PROTECTED_MODE; I AM IN PROTECTED MODE 16 BIT INSTRUCTIONS
PROTECTED_MODE:
JMP 0x10: CODE32; I AM IN PROTECTED MODE 32 BIT INSTRUCTIONS
GDT_TABLE:; NULL DESCRIPTOR INDEX 0
NULL:
LIMITE_L DW 0
BASE_L DW 0
BASE_M DB 0
ACCES DB 0
LIMITE_H DB 0
BASE_H DB 0
ECS:; INDEX 1 CS FOR 16 BIT
DW 0FFFFH
DW 0
DB 0
DB 9BH
DB 08FH
DB 0
CS_FLAT_32:; INDEX 2 CS FOR 32 BIT
DW 0FFFFh
DW 0
DB 0
DB 9BH
DB 0DFH
DB 0
END_GDT_TABLE:
GDT_TAB:
dw END_GDT_TABLE - GDT_TABLE - 1
dd GDT_TABLE
; FROM HERE THE INSTRUCTIONS ARE AT 32 BITS
use32
CODE32:
LGDT [KGDT]; CHANGED TO THE KERNEL GDT
MOV EAX, 0x10; INDEX OF DESCRIPTOR 2 FOR DATA
MOV DS, EAX
MOV ES, EAX
MOV FS, EAX
MOV GS, EAX
JMP 0x8: CHANGE; CHANGE TO CS REGISTER OF KERNEL INDEX 1
CHANGE:
; ENABLE PAGING: LOAD THE PAGE DIRECTORY ADDRESS INTO CR3
MOV EAX, [directoryPages]
MOV CR3, EAX
MOV EAX, CR4
OR EAX, 0x00000010
MOV CR4, EAX
MOV EAX, CR0
OR EAX, 0x80000000
MOV CR0, EAX
; KERNEL INDEX 2 SS STACK IS PREPARED
MOV EAX, 0x18
MOV SS, EAX
MOV ESP, [stackCpu]
MOV EBP, 0x0
; THE TSS IS PREPARED FOR THE CHANGE OF TASK
MOV AX, 0x38
LTR AX
; MAGIC NUMBER TELLING KERNEL THAT THE CPU WAS INITIALIZED
MOV AX, 0x0507
MOV [bootCPUmagicNumber], AX
; WE ARE ALREADY IN THE KERNEL !!!
JMP [bootCpusCode]
TIMES 492 - ($ - $$) DB 0; PAD TO OFFSET 492, WHERE THE KERNEL EXPECTS THE DATA AREA
; KERNEL STRUCTURE USED TO ACCESS THIS AREA: SptrBootCpusData
KGDT:
DW 0x0
DD 0x0
stackCpu DD 0x0
directoryPages DD 0x0
bootCpusCode DD 0x0
bootCPUmagicNumber DW 0x0507
Re: AP CPUs do not started up after SIPI IPI
Posted: Sat Aug 22, 2020 1:23 pm
by iman
mkfree wrote:when you assembled the trampoline code, did you tell the assembler that it will run at address 0x80000?
No, I did not, and that might be the reason why it did not work. The addresses should have been relocated.
I will fix it and report the result here soon.
Thanks.
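The relocation mentioned here comes down to simple arithmetic: a label assembled as part of the kernel image has a link-time address, but the copy that the AP executes lives at the load address (0x8000 in the first post). A minimal sketch in C; the function name, the parameter names, and any concrete link addresses are hypothetical and only serve as illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Address a label will have once the blob that starts at link_base
 * has been copied to load_base. All names here are illustrative. */
uint32_t rebase(uint32_t label, uint32_t link_base, uint32_t load_base)
{
    assert(label >= link_base);   /* the label must lie inside the blob */
    return label - link_base + load_base;
}
```

Any absolute address that the trampoline uses for its own labels (far-jump targets, descriptor-table pointers) would need this adjustment, either at assembly time or at run time.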
Re: AP CPUs do not started up after SIPI IPI
Posted: Sat Aug 22, 2020 11:43 pm
by nullplan
For this reason, I am using a pretty much self-contained trampoline that will relocate itself. The far jump is absolute, so the address you give to it must be relocated. I'd probably do it like this:
Code: Select all
align 8
bits 16
trampoline:
jmp start16
align 4
kcode: dd 0
kstack: dd 0
kcr3: dd 0
commword: dd 0
far32ptr: dd start32 - trampoline
dw 8
align 8
gdt: dw 0, .end - $ - 1
dd gdt - trampoline
dq 0x00cf9b000000ffff
dq 0x00cf93000000ffff
.end:
start16:
mov cs, ebx
mov ds, bx
lock or dword [commword - trampoline], 1
.spin:
pause
test dword [commword - trampoline], 2
jz .spin
shl ebx, 4
add [far32ptr - trampoline],ebx
add [gdt + 4 - trampoline],ebx
lgdt [gdt + 2 - trampoline]
mov eax, cr0
bts eax, 0
mov cr0, eax
jmp far dword [far32ptr - trampoline]
bits 32
start32:
mov ax, 16
mov ds, eax
mov es, eax
mov fs, eax
mov gs, eax
mov eax, [kcr3 - trampoline + ebx]
mov cr3, eax
mov eax, cr0
bts eax, 31
mov cr0, eax
mov esp, [kstack - trampoline + ebx]
call [kcode - trampoline + ebx]
ud2
The host kernel merely has to fill in the blanks at the start after copying the trampoline wherever. I included code to turn on paging, so your AP is never entering your kernel without paging active, else that would be kind of a mess to have a mix of paging and non-paging code. The CR3 given to the AP must point to paging structures that identity map at least the trampoline page, in addition to mapping the kernel.
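As a rough sketch of "filling in the blanks" from the kernel side, the C code below copies the blob and writes the header fields. The offsets are an assumption read off the listing above (`jmp start16` takes at most 4 bytes, so `align 4` puts `kcode` at offset 4); the function name and its parameters are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header offsets, assuming `jmp start16` occupies at most 4 bytes so that
 * `align 4` places kcode at offset 4 (an assumption, not checked here). */
enum { OFF_KCODE = 4, OFF_KSTACK = 8, OFF_KCR3 = 12, OFF_COMMWORD = 16 };

/* Copy the trampoline blob to its page and fill in the header.
 * `page` stands for the identity-mapped destination page below 1 MB. */
void trampoline_install(uint8_t *page, const uint8_t *blob, size_t blob_len,
                        uint32_t kcode, uint32_t kstack, uint32_t kcr3)
{
    const uint32_t zero = 0;

    assert(blob_len >= OFF_COMMWORD + sizeof zero && blob_len <= 4096);
    memcpy(page, blob, blob_len);
    memcpy(page + OFF_KCODE, &kcode, sizeof kcode);
    memcpy(page + OFF_KSTACK, &kstack, sizeof kstack);
    memcpy(page + OFF_KCR3, &kcr3, sizeof kcr3);
    memcpy(page + OFF_COMMWORD, &zero, sizeof zero); /* AP has not checked in yet */
}
```

Exporting the offsets from the assembly file would be more robust than hard-coding them as done here.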
I also included a communication step at the start: The commword will get its least significant bit set once the AP is running at all. After that it will wait for the BSP to signal that it may continue with the second bit. This is so the BSP can know whether the AP started, and it can perform whatever changes needed to hand over ownership of CPU-local data structures to the AP before allowing it to continue.
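The BSP side of this handshake could look roughly like the following C11 sketch. The function names and the spin budget are hypothetical; in a real kernel the pointer would refer to `commword` inside the copied trampoline page, and the timeout would more likely be based on a timer than on a loop count:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define AP_ALIVE   1u  /* bit 0: set by the AP (`lock or ..., 1`) */
#define AP_RELEASE 2u  /* bit 1: set by the BSP, polled by the AP */

/* Poll until the AP reports in or the spin budget runs out. */
bool bsp_wait_for_ap(_Atomic uint32_t *commword, unsigned long max_spins)
{
    assert(commword);
    while (max_spins--) {
        if (atomic_load(commword) & AP_ALIVE)
            return true;
    }
    return false;
}

/* Let the AP leave its .spin loop. */
void bsp_release_ap(_Atomic uint32_t *commword)
{
    assert(commword);
    atomic_fetch_or(commword, AP_RELEASE);
}
```

If `bsp_wait_for_ap` returns false after a generous budget, the AP most likely never executed the first instructions of the trampoline.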
The trampoline can be loaded at any page boundary in the first 1 MB of address space. It expects to be run with an initial IP of 0, which is what a SIPI will provide.
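The page-boundary requirement follows from how the start-up IPI encodes its target: the 8-bit vector VV makes the AP start at physical address 0xVV000. A small C sketch of that mapping, together with the low dword of the xAPIC ICR for INIT and SIPI (no destination shorthand); the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Returns the SIPI vector for a trampoline at phys, or -1 if the address
 * is not a 4 KiB boundary below 1 MiB. */
int sipi_vector(uint32_t phys)
{
    if (phys >= 0x100000u || (phys & 0xFFFu) != 0)
        return -1;
    return (int)(phys >> 12);
}

/* Low dword of the xAPIC ICR: delivery mode in bits 8-10
 * (101b = INIT, 110b = Start-Up), level = assert in bit 14. */
uint32_t icr_low_init(void)
{
    return 0x00004500u;
}

uint32_t icr_low_sipi(int vector)
{
    assert(vector >= 0 && vector <= 0xFF);
    return 0x00004600u | (uint32_t)vector;
}
```

For the 0x8000 address used in the first post, the vector is 8.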
Re: AP CPUs do not started up after SIPI IPI
Posted: Sun Aug 23, 2020 5:57 am
by PeterX
@nullplan What puzzles me is this instruction:
Code: Select all
mov cs, ebx
Are you sure about it?
Greetings
Peter
Re: AP CPUs do not started up after SIPI IPI
Posted: Sun Aug 23, 2020 7:30 am
by iman
mkfree wrote:when you assembled the trampoline code, did you tell the assembler that it will run at address 0x80000?
nullplan wrote:For this reason, I am using a pretty much self-contained trampoline
I have now solved my problem. As I understood from your two replies, I forgot to rebase the addresses in the trampoline code so that they are valid at the location where the AP CPUs execute it. My working trampoline code now looks something like this:
Code: Select all
%define REBASE(ADDR) (ADDR - ap_cpu_trampoline_code + 0x7000)
%define REBASE32(ADDR) (ADDR - mp_32_start + 0x8000)
extern printk
extern gdt_pointer
extern idt_pointer
ap_cpu_trampoline_code:
[BITS 16]
cli
mov eax, cr0
or eax, 0x1
mov cr0, eax
;a32 lidt [REBASE(idt16_ptr)]
a32 lgdt [REBASE(gdt16_ptr)]
jmp 0x18:0x8000
gdt16_base: ; GDT descriptor table
.null: ; 0x00 - null segment descriptor
dd 0x00000000 ; must be left zero'd
dd 0x00000000 ; must be left zero'd
.code32: ; 0x01 - 32bit code segment descriptor 0xFFFFFFFF
dw 0xFFFF ; limit 0:15
dw 0x0000 ; base 0:15
db 0x00 ; base 16:23
db 0x9A ; present, iopl/0, code, execute/read
db 0xCF ; 4Kbyte granularity, 32bit selector; limit 16:19
db 0x00 ; base 24:31
.data32: ; 0x02 - 32bit data segment descriptor 0xFFFFFFFF
dw 0xFFFF ; limit 0:15
dw 0x0000 ; base 0:15
db 0x00 ; base 16:23
db 0x92 ; present, iopl/0, data, read/write
db 0xCF ; 4Kbyte granularity, 32bit selector; limit 16:19
db 0x00 ; base 24:31
.code16: ; 0x03 - 16bit code segment descriptor 0x000FFFFF
dw 0xFFFF ; limit 0:15
dw 0x0000 ; base 0:15
db 0x00 ; base 16:23
db 0x9A ; present, iopl/0, code, execute/read
db 0x0F ; 1Byte granularity, 16bit selector; limit 16:19
db 0x00 ; base 24:31
.data16: ; 0x04 - 16bit data segment descriptor 0x000FFFFF
dw 0xFFFF ; limit 0:15
dw 0x0000 ; base 0:15
db 0x00 ; base 16:23
db 0x92 ; present, iopl/0, data, read/write
db 0x0F ; 1Byte granularity, 16bit selector; limit 16:19
db 0x00 ; base 24:31
gdt16_ptr: ; GDT table pointer for 16bit access
dw gdt16_ptr - gdt16_base - 1 ; table limit (size)
dd gdt16_base ; table base address
idt16_ptr: ; IDT table pointer for 16bit access
dw 0x03FF ; table limit (size)
dd 0x00000000 ; table base address
trampoline_end:
mp_32_start:
jmp 0x08:REBASE32(next)
[BITS 32]
next:
; --- to make sure ------> enable A20
in al, 0x92
or al, 2
out 0x92, al
mov esp, @mp_stack
mov ax, 0x10
mov ds, ax
mov es, ax
mov ss, ax
mov fs, ax
mov gs, ax
;mov eax, 0xB8000
;mov BYTE [eax], 'O'
;add eax, 1
;mov BYTE [eax], 0x0F
;;++++++++++++++++++++++++++++++++++++++++++++++++++++++++ <---- PROBLEM HERE
push hello_message
call printk
;;++++++++++++++++++++++++++++++++++++++++++++++++++++++++ <---- PROBLEM HERE
hlt
mp_32_end:
section .bss
space resb 4096
@mp_stack
section .data
hello_message: db "Hello from mp", 0
It works fine and prints the expected letter to the screen, but any call to one of the extern functions (e.g. printk in the code above) results in a triple fault and a reboot.
- I suspected that enabling the A20 line might solve the problem, but it did not help (the code above enables A20 through port 0x92, and I also tried enabling it via the keyboard controller).
- even doing
Code: Select all
xor ax, ax
mov ds, ax
lgdt [gdt_pointer]
lidt [idt_pointer]
right after mov esp, @pm_stack results in a triple fault. Both idt_pointer and gdt_pointer are the BSP kernel pointers.
- Even if I call the physical address of printk instead of its extern label, the same problem occurs (and the same is true for idt_pointer and gdt_pointer).
What did I do wrong?
Best.
Iman.
Re: AP CPUs do not started up after SIPI IPI
Posted: Sun Aug 23, 2020 10:00 am
by nullplan
PeterX wrote:@nullplan What puzzles me is this instruction:
mov cs, ebx
Are you sure about it?
Greetings
Peter
Dangit. My brain was still in GAS mode. That should have read
Code: Select all
mov ebx, cs
Weirdly, the line below that is correct again. Although from another thread I just read I learned that this doesn't do what I thought it would, so I probably should just zero out EBX the normal way. What I thought would happen is that CS would get zero-extended to 32 bits, thereby overwriting the top half of EBX with zeroes. Now, it shouldn't really matter, since the INIT IPI should initialize all GPRs to zero, but you never know what they'll manage to break these days. I wanted to only make this code dependent on the CS:IP of xx00:0000 at the start, that should be guaranteed by the startup IPI. Well, and that the EFLAGS are lacking the interrupt flag even before we start, because otherwise, a whole lot of things will go wrong.
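The relationship between the start-up vector, the initial CS of xx00, and the linear address that `shl ebx, 4` produces can be written out as a short C sketch. The function names are hypothetical; the arithmetic is the ordinary real-mode rule (linear = segment * 16 + offset):

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* After a SIPI with vector VV, the AP starts at CS = VV00h, IP = 0. */
uint16_t sipi_start_cs(uint8_t vector)
{
    return (uint16_t)((uint16_t)vector << 8);
}

/* Linear address of the first instruction the AP executes. */
uint32_t ap_entry_linear(uint8_t vector)
{
    uint32_t entry = real_mode_linear(sipi_start_cs(vector), 0);

    assert(entry == (uint32_t)vector << 12);   /* always page aligned */
    return entry;
}
```

With vector 8, CS is 0x0800 and the entry point is 0x8000, which is the value the trampoline adds to its own pointers.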
Re: AP CPUs do not started up after SIPI IPI
Posted: Sun Aug 23, 2020 10:08 am
by nullplan
iman wrote:- I had the suspicion that A20 line can solve the problem, but it did not help (here is the code for A20 activation through port 0x92 but I also tried the activation via keyboard).
No, A20 is about the hardware external to the CPU. You already took care of it when bringing up the BSP, you don't need to do that again. And for other hardware, it can be actively harmful.
iman wrote:- even doing
[...]
right after mov esp, @pm_stack results in a triple fault. Both idt_pointer and gdt_pointer are the BSP kernel pointers.
Hmmm... triple fault? Weird. Well, if those pointers are virtual addresses, then obviously that won't work while the AP still hasn't set up paging. That will likely mean setting ESP to an address that doesn't exist, and GDTR and IDTR as well. So you probably got machine check errors or the like, or GPFs when trying to access those descriptors, and since the IDTR was also still broken, those couldn't be serviced. Therefore triple fault. That it still happened when you replaced everything with physical addresses is weird, though.
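To illustrate the virtual-versus-physical point: if the kernel were linked at a high virtual base (the thread does not say whether LiBOS is), every symbol address would have to be converted before an AP without paging could use it. A sketch in C, where both base constants are purely hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: kernel linked at KERNEL_VIRT_BASE and loaded at
 * KERNEL_PHYS_BASE. Neither value comes from the thread. */
#define KERNEL_VIRT_BASE 0xC0000000u
#define KERNEL_PHYS_BASE 0x00100000u

/* Physical address that backs a kernel symbol's virtual address. */
uint32_t kernel_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= KERNEL_VIRT_BASE);
    return vaddr - KERNEL_VIRT_BASE + KERNEL_PHYS_BASE;
}
```

If the kernel is identity mapped instead, virtual and physical addresses coincide and this conversion is not needed.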
Re: AP CPUs do not started up after SIPI IPI
Posted: Sun Aug 23, 2020 10:22 am
by mkfree
Since I can't see your printk routine, it would be good to write a small routine just to test that the CPU stays within the kernel's address space, for example jmp mykernel. Alternatively, you can pass the jump target in from the kernel by declaring a variable that holds it. That is what I decided on in the end: jmp [mykernel]
void mykernel () {
while (1);
}
If that does not cause problems, you can verify it in QEMU: in the monitor, use the command "info cpus" to see the value of the PC, and check with objdump whether that address is where you expect to be in the code.
Here is an example of how I did it while I was testing the code. I did it step by step. I am showing you a simple routine, since you still need to enable paging and whatever else you need. I did everything in the trampoline code in assembler, and from the kernel I wrote the data needed to make the CPU ready to enter the kernel:
Code: Select all
// This function is passed from the kernel
void c_bootCpus () {
while (1);
}
//
struct SptrBootCpusData {
Sgdtr kgdt;
u32 stackCpu;
u32 directoryPages;
u32 bootCpusCode;
u16 bootCPUmagicNumber;
} __attribute__ ((packed));
extern u32 c_bootCpus;
bool Clapic :: initializedCPUs (Scpus * cpu) {
core.memory.memcpy ((char *) 0x1000, (char *) & asm_bootCPUs, 512); // Trampoline code for the cpu
SptrBootCpusData * ptrBootCpus = (SptrBootCpusData *) (0x1000 + 492);
ptrBootCpus-> bootCpusCode = (u32) & c_bootCpus; // Kernel jump code
ptrBootCpus-> bootCPUmagicNumber = 0x0; // Magic number CPU initialized
ptrBootCpus-> kgdt = cpu-> gdtr; // Table of global descriptors
ptrBootCpus-> directoryPages = VM_KERNEL_PAGE_DIR; // Directory of pages
ptrBootCpus-> stackCpu = cpu-> kstack; // Stack for this CPU
// CPU initialization
// .........................................
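The packed structure only works if it lines up byte for byte with the data area at the end of the 512-byte trampoline (KGDT, stackCpu, directoryPages, bootCpusCode, bootCPUmagicNumber, starting at offset 492). A C sketch of a compile-time check for that; the lower-case type and field names are stand-ins, and the layout of the descriptor-table register (16-bit limit, then 32-bit base) is an assumption based on the DW/DD pair at KGDT:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of Sgdtr: 16-bit limit followed by 32-bit base. */
struct gdtr {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

/* Stand-in for SptrBootCpusData with the same field order. */
struct boot_cpus_data {
    struct gdtr kgdt;
    uint32_t stack_cpu;
    uint32_t directory_pages;
    uint32_t boot_cpus_code;
    uint16_t boot_cpu_magic;
} __attribute__((packed));

enum { TRAMPOLINE_SIZE = 512, DATA_OFFSET = 492 };

/* The data area must end exactly at the end of the copied 512 bytes. */
_Static_assert(sizeof(struct boot_cpus_data) == 20,
               "data area is 6 + 4 + 4 + 4 + 2 bytes");
_Static_assert(DATA_OFFSET + sizeof(struct boot_cpus_data) == TRAMPOLINE_SIZE,
               "data area must fill the tail of the trampoline");

/* Locate the data area inside a copied trampoline. */
struct boot_cpus_data *boot_data_at(uint8_t *trampoline)
{
    assert(trampoline);
    return (struct boot_cpus_data *)(trampoline + DATA_OFFSET);
}
```

A check like this would catch a changed padding value in the assembly file before it turns into a wrong stack pointer or page directory on the AP.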
Re: AP CPUs do not started up after SIPI IPI
Posted: Mon Aug 24, 2020 1:08 am
by iman
nullplan wrote:That will likely mean setting ESP to an address that doesn't exist, and GDTR and IDTR as well. So you probably got machine check errors or the like, or GPFs when trying to access those descriptors, and since the IDTR was also still broken, those couldn't be serviced. Therefore triple fault. That it still happened when you replaced everything with physical addresses is weird, though.
I found a way to get it working.
First of all, I reserved a DWORD-sized memory location as a temporary placeholder. I copy the physical address of each extern label into it and then use that address indirectly.
Something like this:
Code: Select all
extern printk
extern gdt_pointer
extern idt_pointer
place_holder:
dd 0x00000000
and this code:
Code: Select all
mov eax, printk
mov ebx, REBASE(place_holder)
mov DWORD[ebx], eax
push message
call DWORD[REBASE(place_holder)]
; ...
mov eax, gdt_pointer
mov ebx, REBASE(place_holder)
mov DWORD[ebx], eax
lgdt [REBASE(place_holder)]
; ... and the same for idt_pointer ...
Then I was able to use lidt, lgdt, and any external calls.
I still have to figure out what was wrong in the first place when I used the extern labels directly.
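One plausible explanation for the failing calls, which the thread itself does not confirm: a direct `call printk` is encoded with a displacement relative to the call site, and that displacement is computed for the link-time location of the trampoline. Once the code is copied elsewhere, the same bytes point somewhere else, whereas an indirect call through a stored absolute address is unaffected. A C sketch of the arithmetic, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* rel32 stored in a 5-byte `call rel32` at link-time address `site`
 * so that it reaches `target`. */
int32_t call_rel32(uint32_t site, uint32_t target)
{
    assert(site <= UINT32_MAX - 5);
    return (int32_t)(target - (site + 5));  /* relative to the next instruction */
}

/* Where that same instruction lands once it executes at `moved_site`. */
uint32_t call_lands_at(uint32_t moved_site, int32_t rel32)
{
    return moved_site + 5 + (uint32_t)rel32;
}
```

For a call linked at 0x00101000 that targets 0x00105000 but runs from 0x8100, `call_lands_at` yields 0x0000C100 instead of the intended target, which is the kind of wild jump that would end in a triple fault.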
If there are no more comments, I'd like to mark this question as SOLVED.
Best regards.
Iman.