
[Solved] - SMP trampoline page faulting with allocated stack

Posted: Thu Oct 01, 2020 2:52 pm
by kzinti
I've been working on SMP trampoline code for the past few days and I am running into an issue I can't seem to figure out.

This is 32-bit x86, but I have the exact same problem on 64-bit x86.

1) I allocate a page in low memory (typically it is at 0x1000)
2) I copy the trampoline code from the kernel image to 0x1000
3) I write some data to 0x1F00 (parameters for the trampoline, it has the CR3, stack and entry point to use)
4) Do the INIT-SIPI-SIPI dance
5) The trampoline executes and everything looks good until the end, where I push something onto the stack. At that point, according to QEMU, I get a page fault.

Code: Select all

    # Setup stack
    movl    0xF08(%ebx), %esp

    # Jump to kernel
    movl    %ebx, %eax
    addl    $0x0F00, %eax       # eax = TrampolineContext*
    pushl   %eax                # Param 1: TrampolineContext*   --> PAGE FAULT
    call    *0xF0C(%ebx)        # Indirect call through the entry point stored in the context
Page fault details from QEMU:

Code: Select all

check_exception old: 0xffffffff new 0xe
     0: v=0e e=000b i=0 cpl=0 IP=0008:00001082 pc=00001082 SP=0010:ff7fe000 CR2=ff7fdffc
EAX=00001f00 EBX=00001000 ECX=00000000 EDX=000006f3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=ff7fe000
EIP=00001082 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     00001090 00000017
IDT=     00000000 00000000
CR0=80000011 CR2=ff7fdffc CR3=bffce000 CR4=000000a0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000f00 CCD=00001f00 CCO=ADDL    
EFER=0000000000000000
6) Something is not working with the stack I dynamically allocate. If I hardcode the stack location anywhere in the first 4GB of memory or if I place the stack inside the trampoline page (say at 0x1F00), everything works just fine. I don't get any page fault.
7) I believe it works when I place the stack at a virtual address below 4 GB because the first 4 GB of physical memory was identity-mapped at boot time. I suspect it doesn't work when I allocate the stack dynamically because of some cache/TLB synchronization problem between the two processors.
8) I tried memory barriers, full flushing and so on by reloading CR3 on the main processor. Nothing seems willing to make it work.
9) Accessing the stack from the BSP processor (after mapping it in VM) works just fine. Only the APs have problems.
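
For reference, the parameter block from step 3 could be laid out roughly like this. It is only a sketch: the 0x08 and 0x0C offsets and the TrampolineContext name come from the trampoline snippet above, while the field names, the position and 64-bit width of the CR3 field, and the `context_address` helper are assumptions made for illustration, not the actual rainbow-os definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout of the parameter block at trampoline base + 0xF00.
struct TrampolineContext {
    uint64_t cr3;    // assumed: page table root at offset 0x00 (width is a guess)
    uint32_t stack;  // offset 0x08: the value the trampoline loads into %esp
    uint32_t entry;  // offset 0x0C: the kernel entry point the trampoline calls
};

// The two offsets the trampoline code depends on.
static_assert(offsetof(TrampolineContext, stack) == 0x08, "stack must sit at 0xF08");
static_assert(offsetof(TrampolineContext, entry) == 0x0C, "entry must sit at 0xF0C");

constexpr uint32_t kContextOffset = 0x0F00;

// Address of the context for a trampoline page at `base` (what ends up in %eax).
constexpr uint32_t context_address(uint32_t base) {
    return base + kContextOffset;
}

// With the trampoline copied to 0x1000, the context lives at 0x1F00.
inline void check_context_address() {
    assert(context_address(0x1000) == 0x1F00);
}
```

Pinning the offsets with static_assert keeps the C++ side and the hand-written offsets in the assembly from drifting apart silently.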

I am using recursive page mapping at the moment (PAE). Here are the changes made to the page table when I allocate the stack:

Code: Select all

Stack physical address 0x0013a000, mapping it to virtual address 0xFF7FD000, flags 0x3 (present + write)
PML3 - previous 0xbffcd001, now 0xbffcd001 (unchanged, as expected)
PML2 - previous 0xbffc9163, now 0xbffc9163 (unchanged, as expected)
PML1 - previous 0x00000000, now 0x0013a103 (looks good, this is physical address with PAGE GLOBAL + PAGE WRITE + PAGE PRESENT)
Again accessing that memory from BSP after the allocation + mapping works fine. In fact the VM allocation function has been running fine for months for other purposes.
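
To double-check which page-table slots that mapping lands in, the virtual address can be split by hand. This is a minimal sketch for 32-bit PAE, assuming the PML3/PML2/PML1 labels above correspond to the PDPT, page directory and page table. `PaeIndices` and `split_pae` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Indices into the three PAE paging levels plus the byte offset in the page.
struct PaeIndices {
    uint32_t pdpt;    // bits 31:30 (4 entries)
    uint32_t pd;      // bits 29:21 (512 entries)
    uint32_t pt;      // bits 20:12 (512 entries)
    uint32_t offset;  // bits 11:0
};

// Split a 32-bit virtual address the way the PAE page walk does (4 KiB pages).
constexpr PaeIndices split_pae(uint32_t va) {
    return PaeIndices{ va >> 30, (va >> 21) & 0x1FF, (va >> 12) & 0x1FF, va & 0xFFF };
}

// Values taken from the thread: the stack page and the faulting address.
inline void check_stack_addresses() {
    const PaeIndices page = split_pae(0xFF7FD000u);
    assert(page.pdpt == 3 && page.pd == 0x1FB && page.pt == 0x1FD && page.offset == 0);

    const PaeIndices fault = split_pae(0xFF7FDFFCu);
    assert(fault.pdpt == 3 && fault.pd == 0x1FB && fault.pt == 0x1FD && fault.offset == 0xFFC);
}
```

Both the mapped page and the faulting address resolve to the same three entries, so the fault is caused by something in those entries and not by a missing mapping.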

Accessing that same memory from the AP (namely when pushing a value on the stack) results in a pagefault (exception 0xE):

Code: Select all

check_exception old: 0xffffffff new 0xe
     0: v=0e e=000b i=0 cpl=0 IP=0008:00001082 pc=00001082 SP=0010:ff7fe000 CR2=ff7fdffc
EAX=00001f00 EBX=00001000 ECX=00000000 EDX=000006f3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=ff7fe000
EIP=00001082 EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
- Error code 0x0b --> reserved bit violation + write access + page present
- %esp is where I expect it to be: 0xff7fe000
- CR2 is 0xff7fdffc (%esp - 4), which is what you'd expect when a stack push triggers the page fault
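
The error code can be decoded mechanically. Here is a small sketch using the architectural page-fault error code bits. The struct and function names are just for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Page-fault error code bits as defined by the x86 architecture.
constexpr uint32_t PF_PRESENT  = 1u << 0;  // fault on a present page (not a missing one)
constexpr uint32_t PF_WRITE    = 1u << 1;  // the access was a write
constexpr uint32_t PF_USER     = 1u << 2;  // the access came from user mode
constexpr uint32_t PF_RESERVED = 1u << 3;  // a reserved bit was set in a paging entry
constexpr uint32_t PF_FETCH    = 1u << 4;  // the access was an instruction fetch

struct PageFaultInfo {
    bool present;
    bool write;
    bool user;
    bool reserved;
    bool fetch;
};

// Break an error code into its individual flags.
constexpr PageFaultInfo decode_page_fault(uint32_t error) {
    return PageFaultInfo{
        (error & PF_PRESENT) != 0,
        (error & PF_WRITE) != 0,
        (error & PF_USER) != 0,
        (error & PF_RESERVED) != 0,
        (error & PF_FETCH) != 0,
    };
}

// The e=000b value from the QEMU log: present + write + reserved, supervisor mode.
inline void check_logged_error_code() {
    const PageFaultInfo info = decode_page_fault(0x0b);
    assert(info.present && info.write && info.reserved);
    assert(!info.user && !info.fetch);
}
```

The reserved flag is the interesting one: it means the page walk found an entry the CPU considers malformed, not a page that is merely absent or read-only.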

I've tried debugging using Bochs. Same behaviour. Manually inspecting the memory and verifying the page tables didn't provide any clue as to what is going on.

Real hardware also resets, so likely the same problem.

I have no clue why the "reserved" bit is set in the page fault error code. My code never sets reserved bits anywhere, and I am careful to zero out all memory pages when I allocate them.

You can find the code where I allocate the stack and wake up the APs here:
https://github.com/kiznit/rainbow-os/bl ... #L100-L102

The stack access triggering the page fault:
https://github.com/kiznit/rainbow-os/bl ... smp.S#L107

You can also see where I access the stack from the BSP before starting the APs (no page fault here):
https://github.com/kiznit/rainbow-os/bl ... u.cpp#L112

Thanks for any help or comments here.

Re: SMP trampoline page faulting with allocated stack

Posted: Thu Oct 01, 2020 9:55 pm
by nullplan
kzinti wrote:8) I tried memory barriers, full flushing and so on by reloading CR3 on the main processor. Nothing seems willing to make it work.
Reloading CR3 only flushes the TLB of the CPU that executes it, which is irrelevant to any other CPU. Actual memory barriers would be mfence, lfence, or sfence. But these should not matter here, since the initialization of the AP provably happens after changing the memory. Caches are coherent on x86, and memory ordering is strong. So this should not matter.
kzinti wrote:9) Accessing the stack from the BSP processor (after mapping it in VM) works just fine. Only the APs have problems.
Does the BSP see the same address? If so, then the paging structure you create for the AP does not contain the stack. That seems like a place where I would start searching.

Re: SMP trampoline page faulting with allocated stack

Posted: Thu Oct 01, 2020 10:54 pm
by kzinti
nullplan wrote:Actual memory barriers would be mfence, lfence, or sfence.
I did try with mfence (and mfence + CR3 load) and that didn't help.
nullplan wrote:But these should not matter here, since the initialization of the AP provably happens after changing the memory. Caches are coherent on x86, and memory ordering is strong. So this should not matter.
I did some research today about this and you are confirming my understanding. Thanks for this.
nullplan wrote:Does the BSP see the same address? If so, then the paging structure you create for the AP does not contain the stack. That seems like a place where I would start searching.
The same paging structure (same CR3 value) is used by both the BSP and the AP. Yet I can read/write to the stack from the BSP, but not from the AP.


I just double checked using bochs:

- CR3 is set to the same value for both the BSP and AP.

- I also manually dumped the page tables to verify that all entries are present in the right places.

Code: Select all

Stack physical address 0x0013a000, mapping it to virtual address 0xFF7FD000, flags 0x3 (present + write)
PML3 - previous 0xbffcd001, now 0xbffcd001 (unchanged, as expected)
PML2 - previous 0xbffc9163, now 0xbffc9163 (unchanged, as expected)
PML1 - previous 0x00000000, now 0x0013a103 (looks good, this is physical address with PAGE GLOBAL + PAGE WRITE + PAGE PRESENT)
- The BSP can write to the newly allocated memory:

Code: Select all

*((uint32_t*)0xff7fd000) = 65;
- The AP crashes when I try to do the same in the trampoline:

Code: Select all

    movl    %eax, 0xff7fd000
check_exception old: 0xffffffff new 0xe
0: v=0e e=000b i=0 cpl=0 IP=0008:0000107b pc=0000107b SP=0010:ff7fe000 CR2=ff7fd000
EAX=00000010 EBX=00001000 ECX=00000000 EDX=000006f3
ESI=00000000 EDI=00000000 EBP=00000000 ESP=ff7fe000
EIP=0000107b EFL=00000006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 000010a0 00000017
IDT= 00000000 00000000
CR0=80000011 CR2=ff7fd000 CR3=bffce000 CR4=000000a0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00001000 CCD=0000105c CCO=ADDL
EFER=0000000000000000

Re: SMP trampoline page faulting with allocated stack

Posted: Fri Oct 02, 2020 2:54 pm
by nullplan
Well, I am fresh out of ideas. The error code indicates you are using some reserved bit, which tells me you may be using a feature on the BSP that you didn't enable on the AP. Do the CR4 values match? I couldn't find any optional bits you are using, though. Not in those three levels, anyway.

Re: SMP trampoline page faulting with allocated stack

Posted: Fri Oct 02, 2020 4:45 pm
by kzinti
CR4s are matching as well.

Thanks for your help and taking the time to look at my code. I am also out of ideas. At this point I will refactor my code to have fault handlers in the trampoline and hope to find more data points.

Re: SMP trampoline page faulting with allocated stack

Posted: Fri Oct 02, 2020 10:00 pm
by bzt
kzinti wrote:- The AP crashes when I try to do the same in the trampoline:
kzinti wrote:The same paging structure (same CR3 value) is used by both the BSP and the AP.
Are you sure? Because
CR0=80000011 CR2=ff7fd000 CR3=bffce000 CR4=000000a0
I don't think that's a valid physical address for the paging tables (but it could be). How much RAM does your VM have?

Cheers,
bzt

Re: SMP trampoline page faulting with allocated stack

Posted: Fri Oct 02, 2020 10:35 pm
by kzinti
The page fault details above are from running QEMU with 8 GB of RAM. If it was an invalid address, how could anything run at all?

To verify that CR3 is the same on both CPUs I was using Bochs with 2 GB of RAM.

I basically read CR3 on the BSP and pass it as a parameter to the trampoline. It is hard to imagine that they could be different:

https://github.com/kiznit/rainbow-os/bl ... u.cpp#L109

https://github.com/kiznit/rainbow-os/bl ... .S#L69-L70

Re: SMP trampoline page faulting with allocated stack

Posted: Sat Oct 03, 2020 1:11 am
by linuxyne
EFER.NXE isn't set.

"If IA32_EFER.NXE = 0 and the P flag of a PDE or a PTE is 1, the XD flag (bit 63) is reserved."
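
That would also explain why the 32-bit values printed earlier in the thread looked clean: bit 63 is cut off when only the low half of a PAE entry is logged. Below is a sketch of the check the CPU effectively performs, assuming the kernel marked the stack page no-execute. The full 64-bit entry value is hypothetical (the thread only shows the low 32 bits, 0x0013a103), and the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t EFER_NXE    = 1ull << 11;  // IA32_EFER.NXE (no-execute enable)
constexpr uint64_t PTE_PRESENT = 1ull << 0;   // P flag of a paging-structure entry
constexpr uint64_t PTE_XD      = 1ull << 63;  // execute-disable flag

// True if a present entry has XD set while NXE is off, i.e. the case in which
// the CPU treats bit 63 as reserved and raises a page fault with the RSVD bit.
constexpr bool xd_is_reserved_violation(uint64_t entry, uint64_t efer) {
    return (entry & PTE_PRESENT) != 0
        && (entry & PTE_XD) != 0
        && (efer & EFER_NXE) == 0;
}

inline void check_thread_scenario() {
    // Hypothetical full entry: the logged low half plus an assumed XD bit.
    const uint64_t entry = 0x800000000013a103ull;

    // Truncating to 32 bits gives exactly what the log printed, hiding bit 63.
    assert(static_cast<uint32_t>(entry) == 0x0013a103u);

    // AP with EFER=0 (as in the QEMU dump): reserved-bit fault.
    assert(xd_is_reserved_violation(entry, 0));

    // Same entry on a CPU with NXE enabled: bit 63 is a valid XD flag.
    assert(!xd_is_reserved_violation(entry, EFER_NXE));
}
```

The usual fix is to set IA32_EFER.NXE (bit 11 of MSR 0xC0000080) on each AP in the trampoline before it touches any mapping whose entry has bit 63 set.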

Re: SMP trampoline page faulting with allocated stack

Posted: Sat Oct 03, 2020 1:14 am
by kzinti
Of course. Thanks so much, I was going nuts over this one.