VMX - Guest hangs if the host has the APs started
Posted: Wed Feb 11, 2015 12:04 pm
This has a bit to do with OS development, but I think its place is here.
My host starts in real mode, transitions into long mode, and sets up a simple memory manager, an IDT, a GDT, all the basic stuff. Then it checks whether the system has any APs, starts them up, and brings them into long mode.
After this I initialize a VCPU for every active CPU I have. All the VCPUs corresponding to APs are set up as unrestricted guests in halt state (more on this later).
The BSP's VCPU is also unrestricted, with real mode set up and with a small piece of 16-bit code that loads the MBR at 0x0000:0x7C00 and gives control to it.
There is an order in which this is done: the BSP won't start its guest until all the APs have started (well, halted) theirs.
The problem is this: after the jump into the MBR code the guest hangs - no VM Exit, no exception, no nothing.
The weird part is that this hang happens only when the APs are started. If I never bother to wake them up, Windows starts to boot and I get various VM Exits according to what it is trying to do.
At first I thought that the VMCSs were poorly allocated and overlapped, but the problem persists even if I create no VMCS and VMXON regions for the APs. The first MB of memory is safely preserved (I obtain an e820 memory map when I first start up and I make sure I only use the ranges marked as free).
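For illustration, the "only use ranges marked as free" check could look roughly like the sketch below. The names `e820_entry` and `e820_range_is_free` are made up for this example and are not taken from the host described here; the struct only mirrors the base/length/type fields of an E820 entry (the firmware's 20-byte layout would need a packed struct), and the function does not merge adjacent usable entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define E820_TYPE_USABLE 1u  /* type 1 = available RAM */

/* Hypothetical in-memory copy of one E820 entry (base, length, type). */
struct e820_entry {
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

/* True if [addr, addr + size) lies entirely inside a single usable entry. */
static bool e820_range_is_free(const struct e820_entry *map, size_t count,
                               uint64_t addr, uint64_t size)
{
    assert(size != 0 && addr + size > addr);  /* reject empty or wrapping ranges */
    for (size_t i = 0; i < count; i++) {
        const struct e820_entry *e = &map[i];
        if (e->type != E820_TYPE_USABLE)
            continue;
        /* Compare offsets rather than end addresses to avoid overflow. */
        if (addr >= e->base && addr - e->base <= e->length &&
            size <= e->length - (addr - e->base))
            return true;
    }
    return false;
}
```

A check like this can be run on every VMXON region, VMCS and EPT page before it is handed out, which rules out the host accidentally placing a structure in memory the firmware reserved.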
I have some simple debugging features for the guest and I know that the jump into the MBR code is successful and that it also gives control further (to the Windows' loader I presume). Then, protected mode is enabled (with no paging for the moment).
This moves pretty slowly, as I have to make sure I don't screw things up. For example, the guest does several RDTSCs in a row and checks the difference between them. I can get past those with a VM Exit on RDTSC, but I'm not sure what the whole purpose of those TSC reads is, and I may be interfering with other things. For now I try to make sure I set breakpoints after those TSC-checking sequences.
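As a rough sketch of what handling a VM Exit on RDTSC involves: the exit is enabled through the "RDTSC exiting" bit (bit 12) of the primary processor-based controls, and the handler has to return a value in EDX:EAX and step over the instruction. The `guest_regs` struct and `handle_rdtsc_exit` are hypothetical names for this example, and which TSC value to hand back to the guest is a policy decision this sketch leaves to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_BASED_RDTSC_EXITING (1u << 12) /* primary processor-based control */
#define EXIT_REASON_RDTSC       16u        /* basic exit reason for RDTSC */

/* Hypothetical container for the guest state the handler touches. */
struct guest_regs {
    uint64_t rax, rdx, rip;
};

/* Emulate RDTSC: return tsc in EDX:EAX and skip the instruction.
 * instr_len is the VM-exit instruction length read from the VMCS. */
static void handle_rdtsc_exit(struct guest_regs *g, uint64_t tsc,
                              uint32_t instr_len)
{
    assert(instr_len >= 2);          /* RDTSC itself encodes as 0F 31 */
    g->rax = (uint32_t)tsc;          /* low half, upper 32 bits cleared */
    g->rdx = (uint32_t)(tsc >> 32);  /* high half, upper 32 bits cleared */
    g->rip += instr_len;             /* resume after the RDTSC */
}
```

If the guest is measuring elapsed time between two reads, returning values that advance by a small, steady amount may keep its timing checks happy while single-stepping, but that is an assumption about what the guest code is doing.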
I managed to find out where exactly the AP version hangs.
The following snippet is just to illustrate the difference between the working and the non-working versions. I don't think it is, by itself, of any help right now:
Code:
RIP: 0x422879: CMP ebx, esi
RIP: 0x42287B: JZ 0xfffffffffffffff7
RIP: 0x42287D: RDTSC
The working version seems to always have EBX = 1 at this point and ESI some other value (I'm not sure what these represent, but I'm hopeful I can at least get an idea of where they come from). The non-working version always takes that jump. It seems to be an infinite loop, as execution gets back to this point after a while.
I managed to track down the sources of those values (2 previous calls), but I haven't walked through that piece of code yet. I'll update this post as soon as I can after I find out what's happening there. I'll also double check to make sure that this is not another TSC thingy. I can manipulate the guest with the debugger, so if I set EBX to a value different from that of ESI, everything works like the no-AP version. I can't tell if it will successfully detect the other CPUs on the system, but it gets past the point of hanging.
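To see where that JZ goes when it is taken, the target can be worked out by hand. The sketch below assumes the operand shown by the debugger (0xfffffffffffffff7) is the sign-extended 8-bit displacement, i.e. -9, which fits with the instruction being two bytes long (the next one sits at 0x42287D). `short_jump_target` is just a helper name for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a short (rel8) jump: address of the next instruction plus
 * the sign-extended 8-bit displacement. */
static uint64_t short_jump_target(uint64_t insn_addr, uint8_t insn_len,
                                  int8_t rel8)
{
    assert(insn_len == 2);  /* opcode byte + rel8 */
    return insn_addr + insn_len + (uint64_t)(int64_t)rel8;
}
```

With the values from the snippet, `short_jump_target(0x42287B, 2, -9)` gives 0x422874, so under that assumption the loop body starts only a few bytes before the CMP.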
Maybe I did some other things wrong and all this attempt at debugging the guest is for nothing.
My AP start-up sequence is this: I parse the APIC tables and obtain the CPU IDs. For every AP I send an INIT-SIPI-SIPI (actually, if the AP signals that it has woken up after the first SIPI, I don't send the second one. Should I send it anyway?). The first thing an AP does after it wakes up is try to acquire a global spin lock. Once it has the lock, it exchanges some information with the BSP (like obtaining a temporary, but unique, stack pointer), releases the lock, enters protected mode, enables paging and PAE, gets into long mode, loads a GDT, TSS and IDT, and obtains a new unique long-mode stack. After all this is done, each AP starts its own VCPU. In the meantime, the BSP waits for all the APs to finish and only then starts its virtual machine (all this is synchronized).
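For reference, the ICR values behind an INIT-SIPI-SIPI sequence can be sketched as follows. This assumes xAPIC mode (memory-mapped ICR, 8-bit destination in bits 24-31 of the high dword) and a physical-destination IPI with no shorthand; the helper names and the trampoline address are illustrative, not taken from the host described above.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_DELIVERY_INIT    (5u << 8)   /* delivery mode 101b */
#define ICR_DELIVERY_STARTUP (6u << 8)   /* delivery mode 110b */
#define ICR_LEVEL_ASSERT     (1u << 14)

/* High dword of the ICR: destination APIC ID in bits 24-31 (xAPIC mode). */
static uint32_t icr_high(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

/* Low dword for the INIT IPI. */
static uint32_t icr_low_init(void)
{
    return ICR_DELIVERY_INIT | ICR_LEVEL_ASSERT;
}

/* Low dword for a SIPI: the vector is the 4 KiB page number of the
 * real-mode entry point, which must be page aligned and below 1 MiB. */
static uint32_t icr_low_sipi(uint32_t trampoline_phys)
{
    assert((trampoline_phys & 0xFFFu) == 0 && trampoline_phys < 0x100000u);
    return ICR_DELIVERY_STARTUP | ICR_LEVEL_ASSERT | (trampoline_phys >> 12);
}
```

For a trampoline at 0x8000 this yields 0x4500 for the INIT and 0x4608 for each SIPI. The high dword (offset 0x310 in the local APIC page) is written first, because writing the low dword (offset 0x300) is what actually sends the IPI.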
I'm not using fancy stuff. All the control fields are mostly at their default values (obtained from the MSRs), plus the exception bitmap, which is set to 0xFFFFFFFF. I also have EPT enabled (identity mapped) and unrestricted guest.
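The usual way to derive control values from the capability MSRs is sketched below, in case it helps to compare against what the host does. The low 32 bits of each IA32_VMX_*_CTLS MSR give the bits that must be 1 and the high 32 bits give the bits that may be 1. The function names are made up for this example; the bit positions for "enable EPT" and "unrestricted guest" are those of the secondary processor-based controls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PROC_CTLS_ACTIVATE_SECONDARY (1u << 31) /* primary controls */
#define PROC_CTLS2_ENABLE_EPT        (1u << 1)  /* secondary controls */
#define PROC_CTLS2_UNRESTRICTED      (1u << 7)  /* secondary controls */

/* cap_msr: value read from the capability MSR matching the control field. */
static uint32_t vmx_adjust_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);
    assert((must_be_one & ~may_be_one) == 0); /* required bits are allowed bits */
    return (desired | must_be_one) & may_be_one;
}

/* True if every requested bit survived the adjustment. */
static bool vmx_controls_supported(uint32_t desired, uint64_t cap_msr)
{
    return (vmx_adjust_controls(desired, cap_msr) & desired) == desired;
}
```

Checking `vmx_controls_supported(PROC_CTLS2_ENABLE_EPT | PROC_CTLS2_UNRESTRICTED, msr)` on every CPU, not just the BSP, would catch the case where a requested feature silently gets masked off on one of them.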
BSP's activity state is set to 0 (active).
When it comes to the APs, I'm not sure whether they should be in the wait-for-SIPI state or the halt state. I tried both; nothing changes.
From what I understand, they should be set to the wait-for-SIPI state, unrestricted, and I should get a VM Exit due to an INIT signal. I'm not sure about this part, and the usual try, fail, try again method that helps me when I'm a bit lost on what the manual says can't help at the moment because everything fails.
If I set the APs to active they start to execute code from the address I set in their RIP state field.
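For the activity states discussed above, a small sketch of the encodings and of how support for the inactive ones is reported might be useful. The values 0 to 3 are the guest activity state encodings from the VMCS guest-state area, and bits 6, 7 and 8 of IA32_VMX_MISC report support for HLT, shutdown and wait-for-SIPI. The enum and function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Guest activity state values (VMCS guest-state field). */
enum guest_activity_state {
    ACTIVITY_ACTIVE        = 0,
    ACTIVITY_HLT           = 1,
    ACTIVITY_SHUTDOWN      = 2,
    ACTIVITY_WAIT_FOR_SIPI = 3,
};

/* vmx_misc: value read from IA32_VMX_MISC. Bits 6, 7 and 8 report
 * support for HLT, shutdown and wait-for-SIPI respectively; the
 * active state is always available. */
static bool activity_state_supported(uint64_t vmx_misc,
                                     enum guest_activity_state s)
{
    assert((unsigned)s <= ACTIVITY_WAIT_FOR_SIPI);
    if (s == ACTIVITY_ACTIVE)
        return true;
    return ((vmx_misc >> (5 + (unsigned)s)) & 1u) != 0;
}
```

If the requested state is not reported as supported, VM entry is expected to fail its guest-state checks, so this is worth verifying before concluding that switching between the two states "changes nothing".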
Did I forget to do something? Are there other places I should look into?
Sorry if this is a long post that still lacks details; I'm trying to get opinions from other people while I'm working on solving this. Maybe I'm looking in the wrong place or making a beginner mistake that's evident to someone around here.
I'll update this thread.