Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 12:33 am
by wangt13
I am learning Intel VMX and have written a Linux-based hypervisor.
The Linux host is itself a guest OS running in VMware.

The hypervisor runs a real-mode VM fine, but the VM fails to switch from real mode to protected mode.
Here is the guest code (based on https://github.com/guilleiguaran/xv6/bl ... /bootasm.S).

Code:

#define SEG_KCODE 1  // kernel code
#define SEG_KDATA 2  // kernel data+stack
#define SEG_KCPU  3  // kernel per-cpu data
#define SEG_UCODE 4  // user code
#define SEG_UDATA 5  // user data+stack
#define SEG_TSS   6  // this process's task state

#define CR0_PE          0x00000001      // Protection Enable

#define SEG_NULLASM                                             \
    .word 0, 0;                                             \
    .byte 0, 0, 0, 0

// The 0xC0 means the limit is in 4096-byte units
// and (for executable segments) 32-bit mode.
#define SEG_ASM(type,base,lim)                                  \
        .word (((lim) >> 12) & 0xffff), ((base) & 0xffff);      \
        .byte (((base) >> 16) & 0xff), (0x90 | (type)),         \
        (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)

#define STA_X     0x8       // Executable segment
#define STA_E     0x4       // Expand down (non-executable segments)
#define STA_C     0x4       // Conforming code segment (executable only)
#define STA_W     0x2       // Writeable (non-executable segments)
#define STA_R     0x2       // Readable (executable segments)
#define STA_A       0x1 // Accessed
# Start the first CPU: switch to 32-bit protected mode, jump into C.

        .code16
        .global code16, code16_end
code16:
        xor %ecx, %ecx
        mov %cr3, %eax
        mov %eax, %cr3
    seta20.1:
        inb     $0x64,%al               # Wait for not busy
        testb   $0x2,%al
        jnz     seta20.1

        movb    $0xd1,%al               # 0xd1 -> port 0x64
        outb    %al,$0x64

    seta20.2:
        inb     $0x64,%al               # Wait for not busy
        testb   $0x2,%al
        jnz     seta20.2

        movb    $0xdf,%al               # 0xdf -> port 0x60
        outb    %al,$0x60

        wrmsr

        lgdt    gdtdesc
        movl    %cr0, %eax
        orl     $CR0_PE, %eax
        movl    %eax, %cr0

        rdmsr                           # <====== added for debugging; forces a VM exit
//PAGEBREAK!
# Complete transition to 32-bit protected mode by using long jmp
# to reload %cs and %eip.  The segment descriptors are set up with no
# translation, so that the mapping is still the identity mapping.

         ljmp    $(SEG_KCODE<<3), $start32

        .code32  # Tell assembler to generate 32-bit code now.
start32:
cid:
        cpuid
        # Bootstrap GDT

        .p2align 2                                # force 4 byte alignment
gdt:
        SEG_NULLASM                              # NULL seg
        SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)   # code seg
        SEG_ASM(STA_W, 0x0, 0xffffffff)         # data seg

gdtdesc:
        .word   (gdtdesc - gdt - 1)             # sizeof(gdt) - 1
        .long   gdt
code16_end:
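
A quick way to sanity-check the SEG_ASM macro above is to reproduce its arithmetic on the host. This is only a minimal sketch: seg_desc_bytes is a made-up helper name (it is not part of xv6 or of the guest code), and it simply mirrors the macro byte for byte.

```c
#include <stdint.h>

#define STA_X 0x8 /* executable segment */
#define STA_W 0x2 /* writeable (non-executable segments) */
#define STA_R 0x2 /* readable (executable segments) */

/* Hypothetical helper: mirrors SEG_ASM(type, base, lim) byte for byte. */
void seg_desc_bytes(uint8_t type, uint32_t base, uint32_t lim, uint8_t out[8])
{
    out[0] = (lim >> 12) & 0xff;         /* limit bits 7:0, in 4 KiB units */
    out[1] = (lim >> 20) & 0xff;         /* limit bits 15:8 */
    out[2] = base & 0xff;                /* base bits 7:0 */
    out[3] = (base >> 8) & 0xff;         /* base bits 15:8 */
    out[4] = (base >> 16) & 0xff;        /* base bits 23:16 */
    out[5] = 0x90 | type;                /* P=1, DPL=0, S=1, plus type bits */
    out[6] = 0xC0 | ((lim >> 28) & 0xf); /* G=1, D/B=1, limit bits 19:16 */
    out[7] = (base >> 24) & 0xff;        /* base bits 31:24 */
}
```

Calling seg_desc_bytes(STA_X | STA_R, 0x0, 0xffffffff, buf) fills buf with ff ff 00 00 00 9a cf 00, and the STA_W data segment differs only in the access byte (92 instead of 9a). Comparing these against a hex dump of bootblock.bin is a cheap check that the GDT was assembled as intended.
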
The Makefile for the guest code is below:

Code:

guest: guest_app.c
        $(CC) -Wall -Wextra -Werror $^ -o $@
        $(CC) $(G_CFLAGS) -fno-pic -nostdinc -I. -c code16.S
        $(LD) $(G_LDFLAGS) -N -e start -Ttext 0x7C00 -o bootblock.o code16.o
        $(OBJCOPY) -S -O binary -j .text bootblock.o bootblock.bin
The guest cannot long-jump to start32 and run.
To debug it, I added an rdmsr just before the ljmp.
The rdmsr triggers a VM exit as expected.
The VMCS fields and guest state at that moment are as follows:

Code:

VMCS fields.
 0x0000003F = control_VMX_pin_based
 0xA501E1F2 = control_VMX_cpu_based
 0x00000082 = control_VMX_proc2_based
 0x00000000 = control_exception_bitmap
 0x00000000 = control_pagefault_errorcode_mask
 0xFFFFFFFF = control_pagefault_errorcode_match
 0x00000000 = control_CR3_target_count
 0x00036FFB = control_VM_exit_controls
 0x000011FB = control_VM_entry_controls
 0x00000000 = control_VM_entry_interruption_information
 0x00000000 = control_VM_entry_exception_errorcode
 0x00000000 = control_VM_entry_instruction_length

 0xFFFFFFFFFFFFFFF7 = control_CR0_mask
 0xFFFFFFFFFFFFF871 = control_CR4_mask
 0x0000000060000010 = control_CR0_shadow
 0x0000000000000000 = control_CR4_shadow
 0x0000000000000000 = control_CR3_target0
 0x00000000B7934000 = control_CR3_target1
 0x0000000000000000 = control_CR3_target2
 0x0000000000000000 = control_CR3_target3


Guest state:
 CR0=0000000000000031  CR3=0000000000000000  CR4=0000000000002050

 RSP=0000000000007BFA  SYSENTER_ESP=0000000000000000
 RIP=0000000000007C2E  SYSENTER_EIP=0000000000000000
 DR7=0000000000000400  SYSENTER_CS=00000000  RFLAGS=0000000000000006

   ES=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   CS=0000  [ base=0000000000000000 limit=0000FFFF rights=0000009B ]
   SS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   DS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   FS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   GS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
 LDTR=0000  [ base=0000000000000000 limit=0000FFFF rights=00000082 ]
   TR=0000  [ base=0000000000000000 limit=0000FFFF rights=0000008B ]
      GDTR  [ base=0000000000007C3C limit=00000017 ]
      IDTR  [ base=0000000000000000 limit=0000FFFF ]

 EAX=60000011  ECX=00000000  ESI=00000000  ESP=00007BFA   extints=0
 EBX=00000000  EDX=00000000  EDI=00000000  EBP=00000000   nmiints=0

The cpuinfo of the Linux running in VMware is below.
processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
stepping        : 2
microcode       : 0x3c
cpu MHz         : 2397.291
cache size      : 15360 KB
physical id     : 2
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt arat
bugs            :
bogomips        : 4801.89
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management:
I don't know why the guest cannot jump to start32 and continue from there.

I then ran kvm-hello-world on this Linux (the same Linux VM in VMware) to dump the VMCS that KVM uses.
I did NOT find any significant difference that would explain why my hypervisor fails to start a protected-mode VM.

Thanks,

Re: Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 1:44 am
by alexfru
I can't immediately see a problem other than there being data (GDT) after cpuid. Normally you don't want to execute data as code. I'd put an infinite loop at the end of the code.
But what if you move your rdmsr to the beginning of the 32-bit code? Is it reached and intercepted by the hypervisor?

Re: Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 2:15 am
by wangt13
alexfru wrote: "I can't immediately see a problem other than there being data (GDT) after cpuid. Normally you don't want to execute data as code. I'd put an infinite loop at the end of the code.
But what if you move your rdmsr to the beginning of the 32-bit code? Is it reached and intercepted by the hypervisor?"

You are right, I should put an infinite loop there.
In my hypervisor, an rdmsr instruction causes a VM exit, and the hypervisor then quits.
So I expected that an rdmsr in start32 would trigger a VM exit, with RIP showing that the guest had reached start32.

But with the rdmsr in start32 there is no such VM exit. Instead, the guest RIP is a random value, which means the guest ran wild.

Re: Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 3:35 am
by iansjack
How does the assembler know where your code is located?

You should inspect the generated code and look at what location it is jumping to.

Re: Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 3:46 am
by alexfru
iansjack wrote: "How does the assembler know where your code is located?

You should inspect the generated code and look at what location it is jumping to."

The state dump seems reasonable. But yeah, just to make sure, I'd love to see the disassembly of the binary.
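
For anyone reading the dump, the rights values can be unpacked with a few shifts. Here is a minimal sketch, assuming the VMCS access-rights layout described in the Intel SDM (type in bits 3:0, S in bit 4, DPL in bits 6:5, P in bit 7, D/B in bit 14, G in bit 15, unusable in bit 16). The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Illustrative type; fields follow the VMCS segment access-rights format. */
struct seg_rights {
    unsigned type;     /* bits 3:0: segment type */
    unsigned s;        /* bit 4: 1 = code or data, 0 = system */
    unsigned dpl;      /* bits 6:5: descriptor privilege level */
    unsigned present;  /* bit 7 */
    unsigned db;       /* bit 14: default operation size (1 = 32-bit) */
    unsigned gran;     /* bit 15: limit granularity (1 = 4 KiB units) */
    unsigned unusable; /* bit 16 */
};

/* Split a raw access-rights value into its individual fields. */
struct seg_rights decode_rights(uint32_t ar)
{
    struct seg_rights r;
    r.type     = ar & 0xf;
    r.s        = (ar >> 4) & 1;
    r.dpl      = (ar >> 5) & 3;
    r.present  = (ar >> 7) & 1;
    r.db       = (ar >> 14) & 1;
    r.gran     = (ar >> 15) & 1;
    r.unusable = (ar >> 16) & 1;
    return r;
}
```

For the CS value 0x9B from the dump this gives type 0xB (execute/read, accessed), S=1, DPL=0, P=1 and D/B=0, i.e. a present 16-bit code segment, which is what one would expect before the far jump reloads CS.
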

Re: Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 3:56 am
by iansjack
The state dump doesn't show the value of the label $start32. I suspect that it is something like 0x002F. Jumping to $8:$002F would certainly lead to unpredictable results.

Re: Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 5:11 am
by alexfru
iansjack wrote: "The state dump doesn't show the value of the label $start32. I suspect that it is something like 0x002F. Jumping to $8:$002F would certainly lead to unpredictable results."

What about GDTR.base = 7C3C? Looks like LGDT worked fine and it doesn't use IP-relative addressing unlike short jumps.
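
To make the GDTR argument concrete, here is a rough sketch of how a 16-bit lgdt interprets the six bytes at gdtdesc. The names are invented for illustration, and the example bytes in the note below are what `.word (gdtdesc - gdt - 1); .long gdt` would be expected to produce for a three-entry GDT at 0x7C3C (an assumption inferred from the dump, not read from the binary).

```c
#include <stdint.h>

struct gdtr16 {      /* illustrative name */
    uint16_t limit;  /* size of the GDT in bytes, minus 1 */
    uint32_t base;   /* linear address of the GDT */
};

/* Parse a 6-byte little-endian pseudo-descriptor as a 16-bit lgdt would. */
struct gdtr16 parse_gdtdesc(const uint8_t p[6])
{
    struct gdtr16 g;
    g.limit = (uint16_t)(p[0] | (p[1] << 8));
    g.base  = ((uint32_t)p[2] | ((uint32_t)p[3] << 8) |
               ((uint32_t)p[4] << 16) | ((uint32_t)p[5] << 24))
              & 0x00ffffffu; /* 16-bit operand size: only 24 base bits are used */
    return g;
}
```

Feeding it the bytes 17 00 3c 7c 00 00 yields limit 0x17 (three 8-byte descriptors minus one) and base 0x7C3C, the same values the GDTR line of the dump shows. That supports the point that the absolute address of gdtdesc was resolved correctly at link time.
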

Re: Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 7:10 am
by iansjack
It's just a matter of what code is generated for "jmp $8, $start32". It's easy enough for the OP to check and seems to be an obvious first step in debugging the problem.

Re: Virtual machine failed to enter protected mode in VMX

Posted: Fri Apr 20, 2018 7:29 am
by wangt13
I just recompiled the guest code. Here is the disassembly of the guest code above.

Code:

bootblock.o:     file format elf32-i386



Disassembly of section .text:

00007c00 <code16>:
    7c00:       66 31 c9                xor    %ecx,%ecx
    7c03:       0f 20 d8                mov    %cr3,%eax
    7c06:       0f 22 d8                mov    %eax,%cr3

00007c09 <seta20.1>:
    7c09:       e4 64                   in     $0x64,%al
    7c0b:       a8 02                   test   $0x2,%al
    7c0d:       75 fa                   jne    7c09 <seta20.1>
    7c0f:       b0 d1                   mov    $0xd1,%al
    7c11:       e6 64                   out    %al,$0x64

00007c13 <seta20.2>:
    7c13:       e4 64                   in     $0x64,%al
    7c15:       a8 02                   test   $0x2,%al
    7c17:       75 fa                   jne    7c13 <seta20.2>
    7c19:       b0 df                   mov    $0xdf,%al
    7c1b:       e6 60                   out    %al,$0x60
    7c1d:       0f 30                   wrmsr
    7c1f:       0f 01 16 54 7c          lgdtw  0x7c54
    7c24:       0f 20 c0                mov    %cr0,%eax
    7c27:       66 83 c8 01             or     $0x1,%eax
    7c2b:       0f 22 c0                mov    %eax,%cr0
    7c2e:       0f 32                   rdmsr
    7c30:       ea 35 7c 08 00          ljmp   $0x8,$0x7c35

00007c35 <start32>:
    7c35:       0f 32                   rdmsr

00007c37 <spin>:
    7c37:       f4                      hlt
    7c38:       eb fd                   jmp    7c37 <spin>
    7c3a:       0f a2                   cpuid

00007c3c <gdt>:
        ...
    7c44:       ff                      (bad)
    7c45:       ff 00                   incw   (%bx,%si)
    7c47:       00 00                   add    %al,(%bx,%si)
    7c49:       9a cf 00 ff ff          lcall  $0xffff,$0xcf
    7c4e:       00 00                   add    %al,(%bx,%si)
    7c50:       00 92 cf 00             add    %dl,0xcf(%bp,%si)

00007c54 <gdtdesc>:
    7c54:       17                      pop    %ss
    7c55:       00 3c                   add    %bh,(%si)
    7c57:       7c 00                   jl     7c59 <gdtdesc+0x5>
        ...
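
The far jump in the listing above can also be decoded by hand to confirm where it goes. Below is a small sketch of that decoding for the 16-bit form (opcode 0xEA, then a 16-bit offset, then a 16-bit selector). The struct and function names are made up for illustration.

```c
#include <stdint.h>

struct far_target {     /* illustrative name */
    uint16_t selector;
    uint16_t offset;
};

/* Decode a 16-bit "ljmp ptr16:16": opcode 0xEA, offset, then selector.
 * Returns 0 on success, -1 if the first byte is not 0xEA. */
int decode_ljmp16(const uint8_t insn[5], struct far_target *out)
{
    if (insn[0] != 0xEA)
        return -1;
    out->offset   = (uint16_t)(insn[1] | (insn[2] << 8));
    out->selector = (uint16_t)(insn[3] | (insn[4] << 8));
    return 0;
}

/* Index of the descriptor a selector refers to (selector bits 15:3). */
unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}
```

For the bytes ea 35 7c 08 00 this gives selector 0x8 (GDT entry 1, the code segment) and offset 0x7C35, which is the address of start32 in the listing. So the jump target itself is encoded correctly.
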

The result of running it is below:

Code:

Guest State

 CR0=0000000000000031  CR3=0000000000000000  CR4=0000000000002050

 RSP=0000000000007BFA  SYSENTER_ESP=0000000000000000
 RIP=0000000000007C2E  SYSENTER_EIP=0000000000000000
 DR7=0000000000000400  SYSENTER_CS=00000000  RFLAGS=0000000000000006

   ES=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   CS=0000  [ base=0000000000000000 limit=0000FFFF rights=0000009B ]
   SS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   DS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   FS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
   GS=0000  [ base=0000000000000000 limit=0000FFFF rights=00000093 ]
 LDTR=0000  [ base=0000000000000000 limit=0000FFFF rights=00000082 ]
   TR=0000  [ base=0000000000000000 limit=0000FFFF rights=0000008B ]
      GDTR  [ base=0000000000007C3C limit=00000017 ]
      IDTR  [ base=0000000000000000 limit=0000FFFF ]

 EAX=60000011  ECX=00000000  ESI=00000000  ESP=00007BFA
 EBX=00000000  EDX=00000000  EDI=00000000  EBP=00000000

Another finding: when I run this on bare-metal Linux, the PE bit in the guest CR0 is NOT 1 when the rdmsr causes the VM exit. I am still looking into it.

Re: Virtual machine failed to enter protected mode in VMX

Posted: Sat Apr 21, 2018 2:13 am
by alexfru
There's one thing that bothers me in this code. It never disables interrupts, which are catastrophic since there are no proper protected mode handlers for them, and yet the dump indicates that interrupts are disabled (bit 9 of (e|r)flags, AFAIR). I'd throw in the CLI instruction for good measure, just to exclude faults emanating from e.g. the timer or keyboard interrupts.
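
The bit-9 check is easy to do mechanically on the dumped value. A tiny sketch follows; the macro and function names are my own.

```c
#include <stdint.h>

#define RFLAGS_IF (1ull << 9) /* interrupt enable flag */

/* Returns 1 if maskable interrupts are enabled in the given RFLAGS value. */
int interrupts_enabled(uint64_t rflags)
{
    return (rflags & RFLAGS_IF) != 0;
}
```

With RFLAGS=0x6 from the dump this returns 0: only bit 1 (always set) and bit 2 (PF) are on, so IF is clear at the time of the VM exit.
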

Re: Virtual machine failed to enter protected mode in VMX

Posted: Sat Apr 21, 2018 10:31 pm
by wangt13
I figured out the reason for the failure.
It comes from the settings of CR0_mask and CR0_shadow.
I should have followed Intel's SDM when setting them, so that the guest can set bit 0 (PE) in CR0 to enable protected mode.
With this fixed, the guest can ljmp to start32 and run.
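
For later readers, here is a minimal sketch of how the CR0 guest/host mask and read shadow interact, following the rules in the Intel SDM. The function names are made up, and this post does not give the final mask and shadow values, so the "fixed" mask mentioned in the note below is only an assumption for illustration.

```c
#include <stdint.h>

/* Value the guest sees when it reads a control register:
 * host-owned bits (mask bit = 1) come from the read shadow,
 * guest-owned bits (mask bit = 0) come from the real register. */
uint64_t cr_guest_read(uint64_t real, uint64_t mask, uint64_t shadow)
{
    return (real & ~mask) | (shadow & mask);
}

/* A guest MOV to the register causes a VM exit if the new value differs
 * from the read shadow in any host-owned bit. Returns 1 in that case. */
int cr_write_exits(uint64_t newval, uint64_t mask, uint64_t shadow)
{
    return ((newval ^ shadow) & mask) != 0;
}
```

With the values from the first dump (CR0=0x31, CR0_mask=0xFFFFFFFFFFFFFFF7, CR0_shadow=0x60000010), cr_guest_read returns 0x60000010. OR-ing in CR0_PE gives 0x60000011, which matches EAX in the dump, and cr_write_exits reports that writing it back traps, because bit 0 is host-owned and differs from the shadow. If bit 0 were cleared from the mask (one possible setting, not necessarily the one used here), the same write would not trap. Either way the hypervisor has to make sure the guest's PE bit ends up in the guest CR0 field and, for host-owned bits, in the shadow.
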

Thanks,