No, it didn't help.
[strike]As far as i understand Intel docs, when guest writes EFER.LME, then entering long mode is delayed until write to CR0. When guest writes to CR0 and enables paging, CPU updates EFER.LMA. Hypervisor updates virtual EFER.LMA, writes it but bochs somehow sees only LME (0x00000100) in it.
And one more thing: vm->vmentry_ctrls.VMX_VMENTRY_CTRL1_X86_64_GUEST is also handled by hypervisor (if hypervisor doesn't update it when guest enters long mode, then vmenter fails on real hardware). So (i don't know yet the complete graph of relations between variables inside bochs), i think that either both of vm->vmentry_ctrls.VMX_VMENTRY_CTRL1_X86_64_GUEST and EFER.LM
A should be updated by bochs when guest_enables_paging && EFER.LM
E, or neither of them (if bochs uses variables from VMCS, not his own copies built from the parsed VMCS).
Short variant of the above: why bochs didn't update efer.lma of the guest, although hypervisor had done it?
May be bochs tests efer.lme and cr0.pg on vmexit_save_guest_state, but they are set by hypervisor, so bochs can see that they are set only on vmenter?[/strike]
Added some hours later:
Looks like i wasn't attentive enough, now i catch some difference.
Hypervisor log when on real hardware:
Code: Select all
Booting from 0000:7c00
Guest CR4 Write. HW CR4 2000, shadow CR4 0, new CR4 20
CR4 update: new value 2020
EFER Read HI=0 LO=0
MSR write for msr 0xc0000080
EFER Write HI=0 LO=100
VCPU[0]: Old shadow CR0: 0x0, New shadow CR0: 0x80000001, Guest CR0: 0x10031
Guest enables long mode
Guest (unrestricted) enables paging. Leaving CR3, CR4 and EFER as set by guest
EFER Read HI=0 LO=500
MSR write for msr 0xc0000080
EFER Write HI=0 LO=d01
VCPU[0]: Old shadow CR0: 0x80000001, New shadow CR0: 0x80050033, Guest CR0: 0x80000031
EFER Read HI=0 LO=d01
...
But under bochs:
Code: Select all
Booting from 0000:7c00
Guest CR4 Write. HW CR4 2000, shadow CR4 0, new CR4 20
CR4 update: new value 2020
VCPU[0]: Old shadow CR0: 0x0, New shadow CR0: 0x80000001, Guest CR0: 0x10031
Guest (unrestricted) enables paging. Leaving CR3, CR4 and EFER as set by guest
palacios/src/palacios/vmx_handler.c(62): VM Entry failed due to invalid guest state
palacios/src/palacios/vmx_handler.c(63): Printing VMCS: (NOTE: This VMCS may not belong to the correct guest)
...
Looks like guest (under bochs) didn't exit on MSR access.
Almost two hours later:
Even if it doesn't exit on msr access, we don't need it.
But i still don't catch why bochs see wrong value of EFER.
IASDM Volume 3B, 23.3.2.1 contains instructions about how guest regs and msrs are loaded (and sometimes even modified) but it didn't help, 'cos when “load IA32_EFER” VM-entry control is 1 CPU must just load IA32_EFER field, and this control is 1.
Ok, one more thing: hypervisor didn't print "Guest enables long mode", that means that when guest wrote CR0, vmexit occured but IA32_EFER field from VMCS didn't contain right value, i.e.: efer.lme wasn't set after vmexit (although, of course, "save IA32_EFER" VM-exit control is 1) !
Okay, i'm gonna dive into gdb one more time.
An hour later
1. Guest must have exited on EFER read/write. But it didn't. I'm really interested why, although, now i know one more place where i must put breakpoint.
2. ...
Half an hour later
Looks like i've finally found the bug:
cpu/vmexit.cc:319
Code: Select all
if (msr & 0xC0000000) {
if (msr > 0xC0001FFF) vmexit = 1;
else {
// check MSR-HI bitmaps
bx_phy_address pAddr = vm->msr_bitmap_addr + (msr >> 3) + 1024 + ((op == VMX_VMEXIT_RDMSR) ? 0 : 2048);
access_read_physical(pAddr, 1, &field);
BX_DBG_PHY_MEMORY_ACCESS(BX_CPU_ID, pAddr, 1, BX_READ, BX_MSR_BITMAP_ACCESS, &field);
if (field & (1 << (msr & 7)))
vmexit = 1;
}
}
Using (msr >> 3) in address calculation looks a bad idea for high part of bitmap. As far as i inderstand, right variant is:
Code: Select all
if (msr >= 0xC0000000) {
if (msr > 0xC0001FFF) vmexit = 1;
else {
// check MSR-HI bitmaps
bx_phy_address pAddr = vm->msr_bitmap_addr + ((msr - 0xC0000000) >> 3) + 1024 + ((op == VMX_VMEXIT_RDMSR) ? 0 : 2048);
access_read_physical(pAddr, 1, &field);
BX_DBG_PHY_MEMORY_ACCESS(BX_CPU_ID, pAddr, 1, BX_READ, BX_MSR_BITMAP_ACCESS, &field);
if (field & (1 << (msr & 7)))
vmexit = 1;
}
}
Also, there are too many magic constants in this code, don't you find?
In HV we have the following style for calculating offset in bitmap (it's not ideal, although it has some nice points)
Code: Select all
#define LOW_MSR_START 0x00000000
#define LOW_MSR_END 0x1fff
#define HIGH_MSR_START 0xc0000000
#define HIGH_MSR_END 0xc0001fff
#define LOW_MSR_INDEX 0
#define HIGH_MSR_INDEX 1024
static int get_bitmap_index(uint_t msr)
{
if( (msr >= LOW_MSR_START) && msr <= LOW_MSR_END) {
return LOW_MSR_INDEX + msr;
} else if (( msr >= HIGH_MSR_START ) && (msr <= HIGH_MSR_END)) {
return (HIGH_MSR_INDEX * 8) + (msr - HIGH_MSR_START);
} else {
PrintError("MSR out of range: 0x%x\n", msr);
return -1;
}
}
/* Same as SVM */
static int update_map(struct v3_vm_info * vm, uint_t msr, int hook_reads, int hook_writes) {
int index = get_bitmap_index(msr);
uint_t major = index / 8;
uint_t minor = (index % 8);
uchar_t mask = 0x1;
uint8_t read_val = (hook_reads) ? 0x1 : 0x0;
uint8_t write_val = (hook_writes) ? 0x1 : 0x0;
uint8_t * bitmap = (uint8_t *)(vm->msr_map.arch_data);
*(bitmap + major) &= ~(mask << minor);
*(bitmap + major) |= (read_val << minor);
*(bitmap + 2048 + major) &= ~(mask << minor);
*(bitmap + 2048 + major) |= (write_val << minor);
return 0;
}
One more hour later, it still boots but it boots finally!11 Now guest linux almost booted the kernel (it's centos, so it has tons of drivers to init and many-many useless things, so 2*10^10 ticks passed but it still hasn't reached emergency initramfs console).
Looks like bochs won't help me to find the reasons why on real machine with Core i5 since winter i get strange INIT signal after guest tests NMI WDT (and do something more but it's the last thing that guest printks), may be that is really a hardware problem and not my fault. But i'm really happy that i have coped with this EFER issue. Sorry for wasting so much time and producing so much text, it was only the second day of my life dedicated to the study of bochs sources (and debugging C++ code in gdb).
I have some time left today, so i'll test svm support an write about the results here if i do it.
later
No, SVM failed, although it may be the problem of hypervisor code, but on bochs last events were:
Code: Select all
01512962829i[APIC1] Deliver Start Up IPI
01512962829i[CPU1 ] CPU 1 started up at 0600:00000000 by APIC
01512992912i[APIC1] Deliver Start Up IPI
01512992912i[CPU1 ] CPU 1 started up by APIC, but was not halted at the time
01513046436i[XGUI ] X11 Beep OFF
01515702280i[APIC1] set timer divide factor to 1
01517401482e[CPU0 ] RDMSR: Unknown register 0xc0010114
01518055360e[CPU1 ] RDMSR: Unknown register 0xc0010114
01754845090e[CPU0 ] VMRUN: VMCB SS.DPL != CPL
And hypervisor log:
Code: Select all
<6>Starting Guest OS...
V3 -- Starting VM (2 cores)
CORE 0 RIP=000000000000fff0
Starting virtual core 1 on logical core 1
run: core=1, func=0xffffffff8020b8bf, arg=0xffff81000053a5c8, name=[V3_VM]-1
virtual core 0 (on logical core 0, APIC ID 0): in start_core (RIP=000000000000fff0)
virtual core 1 (on logical core 1, APIC ID 1): in start_core (RIP=000000000000fff0)
palacios/src/palacios/svm_handler.c(300): Unhandled SVM Exit: VMEXIT_INVALID_VMCB
palacios/src/palacios/svm_handler.c(305): SVM Returned:(VMCB=ffff8100090d3000)
palacios/src/palacios/svm_handler.c(306): RIP: ffffffff8023daad
palacios/src/palacios/svm_handler.c(307): RIP Linear: ffffffff8023daad
palacios/src/palacios/svm_handler.c(309): SVM Returned: Exit Code: ffffffffffffffff
palacios/src/palacios/svm_handler.c(311): io_info1 low = 0x00000000
palacios/src/palacios/svm_handler.c(312): io_info1 high = 0x00000000
palacios/src/palacios/svm_handler.c(314): io_info2 low = 0x00000000
palacios/src/palacios/svm_handler.c(315): io_info2 high = 0x00000000
palacios/src/palacios/svm.c(608): Error in SVM exit handler (ret=-1)
palacios/src/palacios/svm.c(609): last Exit was -1 (exit code=0xffffffffffffffff)
SVM core 0: SVM ERROR!!
RIP: ffffffff8023daad
RIP Linear: ffffffff8023daad
NumExits: 1
Segments
CS: Sel=10, base=(null), limit=ffffffff (long_mode=1, db=0)
DS: Sel=0, base=(null), limit=0 (long_mode=0, db=0)
ES: Sel=0, base=(null), limit=0 (long_mode=0, db=0)
FS: Sel=0, base=(null), limit=ffffffff (long_mode=0, db=0)
GS: Sel=0, base=(null), limit=ffffffff (long_mode=0, db=0)
SS: Sel=18, base=(null), limit=ffffffff (long_mode=0, db=1)
LDTR: Sel=0, base=(null), limit=ffff (long_mode=0, db=0)
GDTR: Sel=0, base=ffffffff803b0000, limit=80 (long_mode=0, db=0)
IDTR: Sel=0, base=ffffffff80501000, limit=fff (long_mode=0, db=0)
TR: Sel=0, base=(null), limit=ffff (long_mode=0, db=0)
32 bit Ctrl Regs:
CR0=0x0000000080050033 (at ffff81000053a460)
CR2=0x(null) (at ffff81000053a468)
CR3=0x0000000000527000 (at ffff81000053a470)
CR4=0x00000000000006a0 (at ffff81000053a478)
CR8=0x(null) (at ffff81000053a480)
FLAGS=0x0000000000000293 (at ffff81000053a488)
EFER=0x0000000000001d01
64 bit GPRs:
RDI=0x(null) (at ffff81000053a3e0)
RSI=0x(null) (at ffff81000053a3e8)
RBP=0x(null) (at ffff81000053a3f0)
RSP=0xffff81000001fd88 (at ffff81000053a3f8)
RBX=0x(null) (at ffff81000053a400)
RDX=0x(null) (at ffff81000053a408)
RCX=0x(null) (at ffff81000053a410)
RAX=0x00000000090d3000 (at ffff81000053a418)
R8=0x(null) (at ffff81000053a420)
R9=0x(null) (at ffff81000053a428)
R10=0x(null) (at ffff81000053a430)
R11=0x(null) (at ffff81000053a438)
R12=0x(null) (at ffff81000053a440)
R13=0x(null) (at ffff81000053a448)
R14=0x(null) (at ffff81000053a450)
R15=0x(null) (at ffff81000053a458)
Memory Layout (all cores):
Base Region (all cores): 0x(null) - 0x000000007fffffff -> 0x0000000009200000
0: 0x00000000000a0000 - 0x00000000000bffff -> 0x00000000000a0000
(flags=0x17) (core=0xffff) (unhandled = 0xffffffff8021f258)
1: 0x00000000fec00000 - 0x00000000fec00fff -> 0x(null)
(flags=0x0) (core=0xffff) (unhandled = 0xffffffff802329c7)
2: 0x00000000fee00000 - 0x00000000fee00fff -> 0x(null)
(flags=0x0) (core=0x0) (unhandled = 0xffffffff802329c7)
3: 0x00000000fee00000 - 0x00000000fee00fff -> 0x(null)
(flags=0x0) (core=0x1) (unhandled = 0xffffffff802329c7)
4: 0x00000000fffe0000 - 0x00000000ffffffff -> 0x00000000092e0000
(flags=0x17) (core=0xffff) (unhandled = 0xffffffff8021f258)
Stack at ffff81000001fd88:
palacios/src/palacios/vm_guest.c(437): Could not translate Virtual Stack address
SVM core 0: SVM Exit Code: ffffffffffffffff
SVM core 0: exit_info1 low = 0x00000000
SVM core 0: exit_info1 high = 0x00000000
SVM core 0: exit_info2 low = 0x00000000
SVM core 0: exit_info2 high = 0x00000000
SVM core 0: Host Address of rip = 0x(null)
SVM core 0: Instr (15 bytes) at (null):
(null) Failed to fixup page fault exception!
Page Fault Exception: 0000000000000000
Task ID: 1 Task Name: init_task UTS_RELEASE: 1.1.0Kitten-dirty
RIP: 0010:ffffffff8022e1f9 (v3_dump_mem)
RSP: 0018:ffff81000001fe38 EFLAGS: 00010246 ERR: 00000000
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000003f8
RDX: ffffffff80282020 RSI: 0000000000000296 RDI: ffffffff802b8a30
RBP: ffff81000001fe58 R08: 00000000ffffff20 R09: 00000000ffffffff
R10: 0000000000000000 R11: ffff81000001fda8 R12: ffff810000538010
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000002719880(0000) GS:ffffffff803a9000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000000527000 CR4: 00000000000006a0
<7> Stack trace from RBP ffff81000001fe58
<7> [<ffffffff8023c487>] v3_start_svm_guest+0x3f2/0x43c
<7> [<ffffffff8020b980>] start_core+0xc1/0x120
<7> [<ffffffff8020bfc8>] v3_start_vm+0x5e9/0x654
<7> [<ffffffff80281d29>] sys_v3_start_guest+0x99/0xb0
<7> [<ffffffff8020710e>] asm_syscall+0x62/0xdc
May be i've specified wrong (i don't remember recent AMD codenames and model numbers) CPU in bochsrc (i didn't setup such strange brand and model strings, i just selected phenom in config menu ( svm=0, huh? i've specified --enable-svm to configure) and bochs generated such cpu info in config) :
Code: Select all
# configuration file generated by Bochs
config_interface: textconfig
display_library: x
memory: host=1024, guest=3072
romimage: file="/usr/share/bochs/BIOS-bochs-latest"
vgaromimage: file="/usr/share/bochs/VGABIOS-lgpl-latest"
boot: disk
floppy_bootsig_check: disabled=0
# no floppya
# no floppyb
ata0: enabled=1, ioaddr1=0x1f0, ioaddr2=0x3f0, irq=14
ata0-master: type=disk, mode=flat, translation=lba, path="/dev/disk/by-id/usb-JetFlash_TS2GJFV30_8OOTGU5N-0:0", cylinders=5778, heads=11, spt=63, biosdetect=auto, model="Generic 1234"
ata1: enabled=0
ata2: enabled=0
ata3: enabled=0
parport1: enabled=0
parport2: enabled=0
com1: enabled=1, mode=file, dev="/home/nable/ttyS0_bochs"
com2: enabled=0
com3: enabled=0
com4: enabled=0
pci: enabled=1, chipset=i440fx
vga: extension=vbe, update_freq=5
cpu: count=1:2:1, ips=50000000, quantum=5, model=phenom_8650_toliman, reset_on_triple_fault=1, cpuid_limit_winnt=0, ignore_bad_msrs=1, msrs="msrs.def"
cpuid: family=6, model=0x03, stepping=3, mmx=1, apic=xapic, sse=sse2, sse4a=0, sep=1, aes=0, xsave=0, xsaveopt=0, movbe=0, smep=0, svm=0, x86_64=1, 1g_pages=0, pcid=0, fsgsbase=0, mwait=1, mwait_is_nop=0
cpuid: vendor_string="GenuineIntel"
cpuid: brand_string=" Intel(R) Pentium(R) 4 CPU "
print_timestamps: enabled=0
port_e9_hack: enabled=0
private_colormap: enabled=0
clock: sync=none, time0=local
# no cmosimage
# no loader
log: bochsout.txt
logprefix: %t%e%d
panic: action=ask
error: action=report
info: action=report
debug: action=ignore
keyboard_type: mf
keyboard_serial_delay: 250
keyboard_paste_delay: 100000
keyboard_mapping: enabled=0, map=
user_shortcut: keys=none
mouse: enabled=0, type=none, toggle=ctrl+mbutton
And on real hardware i have the following log:
Code: Select all
<6>Starting Guest OS...
V3 -- Starting VM (2 cores)
CORE 0 RIP=000000000000fff0
Starting virtual core 1 on logical core 1
run: core=1, func=0xffffffff8020b8bf, arg=0xffff8100000425c8, name=[V3_VM]-1
virtual core 0 (on logical core 0, APIC ID 0): in start_core (RIP=000000000000fff0)
virtual core 1 (on logical core 1, APIC ID 1): in start_core (RIP=000000000000fff0)
BOCHSINFO>Start bios (version 1.6.3.1-20120203_160624-cuckoo)
BOCHSINFO>Ram Size=0x80000000 (0x0000000000000000 high)
BOCHSINFO>Relocating init from 0x000e67f0 to 0x7ffe3010 (size 52932)
BOCHSINFO>CPU Mhz=3419
BOCHSINFO>Found 3 PCI devices (max PCI bus is 00)
BOCHSINFO>Found 2 cpu(s) max supported 2 cpu(s)
BOCHSINFO>MP table addr=0x000fd7b0 MPC table addr=0x000fd7c0 size=224
BOCHSINFO>SMBIOS ptr=0x000fd790 table=0x000fd650 size=308
BOCHSINFO>ACPI tables: RSDP=0x000fd620 RSDT=0x7fffd6b0
BOCHSINFO>Scan for VGA option rom
BOCHSINFO>Running option rom at c000:0003
BOCHSCONSOLE>VGABios $Id: vgabios.c,v 1.1 2007/11/29 20:26:38 pdinda Exp $
BOCHSINFO>Turning on vga text mode console
BOCHSINFO>SeaBIOS (version 1.6.3.1-20120203_160624-cuckoo)
BOCHSINFO>
BOCHSINFO>PS2 keyboard initialized
BOCHSINFO>Found 0 lpt ports
BOCHSINFO>Found 1 serial ports
BOCHSINFO>Searching bootorder for: /pci@i0cf8/isa@1/fdc@03f0/floppy@0
BOCHSINFO>Searching bootorder for: /pci@i0cf8/isa@1/fdc@03f0/floppy@1
palacios/src/devices/pci.c(571): command update for PIIX3_IDE old=4 new=4
BOCHSINFO>ATA controller 1 at 1f0/3f4/c000 (irq 14 dev 9)
BOCHSINFO>DVD/CD [ata0-0: 3VeV eDCOR M ATAPI-6 DVD/CD]
BOCHSINFO>Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0
BOCHSINFO>ATA controller 2 at 170/374/c008 (irq 15 dev 9)
BOCHSINFO>Scan for option roms
BOCHSINFO>Press F12 for boot menu.
BOCHSINFO>
BOCHSINFO>ebda moved from 9fc00 to 9f400
BOCHSINFO>Returned 53248 bytes of ZoneHigh
BOCHSINFO>e820 map has 6 items:
BOCHSINFO> 0: 0000000000000000 - 000000000009f400 = 1 RAM
BOCHSINFO> 1: 000000000009f400 - 00000000000a0000 = 2 RESERVED
BOCHSINFO> 2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
BOCHSINFO> 3: 0000000000100000 - 000000007fffd000 = 1 RAM
BOCHSINFO> 4: 000000007fffd000 - 0000000080000000 = 2 RESERVED
BOCHSINFO> 5: 00000000fffc0000 - 0000000100000000 = 2 RESERVED
BOCHSINFO>enter handle_19:
BOCHSINFO> NULL
BOCHSINFO>Booting from DVD/CD...
BOCHSINFO>47288MB medium detected
BOCHSINFO>Booting from 0000:7c00
EFER Read HI=0 LO=0
MSR write for msr 0xc0000080
EFER Write HI=0 LO=100
EFER Read HI=0 LO=100
MSR write for msr 0xc0000080
EFER Write HI=0 LO=901
EFER Read HI=0 LO=901
EFER Read HI=0 LO=901
MSR write for msr 0x2ff
MSR write for msr 0x277
MSR write for msr 0x2ff
EFER Read HI=0 LO=901
palacios/src/devices/apic.c(1312): apic 0: core 0: Attempting to write to read only register 0000000000000030 (error)
palacios/src/devices/apic.c(724): Warning: core 1 is not in INIT state (mode = 1), ignored (assuming this is the deassert)
Resetting SVM Guest CPU 1
palacios/src/devices/apic.c(747): core 1 is not in SIPI state (mode = 2), ignored!
MSR write for msr 0xc0000080
EFER Write HI=0 LO=100
EFER Read HI=0 LO=100
MSR write for msr 0xc0000080
EFER Write HI=0 LO=901
EFER Read HI=0 LO=901
MSR write for msr 0x2ff
MSR write for msr 0x277
MSR write for msr 0x2ff
palacios/src/devices/pci.c(447): PCI Device IO space not enabled
palacios/src/devices/pci.c(447): PCI Device IO space not enabled
V3Vee: Memory offset hypercall (offset=0000000009200000)
MSR write for msr 0x534
Symbiotic Global MSR write for page 000000007d07e000
...
Then guest fails because i haven't yet fully fixed large pages support for svm but that's just another story..