Help testing Bochs VMX and SVM support

Nable
Member
Member
Posts: 453
Joined: Tue Nov 08, 2011 11:35 am

Re: Help testing Bochs VMX and SVM support

Post by Nable »

While reading osdev.ru (it's rather hard to read via Google Translate, although possible) I found one more issue, unrelated to VMX/SVM but also interesting:
http://translate.google.com/translate?s ... %26t%3D542
The main idea is very simple:
In Bochs, the LAPIC timer divisor is updated only when the counter wraps, but bare hardware and other VMs update it on the fly.
The author of that topic attached a bootable image that demonstrates the problem: http://osdev.ru/download/file.php?id=29 . If this link is broken, you can use my mirror: ftp://93.175.16.134/boot.zip

My translation of the test description:
1. The PIT is programmed to 50 Hz and the LAPIC timer to 10 kHz (divisor is 16), i.e. there are 200 APIC timer ticks between two PIT ticks.
2. The kernel counts APIC timer interrupts during one PIT period, then changes the divisor to 8 and counts APIC ticks during one more period.
3. Then the test reloads TICR with the value from step 1 and counts ticks again.
4. Then it disables both timers and displays the result.

On bare hardware and under VirtualBox we get about 200/400/400 ticks, but under Bochs we get 200/200/400.
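
The arithmetic behind those numbers can be sketched as follows. This is only an illustration, not code from the test image: the function name, the 16 MHz bus clock, and the initial count of 100 are assumptions chosen so that divisor 16 yields a 10 kHz interrupt rate.

```cpp
#include <cassert>
#include <cstdint>

// Number of interrupts a periodic LAPIC timer raises during one PIT period.
// bus_hz and initial_count are illustrative assumptions, not values
// taken from the test image.
inline std::uint64_t apic_ticks_per_pit_period(std::uint64_t bus_hz,
                                               std::uint32_t divisor,
                                               std::uint32_t initial_count,
                                               std::uint32_t pit_hz) {
    assert(divisor != 0 && initial_count != 0 && pit_hz != 0);
    // One PIT period lasts bus_hz / pit_hz bus cycles; one APIC timer
    // interrupt takes divisor * initial_count bus cycles.
    const std::uint64_t cycles_per_pit_period = bus_hz / pit_hz;
    const std::uint64_t cycles_per_apic_tick =
        static_cast<std::uint64_t>(divisor) * initial_count;
    return cycles_per_pit_period / cycles_per_apic_tick;
}
```

With these assumed values, divisor 16 gives 200 interrupts per PIT period and divisor 8 gives 400, so a second reading of 200 suggests that the new divisor had not yet taken effect during that period.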
stlw
Member
Member
Posts: 357
Joined: Fri Apr 04, 2008 6:43 am

Post by stlw »

Nable wrote:While reading osdev.ru (it's rather hard to read it via google.translate, although possible) i've found one more issue, unrelated to VMX/SVM but also interesting
Do you have the original Russian link? I read Russian.

P.S. Never mind, I've already found it myself.

Stanislav

Post by stlw »

Nable wrote:While reading osdev.ru (it's rather hard to read it via google.translate, although possible) i've found one more issue, unrelated to VMX/SVM but also interesting
I made a fix candidate (based on current SVN). The change is in apic.cc only. When the APIC timer expires, it checks whether any of the timer parameters (such as the divide configuration) have changed and restarts counting with the proper values.

I didn't manage to see anything from boot.img. It just tells me "phase 0 initilization complete" and halts the CPU, so I don't see the 200/400/400 or 200/200/400 tick counts or anything else.
Can you check whether it works for you with the attached patch?

Stanislav
Attachments

[The extension patch has been deactivated and can no longer be displayed.]

Post by Nable »

Sorry about boot.img; I only tested it today, and it also shows me nothing relevant.
I've asked the author about this.

> When APIC timer expires it looks if any of timer params (like divide configuration) changed and re-starts counting with proper values.
The author [of the topic mentioned above] claims that the divide configuration is updated immediately after a write to the APIC register.

However, his image didn't show us anything, so it will be interesting to see whether he can provide a working proof.

Post by Nable »

I don't know how, but I came across this today while testing my hypervisor:

12972239730e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12972255020e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12972270325e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12975479870e[CPU0 ] VMEXIT: EPT violation for guest paddr 0x00000000fee000b0 laddr 0xffffffffff5fc0b0
12979861415p[CPU1 ] >>PANIC<< failed assertion "0" at proc_ctrl.cc:409
I'm not sure if I can reproduce this, but maybe you have seen it already.

Post by stlw »

Nable wrote:I don't know how but I came across this thing today while testing my hypervisor:

I'm not sure if I can reproduce this, but maybe you have seen it already.
No, I have never seen it before. Please try to reproduce it if possible.

Stanislav

Post by Nable »

It took a long time, but the bug is reproducible. However, the moment at which it appears differs between two runs, so it may be rare nondeterministic behavior caused by particular combinations of system events.

I hope you haven't lost the image that I gave you before; in the archive I added bzImage + init_task, which you should put into /boot/ of that image.
ttyS0_bochs and bochsout.txt are the outputs of those runs. bochs-conf.sh is my script for configuring Bochs.
AFAIR, I'm running Bochs built from r11170 with gcc version 4.6.3 (Debian 4.6.3-1).
Don't forget to change the path in .bochsrc from /dev/sdb to the path to the image file.

ftp://93.175.16.13/bochs_assert0.tgz

Also, in the panic window I chose to run the debugger instead of terminating; maybe that will help if you need some info from the debugger.

Post by stlw »

Nable wrote:It took much time but bug is repeatable. Although, the moment when bug appears is different between two runs, so it may be a rather rare indeterminate behavior that appears due to special combinations of system events.

Hello,

I can confirm this issue, and I can also say that it looks weird to me, so I have no idea where to start here...
I reproduced your problem with yesterday's SVN, then went into the code and disabled some frequent BX_ERROR messages about the VMEXITs that occurred.
I merged the change into SVN at revision 11186 and... booted your image without any issue with this revision immediately afterwards.

Now, with current SVN, it boots, but when I turn on debug messages (with debug: ignore, cpu0=report, cpu1=report for example) I get the failure back.
This means that the failure is related to the log file messages being printed.
At first I suspected a format string error, but the formats look OK for the messages I changed.

The code related to decoding of the No-SSE attribute is also correct, and I don't believe it could be the problem here.

Maybe you will have some suggestions...

Stanislav

EDIT: It is even worse: when I try to add a debug print message, it makes the case pass again. It must be a compiler-related issue (I used mingw gcc 4.6.1, similar to your gcc 4.6.3) or a weird memory bug. But it is too silly for me to run Valgrind on Bochs now...

Post by Nable »

I'm also stuck on it.

BTW, I think it would be good to start compiling Bochs with -W -Wall -Werror; it's unlikely to help with the current problem, but it has helped me clean up several bugs in other programs before.
Also, something like cppcheck may tell us something useful.

I haven't studied this bug much, because after some fixes to the hypervisor the bug no longer appeared. Oh, an interesting point: I'm using gcc-4.6 on workstation2, but after fixing the hypervisor I tested it only on workstation1 with gcc-4.5. I hope that tomorrow I'll find some time to test the new version of the hypervisor on workstation2. I can hardly believe in gcc bugs (although I know of many in the past), but it will be interesting if we've really found one.

Valgrind seems insane, but an execution trace might help. BTW (if it's not a secret, of course): does Intel have some software for fast execution tracing?

Post by stlw »

Nable wrote:BTW (if it's not a secret, of course) : does Intel have some software for fast execution tracing?
Not now (yet).

Stanislav

Post by Nable »

I've reproduced this bug with gcc version 4.5.3 (Debian 4.5.3-12).
But I keep failing to reproduce it with gcc version 4.7.0 (Debian 4.7.0-8); the guest boots to the end without this error. "End" means the moment when the guest hangs while trying to detect the virtual CD-ROM.
However, several times I've noticed artifacts like those in the screenshots. I doubt that this is a typo in the init script, because the output changes if I use a different compiler to build Bochs. Maybe it's because of an error in initramfs unpacking, and that error may be because of an error in emulation, and the error in emulation is because of... what?
[Attached screenshots: 4.5.PNG, 4.7.PNG]
Maybe I should try to reproduce the bug using virtual SVM support...

Upd: I thought that suffering has its limit, but I was wrong. When I call the debugger (this is Bochs compiled with gcc-4.7, although if I click "continue" [in the "panic" window] when using the gcc-4.5 variant, it reaches the same state), it appears to be stuck:
http://pastebin.ca/2152419
Although the guest doesn't look stuck:
http://pastebin.ca/2152422

I hope that I'm still not fully insane.

Upd2: actually, the guest doesn't get stuck on CD-ROM detection, it just takes a lot of time. I had enough patience to wait for the guest to finish booting (again, the g++-4.7 variant) and to see what's in that script. And you see, there are no mistakes; something is wrong with the execution of the interpreter (or with unpacking):
[Attached screenshot: no typo.PNG]
I still hope that it's neither my insanity nor a CPU bug.

Post by stlw »

BTW, I saw that you are running SMP simulations and looked at your configurations. You can greatly improve emulation speed by increasing the quantum value to something like 15.
This could also affect your issues and, who knows, might even make them disappear as well.

Stanislav

Post by Nable »

> might be even make them disappear as well.
> disappear
Hiding issues is not a fix, it's a time bomb, so it doesn't look like an interesting approach.
Especially because for many people Bochs is a base of trust, i.e. it is believed to work as an almost fully correct equivalent of real hardware.

At least it looks like the artifacts depend only on the compiler version (if we use the same guest image), i.e. they differ between compiler versions but are the same every time for the same version.
One more interesting thing: the broken version of the hypervisor hooks writes (by marking the page as read-only in the EPT tables) to the page at GPA (guest physical address) 0, while the fixed one hooks the page at the right address. If the fixed version doesn't cause this issue (I'll check when I have time), then we should check Bochs EPT support.
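
For readers unfamiliar with this kind of hook, here is a minimal sketch of the idea. It is not the hypervisor's code: all helper names are hypothetical, and only the permission-bit layout (bit 0 read, bit 1 write, bit 2 execute) and the 4 KiB page arithmetic are taken from the architecture.

```cpp
#include <cassert>
#include <cstdint>

// Permission bits of an EPT entry: bit 0 = read, bit 1 = write, bit 2 = execute.
constexpr std::uint64_t kEptRead    = 1ull << 0;
constexpr std::uint64_t kEptWrite   = 1ull << 1;
constexpr std::uint64_t kEptExecute = 1ull << 2;

// Hypothetical helper: base of the 4 KiB guest-physical page containing gpa.
inline std::uint64_t page_base(std::uint64_t gpa) {
    return gpa & ~0xFFFull;
}

// Hypothetical helper: byte offset of gpa inside its 4 KiB page.
inline std::uint64_t page_offset(std::uint64_t gpa) {
    return gpa & 0xFFFull;
}

// Hypothetical helper: copy of an EPT leaf entry with write access removed,
// so that a guest write to the page causes an EPT violation VM exit.
inline std::uint64_t make_read_only(std::uint64_t entry) {
    return entry & ~kEptWrite;
}
```

For the address in the log above, page_base(0xfee000b0) is 0xfee00000 and page_offset(0xfee000b0) is 0xb0. If the hook is installed with a GPA of 0 by mistake, the entry that loses its write bit is the one for page 0 instead of the intended page.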

What do you think about the artifacts while interpreting init scripts?
What do you think about the debugger freezing?
jbemmel
Member
Member
Posts: 53
Joined: Fri May 11, 2012 11:54 am

Post by jbemmel »

Using Valgrind, I found some uninitialized state bits in the CPU (see https://sourceforge.net/tracker/index.p ... tid=112580# ).

In this case, it was the lazy flags bits; adding "memset( &oszapc, 0, sizeof(oszapc) );" fixes it.

This issue sounds like it could be due to some other uninitialized variables, which would explain why it is hard to reproduce. Try running this OS test image under Valgrind and see if it reports anything suspicious.
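
As a rough illustration of where such a memset could live, here is a minimal sketch. The struct layout, the field names, and the CpuState type are assumptions made for this example and are not copied from the Bochs sources; only the memset call itself comes from the fix above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the lazy-flags state; field names are assumptions.
struct LazyFlagsEntry {
    std::uint32_t result;
    std::uint32_t auxbits;
};

// Hypothetical CPU state holder, reduced to the one member of interest.
struct CpuState {
    LazyFlagsEntry oszapc;

    CpuState() {
        // Without this, oszapc holds indeterminate bytes until the first
        // instruction that updates the flags, so any earlier read of the
        // flags can differ from build to build.
        std::memset(&oszapc, 0, sizeof(oszapc));
    }
};
```

Reads of indeterminate values like this are what Valgrind's default Memcheck tool reports, and its --track-origins=yes option can help locate where the uninitialized bytes came from.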