Re: Help testing Bochs VMX and SVM support

Posted: Fri May 11, 2012 1:16 pm
by Nable
While reading osdev.ru (it's rather hard to read via Google Translate, but possible) I found one more issue, unrelated to VMX/SVM but also interesting:
http://translate.google.com/translate?s ... %26t%3D542
The main idea is very simple:
the LAPIC timer divisor is updated only when the counter wraps, but bare hardware and other VMs update it on the fly.
The author of that topic attached a bootable image that demonstrates the problem: http://osdev.ru/download/file.php?id=29 . If this link is broken, you can use my copy: ftp://93.175.16.134/boot.zip

My translation of test description:
1. The PIT is clocked at 50 Hz and the LAPIC timer at 10 kHz (divisor is 16), i.e. there are 200 APIC timer ticks between two PIT ticks.
2. The kernel counts APIC timer interrupts during one PIT period, then changes the divisor to 8 and counts APIC ticks during one more period.
3. Then the test reloads TICR with the value from step 1 and counts ticks again.
4. Then it disables both timers and displays the result.

On bare hardware and under VirtualBox we get roughly 200/400/400 ticks, but under Bochs 200/200/400.
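
The expected numbers are plain arithmetic. A minimal sketch, assuming a hypothetical 160 MHz timer input clock and an initial count of 1000 (neither value comes from the test image; they are only chosen so that divisor 16 gives 10 kHz):

```python
PIT_HZ = 50               # PIT rate from the test description
BASE_HZ = 160_000_000     # assumed timer input clock (hypothetical)
INITIAL_COUNT = 1000      # assumed TICR value (hypothetical)

def apic_ticks_per_pit_period(divisor, initial_count=INITIAL_COUNT):
    """Number of LAPIC timer interrupts that fit into one PIT period."""
    apic_hz = BASE_HZ // (divisor * initial_count)
    return apic_hz // PIT_HZ

print(apic_ticks_per_pit_period(16))  # 200
print(apic_ticks_per_pit_period(8))   # 400
```

Halving the divisor doubles the interrupt rate, so a divisor change that takes effect right away should give 400 in step 2 already.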

Re: Help testing Bochs VMX and SVM support

Posted: Fri May 11, 2012 11:47 pm
by stlw
Nable wrote: While reading osdev.ru (it's rather hard to read via Google Translate, but possible) I found one more issue, unrelated to VMX/SVM but also interesting
Do you have the original Russian link? I read Russian.

P.S. Never mind, I've already found it myself.

Stanislav

Re: Help testing Bochs VMX and SVM support

Posted: Sat May 12, 2012 1:05 am
by stlw
Nable wrote: While reading osdev.ru (it's rather hard to read via Google Translate, but possible) I found one more issue, unrelated to VMX/SVM but also interesting
I made a candidate fix (based on current SVN). The change is in apic.cc only. When the APIC timer expires, it checks whether any of the timer parameters (such as the divide configuration) have changed and restarts counting with the proper values.

I didn't manage to see anything from boot.img. It just tells me "phase 0 initilization complete" and halts the CPU, so I don't see the 200/400/400 (or 200/200/400) counts or anything else.
Can you check whether the attached patch works for you?

Stanislav

Re: Help testing Bochs VMX and SVM support

Posted: Sat May 12, 2012 3:48 am
by Nable
Sorry about boot.img; I only tested it today, and it shows me nothing relevant either.
I've asked the author about this.

> When APIC timer expires it looks if any of timer params (like divide configuration) changed and re-starts counting with proper values.
The author [of the topic mentioned above] claims that the divide configuration is updated immediately after the write to the APIC register.

However, his image didn't show us anything, so it will be interesting to see whether he can provide a working proof.
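
To see how the moment at which the divisor takes effect changes the three counts, here is a toy model (it is not Bochs code). The policy names, the 160 MHz input clock and the initial count of 1000 are assumptions made purely for illustration:

```python
BASE_HZ = 160_000_000            # assumed timer input clock (hypothetical)
PIT_HZ = 50                      # PIT rate from the test description
INITIAL_COUNT = 1000             # assumed TICR value: 160 MHz / 16 / 1000 = 10 kHz
PIT_PERIOD = BASE_HZ // PIT_HZ   # one PIT period, in input clocks

class LapicTimerModel:
    """Periodic timer whose divisor takes effect according to `policy`."""

    def __init__(self, policy):
        assert policy in ("immediate", "on_expiry", "on_ticr_write")
        self.policy = policy
        self.programmed_div = 1   # last value written to the divide register
        self.active_div = 1       # divisor the countdown is really using
        self.initial_count = 0
        self.clocks_left = 0      # input clocks until next interrupt, 0 = stopped

    def write_divide_config(self, div):
        self.programmed_div = div
        if self.policy == "immediate":
            # Rescale the time remaining in the current interval right away.
            self.clocks_left = self.clocks_left * div // self.active_div
            self.active_div = div

    def write_ticr(self, count):
        # Every policy picks up the programmed divisor when TICR is written.
        self.initial_count = count
        self.active_div = self.programmed_div
        self.clocks_left = count * self.active_div

    def run(self, clocks):
        """Advance by `clocks` input clocks and return the interrupt count."""
        irqs = 0
        while self.clocks_left and self.clocks_left <= clocks:
            clocks -= self.clocks_left
            irqs += 1
            if self.policy == "on_expiry":
                # Re-check the programmed divisor each time the timer fires.
                self.active_div = self.programmed_div
            self.clocks_left = self.initial_count * self.active_div
        if self.clocks_left:
            self.clocks_left -= clocks
        return irqs

def run_test(policy):
    """Replay the three measurement steps of the test description."""
    timer = LapicTimerModel(policy)
    timer.write_divide_config(16)
    timer.write_ticr(INITIAL_COUNT)
    first = timer.run(PIT_PERIOD)
    timer.write_divide_config(8)
    second = timer.run(PIT_PERIOD)
    timer.write_ticr(INITIAL_COUNT)
    third = timer.run(PIT_PERIOD)
    return first, second, third

for policy in ("immediate", "on_expiry", "on_ticr_write"):
    print(policy, run_test(policy))
```

In this model "immediate" gives 200/400/400, "on_expiry" gives 200/399/400 (one interval still runs with the old divisor), and only "on_ticr_write" gives 200/200/400. That is a property of the sketch, not a statement about what apic.cc does.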

Re: Help testing Bochs VMX and SVM support

Posted: Mon May 14, 2012 9:13 am
by Nable
I don't know how but I came across this thing today while testing my hypervisor:

Code:

12972239730e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12972255020e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12972270325e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12975479870e[CPU0 ] VMEXIT: EPT violation for guest paddr 0x00000000fee000b0 laddr 0xffffffffff5fc0b0
12979861415p[CPU1 ] >>PANIC<< failed assertion "0" at proc_ctrl.cc:409
I'm not sure I can reproduce this, but maybe you've seen it already.
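
A note on reading the EPT violation line: guest physical address 0xfee000b0 lies in the page at 0xfee00000, the default local APIC MMIO base, and offset 0xb0 is the EOI register in the xAPIC register layout. A small sketch of that decoding; the helper name and the short register table are my own and not taken from Bochs:

```python
import re

PAGE_SIZE = 0x1000
LAPIC_DEFAULT_BASE = 0xFEE00000   # architectural default xAPIC MMIO base

# A few xAPIC register offsets (subset, for illustration only).
LAPIC_REGS = {
    0x020: "ID",
    0x080: "TPR",
    0x0B0: "EOI",
    0x320: "LVT Timer",
    0x380: "Initial Count (TICR)",
    0x390: "Current Count",
    0x3E0: "Divide Configuration",
}

def decode_gpa(gpa):
    """Split a guest physical address into page base, offset and LAPIC register name."""
    page = gpa & ~(PAGE_SIZE - 1)
    offset = gpa & (PAGE_SIZE - 1)
    reg = LAPIC_REGS.get(offset) if page == LAPIC_DEFAULT_BASE else None
    return page, offset, reg

# Log line copied from the output above.
line = ("12975479870e[CPU0 ] VMEXIT: EPT violation for guest paddr "
        "0x00000000fee000b0 laddr 0xffffffffff5fc0b0")
match = re.search(r"guest paddr (0x[0-9a-fA-F]+)", line)
page, offset, reg = decode_gpa(int(match.group(1), 16))
print(hex(page), hex(offset), reg)  # 0xfee00000 0xb0 EOI
```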

Re: Help testing Bochs VMX and SVM support

Posted: Mon May 14, 2012 9:33 am
by stlw
Nable wrote:I don't know how but I came across this thing today while testing my hypervisor:

Code:

12972239730e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12972255020e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12972270325e[CPU1 ] VMEXIT: CPUID in VMX non-root operation
12975479870e[CPU0 ] VMEXIT: EPT violation for guest paddr 0x00000000fee000b0 laddr 0xffffffffff5fc0b0
12979861415p[CPU1 ] >>PANIC<< failed assertion "0" at proc_ctrl.cc:409
I'm not sure I can reproduce this, but maybe you've seen it already.
No, I've never seen it before. Please try to reproduce it if possible.

Stanislav

Re: Help testing Bochs VMX and SVM support

Posted: Mon May 14, 2012 10:48 am
by Nable
It took a lot of time, but the bug is reproducible. However, the moment at which it appears differs between runs, so it may be a rather rare nondeterministic behavior triggered by particular combinations of system events.

I hope you haven't lost the image I gave you before; the archive contains bzImage + init_task, which you should put into /boot/ of that image.
ttyS0_bochs and bochsout.txt are the outputs of those runs. bochs-conf.sh is my script for configuring Bochs.
AFAIR, I'm running Bochs built from r11170 with gcc version 4.6.3 (Debian 4.6.3-1).
Don't forget to change the path in .bochsrc from /dev/sdb to the path of the image file.

ftp://93.175.16.13/bochs_assert0.tgz

Also, I chose (in the panic window) to run the debugger instead of terminating; maybe that will help if you need some info from the debugger.

Re: Help testing Bochs VMX and SVM support

Posted: Sun May 20, 2012 1:09 pm
by stlw
Nable wrote: It took a lot of time, but the bug is reproducible. However, the moment at which it appears differs between runs, so it may be a rather rare nondeterministic behavior triggered by particular combinations of system events.

I hope you haven't lost the image I gave you before; the archive contains bzImage + init_task, which you should put into /boot/ of that image.
ttyS0_bochs and bochsout.txt are the outputs of those runs. bochs-conf.sh is my script for configuring Bochs.
AFAIR, I'm running Bochs built from r11170 with gcc version 4.6.3 (Debian 4.6.3-1).
Don't forget to change the path in .bochsrc from /dev/sdb to the path of the image file.

ftp://93.175.16.13/bochs_assert0.tgz

Also, I chose (in the panic window) to run the debugger instead of terminating; maybe that will help if you need some info from the debugger.
Hello,

I can confirm this issue, and I can also say it looks so weird to me that I have no idea where to start ...
I reproduced your problem with yesterday's SVN, then went into the code and disabled some frequent BX_ERROR messages about the VMEXITs that occurred.
I merged the change into SVN at 11186 and ... immediately afterwards booted your image with this revision without any issue.

Now, with current SVN, it boots, but when I turn on debug messages (with debug: ignore, cpu0=report, cpu1=report, for example) the failure comes back.
This means that the failure is related to the log messages being printed.
At first I suspected a format-string error, but the formats look OK for the messages I changed.

The code related to decoding of the No-SSE attribute is also correct, and I don't believe it is the problem here.

Maybe you will have some suggestions ...

Stanislav

EDIT: It is even worse: when I try to add a debug print message, the case passes again. It must be a compiler-related issue (I used mingw gcc 4.6.1, similar to your gcc 4.6.3) or a weird memory bug. But it is too silly for me to run Valgrind on Bochs now ...

Re: Help testing Bochs VMX and SVM support

Posted: Mon May 21, 2012 12:43 pm
by Nable
I'm also stuck on it.

BTW, I think it would be good to start compiling Bochs with -W -Wall -Werror; it's unlikely to help with the current problem, but it has helped me clean up several bugs in other programs before.
Also, something like cppcheck may turn up something useful.

I haven't studied this bug much, because after some fixes to the hypervisor the bug no longer appeared. Oh, an interesting detail: I'm using gcc-4.6 on workstation2, but after fixing the hypervisor I tested it only on workstation1 with gcc-4.5. I hope to find some time tomorrow to test the new version of the hypervisor on workstation2. I can hardly believe in gcc bugs (although I know of many in the past), but it will be interesting if we've really found one.

Valgrind seems insane, but an execution trace might help. BTW (if it's not a secret, of course): does Intel have any software for fast execution tracing?

Re: Help testing Bochs VMX and SVM support

Posted: Tue May 22, 2012 6:25 am
by stlw
Nable wrote:BTW (if it's not a secret, of course) : does Intel have some software for fast execution tracing?
Not now (yet).

Stanislav

Re: Help testing Bochs VMX and SVM support

Posted: Tue May 22, 2012 2:08 pm
by Nable
I've reproduced this bug with gcc version 4.5.3 (Debian 4.5.3-12).
But I keep failing to reproduce it with gcc version 4.7.0 (Debian 4.7.0-8); the guest boots to the end without this error. "End" means the moment when the guest hangs while trying to detect the virtual CD-ROM.
However, several times I've noticed artefacts like those in the screenshots. I doubt that this is a typo in the init script, because it changes when I build Bochs with a different compiler. Maybe it's caused by an error in initramfs unpacking, and that error may be caused by an error in emulation, and the error in emulation is caused by... what?
[Attached screenshots: 4.5.PNG, 4.7.PNG]
Maybe I should try to reproduce the bug using virtual SVM support.

Upd: I thought that suffering had its limit, but I was wrong. When I call the debugger (this is Bochs compiled with gcc-4.7, although if I click "continue" [in the "panic" window] with the gcc-4.5 build it reaches the same state), it appears to be stuck:
http://pastebin.ca/2152419
Although the guest doesn't look stuck:
http://pastebin.ca/2152422

I hope that I'm still not fully insane.

Upd2: the guest doesn't actually get stuck on CD-ROM detection; it just takes a lot of time. I had enough patience to wait for the guest to finish booting (again, the g++-4.7 build) and to look at what's in that script. And as you can see, there are no mistakes; something is wrong with the execution of the interpreter (or with unpacking):
[Attached screenshot: no typo.PNG]
I still hope that it's neither my insanity nor a CPU bug.

Re: Help testing Bochs VMX and SVM support

Posted: Wed May 23, 2012 2:28 am
by stlw
BTW, I saw that you are running SMP simulations and looked at your configuration. You can greatly improve emulation speed by increasing the quantum value to something like 15.
This could also affect your issues and, who knows, might even make them disappear.

Stanislav

Re: Help testing Bochs VMX and SVM support

Posted: Wed May 23, 2012 5:25 am
by Nable
> might be even make them disappear as well.
> disappear
Hiding issues is not a fix, it's a time bomb, so it doesn't look like an interesting approach.
Especially because for many people Bochs is a base of trust, i.e. it is believed to behave as an almost fully correct equivalent of real hardware.

At least it looks like the artefacts depend only on the compiler version (given the same guest image), i.e. they differ between compiler versions but are the same every time for the same version.
One more interesting thing: the broken version of the hypervisor hooks writes (by marking the page as read-only in the EPT) to the page at GPA (guest physical address) 0, while the fixed one hooks the page at the right address. If the fixed version doesn't cause this issue (I'll check when I have time), then we should check the Bochs EPT support.

What do you think about the artefacts while interpreting init scripts?
What do you think about the debugger freezing?

Re: Help testing Bochs VMX and SVM support

Posted: Mon Jun 04, 2012 2:29 pm
by jbemmel
Using Valgrind, I found some uninitialized state bits in the CPU ( see https://sourceforge.net/tracker/index.p ... tid=112580# ).

In this case, it was the lazy flags bits; adding "memset( &oszapc, 0, sizeof(oszapc) );" fixes it.

This issue sounds like it could be due to some other uninitialized variables, which would explain why it is hard to reproduce. Try running this OS test image under Valgrind and see if it reports anything suspicious.