Page 1 of 2

Bochs or me?

Posted: Sat May 27, 2006 5:28 am
by Ryu
Has anyone noticed that bochs is doing something odd to generate a GPF when your using other then the setting clock: sync=slowdown, time0=local ?

I was doing some stability test with concerning with string output function, but now I don't know if its my function or bochs. In this test, I used a never ending loop that outputs a string and a ISR for the PIT channel 0 that will also output a string on interrupt.

Some things I found out:
1) No GPF when using clock: sync=slowdown, time0=local, at any PIT resolution.
2) No GPF when using around 100hz resolutions on PIT, with any setting.
3) GPF when using clock: sync=realtime, time0=local, with 500hz+ resoultions on PIT.
4) No GPF on my real machine even using 2000hz resolution.

Orginally it was a test to if I needed to have a mutex with some data, but after some time of testing I'm beginning to think that Bochs is screwing with my head and Bochs is the problem. In the other hand I'm new to ISRs and am not sure if its my routines or Bochs. Has anyone encounter a simular problem? or should I start pasting some code?

Re:Bochs or me?

Posted: Sat May 27, 2006 5:59 am
by Midas
1. Hmm, not tried that - I'll try it later.
2. I had my PIT running at 100 Hz previously (on complete rewrite now) without any problems in terms of #GP.
3. Realtime sync can flood your OS with IRQ0 as I understand it, possibly because of a bug - I'm unsure.
4. Pass. Not a clue.

Re:Bochs or me?

Posted: Sat May 27, 2006 6:06 am
by Brendan
Hi,
Ryu wrote:Orginally it was a test to if I needed to have a mutex with some data, but after some time of testing I'm beginning to think that Bochs is screwing with my head and Bochs is the problem. In the other hand I'm new to ISRs and am not sure if its my routines or Bochs. Has anyone encounter a simular problem? or should I start pasting some code?
I would assume you've got a timing related bug (e.g. a "race condition"), where if 2 things happen at exactly the wrong time things go bad. I'd post some code anyway...

@Midas: If it was an IRQ0 flood, then EOI's wouldn't be sent fast enough and some timer IRQs would be lost. This might lead to messed up timing (e.g. not keeping up with the wall clock, or some tasks getting more CPU time than they should), but it wouldn't cause GPFs.


Cheers,

Brendan

Re:Bochs or me?

Posted: Sat May 27, 2006 6:14 am
by Midas
@Brendan - Oops, fair point. Sorry for the inaccuracy! Thanks. :)

Re:Bochs or me?

Posted: Sat May 27, 2006 6:15 am
by nick8325
Midas wrote: 3. Realtime sync can flood your OS with IRQ0 as I understand it, possibly because of a bug - I'm unsure.
I suppose this is because if e.g. Bochs is simulating a million instructions per second, but you've told it to produce 100 timer ticks per real second, it'll have to produce one tick every 10,000 instructions. If your timer tick handler takes longer than that, whenever you enable interrupts you'll get a timer interrupt straight away...

(I suppose that isn't much different from what would happen on a real 1MHz processor, but IIRC the lowest-clocked 386 ran at 25MHz, which would give 250,000 clocks to handle a timer tick.)

Re:Bochs or me?

Posted: Sat May 27, 2006 6:23 am
by Midas
nick8325 wrote:
Midas wrote: 3. Realtime sync can flood your OS with IRQ0 as I understand it, possibly because of a bug - I'm unsure.
I suppose this is because if e.g. Bochs is simulating a million instructions per second, but you've told it to produce 100 timer ticks per real second, it'll have to produce one tick every 10,000 instructions. If your timer tick handler takes longer than that, whenever you enable interrupts you'll get a timer interrupt straight away...
That would be logical, too. I made a mess of that one. Thanks for the info. :-)

Re:Bochs or me?

Posted: Sat May 27, 2006 7:51 am
by Ryu
Some things I like to say before pasting a lenthly code. Firstly I apologize for the lengthly post. The GPF only happend if I have set bochs to realtime. Thirdly, about the ___ConsoleOutput@8, EFLAGS are pushed onto stack and which is restored on return which should reset the original IF. and cli is issued to protect from interrupts occurring and updating dwConsolePositionX and dwConsolePositionY.. at least its these two I have been watching.

I'm not really sure how to go about in the first place dealing with ISRs in these kinds of situations and if its the best way to use the cli method I used, however the GPF has really confused my mind. ???

Where it loops:

Code: Select all

   cli
   mov      dx, 1193180 / 2000      ; 2000 hz
   mov      al, 110110b            ; square wave
   out      43h, al               ; send command
   mov      al, dl
   out      40h, al               ; channel-0 7:0 divisor
   mov      al, dh
   out      40h, al               ; channel-0 15:8 divisor
   sti
   @hang:
   push   offset szTick
   push   0
   call   ___ConsoleOutput@8
   jmp   @hang
ISR:

Code: Select all

;==========================================================================
___Interrupt20h      PROC SYSCALL
;==========================================================================
;+++++++++++++++STACK+++++++++++++++++++18h+++++++++++++++++++++++++++++++++
   @EFLAGS               equ      esp+14h
   @CS                  equ      esp+10h
   @RIP               equ      esp+0Ch
   push   eax            ;      esp+08h
   push   ecx            ;      esp+04h
   push   edx            ;      esp+00h
;+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
   mov      ecx, ds:[dwConsolePositionX]
   mov      edx, ds:[dwConsolePositionY]

   push   offset szTock
   push   0
   call   ___ConsoleOutput@8

   mov      ds:[dwConsolePositionX], ecx
   mov      ds:[dwConsolePositionY], edx

   mov      al, 20h
   out      20h, al      ; send EOI to channel-0

   pop      edx
   pop      ecx
   pop      eax

   iretd

___Interrupt20h      ENDP
___ConsoleOutput@8:

Code: Select all

;==========================================================================
___ConsoleOutput@8      PROC SYSCALL   flags:DWORD, off:DWORD
;==========================================================================
;+++++++++++++++STACK+++++++++++++++++++20h+++++++++++++++++++++++++++++++++
   @off               equ      esp+1Ch
   @flgs               equ      esp+18h
   @RIP               equ      esp+14h
   push   ecx            ;      esp+10h
   push   edx            ;      esp+0Ch
   push   esi            ;      esp+08h
   push   edi            ;      esp+04h
   pushfd               ;      esp+00h
;+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

   mov      eax, ds:[dwConsoleFlags]
   test   eax, 1h
   jz      @FAIL

   mov      ax, 0700h         ; attributes
   mov      esi, [@off]
   mov      edi, ds:[dwConsolePositionX]
   mov      edx, ds:[dwConsolePositionY]

@TRUNK:   mov      al, [esi]
      inc      esi
      cmp      edi, 158
      je      short @BRANCH0         ; last spacing
      test   edi, edi   
      jz      short @BRANCH1         ; first spacing

@TSUB:   cmp      edx, (160 * 24)
      jge      short @BRANCH2
      test   al, al
      jz      short @RETURN
      cmp      al, 0Dh         ; new line on CR
      je      short @BRANCH3
      cmp      al, 0Ah         ; new line on LF
      je      short @BRANCH3
      lea      ecx, [edi+edx+0A0000h]
      mov      [ecx], al
      add      edi, 2
      jmp      short @TRUNK

@BRANCH2:   push   esi

         mov      eax, 07200720h
         mov      edi, 0A0000h + (160 * 2)    ; start at 3rd row
         mov      esi, 0A0000h + (160 * 3)    ; source is at the 4th row
         mov      ecx, (160 * 21) / 4         ; 20 rows two bytes per character, / 4 movsd
         rep      movsd
         mov      edi, 0A0000h + (160 * 23)
         mov      ecx, (160 / 4)-1
         rep      stosd                  ; clear last line
         mov      edi, 2
         mov      edx, (160 * 23)
         pop      esi
         dec      esi
         jmp      short @TRUNK

@BRANCH3:   mov      al, [esi]         ; load next byte
         inc      esi
         cmp      al, 0Dh            ; exclude any trailing CR or LFs
         je      short @BRANCH3
         cmp      al, 0Ah   
         je      short @BRANCH3

@BRANCH0:   add      edx, 160      ; removes the last space
@BRANCH1:   mov      edi, 2h         ; removes the first space
         jmp      short @TSUB
@BRANCH4:   mov      edx, 3
         

@RETURN:
   cmp      edx, (160 * 2)         ; adjust y if its below 3rd row
   ja      short .@A            ; reason why this check is not at the beginning is to
                           ; allow a title output and then change postion to
   mov      edx, (160 * 2)         ; third row.
   .@A:

   cli                                 ; no interrupts should occur for this update
   mov      ds:[dwConsolePositionX], edi      ; save row and column state
   mov      ds:[dwConsolePositionY], edx      ; Y is multiple of 160's

@FAIL:
   popfd            ; restores IF
   pop      edi
   pop      esi
   pop      edx
   pop      ecx
   ret      8h

___ConsoleOutput@8 ENDP


Re:Bochs or me?

Posted: Sat May 27, 2006 9:03 am
by Brendan
Hi,

Some quick questions...

Where does it GPF? You should be able to get EIP from Bochs log and figure out which instruction caused the GPF (which would help).

I also don't understand this bit of code:

Code: Select all

   mov      eax, ds:[dwConsoleFlags]
   test   eax, 1h
   jz      @FAIL
Is there anything anywhere that clears the lowest bit in "dwConsoleFlags", or is it always set?

I would've expected an "interrupt safe" spinlock or something, like this (for e.g.):

Code: Select all

   test [dwConsoleLock],1
   je @tryLock
   jmp @waitLock2

@waitLock1
   popfd
@waitLock2
   pause
   test [dwConsoleLock],1
   jne @waitLock2

@tryLock
   pushfd
   cli
   lock bts [dwConsoleFlags],0
   jc @waitLock1

; ** CRITICAL SECTION HERE **

@AllDone
   mov [dwConsoleLock],0     ;Free the lock
   popfd
; Return stuff here
Of course this might be overkill on single-CPU systems - if your OS is single CPU only then you're only worried about protecting the code from thread switches and IRQs, so you could probably just do:

Code: Select all

    pushfd
    cli

; ** CRITICAL SECTION HERE **

    popfd
; Return stuff here
I'm also not sure why the timer IRQ handler is doing this:

Code: Select all

   mov      ds:[dwConsolePositionX], ecx
   mov      ds:[dwConsolePositionY], edx
To me it looks like this should be done within the "ConsoleOutput" code's critical section (if it needs to be done at all).


Cheers,

Brendan

Re:Bochs or me?

Posted: Sat May 27, 2006 9:47 am
by Ryu
Where does it GPF? You should be able to get EIP from Bochs log and figure out which instruction caused the GPF (which would help).
00004687497e[CPU0 ] exception(): 3rd (13) exception with no resolution, shutdown status is 00h, resetting
00004687497i[CPU0 ] protected mode
00004687497i[CPU0 ] CS.d_b = 32 bit
00004687497i[CPU0 ] SS.d_b = 32 bit
00004687497i[CPU0 ] | EAX=07200700 EBX=00000000 ECX=00000000 EDX=00000254
00004687497i[CPU0 ] | ESP=00000f4c EBP=00009000 ESI=0009071c EDI=00010020
00004687497i[CPU0 ] | IOPL=0 ID vip vif ac vm RF nt of df IF tf sf ZF af PF cf
00004687497i[CPU0 ] | SEG selector base limit G D
00004687497i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00004687497i[CPU0 ] | CS:0008( 0001| 0| 0) 00000000 000fffff 1 1
00004687497i[CPU0 ] | DS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004687497i[CPU0 ] | SS:0018( 0003| 0| 0) 00000000 00000fff 0 1
00004687497i[CPU0 ] | ES:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004687497i[CPU0 ] | FS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004687497i[CPU0 ] | GS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00004687497i[CPU0 ] | EIP=00090112 (00090112)
00004687497i[CPU0 ] | CR0=0x00000011 CR1=0 CR2=0x00000000
00004687497i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
00004687497i[CPU0 ] >> eb
00004687497i[CPU0 ] >> f1
00004687497i[CPU0 ] >> : jmp .+0xfffffff1

At the moment I have to dissassembe the image and working it out..
Is there anything anywhere that clears the lowest bit in "dwConsoleFlags", or is it always set?
This bit is set through the life time of the console, basically its to indicate if the text mode is enabled so that ConsoleOutput is to work properly. But yeah its suppose to hold some spinlock flags but for what I am testing I at least don't think I need to.
Of course this might be overkill on single-CPU systems - if your OS is single CPU only then you're only worried about protecting the code from thread switches and IRQs, so you could probably just do:

Code: Select all

    pushfd
    cli

; ** CRITICAL SECTION HERE **

    popfd
; Return stuff here
This was what I had it, so the call is completely IRQ safe, however It generates a GPF, and I started tinkering. (The GPFs I get seem to occur after some time of pulses of interrupts). But in therory its acceptable for a interrupt to occur during writing to video buffer, but dwConsolePositionX and dwConsolePositionY should not be changed after the ISR is done with it.. so that explains the following:
I'm also not sure why the timer IRQ handler is doing this:

Code: Select all

   mov      ds:[dwConsolePositionX], ecx
   mov      ds:[dwConsolePositionY], edx
To me it looks like this should be done within the "ConsoleOutput" code's critical section (if it needs to be done at all).
Restore the position after the interrupt uses ConsoleOutput(), Its basically a defective use but is effective for the test of the mutex of dwConsolePositionX and dwConsolePositionY

Re:Bochs or me?

Posted: Sat May 27, 2006 10:08 am
by Ryu
EIP=00090112 works out to be at jmp @hang - where the loop is :-\

Edit: the second time around Bochs ends with EIP=000910df which works out to be in ConsoleOutput() -> rep movsd. I think this is the offending instruction, the register dump I first posted I think it was from a old bochsout.txt that was opened heh heh.

Edit: After changing cli back to the right after pushfd, I get a GPF at popfd of OutputConsole(). I'm really utterly confused. I then looked up my resources on popfd if anything would be offending it during interrupts, which only says this:

The interrupt flag (IF) is altered only when executing at a level at least as privileged as the IOPL. If a POPF/POPFD instruction is executed with insufficient privilege, an exception does not occur, but the privileged bits do not change.

We are in same IOPL, however during a interrupt is it the same CPL==IOPL? Well my IDT Gate is DPL=0 so we should be okay. I'm so uttlerly lost, I have again rechecked on my real hardware and it runs perfectly. Perhaps I should compile a image and post it. I just have no software that creates floppy images for bochs at the moment if anyone can recommend me that would be nice.

Re:Bochs or me?

Posted: Sun May 28, 2006 12:11 am
by Brendan
Hi,
Ryu wrote:Edit: After changing cli back to the right after pushfd, I get a GPF at popfd of OutputConsole(). I'm really utterly confused. I then looked up my resources on popfd if anything would be offending it during interrupts, which only says this:
It could be that something trashed your stack, and your trying to load a copy of EFLAGS that has the VM flag set or something.
Ryu wrote:I'm so uttlerly lost, I have again rechecked on my real hardware and it runs perfectly.
That could just mean your real hardware has different timing... ;)
Ryu wrote:Perhaps I should compile a image and post it. I just have no software that creates floppy images for bochs at the moment if anyone can recommend me that would be nice.


For *NIX there's normally a "dd" command that works fine. For Windows, try RawWrite.


Cheers,

Brendan

Re:Bochs or me?

Posted: Sun May 28, 2006 1:54 pm
by Ryu
Brendan wrote:
It could be that something trashed your stack, and your trying to load a copy of EFLAGS that has the VM flag set or something.
I haven't touched anything to do with virtual 8086 mode yet :-
Brendan wrote: appear when I do get a GPF is ID PF IF (I suppose the generated GPF happend outside the interrupt routine).
Brendan wrote: That could just mean your real hardware has different timing... ;)
Yeah my hardware gots this multipler setting that makes the overall system think its going faster. 8)

Only joking, as I understand the PIT in real hardware if you set it for specific frequency then it should be running at that frequency, unless theres something you know that I dn't know. :-\

This problem still remains a mystery, I've compiled some files that contains 2 FAT12 images one 100hz and 200hz which are identical just the hertz was changed. The problem is shows that if I set bochs to sync=slowdown it seems fine but on realtime, the 200hz will generate a GPF, and possibly 100hz.

The package can be temperary downloaded here: http://69.194.132.218/os_pack.rar

edit: fixed link issue.

Re:Bochs or me?

Posted: Mon May 29, 2006 5:55 pm
by Ryu
I'm sorry to pull up a post that none seems to have an idea of my problem. But it has been quite some time now hacking at this puzzle. I have completely water'd this matter down (so I think). The GPF is caused by several factors:

1) A hanging loop that uses pushfd/cli/popfd, while inside it is copying memory from and to any address.
2) A ISR installed that is frequently accessed. My PIT is set up at 2000hz. And I'm sure this will happend not just the PIT.
3) The count of the rep movsd in the hanging loop. Meaning if I use the same count and use rep movsb it will still generate a GPF. Furthermore if I divide the count by 4 and use movsb/movsw/movsd, then no GPF. This count must be somewhere around 512+ which has nothing to do with the number of bytes as it seems.

Heres the code:

Code: Select all

???cli
???mov??????dx, 1193180 / 2000??????; 2000 hz
???mov??????al, 110110b????????????; square wave
???out??????43h, al???????????????; send command
???mov??????al, dl
???out??????40h, al???????????????; channel-0 7:0 divisor
???mov??????al, dh
???out??????40h, al???????????????; channel-0 15:8 divisor
???sti

???@hang:
?????????pushfd
?????????cli
?????????mov??????edi, 080000h
?????????mov??????esi, 090000h
?????????mov??????ecx, 512/8
?????????rep??????movsd
?????????popfd

???jmp???@hang

Code: Select all

;==========================================================================
___Interrupt20h??????PROC SYSCALL
;==========================================================================
;+++++++++++++++STACK+++++++++++++++++++10h+++++++++++++++++++++++++++++++++
???@EFLAGS???????????????equ??????esp+0Ch
???@CS??????????????????equ??????esp+08h
???@RIP???????????????equ??????esp+04h
???push???eax????????????;??????esp+00h
;++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
???mov??????al, 20h
???out??????20h, al??????; send EOI to channel-0
???pop??????eax
???iretd
___Interrupt20h??????ENDP
GPF will rise when the three componets are met it seems, that is the count of rep movsd/movsw/movsb, also if you do cli in hanging loop (which means at the end it must reset IF=1 as popfd should of handled) and ofcourse a frequent interrupted service is running. Nothing seems to make sense.

Some things:
-EOI is sent but on return will reset the state of IF therefore, sending a EOI before the return is okay.
-Entering the hanging loop disables interrupt, so why on earth does the count have to do with this? This may seem it have to do something with how long rep movsd is going to take but.. again interrupts is disabled.


Brendan: Please confirm this is a bochs bug or I'm going to loose it. :-\

Re:Bochs or me?

Posted: Tue May 30, 2006 12:03 am
by Brendan
Hi,
Ryu wrote:Brendan: Please confirm this is a bochs bug or I'm going to loose it. :-\
Would you mind sending me a boot image of the minimalized code?

I've been trying your last boot images (PIT100hz.img and PIT200hz.img) but I haven't been able to get them to crash - I just get a sea of scrolling "tock!!tick"...

Also, which version of Bochs are you using? I'm using the CVS version from about a month ago...

It is possible that Bochs does have a bug somewhere (I can't find anything wrong with the code you've posted). Bochs does have bugs but they are extremely rare for common things (like the CPU itself, PIC, PIT, etc).

I'm also not sure which instruction/s trigger the GPF or what is on the stack after the GPF (the error code pushed on the stack by the CPU, etc). Also, are all other IRQs masked?

It might also help to know how you've compiled Bochs - which optimizations were enabled (and possibly if SMP was enabled, as SMP disables some of the optimizations).

Before reporting a bug to the Bochs developers, it's best to narrow it down to an exact problem that can be reproduced (for example, something like "For the latest CVS version of Bochs (dated 1/2/3456) the POPFD instruction causes a GPF when an IRQ occurs between the POPFD and the next instruction after <something> happens, if Bochs is compiled with "--enable-repeat-speedups" and "--enable-new-pit", and if <something else> is set to <something else>.").

You've done a very good job of narrowing things down already, but I haven't been able to reproduce the problem and it's still not precise enough to tell exactly where a bug might be in Bochs....


Cheers,

Brendan

Re:Bochs or me?

Posted: Tue May 30, 2006 4:27 am
by mystran
Here's a suggestion:

Run Bochs with the-setting-that-shows-current-ips.
Then set that value as your IPS, and try without realtime sync.

If that will reproduce it, then I'd almost guess that you are getting too many nested interrupts or something..