Vesa mouse

Dex4u

Vesa mouse

Post by Dex4u »

I am starting work on a GUI, and one thing I want is to make sure the mouse is responsive, even on a 233 MHz Pentium II. In other hobby OSes I have seen a slow response to mouse movement, e.g. a "floaty" pointer.
What do you think are the best "set resolution" and "set scaling" settings for the mouse?
And what is the best (as in fastest) way to blit the pointer to the screen? Even using a blitter, you still need to move the full off-screen buffer to the screen.
Thanks in advance.

PS: I am not looking for code, just advice ;).
blip

Re:Vesa mouse

Post by blip »

If I am not mistaken, OSes in wide use set these somewhat dynamically, as they relate to pointing precision. Say you send your mouse flying: you probably want the cursor to move quickly across the screen, whereas if you move it slowly you're probably trying to point accurately at something small. So you set the resolution lower to prevent overflows, and higher again once the displacement drops below a low threshold (you don't want to be stuck at a low resolution, because it makes the pointer look jittery and less responsive). Scaling is geared more towards the precision aspect and should be set by the user in some kind of configuration menu; I think this page describes it well under the heading "Inputs, Resolution, and Scaling". Standard serial mice, AFAIK, cannot change scaling or resolution, so if you've used one you will have felt the difference. I don't know the fastest or best way to blit, just some basic ideas.
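
A minimal sketch of that dynamic adjustment in C, assuming a PS/2 mouse, where the Set Resolution command (0xE8) takes a code from 0 to 3 meaning 1, 2, 4 or 8 counts/mm. The function name and both thresholds are made up for illustration and would need tuning on real hardware; sending the command to the mouse is not shown.

```c
#include <stdlib.h>

/* PS/2 "Set Resolution" (command 0xE8) argument: 0..3 = 1, 2, 4, 8 counts/mm. */
enum { RES_MIN = 0, RES_MAX = 3 };

/* Hypothetical thresholds, in counts per movement packet. */
enum { FAST_THRESHOLD = 96, SLOW_THRESHOLD = 16 };

/*
 * Pick the next resolution code from the latest movement packet.
 *   current  : resolution code in use now (0..3)
 *   dx, dy   : signed movement counts from the packet
 *   overflow : non-zero if the packet's X or Y overflow bit was set
 */
static int next_resolution(int current, int dx, int dy, int overflow)
{
    int magnitude = abs(dx) > abs(dy) ? abs(dx) : abs(dy);

    if ((overflow || magnitude >= FAST_THRESHOLD) && current > RES_MIN)
        return current - 1;   /* fast sweep: fewer counts per mm, no overflow */
    if (magnitude <= SLOW_THRESHOLD && current < RES_MAX)
        return current + 1;   /* slow, careful movement: more counts per mm */
    return current;           /* in between: leave the setting alone */
}
```

The driver would call this once per packet and only send 0xE8 when the returned code differs from the current one.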
Dex4u

Re:Vesa mouse

Post by Dex4u »

Thanks for your input blip
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re:Vesa mouse

Post by Brendan »

Hi,
Dex4u wrote: I am starting work on a GUI, and one thing I want is to make sure the mouse is responsive, even on a 233 MHz Pentium II. In other hobby OSes I have seen a slow response to mouse movement, e.g. a "floaty" pointer.
I think this is caused by bad design rather than anything else - for example, redrawing the entire screen every time the mouse moves or not handling the mouse data immediately (e.g. running other tasks instead of pre-empting and switching immediately to the GUI).

For a start, I'd make the video driver draw the actual mouse pointer (it can be hardware accelerated on most video cards). Without hardware acceleration, the video driver would copy the pixel data underneath the mouse, then display the mouse pointer. If the mouse moves you'd redraw what was underneath it (not the entire screen), then copy the pixels underneath and draw it again.
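
A minimal sketch of the non-accelerated case in C: save what is under the pointer, draw the pointer, and on a move restore the saved pixels before drawing again. The `Screen` and `Cursor` types, the tiny 4x4 pointer and the "0 means transparent" convention are assumptions made up for this example, not part of any real driver interface.

```c
#include <stdint.h>

#define CUR_W 4   /* tiny pointer so the example is easy to follow */
#define CUR_H 4

typedef struct {
    uint32_t *pixels;    /* frame buffer, 32 bpp */
    int width, height;   /* visible size in pixels */
    int pitch;           /* pixels per scan line (>= width) */
} Screen;

typedef struct {
    int x, y;                        /* where the pointer is currently drawn */
    int visible;
    uint32_t under[CUR_W * CUR_H];   /* pixels hidden by the pointer */
} Cursor;

/* Put back the pixels the pointer was covering. */
static void cursor_erase(Screen *s, Cursor *c)
{
    if (!c->visible)
        return;
    for (int row = 0; row < CUR_H; row++) {
        for (int col = 0; col < CUR_W; col++) {
            int px = c->x + col, py = c->y + row;
            if (px < 0 || py < 0 || px >= s->width || py >= s->height)
                continue;            /* clipped: nothing was saved here */
            s->pixels[py * s->pitch + px] = c->under[row * CUR_W + col];
        }
    }
    c->visible = 0;
}

/* Save what is under the pointer, then draw it (clipped to the screen). */
static void cursor_draw(Screen *s, Cursor *c, const uint32_t *image, int x, int y)
{
    c->x = x;
    c->y = y;
    for (int row = 0; row < CUR_H; row++) {
        for (int col = 0; col < CUR_W; col++) {
            int px = x + col, py = y + row;
            if (px < 0 || py < 0 || px >= s->width || py >= s->height)
                continue;
            uint32_t *dst = &s->pixels[py * s->pitch + px];
            c->under[row * CUR_W + col] = *dst;
            if (image[row * CUR_W + col] != 0)   /* 0 = transparent (assumed) */
                *dst = image[row * CUR_W + col];
        }
    }
    c->visible = 1;
}

/* A move touches at most two pointer-sized rectangles, never the whole screen. */
static void cursor_move(Screen *s, Cursor *c, const uint32_t *image, int x, int y)
{
    cursor_erase(s, c);
    cursor_draw(s, c, image, x, y);
}
```

Anything that redraws the area under the pointer would have to call `cursor_erase` first and `cursor_draw` afterwards, otherwise the saved pixels go stale.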

If it's done properly, the mouse should always remain responsive, even without hardware acceleration on a 25 MHz 80486 running in 1024 * 768 32 bpp video mode.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Dex4u

Re:Vesa mouse

Post by Dex4u »

Brendan wrote:I think this is caused by bad design rather than anything else -
I agree with you there Brendan, but that's just it: I want to avoid bad design by finding out what is a good design versus a bad design.
My design so far runs on a test bed of a 233 MHz Pentium II with 32 MB of RAM and VESA 2, set to 640x480 24bpp (as in RGBRGB, not the better 32bpp XRGBXRGB).

I have set up a timer program and the outcome is not good :(.
With nothing else running, using this code

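Code:

 ;----------------------------------------------------;
 ; PutBmp24                                     24bpp ;
 ;----------------------------------------------------;
FillScreen:
         pushad
         push  es
         mov   ax,8h
         mov   es,ax                       ;ES = selector 8h (assumed to cover the frame buffer)
         mov   edi,[ModeInfo_PhysBasePtr]  ;destination: linear frame buffer
         mov   esi,ScreenBuffer            ;source: 32bpp off-screen buffer
         mov   ecx,640*480 ;this hard coded is temp
         cld
@@:
         movsd                             ;copy 4 bytes (one XRGB pixel)
         dec   edi                         ;step back 1 byte so only 3 bytes are kept
         loop  @b
         pop   es
         popad
         ret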
I am only getting 19 FPS.
Using this

Code:

 ;----------------------------------------------------;
 ; PutBmp24                                     24bpp ;
 ;----------------------------------------------------;
FillScreen:
         pushad
         push  es
         mov   ax,8h
         mov   es,ax                       ;ES = selector 8h (assumed to cover the frame buffer)
         mov   edi,[ModeInfo_PhysBasePtr]  ;destination: linear frame buffer
         mov   esi,ScreenBuffer            ;source: off-screen buffer
         mov   ecx,640*480 ;this hard coded is temp
         cld
         rep movsd                         ;copy ecx dwords straight across, no pixel conversion
         pop   es
         popad
         ret
I am getting 37 FPS.
But to use this code I would need to store the buffer in 3-byte chunks instead, which may just move the bottleneck there.

The mouse pointer is a compiled sprite based on info I got here: http://www.nondot.org/sabre/graphpro/sprite4.html
It works really well, but I see a problem: drawing directly to the screen works well, but if you then blit the off-screen buffer to the screen and then move the mouse, you may restore a stale backup of the old screen.

Thanks for your input.
Brendan

Re:Vesa mouse

Post by Brendan »

Hi,
Dex4u wrote:With nothing else running, using this code
Dex4u wrote:I am only getting 19 FPS.
This code is broken anyway - it writes one byte too many at the end of visible display memory and won't work if the video mode has an unseen part at the end of each line. For example, video display memory could look like this (X = visible, - = unseen):

XXXXXX---
XXXXXX---
XXXXXX---
XXXXXX---

The other problem is that VGA display memory doesn't like unaligned accesses.

To fix it, try something like this:

Code:

 ;----------------------------------------------------;
 ; PutBmp24                                     24bpp ;
 ;----------------------------------------------------;
FillScreen:
         pushad
         push  es
         mov   ax,8h
         mov   es,ax
         mov   ebp,480           ;this hard coded is temp
         mov   edx,[ModeInfo_BytesPerLine] ;assumes the value is stored as a dword
         mov   edi,[ModeInfo_PhysBasePtr]
         sub   edx,640*3         ;edx = unseen bytes at the end of each line
         mov   esi,ScreenBuffer
         cld

;"--" below is the unused byte of a source pixel, assumed to be 0
.l1:
         mov   ecx,640/4         ;this hard coded is temp
.l2:
         mov   eax,[esi]         ;eax = -- R1 G1 B1
         mov   ebx,[esi+4]       ;ebx = -- R2 G2 B2
         shl   eax,8             ;eax = R1 G1 B1 --
         shrd  eax,ebx,8         ;eax = B2 R1 G1 B1
         stosd

         mov   ax,[esi+8]        ;eax = -- -- G3 B3
         shr   ebx,8             ;ebx = -- -- R2 G2
         shl   eax,16            ;eax = G3 B3 -- --
         or    eax,ebx           ;eax = G3 B3 R2 G2
         stosd

         mov   bl,[esi+10]       ;ebx = -- -- -- R3
         mov   eax,[esi+12]      ;eax = -- R4 G4 B4
         shl   eax,8             ;eax = R4 G4 B4 --
         mov   al,bl             ;eax = R4 G4 B4 R3
         stosd

         add   esi,16
         loop  .l2

         add   edi,edx           ;skip the unseen part of the line
         sub   ebp,1
         ja    .l1

         pop   es
         popad
         ret

This does 4 pixels at a time (three 32-bit writes that are always aligned, instead of four 32 bit writes that are unaligned 75% of the time). It won't work for extremely unusual video modes - the horizontal resolution must be a multiple of 4 (but it's very unlikely this matters).
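
For readers who find the register shuffling hard to follow, here is the same four-pixels-into-three-dwords packing as a C sketch. It assumes a little-endian CPU (as on x86) and 32-bit XRGB source pixels; the function name is made up for this example.

```c
#include <stdint.h>

/*
 * Pack four 32-bit XRGB pixels into three 32-bit words of packed 24 bpp data.
 * On a little-endian CPU the three words land in memory as
 * B1 G1 R1 B2 G2 R2 B3 G3 R3 B4 G4 R4.
 */
static void pack4_xrgb_to_rgb24(const uint32_t src[4], uint32_t dst[3])
{
    /* Mask off the unused top byte so it cannot leak into a neighbour. */
    uint32_t p1 = src[0] & 0x00FFFFFFu;
    uint32_t p2 = src[1] & 0x00FFFFFFu;
    uint32_t p3 = src[2] & 0x00FFFFFFu;
    uint32_t p4 = src[3] & 0x00FFFFFFu;

    dst[0] = p1 | (p2 << 24);           /* B2 R1 G1 B1 */
    dst[1] = (p2 >> 8) | (p3 << 16);    /* G3 B3 R2 G2 */
    dst[2] = (p3 >> 16) | (p4 << 8);    /* R4 G4 B4 R3 */
}
```

Unlike the assembly, this version masks the unused byte explicitly, so it does not rely on the source buffer keeping that byte at zero.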

It's hard to predict the speed difference this will make - I'm curious! :)
Dex4u wrote:The mouse pointer is a compiled sprite based on info I got here: http://www.nondot.org/sabre/graphpro/sprite4.html
It works really well, but I see a problem: drawing directly to the screen works well, but if you then blit the off-screen buffer to the screen and then move the mouse, you may restore a stale backup of the old screen.
From the web site you've mentioned:
  • They can be very large
  • Clipping is impossible
  • They are not very portable
For a GUI or something "no clipping" means you're screwed if the mouse pointer is near the (left or bottom?) edge of the screen, and "not portable" means you need a different compiled sprite for each variation of each video mode.

Also, often a GUI will allow the application to supply its own mouse pointer - it might look like a sniper's cross-hairs in a game, an hourglass in windows, a hand if the user is moving things (scroll bars, etc), a magnifying glass for zooming in and out, etc.

For hardware accelerated mouse pointers the mouse pointer data often consists of a pair of 32*32 (or 64*64) bitmaps. One bitmap is a mask of which pixels are changed, and the other bitmap determines if a pixel will be black or white (if it's changed). This varies though (different video cards do it differently).

For these reasons I'd recommend inventing a "standard mouse pointer format" (something that can be converted to the format used for hardware acceleration, but supports colour too). I wouldn't use compiled sprites - they'd save a little CPU time, but it shouldn't take long to copy and update a 32*32 square of screen memory anyway, and it's not like you'll be having hundreds of mouse pointers displayed on the screen.... :)
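
A rough sketch in C of what such a format and its conversion could look like. Everything here is hypothetical: the `Pointer` layout, the 32x32 size, the bit order of the mask rows and the brightness cut-off are illustration only, and a real video card will want its own layout.

```c
#include <stdint.h>
#include <string.h>

#define PTR_SIZE 32

/* Hypothetical "standard" pointer: 32x32 ARGB pixels plus a hot spot. */
typedef struct {
    int hot_x, hot_y;
    uint32_t argb[PTR_SIZE * PTR_SIZE];   /* alpha 0 = transparent */
} Pointer;

/* Two 1 bpp bitmaps, one 32-bit word per row, leftmost pixel in bit 31. */
typedef struct {
    uint32_t changed[PTR_SIZE];   /* 1 = this pixel is replaced by the pointer */
    uint32_t white[PTR_SIZE];     /* for changed pixels: 1 = white, 0 = black */
} MonoPointer;

/* Reduce the colour pointer to the mask + black/white pair described above. */
static void pointer_to_mono(const Pointer *p, MonoPointer *out)
{
    memset(out, 0, sizeof *out);
    for (int y = 0; y < PTR_SIZE; y++) {
        for (int x = 0; x < PTR_SIZE; x++) {
            uint32_t px = p->argb[y * PTR_SIZE + x];
            if ((px >> 24) < 128)
                continue;                      /* mostly transparent: leave the screen alone */
            uint32_t bit = 0x80000000u >> x;
            out->changed[y] |= bit;
            /* Crude brightness test: sum of R, G and B against the midpoint. */
            unsigned sum = ((px >> 16) & 0xFF) + ((px >> 8) & 0xFF) + (px & 0xFF);
            if (sum >= 3 * 128)
                out->white[y] |= bit;
        }
    }
}
```

The software path can draw the full-colour `Pointer` directly, while a driver with a hardware cursor would call something like `pointer_to_mono` once whenever the application changes the pointer.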


Cheers,

Brendan
distantvoices
Member
Posts: 1600
Joined: Wed Oct 18, 2006 11:59 am
Location: Vienna/Austria

Re:Vesa mouse

Post by distantvoices »

For mouse drawing in VESA I'd use a small *.bmp (sized for your target resolution) with the mouse image, loaded into memory by any process and fed to the GUI, in combination with a back buffer of the same size as the mouse pointer. Draw: save each pixel you are about to draw over as you go. Move: erase the pointer from the old location (restore the contents of the back buffer) and draw it at the new location (saving the old screen content as before).

To achieve responsiveness, I recommend having gui threads preempt other user processes. One might give gui threads a higher priority too, so they are on level with service threads. I wouldn't give them highest priority. Someone must still be able to kick @$$ if something goes haywire. ;-)

Stay safe
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image
Dex4u

Re:Vesa mouse

Post by Dex4u »

@beyond infinity, thanks, I will give your way a test.

@Brendan, wow, if you're not well paid in your job, you should be. Your code gives me 31 FPS, which is a big increase over 19 FPS :).

I can get 37 FPS using rep movsd (on the same PC), but this means the off-screen buffer needs to be RGBRGB, which just pushes the problem back to wherever the data is moved into it.

You're right about clipping, but at the time I was going to make the off-screen buffer bigger than the screen and do something like this:

Code:

;we would set other things up here first
xor  ecx,ecx
mov  cx,[ModeInfo_YResolution]   ;outer count: screen lines
@@:
push ecx
mov  cx,[ModeInfo_XResolution]   ;inner count: dwords per line
rep  movsd
add  esi,[bufferAngover]         ;skip the part of the buffer that is off-screen
pop  ecx
loop @b
But it's much too slow.

Thanks Brendan, you've been a big help :) .
blip

Re:Vesa mouse

Post by blip »

I don't have access to all your code so I don't know how much you are updating your offscreen buffer before blasting it out to the video card, but chances are you don't need to copy the entire thing. For one, it is always best to limit I/O access to improve speed.

In addition to maintaining an offscreen buffer, make and tend to a bitmap specifying which regions have changed since the last update. The way I implemented it (not in an OS) was to have a bit for every dword and graphics primitives procedures that would draw and update the bitmap. I had the PIT synched to the vertical retrace frequency and an ISR would update the changed parts in video memory and clear the bitmap as it went. The region size per bit is really up to you, and you are not limited to just one bitmap. You could have multiple levels, e.g. level 1 tells you which 4 KB pages need updating, the next breaks down to 2 KB, etc. As always programming is an exercise in caching. :) For some reason I just assumed you coded it like this but wanted a super-optimized version. :-\
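
A minimal single-level sketch of that idea in C. The struct and function names are made up for this example, the region size is whatever the caller picks, and a real version would run `flush_dirty` from the retrace-synchronised timer handler rather than calling it directly.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  *back;     /* off-screen buffer that the drawing code writes to */
    uint8_t  *front;    /* video memory (or a stand-in for it) */
    size_t    size;     /* bytes in each buffer */
    size_t    region;   /* bytes covered by one dirty bit */
    uint32_t *dirty;    /* one bit per region, 32 regions per word */
} DirtyBlit;

/* Called by every drawing primitive after it has written to the back buffer. */
static void mark_dirty(DirtyBlit *d, size_t offset, size_t length)
{
    if (length == 0)
        return;
    size_t first = offset / d->region;
    size_t last  = (offset + length - 1) / d->region;
    for (size_t r = first; r <= last; r++)
        d->dirty[r / 32] |= 1u << (r % 32);
}

/* Copy only the regions that changed, clearing bits as it goes.
 * Returns the number of bytes actually sent to video memory. */
static size_t flush_dirty(DirtyBlit *d)
{
    size_t regions = (d->size + d->region - 1) / d->region;
    size_t copied = 0;
    for (size_t r = 0; r < regions; r++) {
        uint32_t bit = 1u << (r % 32);
        if (!(d->dirty[r / 32] & bit))
            continue;
        size_t start = r * d->region;
        size_t len = d->size - start < d->region ? d->size - start : d->region;
        memcpy(d->front + start, d->back + start, len);
        d->dirty[r / 32] &= ~bit;
        copied += len;
    }
    return copied;
}
```

The multi-level variant would add a second, coarser bitmap whose bits say which words of this one are non-zero, so a mostly clean screen can be skipped quickly.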
Dex4u

Re:Vesa mouse

Post by Dex4u »

At the moment I just blit the same off-screen buffer (same size as the screen) to the screen in a 1 second timer loop. This is just for test purposes, to find the fastest full-screen update; it will not be used for every update, but when it is needed I want the fastest way.
So all the code is doing is moving as much data as it can in 1 second; the off-screen buffer is not updated in the loop.

Your bitmap idea is the same as what I was thinking about for desktop icons.
I want to try to optimise every bit, not taking anything for granted, and go by the best FPS along with what works well.

Thanks blip :).

PS: I will put all test results, in a GUI tut, along with GUI demos, so it may help other coders.
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Vesa mouse

Post by Candy »

Did you calculate what the theoretical optimum would be? How far in percentages are you from the theoretical maximum?

How many cycles does a stosd or movsd to video memory take?
Dex4u

Re:Vesa mouse

Post by Dex4u »

Candy wrote: Did you calculate what the theoretical optimum would be? How far in percentages are you from the theoretical maximum?
No, do you have any good formulas :) .
Candy wrote: How many cycles does a stosd or movsd to video memory take?
LIST OF INTEGER INSTRUCTIONS
============================
Explanations:
Operands: r=register, m=memory, i=immediate data, sr=segment register
m32= 32 bit memory operand, etc.

Clock cycles:
The numbers are minimum values. Cache misses, misalignment, and exceptions
may increase the clock counts considerably.

Pairability:
u=pairable in U-pipe, v=pairable in V-pipe, uv=pairable in either pipe,
np=not pairable

Opcode      Operands   Clock cycles    Pairability
----------------------------------------------------------------------------
LODS                   2               np
REP LODS               7+3*n    g)     np
STOS                   3               np
REP STOS               10+n     g)     np
MOVS                   4               np
REP MOVSB              12+1.8*n g)     np
REP MOVSW              12+1.5*n g)     np
REP MOVSD              12+n     g)     np
----------------------------------------------------------------------------
Thanks candy.
Brendan

Re:Vesa mouse

Post by Brendan »

Hi,
Dex4u wrote:@Brendan, wow, if you're not well paid in your job, you should be. Your code gives me 31 FPS, which is a big increase over 19 FPS :).

I can get 37 FPS using rep movsd (on the same PC), but this means the off-screen buffer needs to be RGBRGB, which just pushes the problem back to wherever the data is moved into it.
I'd guess that 37 FPS would be at (or close to) the theoretical limit for your CPU/bus/video card. For "rep movsd" the CPU has a "fast strings" feature where it moves entire cache lines rather than transferring individual dwords.

Unfortunately, for maximum compatibility you can't use a single "rep movsd". Instead you'd need to do a "rep movsd" for each screen line to avoid problems when display memory is "wider" than the visible part of display memory. This leads to the first optimization - check whether display memory is "wider" or not and have 2 separate blit routines, one that does it in one operation and another that does one line at a time.
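
The two-routine idea can be sketched in C like this, with `memcpy` standing in for "rep movsd". The function name and parameters are made up for the example; `pitch` plays the role of the mode's bytes-per-line value.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy a tightly packed off-screen buffer to display memory.
 *   pitch : bytes from the start of one display line to the next
 */
static void blit(uint8_t *video, size_t pitch,
                 const uint8_t *buffer, size_t width, size_t height,
                 size_t bytes_per_pixel)
{
    size_t line_bytes = width * bytes_per_pixel;

    if (pitch == line_bytes) {
        /* No unseen part: the whole screen is one contiguous block. */
        memcpy(video, buffer, line_bytes * height);
        return;
    }

    /* Display memory is "wider": copy a line, then skip the unseen bytes. */
    for (size_t y = 0; y < height; y++)
        memcpy(video + y * pitch, buffer + y * line_bytes, line_bytes);
}
```

The check costs one comparison per frame, so there is little reason to skip it even when the common case is the contiguous one.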

For the code I posted, I optimized writes to display memory and nothing else - it can be improved (there are too many register dependencies in the inner loop and it isn't as good as it could be when reading the source data).

Code:

;----------------------------------------------------;
; PutBmp24                                    24bpp ;
;----------------------------------------------------;
FillScreen:
        pushad
        push  es
        cld
        mov  ax,8h
        mov  es,ax

        mov  edi,[ModeInfo_PhysBasePtr]
        mov  esi,ScreenBuffer

        cmp  dword [ModeInfo_BytesPerLine],640*3 ;assumes the value is stored as a dword
        je   .fullScreenBlit

        mov  ebp,480          ;this hard coded is temp
        mov  edx,[ModeInfo_BytesPerLine]
        sub  edx,640*3        ;edx = unseen bytes at the end of each line

        push ebp
        push edx
%define .linesRemaining [esp+4]
%define .skippedBytesPerLine [esp]

.l1a:
        mov  ebp,640/4-1      ;this hard coded is temp
.l2a:
        cmp  [esi+28],eax     ;Cache prefetch!
.l3a:
        mov  eax,[esi]        ;eax = -- R1 G1 B1
        mov  ebx,[esi+4]      ;ebx = -- R2 G2 B2
        mov  ecx,[esi+8]      ;ecx = -- R3 G3 B3
        mov  edx,[esi+12]     ;edx = -- R4 G4 B4

        shl  eax,8            ;eax = R1 G1 B1 --
        rol  ecx,16           ;ecx = G3 B3 -- R3
        shl  edx,8            ;edx = R4 G4 B4 --
        shrd eax,ebx,8        ;eax = B2 R1 G1 B1
        mov  dl,cl            ;edx = R4 G4 B4 R3
        shr  ebx,8            ;ebx = -- -- R2 G2
        mov  [es:edi],eax     ;Store B2 R1 G1 B1
        mov  cx,bx            ;ecx = G3 B3 R2 G2
        mov  [es:edi+4],ecx   ;Store G3 B3 R2 G2
        mov  [es:edi+8],edx   ;Store R4 G4 B4 R3

        add  esi,16
        add  edi,12

        sub  ebp,1            ;Are there pixels left for this line?
        ja   .l2a             ; yes lots, continue with cache prefetch
        je   .l3a             ; yes 4 pixels left, continue without cache prefetch

        add  edi,.skippedBytesPerLine
        sub  dword .linesRemaining,1
        ja   .l1a             ;More lines left, reload the per-line counter

        add  esp,8            ;Remove temporary variables
        pop  es
        popad
        ret


.fullScreenBlit:
        mov  ebp,640*480/4-1  ;this hard coded is temp
.l1b:
        cmp  [esi+28],eax     ;Cache prefetch!
.l2b:
        mov  eax,[esi]        ;eax = -- R1 G1 B1
        mov  ebx,[esi+4]      ;ebx = -- R2 G2 B2
        mov  ecx,[esi+8]      ;ecx = -- R3 G3 B3
        mov  edx,[esi+12]     ;edx = -- R4 G4 B4

        shl  eax,8            ;eax = R1 G1 B1 --
        rol  ecx,16           ;ecx = G3 B3 -- R3
        shl  edx,8            ;edx = R4 G4 B4 --
        shrd eax,ebx,8        ;eax = B2 R1 G1 B1
        mov  dl,cl            ;edx = R4 G4 B4 R3
        shr  ebx,8            ;ebx = -- -- R2 G2
        mov  [es:edi],eax     ;Store B2 R1 G1 B1
        mov  cx,bx            ;ecx = G3 B3 R2 G2
        mov  [es:edi+4],ecx   ;Store G3 B3 R2 G2
        mov  [es:edi+8],edx   ;Store R4 G4 B4 R3

        add  esi,16
        add  edi,12

        sub  ebp,1
        ja   .l1b
        je   .l2b

        pop  es
        popad
        ret
I'm guessing that on your computer this might get to around 33 FPS. I could get it a touch faster when display memory is "wider" than the visible part by using a separate loop for the last screen line, but it's probably not worth the trouble. For the code that does the entire screen in one operation I can't think of any improvements that don't make it dependent on which CPU is used (64 bit code, MMX, SSE), but I think we're close to the maximum transfer speed your bus and video card can handle anyway... :)


Cheers,

Brendan
Brendan

Re:Vesa mouse

Post by Brendan »

Hi,
Candy wrote: Did you calculate what the theoretical optimum would be? How far in percentages are you from the theoretical maximum?

How many cycles does a stosd or movsd to video memory take?
From this web page:

"The PCI bus comes in several configurations. The base configuration has a 32-bit wide data bus operating at 33 MHz. Like the CPU's local bus, the PCI is theoretically capable of transferring data on each clock cycle. This provides a theoretical maximum of 132 MBytes/second data transfer rate (33 MHz times four bytes). In practice, the PCI bus doesn't come anywhere near this level of performance except in short bursts. Whenever the CPU wishes to access a peripheral on the PCI bus, it must negotiate with other peripheral devices for the right to use the bus. This negotiation can take several clock cycles before the PCI controller grants the CPU the bus. If a CPU writes a sequence of values to a peripheral a double word per bus request, then the negotiation takes the majority of the time and the data transfer rate drops dramatically. The only way to achieve anywhere near the maximum theoretical bandwidth on the bus is to use a DMA controller and move blocks of data. In this block mode the DMA controller can negotiate just once for the bus and transfer a fair sized block of data without giving up the bus between each transfer. This "burst mode" allows the device to move lots of data quickly."

For a Pentium II I'd expect a 33 MHz 32-bit PCI bus (it pre-dates AGP I think).

For "rep movsd" Dex gets 37 FPS, which works out to 640*480*3*37 = 34,099,200 bytes per second, or about 32.5 MiB per second (34.1 MB in the decimal units used for the 132 MBytes/second figure). If the theoretical maximum is 132 MBytes/second, and if that is only achieved with bus mastering, then it seems very likely that the PCI bus is the limiting factor here.

For a hardware accelerated video card (using bus mastering), you could probably achieve a maximum of 150 FPS on the same PCI bus, but we can all imagine how easy it'd be to get the hardware information needed and write a video driver for each card.
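
The arithmetic behind these estimates, as a small C sketch (the function names are made up). With the quoted 132 MBytes/second peak, 37 FPS at 640x480x24bpp uses roughly a quarter of the bus, and the ceiling works out to about 143 FPS, the same ballpark as the 150 FPS guess above.

```c
#include <stdint.h>

/* Bytes in one full-screen frame. */
static uint64_t frame_bytes(unsigned width, unsigned height, unsigned bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Bytes per second moved at a given frame rate. */
static uint64_t throughput(uint64_t frame, unsigned fps)
{
    return frame * fps;
}

/* Share of the bus's peak rate in use, in whole percent (rounded down). */
static unsigned percent_of_peak(uint64_t bytes_per_second, uint64_t peak)
{
    return (unsigned)(bytes_per_second * 100 / peak);
}

/* Highest whole frame rate the peak rate could carry. */
static unsigned fps_ceiling(uint64_t frame, uint64_t peak)
{
    return (unsigned)(peak / frame);
}
```

For example, `throughput(frame_bytes(640, 480, 3), 37)` gives the 34,099,200 bytes per second measured with "rep movsd".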

The only other thing Dex could do is to make sure the CPU is using "write combining" for display memory, which would reduce the number of transfers from the CPU to the video card (i.e. less transfers across the PCI bus with more bytes per transfer). This means messing with the MTRRs (Memory Type Range Registers) if the BIOS hasn't already setup write combining.
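
As a rough illustration of what "messing with the MTRRs" involves, this C sketch only computes the two values for one variable-range MTRR pair (a PhysBase/PhysMask pair such as MSRs 0x200/0x201) that would mark a frame buffer as write-combining. It assumes the range is a power of two in size and aligned to that size, and that the caller knows the CPU's physical address width (36 bits is the usual figure for a Pentium II). Actually writing the MSRs needs ring 0 code, a check that the CPU supports write-combining, and the cache-flushing sequence from the CPU manual, none of which is shown here. The names are made up for the example.

```c
#include <stdint.h>

#define MTRR_TYPE_WRITE_COMBINING 0x01u         /* memory type field, low byte of PhysBase */
#define MTRR_MASK_VALID           (1ull << 11)  /* "valid" bit in PhysMask */

typedef struct {
    uint64_t base;   /* value for the PhysBase MSR */
    uint64_t mask;   /* value for the PhysMask MSR */
} MtrrPair;

/*
 * Fill in *out for a write-combining range.
 * Returns 0 on success, -1 if one variable MTRR cannot describe the range.
 * phys_bits must be less than 64.
 */
static int mtrr_wc_values(uint64_t phys_base, uint64_t size,
                          unsigned phys_bits, MtrrPair *out)
{
    if (size < 4096 || (size & (size - 1)) != 0)
        return -1;                        /* must be a power of two, at least 4 KB */
    if (phys_base & (size - 1))
        return -1;                        /* base must be aligned to the size */

    uint64_t addr_mask = (1ull << phys_bits) - 1;

    out->base = (phys_base & addr_mask) | MTRR_TYPE_WRITE_COMBINING;
    out->mask = (~(size - 1) & addr_mask) | MTRR_MASK_VALID;
    return 0;
}
```

A frame buffer whose size is not a power of two would need to be rounded up or split across several pairs.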


Cheers,

Brendan
Candy

Re:Vesa mouse

Post by Candy »

Dex4u wrote:
How many cycles does a stosd or movsd to video memory take?
LIST OF INTEGER INSTRUCTIONS
============================
Explanations:
Operands: r=register, m=memory, i=immediate data, sr=segment register
m32= 32 bit memory operand, etc.

Clock cycles:
The numbers are minimum values. Cache misses, misalignment, and exceptions
may increase the clock counts considerably.

Pairability:
u=pairable in U-pipe, v=pairable in V-pipe, uv=pairable in either pipe,
np=not pairable

Opcode      Operands   Clock cycles    Pairability
----------------------------------------------------------------------------
LODS                   2               np
REP LODS               7+3*n    g)     np
STOS                   3               np
REP STOS               10+n     g)     np
MOVS                   4               np
REP MOVSB              12+1.8*n g)     np
REP MOVSW              12+1.5*n g)     np
REP MOVSD              12+n     g)     np
----------------------------------------------------------------------------
Thanks candy.
You're welcome and you've completely missed the point. These numbers are for rep movsd without taking memory access time into consideration, which is EXACTLY what I was thinking of. Some example thoughts: The PCI bus can take 133MB/s, so if I reach within a small constant factor of this it might just be the limiting factor. If I have 200 million cycles and I spend 92% of them doing rep movsd, it's the limiting factor and I don't have to bother finding any faster method.

Your numbers don't account for a P2 btw, whereas you are testing on a P2. The values given are for a P1 (with U and V pipeline).

For a constructive addition, you could set the MTRR of the video memory to write-combining if you haven't done so already.

It's also possible that, since you're not using DMA or hardware blitting or such, the writes are done as single-cycle PCI writes, which slows it down considerably (5 PCI cycles per dword instead of 1).