
Sluggish SVGA drivers

Posted: Thu Dec 16, 2010 5:46 pm
by ecco
I've got two SVGA drivers (one for Bochs and one for VMWare Workstation) and for now I've created a generic SVGA library that has primitive drawing functions like pset, line, rect, etc.

I'm working in 1024x768x32 so the frame buffer is 3MB. Currently I'm only running one process on my task scheduler that clears the rect under the old cursor location and draws the cursor at its new location and then copies the back buffer to the frame buffer:
memcpy(vga_buffer, back_buffer, 1024*768)

I should probably mention that my memcpy function uses movsb, movsw or movsd depending on the size value. In this case I'm certain it's using movsd.

FYI the VMWare driver is at least fast enough to clear the screen and draw the mouse cursor so that it "feels" like a typical OS. I realize that I won't be redrawing every object every frame, but I should be able to at least update a good portion of the screen each frame without this much performance loss.

Should I be using some sort of DMA for this transfer or possibly MMX instructions?

Re: Sluggish SVGA drivers

Posted: Fri Dec 17, 2010 5:55 am
by Creature
Brendan (and others too) have already posted about performance and caveats in graphics drawing. You may be able to find the topics by using the search function. I'll give you a few things off the top of my head.

First things first. You know that you won't be redrawing every part of the screen every time. Redrawing the entire screen constantly is always slow without any type of acceleration, and there aren't many ways to speed it up. The following ideas have come up on the forums a couple of times:
  • Set the MTRR for your video memory range to Write-Combining. This can give you an increase in speed (I remember someone mentioning a number of up to 10%, but I'm not sure about that and it probably depends on your implementation and hardware).
  • Use a back buffer (which you're, judging from your post, already doing), since writing to memory is faster than writing to video memory (reading is possibly even slower on video memory).
  • Avoid drawing where you don't need to: use so-called 'dirty rectangles'. Usually you keep a list of the areas on the screen that require redrawing. When your screen is going to be refreshed, you walk this list and only overwrite the parts that are necessary. Note that you can even attempt to combine rectangles that overlap.
I also remember Combuster posting a link to the Graphics Programming Black Book somewhere (which is fully online); it might be worth taking a look at some of the information in there.
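
To illustrate the 'dirty rectangles' bullet above, here is a minimal sketch in C. All names (`rect_t`, `mark_dirty`, `flush_dirty`, `MAX_DIRTY`) are made up for the example, it assumes 32 bpp and rectangles that are already clipped to the screen, and it merges at most one overlapping pair per insertion, so treat it as a starting point rather than a finished implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W  1024
#define SCREEN_H  768
#define MAX_DIRTY 32            /* hypothetical list size */

typedef struct { int x, y, w, h; } rect_t;

static rect_t dirty[MAX_DIRTY];
static int dirty_count = 0;

/* Non-zero if the two rectangles share at least one pixel. */
static int rects_overlap(const rect_t *a, const rect_t *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

/* Grow *a to the bounding box of *a and *b. */
static void rect_union(rect_t *a, const rect_t *b)
{
    int x1 = a->x < b->x ? a->x : b->x;
    int y1 = a->y < b->y ? a->y : b->y;
    int x2 = a->x + a->w > b->x + b->w ? a->x + a->w : b->x + b->w;
    int y2 = a->y + a->h > b->y + b->h ? a->y + a->h : b->y + b->h;
    a->x = x1; a->y = y1; a->w = x2 - x1; a->h = y2 - y1;
}

/* Record a modified area; merge it into the first overlapping entry. */
void mark_dirty(rect_t r)
{
    for (int i = 0; i < dirty_count; i++) {
        if (rects_overlap(&dirty[i], &r)) {
            rect_union(&dirty[i], &r);
            return;
        }
    }
    if (dirty_count < MAX_DIRTY) {
        dirty[dirty_count++] = r;
    } else {
        /* List full: fall back to a single full-screen rectangle. */
        dirty[0] = (rect_t){0, 0, SCREEN_W, SCREEN_H};
        dirty_count = 1;
    }
}

/* Copy only the dirty areas from the back buffer to the frame buffer,
   then empty the list. pitch_px is the line length in pixels.
   Returns the number of pixels copied. */
size_t flush_dirty(uint32_t *front, const uint32_t *back, size_t pitch_px)
{
    size_t copied = 0;
    for (int i = 0; i < dirty_count; i++) {
        const rect_t *r = &dirty[i];
        for (int y = r->y; y < r->y + r->h; y++) {
            size_t off = (size_t)y * pitch_px + (size_t)r->x;
            memcpy(front + off, back + off, (size_t)r->w * sizeof(uint32_t));
            copied += (size_t)r->w;
        }
    }
    dirty_count = 0;
    return copied;
}
```

With this, moving the cursor would mean one `mark_dirty()` for the old position and one for the new, so a refresh copies a few hundred pixels instead of the whole 3 MB.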

Also, you said you were using movs* assembly instructions to write to video memory. Although this is good (the instructions are fast), you must beware that lines are not always contiguous in memory. As Brendan pointed out a while ago, sometimes there is 'padding' between lines in graphics memory, which some cards even use for their own features, so it should not always be overwritten. You also may want to take a look at this page.
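
A small sketch of what the padding caveat means for a copy routine: copy one line at a time and advance by the pitch (bytes per scan line) rather than by the visible width. The function name and parameters are hypothetical; with VBE the pitch would normally come from the BytesPerScanLine field of the mode info block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `rows` lines of `row_bytes` visible bytes each.
   dst_pitch and src_pitch are the distances in bytes between the
   starts of consecutive lines; any padding after row_bytes is left
   untouched. */
void blit_with_pitch(uint8_t *dst, size_t dst_pitch,
                     const uint8_t *src, size_t src_pitch,
                     size_t row_bytes, size_t rows)
{
    for (size_t y = 0; y < rows; y++)
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
}
```

For a 1024x768x32 mode, `row_bytes` would be 4096, while `dst_pitch` may be larger than that on some cards. A single flat copy of the whole buffer is only correct when the pitch equals the visible line length.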

Re: Sluggish SVGA drivers

Posted: Fri Dec 17, 2010 4:15 pm
by Owen
On real processors (and emulators which use VT/SVM), the rep movs instructions are very slow. Use an unrolled SSE loop for buffer copies.
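
For illustration, one possible shape of such an unrolled loop, written with SSE2 intrinsics instead of hand-written assembly. This is only a sketch under assumptions: the name `copy_sse2` is made up, both pointers must be 16-byte aligned, the size must be a multiple of 64 bytes, and in a kernel SSE has to be enabled first (CR0.EM cleared, CR4.OSFXSR set). When not compiled for SSE2 it falls back to plain memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy `bytes` bytes from src to dst, 64 bytes per loop iteration.
   Assumes 16-byte aligned pointers and bytes % 64 == 0. */
void copy_sse2(void *dst, const void *src, size_t bytes)
{
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < bytes / 64; i++) {
        /* Four aligned 16-byte loads, then four aligned stores. */
        __m128i a = _mm_load_si128(s + 0);
        __m128i b = _mm_load_si128(s + 1);
        __m128i c = _mm_load_si128(s + 2);
        __m128i e = _mm_load_si128(s + 3);
        _mm_store_si128(d + 0, a);
        _mm_store_si128(d + 1, b);
        _mm_store_si128(d + 2, c);
        _mm_store_si128(d + 3, e);
        s += 4;
        d += 4;
    }
#else
    memcpy(dst, src, bytes);    /* portable fallback */
#endif
}
```

A full 1024x768x32 frame is 3145728 bytes, which is a multiple of 64, so it fits the size restriction. Whether this beats rep movsd depends on the CPU or emulator, so it is worth measuring both.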

Re: Sluggish SVGA drivers

Posted: Sun Dec 19, 2010 2:02 pm
by ecco
Well, after 30 hours of straight coding and research I thoroughly despise VGA/SVGA hardware. I tried switching to the Cirrus driver and never got the mode settings right... but got close enough to read the performance info, and there is no improvement over Bochs VBE.

My version of bochs boots with the PAT as follows:

Code: Select all

PA0: WB  PA4: WB
PA1: WT  PA5: WT
PA2: UC- PA6: UC-
PA3: UC  PA7: UC
The Variable MTRR contains only one entry:

Code: Select all

PHYS0: 0xC0000000
MASK0: 0xC0000800
I've tried changing the PA7 entry to WC and setting the PAT, PCD and PWT bits of the page used to reference my frame buffer. It also might be worth noting the CPU has CD and NW set in CR0 (I've tested with and without these bits set).
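
For reference, a sketch of the arithmetic behind that change. The PAT is a 64-bit MSR (IA32_PAT, 0x277) with one byte per entry, and a 4 KiB page selects an entry through its PAT (bit 7), PCD (bit 4) and PWT (bit 3) bits. The helper names below are made up for the example; actually writing the MSR needs wrmsr in ring 0 and is not shown.

```c
#include <stdint.h>

/* Memory type encodings used in PAT entries. */
enum pat_type {
    PAT_UC = 0, PAT_WC = 1, PAT_WT = 4,
    PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7
};

/* Power-on default: WB, WT, UC-, UC, repeated for PA4..PA7. */
#define PAT_POWER_ON_DEFAULT 0x0007040600070406ULL

/* Page table entry bits for 4 KiB pages. */
#define PTE_PWT (1u << 3)
#define PTE_PCD (1u << 4)
#define PTE_PAT (1u << 7)

/* Return `pat` with entry `index` (0..7) replaced by `type`. */
uint64_t pat_set_entry(uint64_t pat, unsigned index, enum pat_type type)
{
    unsigned shift = index * 8;
    pat &= ~(0xFFULL << shift);
    pat |= (uint64_t)type << shift;
    return pat;
}

/* PTE flag bits that select PAT entry `index`:
   index = PAT << 2 | PCD << 1 | PWT. */
uint32_t pte_flags_for_pat_index(unsigned index)
{
    uint32_t flags = 0;
    if (index & 1) flags |= PTE_PWT;
    if (index & 2) flags |= PTE_PCD;
    if (index & 4) flags |= PTE_PAT;
    return flags;
}
```

Starting from the default table listed above, `pat_set_entry(PAT_POWER_ON_DEFAULT, 7, PAT_WC)` gives 0x0107040600070406, and `pte_flags_for_pat_index(7)` gives 0x98, i.e. all three bits set as described.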

Currently Bochs takes around 78ms to completely clear the back buffer and draw the cursor
VMWare, on the other hand, takes < 1ms and gets around 100 fps.

Also VMWare boots with the PAT table completely clear (All entries Uncacheable (UC)).

Re: Sluggish SVGA drivers

Posted: Sun Dec 19, 2010 2:28 pm
by ecco
Also thank you both for the input... I think I'm going to shelve the caching mode idea for now (I suspect virtual machines ignore them anyway) and move on to testing SSE.

Re: Sluggish SVGA drivers

Posted: Sun Dec 19, 2010 6:45 pm
by Brendan
Hi,
ecco wrote:Also thank you both for the input... I think I'm going to shelve the caching mode idea for now (I suspect virtual machines ignore them anyway) and move on to testing SSE.
The main problem is that on emulators (and on real hardware), regardless of how it's done, writes to display memory are very slow compared to something like RAM (and reads from display memory are worse). Reducing the number of actual writes by using SSE and/or write-combining helps a little and might get you about 20% better performance. Reducing the number of actual writes by implementing code that isn't insanely stupid (e.g. writing 3 MiB of data when only a few KiBs actually changed since last time) can help a lot more and might get you about 500% better performance. It's like worrying about a splinter in your finger while both your legs are being ripped off by a lion.

You need to keep track of which areas of the buffer have been modified since last time, and avoid re-writing parts in display memory that didn't change. There's lots of ways of doing this - "dirty rectangles" is one way, splitting the screen up into smaller pieces and having a flag for each piece is another way.
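
As an illustration of the "flag for each piece" variant, here is a minimal sketch that splits a 1024x768 screen into 64x64 pixel tiles. The tile size and all names are assumptions chosen for the example, and it assumes 32 bpp.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W 1024
#define SCREEN_H 768
#define TILE     64                     /* hypothetical tile size */
#define TILES_X  (SCREEN_W / TILE)      /* 16 */
#define TILES_Y  (SCREEN_H / TILE)      /* 12 */

static uint8_t tile_dirty[TILES_Y][TILES_X];

/* Called by drawing code whenever it changes pixel (x, y)
   in the back buffer. */
void mark_pixel_dirty(int x, int y)
{
    tile_dirty[y / TILE][x / TILE] = 1;
}

/* Copy every flagged tile to the frame buffer and clear its flag.
   pitch_px is the line length in pixels.
   Returns the number of tiles copied. */
int flush_tiles(uint32_t *front, const uint32_t *back, size_t pitch_px)
{
    int copied = 0;
    for (int ty = 0; ty < TILES_Y; ty++) {
        for (int tx = 0; tx < TILES_X; tx++) {
            if (!tile_dirty[ty][tx])
                continue;
            for (int row = 0; row < TILE; row++) {
                size_t off = (size_t)(ty * TILE + row) * pitch_px
                           + (size_t)tx * TILE;
                memcpy(front + off, back + off, TILE * sizeof(uint32_t));
            }
            tile_dirty[ty][tx] = 0;
            copied++;
        }
    }
    return copied;
}
```

Compared with a rectangle list there is nothing to merge and the bookkeeping cost is fixed, at the price of copying a whole tile even when only a few of its pixels changed.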

When you are doing something to keep track of which areas of the buffer have been modified, you can still improve performance further by maintaining a copy of display memory in RAM. Before every write to display memory, you check the copy of display memory in RAM to see if it can be skipped (and update both display memory and the copy in RAM if the write can't be avoided). This means (for e.g.) if someone replaces white pixels with white pixels the "new" white pixels aren't sent to display memory even though the area was modified.
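
A sketch of that check for one horizontal span of pixels, assuming 32 bpp. `vram` stands for display memory, `shadow` for the copy of display memory kept in RAM, and `src` for the newly drawn data; these names and the function are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `count` pixels from src to display memory, skipping pixels
   whose value in the RAM shadow copy is already the same.
   Display memory is never read. Returns the number of pixels that
   were actually written to display memory. */
size_t write_span_if_changed(volatile uint32_t *vram, uint32_t *shadow,
                             const uint32_t *src, size_t count)
{
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        if (shadow[i] != src[i]) {
            shadow[i] = src[i];     /* keep the shadow in sync */
            vram[i] = src[i];       /* the only slow access */
            written++;
        }
    }
    return written;
}
```

Redrawing white over white therefore costs only RAM reads and comparisons, and the return value makes it easy to measure how many display memory writes were saved.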


Cheers,

Brendan

Re: Sluggish SVGA drivers

Posted: Sun Dec 19, 2010 9:37 pm
by ecco
Thanks, that seems to be the consensus of all my searching... The SSE instructions seemed to make the biggest difference (Bochs went from the 78ms per frame down to around 43ms). I think it slowed down VMWare.... can't remember it was a blur of VGA register programming and browsing the Intel docs... Writing just the few bytes for the cursor does wonders ;)....

On a plus note, my fault handler now prints the failed func via an ELF symbol table lookup that works much like The Price Is Right (find the closest address that doesn't "overbid").
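
The "closest without overbidding" lookup could look roughly like the following. The `symbol_t` table here is a simplified stand-in (an address plus a name), not the real ELF symbol table layout, and the names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified symbol entry; a real ELF symbol stores an index into
   the string table instead of a name pointer. */
typedef struct {
    uintptr_t   addr;
    const char *name;
} symbol_t;

/* Return the symbol with the highest address that is still
   <= fault_addr, or NULL if every symbol lies above it. */
const symbol_t *lookup_symbol(const symbol_t *syms, size_t n,
                              uintptr_t fault_addr)
{
    const symbol_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (syms[i].addr <= fault_addr &&
            (best == NULL || syms[i].addr > best->addr))
            best = &syms[i];
    }
    return best;
}
```

The table does not need to be sorted for this linear scan; with a sorted table a binary search would do the same job faster.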

Re: Sluggish SVGA drivers

Posted: Thu Feb 03, 2011 12:19 pm
by arabasso
Blitting the entire screen is extremely slow, even using MMX / SSE. Double Buffer without hardware acceleration has "horrible performance."

If you're on a console, you can use Page Flipping (scrolling), which performs much better than Double Buffer. If you're already designing a GUI, the DRS (Dirty Rectangle System) method is great.

When implementing the GUI on my OS, I tested all three types mentioned above. With both Double / Triple Buffering and Page Flipping it was almost impossible to drag a window. DRS performed very well, even with the VESA driver without any hardware acceleration. (I created 3 blit routines: a basic one, and others with MMX and SSE.)

An alternative to all this is to create a driver that uses Bus Mastering (BitBlT).

About DRS, more information here:

http://www.allegro.cc/forums/thread/233953
http://www.allegro.cc/forums/thread/408604

Re: Sluggish SVGA drivers

Posted: Fri Feb 04, 2011 8:12 am
by Chandra
Hi,

It is possible to optimize some of the graphics routines to gain speed.

Imagine this:

Code: Select all

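void clear_screen(char red,char green,char blue)
{
    int x_cord;
    int y_cord;

    /* Slow version: every pixel goes through putpixel(),
       which recalculates the frame buffer offset each time. */
    for(y_cord=0;y_cord<768;y_cord++)          /* rows 0..767 */
    {
        for(x_cord=0;x_cord<1024;x_cord++)     /* columns 0..1023 */
        {
            putpixel(x_cord,y_cord,red,green,blue);
        }
    }
}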
Now imagine this,

Code: Select all

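void clear_screen(char red,char green,char blue)
{
    /* Assumes 1024*768 at 32 bpp, with frame_buffer pointing to
       32-bit pixels. The casts to unsigned char avoid sign
       extension when char is signed. */
    unsigned long color=((unsigned long)(unsigned char)red<<16)
                       +((unsigned long)(unsigned char)green<<8)
                       +((unsigned long)(unsigned char)blue);
    unsigned long offset;

    for(offset=0;offset<786432;offset++)       /* 1024*768 pixels */
    {
        frame_buffer[offset]=color;
    }
}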
I use similar routines while drawing horizontal lines. The optimized code is more than 3 times faster because in the former, each putpixel call takes time to calculate the offset of the pixel to plot. Similar optimizations can be made to the other graphics routines as well.
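
As a sketch of the same idea applied to horizontal lines (an illustration, not the actual routine described above): compute the offset of the first pixel once and then step through the row. It assumes 32 bpp, x0 <= x1, and coordinates that are already clipped; `pitch_px` is the line length in pixels.

```c
#include <stddef.h>
#include <stdint.h>

/* Draw a horizontal line from (x0, y) to (x1, y) inclusive.
   The offset is calculated once instead of once per pixel. */
void hline(uint32_t *fb, size_t pitch_px, int x0, int x1, int y,
           uint32_t color)
{
    uint32_t *p = fb + (size_t)y * pitch_px + (size_t)x0;
    for (int x = x0; x <= x1; x++)
        *p++ = color;
}
```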

Good Luck,
Chandra