Sluggish SVGA drivers
I've got two SVGA drivers (one for Bochs and one for VMWare Workstation) and for now I've created a generic SVGA library that has primitive drawing functions like pset, line, rect, etc.
I'm working in 1024x768x32 so the frame buffer is 3MB. Currently I'm only running one process on my task scheduler that clears the rect under the old cursor location and draws the cursor at its new location and then copies the back buffer to the frame buffer:
memcpy(vga_buffer, back_buffer, 1024*768)
I should probably mention that my memcpy function uses movsb, movsw or movsd depending on the size value. In this case I'm certain it's using movsd.
FYI the VMWare driver is at least fast enough to clear the screen and draw the mouse cursor so that it "feels" like a typical OS. I realize that I won't be redrawing every object every frame, but I should be able to at least update a good portion of the screen each frame without this much performance loss.
Should I be using some sort of DMA for this transfer or possibly MMX instructions?
Re: Sluggish SVGA drivers
Brendan (and others too) have already posted about performance and caveats in graphics drawing. You may be able to find the topics by using the search function. I'll give you a few things off the top of my head.
First things first. You know that you won't be redrawing every part of the screen every time. Redrawing the entire screen constantly is always slow without any type of acceleration, and there aren't many ways to speed it up. The following ideas have come up on the forums a couple of times:
- Set the MTRR for your video memory range to Write-Combining. This can give you an increase in speed (I remember someone mentioning a number of up to 10%, but I'm not sure about that and it probably depends on your implementation and hardware).
- Use a back buffer (which, judging from your post, you're already doing), since writing to memory is faster than writing to video memory (reading is possibly even slower on video memory).
- Avoid drawing where you don't need to: use so-called 'dirty rectangles'. Usually you keep a list of the areas on the screen that require redrawing. When your screen is going to be refreshed, you parse this list and only overwrite the parts that are necessary. Note that you can even attempt to combine rectangles that overlap.
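A minimal sketch of the dirty-rectangle idea, assuming a 32 bpp mode (4 bytes per pixel) and a back buffer laid out with the same pitch as video memory. The names `rect_t`, `mark_dirty`, `flush_dirty` and `MAX_DIRTY` are made up for illustration and don't come from any particular driver:

```c
#include <stdint.h>
#include <string.h>

#define MAX_DIRTY 32                       /* hypothetical list capacity */

typedef struct { int x, y, w, h; } rect_t; /* hypothetical rectangle type */

static rect_t dirty[MAX_DIRTY];
static int dirty_count = 0;

static int rects_overlap(const rect_t *a, const rect_t *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

/* Smallest rectangle that contains both a and b. */
static rect_t rect_union(const rect_t *a, const rect_t *b)
{
    rect_t r;
    int x2 = (a->x + a->w > b->x + b->w) ? a->x + a->w : b->x + b->w;
    int y2 = (a->y + a->h > b->y + b->h) ? a->y + a->h : b->y + b->h;
    r.x = (a->x < b->x) ? a->x : b->x;
    r.y = (a->y < b->y) ? a->y : b->y;
    r.w = x2 - r.x;
    r.h = y2 - r.y;
    return r;
}

/* Record an area that needs redrawing; merge it into the first
   overlapping entry instead of growing the list. */
void mark_dirty(rect_t r)
{
    for (int i = 0; i < dirty_count; i++) {
        if (rects_overlap(&dirty[i], &r)) {
            dirty[i] = rect_union(&dirty[i], &r);
            return;
        }
    }
    if (dirty_count < MAX_DIRTY)
        dirty[dirty_count++] = r;
    else
        dirty[0] = rect_union(&dirty[0], &r); /* list full: fold into entry 0 */
}

/* Copy only the dirty areas from the back buffer to video memory.
   pitch is the number of bytes per scan line (assumed equal for both). */
void flush_dirty(uint8_t *vram, const uint8_t *back, size_t pitch)
{
    for (int i = 0; i < dirty_count; i++) {
        const rect_t *r = &dirty[i];
        for (int row = 0; row < r->h; row++) {
            size_t off = (size_t)(r->y + row) * pitch + (size_t)r->x * 4;
            memcpy(vram + off, back + off, (size_t)r->w * 4);
        }
    }
    dirty_count = 0;
}
```

For a mouse cursor this would mean one `mark_dirty` call for the old cursor rectangle and one for the new, followed by a single `flush_dirty` per frame. The merge step here is deliberately simple; a merged rectangle may still overlap other entries, which only costs some redundant copying.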
Also, you said you were using movs* assembly instructions to write to video memory. Although this is good (the instructions are fast), you must beware that lines are not always contiguous in memory. As Brendan pointed out a while ago, sometimes there is 'padding' between lines in graphics memory, which some cards even use for their own features, so it should not always be overwritten. You also may want to take a look at this page.
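To illustrate the padding caveat, here is a rough sketch of a row-by-row copy that never touches the bytes between scan lines. It assumes the back buffer is tightly packed and that the driver knows the pitch (for VBE modes this would be the BytesPerScanLine field of the mode info block); `blit_with_pitch` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Copy a tightly packed back buffer into a frame buffer whose rows
   start `pitch` bytes apart (pitch >= width * bytes_per_pixel).
   Bytes between the end of a row and the next row are left alone. */
void blit_with_pitch(uint8_t *vram, size_t pitch,
                     const uint8_t *back, size_t width, size_t height,
                     size_t bytes_per_pixel)
{
    size_t row_bytes = width * bytes_per_pixel;
    for (size_t y = 0; y < height; y++)
        memcpy(vram + y * pitch, back + y * row_bytes, row_bytes);
}
```

When the pitch happens to equal width * bytes_per_pixel, this degenerates to the single large copy from the original post.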
When the chance of succeeding is 99%, there is still a 50% chance of that success happening.
- Owen
Re: Sluggish SVGA drivers
On real processors (and emulators which use VT/SVM), the rep movs instructions are very slow. Use an unrolled SSE loop for buffer copies.
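A rough sketch of what such an unrolled loop could look like with SSE2 intrinsics. The function name `copy_sse2` is made up, the destination is assumed to be 16-byte aligned for the fast path, and in a kernel SSE has to be enabled first (CR0.EM cleared, CR4.OSFXSR set) or these instructions will fault:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy `bytes` bytes, 64 bytes per iteration when the destination is
   16-byte aligned. Anything left over (or everything, if SSE2 is not
   available at compile time) goes through plain memcpy. */
void copy_sse2(void *dst, const void *src, size_t bytes)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
#if defined(__SSE2__)
    if (((uintptr_t)d & 15) == 0) {
        while (bytes >= 64) {
            /* Unaligned loads are allowed on the source side. */
            __m128i a = _mm_loadu_si128((const __m128i *)(s + 0));
            __m128i b = _mm_loadu_si128((const __m128i *)(s + 16));
            __m128i c = _mm_loadu_si128((const __m128i *)(s + 32));
            __m128i e = _mm_loadu_si128((const __m128i *)(s + 48));
            /* Aligned 16-byte stores to the destination. */
            _mm_store_si128((__m128i *)(d + 0), a);
            _mm_store_si128((__m128i *)(d + 16), b);
            _mm_store_si128((__m128i *)(d + 32), c);
            _mm_store_si128((__m128i *)(d + 48), e);
            s += 64;
            d += 64;
            bytes -= 64;
        }
    }
#endif
    memcpy(d, s, bytes); /* tail, or the whole copy on the fallback path */
}
```

For the 1024x768x32 case in this thread the call would be `copy_sse2(vga_buffer, back_buffer, 1024 * 768 * 4)`, assuming both pointers refer to the full 3MB buffers.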
Re: Sluggish SVGA drivers
Well, after 30 hours of straight coding and research I thoroughly despise VGA/SVGA hardware. I tried switching to the Cirrus driver and never got the mode settings right... but got close enough to read the performance info, and there is no improvement over Bochs VBE.
My version of Bochs boots with the PAT as follows:
Code: Select all
PA0: WB    PA4: WB
PA1: WT    PA5: WT
PA2: UC-   PA6: UC-
PA3: UC    PA7: UC
The variable MTRR contains only one entry:
Code: Select all
PHYS0: 0xC0000000
MASK0: 0xC0000800
I've tried changing the PA7 entry to WC and setting the PAT, PCD and PWT bits of the page used to reference my frame buffer. It also might be worth noting that the CPU has CD and NW set in CR0 (I've tested with and without these bits set).
Currently Bochs takes around 78ms to completely clear the back buffer and draw the cursor.
VMWare on the other hand takes < 1ms and gets around 100 fps.
Also, VMWare boots with the PAT completely clear (all entries Uncacheable (UC)).
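For reference, a small sketch of how the PA7 change described above maps onto the IA32_PAT MSR value, using the memory-type encodings from the Intel manuals (UC = 0x00, WC = 0x01, WT = 0x04, WP = 0x05, WB = 0x06, UC- = 0x07). The helper names are hypothetical, and the actual rdmsr/wrmsr of MSR 0x277 is left out because it needs ring 0:

```c
#include <stdint.h>

#define IA32_PAT_MSR 0x277u /* MSR number of the page attribute table */

enum pat_type {
    PAT_UC = 0x00, PAT_WC = 0x01, PAT_WT = 0x04,
    PAT_WP = 0x05, PAT_WB = 0x06, PAT_UC_MINUS = 0x07
};

/* Return `pat` with entry `index` (0..7) replaced by `type`.
   Each PAT entry occupies one byte of the 64-bit MSR value. */
uint64_t pat_set_entry(uint64_t pat, unsigned index, enum pat_type type)
{
    unsigned shift = index * 8;
    pat &= ~(0xFFull << shift);
    pat |= (uint64_t)type << shift;
    return pat;
}

/* PAT entry selected by a 4 KiB page table entry:
   index = PAT (bit 7) * 4 + PCD (bit 4) * 2 + PWT (bit 3). */
unsigned pat_index_from_pte(uint64_t pte)
{
    return (unsigned)((((pte >> 7) & 1) << 2) |
                      (((pte >> 4) & 1) << 1) |
                      ((pte >> 3) & 1));
}
```

The Bochs listing above corresponds to the value 0x0007040600070406, and `pat_set_entry(0x0007040600070406, 7, PAT_WC)` gives 0x0107040600070406. A page table entry with PAT, PCD and PWT all set selects entry 7, which matches the combination described in the post.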
Re: Sluggish SVGA drivers
Also thank you both for the input... I think I'm going to shelve the caching mode idea for now (I suspect virtual machines ignore them anyways) and move on to testing SSE.
Re: Sluggish SVGA drivers
Hi,
ecco wrote: Also thank you both for the input... I think I'm going to shelve the caching mode idea for now (I suspect virtual machines ignore them anyways) and move on to testing SSE.
The main problem is that on emulators (and on real hardware), regardless of how it's done, writes to display memory are very slow compared to something like RAM (and reads from display memory are worse). Reducing the number of actual writes by using SSE and/or write-combining helps a little and might get you about 20% better performance. Reducing the number of actual writes by implementing code that isn't insanely stupid (e.g. writing 3 MiB of data when only a few KiBs actually changed since last time) can help a lot more and might get you about 500% better performance. It's like worrying about a splinter in your finger while both your legs are being ripped off by a lion.
You need to keep track of which areas of the buffer have been modified since last time, and avoid re-writing parts in display memory that didn't change. There are lots of ways of doing this - "dirty rectangles" is one way, splitting the screen up into smaller pieces and having a flag for each piece is another way.
When you are doing something to keep track of which areas of the buffer have been modified, you can still improve performance further by maintaining a copy of display memory in RAM. Before every write to display memory, you check the copy of display memory in RAM to see if it can be skipped (and update both display memory and the copy in RAM if the write can't be avoided). This means (for example) that if someone replaces white pixels with white pixels, the "new" white pixels aren't sent to display memory even though the area was modified.
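A rough sketch of how the two ideas could be combined: one flag per tile, plus a shadow copy in RAM that is compared before anything is written to display memory. The tile size, the function name `flush_tiles` and the assumption that width and height are multiples of the tile size are all made up for illustration; pixels are assumed to be 32 bits:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TILE 16 /* hypothetical tile size in pixels */

/* Flush flagged tiles from `back` to `vram`, skipping rows whose pixels
   already match `shadow` (a RAM copy of what display memory holds).
   back and shadow are tightly packed; vram rows are pitch_px pixels apart.
   Returns the number of tile rows actually written to vram. */
size_t flush_tiles(uint32_t *vram, uint32_t *shadow, const uint32_t *back,
                   bool *tile_dirty, size_t width, size_t height,
                   size_t pitch_px)
{
    size_t tiles_x = width / TILE, tiles_y = height / TILE, written = 0;

    for (size_t ty = 0; ty < tiles_y; ty++) {
        for (size_t tx = 0; tx < tiles_x; tx++) {
            bool *flag = &tile_dirty[ty * tiles_x + tx];
            if (!*flag)
                continue;          /* nothing drew into this tile */
            *flag = false;
            for (size_t row = 0; row < TILE; row++) {
                size_t y = ty * TILE + row, x = tx * TILE;
                const uint32_t *src = back + y * width + x;
                uint32_t *copy = shadow + y * width + x;
                if (memcmp(src, copy, TILE * sizeof *src) == 0)
                    continue;      /* same pixels as on screen: skip */
                memcpy(copy, src, TILE * sizeof *src);
                memcpy(vram + y * pitch_px + x, src, TILE * sizeof *src);
                written++;
            }
        }
    }
    return written;
}
```

The comparison is done against the shadow copy rather than display memory itself because reads from display memory are even slower than writes.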
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Sluggish SVGA drivers
Thanks, that seems to be the consensus of all my searching... The SSE instructions seemed to make the biggest difference (Bochs went from 78ms per frame down to around 43ms). I think it slowed down VMWare... I can't remember, it was a blur of VGA register programming and browsing the Intel docs... Writing just the few bytes for the cursor does wonders...
On a plus note, my fault handler now prints the failed function via an ELF symbol table lookup that works much like The Price Is Right (find the closest address that doesn't "overbid").
Re: Sluggish SVGA drivers
Blitting the entire screen is extremely slow, even using MMX / SSE. Double buffering without hardware acceleration has "horrible performance."
If you're on a console, you can use page flipping (scrolling), which performs much better than double buffering. If you're already designing a GUI, the DRS (Dirty Rectangle System) method is great.
When implementing the GUI on my OS, I tested all three approaches mentioned above. With both double / triple buffering and page flipping it was almost impossible to drag a window. DRS gave great performance, even with the VESA driver and no hardware acceleration. I wrote 3 blit routines: a basic one and others using MMX and SSE.
An alternative to all this is to create a driver that uses Bus Mastering (BitBlT).
About DRS, more information here:
http://www.allegro.cc/forums/thread/233953
http://www.allegro.cc/forums/thread/408604
Re: Sluggish SVGA drivers
Hi,
It is possible to optimize some of the graphics routines to gain speed.
Imagine this:
Code: Select all
void clear_screen(unsigned char red, unsigned char green, unsigned char blue)
{
    int x_cord;
    int y_cord;
    /* Valid coordinates are 0..1023 and 0..767, so use < rather than <= */
    for (y_cord = 0; y_cord < 768; y_cord++)
    {
        for (x_cord = 0; x_cord < 1024; x_cord++)
        {
            putpixel(x_cord, y_cord, red, green, blue);
        }
    }
}
Now imagine this:
Code: Select all
void clear_screen(unsigned char red, unsigned char green, unsigned char blue)
{
    /* Assume 1024*768 with 24-bit color stored in one 32-bit value per pixel,
       and frame_buffer declared as a uint32_t pointer (needs <stdint.h>) */
    uint32_t color = ((uint32_t)red << 16) | ((uint32_t)green << 8) | blue;
    unsigned long offset;
    /* 1024 * 768 = 786432 pixels, so valid indices are 0..786431 */
    for (offset = 0; offset < 786432; offset++)
    {
        frame_buffer[offset] = color;
    }
}
I use similar routines while drawing horizontal lines. The optimized code is more than 3 times faster because in the former, each putpixel call takes time to calculate the offset of the pixel to plot. Similar optimizations can be made to the other graphics routines as well.
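The same idea applied to a horizontal line might look roughly like this. It is only a sketch: the `hline` name and its parameters are made up, pixels are assumed to be 32 bits, and no clipping is done, so the caller has to keep the line inside the screen:

```c
#include <stddef.h>
#include <stdint.h>

/* Draw `length` pixels starting at (x, y), going right.
   pitch_px is the distance between scan lines, in pixels.
   The offset is calculated once instead of once per pixel. */
void hline(uint32_t *frame_buffer, size_t pitch_px,
           int x, int y, int length, uint32_t color)
{
    uint32_t *p = frame_buffer + (size_t)y * pitch_px + (size_t)x;
    for (int i = 0; i < length; i++)
        p[i] = color;
}
```

A filled rectangle can then be built from one such call per row.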
Good Luck,
Chandra
Programming is not about using a language to solve a problem, it's about using logic to find a solution !