My relatively simple SVGA driver (for my microkernel system) has become a bottleneck for graphics operations, and I want to try to optimize it. Here's the situation. The SVGA driver has a shared memory region containing a 32-bit RGB(A) buffer (the A is ignored) of all pixels, which the process writes to directly. The process sends a message telling the SVGA driver to flip a specified rectangle, which the driver then does. My measurements show that the latency of flipping a rectangle is nearly proportional to the area of the rectangle, so the message passing is not a bottleneck. I'm therefore focusing on optimizing things within the driver.
Internally, the SVGA driver uses a generic putpixel function that converts a pixel value from the shared memory region to a pixel value appropriate for the given mode (only direct color is supported at the moment) and then writes it to the buffer (using either linear or paged addressing). This function is called by the rectangle-flipping function in a loop. The relevant code is here: https://github.com/nickbjohnson4224/rho ... vga/svga.c, in the svga_putbyte, svga_putpixel, and svga_fliprect functions.
What sort of internal optimizations might benefit this setup?
SVGA driver optimizations
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
- thepowersgang
- Member
- Posts: 734
- Joined: Tue Dec 25, 2007 6:03 am
- Libera.chat IRC: thePowersGang
- Location: Perth, Western Australia
- Contact:
Re: SVGA driver optimizations
Well, unrolling putpixel in fliprect will speed things up a bit (it removes the need to recalculate the index for each pixel).
Designing your loop to reduce bank switches could be an idea too (but from a look, it seems that would not be a problem)
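As a rough illustration of folding putpixel into the rectangle loop, here is a minimal sketch. It assumes a linear (not paged) 16bpp RGB565 mode and source pixels laid out as 0x00RRGGBB; `struct fb_mode`, `rgb888_to_565` and `fliprect_565` are hypothetical names, not taken from the actual svga.c.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mode description; the field names are illustrative only. */
struct fb_mode {
    uint8_t *vram;   /* base of the linear framebuffer */
    size_t   pitch;  /* bytes per scanline in video memory */
    int      width;  /* width of the shared buffer in pixels */
    int      height; /* height of the shared buffer in pixels */
};

/* Convert one pixel, assumed to be 0x00RRGGBB, to RGB565. */
static inline uint16_t rgb888_to_565(uint32_t p)
{
    return (uint16_t)(((p >> 8) & 0xF800) |
                      ((p >> 5) & 0x07E0) |
                      ((p >> 3) & 0x001F));
}

/* Copy the rectangle (x, y, w, h) from the shared buffer to video memory.
 * The caller is assumed to have clipped the rectangle to the screen. */
static void fliprect_565(const struct fb_mode *m, const uint32_t *src,
                         int x, int y, int w, int h)
{
    for (int row = y; row < y + h; row++) {
        /* Index arithmetic happens once per row, not once per pixel. */
        const uint32_t *s = src + (size_t)row * m->width + x;
        uint16_t *d = (uint16_t *)(m->vram + (size_t)row * m->pitch) + x;

        for (int i = 0; i < w; i++)
            d[i] = rgb888_to_565(s[i]);
    }
}
```

With paged addressing the inner loop would also need to split rows at bank boundaries, which this sketch leaves out.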
Kernel Development, It's the brain surgery of programming.
Acess2 OS (c) | Tifflin OS (rust) | mrustc - Rust compiler
Currently Working on: mrustc
Re: SVGA driver optimizations
Have you looked at what assembly the compiler is emitting at -O3? That will give you an idea of where it can't optimise well.
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: SVGA driver optimizations
I was going to do that next; I just wanted to know if there were any architectural/algorithm optimizations that could be made. For example, would it be beneficial to track which pixels have changed since the last flip, so that the minimum number of writes to video memory are made, or would that cause too much overhead?
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: SVGA driver optimizations
You potentially have one copy too many: if the actual video mode is linear RGBA, you can instead try to share VRAM itself. You can even share a framebuffer (either VRAM or in shared memory) that is not in R8G8B8A8 format, if the application can deal with it. That saves the driver from needing the intelligence to convert colours at all: static graphics can be preconverted to the correct format at load time, after which you will need little more than a blitter per bpp rather than per format.
Having the user provide dirty rectangles instead of just flipping the entire screen can save you quite a bit in GUI scenarios but will be of little help with visually rich scenarios where most of the screen gets updated anyway and where the first suggestion can prove much more effective. Note that actually accessing each pixel for comparison will cost you an amount that's likely on par with just writing it anyway.
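To make the preconversion idea concrete, here is a small sketch under stated assumptions: the native mode is taken to be RGB565 and source images are 0x00RRGGBB. `preconvert_565` and `blit_rows` are hypothetical helpers for illustration, not part of any existing driver.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Convert an image from 0x00RRGGBB (assumed layout) to RGB565 once,
 * at load time, so that no per-pixel conversion is needed when drawing. */
static void preconvert_565(uint16_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t p = src[i];
        dst[i] = (uint16_t)(((p >> 8) & 0xF800) |
                            ((p >> 5) & 0x07E0) |
                            ((p >> 3) & 0x001F));
    }
}

/* Blitter that depends only on bytes per pixel, not on the pixel format:
 * copies h rows of w pixels between two buffers with their own pitches. */
static void blit_rows(uint8_t *dst, size_t dst_pitch,
                      const uint8_t *src, size_t src_pitch,
                      size_t bytes_per_pixel, size_t w, size_t h)
{
    for (size_t row = 0; row < h; row++)
        memcpy(dst + row * dst_pitch, src + row * src_pitch,
               w * bytes_per_pixel);
}
```

Once the image is in the native format, drawing it is a plain row copy, so the same `blit_rows` would serve any 16bpp layout.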
Re: SVGA driver optimizations
+1 to berkus. Using a rectangle eliminates many expensive calls to putpixel, and avoids recalculating the index too. This is what X11 does, for example, and you can't say it's slow.
Modifying the MTRRs would also be a good idea (e.g. marking the framebuffer as write-combining).
Alternatively you can use two triangles if you have an OpenGL driver.
Re: SVGA driver optimizations
Hi,
Some general notes...
a) Allowing applications to access a pixel buffer directly fails as soon as the application is spread across multiple video drivers/monitors, and makes it virtually impossible for any video driver to (eventually) support any hardware acceleration effectively. It also makes it impossible to do things like remote desktop (where a "pseudo driver" pretends it's a video driver while sending a description of what to draw to a remote computer) efficiently - for example, to draw a single large white rectangle you'd be looking at several MiB of network traffic rather than about 20 bytes. Finally, you can't make things easy for applications by abstracting away low-level details (including things like resolution independence and pixel format independence).
b) Anything that uses a "putpixel()" routine needs to be optimised until it doesn't. You should also consider using function pointers. For an example, you might have a function pointer that points to a "draw rectangle" function, and 5 different "draw rectangle" functions (one for each supported pixel format); where the code that changes video modes also changes the function pointers so they point to the right functions for the pixel format.
c) Dirty rectangles are good until you get too many dirty rectangles and have to spend all your CPU time trying to detect and handle overlapping rectangles. There are simpler methods that are O(1), like having a "needs to be updated" flag for each group of pixels, which are slower for a smaller number of changes (where performance doesn't matter so much) but faster for a larger number of changes (where performance matters a lot more). With a good video driver interface, it shouldn't matter how the video driver handles this internally (e.g. a video driver could support 5 different methods of minimising writes to display memory - dirty rectangles, "needs to be updated" flags, a "just blit everything" method, etc), and the applications shouldn't know or care which of these methods the video driver happens to be using at any given time.
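Points b) and c) can be sketched together roughly as follows. This is only an illustration under assumptions: a fixed 64x32 screen, 16x16 pixel tiles as the "group of pixels", source pixels as 0x00RRGGBB, and two supported modes. Every name here (`struct driver`, `set_mode`, `mark_dirty`, `flush`, and so on) is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>

#define SCREEN_W 64
#define SCREEN_H 32
#define TILE     16                    /* tile edge in pixels */
#define TILES_X  (SCREEN_W / TILE)
#define TILES_Y  (SCREEN_H / TILE)

/* Converts n source pixels (assumed 0x00RRGGBB) to the native format. */
typedef void (*convert_row_fn)(void *dst, const uint32_t *src, size_t n);

static void row_to_8888(void *dst, const uint32_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof(uint32_t));
}

static void row_to_565(void *dst, const uint32_t *src, size_t n)
{
    uint16_t *d = dst;
    for (size_t i = 0; i < n; i++)
        d[i] = (uint16_t)(((src[i] >> 8) & 0xF800) |
                          ((src[i] >> 5) & 0x07E0) |
                          ((src[i] >> 3) & 0x001F));
}

struct driver {
    convert_row_fn convert;            /* chosen once per mode change */
    size_t   bytes_per_pixel;
    size_t   pitch;                    /* bytes per scanline */
    uint8_t *vram;
    bool     dirty[TILES_Y][TILES_X];  /* "needs to be updated" flags */
};

/* The mode-setting code is the only place that picks a format routine. */
static void set_mode(struct driver *d, uint8_t *vram, int bpp)
{
    d->vram = vram;
    d->bytes_per_pixel = (size_t)bpp / 8;
    d->pitch = SCREEN_W * d->bytes_per_pixel;
    d->convert = (bpp == 16) ? row_to_565 : row_to_8888;
    memset(d->dirty, 0, sizeof d->dirty);
}

/* Constant-time bookkeeping: no overlap detection, just set a flag. */
static void mark_dirty(struct driver *d, int x, int y)
{
    d->dirty[y / TILE][x / TILE] = true;
}

/* Write every dirty tile to video memory; returns the tiles written. */
static int flush(struct driver *d, const uint32_t *src)
{
    int written = 0;
    for (int ty = 0; ty < TILES_Y; ty++) {
        for (int tx = 0; tx < TILES_X; tx++) {
            if (!d->dirty[ty][tx])
                continue;
            size_t x0 = (size_t)tx * TILE;
            for (int row = ty * TILE; row < (ty + 1) * TILE; row++)
                d->convert(d->vram + (size_t)row * d->pitch
                                   + x0 * d->bytes_per_pixel,
                           src + (size_t)row * SCREEN_W + x0, TILE);
            d->dirty[ty][tx] = false;
            written++;
        }
    }
    return written;
}
```

The trade-off matches the one described above: a single changed pixel costs a whole tile of writes, but the bookkeeping never grows with the number of updates.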
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: SVGA driver optimizations
Sounds like you are talking about a blitter, bob, tile, sprite or even a few other things, depending on a few details. A reasonably good page on sprites: http://www.nondot.org/sabre/graphpro/index_sprite.html