GUI Optimization?

Posted: Fri Mar 30, 2007 8:07 pm
by pcmattman
I'm starting my GUI development (again).

Can anyone recommend any method (other than memset) to set the memory that a pointer points to to a certain value? At the moment it's choking on the large amount of blitting required, and this basically brings the whole GUI to a halt.

Any other suggestions for optimization would be helpful too.

Posted: Fri Mar 30, 2007 8:25 pm
by carbonBased
You can't avoid blits, but you can reduce how much you blit. Try using DRAM buffers for some things.

For my first GUI I had a DRAM buffer per window, and only blitted those regions of each window that had been updated. Alternatively, you could have a DRAM buffer for the entire screen, and blit from it.
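
A rough sketch of the per-window approach, assuming an 8 bpp mode and a window that lies fully on screen. The `Window` and `Rect` types and the `blit_dirty` name are made up for illustration; they don't come from any particular GUI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types: neither struct comes from the thread. */
typedef struct {
    int x, y, w, h;          /* dirty rectangle in window coordinates */
} Rect;

typedef struct {
    uint8_t *pixels;         /* window back buffer in system RAM, 8 bpp */
    int width, height;       /* window size in pixels */
    int screen_x, screen_y;  /* window position on screen */
} Window;

/* Copy only the dirty rectangle of `win` into the screen buffer.
 * Assumes the window lies entirely on screen (no clipping). */
void blit_dirty(uint8_t *screen, int screen_pitch,
                const Window *win, Rect dirty)
{
    assert(dirty.x >= 0 && dirty.y >= 0);
    assert(dirty.x + dirty.w <= win->width);
    assert(dirty.y + dirty.h <= win->height);

    for (int row = 0; row < dirty.h; row++) {
        const uint8_t *src = win->pixels
            + (size_t)(dirty.y + row) * win->width + dirty.x;
        uint8_t *dst = screen
            + (size_t)(win->screen_y + dirty.y + row) * screen_pitch
            + win->screen_x + dirty.x;
        memcpy(dst, src, (size_t)dirty.w);  /* one contiguous run per row */
    }
}
```

The point is that the cost scales with the size of the updated region rather than with the size of the screen.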

Some cards may also support a hardware blitting feature, but I believe it only works from VRAM to VRAM, so you're still stuck with the bottleneck of the initial copy into VRAM and the comments above still apply.

--Jeff

Posted: Sat Mar 31, 2007 2:55 am
by Brendan
Hi,
pcmattman wrote:Can anyone recommend any other method (other than memset) to set the memory that a pointer points to to a certain value?
In general (for filling an area to a certain value) you'd use the widest register you can to write to each address in order. If the area isn't suitably aligned or is too small you'd do smaller writes at the beginning and end (where necessary) and still use the widest register you can.
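
As a minimal sketch of that head/middle/tail pattern in plain C, assuming 32 bits is the widest store used (a real version would go wider where the CPU allows). The `fill_bytes` name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `len` bytes at `dst` with `value`, using 32-bit stores for the
 * aligned middle part and byte stores for the unaligned head and tail. */
void fill_bytes(void *dst, uint8_t value, size_t len)
{
    uint8_t *p = dst;

    /* Head: byte writes until p is 4-byte aligned. */
    while (len > 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = value;
        len--;
    }

    /* Middle: replicate the byte into all four lanes of a dword. */
    uint32_t wide = value * 0x01010101u;
    uint32_t *p32 = (uint32_t *)p;
    while (len >= 4) {
        *p32++ = wide;
        len -= 4;
    }

    /* Tail: whatever is left (0 to 3 bytes). */
    p = (uint8_t *)p32;
    while (len > 0) {
        *p++ = value;
        len--;
    }
}
```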

For SSE/SSE2 you might also want to look into "non-temporal stores" (for e.g. the MOVNTQ instruction, or MOVNTDQ with SSE2) so that you don't pollute the cache (and so you do 8 bytes at a time with MOVNTQ, or 16 bytes at a time with MOVNTDQ). You'd probably also want to use some loop unrolling (where possible), so that (for e.g.) to fill 8192 bytes you'd have 32 MOVNTQ instructions inside a loop that is done 32 times (rather than one MOVNTQ instruction that is done 1024 times).
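
Here is a sketch of a non-temporal fill using the SSE2 compiler intrinsics rather than hand-written assembly. It uses the 16-byte store (MOVNTDQ, via `_mm_stream_si128`) unrolled four times; the `fill_pixels_nt` name and the alignment and size restrictions are assumptions made to keep the example short. In a kernel, SSE also has to be enabled before any of this will run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si128, _mm_set1_epi32, _mm_sfence */
#endif

/* Fill `count` 32-bit pixels at `dst` with `color`.
 * Assumptions for this sketch: dst is 16-byte aligned and count is a
 * multiple of 16 pixels (64 bytes), so no head/tail handling is needed. */
void fill_pixels_nt(uint32_t *dst, uint32_t color, size_t count)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert(count % 16 == 0);

#if defined(__SSE2__)
    __m128i v = _mm_set1_epi32((int)color);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < count; i += 16, p += 4) {
        /* Unrolled: four 16-byte non-temporal stores per iteration. */
        _mm_stream_si128(p + 0, v);
        _mm_stream_si128(p + 1, v);
        _mm_stream_si128(p + 2, v);
        _mm_stream_si128(p + 3, v);
    }
    _mm_sfence();   /* order the streamed stores before later stores */
#else
    /* Portable fallback when SSE2 is not available at compile time. */
    for (size_t i = 0; i < count; i++)
        dst[i] = color;
#endif
}
```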

For systems without SSE2 you'd probably want to use "REP STOSD".

Of course if video memory is being filled/cleared you'd want to make sure the video driver does the filling/clearing, so that it can be done in hardware by the 2D accelerator (if/when the video driver and hardware supports it).
pcmattman wrote:At the moment it's choking on the large amounts of blitting required and this basically brings the whole GUI to a halt.
Does the code ever set the same pixel in display memory more than once (excluding hardware accelerated operations)? Does the code ever set a group of pixels in display memory to the same value they were before? If the answer to any of these is "yes", then you either need a double buffer in RAM or some way of keeping track of which pixels actually changed since last time.

Possibly the fastest (software only) full frame bit blit code uses a "line changed" flag for each screen line, a copy of display memory in RAM and a buffer for the new pixel data in RAM. When the buffer is blitted to the screen you'd do something like:

Code:

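    for(y = 0; y < ScreenY; y++) {
        if( lineState[y] == CHANGED) {      /* skip lines nothing has drawn to */
            pos1 = &buffer[y];              /* new pixel data for this line */
            pos2 = &oldState[y];            /* RAM copy of what is on screen */
            pos3 = &screen[y];              /* display memory */
            for(x = 0; x < ScreenX; x += 4) {
                if( *(pos1 + x) != *(pos2 + x) ) {
                    *(pos3 + x) = *(pos1 + x);   /* write only what differs */
                    *(pos2 + x) = *(pos1 + x);   /* keep the RAM copy in sync */
                }
            }
            lineState[y] = NOT_CHANGED;
        }
    }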
This type of algorithm should be easy enough to adapt to SSE2 using the MASKMOVDQU instruction with a mask generated by something like the PCMPEQB instruction (with a PXOR to invert the mask because there doesn't seem to be a PCMPNEQB instruction).
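
A sketch of that inner loop using the corresponding intrinsics (`_mm_cmpeq_epi8` for PCMPEQB, `_mm_xor_si128` for PXOR, `_mm_maskmoveu_si128` for MASKMOVDQU). The function name and the "length is a multiple of 16" restriction are assumptions for the example, and whether this actually beats the plain loop would need measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy only the bytes of `newLine` that differ from `oldLine` to
 * `screenLine`, then bring `oldLine` up to date.
 * Assumes `bytes` is a multiple of 16. */
void blit_line_changed(uint8_t *screenLine, uint8_t *oldLine,
                       const uint8_t *newLine, size_t bytes)
{
    assert(bytes % 16 == 0);

#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi8((char)0xFF);
    for (size_t i = 0; i < bytes; i += 16) {
        __m128i n = _mm_loadu_si128((const __m128i *)(newLine + i));
        __m128i o = _mm_loadu_si128((const __m128i *)(oldLine + i));
        __m128i same = _mm_cmpeq_epi8(n, o);       /* 0xFF where equal */
        __m128i diff = _mm_xor_si128(same, ones);  /* 0xFF where changed */
        /* Store only the bytes whose mask byte has its top bit set. */
        _mm_maskmoveu_si128(n, diff, (char *)(screenLine + i));
        _mm_storeu_si128((__m128i *)(oldLine + i), n);
    }
    _mm_sfence();   /* MASKMOVDQU is a non-temporal store */
#else
    /* Portable fallback with the same effect, one byte at a time. */
    for (size_t i = 0; i < bytes; i++) {
        if (newLine[i] != oldLine[i]) {
            screenLine[i] = newLine[i];
            oldLine[i] = newLine[i];
        }
    }
#endif
}
```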

The main problems here are keeping track of the "line changed" flags, and the amount of RAM you'd be using. For 1024 * 768 * 32 bpp you'd use up about 6 MB of RAM for a pair of video buffers (3 MB each).
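
One way to keep the flags up to date is to have every drawing primitive set them as it writes. A minimal sketch, assuming a fixed 320x200 resolution and a hypothetical `fill_rect` primitive; the `buffer`, `lineState`, `CHANGED` and `NOT_CHANGED` names follow the loop above.

```c
#include <stdint.h>

#define SCREEN_X 320   /* assumed resolution for the sketch */
#define SCREEN_Y 200

enum { NOT_CHANGED = 0, CHANGED = 1 };

static uint32_t buffer[SCREEN_Y][SCREEN_X];  /* new pixel data in RAM */
static uint8_t  lineState[SCREEN_Y];         /* one flag per screen line */

/* Fill a rectangle in the RAM buffer and flag every line it touches.
 * The rectangle is clipped to the screen first. */
void fill_rect(int x, int y, int w, int h, uint32_t color)
{
    int x0 = x < 0 ? 0 : x;
    int y0 = y < 0 ? 0 : y;
    int x1 = x + w > SCREEN_X ? SCREEN_X : x + w;
    int y1 = y + h > SCREEN_Y ? SCREEN_Y : y + h;

    if (x0 >= x1)
        return;   /* nothing visible, so no line is flagged */

    for (int row = y0; row < y1; row++) {
        for (int col = x0; col < x1; col++)
            buffer[row][col] = color;
        lineState[row] = CHANGED;   /* the blit loop will revisit this line */
    }
}
```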

@carbonBased: Most video cards support video -> video bit blits and system -> video bit blits. Newer cards probably also support bus mastering for system -> video bit blits.


Cheers,

Brendan

Posted: Sat Mar 31, 2007 6:30 pm
by pcmattman
The main problem I have is that in 320x200 the double buffer and the video memory itself are each 64 KB... in other words, there are 64K loop iterations in memset to transfer the memory. It clears the double buffer at the start of each draw, then blits it back to the screen after the drawing functions are complete. If you really want to understand what I'm doing, I followed the GUI development tutorial on osdever.net.

I'm trying to avoid VESA modes and anything remotely complex as I need to be able to test in Bochs (and Bochs doesn't have any of those sort of extensions :().

Posted: Sat Mar 31, 2007 7:28 pm
by slasher
Bochs has VESA extensions. I'm using VESA in my OS and it works fine in Bochs, VMware, QEMU and on a real PC.

VESA is not complicated. The only problem is switching modes from PMode. You can tackle this either by writing a V86 layer or by switching to real mode and back to PMode after calling the VESA BIOS.

I use the V86 method and it works fine. Once you have VESA set up, you only need to gain access to the LFB and you are done.
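
To give an idea of what "access to the LFB" looks like once it is mapped, here is a small sketch for a 32 bpp mode. The `Framebuffer` struct and `put_pixel32` are hypothetical; the values themselves (base address, bytes per scan line, resolution) are the kind of thing the VBE mode info block reports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a linear framebuffer. The VBE mode info
 * block reports equivalent values (bytes per scan line, resolution,
 * physical base address); the field names here are made up. */
typedef struct {
    volatile uint8_t *base;  /* mapped address of the LFB */
    uint32_t pitch;          /* bytes per scan line, may exceed width * 4 */
    uint32_t width, height;  /* visible resolution in pixels */
} Framebuffer;

/* Write one 32 bpp pixel; out-of-range coordinates are ignored. */
void put_pixel32(const Framebuffer *fb, uint32_t x, uint32_t y, uint32_t color)
{
    if (x >= fb->width || y >= fb->height)
        return;
    volatile uint32_t *row =
        (volatile uint32_t *)(fb->base + (size_t)y * fb->pitch);
    row[x] = color;
}
```

Note that rows are addressed through the pitch rather than `width * 4`, since the two are not guaranteed to match.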

For quick fills and other operations MMX/SSE/3DNow!/SSE2 can really help if used wisely (haven't done this yet but I will).

Posted: Sat Mar 31, 2007 10:21 pm
by mystran
Btw, when it comes to performance under Bochs, it's a pretty good idea to check that vga_update_interval (or whatever it is) is set to a high enough value...

The smaller the value, the more often Bochs syncs VGA memory to its window, and this operation seems to be one of the slowest things in the whole emulator, so...

Basically, it's a good idea to make it as big as you can tolerate.

Posted: Sat Mar 31, 2007 10:48 pm
by pcmattman
Sounds good, thanks for all your advice. One final question: why does the mouse work so slowly in Bochs, but when I test on a real PC the mouse moves properly?

Posted: Sat Mar 31, 2007 11:10 pm
by mystran
If you decrease vga_update_interval, you'll notice your mouse will move faster... though your emulation will naturally slow down..

It's a tradeoff. The more often you copy the screen, the more time it'll take, but if you don't copy often enough, it'll take some time before you see it update.

That's why I said "as big as you can tolerate." :)