I'm starting my GUI development (again).
Can anyone recommend any other method (other than memset) to fill the memory a pointer points to with a certain value? At the moment it's choking on the large amount of blitting required, and this basically brings the whole GUI to a halt.
Any other suggestions for optimization would be helpful too.
GUI Optimization?
- carbonBased
- Member
- Posts: 382
- Joined: Sat Nov 20, 2004 12:00 am
- Location: Wellesley, Ontario, Canada
You can't avoid blits, but you can reduce how much you blit. Try using buffers in system RAM (DRAM) for some things.
For my first GUI I had a DRAM buffer per window, and only blitted those regions of each window that had been updated. Alternatively, you could have a DRAM buffer for the entire screen, and blit from it.
Some cards may also support hardware blitting, but I believe it only works from VRAM to VRAM, so you're still stuck with the bottleneck of the initial copy into VRAM, and the comments above still apply.
--Jeff
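A minimal sketch of the per-window DRAM buffer idea, assuming 32 bpp pixels and one dirty rectangle per window (the `struct window`, `mark_dirty` and `blit_dirty` names are hypothetical, not from any particular GUI). Drawing code calls `mark_dirty` for whatever it touched, and only that rectangle is copied to the screen-sized frame:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration. Rectangles are half-open: [x0,x1) x [y0,y1). */
struct rect { int x0, y0, x1, y1; };

struct window {
    int x, y, w, h;       /* position on screen and size, in pixels */
    uint32_t *pixels;     /* w * h off-screen buffer in system RAM */
    struct rect dirty;    /* area changed since the last blit; empty if x0 >= x1 */
};

/* Grow the window's dirty rectangle to include [x0,x1) x [y0,y1). */
static void mark_dirty(struct window *win, int x0, int y0, int x1, int y1)
{
    struct rect *d = &win->dirty;
    if (d->x0 >= d->x1 || d->y0 >= d->y1) {   /* currently empty: start fresh */
        *d = (struct rect){ x0, y0, x1, y1 };
        return;
    }
    if (x0 < d->x0) d->x0 = x0;
    if (y0 < d->y0) d->y0 = y0;
    if (x1 > d->x1) d->x1 = x1;
    if (y1 > d->y1) d->y1 = y1;
}

/* Copy only the dirty rectangle of `win` into `screen` (screen_w pixels per
   row), then mark the window clean. Clipping and overlapping windows are
   not handled here. */
static void blit_dirty(struct window *win, uint32_t *screen, int screen_w)
{
    struct rect d = win->dirty;
    if (d.x0 >= d.x1 || d.y0 >= d.y1)
        return;                               /* nothing changed */
    size_t row_bytes = (size_t)(d.x1 - d.x0) * sizeof(uint32_t);
    for (int y = d.y0; y < d.y1; y++) {
        memcpy(&screen[(size_t)(win->y + y) * screen_w + win->x + d.x0],
               &win->pixels[(size_t)y * win->w + d.x0], row_bytes);
    }
    win->dirty = (struct rect){ 0, 0, 0, 0 };
}
```

A single bounding rectangle per window is the simplest choice; a list of rectangles would copy less when updates are far apart, at the cost of more bookkeeping.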
Re: GUI Optimization?
Hi,
pcmattman wrote: "Can anyone recommend any other method (other than memset) to set the memory that a pointer points to to a certain value?"
In general (for filling an area with a certain value) you'd use the widest register you can to write to each address in order. If the area isn't suitably aligned or is too small, you'd do smaller writes at the beginning and end (where necessary) and still use the widest register you can for the rest.
For SSE/SSE2 you might also want to look into "non-temporal stores" (e.g. the MOVNTQ instruction, which writes 8 bytes at a time, or MOVNTDQ, which writes 16) so that you don't pollute the cache. You'd probably also want to use some loop unrolling (where possible), so that (for example) to fill 8192 bytes you'd have 32 MOVNTQ instructions inside a loop that is done 32 times (rather than one MOVNTQ instruction that is done 1024 times).
For systems without SSE2 you'd probably want to use "REP STOSD".
Of course, if video memory is being filled/cleared you'd want to make sure the video driver does the filling/clearing, so that it can be done in hardware by the 2D accelerator (if/when the video driver and hardware support it).
pcmattman wrote: "At the moment it's choking on the large amounts of blitting required and this basically brings the whole GUI to a halt."
Does the code ever set the same pixel in display memory more than once (excluding hardware-accelerated operations)? Does the code ever set a group of pixels in display memory to the same value they had before? If the answer to either of these is "yes", then you need either a double buffer in RAM or some way of keeping track of which pixels actually changed since last time.
Possibly the fastest (software-only) full-frame blit code uses a "line changed" flag for each screen line, a copy of display memory in RAM and a buffer for the new pixel data in RAM. When the buffer is blitted to the screen you'd do something like:
```c
#include <stdint.h>

enum { NOT_CHANGED, CHANGED };

/* buffer: new frame, oldState: RAM copy of what is on screen, screen: display
   memory. All are ScreenX * ScreenY 32-bit pixels with no padding between lines. */
void blit_frame(const uint32_t *buffer, uint32_t *oldState, uint32_t *screen,
                uint8_t *lineState, int ScreenX, int ScreenY)
{
    for (int y = 0; y < ScreenY; y++) {
        if (lineState[y] == CHANGED) {
            const uint32_t *pos1 = &buffer[y * ScreenX];
            uint32_t *pos2 = &oldState[y * ScreenX];
            uint32_t *pos3 = &screen[y * ScreenX];
            for (int x = 0; x < ScreenX; x++) {
                /* Only touch display memory for pixels that really changed. */
                if (pos1[x] != pos2[x]) {
                    pos3[x] = pos1[x];
                    pos2[x] = pos1[x];
                }
            }
            lineState[y] = NOT_CHANGED;
        }
    }
}
```
This type of algorithm should be easy enough to adapt to SSE2 using the MASKMOVDQU instruction with a mask generated by something like the PCMPEQB instruction (with a PXOR to invert the mask, because there doesn't seem to be a PCMPNEQB instruction).
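Here is a sketch of the SSE2 adaptation Brendan describes, working on one changed scan line at a time (`blit_changed_line` is a hypothetical name; lengths are in bytes). PCMPEQB (`_mm_cmpeq_epi8`) marks equal bytes, PXOR (`_mm_xor_si128`) with all-ones inverts that into a "differs" mask, and MASKMOVDQU (`_mm_maskmoveu_si128`) stores only the differing bytes. It assumes an x86 compiler that defines `__SSE2__`; elsewhere only the plain byte loop runs:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Write the bytes of `next` that differ from `shadow` into `screen`, and
   bring `shadow` (the RAM copy of display memory) up to date. */
static void blit_changed_line(uint8_t *screen, uint8_t *shadow,
                              const uint8_t *next, size_t len)
{
    size_t i = 0;
#ifdef __SSE2__
    const __m128i ones = _mm_set1_epi8((char)0xFF);
    for (; i + 16 <= len; i += 16) {
        __m128i n = _mm_loadu_si128((const __m128i *)(next + i));
        __m128i o = _mm_loadu_si128((const __m128i *)(shadow + i));
        /* PCMPEQB gives 0xFF where bytes are equal; PXOR with all-ones
           inverts it, so mask bytes are 0xFF where the pixel data changed. */
        __m128i diff = _mm_xor_si128(_mm_cmpeq_epi8(n, o), ones);
        /* MASKMOVDQU: store only bytes whose mask byte has its top bit set. */
        _mm_maskmoveu_si128(n, diff, (char *)(screen + i));
        _mm_storeu_si128((__m128i *)(shadow + i), n);
    }
    _mm_sfence();  /* MASKMOVDQU is a non-temporal store; fence before later stores */
#endif
    /* Remaining bytes (or all of them without SSE2). */
    for (; i < len; i++) {
        if (next[i] != shadow[i]) {
            screen[i] = next[i];
            shadow[i] = next[i];
        }
    }
}
```

Since MASKMOVDQU is a non-temporal store, the unchanged bytes of display memory are never written at all, which is the point of keeping the shadow copy.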
The main problems here are keeping track of the "line changed" flags, and the amount of RAM you'd be using: for 1024 * 768 * 32 bpp you'd use up about 6 MB of RAM for a pair of video buffers.
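The 6 MB figure follows from 4 bytes per pixel at 32 bpp, for each of the two RAM buffers (the "line changed" flags add only one flag per line, i.e. 768 of them):

$$
1024 \times 768 \times 4\ \text{bytes} = 3\,145\,728\ \text{bytes} = 3\ \text{MiB per buffer},
\qquad 2 \times 3\ \text{MiB} = 6\ \text{MiB}
$$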
@carbonBased: Most video cards support video -> video bit blits and system -> video bit blits. Newer cards probably also support bus mastering for system -> video bit blits.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
-
- Member
- Posts: 2566
- Joined: Sun Jan 14, 2007 9:15 pm
- Libera.chat IRC: miselin
- Location: Sydney, Australia (I come from a land down under!)
The main problem I have is that in 320x200 both the double buffer and video memory itself are 64 K (320 × 200 = 64,000 bytes)... in other words, there are 64K iterations of the loop in memset just to transfer memory. It clears the double buffer at the start of each draw, then blits it back to the screen after the drawing functions are complete. If you want to see exactly what I'm doing, I followed the GUI development tutorial on osdever.net.
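A byte-at-a-time memset is exactly what Brendan's "widest register" advice replaces. A minimal sketch, assuming `uintptr_t` is as wide as the widest general-purpose register (true on common i386 and x86-64 builds); `fill_bytes` is a hypothetical name. It does byte writes up to an aligned address, word writes for the bulk, and byte writes for the tail, so on a 64-bit build the 64,000-byte buffer takes 8,000 word stores instead of 64,000 byte stores:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t;  /* assumption: pointer width matches the widest GPR */

/* Fill `len` bytes at `dst` with `value`. */
static void fill_bytes(uint8_t *dst, uint8_t value, size_t len)
{
    /* (word_t)-1 / 0xFF is 0x0101...01, so this puts `value` in every byte. */
    word_t pattern = ((word_t)-1 / 0xFF) * value;

    /* Head: byte writes until dst is word-aligned. */
    while (len > 0 && ((uintptr_t)dst & (sizeof(word_t) - 1)) != 0) {
        *dst++ = value;
        len--;
    }
    /* Body: one aligned word store per iteration. A fixed-size memcpy avoids
       strict-aliasing problems and compiles to a single store. */
    while (len >= sizeof(word_t)) {
        memcpy(dst, &pattern, sizeof pattern);
        dst += sizeof(word_t);
        len -= sizeof(word_t);
    }
    /* Tail: leftover bytes. */
    while (len > 0) {
        *dst++ = value;
        len--;
    }
}
```

Note that an optimizing compiler may recognize loops like these and emit a call to memset anyway (for GCC this is governed by `-ftree-loop-distribute-patterns`), which matters in a freestanding kernel.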
I'm trying to avoid VESA modes and anything remotely complex, as I need to be able to test in Bochs (and Bochs doesn't have any of those sorts of extensions) :(
Bochs does have VESA extensions; I'm using VESA in my OS and it works fine in Bochs, VMware, QEMU and on a real PC.
VESA is not complicated. The only problem is switching modes from PMode. You can tackle this either by writing a V86 layer or by switching to real mode, calling the VESA BIOS, and then switching back to PMode.
I use the V86 method and it works fine. Once you have VESA set up, you only need to gain access to the LFB (linear frame buffer) and you're done.
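Once the LFB is reachable, drawing is plain address arithmetic. A sketch assuming a 32 bpp mode; `struct framebuffer`, `put_pixel32` and `fill_rect32` are hypothetical, and in a real kernel the fields would come from the VBE mode information block (the fields usually named PhysBasePtr, BytesPerScanLine, XResolution and YResolution). The key detail is stepping rows by the pitch (bytes per scan line), which may be larger than width × 4:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a 32 bpp linear frame buffer. */
struct framebuffer {
    uint8_t *base;     /* mapped LFB address (or a RAM stand-in) */
    uint32_t pitch;    /* bytes per scan line; may exceed width * 4 */
    uint32_t width;    /* pixels */
    uint32_t height;   /* pixels */
};

/* Write one pixel; coordinates outside the screen are ignored. */
static void put_pixel32(const struct framebuffer *fb, uint32_t x, uint32_t y,
                        uint32_t color)
{
    if (x >= fb->width || y >= fb->height)
        return;
    uint32_t *row = (uint32_t *)(fb->base + (size_t)y * fb->pitch);
    row[x] = color;
}

/* Fill a w x h rectangle, clipped to the screen, one scan line at a time. */
static void fill_rect32(const struct framebuffer *fb, uint32_t x, uint32_t y,
                        uint32_t w, uint32_t h, uint32_t color)
{
    if (x >= fb->width || y >= fb->height)
        return;
    if (w > fb->width - x)  w = fb->width - x;
    if (h > fb->height - y) h = fb->height - y;
    for (uint32_t j = 0; j < h; j++) {
        uint32_t *row = (uint32_t *)(fb->base + (size_t)(y + j) * fb->pitch);
        for (uint32_t i = 0; i < w; i++)
            row[x + i] = color;
    }
}
```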
For quick fills and other operations, MMX/SSE/3DNow!/SSE2 can really help if used wisely (I haven't done this yet, but I will).
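As one example of an SSE2 fill, here is a sketch of the non-temporal, unrolled approach Brendan mentions earlier in the thread, for a 32 bpp buffer (`fill32_stream` is a hypothetical name). It uses `_mm_stream_si128` (MOVNTDQ, 16 bytes per store), which needs a 16-byte-aligned address, so scalar stores handle the unaligned head and the tail. The unroll factor of four stores (64 bytes) per iteration is an arbitrary choice, and it assumes an x86 compiler defining `__SSE2__`; otherwise only the scalar loop runs:

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Set `count` 32-bit pixels at `dst` to `value`, bypassing the cache on SSE2. */
static void fill32_stream(uint32_t *dst, uint32_t value, size_t count)
{
    size_t i = 0;
#ifdef __SSE2__
    /* Head: scalar stores until dst + i is 16-byte aligned. */
    while (i < count && ((uintptr_t)(dst + i) & 15) != 0)
        dst[i++] = value;
    __m128i v = _mm_set1_epi32((int)value);
    /* Body: four MOVNTDQ stores (64 bytes) per iteration. */
    for (; i + 16 <= count; i += 16) {
        _mm_stream_si128((__m128i *)(dst + i), v);
        _mm_stream_si128((__m128i *)(dst + i + 4), v);
        _mm_stream_si128((__m128i *)(dst + i + 8), v);
        _mm_stream_si128((__m128i *)(dst + i + 12), v);
    }
    _mm_sfence();  /* make the non-temporal stores visible before later stores */
#endif
    /* Tail (or everything without SSE2). */
    for (; i < count; i++)
        dst[i] = value;
}
```

Non-temporal stores help most when the filled buffer won't be read back soon (e.g. clearing a large frame); for small areas that are drawn over immediately, ordinary cached stores may well be faster.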
Btw, when it comes to performance under Bochs, it's a pretty good idea to check that vga_update_interval (or whatever it is) is set to a high enough value...
The smaller the value, the more often Bochs syncs VGA memory to its window, and this operation seems to be one of the slowest things in the whole emulator, so...
Basically, it's a good idea to make it as big as you can tolerate.
The real problem with goto is not with the control transfer, but with environments. Properly tail-recursive closures get both right.
If you decrease vga_update_interval, you'll notice your mouse moves faster... though your emulation will naturally slow down.
It's a tradeoff. The more often you copy the screen, the more time it takes, but if you don't copy it often enough, it'll take a while before you see it update.
That's why I said "as big as you can tolerate." :)