
Re:info about write to video memery needed

Posted: Tue May 24, 2005 8:21 am
by Candy
GLneo wrote: wow, i did not know that, and all that time i spent optimizing might be making things slower, maybe i should trust gcc optimizers on simple things, thx :)
From the website you mentioned:
ran on a 486dx 33Mhz with 8MB of memory, 128KB cache, and a 16-bit ISA SVGA card.
If you still use those computers, plus the compilers written in that era, your technique would give you an improvement. If, however, you live in the present, you quite probably have no use for either of them.

Donald E. Knuth: Premature optimization is the root of all evil.

Re:info about write to video memery needed

Posted: Tue May 24, 2005 9:31 am
by mystran
Besides... when (if) you optimize something, you are first supposed to profile your code to find the hotspots. Often they are not obvious. Once the hotspots have been found, you time the code. Then you try to optimize, then you time again. If you got it faster, great. Often you didn't, and you should roll back to the previous version, because a later change might let the compiler apply an optimization that works on the original non-optimized code (and often not on the supposedly optimized code).

If you did manage to get some cycles off, then you profile again in order to see whether it's still worth optimizing the original bottleneck.

Also some "optimization advice": the most powerful optimization when programming C++ is to declare as much as possible as "const". That includes both parameters and methods (and functions if you use those). When a parameter is const, the compiler can optimize more aggressively based on that fact. When a method is also const, the compiler can even cache results (because the result is constant).

I've personally optimized some speed critical vector calculus code (which was mostly inlined already!) by doing nothing but adding the relevant const-modifiers, and got about 10%-15% speedup. And that was a few GCC versions ago. =)

Re:info about write to video memery needed

Posted: Wed May 25, 2005 6:03 am
by GLneo
so for every code optimization i should test something like:

Code: Select all

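asm volatile("rdtsc\n":"=a"(bl), "=d"(bh):);   // start count; volatile stops gcc merging or moving the two reads
// code
asm volatile("rdtsc\n":"=a"(el), "=d"(eh):);   // end count
print(el - bl);                                // low 32 bits only (bh/eh are ignored)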

???
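
Along those lines, here is a minimal C sketch of the time/optimize/time-again loop, assuming GCC or Clang. The names read_cycles, time_best_of and sample_work are made up for illustration, and the non-x86 fallback (nanoseconds from clock_gettime instead of cycles) is an assumption so the sketch builds elsewhere too:

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Read the full 64-bit time-stamp counter. "volatile" keeps the
   compiler from merging or moving the reads around the timed code. */
static inline uint64_t read_cycles(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
#else
#include <time.h>
/* Assumed fallback for non-x86 builds: nanoseconds, not cycles. */
static inline uint64_t read_cycles(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Example workload (hypothetical): something cheap to measure. */
static void sample_work(void)
{
    volatile unsigned sum = 0;
    for (unsigned i = 0; i < 1000; i++)
        sum += i;
}

/* Hypothetical helper: run fn() `runs` times and return the smallest
   elapsed count, which filters out interrupts and cache warm-up. */
static uint64_t time_best_of(void (*fn)(void), int runs)
{
    uint64_t best = UINT64_MAX;
    assert(fn != 0 && runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t begin = read_cycles();
        fn();
        uint64_t end = read_cycles();
        if (end - begin < best)
            best = end - begin;
    }
    return best;
}
```

Calling time_best_of(sample_work, 10) before and after a change gives two numbers to compare. Taking the best of several runs is only one way to reduce noise; it is not a substitute for profiling first.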

Re:info about write to video memery needed

Posted: Thu May 26, 2005 10:53 am
by dh
GLneo wrote: wow, i did not know that, and all that time i spent optimizing might be making things slower, maybe i should trust gcc optimizers on simple things, thx :)
A good rule of thumb is to write it as it comes to you and optimize later. This can make for messy code, but that's why one should use comments carefully.

I find that when working with a limited language (eg. Visual Bacon), you worry less about optimization, that is, until you're done with a whopping 14 MB exe file that can be done in 2 MB with C. It makes you stop and think: "ya, maybe that extra code was stupid". Moral of the story: make it work, then make it fast, and after that (if possible) make it small.

Re:info about write to video memery needed

Posted: Thu May 26, 2005 1:05 pm
by GLneo
well at least the code itself is faster w/o optimizers;
a whopping 14 MB exe file that can be done in 2 MB with C
14 MB in vbacon = 82 kB in c = 1 kB in asm ::)

O.K. back on topic, Brendan, do you have any more info on your 12h mode pixel func. thx ;)

Re:info about write to video memery needed

Posted: Fri May 27, 2005 3:27 am
by Brendan
Hi,
GLneo wrote:O.K. back on topic, Brendan, do you have any more info on your 12h mode pixel func. thx ;)
Sorry - forgot I said I'd post more when I got home :)

First, you'd need to set the bit mask register to allow all bits of all writes to go to video memory. This is VGA graphics controller register #8, and is done like this:

Code: Select all

   mov dx,0x03ce            ;Set bit mask register
   mov ax,0xff08
   out dx,ax
   
Depending on how you write to display memory, it only needs to be done once (I do it just after switching to video mode 0x12).
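
In case the 0xff08 looks odd: a 16-bit OUT to an index port sends the low byte (the register number) to that port and the high byte (the new value) to the data port right after it. A tiny C sketch of that packing, where vga_reg_word and the enum names are hypothetical and nothing is actually written to a port:

```c
#include <assert.h>
#include <stdint.h>

enum {
    VGA_GC_INDEX = 0x03CE,  /* graphics controller index port */
    GC_BIT_MASK  = 0x08     /* graphics controller register #8 */
};

/* Pack a register index (low byte, lands on the index port) and its
   new value (high byte, lands on index port + 1) into the word that
   a single 16-bit OUT would send. */
static uint16_t vga_reg_word(unsigned index, unsigned value)
{
    assert(index <= 0xFF && value <= 0xFF);
    return (uint16_t)((value << 8) | index);
}
```

So vga_reg_word(GC_BIT_MASK, 0xFF) gives 0xFF08, the value loaded into ax above before the out to VGA_GC_INDEX.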

To select which bit plane/s you're writing to there's the "map mask" register - Sequencer register #2. This register contains 4 bits where each bit corresponds to a bit plane, 0 = disabled and 1 = enabled. If you set this register to 0x0F and write to video display memory then you'd be writing to all planes at the same time. For example, to set 16 pixels to white at the top left of the screen you could do:

Code: Select all

   mov bx,0xa000
   mov es,bx            ;es = segment for display memory

   mov ax,0x0F02            ;ax = map mask register value for all 4 planes
   mov dx,0x03c4            ;dx = IO port for map mask register
   out dx,ax            ;Set map mask register

   mov word [es:0],0xFFFF      ;16 pixels, all planes = white
The problem here is that this only works for writing and not for reading. If you tried to do "or byte [es:0],1" I'm not sure exactly what would be or'ed with 1 (I think it depends on the contents of the "Read Map Select Register", which is register #4 of the Graphics Controller Registers). In any case you can only read from one plane, even though you can write to more than one.
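
To make the write-many/read-one behaviour concrete, here is a simplified software model in C. It is only a sketch: struct planar_vram and both functions are hypothetical, and the model ignores the latches, the bit mask register and the write modes of the real hardware.

```c
#include <assert.h>
#include <stdint.h>

#define PLANE_BYTES (640 * 480 / 8)   /* 38400 bytes per plane in mode 0x12 */

/* Simplified model of planar display memory (hypothetical type). */
struct planar_vram {
    uint8_t plane[4][PLANE_BYTES];
    uint8_t map_mask;   /* sequencer register #2: one enable bit per plane */
    uint8_t read_map;   /* graphics controller register #4: plane to read */
};

/* A CPU write lands in every plane whose map mask bit is set. */
static void vram_write(struct planar_vram *v, unsigned offset, uint8_t data)
{
    assert(offset < PLANE_BYTES);
    for (unsigned p = 0; p < 4; p++)
        if (v->map_mask & (1u << p))
            v->plane[p][offset] = data;
}

/* A CPU read only ever sees the single plane chosen by read_map. */
static uint8_t vram_read(const struct planar_vram *v, unsigned offset)
{
    assert(offset < PLANE_BYTES && v->read_map < 4);
    return v->plane[v->read_map][offset];
}
```

With map_mask = 0x0F, two byte writes of 0xFF at offsets 0 and 1 reproduce the 16 white pixels from the example above. A read-modify-write like "or byte [es:0],1" would read through vram_read (one plane) and write through vram_write (all enabled planes), which is why the result is hard to predict.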

To write a single pixel you can set the bit mask register to the bit that corresponds to your pixel, and then set the map mask register to 0x0F, then write a 0x00 (to make the pixel black), then set the map mask register to the colour you want and write a 0xFF to video to set the pixel to the correct colour.
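
The address and bit arithmetic behind that can be sketched in C. This version works on four in-memory plane arrays rather than on the VGA itself, and every name in it is hypothetical. On real hardware the byte normally also has to be read once before the masked write, so that the latches hold the neighbouring pixels.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W      640
#define SCREEN_H      480
#define BYTES_PER_ROW (SCREEN_W / 8)              /* 80 */
#define PLANE_BYTES   (BYTES_PER_ROW * SCREEN_H)  /* 38400 */

/* Byte offset of the pixel within each plane. */
static unsigned pixel_offset(unsigned x, unsigned y)
{
    assert(x < SCREEN_W && y < SCREEN_H);
    return y * BYTES_PER_ROW + x / 8;
}

/* Bit for the pixel within that byte; the leftmost pixel is bit 7.
   This is the value the bit mask register would be set to. */
static uint8_t pixel_bit(unsigned x)
{
    return (uint8_t)(0x80u >> (x & 7));
}

/* Software equivalent of the two steps described above. */
static void put_pixel(uint8_t planes[4][PLANE_BYTES],
                      unsigned x, unsigned y, unsigned colour)
{
    unsigned off = pixel_offset(x, y);
    uint8_t bit = pixel_bit(x);
    assert(colour < 16);
    for (unsigned p = 0; p < 4; p++) {
        planes[p][off] &= (uint8_t)~bit;   /* step 1: pixel to black */
        if (colour & (1u << p))
            planes[p][off] |= bit;         /* step 2: set the colour's planes */
    }
}
```

For example, pixel (9, 2) lives at offset 2*80 + 1 = 161 with bit 0x40, and colour 0101b sets that bit in planes 0 and 2 only.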

Anyway, regardless of what you do, writing a pixel at a time to video display memory is extremely slow because it requires several IO port instructions, some bit shifting and masking, etc. I couldn't find any way to get acceptable performance.

To solve this I use a double buffer, where each plane is buffered in memory and I can read/write without messing about with slow IO ports. To avoid writing to display memory when I don't need to I also use a "video state buffer". The idea is that the video state buffer contains what was sent to the screen last time, so I can compare the new data (in the buffer) with the old data (in the state buffer) and only write to display memory if the new data is different. The code I use to blit the buffer to display memory is:

Code: Select all

blitBuffer:
   pushes eax,edx,esi,edi
   mov ax,0xa000
   clr esi
   mov dx,0x03c4            ;dx = IO port for map mask register
   mov es,ax            ;es = segment for display memory

   mov ax,0x0102            ;ax = map mask register for plane 0
   call .doPlane
   mov ax,0x0202            ;ax = map mask register for plane 1
   call .doPlane
   mov ax,0x0402            ;ax = map mask register for plane 2
   call .doPlane
   mov ax,0x0802            ;ax = map mask register for plane 3
   call .doPlane
   sti
   mov ax,cs
   mov es,ax
   pops eax,edx,esi,edi
   ret

.doPlane:
   out dx,ax            ;Set map mask register
   clr di               ;es:di = address of display memory

   cli

.l1:   mov eax,[gs:esi+PMEMVIDbufferAddress]   ;eax = new data
   cmp eax,[gs:esi+PMEMVIDstateAddress]   ;Has this dword changed?
   je .l2               ; no, don't update display memory
   stosd               ; yes, update display memory
   mov [gs:esi+PMEMVIDstateAddress],eax   ;      and the state buffer
   add esi,4
   cmp di,640*480/8
   jb .l1
   ret

.l2:   add di,4
   add esi,4
   cmp di,640*480/8
   jb .l1
   ret

For this code I get very good performance (even on Bochs), but it costs 300 KB to do it (150 KB for the buffer and another 150 KB for the state). The reason I get good performance is that I'm only doing 4 IO port writes per screen update, and only writing to (slower) display memory when necessary. For example, if the screen is filled with a blue background (colour 0001b) and I draw a white "window" (colour 1111b), then plane 0 doesn't change at all.
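
For anyone who finds C easier to follow, the compare-and-copy idea of the inner loop can be sketched like this. It is not a translation of the assembly: blit_changed is a hypothetical name, it handles one plane per call, and the caller is assumed to have already selected the plane through the map mask register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy only the dwords that differ from what was sent last time.
   buffer  = new data for one plane
   state   = what is currently on screen for that plane
   display = display memory (slow to write)
   Returns the number of dwords actually written to display. */
static size_t blit_changed(const uint32_t *buffer, uint32_t *state,
                           volatile uint32_t *display, size_t dwords)
{
    size_t written = 0;
    assert(buffer && state && display);
    for (size_t i = 0; i < dwords; i++) {
        if (buffer[i] != state[i]) {
            display[i] = buffer[i];   /* update display memory */
            state[i] = buffer[i];     /* and the state buffer */
            written++;
        }
    }
    return written;
}
```

For mode 0x12 each plane is 640*480/8 = 38400 bytes, so dwords would be 9600 per plane. Calling it a second time without touching the buffer returns 0, which is the case where nothing is written to display memory at all.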

I've attached the code to set a single pixel in the buffer. All the code is designed for real mode where GS has been set to "flat" (base = 0 and limit = 4 GB) - it's actually part of my boot menu's code.

I've also got more code to draw horizontal & vertical lines, draw 8*8 and 8*16 characters, etc but the code to plot a pixel should give you the idea...


Cheers,

Brendan