Optimising 4-bpp Modes (was: What does your OS look like?)
Posted: Sun Dec 30, 2012 9:14 pm
by shikhin
Hi,
farlepet wrote:(my 640x480x4bpp driver is wayyyy too slow for this as the boot screen right now)
Yeah, 4bpp planar mode is pretty difficult to get fast, without all those tricky optimizations.
Regards,
Shikhin
Re: What does your OS look like? (Screen Shots..)
Posted: Sun Dec 30, 2012 11:56 pm
by farlepet
By optimizations, do you mean general optimizations as in changing multiplication to shifting and things like that? Or things specific to the planar mode?
Re: What does your OS look like? (Screen Shots..)
Posted: Mon Dec 31, 2012 12:42 am
by bluemoon
farlepet wrote:By optimizations, do you mean general optimizations as in changing multiplication to shifting and things like that? Or things specific to the planar mode?
That would not work very well on a modern CPU due to dependency stalls, unless you heavily unroll the operation, in which case you would rather use SSE.
Re: What does your OS look like? (Screen Shots..)
Posted: Sun Jan 06, 2013 5:24 am
by Combuster
farlepet wrote:By optimizations, do you mean general optimizations as in changing multiplication to shifting and things like that? Or things specific to the planar mode?
A random putpixel in 4bpp mode costs a port out, which is horribly slow. An algorithm aware of this problem is designed to minimize those by combining the video writes with the same GC settings so that you can write more than one pixel per OUT - sometimes at the cost of some other logic. It does in all cases mean that you essentially need to implement the same code paths as you would for hardware accelerated graphics.
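As a rough illustration of "more than one pixel per OUT", here is a minimal sketch that fills a byte-aligned horizontal span with four plane selects in total, however many pixels it covers. The hardware is replaced by a small software model: `mock_plane`, `mock_set_map_mask` and `mock_vram_fill` are hypothetical stand-ins for display memory, the `outb` to the sequencer data port, and the CPU writes to VRAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROW_BYTES 80            /* 640 pixels / 8 pixels per byte */

/* Hypothetical stand-ins for the hardware: four planes and the Map Mask. */
static uint8_t mock_plane[4][ROW_BYTES];
static uint8_t mock_map_mask = 0x0F;
static int     mock_out_count = 0;

/* Stand-in for outb(VGA_SEQ_DATA, mask) with the index already set to 2. */
static void mock_set_map_mask(uint8_t mask)
{
    mock_map_mask = mask & 0x0F;
    mock_out_count++;
}

/* Stand-in for CPU byte writes to VRAM: they land in every enabled plane. */
static void mock_vram_fill(int offset, uint8_t value, int count)
{
    for (int p = 0; p < 4; p++)
        if (mock_map_mask & (1 << p))
            memset(&mock_plane[p][offset], value, (size_t)count);
}

/* Fill a byte-aligned horizontal span with one color: one OUT per plane. */
static void fill_span_aligned(int first_byte, int byte_count, int color)
{
    assert(first_byte >= 0 && first_byte + byte_count <= ROW_BYTES);
    for (int p = 0; p < 4; p++) {
        mock_set_map_mask((uint8_t)(1 << p));
        /* Plane p holds bit p of the color for each of the 8 pixels per byte. */
        mock_vram_fill(first_byte, ((color >> p) & 1) ? 0xFF : 0x00, byte_count);
    }
}
```

A per-pixel routine would pay the same four plane selects for every single pixel; here the cost is spread over `byte_count * 8` pixels.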
Re: What does your OS look like? (Screen Shots..)
Posted: Sun Jan 06, 2013 12:10 pm
by farlepet
Combuster wrote:A random putpixel in 4bpp mode costs a port out, which is horribly slow. An algorithm aware of this problem is designed to minimize those by combining the video writes with the same GC settings so that you can write more than one pixel per OUT - sometimes at the cost of some other logic. It does in all cases mean that you essentially need to implement the same code paths as you would for hardware accelerated graphics.
I was thinking of doing one color bit at a time, like going through the image 4 times, taking the 1st bit the 1st time, the 2nd bit the second time, etc.. so I would only have 16 outb's per image.
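The "one color bit at a time" idea can be sketched as a conversion from a one-byte-per-pixel image into four bitplanes, done in RAM before anything is sent to the card. The function name `chunky_to_planar` and the one-byte-per-pixel source layout are assumptions made for this example, not something from the driver discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Split an image stored as one byte per pixel (values 0..15) into four
 * bitplanes. planes[p] receives bit p of every pixel, 8 pixels per byte,
 * leftmost pixel in the most significant bit. width must be a multiple
 * of 8 so that rows pack into whole bytes.
 */
static void chunky_to_planar(const uint8_t *pixels, int width, int height,
                             uint8_t *planes[4])
{
    size_t plane_bytes = (size_t)(width / 8) * (size_t)height;
    assert(width % 8 == 0);

    for (int p = 0; p < 4; p++) {
        memset(planes[p], 0, plane_bytes);
        for (size_t i = 0; i < plane_bytes * 8; i++) {
            if ((pixels[i] >> p) & 1)
                planes[p][i / 8] |= (uint8_t)(0x80 >> (i % 8));
        }
    }
}
```

Each resulting plane can then be copied to display memory after a single plane select.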
Re: What does your OS look like? (Screen Shots..)
Posted: Sun Jan 06, 2013 3:36 pm
by Combuster
16 outs seems like a lot for a direct ram-to-vram blit. There are four planes, and selecting a different plane only costs one out...
Re: What does your OS look like? (Screen Shots..)
Posted: Tue Jan 08, 2013 7:20 pm
by farlepet
Combuster wrote:16 outs seems like a lot for a direct ram-to-vram blit. There are four planes, and selecting a different plane only costs one out...
I may be using the wrong code then, because when I set a plane, I use this:
Code:
inline void set_plane(int p)
{
    p &= 3;
    outb(VGA_GC_INDEX, 4);        // Graphics Controller: Read Map Select
    outb(VGA_GC_DATA, p);         // plane returned by reads
    outb(VGA_SEQ_INDEX, 2);       // Sequencer: Map Mask
    outb(VGA_SEQ_DATA, 1 << p);   // plane affected by writes
}
Re: What does your OS look like? (Screen Shots..)
Posted: Tue Jan 08, 2013 10:17 pm
by Brendan
Hi,
farlepet wrote:I may be using the wrong code then, because when I set a plane, I use this:
Code:
inline void set_plane(int p)
{
    p &= 3;
    outb(VGA_GC_INDEX, 4);
    outb(VGA_GC_DATA, p);
    outb(VGA_SEQ_INDEX, 2);
    outb(VGA_SEQ_DATA, 1 << p);
}
You should never need to read from display memory (reads are very slow, and if you actually need to read at all then it's always better/faster to read from a buffer in RAM); so the first 2 "OUT"s aren't needed.
For the second pair of "OUT"s, you can do "outb(VGA_SEQ_INDEX, 2);" once (e.g. just after setting the video mode) and it will stay set. You don't need to set it each time you change planes.
Basically, blitting your buffer to display memory may become:
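Code:
void blit_4bpp(void *src) {
    const unsigned char *plane = src; // Assumes src holds the four planes back to back
    int p;
    for(p = 1; p <= 8; p <<= 1) {
        outb(VGA_SEQ_DATA, p); // Select the plane
        memcpy(videoDisplayMemory, plane, bytesPerLine*verticalResolution); // Blit the plane
        plane += bytesPerLine*verticalResolution; // Move on to the next plane's data
    }
}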
Of course this assumes that the "bytesPerLine" is the same as "bytesBetweenLines" (which is the case for VGA modes, but may not be the case for VBE modes).
A more complete blit might be:
Code:
void blit_4bpp(void *src) {
    int p;
    unsigned char *s = src;
    unsigned char *dest;
    int y;
    for(p = 1; p <= 8; p <<= 1) {
        outb(VGA_SEQ_DATA, p);
        dest = (unsigned char *)videoDisplayMemory;
        for(y = 0; y < verticalResolution; y++) {
            if(lineChangedFlags[y] != 0) {
                // Horizontal line of pixels did change since last time
                memcpy(dest, s, bytesPerLine);
            }
            // Advance to the next line whether or not this one was copied
            s += bytesPerLine;
            dest += bytesBetweenLines;
        }
    }
    for(y = 0; y < verticalResolution; y++) {
        lineChangedFlags[y] = 0;
    }
}
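For the per-line flags to be useful, whatever draws into the RAM buffer has to set them. A minimal sketch of that side, assuming a buffer laid out plane after plane (the layout the blit above walks through); `shadow` and `shadow_putpixel` are hypothetical names, and the constants stand in for `bytesPerLine` and `verticalResolution`:

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_LINE      80   /* 640 / 8 */
#define VERTICAL_RESOLUTION 480
#define PLANE_BYTES         (BYTES_PER_LINE * VERTICAL_RESOLUTION)

/* Shadow buffer in RAM: plane 0, then plane 1, plane 2, plane 3. */
static uint8_t shadow[4 * PLANE_BYTES];
static uint8_t lineChangedFlags[VERTICAL_RESOLUTION];

/* Set one pixel in the RAM buffer and mark its line for the next blit. */
static void shadow_putpixel(int x, int y, int color)
{
    assert(color >= 0 && color <= 15);
    if (x < 0 || x >= BYTES_PER_LINE * 8 || y < 0 || y >= VERTICAL_RESOLUTION)
        return;                               /* clip silently */

    int offset = y * BYTES_PER_LINE + (x >> 3);
    uint8_t bit = (uint8_t)(0x80 >> (x & 7));

    /* Read-modify-write is cheap here because it only touches system RAM. */
    for (int p = 0; p < 4; p++) {
        uint8_t *byte = &shadow[p * PLANE_BYTES + offset];
        if ((color >> p) & 1)
            *byte |= bit;
        else
            *byte &= (uint8_t)~bit;
    }
    lineChangedFlags[y] = 1;
}
```

No port I/O happens while drawing; the OUTs are deferred to the blit, which then costs four plane selects per frame.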
Cheers,
Brendan
Re: What does your OS look like? (Screen Shots..)
Posted: Wed Jan 09, 2013 4:28 pm
by farlepet
Sorry if I should create a new topic about this; tell me if I should...
Brendan wrote:You should never need to read from display memory (reads are very slow, and if you actually need to read at all then it's always better/faster to read from a buffer in RAM); so the first 2 "OUT"s aren't needed.
For the second pair of "OUT"s, you can do "outb(VGA_SEQ_INDEX, 2);" once (e.g. just after setting the video mode) and it will stay set. You don't need to set it each time you change planes.
I tried doing this:
Code:
#define do_vga_bit(beg, bit, val) ((val) ? (*beg = (*beg | (0x80 >> bit))) : (*beg = (*beg & ~(0x80 >> bit))))
void _4BPP(int x, int y, int c)
{
    if(x >= VGA_width || y >= VGA_height) return;
    int pix = x & 0x07; x>>=3;
    char *loc = (char *)(0xA0000 + x + (y * (90/*VGA_width >> 3*/)));
    outb(VGA_SEQ_DATA, 1); do_vga_bit(loc, pix, c & 1);
    outb(VGA_SEQ_DATA, 2); do_vga_bit(loc, pix, c & 2);
    outb(VGA_SEQ_DATA, 4); do_vga_bit(loc, pix, c & 4);
    outb(VGA_SEQ_DATA, 8); do_vga_bit(loc, pix, c & 8);
}
but I seem to need to set the read plane for it to work (probably because I'm AND-ing/OR-ing the existing memory; I don't know any other way other than a buffer...). When displaying the BMP, it gives me this instead:
https://www.dropbox.com/s/d12qcobhhry47 ... 1%3A34.png
Is there another way to do this?
Re: Optimising 4-bpp Modes (was: What does your OS look like
Posted: Thu Jan 10, 2013 1:06 am
by Combuster
Well, an or/and is a read from VRAM and thus suboptimal - especially when you use it in cases where there's no reason to know the original content, for example when you are setting all 8 consecutive pixels of a character cell.
In the remaining case, using the VGA latch mechanics can save you from selecting the read plane. Or you can make life easy and double buffer in system RAM and only send dirty blocks to the VGA card.
Bottom line, optimizing graphics is an art of its own (try learning it from the master).
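To make the latch remark concrete, here is a small software model of write mode 0 with the Bit Mask register, assuming no rotate, no set/reset and no logical operation are enabled. The struct and function names are hypothetical; only the behavior is meant to mirror the hardware: any read loads all four latches at once, and on a write the bits selected by the Bit Mask come from the CPU while the rest come from the latches.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of the relevant VGA state (hypothetical names). */
struct vga_model {
    uint8_t plane[4][16];   /* a tiny slice of display memory */
    uint8_t latch[4];       /* one latch per plane */
    uint8_t map_mask;       /* Sequencer register 2 */
    uint8_t bit_mask;       /* Graphics Controller register 8 */
};

/* Any CPU read loads all four latches, whatever Read Map Select says. */
static void model_read(struct vga_model *v, int offset)
{
    for (int p = 0; p < 4; p++)
        v->latch[p] = v->plane[p][offset];
}

/* Masked bits come from the CPU byte, the others from the latches. */
static void model_write(struct vga_model *v, int offset, uint8_t value)
{
    for (int p = 0; p < 4; p++) {
        if (v->map_mask & (1 << p))
            v->plane[p][offset] = (uint8_t)((value & v->bit_mask) |
                                            (v->latch[p] & ~v->bit_mask));
    }
}

/* Set one pixel with no read-plane select and no AND/OR done by the CPU. */
static void model_putpixel(struct vga_model *v, int x, int color)
{
    int offset = x >> 3;
    assert(offset >= 0 && offset < 16);
    v->bit_mask = (uint8_t)(0x80 >> (x & 7));
    model_read(v, offset);                 /* dummy read fills the latches */
    for (int p = 0; p < 4; p++) {
        v->map_mask = (uint8_t)(1 << p);
        model_write(v, offset, ((color >> p) & 1) ? 0xFF : 0x00);
    }
}
```

The value returned by the dummy read is never used, which is why the Read Map Select register can be left alone.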
Re: Optimising 4-bpp Modes (was: What does your OS look like
Posted: Thu Jan 10, 2013 6:30 am
by Antti
I have a little question that does not need another thread. Currently I have planned something like this (BIOS booting):
Code:
if vbe extensions available
    if suitable mode found
        jump forward (everything perfectly OK)
    else suitable mode not found AND hardware level VGA compatibility
        jump set_12h
else assume hardware level VGA compatibility
set_12h: (640x480x16)
    try to set 12h (BIOS function)
    if not failed (how to check?)
        jump forward (everything OK, use "VGA registers and planes")
set_11h: (640x480x1)
    try to set 11h (BIOS function)
    if not failed (how to check?)
        jump forward (everything OK, just use "A0000 LFB")
forward:
    Video mode is set. Start the correct "driver" that can do "character draws, line draws, putpixel etc."
Are there any pitfalls that should be taken into account? At least I should develop a better way to do pseudocode.
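One possible way to write the fallback chain down in C is as a pure decision function over probe results, which keeps the order of preference separate from the BIOS calls themselves. Everything named here (`video_probe`, `choose_video_driver`, the enum values) is made up for illustration, how each probe result is obtained is left open, and `DRIVER_NONE` is an addition for the case where every attempt fails. The sketch also simplifies the pseudocode by not testing VGA compatibility separately.

```c
#include <assert.h>
#include <stdbool.h>

enum video_driver {
    DRIVER_VBE_LFB,      /* suitable VBE mode found */
    DRIVER_VGA_PLANAR,   /* mode 12h, 640x480x16, VGA registers and planes */
    DRIVER_VGA_MONO,     /* mode 11h, 640x480x1, plain A0000 framebuffer */
    DRIVER_NONE          /* nothing worked (not covered by the pseudocode) */
};

/* Results of the individual probes, filled in by the boot code. */
struct video_probe {
    bool vbe_available;
    bool vbe_mode_found;
    bool mode_12h_ok;
    bool mode_11h_ok;
};

static enum video_driver choose_video_driver(const struct video_probe *p)
{
    assert(p);
    if (p->vbe_available && p->vbe_mode_found)
        return DRIVER_VBE_LFB;
    /* No VBE, or no suitable VBE mode: fall back to the legacy modes. */
    if (p->mode_12h_ok)
        return DRIVER_VGA_PLANAR;
    if (p->mode_11h_ok)
        return DRIVER_VGA_MONO;
    return DRIVER_NONE;
}
```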
Re: Optimising 4-bpp Modes (was: What does your OS look like
Posted: Thu Jan 10, 2013 6:57 am
by Combuster
Well the first conditional block as written is equivalent to an if(vbe) use_vbe(); else always_assume_vga();, which is probably different from what you actually had intended it to be...
Right now my setup looks for class code 03:00:00, and if no PCI video controllers exist then it'll try probing a few registers. I have yet to observe an instance where the alternative path is taken - I don't have a truly headless system, nor do I have PCI video equipment that is non-VGA compatible, so in all cases the VGA driver can get loaded if none of my other drivers take precedence.
As for error checking, BIOS calls should be setting/clearing CF, if that helps.
Re: Optimising 4-bpp Modes (was: What does your OS look like
Posted: Thu Jan 10, 2013 7:22 am
by Antti
Combuster wrote:Well the first conditional block as written is equivalent to an if(vbe) use_vbe(); else always_assume_vga();, which is probably different from what you actually had intended it to be...
I wrote it like I intended it to be. If there is no VBE, I assume that the hardware is compatible with "port IOs required for setting planes etc."
Combuster wrote:As for error checking, BIOS calls should be setting/clearing CF, if that helps.
Yes, I was aware of that. However, it does not sound very reliable here. Ralf Brown did not list that for INT 0x10, AH=0x00.
Re: Optimising 4-bpp Modes (was: What does your OS look like
Posted: Thu Jan 10, 2013 9:52 am
by Gigasoft
Double buffering might make things slower depending on what you're drawing. If it mostly consists of solid color fills and text, and there is not much overlapping, you'll do better without it. In my system I use double buffering, but I'll probably make it optional so that it's only used by applications that need it (such as Solitaire).
16 port writes to upload an image (or 5 when left and right edges are both divisible by 8) seem to be the minimum. The latches are not useful for this kind of operation.
Re: Optimising 4-bpp Modes (was: What does your OS look like
Posted: Thu Jan 10, 2013 11:59 am
by Combuster
That wasn't too hard to beat with latches.
Code:
bit_mask = 0xff >> (left_offset % 8);
outw(VGA_GC, VGA_GC_BITMASK | bit_mask << 8);
outw(VGA_SEQ, VGA_SEQ_PLANE | 0x0100);
// for each character cell, perform read, then write bitmap data for plane 0 on left edge
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 8);
// write similarly for plane 3
bit_mask = (0xff << (right_offset % 8)) & 0xff;
outb(VGA_GC+1, bit_mask);
// for each character cell, perform read, then write bitmap data for plane *3* on *right* edge
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 1);
// write similarly for plane 0
outb(VGA_GC+1, 0xff); // restore default value. Probably useful for next round
// write all pixels not on an edge for plane 0. reads not needed
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 8);
// write similarly for plane 3
--------------------------------
13 outs needed worst case.
P.S. If you look closely at one of the tricks, you'll see that aligned writes take only 4 outs and, by extension, that the default approach costs 8 outs...
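The edge handling depends on getting the two Bit Mask values right. Below is a small sketch of that calculation under one particular convention: `x0` is the first pixel of the span and `x1` is one past the last. That convention is an assumption of this example and may differ from how `left_offset` and `right_offset` are defined in the listing above; `span_masks` and `compute_span_masks` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct span_masks {
    int     first_byte;   /* byte column containing the leftmost pixel */
    int     last_byte;    /* byte column containing the rightmost pixel */
    uint8_t left_mask;    /* Bit Mask value to use for first_byte */
    uint8_t right_mask;   /* Bit Mask value to use for last_byte */
};

/* x0 is the first pixel of the span, x1 is one past the last (x0 < x1). */
static struct span_masks compute_span_masks(int x0, int x1)
{
    struct span_masks m;
    assert(x0 >= 0 && x0 < x1);

    m.first_byte = x0 >> 3;
    m.last_byte  = (x1 - 1) >> 3;
    /* Leftmost pixel of a byte is bit 7, so drop the bits left of x0... */
    m.left_mask  = (uint8_t)(0xFF >> (x0 & 7));
    /* ...and the bits right of the last pixel. */
    m.right_mask = (uint8_t)(0xFF << (7 - ((x1 - 1) & 7)));

    if (m.first_byte == m.last_byte) {
        /* Span starts and ends in the same byte: both limits apply at once. */
        m.left_mask &= m.right_mask;
        m.right_mask = m.left_mask;
    }
    return m;
}
```

When both masks come out as 0xFF the span is fully aligned, so no latch reads are needed and only the four plane selects remain.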