Optimising 4-bpp Modes (was: What does your OS look like?)

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt, post here.
shikhin
Member
Posts: 274
Joined: Sat Oct 09, 2010 3:35 am
Libera.chat IRC: shikhin


Post by shikhin »

Hi,
farlepet wrote:(my 640x480x4bpp driver is wayyyy too slow for this as the boot screen right now)
Yeah, 4bpp planar mode is pretty difficult to make fast without all those tricky optimizations.

Regards,
Shikhin
http://shikhin.in/

Current status: Gandr.
farlepet
Posts: 13
Joined: Sun Dec 30, 2012 12:52 pm

Re: What does your OS look like? (Screen Shots..)

Post by farlepet »

By optimizations, do you mean general optimizations as in changing multiplication to shifting and things like that? Or things specific to the planar mode?
Lambda OS:
GitHub: https://github.com/farlepet
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: What does your OS look like? (Screen Shots..)

Post by bluemoon »

farlepet wrote:By optimizations, do you mean general optimizations as in changing multiplication to shifting and things like that? Or things specific to the planar mode?
That would not work very well on modern CPUs due to dependency stalls unless you heavily unroll the operation, in which case you'd be better off using SSE.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: What does your OS look like? (Screen Shots..)

Post by Combuster »

farlepet wrote:By optimizations, do you mean general optimizations as in changing multiplication to shifting and things like that? Or things specific to the planar mode?
A random putpixel in 4bpp mode costs a port OUT, which is horribly slow. An algorithm aware of this problem is designed to minimize those by combining video writes that share the same GC settings, so that you can write more than one pixel per OUT, sometimes at the cost of some other logic. In all cases it means that you essentially need to implement the same code paths as you would for hardware-accelerated graphics.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
farlepet
Posts: 13
Joined: Sun Dec 30, 2012 12:52 pm

Re: What does your OS look like? (Screen Shots..)

Post by farlepet »

Combuster wrote:A random putpixel in 4bpp mode costs a port OUT, which is horribly slow. An algorithm aware of this problem is designed to minimize those by combining video writes that share the same GC settings, so that you can write more than one pixel per OUT, sometimes at the cost of some other logic. In all cases it means that you essentially need to implement the same code paths as you would for hardware-accelerated graphics.
I was thinking of doing one color bit at a time, i.e. going through the image 4 times, taking the 1st bit the 1st time, the 2nd bit the 2nd time, etc., so I would only have 16 outb calls per image.
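The one-bit-per-pass idea can be sketched in plain C. This is a minimal sketch assuming the source image is "chunky" (one 4-bit color index per byte) and that the width is a multiple of 8; the function name and buffer layout are hypothetical. Each pass extracts one color bit and packs 8 pixels per byte with the leftmost pixel in bit 7, producing four planes stored back to back, so each plane can later go to VRAM after a single plane select.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a chunky image (one 4-bit color index per
 * byte) into four bitplanes stored back to back in 'planes'.
 * Assumes width is a multiple of 8. Plane p receives bit p of each color;
 * the leftmost pixel of each group of 8 goes into bit 7. */
void chunky_to_planar(const uint8_t *chunky, size_t width, size_t height,
                      uint8_t *planes)
{
    size_t bytes_per_line = width / 8;
    size_t plane_size = bytes_per_line * height;

    for (int p = 0; p < 4; p++) {                 /* one pass per color bit */
        uint8_t *out = planes + (size_t)p * plane_size;
        for (size_t y = 0; y < height; y++) {
            for (size_t bx = 0; bx < bytes_per_line; bx++) {
                const uint8_t *px = chunky + y * width + bx * 8;
                uint8_t b = 0;
                for (int i = 0; i < 8; i++)       /* pack 8 pixels, MSB first */
                    b |= (uint8_t)(((px[i] >> p) & 1) << (7 - i));
                out[y * bytes_per_line + bx] = b;
            }
        }
    }
}
```

The conversion can be done once when the image is loaded, which keeps the per-frame work down to four plane selects plus four block copies.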
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: What does your OS look like? (Screen Shots..)

Post by Combuster »

16 outs seems like a lot for a direct RAM-to-VRAM blit. There are four planes, and selecting a different plane only costs one out...
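To make the counting concrete, here is a host-side sketch that replaces `outb` with a hypothetical stub that only counts calls. It compares a naive per-pixel approach (one Map Mask write per plane for every pixel) with a plane-at-a-time blit that writes the Map Mask once per plane. The stub and function names are assumptions for illustration; 0x3C5 is the standard sequencer data port, and the sequencer index is assumed to already point at Map Mask (register 2).

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_SEQ_DATA 0x3C5   /* sequencer data; index assumed set to 2 (Map Mask) */

static unsigned long out_count;   /* hypothetical stub: counts port writes only */
static void outb(uint16_t port, uint8_t value) { (void)port; (void)value; out_count++; }

/* Naive approach: select each plane for every pixel (4 OUTs per pixel). */
unsigned long count_per_pixel(size_t pixels)
{
    out_count = 0;
    for (size_t i = 0; i < pixels; i++)
        for (int p = 1; p <= 8; p <<= 1)
            outb(VGA_SEQ_DATA, (uint8_t)p);   /* ...then write this pixel's bit */
    return out_count;
}

/* Plane-major approach: select each plane once, then copy the whole plane. */
unsigned long count_per_plane(size_t pixels)
{
    (void)pixels;                             /* plane size doesn't change the OUT count */
    out_count = 0;
    for (int p = 1; p <= 8; p <<= 1)
        outb(VGA_SEQ_DATA, (uint8_t)p);       /* ...then memcpy the plane */
    return out_count;
}
```

For a full 640x480 frame, the per-pixel version issues 1,228,800 OUTs while the plane-major one issues 4.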
farlepet
Posts: 13
Joined: Sun Dec 30, 2012 12:52 pm

Re: What does your OS look like? (Screen Shots..)

Post by farlepet »

Combuster wrote:16 outs seems like a lot for a direct RAM-to-VRAM blit. There are four planes, and selecting a different plane only costs one out...
I may be using the wrong code then, because when I set a plane, I use this:


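// GC register 4 = Read Map Select, sequencer register 2 = Map Mask
static inline void set_plane(int p)
{
	p &= 3;
	outb(VGA_GC_INDEX, 4);       // select Read Map Select
	outb(VGA_GC_DATA, p);        // plane to read from
	outb(VGA_SEQ_INDEX, 2);      // select Map Mask
	outb(VGA_SEQ_DATA, 1 << p);  // plane(s) to write to
}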
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: What does your OS look like? (Screen Shots..)

Post by Brendan »

Hi,
farlepet wrote:I may be using the wrong code then, because when I set a plane, I use this:


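// GC register 4 = Read Map Select, sequencer register 2 = Map Mask
static inline void set_plane(int p)
{
	p &= 3;
	outb(VGA_GC_INDEX, 4);       // select Read Map Select
	outb(VGA_GC_DATA, p);        // plane to read from
	outb(VGA_SEQ_INDEX, 2);      // select Map Mask
	outb(VGA_SEQ_DATA, 1 << p);  // plane(s) to write to
}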
You should never need to read from display memory (reads are very slow, and if you actually need to read at all then it's always better/faster to read from a buffer in RAM); so the first 2 "OUT"s aren't needed.

For the second pair of "OUT"s, you can do "outb(VGA_SEQ_INDEX, 2);" once (e.g. just after setting the video mode) and it will stay set. You don't need to set it each time you change planes.

Basically, blitting your buffer to display memory may become:


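#include <stdint.h>
#include <string.h>

void blit_4bpp(const void *src) {
    const uint8_t *plane_src = src;
    size_t plane_size = (size_t)bytesPerLine * verticalResolution;
    int p;

    for(p = 1; p <= 8; p <<= 1) {
        outb(VGA_SEQ_DATA, p);                              // Select the plane
        memcpy(videoDisplayMemory, plane_src, plane_size);  // Blit the plane
        plane_src += plane_size;                            // Next plane in the RAM buffer
    }
}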
Of course this assumes that the "bytesPerLine" is the same as "bytesBetweenLines" (which is the case for VGA modes, but may not be the case for VBE modes).

A more complete blit might be:


#include <stdint.h>
#include <string.h>

void blit_4bpp(const void *src) {
    const uint8_t *s = src;
    uint8_t *dest;
    int p, y;

    for(p = 1; p <= 8; p <<= 1) {
        outb(VGA_SEQ_DATA, p);                    // Select the plane
        dest = videoDisplayMemory;
        for(y = 0; y < verticalResolution; y++) {
            if(lineChangedFlags[y] != 0) {
                // Horizontal line of pixels did change since last time
                memcpy(dest, s, bytesPerLine);
            }
            // Advance past this line whether or not it was copied
            s += bytesPerLine;
            dest += bytesBetweenLines;
        }
    }
    for(y = 0; y < verticalResolution; y++) {
        lineChangedFlags[y] = 0;
    }
}
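The second version relies on something setting `lineChangedFlags[]`. A minimal sketch of the drawing side, assuming a 640x480 mode and a RAM buffer laid out plane after plane, line after line, the way the blit above consumes it; all names other than `lineChangedFlags` are hypothetical. Drawing touches only system RAM and marks the affected scanline, so no port I/O or VRAM reads happen until the blit.

```c
#include <stddef.h>
#include <stdint.h>

#define WIDTH          640
#define HEIGHT         480
#define BYTES_PER_LINE (WIDTH / 8)
#define PLANE_SIZE     ((size_t)BYTES_PER_LINE * HEIGHT)

/* Hypothetical back buffer in system RAM: four planes stored back to back,
 * plus one dirty flag per scanline. */
static uint8_t backbuffer[4 * PLANE_SIZE];
static uint8_t lineChangedFlags[HEIGHT];

/* Set one pixel in the RAM buffer and mark its scanline dirty. */
void put_pixel_buffered(int x, int y, uint8_t color)
{
    if (x < 0 || y < 0 || x >= WIDTH || y >= HEIGHT)
        return;                                     /* clip */

    size_t offset = (size_t)y * BYTES_PER_LINE + (size_t)x / 8;
    uint8_t bit = (uint8_t)(0x80 >> (x & 7));       /* bit 7 = leftmost pixel */

    for (size_t p = 0; p < 4; p++) {
        uint8_t *byte = &backbuffer[p * PLANE_SIZE + offset];
        if (color & (1u << p))
            *byte |= bit;
        else
            *byte &= (uint8_t)~bit;
    }
    lineChangedFlags[y] = 1;                        /* the blit will copy this line */
}
```

The read-modify-write here is cheap because it hits system RAM rather than VRAM.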

Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
farlepet
Posts: 13
Joined: Sun Dec 30, 2012 12:52 pm

Re: What does your OS look like? (Screen Shots..)

Post by farlepet »

Sorry if I should have created a new topic for this; tell me if I should...
Brendan wrote:You should never need to read from display memory (reads are very slow, and if you actually need to read at all then it's always better/faster to read from a buffer in RAM); so the first 2 "OUT"s aren't needed.

For the second pair of "OUT"s, you can do "outb(VGA_SEQ_INDEX, 2);" once (e.g. just after setting the video mode) and it will stay set. You don't need to set it each time you change planes.
I tried doing this:


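#include <stdint.h>

// Read-modify-write one pixel bit; bit index 0 is the leftmost pixel (mask 0x80)
#define do_vga_bit(beg, bit, val) \
	((val) ? (*(beg) |= (0x80 >> (bit))) : (*(beg) &= ~(0x80 >> (bit))))

void _4BPP(int x, int y, int c)
{
	if(x < 0 || y < 0 || x >= VGA_width || y >= VGA_height) return;
	int pix = x & 0x07; x >>= 3;   // bit within the byte, then byte column
	// VGA_width >> 3 bytes per scanline (80 for a 640-pixel-wide mode)
	volatile uint8_t *loc = (volatile uint8_t *)(0xA0000 + x + y * (VGA_width >> 3));
	// Each |= / &= reads VRAM first, so the result depends on the read plane
	outb(VGA_SEQ_DATA, 1); do_vga_bit(loc, pix, c & 1);
	outb(VGA_SEQ_DATA, 2); do_vga_bit(loc, pix, c & 2);
	outb(VGA_SEQ_DATA, 4); do_vga_bit(loc, pix, c & 4);
	outb(VGA_SEQ_DATA, 8); do_vga_bit(loc, pix, c & 8);
}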
but I seem to need to set the read plane for it to work (probably because I'm AND-ing/OR-ing the existing memory; I don't know any other way apart from a buffer...). When displaying the BMP, it gives me this instead:
https://www.dropbox.com/s/d12qcobhhry47 ... 1%3A34.png

Is there another way to do this?
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Optimising 4-bpp Modes (was: What does your OS look like?)

Post by Combuster »

Well, an OR/AND involves a read from VRAM and is thus suboptimal, especially when you use it in cases where there's no reason to know the original content, e.g. when you are setting all 8 consecutive pixels of a character cell.
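When all 8 pixels of a byte are being replaced, the four plane bytes can be computed from the glyph row alone, so no VRAM read is needed. A minimal sketch, assuming a 1-bit-per-pixel font row (bit 7 = leftmost pixel) and a foreground/background color pair; the function name is hypothetical. For each plane, a pixel's bit comes from the foreground color where the glyph bit is set and from the background color elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: compute the four plane bytes for one 8-pixel
 * character-cell row. Set glyph bits take the foreground color,
 * clear bits take the background color. */
void glyph_row_to_planes(uint8_t glyph_row, uint8_t fg, uint8_t bg,
                         uint8_t planes[4])
{
    for (int p = 0; p < 4; p++) {
        uint8_t fg_bits = (fg & (1 << p)) ? 0xFF : 0x00;   /* plane p of fg, spread */
        uint8_t bg_bits = (bg & (1 << p)) ? 0xFF : 0x00;   /* plane p of bg, spread */
        planes[p] = (uint8_t)((glyph_row & fg_bits) | (~glyph_row & bg_bits));
    }
}
```

Each `planes[p]` can then be stored with a plain write after selecting plane p in the Map Mask, with no `|=` or `&=` on VRAM.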

In the remaining case, using the VGA latch mechanics can save you from selecting the read plane. Or you can make life easy and double buffer in system RAM and only send dirty blocks to the VGA card.
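The latch trick relies on how write mode 0 combines CPU data with the latches: any read from VRAM loads all four planes into the latches, and on the next write, bits enabled in the Graphics Controller's Bit Mask register come from the CPU while the rest come from the latches, for each plane enabled in the Map Mask. The model below is a host-side simulation of that behaviour only, assuming set/reset, data rotation and the logical-operation function are disabled; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a VGA in write mode 0 (set/reset, rotate and
 * ALU function disabled). Not hardware code. */
struct vga_model {
    uint8_t plane[4][64];   /* tiny stand-in for the four VRAM planes */
    uint8_t latch[4];       /* loaded by every CPU read */
    uint8_t map_mask;       /* sequencer reg 2: planes that accept writes */
    uint8_t bit_mask;       /* GC reg 8: bits taken from the CPU byte */
};

/* A CPU read fills all four latches at once; the byte returned to the CPU
 * (chosen by Read Map Select) is irrelevant to the trick and not modelled. */
void vga_read(struct vga_model *v, size_t offset)
{
    for (int p = 0; p < 4; p++)
        v->latch[p] = v->plane[p][offset];
}

/* A CPU write: masked bits come from the CPU byte, the rest from the latches. */
void vga_write(struct vga_model *v, size_t offset, uint8_t data)
{
    for (int p = 0; p < 4; p++) {
        if (v->map_mask & (1 << p))
            v->plane[p][offset] = (uint8_t)((data & v->bit_mask) |
                                            (v->latch[p] & ~v->bit_mask));
    }
}
```

Because the preserved pixels travel through the latches rather than through the CPU, one dummy read per byte is enough and the Read Map Select register never has to change.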

Bottom line, optimizing graphics is an art of its own :wink: (Try learning it from the master)
Antti
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: Optimising 4-bpp Modes (was: What does your OS look like?)

Post by Antti »

I have a little question that does not need another thread. Currently I have planned something like this (BIOS booting):


if vbe extensions available
	if suitable mode found
		jump forward (everything perfectly OK)
	else suitable mode not found AND hardware level VGA compatibility
		jump set_12h
else assume hardware level VGA compatibility

set_12h: (640x480x16)
try to set 12h (BIOS function)
if not failed (how to check?)
	jump forward (everything OK, use "VGA registers and planes")

set_11h: (640x480x1)
try to set 11h (BIOS function)
if not failed (how to check?)
	jump forward (everything OK, just use "A0000 LFB")

forward:
Video mode is set. Start the correct "driver" that can do "character draws, line draws, putpixel etc."
Are there any pitfalls that should be taken into account? At least I should develop a better way to do pseudocode.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Optimising 4-bpp Modes (was: What does your OS look like?)

Post by Combuster »

Well the first conditional block as written is equivalent to an if(vbe) use_vbe(); else always_assume_vga();, which is probably different from what you actually had intended it to be...
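Transcribing the pseudocode's control flow literally into C makes the point visible: when VBE is present but no suitable mode is found, execution reaches the `set_12h` label whether or not the VGA-compatibility check passed. The enum, function and parameter names are hypothetical; `vga_compatible` stands in for whatever hardware check is used, and the 11h fallback is folded into the 12h path.

```c
#include <stdbool.h>

enum video_path { PATH_VBE, PATH_MODE_12H };

/* Literal transcription of the pseudocode's control flow. */
enum video_path choose_video_path(bool vbe_available, bool suitable_mode_found,
                                  bool vga_compatible)
{
    if (vbe_available) {
        if (suitable_mode_found)
            return PATH_VBE;            /* jump forward */
        else if (vga_compatible)
            goto set_12h;               /* jump set_12h */
        /* no branch for "not VGA compatible": execution continues below */
    }
    /* else: assume hardware-level VGA compatibility */

set_12h:
    return PATH_MODE_12H;               /* try 12h, then 11h */
}
```

The result depends only on whether VBE is present with a suitable mode; `vga_compatible` never changes it.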

Right now my setup looks for class code 03:00:00, and if no PCI video controllers exist then it'll try probing a few registers. I have yet to observe an instance where the alternative path is taken - I don't have a truly headless system, nor do I have PCI video equipment that is non-VGA compatible, so in all cases the VGA driver can get loaded if none of my other drivers take precedence.

As for error checking, BIOS calls should be setting/clearing CF, if that helps.
Antti
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: Optimising 4-bpp Modes (was: What does your OS look like?)

Post by Antti »

Combuster wrote:Well the first conditional block as written is equivalent to an if(vbe) use_vbe(); else always_assume_vga();, which is probably different from what you actually had intended it to be...
I wrote it like I intended it to be. If there is no VBE, I assume that the hardware is compatible with "port IOs required for setting planes etc."
Combuster wrote:As for error checking, BIOS calls should be setting/clearing CF, if that helps.
Yes, I was aware of that. However, it does not sound very reliable here. Ralf Brown did not list that for INT 0x10, AH=0x00.
Gigasoft
Member
Posts: 856
Joined: Sat Nov 21, 2009 5:11 pm

Re: Optimising 4-bpp Modes (was: What does your OS look like?)

Post by Gigasoft »

Double buffering might make things slower depending on what you're drawing. If it mostly consists of solid color fills and text, and there is not much overlapping, you'll do better without it. In my system I use double buffering, but I'll probably make it optional so that it's only used by applications that need it (such as Solitaire).

16 port writes to upload an image (or 5 when left and right edges are both divisible by 8) seem to be the minimum. The latches are not useful for this kind of operation.
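The "divisible by 8" case is about partial bytes at the edges of a horizontal span. A minimal sketch of computing the Bit Mask values for the first and last bytes of a pixel span `[x0, x1)`, assuming bit 7 is the leftmost pixel of each byte as in the planar modes; the function name is hypothetical. When both edges are multiples of 8, both masks are 0xFF, so every byte is fully covered and no masking or latch reads are needed.

```c
#include <stdint.h>

/* Hypothetical helper: Bit Mask values for the first and last bytes of the
 * pixel span [x0, x1), with x0 < x1. Bit 7 is the leftmost pixel.
 * A mask of 0xFF means that edge byte is fully covered. */
void span_edge_masks(unsigned x0, unsigned x1, uint8_t *left, uint8_t *right)
{
    *left  = (uint8_t)(0xFF >> (x0 % 8));                        /* pixels x0.. */
    *right = (x1 % 8) ? (uint8_t)(0xFF << (8 - x1 % 8)) : 0xFF;  /* ..x1-1 */

    if (x0 / 8 == (x1 - 1) / 8) {   /* span fits inside a single byte */
        *left &= *right;
        *right = *left;
    }
}
```

For example, the span [3, 13) yields a left mask of 0x1F and a right mask of 0xF8, while [16, 32) yields 0xFF for both.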
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Optimising 4-bpp Modes (was: What does your OS look like?)

Post by Combuster »

That wasn't too hard to beat with latches.


bit_mask = 0xff >> (left_offset % 8);
outw(VGA_GC, VGA_GC_BITMASK | bit_mask << 8);   // Bit Mask for the left edge
outw(VGA_SEQ, VGA_SEQ_PLANE | 0x0100);          // Map Mask = plane 0
// for each character cell, perform read (loads latches), then write bitmap data for plane 0 on left edge
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 8);
// write similarly for plane 3
bit_mask = (0xff << (right_offset % 8)) & 0xff;
outb(VGA_GC+1, bit_mask);                       // Bit Mask for the right edge
// for each character cell, perform read, then write bitmap data for plane *3* on *right* edge
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 1);
// write similarly for plane 0
outb(VGA_GC+1, 0xff); // restore default value. Probably useful for next round
// write all pixels not on an edge for plane 0 (still selected). reads not needed
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 8);
// write similarly for plane 3
--------------------------------
13 OUTs needed in the worst case.
P.S. If you look closely at one of the tricks, you'll see that aligned writes only take 4 OUTs and, by extension, that the default case should cost only 8 OUTs... :wink: