farlepet wrote: (my 640x480x4bpp driver is wayyyy too slow for this as the boot screen right now)
Yeah, 4bpp planar mode is pretty difficult to get fast without all those tricky optimizations.
Regards,
Shikhin
farlepet wrote: By optimizations, do you mean general optimizations, as in changing multiplication to shifting and things like that? Or things specific to the planar mode?
That would not work very well on a modern CPU due to dependency stalls, unless you greatly unroll the operation, in which case you would rather use SSE.
farlepet wrote: By optimizations, do you mean general optimizations, as in changing multiplication to shifting and things like that? Or things specific to the planar mode?
A random putpixel in 4bpp mode costs a port OUT, which is horribly slow. An algorithm aware of this problem is designed to minimize those by combining the video writes that share the same GC settings, so that you can write more than one pixel per OUT, sometimes at the cost of some other logic. In all cases it means that you essentially need to implement the same code paths as you would for hardware-accelerated graphics.
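To make the cost difference concrete, here is a minimal sketch that counts plane selects for a per-pixel approach versus a batched fill of the same run. It does not touch real hardware: `plane_mem`, `select_planes`, `vram_write` and the other names are hypothetical stand-ins that emulate the map mask in ordinary RAM, and the run is assumed to be byte-aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical emulation of planar VRAM so the sketch runs anywhere. */
enum { BYTES_PER_LINE = 80, LINES = 480 };

static uint8_t plane_mem[4][BYTES_PER_LINE * LINES];
static uint8_t map_mask = 0x0F;   /* planes currently enabled for writes */
static unsigned out_count = 0;    /* number of simulated port OUTs */

/* Stand-in for outb(VGA_SEQ_DATA, mask): counts every plane select. */
static void select_planes(uint8_t mask) {
    map_mask = mask;
    out_count++;
}

/* Emulates a CPU byte write: the byte lands in every enabled plane. */
static void vram_write(unsigned offset, uint8_t value) {
    assert(offset < BYTES_PER_LINE * LINES);
    for (int p = 0; p < 4; p++)
        if (map_mask & (1u << p))
            plane_mem[p][offset] = value;
}

/* Naive putpixel: four plane selects for every single pixel. */
static void putpixel(unsigned x, unsigned y, uint8_t color) {
    unsigned offset = y * BYTES_PER_LINE + (x >> 3);
    uint8_t bit = (uint8_t)(0x80 >> (x & 7));
    for (int p = 0; p < 4; p++) {
        uint8_t old = plane_mem[p][offset];   /* read from the RAM copy */
        select_planes((uint8_t)(1u << p));
        vram_write(offset, ((color >> p) & 1) ? (uint8_t)(old | bit)
                                              : (uint8_t)(old & ~bit));
    }
}

/* Fills nbytes*8 pixels one pixel at a time. */
static void naive_run(unsigned x_byte, unsigned y, unsigned nbytes, uint8_t color) {
    for (unsigned x = x_byte * 8; x < (x_byte + nbytes) * 8; x++)
        putpixel(x, y, color);
}

/* Batched: select each plane once, then write the whole run. */
static void batched_run(unsigned x_byte, unsigned y, unsigned nbytes, uint8_t color) {
    unsigned offset = y * BYTES_PER_LINE + x_byte;
    for (int p = 0; p < 4; p++) {
        select_planes((uint8_t)(1u << p));
        for (unsigned i = 0; i < nbytes; i++)
            vram_write(offset + i, ((color >> p) & 1) ? 0xFF : 0x00);
    }
}
```

For a 64-pixel run, `naive_run` performs 256 simulated OUTs while `batched_run` performs 4, and both leave identical plane contents.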
Combuster wrote: A random putpixel in 4bpp mode costs a port OUT, which is horribly slow. An algorithm aware of this problem is designed to minimize those by combining the video writes that share the same GC settings, so that you can write more than one pixel per OUT, sometimes at the cost of some other logic. In all cases it means that you essentially need to implement the same code paths as you would for hardware-accelerated graphics.
I was thinking of doing one color bit at a time: going through the image 4 times, taking the 1st bit the 1st time, the 2nd bit the second time, etc., so I would only have 16 outb's per image.
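The "one color bit per pass" idea can be sketched as a small conversion routine that pulls a single bit out of every pixel and packs the results into a 1bpp plane. This is only an illustration: `extract_plane` is a hypothetical name, and the source format (one pixel per byte, values 0..15, pixel count a multiple of 8, leftmost pixel in the most significant bit) is an assumption rather than anything stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs bit `bit` (0..3) of each pixel into plane_out, 8 pixels per byte.
   Assumes one pixel per input byte and count % 8 == 0. */
static void extract_plane(const uint8_t *pixels, size_t count, int bit,
                          uint8_t *plane_out) {
    assert(count % 8 == 0);
    assert(bit >= 0 && bit < 4);
    for (size_t i = 0; i < count; i += 8) {
        uint8_t packed = 0;
        for (int j = 0; j < 8; j++)   /* leftmost pixel ends up in bit 7 */
            packed = (uint8_t)((packed << 1) | ((pixels[i + j] >> bit) & 1));
        plane_out[i / 8] = packed;
    }
}
```

Calling it four times, once per bit, yields the four planes; each result can then be copied to display memory after a single plane select.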
Combuster wrote: 16 outs seems like a lot for a direct ram-to-vram blit. There are four planes, and selecting a different plane only costs one out...
I may be using the wrong code then, because when I set a plane, I use this:
Code:
inline void set_plane(int p)
{
    p &= 3;
    outb(VGA_GC_INDEX, 4);       // Read Map Select: plane used for reads
    outb(VGA_GC_DATA, p);
    outb(VGA_SEQ_INDEX, 2);      // Map Mask: plane(s) enabled for writes
    outb(VGA_SEQ_DATA, 1 << p);
}
farlepet wrote: I may be using the wrong code then, because when I set a plane, I use this:
Code:
inline void set_plane(int p)
{
    p &= 3;
    outb(VGA_GC_INDEX, 4);
    outb(VGA_GC_DATA, p);
    outb(VGA_SEQ_INDEX, 2);
    outb(VGA_SEQ_DATA, 1 << p);
}
You should never need to read from display memory (reads are very slow, and if you actually need to read at all then it's always better/faster to read from a buffer in RAM); so the first 2 "OUT"s aren't needed.
For the second pair of "OUT"s, you can do "outb(VGA_SEQ_INDEX, 2);" once (e.g. just after setting the video mode) and it will stay set. You don't need to set it each time you change planes.
Code:
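void blit_4bpp(void *src) {
    const char *s = src;
    int p;
    for(p = 1; p <= 8; p <<= 1) { // p = 1, 2, 4, 8: one map mask bit per plane
        outb(VGA_SEQ_DATA, p); // Select the plane
        memcpy(videoDisplayMemory, s, bytesPerLine*verticalResolution); // Blit the plane
        s += bytesPerLine*verticalResolution; // Next plane (assumes src holds the four planes back to back)
    }
}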
Code:
void blit_4bpp(void *src) {
    const char *s = src;
    char *dest;
    int p;
    int y;
    for(p = 1; p <= 8; p <<= 1) { // p = 1, 2, 4, 8: one map mask bit per plane
        outb(VGA_SEQ_DATA, p); // Select the plane
        dest = (char *)videoDisplayMemory;
        for(y = 0; y < verticalResolution; y++) {
            if(lineChangedFlags[y] != 0) {
                // Horizontal line of pixels did change since last time
                memcpy(dest, s, bytesPerLine);
            }
            // Advance for every line, changed or not, so rows stay aligned
            s += bytesPerLine;
            dest += bytesBetweenLines;
        }
    }
    for(y = 0; y < verticalResolution; y++) {
        lineChangedFlags[y] = 0;
    }
}
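The blit routines above select a plane with a single write to VGA_SEQ_DATA. That only works if the Sequencer index register already points at the Map Mask register (index 2). A minimal sketch of that arrangement follows; the `outb` here is a recording stand-in so the code runs outside a kernel, `vga_planes_init` and `set_write_plane` are hypothetical names, and the port numbers are the standard VGA Sequencer ports.

```c
#include <assert.h>
#include <stdint.h>

#define VGA_SEQ_INDEX 0x3C4   /* standard VGA Sequencer index port */
#define VGA_SEQ_DATA  0x3C5   /* standard VGA Sequencer data port */

/* Stand-in for the real port write: records the last OUT and counts them. */
static uint16_t last_port;
static uint8_t last_value;
static unsigned out_count;

static void outb(uint16_t port, uint8_t value) {
    last_port = port;
    last_value = value;
    out_count++;
}

/* Call once, e.g. right after setting the video mode. */
static void vga_planes_init(void) {
    outb(VGA_SEQ_INDEX, 2);   /* index 2 = Map Mask register */
}

/* One OUT per plane change, relying on the index set by vga_planes_init. */
static void set_write_plane(int p) {
    assert(p >= 0 && p < 4);
    outb(VGA_SEQ_DATA, (uint8_t)(1u << p));
}
```

This relies on nothing else changing the Sequencer index in between; any code that does so would have to restore index 2 afterwards.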
Brendan wrote: You should never need to read from display memory (reads are very slow, and if you actually need to read at all then it's always better/faster to read from a buffer in RAM); so the first 2 "OUT"s aren't needed.
For the second pair of "OUT"s, you can do "outb(VGA_SEQ_INDEX, 2);" once (e.g. just after setting the video mode) and it will stay set. You don't need to set it each time you change planes.
I tried doing this:
Code:
#define do_vga_bit(beg, bit, val) ((val) ? (*(beg) = (*(beg) | (0x80 >> (bit)))) : (*(beg) = (*(beg) & ~(0x80 >> (bit)))))
void _4BPP(int x, int y, int c)
{
    if(x < 0 || y < 0 || x >= VGA_width || y >= VGA_height) return;
    int pix = x & 0x07; x >>= 3;
    volatile unsigned char *loc = (volatile unsigned char *)(0xA0000 + x + (y * (90/*VGA_width >> 3*/)));
    outb(VGA_SEQ_DATA, 1); do_vga_bit(loc, pix, c & 1);
    outb(VGA_SEQ_DATA, 2); do_vga_bit(loc, pix, c & 2);
    outb(VGA_SEQ_DATA, 4); do_vga_bit(loc, pix, c & 4);
    outb(VGA_SEQ_DATA, 8); do_vga_bit(loc, pix, c & 8);
}
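Note that do_vga_bit reads display memory before writing it, and VGA reads are served from the plane chosen by the Read Map Select register, not from the plane enabled in the Map Mask. Following the earlier advice to read from a buffer in RAM instead, a putpixel could update a shadow copy and flag the line as changed, leaving the actual transfer to a blit routine. The sketch below assumes a 640x480 mode and a shadow buffer that stores the four planes back to back; `shadow`, `line_changed` and `shadow_putpixel` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum { WIDTH = 640, HEIGHT = 480, BYTES_PER_LINE = WIDTH / 8 };

/* Shadow copy in RAM: four planes stored back to back. */
static uint8_t shadow[4][BYTES_PER_LINE * HEIGHT];
/* One flag per scan line, set when any pixel on that line changes. */
static uint8_t line_changed[HEIGHT];

static void shadow_putpixel(int x, int y, int c) {
    if (x < 0 || y < 0 || x >= WIDTH || y >= HEIGHT)
        return;
    unsigned offset = (unsigned)y * BYTES_PER_LINE + ((unsigned)x >> 3);
    uint8_t bit = (uint8_t)(0x80 >> (x & 7));
    assert(offset < sizeof shadow[0]);
    for (int p = 0; p < 4; p++) {
        if (c & (1 << p))
            shadow[p][offset] |= bit;             /* set this plane's bit */
        else
            shadow[p][offset] &= (uint8_t)~bit;   /* clear this plane's bit */
    }
    line_changed[y] = 1;   /* no port I/O here; the blit does that later */
}
```

With this split, drawing costs no OUTs at all, and a later blit pays four plane selects per frame regardless of how many pixels were drawn.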
Code:
if vbe extensions available
    if suitable mode found
        jump forward (everything perfectly OK)
    else suitable mode not found AND hardware level VGA compatibility
        jump set_12h
else assume hardware level VGA compatibility
set_12h: (640x480x16)
    try to set 12h (BIOS function)
    if not failed (how to check?)
        jump forward (everything OK, use "VGA registers and planes")
set_11h: (640x480x1)
    try to set 11h (BIOS function)
    if not failed (how to check?)
        jump forward (everything OK, just use "A0000 LFB")
forward:
    Video mode is set. Start the correct "driver" that can do "character draws, line draws, putpixel etc."
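The fallback chain in the pseudocode can be sketched in C as below. Everything here is hypothetical: the `video_ops` hooks stand in for the real VBE and BIOS calls, and how `bios_set_mode` decides whether a mode was actually set is left open, just as in the pseudocode. Reading the current mode back with INT 0x10, AH=0x0F is one possibility, but that is an assumption and not something taken from the thread. The stub functions exist only so the chain can be exercised without a BIOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DRIVER_NONE,
    DRIVER_VBE_LFB,      /* VBE mode found and set */
    DRIVER_VGA_PLANAR,   /* mode 12h: VGA registers and planes */
    DRIVER_VGA_MONO      /* mode 11h: plain frame buffer at A0000 */
} video_driver;

/* Hypothetical hooks wrapping the firmware calls. */
typedef struct {
    bool (*vbe_available)(void);
    bool (*vbe_set_suitable_mode)(void);
    bool (*bios_set_mode)(unsigned char mode);   /* true if the mode took effect */
} video_ops;

static video_driver choose_video_driver(const video_ops *ops) {
    assert(ops != NULL);
    if (ops->vbe_available() && ops->vbe_set_suitable_mode())
        return DRIVER_VBE_LFB;
    /* No VBE, or no suitable VBE mode: assume VGA compatibility. */
    if (ops->bios_set_mode(0x12))
        return DRIVER_VGA_PLANAR;
    if (ops->bios_set_mode(0x11))
        return DRIVER_VGA_MONO;
    return DRIVER_NONE;
}

/* Stubs for illustration only. */
static bool stub_true(void) { return true; }
static bool stub_false(void) { return false; }
static bool stub_any_mode(unsigned char mode) { (void)mode; return true; }
static bool stub_only_11h(unsigned char mode) { return mode == 0x11; }
```

The returned value tells the caller which "driver" to start for character draws, line draws, putpixel and so on.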
Combuster wrote: Well, the first conditional block as written is equivalent to an if(vbe) use_vbe(); else always_assume_vga();, which is probably different from what you actually had intended it to be...
I wrote it as I intended it to be. If there is no VBE, I assume that the hardware is compatible with the "port IOs required for setting planes etc."
Combuster wrote: As for error checking, BIOS calls should be setting/clearing CF, if that helps.
Yes, I was aware of that. However, it does not sound very reliable here. Ralf Brown did not list that for INT 0x10, AH=0x00.
Code:
bit_mask = 0xff >> (left_offset % 8);
outw(VGA_GC, VGA_GC_BITMASK | (bit_mask << 8));
outw(VGA_SEQ, VGA_SEQ_PLANE | 0x0100);
// for each character cell, perform read, then write bitmap data for plane 0 on left edge
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 8);
// write similarly for plane 3
bit_mask = (0xff << (right_offset % 8)) & 0xff;
outb(VGA_GC+1, bit_mask);
// for each character cell, perform read, then write bitmap data for plane *3* on *right* edge
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 1);
// write similarly for plane 0
outb(VGA_GC+1, 0xff); // restore default value. Probably useful for next round
// write all pixels not on an edge for plane 0. reads not needed
outb(VGA_SEQ+1, 2);
// write similarly for plane 1
outb(VGA_SEQ+1, 4);
// write similarly for plane 2
outb(VGA_SEQ+1, 8);
// write similarly for plane 3
--------------------------------
13 outs needed worst case.
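The edge handling above depends on two bit masks, one for the partially covered byte on the left and one for the byte on the right. A small sketch of computing them for a pixel span follows. The names are hypothetical, and the convention is an assumption: `x0` and `x1` are the first and last pixel columns (inclusive), with the leftmost pixel of each byte in bit 7. The thread does not define `left_offset` and `right_offset` precisely, so the right-edge shift here is expressed in terms of `x1` rather than copied from the pseudocode.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned first_byte;   /* byte column containing x0 */
    unsigned last_byte;    /* byte column containing x1 */
    uint8_t left_mask;     /* bits to modify in first_byte */
    uint8_t right_mask;    /* bits to modify in last_byte */
} span_masks;

/* Computes edge masks for the inclusive pixel span x0..x1. */
static span_masks compute_span_masks(unsigned x0, unsigned x1) {
    span_masks m;
    assert(x0 <= x1);
    m.first_byte = x0 >> 3;
    m.last_byte = x1 >> 3;
    m.left_mask = (uint8_t)(0xFF >> (x0 & 7));          /* x0 and everything right of it */
    m.right_mask = (uint8_t)(0xFF << (7 - (x1 & 7)));   /* x1 and everything left of it */
    if (m.first_byte == m.last_byte) {
        /* Span lives in a single byte: both edges collapse into one mask. */
        m.left_mask &= m.right_mask;
        m.right_mask = m.left_mask;
    }
    return m;
}
```

Bytes strictly between `first_byte` and `last_byte` are fully covered, which is why the interior pass can run with the bit mask restored to 0xff and no reads.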