Hi,
lama wrote:So I'm a bit confused here.
I tried the off-screen buffer, without reading the frame buffer, just writing to it. I can't help feeling that RAM and frame buffer access speeds are equal, so it is still insanely slow.
First, nothing guarantees that display memory is contiguous. For example, there might be 2048 pixels per line where 1280 of them are visible and the remaining 768 pixels are just padding. VBE gives you a "bytes_between_lines" value (BytesPerScanLine in the mode information block) for this reason, and you should be using it, for example:
Code: Select all
dest_address = address for start of display memory in video card
src_address = address of your buffer in RAM
for(line = 0; line < lastLine; line++) {
    memcpy(dest_address, src_address, 1280*2);   // copy the visible pixels only
    src_address += 1280*2;                       // buffer in RAM has no padding
    dest_address += bytes_between_lines;         // skip any padding in display memory
}
The alternative (to make sure your code works on all video cards, rather than just yours, while only doing one copy where possible) is to do something like:
Code: Select all
dest_address = address for start of display memory in video card
src_address = address of your buffer in RAM
if(bytes_between_lines == 1280*2) {
    // No padding, so everything can be copied in one go
    memcpy(dest_address, src_address, 1280*2 * lastLine);
} else {
    for(line = 0; line < lastLine; line++) {
        memcpy(dest_address, src_address, 1280*2);
        src_address += 1280*2;
        dest_address += bytes_between_lines;
    }
}
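As a compilable version of the same idea, here is a minimal sketch. The function name and parameters are made up for illustration, "pitch" stands for the bytes-between-lines value from VBE, and it assumes you have a working memcpy() and a buffer in RAM with no padding between rows:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a tightly packed buffer in RAM to display memory whose rows may be
 * padded. All names here are hypothetical; "pitch" is the bytes-between-lines
 * value for the current video mode. */
static void blit_to_display(uint8_t *display, const uint8_t *buffer,
                            size_t row_bytes, size_t pitch, size_t rows)
{
    if (pitch == row_bytes) {
        /* No padding: the visible rows are contiguous, so one copy is enough. */
        memcpy(display, buffer, row_bytes * rows);
        return;
    }
    for (size_t y = 0; y < rows; y++) {
        /* Copy only the visible bytes of this row, stepping over the padding. */
        memcpy(display + y * pitch, buffer + y * row_bytes, row_bytes);
    }
}
```

For the 1280 pixel wide, 16 bits per pixel mode above, row_bytes would be 1280*2 and pitch would be whatever the video card reports.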
Once that's fixed, the next thing I'd do is find out which pieces are causing the biggest problems. There's no point worrying about the code that copies pixels to display memory if 90% of the time is spent drawing characters in an insanely slow ("for(y2 = 0; y2 < 16; y2++) { for(x2 = 0; x2 < 8; x2++) { if(something) put_pixel(x1 + x2, y1 + y2, colour); } }") nested loop. One trick I use is to set a group of pixels in the top left corner of the screen to different colours depending on what my code is doing. For example, you might set the top left pixels to white when drawing characters in the buffer, set the top left pixels to red when scrolling the buffer and then set them to yellow when blitting from the buffer to display memory. By watching those top left pixels you can get a fairly good idea how much time it's spending where - if they're white most of the time then....
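A rough sketch of that coloured marker trick is below. The function name, the phase names and the 8x8 marker size are all made up for illustration, and the colour values assume a 16 bits per pixel mode with 5:6:5 red/green/blue fields (check what your video mode actually reports):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker colours, assuming 16 bpp with 5:6:5 RGB fields. */
enum {
    PHASE_DRAW_TEXT = 0xFFFF, /* white:  drawing characters in the buffer */
    PHASE_SCROLL    = 0xF800, /* red:    scrolling the buffer */
    PHASE_BLIT      = 0xFFE0  /* yellow: blitting to display memory */
};

/* Paint an 8x8 block in the top left corner of display memory.
 * "pitch" is the bytes-between-lines value for the video mode. */
static void mark_phase(volatile uint8_t *display, size_t pitch, uint16_t colour)
{
    for (size_t y = 0; y < 8; y++) {
        volatile uint16_t *row = (volatile uint16_t *)(display + y * pitch);
        for (size_t x = 0; x < 8; x++)
            row[x] = colour;
    }
}
```

You'd call something like mark_phase(display, pitch, PHASE_SCROLL) at the start of the scrolling code, and so on for the other pieces.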
Note: The fastest way to draw characters in graphics modes is with bitmasks - e.g. "pixelMaskForRow = lookupTable[fontDataForRow]; newPixelData = (oldPixelData & ~pixelMaskForRow) | (colour & pixelMaskForRow);" (note that both operators are bitwise). You want to try to do as many pixels at the same time as possible - e.g. in 32-bit code at 16 bits per pixel, you could use EAX and EDX together as a 64-bit mask and do 4 pixels at a time.
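To make the bitmask idea concrete, here is a minimal sketch for one row of an 8 pixel wide character. It assumes 16 bits per pixel, a little-endian CPU (like 80x86), and a font where the highest bit of each byte is the leftmost pixel. The function names and the 16 entry table (one entry per 4 pixels) are my own choices for illustration, and "old" pixel data should come from the buffer in RAM, not from display memory:

```c
#include <stdint.h>
#include <string.h>

/* Expand 4 font bits into a 64-bit mask covering four 16 bpp pixels.
 * Bit 3 is the leftmost pixel, which is at the lowest address (the low
 * 16 bits of the 64-bit value on a little-endian CPU). */
static uint64_t nibble_mask(unsigned nibble)
{
    uint64_t mask = 0;
    for (unsigned i = 0; i < 4; i++)
        if (nibble & (8u >> i))
            mask |= (uint64_t)0xFFFF << (16 * i);
    return mask;
}

/* Draw one 8 pixel row of a character into a 16 bpp buffer in RAM,
 * changing the foreground pixels and leaving the others alone. */
static void draw_glyph_row(uint16_t *dest, uint8_t font_row, uint16_t colour)
{
    static uint64_t table[16];
    static int table_ready = 0;
    if (!table_ready) {                       /* build the lookup table once */
        for (unsigned n = 0; n < 16; n++)
            table[n] = nibble_mask(n);
        table_ready = 1;
    }
    uint64_t colour4 = colour * 0x0001000100010001ULL; /* colour in all 4 pixels */
    for (int half = 0; half < 2; half++) {
        unsigned nibble = (half == 0) ? (font_row >> 4) : (font_row & 0x0Fu);
        uint64_t mask = table[nibble];
        uint64_t old, updated;
        memcpy(&old, dest + 4 * half, sizeof old);      /* read 4 pixels */
        updated = (old & ~mask) | (colour & 0 ? 0 : (colour4 & mask));
        memcpy(dest + 4 * half, &updated, sizeof updated); /* write 4 pixels */
    }
}
```

The important part is that there's no "if" per pixel and no put_pixel() call; each pass through the loop handles 4 pixels with a few bitwise operations.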
Next; if someone does "printf("Hello\nThis is nice\nThird line!\nHehee\n");" then you should only blit the buffer to display memory once after all lines have been added to the buffer (rather than doing it 4 times, once for each line); because that avoids lots of pointless writes to display memory.
You can extend this idea further. Your code to print stuff into the buffer should never copy the buffer to display memory at all; and you should have a separate routine (e.g. a "flushBuffer();" routine) that copies from the buffer to display memory. In this case, someone could print 20 things to the buffer and then call "flushBuffer();" once when they're finished printing everything. That avoids lots more pointless writes to display memory.
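The split between printing and flushing might be sketched like this. Everything apart from the "flushBuffer" name is hypothetical: the structure, console_print() (which just copies bytes as a stand-in for real character drawing), and the counter that's only there to show how many blits happened. It also assumes display memory has no padding between lines:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct console {
    uint8_t *back_buffer;    /* off-screen buffer in RAM */
    uint8_t *display;        /* display memory */
    size_t   size;           /* bytes in each (assumes no row padding) */
    int      dirty;          /* set when the buffer has unflushed changes */
    unsigned display_copies; /* number of blits, for illustration only */
};

/* Stand-in for the real text renderer: it only ever changes the buffer
 * in RAM and never touches display memory. */
static void console_print(struct console *c, const char *text)
{
    size_t len = strlen(text);
    if (len > c->size)
        len = c->size;
    memcpy(c->back_buffer, text, len);   /* placeholder for drawing glyphs */
    c->dirty = 1;
}

/* The only routine that writes to display memory. */
static void flushBuffer(struct console *c)
{
    if (!c->dirty)
        return;                          /* nothing changed, nothing to copy */
    memcpy(c->display, c->back_buffer, c->size);
    c->dirty = 0;
    c->display_copies++;
}
```

With this, 20 calls to console_print() followed by one flushBuffer() cause one copy to display memory instead of 20.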
The next step is to keep a log as a big zero terminated string in memory. When someone prints something the characters are just appended to the end of the big zero terminated string in memory (and not converted to pixels and drawn in the buffer at all). Then when someone calls "flushBuffer();" you'd calculate how many lines were added to the big zero terminated string in memory, scroll the buffer once (e.g. if 22 lines were added to the big zero terminated string in memory you'd scroll the buffer once by 22 lines), and then copy from the buffer to display memory once. This avoids a lot of scrolling. For example, if they add 2000 lines to the big zero terminated string in memory then you could fill the entire buffer with the background colour once and then only draw the last 60 lines (or whatever actually fits on the screen) and avoid scrolling 2000 times and also avoid drawing thousands of characters that get scrolled off the top before they're seen.
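Finding "the last 60 lines (or whatever fits)" in the log can be done by walking backwards from the end of the string. A minimal sketch is below; the function name is made up, and it assumes lines are separated by '\n' and ignores long lines that would wrap on screen:

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the start of the last "max_lines" lines of a
 * zero terminated log. If the log has fewer lines, the whole log is
 * returned. Assumes '\n' separated lines and no line wrapping. */
static const char *last_lines(const char *log, size_t max_lines)
{
    const char *p = log + strlen(log);
    size_t seen = 0;

    if (max_lines == 0)
        return p;                        /* nothing fits: empty string */
    if (p > log && p[-1] == '\n')
        p--;                             /* trailing newline ends the last line */
    while (p > log) {
        if (p[-1] == '\n' && ++seen == max_lines)
            break;                       /* p is the start of the wanted text */
        p--;
    }
    return p;
}
```

The flush code would then clear the buffer once and draw only the text starting at the returned pointer.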
Finally, often a lot of pixels don't change at all (especially for things like displaying text during boot, where you're only using 2 colours and the background colour is used for about 80% of the pixels). For example you might replace a space character with a full stop and only change 6 pixels, but copy 2.34 MiB of data to display memory anyway.
If you scroll a screen full of text, it's likely that there's plenty of white space (at the end of lines of text, etc). Then there's the gaps between characters, and things like changing a "P" to a "B", etc (where only about 10 pixels change). With some care, you can remove the majority of these writes. Most people are familiar with "dirty rectangles", but that really doesn't help for things like scrolling the screen. What I tend to do is have one buffer in RAM that contains the new contents of the screen plus another buffer in RAM that contains the current contents of the screen; and then (when blitting to display memory) I compare the data in both of these buffers and only write to display memory (and the second buffer) if the data was different. This tends to remove around 70% of display memory writes, and (despite the extra reads/writes to RAM) tends to make copying from your buffer to display memory twice as fast.
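The two buffer comparison could look something like the sketch below. The names are made up, it compares 32 bits (2 pixels at 16 bits per pixel) at a time, and it assumes no padding between lines (otherwise you'd call it once per line). The returned count is only there so you can measure how many writes were avoided:

```c
#include <stddef.h>
#include <stdint.h>

/* Write to display memory only where the new contents differ from what
 * is already on the screen. "shadow" is the copy in RAM of the current
 * screen contents and "next" is the new contents. Returns the number of
 * 32-bit writes actually made to display memory. */
static size_t blit_changed(volatile uint32_t *display, uint32_t *shadow,
                           const uint32_t *next, size_t words)
{
    size_t written = 0;
    for (size_t i = 0; i < words; i++) {
        if (shadow[i] != next[i]) {
            shadow[i] = next[i];    /* keep the RAM copy in sync */
            display[i] = next[i];   /* the only access to display memory */
            written++;
        }
    }
    return written;
}
```

Note that display memory is never read; all of the comparisons use the two buffers in RAM.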
lama wrote:I looked at the page you posted, but I can't get that cache mode working. Is there some specification that explains how to handle MTRRs? It would be awesome to have that write-combining mode only for frame buffer memory.
Don't rely on write-combining to hide the fact that your code is slow - it'll only make it harder to find/fix the cause of the problem.
Fix your code so that it's fast, and when your code is as fast as possible then start thinking about write-combining (not before).
Cheers,
Brendan