I have been trying to make my console output and scrolling faster. It was pretty slow. I optimized the memcopy used by the scroll so that it copies 32 bits at a time, so now it's much faster, but it's still not very fast. I guess it has something to do with the way the text-mode video RAM snoops the address and data bus for backwards compatibility or something, since the video RAM itself should not be the problem.
I found this:
SD -- Screen Disable
"When set to 1, this bit turns off the display and assigns maximum memory bandwidth to the system. Although the display is blanked, the synchronization pulses are maintained. This bit can be used for rapid full-screen updates."
Two standard hints I've seen for speeding up video memory accesses are marking it as write-combining (so it outputs larger block updates, improving bandwidth as fewer addresses need to be sent across the bus) and using SSE instructions, preferably including the PREFETCH* and MOVNT* instructions (which arrived with SSE and SSE2, I think).
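To illustrate the MOVNT* hint above, here is a minimal sketch of a copy loop built on non-temporal stores, using the SSE2 intrinsics from `<emmintrin.h>`. The function name `copy_nontemporal` is made up for this example, and it assumes the destination is 16-byte aligned, the length is a multiple of 16, and (in a kernel) SSE has already been enabled. Write-combining itself is a memory-type setting (MTRRs or the PAT) and is not shown here.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n bytes using non-temporal 16-byte stores.
 * Assumptions: dst is 16-byte aligned and n is a multiple of 16. */
static void copy_nontemporal(void *dst, const void *src, size_t n)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < n / 16; i++) {
        __m128i v = _mm_loadu_si128(&s[i]); /* source may be unaligned */
        _mm_stream_si128(&d[i], v);         /* MOVNTDQ: store that bypasses the cache */
    }
    _mm_sfence(); /* order the streamed stores before later stores */
}
```

Whether this actually beats a plain dword loop on legacy text-mode memory would have to be measured.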
At first my memcopy did byte copying. Now it uses dwords for as much as possible and only uses bytes to make the size come out right at the end.
That "update" gave a good performance increase, but it should still be possible to improve it by using SSE qword moves.
When it comes to write combining, I'm not sure how to "initiate" it. Do you use a set of instructions to give this "hint" to the CPU, or do you have to control the memory controller?
I'm very curious whether anyone else has done anything to improve their memcopy and/or text-mode scroll routine.
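For reference, the "dwords first, bytes for the tail" copy described above could look roughly like this. It is only a sketch, not the poster's actual routine: `copy_dwords` and the `u32_any` typedef are names invented here, and the typedef relies on a GCC/Clang attribute so that the unaligned, type-punned accesses are well defined for the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension (an assumption of this sketch): a 32-bit type that may
 * alias anything and carries no alignment requirement. */
typedef uint32_t __attribute__((may_alias, aligned(1))) u32_any;

/* Hypothetical helper: copy n bytes, 32 bits at a time where possible. */
static void *copy_dwords(void *dst, const void *src, size_t n)
{
    u32_any *d32 = (u32_any *)dst;
    const u32_any *s32 = (const u32_any *)src;
    size_t dwords = n / 4;

    for (size_t i = 0; i < dwords; i++)   /* bulk of the copy, 4 bytes each */
        d32[i] = s32[i];

    uint8_t *d8 = (uint8_t *)dst + dwords * 4;
    const uint8_t *s8 = (const uint8_t *)src + dwords * 4;
    for (size_t i = 0; i < n % 4; i++)    /* 0 to 3 leftover bytes */
        d8[i] = s8[i];

    return dst;
}
```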
thomasnilsen wrote: I have been trying to make my console output and scrolling faster. It was pretty slow. I optimized the memcopy used by the scroll so that it copies 32 bits at a time, so now it's much faster, but it's still not very fast. I guess it has something to do with the way the text-mode video RAM snoops the address and data bus for backwards compatibility or something, since the video RAM itself should not be the problem.
If your code is noticeably slow, then you're probably reading from display memory. Never (under any circumstances) should you read anything from display memory. Keep a buffer in RAM instead.
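A minimal sketch of that advice, assuming an 80x25 text mode: the scroll happens entirely in an ordinary RAM buffer, and display memory is only ever written. The names `shadow`, `shadow_scroll_up` and `shadow_flush` are invented for this example; in a kernel the `vram` argument would typically be the text buffer at 0xB8000.

```c
#include <stdint.h>
#include <string.h>

#define COLS 80
#define ROWS 25

/* Hypothetical shadow copy of the screen, kept in ordinary RAM. */
static uint16_t shadow[ROWS * COLS];

/* Scroll the shadow copy up one line and fill the last row with `blank`. */
static void shadow_scroll_up(uint16_t blank)
{
    memmove(shadow, shadow + COLS, (ROWS - 1) * COLS * sizeof shadow[0]);
    for (int x = 0; x < COLS; x++)
        shadow[(ROWS - 1) * COLS + x] = blank;
}

/* Push the shadow copy to display memory: writes only, never reads. */
static void shadow_flush(volatile uint16_t *vram)
{
    for (int i = 0; i < ROWS * COLS; i++)
        vram[i] = shadow[i];
}
```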
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
thomasnilsen wrote: So keep a "copy" in RAM and write everything to video RAM on every update?
That would be faster.
Although I'd keep a log in memory (e.g. a long ASCIIZ string that contains everything you've displayed during boot), with a pointer to the end of the log and a pointer to the character that happens to be at the top of the screen. Each time you add a new line to the bottom of the log, you advance the "top character" pointer by one line; and when you send the log to the video, you copy characters (starting with the "top of screen" pointer) until you reach the edge of the screen or a newline character (and if you reach a newline character, you fill the rest of the line in display memory with spaces). If you change video modes during boot you'd be able to re-display the log (even if the new video mode can display more characters than the old video mode); and after boot you can save the boot log as a file or something.
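One way the log-plus-"top of screen"-pointer scheme above might be sketched, assuming an 80x25 screen and a fixed attribute byte. The names `log_render` and `log_next_line` are invented for this illustration and are not taken from any real kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define COLS 80
#define ROWS 25
#define ATTR 0x0700  /* assumed attribute: light grey on black */

/* Draw the log onto the screen, starting at the "top of screen" pointer.
 * A newline (or the end of the log) pads the rest of the row with spaces. */
static void log_render(const char *top, volatile uint16_t *screen)
{
    const char *p = top;
    for (int y = 0; y < ROWS; y++) {
        int x = 0;
        while (x < COLS && *p != '\0' && *p != '\n')
            screen[y * COLS + x++] = (uint16_t)(ATTR | (uint8_t)*p++);
        if (*p == '\n')
            p++;                          /* consume the line break */
        while (x < COLS)
            screen[y * COLS + x++] = (uint16_t)(ATTR | ' ');
    }
}

/* Advance the "top of screen" pointer by one displayed line. */
static const char *log_next_line(const char *top)
{
    int x = 0;
    while (x < COLS && *top != '\0' && *top != '\n') {
        top++;
        x++;
    }
    if (*top == '\n')
        top++;
    return top;
}
```

Scrolling then amounts to calling `log_next_line` on the top pointer and rendering again, with no reads from display memory.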
Cheers,
Brendan
SD -- Screen Disable
"When set to 1, this bit turns off the display and assigns maximum memory bandwidth to the system. Although the display is blanked, the synchronization pulses are maintained. This bit can be used for rapid full-screen updates."
Has anyone got a clue what this is, and whether it could be used to speed up my scrolling somehow?
What happens here is that video memory is shared by the PC, the GPU (if any), and the sequencer. The VGA's memory runs at a certain speed, and for every eight pixels that memory has to be read by the sequencer so it can display the needed graphics in time. That means you lose 640x480x60/8 bytes/s of the memory's total bandwidth for the sole purpose of displaying it (causing a write to lag, because the sequencer has priority over other accesses for the obvious reason). By disabling the screen (or rather, its contents), you shut down the drain of pixels from video memory to the sequencer, freeing that bandwidth for the CPU to use. The result is that you won't see anything on screen.
However, that was in the days when things ran on low MHz clocks. Even the first generation of 3D cards had memory rates at or above 50 MHz and wide register sizes, so in the case of VGA output you lose 25 (pixel clock) / (50 (memory clock) * 8 (pixels/read) * 2 (bus width is twice the size)) = 3.1% of the memory bandwidth to the sequencer in the worst case. I doubt that's the kind of improvement you are looking for.
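The two figures above can be checked with a few lines of arithmetic. This is just the calculation from the post written out; the function names are invented for the example.

```c
/* Bytes per second the sequencer fetches to refresh the display, using the
 * post's own formula: width * height * refresh rate / pixels per fetch. */
static unsigned long scanout_bytes_per_second(unsigned long width,
                                              unsigned long height,
                                              unsigned long refresh_hz,
                                              unsigned long pixels_per_read)
{
    return width * height * refresh_hz / pixels_per_read;
}

/* Share of memory bandwidth spent on refresh:
 * pixel clock / (memory clock * pixels per read * bus width factor). */
static double refresh_share(double pixel_clock_mhz, double mem_clock_mhz,
                            double pixels_per_read, double bus_width_factor)
{
    return pixel_clock_mhz /
           (mem_clock_mhz * pixels_per_read * bus_width_factor);
}
```

With the numbers from the post, `scanout_bytes_per_second(640, 480, 60, 8)` gives 2,304,000 bytes/s and `refresh_share(25, 50, 8, 2)` gives 0.03125, i.e. the quoted 3.1%.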
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
@thomasnilsen: For x86, because it was originally 16-bit, a 'word' means 16 bits. So the SSE registers are octwords (or "double quadwords" according to the Intel docs, but that's just silly). I'm just pedantic about such things.
@Combuster: Also, most modern discrete cards use some form of VRAM (GDDR*, usually) which I think has two ports (one read/write, one read-only), specifically so that the display has a grand total of *zero* impact on memory speed. The only problem that brings is the possibility of tearing if you don't use double buffers, but you should do that anyway.
Sorry if I was unclear about the operand sizes above. I meant d (as in double) and q (as in quad) words.
Anyway, here are the results so far:
Disabling the VGA sequencer before the scroll and enabling it afterwards makes everything very flickery! So this is a dead end unless it's possible to sync the disable/enable to the retrace.
Using SSE qword (or wider) moves is next on my to-do list. I could probably expect an increase similar to the one I got going from bytes to dwords in the memcopy, which was roughly 4x, so that's not too bad.
Selenic wrote: @Combuster: Also, most modern discrete cards use some form of VRAM (GDDR*, usually) which I think has two ports (one read/write, one read-only), specifically so that the display has a grand total of *zero* impact on memory speed. The only problem that brings is the possibility of tearing if you don't use double buffers, but you should do that anyway.
Apart from the fact that I mentioned "the first generation of 3D cards", there's another factor of 64 in memory speeds, so even if the memory isn't dual-ported, the net loss on a VGA screen for a modern ATI card is <0.1% of the bandwidth. In other words, there is little point in disabling the sequencer for speed's sake when the logic for doing so might cost you more than the gain. In some emulators (at least Bochs) the video emulation works in deltas, in which case screen disable may have the opposite effect due to the need to write all black. All this leads us to the other observation:
Disabling the VGA sequencer before the scroll and enabling it afterwards makes everything very flickery! So this is a dead end unless it's possible to sync the disable/enable to the retrace.
Black output is the direct consequence of disabling the screen. You could have seen it coming. Also note that during a retrace there's no output to the monitor, so shutting down the memory accesses in that timeframe has zero effect on the memory bandwidth.
Together with the other observation, do you want to see a temporary black screen between updates? In most cases, like now, the answer would be "no".
Well, sometimes you can guess what the result will be, but sometimes it's nice to check and see whether you actually got the result you expected, or maybe didn't expect.
The only improvement possible is an offscreen buffer and some logic to write only the updated data in the buffer to video RAM.
Writing to or copying from it with dwords, qwords, dqwords, or even some optimized mov instruction is the only other thing that can improve performance, it seems.
Well, no huge improvements then.
Better get back to the hard stuff, like getting my re-entrant and interruptible ISRs working properly.
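The "only write updated data" idea from the post above might be sketched like this, assuming an 80x25 text screen. Both buffers live in ordinary RAM, so display memory is never read; the names `wanted`, `shown` and `flush_changed` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CELLS (80 * 25)

static uint16_t wanted[CELLS]; /* what the console wants on screen */
static uint16_t shown[CELLS];  /* what was last written to display memory */

/* Write only the cells that differ from what is already displayed.
 * Returns the number of cells actually written. */
static size_t flush_changed(volatile uint16_t *vram)
{
    size_t written = 0;
    for (size_t i = 0; i < CELLS; i++) {
        if (wanted[i] != shown[i]) {
            vram[i] = wanted[i];   /* the only access to display memory */
            shown[i] = wanted[i];
            written++;
        }
    }
    return written;
}
```

A full scroll still touches most cells, so this mainly helps when only a few characters change between updates.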
Depending on what you are doing (say, for example, you are particularly worried about scrolling), you could simply adjust the offset into display RAM that the VGA displays from.
That way you don't have to copy a bunch of lines up or down; you simply update what is needed, manage a 'buffer window' that is currently in VGA display memory, and adjust the offset, which can be done "reasonably" quickly nowadays thanks to 32-bit I/O writes.
Alternatively, you could double-buffer, since you can easily fit at least two 'pages' / 'screens' worth of text in VGA memory and swap between them.
In any case, you might want to check out Michael Abrash's Graphics Programming Black Book (GPBB); you can find it on Gamedev for free.
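Both suggestions above come down to changing the CRTC start address. As a rough sketch, the function below only builds the four port writes (Start Address High at CRTC index 0x0C, Start Address Low at 0x0D); a kernel would replay them with its own `outb`, which is not shown. The struct and function names are invented here, and the 0x3D4/0x3D5 ports assume a colour mode. In text mode the start address counts character cells, not bytes.

```c
#include <stdint.h>

#define CRTC_INDEX      0x3D4  /* colour-mode CRTC index port (assumed) */
#define CRTC_DATA       0x3D5  /* colour-mode CRTC data port (assumed) */
#define CRTC_START_HIGH 0x0C   /* Start Address High register */
#define CRTC_START_LOW  0x0D   /* Start Address Low register */

/* Hypothetical record of one byte-wide port write. */
struct port_write {
    uint16_t port;
    uint8_t  value;
};

/* Fill `out` with the writes that make the display start at `cell_offset`
 * (in character cells from the start of the text buffer). */
static void vga_start_address_writes(uint16_t cell_offset,
                                     struct port_write out[4])
{
    out[0] = (struct port_write){ CRTC_INDEX, CRTC_START_HIGH };
    out[1] = (struct port_write){ CRTC_DATA,  (uint8_t)(cell_offset >> 8) };
    out[2] = (struct port_write){ CRTC_INDEX, CRTC_START_LOW };
    out[3] = (struct port_write){ CRTC_DATA,  (uint8_t)(cell_offset & 0xFF) };
}
```

Under these assumptions, scrolling down one line of an 80-column screen means adding 80 to the offset, and a second page placed directly after the first would start at cell 2000 (80 x 25).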
Google can help, as always.
If you can have the screen at the beginning of video memory, which you usually can, you can have a scrolling buffer that wraps at the end. Just set the line compare field to the scan line number where the screen should wrap.