Brendan wrote:Hi,
freecrac wrote:If you write more than one line with "rep movsd" or "memcpy()" then you write data to the invisible portion of display memory, which is a waste of time/bandwidth. You could implement a special case (e.g. do "if(bytes_per_line == horizontal_resolution * bytes_per_pixel) { ... } else { ... }"), but this is a waste of time as the bottleneck will be PCI bus bandwidth and not CPU speed. Basically, the special case won't improve performance and will make code maintenance worse.
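For reference, a minimal sketch of the line-by-line copy described above, in C. The function name blit_lines and all parameter names are made up for illustration; it assumes a linear framebuffer where consecutive lines are bytes_per_line apart and that the source image is tightly packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height image into a framebuffer whose lines start
 * bytes_per_line apart. Only the visible part of each line is written;
 * any padding between lines is never touched.
 * Names are illustrative and not taken from any real API. */
void blit_lines(uint8_t *framebuffer, size_t bytes_per_line,
                const uint8_t *src, size_t width, size_t height,
                size_t bytes_per_pixel)
{
    size_t visible_bytes = width * bytes_per_pixel;
    size_t y;

    /* A visible line can never be longer than the line pitch. */
    assert(visible_bytes <= bytes_per_line);

    for (y = 0; y < height; y++) {
        memcpy(framebuffer + y * bytes_per_line, /* start of line y */
               src + y * visible_bytes,          /* packed source row */
               visible_bytes);
    }
}
```

The same loop covers both cases from the quoted "if": when bytes_per_line equals the visible width in bytes, the per-line copies simply land back to back, so no special case is needed.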
But for writing more than one line at a time, I am thinking more about using more than one write instruction in the inner loop, and not about writing across the invisible area with only one write instruction.
The bottleneck is still PCI bus bandwidth.
Think of it like this. If the PCI bus can only handle one write every 10 nanoseconds and one CPU is capable of doing a write every nanosecond; then that one CPU would need to wait 9 nanoseconds after each write for the PCI bus to catch up. It doesn't matter if the CPU is writing one horizontal line sequentially or if the CPU is writing the first dword of the first line, then the first dword of the second line, then the second dword of the first line, then the second dword of the second line, etc. It doesn't even matter if you've got 8 CPUs all trying to write different lines at the same time. The PCI bus can only handle one write every 10 nanoseconds regardless of what you do.
But I see a difference when I use a 64-bit MMX register (instead of only a 32-bit dword register), for example to draw my mouse pointer with this image
directly to the framebuffer,
from a buffer in which each pixel is stored with its relative address (inside the picture) and its color, only polling port 3DAh and without double/triple buffering.
Without the MMX register I get a flickering mouse pointer when it moves quickly, but with the MMX register I cannot see any flickering. Tested with a GeForce 4 Ti 4200 (VBE3; AGP 4x, 64 MB) and an AMD Palomino 1800+ @ 1533 MHz (Socket A) booting DOS.
The only way to improve this is to do larger writes. For example, if it takes 10 nanoseconds to write 32-bits of data (where there's a packet containing a 32-bit address and a 32-bit data value sent over the PCI bus) then you might be able to transfer 400 MB per second. It might take 12 nanoseconds to send 64-bits of data (where there's a packet containing a 32-bit address and a 64-bit data value sent over the PCI bus) so you might be able to transfer roughly 667 MB per second that way.
Do you think that this is what I have seen (although I never turned "write-combining" on)?
Do you think some BIOSes will turn it on by default, and also the maximum AGP speed, similar to how DMA is turned on for IDE storage devices used through int 13h?
What if you use SSE to do 64 byte writes? If it takes 40 nanoseconds to send 512 bits of data then you might be able to transfer 1600 MB per second.
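The transfer-rate estimates in this thread all follow from bytes per write divided by time per write. A small sketch of that arithmetic in C (the function name estimated_mb_per_s is made up here, and the timings are the hypothetical figures from the posts, not measurements):

```c
#include <assert.h>

/* Estimated throughput in MB/s (1 MB = 10^6 bytes) when every bus
 * write carries payload_bytes and takes ns_per_write nanoseconds.
 * Illustrative helper only; the inputs are assumed, not measured. */
double estimated_mb_per_s(double payload_bytes, double ns_per_write)
{
    assert(ns_per_write > 0.0);
    /* bytes per nanosecond equals GB/s, so scale by 1000 for MB/s */
    return payload_bytes / ns_per_write * 1000.0;
}
```

With these assumed timings, 4 bytes in 10 ns gives 400 MB/s, 8 bytes in 12 ns gives about 667 MB/s, and 64 bytes in 40 ns gives 1600 MB/s. The per-write overhead (address phase, bus turnaround) is spread over more data as the writes get larger, which is why the rate climbs.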
In the future I will try to use SSE registers, but a lot of older CPUs have no SSE registers, only MMX registers.
Now, modern CPUs support "write-combining". In this case (if write combining is enabled for the area) the CPU will put smaller writes into a buffer and then send them on the bus as a large 64 byte write. If you're doing lots of small writes then write combining can make a big difference because it improves bus efficiency. Of course if you're using SSE and doing 64 byte writes anyway, then write combining doesn't help at all.
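To make the effect concrete, here is a toy model in C of a single 64-byte write-combining buffer that counts how many bus writes a stream of small stores turns into. This is only a sketch: the names wc_buffer, wc_write and wc_flush are invented, and a real CPU has several such buffers and flushes them on other events too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WC_LINE_BYTES 64u

/* Toy model of ONE write-combining buffer (hypothetical names). */
typedef struct {
    uint64_t line;     /* index of the 64-byte line being collected */
    int      valid;    /* non-zero while the buffer holds pending data */
    unsigned flushes;  /* bus writes issued so far */
} wc_buffer;

/* Record a small store. If it falls in a different 64-byte line than
 * the one being collected, the old line goes out as one bus write. */
void wc_write(wc_buffer *wc, uint64_t address, size_t size)
{
    uint64_t line = address / WC_LINE_BYTES;

    /* Keep the model simple: a store must not straddle two lines. */
    assert(size > 0 && (address % WC_LINE_BYTES) + size <= WC_LINE_BYTES);

    if (wc->valid && wc->line != line) {
        wc->flushes++;
    }
    wc->line = line;
    wc->valid = 1;
}

/* Push out whatever is still pending as one final bus write. */
void wc_flush(wc_buffer *wc)
{
    if (wc->valid) {
        wc->flushes++;
        wc->valid = 0;
    }
}
```

In this model, 32 sequential dword stores covering 128 bytes end up as 2 bus writes instead of 32, which is the efficiency gain described above. Stores that are already 64 bytes each would produce one bus write per store with or without the buffer.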
Aaaah. This information sheds a different light on the matter and makes it a lot clearer for me.
freecrac wrote:OK, maybe 8 times faster is only an insignificantly small speed-up.
If you've got 100 computers; and one computer has an "8x AGP" video card and 2 have a "4x AGP" card; then adding AGP support to your OS improves performance by approximately 0% on average. Optimising the code (larger writes, avoiding unnecessary writes, etc) works on all computers - if your optimisations only make it 20 times faster on all computers then surely this is far more important than diddling with AGP.
Cheers,
Brendan
This is also good information. Thank you very much. You point out exactly what I am looking for.
Dirk