VESA mode

freecrac · Post by **freecrac** » Sat Sep 29, 2012 8:14 am

Combuster wrote: Does AGP count as a lack of searching effort?

EDIT: reading back the rest of the thread makes me think OS development is not yet a subject for you... I suggest you go read the required knowledge rule and either tackle something less difficult, or if you do meet those prerequisites, try not to show the current amount of laziness on your side.

Is it really so lazy to talk about it in order to find an overview for a beginner in OS development?

But you are not far off: I never planned to develop my own OS. I am more interested in learning and talking about how one would develop an OS. Maybe I can help some people learn too, even if I am lazy.
Furthermore, I think there are many ways of learning, and people who learn in different ways run into their own individual kinds of problems.

Dirk

Brendan · Post by **Brendan** » Sat Sep 29, 2012 7:29 pm

Hi,

freecrac wrote:
If you write more than one line with "rep movsd" or "memcpy()" then you also write data to the invisible portion of display memory, which is a waste of time/bandwidth. You could implement a special case (e.g. do "if(bytes_per_line == horizontal_resolution * bytes_per_pixel) { ... } else { ... }"), but this is a waste of time as the bottleneck will be PCI bus bandwidth and not CPU speed. Basically, the special case won't improve performance and will make code maintenance worse.
But to write more than one line at a time, I was thinking more of using several write instructions in the inner loop, rather than writing across the invisible area with a single write instruction.
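The per-row copy being discussed, including the optional special case from the quote, might look roughly like this in C. It is a sketch under assumptions: `blit_to_framebuffer` and its parameters are hypothetical names, the back buffer is assumed to be tightly packed (`width * bytes_per_pixel` bytes per row), and `fb` is assumed to point at an already-mapped linear framebuffer whose rows are `dst_pitch` bytes apart (VBE reports this as bytes per scan line).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a tightly packed width x height back buffer into a framebuffer whose
 * rows are dst_pitch bytes apart. Only the visible bytes of each row are
 * written, so the padding at the end of each framebuffer row is skipped. */
void blit_to_framebuffer(uint8_t *fb, size_t dst_pitch,
                         const uint8_t *src, size_t width, size_t height,
                         size_t bytes_per_pixel)
{
    size_t row_bytes = width * bytes_per_pixel;

    if (dst_pitch == row_bytes) {
        /* Special case: no padding, so the whole frame is one contiguous copy. */
        memcpy(fb, src, row_bytes * height);
        return;
    }

    /* General case: one copy per visible row. */
    for (size_t y = 0; y < height; y++) {
        memcpy(fb + y * dst_pitch, src + y * row_bytes, row_bytes);
    }
}
```

As Brendan argues in his reply, the special case mainly saves CPU work; if the bus is the bottleneck, both paths move the same number of visible bytes and are likely to perform about the same.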

The bottleneck is still PCI bus bandwidth.

Think of it like this. If the PCI bus can only handle one write every 10 nanoseconds and one CPU is capable of doing a write every nanosecond, then that one CPU would need to wait 9 nanoseconds after each write for the PCI bus to catch up. It doesn't matter if the CPU is writing one horizontal line sequentially, or if it is writing the first dword of the first line, then the first dword of the second line, then the second dword of the first line, then the second dword of the second line, etc. It doesn't even matter if you've got 8 CPUs all trying to write different lines at the same time. The PCI bus can only handle one write every 10 nanoseconds regardless of what you do.
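The thought experiment above can be written down as a tiny cost model. This is a deliberately simplified sketch, not a hardware measurement: `transfer_time_ns` is a hypothetical helper, and the model ignores latency, bursts and write-combining. Note that the order in which lines are written does not appear anywhere in it, which is exactly Brendan's point.

```c
#include <stddef.h>

/* Simplified bus-bound model (an assumption, not a measurement):
 * each CPU can issue one write every cpu_ns, and the shared bus accepts one
 * write every bus_ns. Adding CPUs shortens the gap between offered writes,
 * but the bus still limits how fast they complete. */
double transfer_time_ns(size_t writes, double cpu_ns, double bus_ns,
                        unsigned ncpus)
{
    if (ncpus == 0)
        ncpus = 1;  /* treat "no CPUs" as one to avoid dividing by zero */

    double offered_ns = cpu_ns / (double)ncpus;  /* gap between writes the CPUs can offer */
    double per_write_ns = offered_ns > bus_ns ? offered_ns : bus_ns;
    return (double)writes * per_write_ns;
}
```

With a 1 ns CPU and a 10 ns bus, 1000 writes take 10000 ns whether one CPU or eight are issuing them.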

The only way to improve this is to do larger writes. For example, if it takes 10 nanoseconds to write 32 bits of data (where there's a packet containing a 32-bit address and a 32-bit data value sent over the PCI bus) then you might be able to transfer 400 MB per second. It might take 12 nanoseconds to send 64 bits of data (where there's a packet containing a 32-bit address and a 64-bit data value sent over the PCI bus), so you might be able to transfer about 667 MB per second that way. What if you use SSE to do 64-byte writes? If it takes 40 nanoseconds to send 512 bits of data then you might be able to transfer 1600 MB per second.
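The throughput figures follow from dividing the bytes moved per transaction by the time per transaction. A small helper makes the arithmetic explicit; `mb_per_second` is a hypothetical name, and the transaction times are the illustrative numbers from the paragraph, not measured PCI timings. For example, 8 bytes every 12 ns works out to roughly 667 MB/s.

```c
/* Throughput in MB/s (1 MB = 10^6 bytes) for a bus that moves
 * bytes_per_transaction bytes every ns_per_transaction nanoseconds. */
double mb_per_second(double bytes_per_transaction, double ns_per_transaction)
{
    /* bytes per nanosecond equals GB/s; multiply by 1000 to get MB/s */
    return bytes_per_transaction / ns_per_transaction * 1000.0;
}
```

Usage: `mb_per_second(4, 10)` gives 400, `mb_per_second(8, 12)` gives about 666.7, and `mb_per_second(64, 40)` gives 1600.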

Now, modern CPUs support "write-combining". In this case (if write-combining is enabled for the area) the CPU will put smaller writes into a buffer and then send them on the bus as a large 64-byte write. If you're doing lots of small writes then write-combining can make a big difference because it improves bus efficiency. Of course, if you're using SSE and doing 64-byte writes anyway, then write-combining doesn't help at all.
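To illustrate why write-combining helps with small writes, here is a toy software model of a single write-combining buffer. It is only an illustration under stated assumptions: real CPUs have several buffers and their own flush rules, the 64-byte line size is assumed, writes are assumed aligned and not crossing a line, and `struct wc_model`, `wc_write` and `wc_flush` are hypothetical names. The model simply counts how many bus transactions a stream of small writes would turn into.

```c
#include <stddef.h>
#include <stdint.h>

#define WC_LINE 64u  /* assumed write-combining buffer size in bytes */

/* Toy model of one write-combining buffer: writes that fall into the same
 * 64-byte line are merged, and a bus transaction is only counted when the
 * buffer is flushed (a write to a different line, or an explicit flush). */
struct wc_model {
    uintptr_t line;            /* base address of the line currently buffered */
    int       has_data;        /* nonzero if the buffer holds unflushed writes */
    size_t    bus_transactions;
};

void wc_flush(struct wc_model *wc)
{
    if (wc->has_data) {
        wc->bus_transactions++;
        wc->has_data = 0;
    }
}

/* Record one small, aligned write at addr (assumed not to cross a line). */
void wc_write(struct wc_model *wc, uintptr_t addr)
{
    uintptr_t line = addr & ~(uintptr_t)(WC_LINE - 1);
    if (wc->has_data && line != wc->line)
        wc_flush(wc);  /* moving to a new line sends the old one on the bus */
    wc->line = line;
    wc->has_data = 1;
}
```

In this model, 64 sequential dword writes covering 256 bytes become 4 bus transactions instead of 64, while code that already does 64-byte writes gains nothing.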

freecrac wrote: OK, maybe 8 times faster is only an insignificantly small speed-up.

If you've got 100 computers, and one computer has an "8x AGP" video card and 2 have a "4x AGP" card, then adding AGP support to your OS improves performance by approximately 0% on average. Optimising the code (larger writes, avoiding unnecessary writes, etc.) works on all computers; even if your optimisations only make it 20 times faster, that applies to every computer, so surely this is far more important than diddling with AGP.
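One common way to avoid unnecessary writes is to keep a shadow copy of what is already on screen in ordinary RAM and only send rows that changed. The sketch below assumes a tightly packed back buffer and shadow buffer, a shadow buffer that starts out matching the screen, and a hypothetical `update_changed_rows` helper; it is one possible approach, not the method from the thread. Comparisons read only from RAM, never from (slow) video memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy only the rows that changed since the last update. "shadow" is an
 * ordinary-RAM copy of what is currently on screen, so comparisons never
 * read from video memory. Returns the number of rows written. */
size_t update_changed_rows(uint8_t *fb, size_t fb_pitch,
                           uint8_t *shadow, const uint8_t *back,
                           size_t row_bytes, size_t height)
{
    size_t written = 0;

    for (size_t y = 0; y < height; y++) {
        const uint8_t *src = back + y * row_bytes;
        uint8_t *shd = shadow + y * row_bytes;

        if (memcmp(src, shd, row_bytes) != 0) {
            memcpy(fb + y * fb_pitch, src, row_bytes);  /* send changed row to the card */
            memcpy(shd, src, row_bytes);                /* remember what is on screen now */
            written++;
        }
    }
    return written;
}
```

Calling it twice with the same back buffer writes the changed rows once and then nothing, so a mostly static screen costs almost no bus bandwidth.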

Cheers,

Brendan

freecrac · Post by **freecrac** » Sun Sep 30, 2012 4:24 am

Brendan wrote: Hi,

freecrac wrote:
If you write more than one line with "rep movsd" or "memcpy()" then you also write data to the invisible portion of display memory, which is a waste of time/bandwidth. You could implement a special case (e.g. do "if(bytes_per_line == horizontal_resolution * bytes_per_pixel) { ... } else { ... }"), but this is a waste of time as the bottleneck will be PCI bus bandwidth and not CPU speed. Basically, the special case won't improve performance and will make code maintenance worse.
But to write more than one line at a time, I was thinking more of using several write instructions in the inner loop, rather than writing across the invisible area with a single write instruction.
The bottleneck is still PCI bus bandwidth.

Think of it like this. If the PCI bus can only handle one write every 10 nanoseconds and one CPU is capable of doing a write every nanosecond, then that one CPU would need to wait 9 nanoseconds after each write for the PCI bus to catch up. It doesn't matter if the CPU is writing one horizontal line sequentially, or if it is writing the first dword of the first line, then the first dword of the second line, then the second dword of the first line, then the second dword of the second line, etc. It doesn't even matter if you've got 8 CPUs all trying to write different lines at the same time. The PCI bus can only handle one write every 10 nanoseconds regardless of what you do.

But I do see a difference when I use a 64-bit MMX register (instead of only a 32-bit dword register), for example to draw my mouse pointer image directly to the framebuffer from a buffer that stores each pixel's relative address (inside the picture) together with its color, using only polling of port 3DAh and no double/triple buffering.

Without the MMX register I get a flickering mouse pointer when moving it fast, but with the MMX register I cannot see any flickering. I tested this with a GeForce 4 Ti 4200 (VBE 3, AGP 4x, 64 MB) and an AMD Palomino 1800+ @ 1533 MHz (Socket A), booting DOS.
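The drawing method described above (a sparse list of pixel offsets and colors, drawn after polling port 3DAh) could be sketched in C as follows. Assumptions, all hypothetical: `port_read_fn` stands in for whatever port-input routine is available (e.g. an `in al, dx` wrapper), the framebuffer is 32 bits per pixel, and each `sprite_pixel.offset` has been precomputed for the current pitch. Packing offset and color into 8 bytes is just one layout in which an entry fits a single 64-bit register such as an MMX register; the original code is not shown in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VGA_INPUT_STATUS_1 0x3DA  /* VGA Input Status Register 1 (color mode) */
#define VGA_VRETRACE_BIT   0x08   /* bit 3: set while vertical retrace is active */

/* Hypothetical port-read hook; in a real kernel or DOS program this would be
 * an "in al, dx" wrapper such as inb()/inportb(). */
typedef uint8_t (*port_read_fn)(uint16_t port);

/* Wait for the start of a vertical retrace: first let any retrace that is
 * already in progress finish, then wait until the next one begins. */
void wait_vretrace(port_read_fn read_port)
{
    while (read_port(VGA_INPUT_STATUS_1) & VGA_VRETRACE_BIT) { }
    while (!(read_port(VGA_INPUT_STATUS_1) & VGA_VRETRACE_BIT)) { }
}

/* One entry of a sparse sprite: byte offset of the pixel relative to the
 * sprite's top-left corner (precomputed for the current pitch) and its
 * 32-bit color. The whole entry is 8 bytes. */
struct sprite_pixel {
    uint32_t offset;
    uint32_t color;
};

/* Draw only the listed pixels at (x, y); assumes 32 bits per pixel. */
void draw_sparse_sprite(uint8_t *fb, size_t pitch, size_t x, size_t y,
                        const struct sprite_pixel *pixels, size_t count)
{
    uint8_t *origin = fb + y * pitch + x * 4;

    for (size_t i = 0; i < count; i++) {
        /* memcpy avoids unaligned/aliasing issues; compilers emit one 32-bit store */
        memcpy(origin + pixels[i].offset, &pixels[i].color, sizeof pixels[i].color);
    }
}
```

A typical frame would call `wait_vretrace`, restore the pixels under the old pointer position, and then call `draw_sparse_sprite` at the new position.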

The only way to improve this is to do larger writes. For example, if it takes 10 nanoseconds to write 32 bits of data (where there's a packet containing a 32-bit address and a 32-bit data value sent over the PCI bus) then you might be able to transfer 400 MB per second. It might take 12 nanoseconds to send 64 bits of data (where there's a packet containing a 32-bit address and a 64-bit data value sent over the PCI bus), so you might be able to transfer about 667 MB per second that way.

Do you think that is what I have seen (even though I never turned write-combining on)?
Do you think some BIOSes turn it on by default, along with the maximum AGP speed, similar to how DMA gets enabled for IDE storage devices accessed via int 13h?

What if you use SSE to do 64 byte writes? If it takes 40 nanoseconds to send 512 bits of data then you might be able to transfer 1600 MB per second.

In the future I will try to use SSE registers, but many older CPUs have MMX registers and no SSE registers.
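Supporting both kinds of CPU usually means detecting features at run time and picking a copy routine. The sketch below decodes the EDX register from CPUID leaf 1, where bit 23 is MMX, bit 25 is SSE and bit 26 is SSE2. The enum and function names are hypothetical, and `__get_cpuid` is the GCC/Clang helper from `<cpuid.h>`, so that part is only compiled on x86 with those compilers. An OS must also enable SSE in control registers (e.g. CR4.OSFXSR) before any SSE instruction can be used.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits. */
#define CPUID_EDX_MMX  (1u << 23)
#define CPUID_EDX_SSE  (1u << 25)
#define CPUID_EDX_SSE2 (1u << 26)

enum copy_path { COPY_32BIT, COPY_MMX_64BIT, COPY_SSE_128BIT };

/* Pick the widest store the CPU supports, given EDX from CPUID leaf 1. */
enum copy_path choose_copy_path(uint32_t cpuid1_edx)
{
    if (cpuid1_edx & CPUID_EDX_SSE)
        return COPY_SSE_128BIT;   /* 128-bit XMM stores (e.g. movaps/movntps) */
    if (cpuid1_edx & CPUID_EDX_MMX)
        return COPY_MMX_64BIT;    /* 64-bit MMX stores (movq) */
    return COPY_32BIT;            /* plain dword stores */
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* Read EDX of CPUID leaf 1; returns 0 if the leaf is not supported. */
uint32_t read_cpuid1_edx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx;
}
#endif
```

On a CPU with MMX but no SSE, `choose_copy_path(read_cpuid1_edx())` would select the 64-bit MMX path.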

Now, modern CPUs support "write-combining". In this case (if write-combining is enabled for the area) the CPU will put smaller writes into a buffer and then send them on the bus as a large 64-byte write. If you're doing lots of small writes then write-combining can make a big difference because it improves bus efficiency. Of course, if you're using SSE and doing 64-byte writes anyway, then write-combining doesn't help at all.

Aaaah. This information sheds new light on the matter and makes it a lot clearer for me.

freecrac wrote: OK, maybe 8 times faster is only an insignificantly small speed-up.
If you've got 100 computers, and one computer has an "8x AGP" video card and 2 have a "4x AGP" card, then adding AGP support to your OS improves performance by approximately 0% on average. Optimising the code (larger writes, avoiding unnecessary writes, etc.) works on all computers; even if your optimisations only make it 20 times faster, that applies to every computer, so surely this is far more important than diddling with AGP.

Cheers,

Brendan

This is also good information. Thank you very much. You pointed out exactly what I was looking for.

Dirk

Combuster · Post by **Combuster** » Sun Sep 30, 2012 6:06 am

Yet, boosting an AGP system makes it perform more in line with PCI Express, meaning that the worst-case speed improves and more people can use the software at the same quality. The older the machine, the more necessary speed improvements become, and typically, the easier they are.

OSDev.org
