Hi,
Tyler wrote: It does make me wonder how many video drivers actually do use EngBitBlt, and how much it's designed to handle - I could imagine a generic EngBitBlt that uses one of many different methods depending on if the source data is aligned, how much is being transferred, if MMX and/or SSE is supported, if it's an awkward video mode (e.g. 24-bit colour), etc.
I have noticed that mentioned a few times... What is it about 24-bit that makes operations that would work on 8-, 16- and 32-bit not work? Is it simply a data size and alignment issue? Also, is it not feasible to copy three 24-bit pixels in two 32-bit ops?
In my experience, the 24-bit pixel format is a huge pain in the neck (if you want it to be fast) because a dword contains 1.333 pixels and it's hard to do aligned writes.
If you're filling a horizontal line with a colour, you end up doing something like:
Code:
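; assumes pixels are stored as B,G,R bytes in memory (little-endian dwords)
mov eax,0xBBRRGGBB   ; memory bytes B0 G0 R0 B1
mov ebx,0xGGBBRRGG   ; memory bytes G1 R1 B2 G2
mov edx,0xRRGGBBRR   ; memory bytes R2 B3 G3 R3
mov ecx,LENGTH/4     ; LENGTH is in pixels, 4 pixels (12 bytes) per iteration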
.l1:
mov [edi],eax
mov [edi+4],ebx
mov [edi+8],edx
add edi,12
loop .l1
Of course you have to make sure the first pixel is on a 12 byte (4 pixel) boundary to keep your writes aligned, and for arbitrary lines there's a lot of messing about with the first and last pixels.
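To give an idea of the "messing about" with the first and last pixels, here is a rough C sketch of the same fill. The function name `fill_span24` and its parameters are made up for illustration; it assumes B,G,R byte order in memory, a little-endian CPU, and that pixel 0 of the line sits on a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill `count` 24-bit pixels starting at pixel `x`.
   Assumes B,G,R byte order, little-endian, and `line` 4-byte aligned. */
static void fill_span24(uint8_t *line, size_t x, size_t count,
                        uint8_t r, uint8_t g, uint8_t b)
{
    uint8_t *p = line + x * 3;

    /* Head: single pixels until x is on a 4-pixel (12-byte) boundary. */
    while (count > 0 && (x & 3) != 0) {
        p[0] = b; p[1] = g; p[2] = r;
        p += 3; x++; count--;
    }

    /* Body: three dwords hold exactly four pixels. */
    const uint32_t pattern[3] = {
        b | (uint32_t)g << 8 | (uint32_t)r << 16 | (uint32_t)b << 24,
        g | (uint32_t)r << 8 | (uint32_t)b << 16 | (uint32_t)g << 24,
        r | (uint32_t)b << 8 | (uint32_t)g << 16 | (uint32_t)r << 24
    };
    while (count >= 4) {
        memcpy(p, pattern, 12);   /* compilers turn this into 3 dword stores */
        p += 12; count -= 4;
    }

    /* Tail: whatever is left (0 to 3 pixels). */
    while (count > 0) {
        p[0] = b; p[1] = g; p[2] = r;
        p += 3; count--;
    }
}
```

For example, `fill_span24(line, 1, 10, r, g, b)` would write pixels 1 to 3 one at a time, pixels 4 to 7 with the three dword stores, and pixels 8 to 10 one at a time.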
For MMX or SSE it's worse. Because you're doing 8 byte or 16 byte writes you end up needing to start on a 24 byte or 48 byte boundary and spending more time trying to get the start and end of the line right. For filling an arbitrary rectangle it's the same problem.
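The 12, 24 and 48 byte figures are just the least common multiple of the pixel size (3 bytes) and the write size. A small C sketch of that arithmetic, with made-up helper names and assuming pixel 0 of the line is aligned to the write size:

```c
#include <stddef.h>

static size_t gcd_sz(size_t a, size_t b)
{
    while (b != 0) { size_t t = a % b; a = b; b = t; }
    return a;
}

/* Smallest span (in bytes) that is both a whole number of pixels
   and a whole number of writes, i.e. lcm(bytes_per_pixel, write_bytes). */
static size_t align_period(size_t bytes_per_pixel, size_t write_bytes)
{
    return bytes_per_pixel / gcd_sz(bytes_per_pixel, write_bytes) * write_bytes;
}

/* Number of pixels to handle one at a time before pixel `x`
   reaches the next boundary where the wide writes can start. */
static size_t head_pixels(size_t x, size_t bytes_per_pixel, size_t write_bytes)
{
    size_t period_px = align_period(bytes_per_pixel, write_bytes) / bytes_per_pixel;
    return (period_px - x % period_px) % period_px;
}
```

With 3 bytes per pixel this gives 12 for dword writes, 24 for MMX and 48 for SSE, so with SSE a line can need up to 15 single-pixel writes at each end.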
For blitting an arbitrary area it can be much worse - it's a mess unless the left and right edges of the source data and the destination are both aligned on 12 byte, 24 byte or 48 byte boundaries.
Worst case is if the source data and the destination are a few pixels out (e.g. if "sourceLeftEdge % 4 != destLeftEdge % 4"). You'd need to do misaligned reads or writes, or rotate data in the inner loop, e.g.:
Code:
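.l1:
mov eax,[esi]        ; load the next 12 source bytes
mov ebx,[esi+4]
mov edx,[esi+8]
shld eex,eax,cl      ; merge in the bytes carried over (cl = shift in bits)
shld efx,ebx,cl
shld egx,edx,cl
mov [edi],eex        ; store 12 destination bytes
mov [edi+4],efx
mov [edi+8],egx
rol cx,8             ; swap cl and ch to get the complementary shift count
shrd eex,eax,cl      ; keep the left-over bytes for the next iteration
shrd efx,ebx,cl
shrd egx,edx,cl
rol cx,8             ; swap cl and ch back
add esi,12
add edi,12
sub dword [count],1
jne .l1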
Of course this is simplified - there aren't enough general registers so I made up some new ones! A better approach would be to have 4 different routines (no rotation, 1 byte rotation, 2 byte rotation and 3 byte rotation).
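A rough C sketch of the "4 different routines" idea, for one row of dwords. All names here are made up; it assumes a little-endian CPU, a dword-aligned destination, and a source that starts 0 to 3 bytes into the aligned dword `src[0]`. The shifted versions read one dword past the last one they write.

```c
#include <stddef.h>
#include <stdint.h>

/* Shared body for the shifted cases; k must be 1, 2 or 3. */
static inline void copy_shifted(uint32_t *dst, const uint32_t *src,
                                size_t n, unsigned k)
{
    uint32_t lo = src[0];
    for (size_t i = 0; i < n; i++) {
        uint32_t hi = src[i + 1];              /* aligned read */
        dst[i] = (lo >> (8 * k)) | (hi << (32 - 8 * k));
        lo = hi;                               /* carry into next dword */
    }
}

/* No rotation: plain aligned copy. */
static void copy_rot0(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) dst[i] = src[i];
}
static void copy_rot1(uint32_t *dst, const uint32_t *src, size_t n)
{
    copy_shifted(dst, src, n, 1);
}
static void copy_rot2(uint32_t *dst, const uint32_t *src, size_t n)
{
    copy_shifted(dst, src, n, 2);
}
static void copy_rot3(uint32_t *dst, const uint32_t *src, size_t n)
{
    copy_shifted(dst, src, n, 3);
}

/* Pick the routine once per blit, indexed by the byte offset (0..3). */
typedef void (*row_copy_fn)(uint32_t *, const uint32_t *, size_t);
static const row_copy_fn row_copy[4] = {
    copy_rot0, copy_rot1, copy_rot2, copy_rot3
};
```

The point of the table is that the shift count is chosen once, outside the inner loop, so each routine can use constant shifts and only aligned reads and writes. The unaligned first and last few bytes of each row would still need separate handling.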
Lastly, if you're unlucky enough to be using bank switching you might find that a single pixel is split across different banks (e.g. for 64 KB banks there are 21845.33333 pixels per bank). I'd hate to attempt something like "blitPIxels(void *srcData, int srcTop, int srcLeft, int destTop, int destLeft, int width, int height);" in this case...
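To make the bank problem concrete, here is a small C sketch that checks whether a given pixel crosses a bank boundary. The names are hypothetical, and it assumes 64 KB banks and a linear pixel index (i.e. no padding at the end of each scanline).

```c
#include <stdbool.h>
#include <stdint.h>

enum { BANK_SIZE = 65536, BYTES_PER_PIXEL = 3 };

/* Bank number that a byte offset into video memory falls in. */
static uint32_t bank_of(uint32_t byte_offset)
{
    return byte_offset / BANK_SIZE;
}

/* True if the first and last byte of the pixel are in different banks. */
static bool pixel_straddles_bank(uint32_t pixel_index)
{
    uint32_t first = pixel_index * BYTES_PER_PIXEL;
    uint32_t last  = first + BYTES_PER_PIXEL - 1;
    return bank_of(first) != bank_of(last);
}
```

Pixel 21845 starts at byte 65535, so its first byte is the last byte of bank 0 and its other two bytes are in bank 1. Pixel 43690 has the same problem between banks 1 and 2, and only every third bank starts on a pixel boundary.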
Cheers,
Brendan