Maybe I am missing on something but how in the world can you assume that the programmer is trying to move >8 bytes? What if we are moving 2 bytes only? You are moving 8 bytes with the MOVQ MMX instruction.
Another point to notice is that when you issue the DEC instruction with any of the General Purpose Registers, the Zero Flag in the EFLAGS register will be set to 1. So this code that you have written:
Is not optimized. You can remove the CMP instruction like this:
Code: Select all
DEC ECX
JNZ .DoSomethingIfECXIsNotZeroYet
Another thing that you might already know is that you "must" ask the programmer how many bytes he would like to copy. Then you will be able to pull off a lot of tricks for copying from the source to the destination.
For example, first find the quotient of the "count" by 8 and see how many QWORDs (Using MMX instruction) you can move to the destination. After that, find the quotient of the "count / 8" divided by 4 and find out how many DWORDs you can move to the destination, then use that for WORD and finally BYTE.
For example, if the user asks you to move 19 bytes from the source to the destination, then 19 bytes of transfer can be done like this:
Two QWORDs (= 2 * 8 = 16 bytes)
Now subtract 16 (the number of bytes we have already moved) from 19 to get 3. 3 will not yield a positive quotient when divided by 4 so divide it by 2 to get 1 as the quotient. The remainder will be 1. So you have to move 1 WORD and 1 Bytes.
As a result, to transfer 19 bytes,
Transfer 2 QWORDs (QWORD = 8 bytes, 2*8= 16 bytes)
Transfer 1 WORD (2 bytes)
Transfer 1 Byte
I have written on __MMXZeroMemory function that I think will give you the idea. However, I have not used DWORD/WORD movements in this code. I have written comments for the code but haven't had time to write documentation. Post back here if you have any questions.
Code: Select all
__MMXZeroMemory:
; void __MMXZeroMemory (void* Destination, DWORD Length)
PUSH EAX ; Push the accumulator onto the stack
PUSH ECX ; Push the count register onto the stack
PUSH EDX ; Push the data register onto the stack
PUSH EDI ; Push the destination index onto the stack
PUSH EBP ; Push the base pointer onto the stack
MOV EBP , ESP ; Move the stack pointer to the base pointer
MOV ECX , DWORD PTR [EBP + 0x1C]; *ECX = The [Length] parameter
TEST ECX , ECX ; See if the requested length is zero
JZ .EP ; Jump to the end of the procedure if yes
MOV EDI , DWORD PTR [EBP + 0x18]; *EDI = The [Destination] parameter
MOV EAX , ECX ; *EAX = The [Length] parameter
XOR EDX , EDX ; Clear the buffer for Byte-moves
SHR ECX , 0x00000003 ; *ECX = Number of QWORDs that we have to move to the Destination
AND EAX , 0x00000007 ; *EAX = Number of parity bytes that we have to move
TEST ECX , ECX ; See if the number of QWORDs to move is zero
JZ .MoveBytes ; Start moving bytes (Up to 7 bytes) if yes
EMMS ; Empty MMX Technology State
PXOR MM0 , MM0 ; The 8-Byte MMX byte is zero
.MoveQWORDs: ; Start zeroing memory 8 bytes at a time
MOVQ QWORD PTR [EDI] , MM0 ; Zero the current QWORD
ADD EDI , 0x00000008 ; Move to the next QWORD in the destination
DEC ECX ; Decrement the number of QWORDs to move
JNZ .MoveQWORDs ; Keep moving QWORDs to the destination
TEST EAX , EAX ; See if the number of parity bytes to move is zero
JZ .EP ; Jump to the end of the procedure if yes
.MoveBytes: ; We are left to move up to 7 bytes now
MOV BYTE PTR [EDI] , DL ; Write one byte to the current destination's location
INC EDI ; Move to the next byte of the destination
DEC EAX ; Decrement the number of bytes to move
JNZ .MoveBytes ; Keep moving parity bytes while EAX>0
.EP: ; End of the procedure routine
POP EBP ; Restore the base pointer
POP EDI ; Restore the destination index
POP EDX ; Restore the data register
POP ECX ; Restore the count register
POP EAX ; Restore the accumulator
RET 0x08 ; Return to the calling procedure
; And sweep 2 parameters off the stack
; And sweep 1 parameter off the stack
Alright, I coded one [__MemCpy] right now and you are free to use it too. But there are certain things you have to bear in mind about the code and these can be used as general programming skills:
When you want to divide a number by n where n is a power of 2 (For example, 1, 2, 4, 8, 16, 32, etc), just find x while 2^x = n
For example, to divide a number by 16, find x for 2^x = 16
This can be evaluated as
Log.2(16) = x
Log.2(2^4) = x
4*Log.2(2) = x
4=x or x=4
Therefore, to be able to divide a number by 16, you can shift it to the right 4 times. This allows us to find the quotient of a number divided by 16. We are going to need this value to be able to find out how many QWORDs/DWORDs/etc we have to move to a buffer. For example, if the user wants to move 34 bytes to a buffer, we shift 34 to the right 2 times to be able to find out how many DWORDs we have to move to the buffer. When you shift a number to the right n times, that number will be divided by 2^n. So shifting a number to the right 2 times will divide the number by 2^2 = 4 (DWORD).
Now that we know the quotient of our division, we have to find the remainder also. Finding the remainder of a division of a number by a power of 2 is really easy, also. If you want to find this remainder, just AND the number of N-1 when N is the quotient. For example, to find the remainder of the division (14/8), just subtract 1 from 8 (8-1 = 7) then AND 14 with 7 to get 6. So 6 is the remainder of the division of (14/8). 1*8 = 8, 8+6 = 14.
To wrap it up, let's say we need the remainder and the quotient of the division (NUMBER / 32).
Quotient = NUMBER SHR x
2^x = 32 => x = 5
Quotient = NUMBER SHR 5
Remainder = NUMBER AND (32-1) = NUMBER AND 31
That's that. Now to be able to understand the code below, you need to know a few optimization tricks.
1) If the required length specified by the programmer is zero, then exit the function immediately.
2) If the source and the destination are equal, then quit the function immediately.
3) Use the MOVSD instruction to move ECX number of DWORDs at a time from DS:ESI to DS:EDI.
4) Use the MOVSB instruction to move ECX number of BYTES at a time from DS:ESI to DS:EDI.
5) Do not assume that the processor your kernel is running in is equipped with MMX/SSE instructions.
6) Do not trash your general purpose registers; PUSH and POP them to save their current state.
7) Clear the direction flag when using string instructions (although not necessarily) with the CLD instruction.
8) I use the StdCall calling convention in my kernel. You should push the parameters from right to left in this calling convention.
I didn't have enough time to write documentation for this code either. If you need further help/explanation, write back here.
Code: Select all
; ——————————————————————————————————————————————————
__MemCpy:
; void __MemCpy (void* Destination, const void* Source, DWORD NumberofBytesToCopy); StdCall;
PUSHFD ; Push the EFLAGS register onto the stack
PUSH EAX ; Push the accumulator onto the stack
PUSH ECX ; Push the count register onto the stack
PUSH ESI ; Push the source index onto the stack
PUSH EDI ; Push the destination index onto the satck
PUSH EBP ; Push the base pointer onto the stack
MOV EBP , ESP ; Move the stack pointer to the base pointer
CLD ; Clear the direction flag, Move forward with string instructions
MOV ECX , DWORD PTR [EBP + 0x24]; *ECX = The [NumberofBYtesToCopy] parameter
TEST ECX , ECX ; See if the number of bytes we have to copy is zero
JZ .EP ; Jump to the end of the procedure if yes
MOV ESI , DWORD PTR [EBP + 0x20]; *ESI = The [Source] parameter
MOV EDI , DWORD PTR [EBP + 0x1C]; *EDI = The [Destination] parameter
CMP ESI , EDI ; Are the source and the destination the same memory location?
JE .EP ; If yes, then jump to the end of the procedure
MOV EAX , ECX ; EAX holds the [NumberofBytesToCopy] parameter too
AND EAX , 0x00000003 ; *EAX = Remainder of ([NumberofBytesToCopy] / 4)
SHR ECX , 0x00000002 ; *ECX = Quotient of ([NumberofBytesToCopy] / 4)
REP MOVSD ; Store as many as ECX DWORDs from source to destination
MOV ECX , EAX ; Put the count in the count register
REP MOVSB ; Now start moving bytes from Source to Destination
.EP: ; End of the procedure
POP EBP ; Restore the base pointer
POP EDI ; Restore the destination index
POP ESI ; Restore the source index
POP ECX ; Restore the count register
POP EAX ; Restore the base index
POPFD ; Restore the EFLAGS register
RET 0x0C ; Return to the calling procedure
; And sweep 3 parameters off the stack
Hope that helps!