Hi,
XCHG wrote:Now if you look at the [.Loop] label, you will see that it has RET at the end of the [.Extract] label therefore, the [.Loop] up to the end of the [.Extract] label is a separate procedure. [__DWORDToStr] is another procedure. So far, we have nested procedures.
Your ".loop" could be considered a separate procedure that passes parameters in registers, but it doesn't use any HLL calling convention and doesn't reference ESP or EBP at all. It's an example of what can be done once you throw away the stack frame.
Modern CPUs keep track of return addresses so that the target of a RET can be predicted and doesn't cause stalls. The code you've posted should be optimized so that the "PUSH OFFSET .EP" doesn't stuff this up:
Code:
; --------------------------------------------------
__DWORDToStr:
; void __DWORDToStr (DWORD InDWORD, char* Buffer)
PUSH EAX
PUSH EBX
PUSH ECX
PUSH EDX
PUSH EDI
MOV EAX , DWORD PTR [ESP + 0x18] ; InDWORD
MOV EDI , DWORD PTR [ESP + 0x1C] ; Buffer
MOV ECX , 0xCCCCCCCD ; Reciprocal multiplier for /10: ceil(2^35 / 10)
CALL .Loop
MOV BYTE PTR [EDI] , 0x00
POP EDI
POP EDX
POP ECX
POP EBX
POP EAX
RET 0x08
.Loop:
MOV EBX , EAX
MUL ECX
SHR EDX , 0x00000003
MOV EAX , EDX
ADD EDX , EDX
LEA EDX , [EDX + 0x04*EDX]
SUB EBX , EDX
OR EBX , 0x00000030
PUSH EBX
TEST EAX , EAX
JE .Extract
CALL .Loop
.Extract:
POP EAX
MOV BYTE PTR [EDI] , AL
INC EDI
RET
; --------------------------------------------------
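In case the multiply-and-shift in the loop looks like magic: 0xCCCCCCCD is ceil(2^35 / 10), so the high half of the 64-bit product shifted right by 3 (a total shift of 35) gives the quotient, and the ADD/LEA/SUB sequence recovers the remainder. Here's a rough C sketch of just that arithmetic. The function names `div10` and `last_digit` are made up for illustration and aren't part of the code above.

```c
#include <stdint.h>

/* Hypothetical helper: n / 10 via multiply-and-shift, mirroring
   MUL ECX followed by SHR EDX, 3. */
static uint32_t div10(uint32_t n)
{
    /* MUL leaves the high 32 bits of the product in EDX; shifting
       that right by 3 is a total right shift of 35. */
    uint64_t product = (uint64_t)n * 0xCCCCCCCDu;
    return (uint32_t)(product >> 35);
}

/* Hypothetical helper: ASCII character for n % 10, mirroring the
   ADD / LEA / SUB / OR sequence. */
static char last_digit(uint32_t n)
{
    uint32_t q = div10(n);
    uint32_t ten_q = (q + q) * 5u;      /* ADD EDX,EDX then LEA EDX,[EDX+4*EDX] */
    return (char)((n - ten_q) | 0x30u); /* remainder 0..9 becomes '0'..'9' */
}
```

Comparing `div10(n)` against plain `n / 10` over a range of values is an easy way to convince yourself the constant is right.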
Of course the idea of using up to 80 bytes of stack space, having a call/ret pair in the inner loop, and doing up to 10 byte writes instead of fewer larger writes just doesn't seem right to me. It might be fun to see if this is faster:
Code:
; --------------------------------------------------
__DWORDToStr:
; void __DWORDToStr (DWORD InDWORD, char* Buffer)
PUSH EAX
PUSH EBX
PUSH ECX
PUSH EDX
PUSH ESI
PUSH EDI
PUSH EBP
MOV EAX , DWORD PTR [ESP + 0x20] ; InDWORD
MOV EDI , 0xCCCCCCCD ; Reciprocal multiplier for /10: ceil(2^35 / 10)
XOR EBP , EBP
XOR ECX , ECX
XOR ESI , ESI
.L1:
MOV EBX , EAX
MUL EDI
SHR EDX , 0x00000003
MOV EAX , EDX
ADD EDX , EDX
LEA EDX , [EDX + 0x04*EDX]
SUB EBX , EDX
OR EBX , 0x00000030
SHLD ESI, EBP, 8
SHLD EBP, ECX, 8
SHL ECX, 8
TEST EAX , EAX
MOV CL, BL
JNE .L1
MOV EDI , DWORD PTR [ESP + 0x24] ; Buffer
MOV [EDI] , ECX
MOV [EDI+4] , EBP
MOV [EDI+8] , SI
MOV [EDI+10] , AL
POP EBP
POP EDI
POP ESI
POP EDX
POP ECX
POP EBX
POP EAX
RET 0x08
; --------------------------------------------------
Hmm - nearly ran out of registers for that (must've been a coincidence)....
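For anyone who wants to check the register packing without firing up an assembler, here's a rough C model of the second version. It's only a sketch: `dword_to_str_packed` is a made-up name, and `lo`/`mid`/`hi` stand in for ECX/EBP/ESI. The bytes are stored one at a time so the model doesn't depend on the host being little-endian.

```c
#include <stdint.h>

/* Hypothetical C model of the packed-register routine.
   Assumes `buffer` has room for at least 11 bytes. */
static void dword_to_str_packed(uint32_t value, char *buffer)
{
    uint32_t lo = 0, mid = 0, hi = 0;   /* ECX, EBP, ESI */

    do {
        uint32_t q = (uint32_t)(((uint64_t)value * 0xCCCCCCCDu) >> 35);
        uint32_t digit = (value - q * 10u) | 0x30u;

        /* Shift the 96-bit value hi:mid:lo left by one byte
           (SHLD ESI,EBP,8 / SHLD EBP,ECX,8 / SHL ECX,8) ... */
        hi  = (hi  << 8) | (mid >> 24);
        mid = (mid << 8) | (lo  >> 24);
        lo  = (lo  << 8) | digit;       /* ... then MOV CL,BL */

        value = q;
    } while (value != 0);

    /* Digits come out least significant first, so the most significant
       digit ends up in the lowest byte, i.e. at the lowest address. */
    for (int i = 0; i < 4; i++) {
        buffer[i]     = (char)(lo  >> (8 * i));   /* MOV [EDI],ECX   */
        buffer[4 + i] = (char)(mid >> (8 * i));   /* MOV [EDI+4],EBP */
    }
    buffer[8]  = (char)hi;                        /* MOV [EDI+8],SI  */
    buffer[9]  = (char)(hi >> 8);
    buffer[10] = '\0';                            /* MOV [EDI+10],AL */
}
```

One thing the model makes obvious: this version always writes 11 bytes, even for a one-digit number, so the caller's buffer needs to be at least that big.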
XCHG wrote:About using ESP instead of creating EBP, maybe it is how I feel but I think you don't code in Assembly as much as you do in high level languages because if you did, you would have known that maintainability is one of THE most important parts of coding in Assembly. I have coded procedures that are literally hundreds of lines of code and if I hadn't used EBP as a steady reference to my local parameters and variables, I could have simply wasted a lot of time re-calculating all the offsets relative to ESP. There are times when you PUSH and POP a lot of values onto and off of the stack and if you had chosen to access the parameters and/or local variables using ESP instead of EBP, you would have gone insane. Now could it make you happy if you still had your EBP as a free GPR while all of the offsets and re-calculating them could take you hours? Sorry I just don't buy that.
For the record, I've been programming assembly for over 20 years (although I've only been doing 80x86 for about 15 of those years). I've spent more time programming in Commodore 64 BASIC than I have in any other high level language.
If you're using local variables, then you shouldn't be pushing and popping stuff everywhere to begin with (set up the stack on entry, then write your code so it spills values to local variables on the previously set up stack, then undo the stack and return). If you write a procedure that's hundreds of lines long, then it's not maintainable (regardless of what you do or don't do with the stack frame), and you should've split it into smaller parts before you got that far.
XCHG wrote:About the speed of execution of the codes that use ESP instead of EBP, you are NOT at all considering the decoding time of instructions, huh? You think it wouldn't matter if you had referenced ESP 100 times instead of EBP and you had made the CPU decode 100 more bytes. That is CRIME against the CPU. [calls 911]
For small procedures, using EBP for the frame pointer will cost you more bytes of code. For larger procedures, a few extra bytes of code is nothing compared to having an extra register to help avoid spilling values to the stack, help avoid register dependencies, and help with better instruction scheduling.
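To put rough numbers on the code size side: in 32-bit code with 8-bit displacements, "MOV reg, [EBP+disp8]" encodes in 3 bytes, while "MOV reg, [ESP+disp8]" needs a SIB byte and takes 4. Setting up and tearing down the frame (PUSH EBP, MOV EBP,ESP, POP EBP) costs 4 bytes. A back-of-the-envelope sketch, with made-up function names, that counts only the frame setup and the stack references themselves:

```c
/* Hypothetical estimate: code bytes for `refs` stack references made
   through an EBP frame. PUSH EBP (1) + MOV EBP,ESP (2) + POP EBP (1)
   is 4 bytes of overhead, plus 3 bytes per MOV reg,[EBP+disp8]. */
static unsigned ebp_frame_bytes(unsigned refs)
{
    return 4u + 3u * refs;
}

/* Hypothetical estimate: the same references made ESP-relative.
   No frame setup, but each MOV reg,[ESP+disp8] carries a SIB byte,
   so 4 bytes apiece. */
static unsigned esp_relative_bytes(unsigned refs)
{
    return 4u * refs;
}
```

Under those assumptions the two come out equal at 4 references, and EBP only saves bytes beyond that. At 100 references the difference is 96 bytes, which you then weigh against having one more register to work with.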
Cheers,
Brendan