Guys, just to save some clock cycles, I'm trying to find out a way to do it.
I saw in some tutorials, HelpPC, etc.
that pusha takes 11 clock cycles,
push takes only one,
and push segreg takes 3 clock cycles.
Now my question is:
The only segment registers x86 has are cs, ds, es, fs and gs, right?
eax, ecx, edx, ebx, esp, ebp, esi and edi are not segment registers, right?
My doubt is about esp, ebp, esi and edi. I'm almost sure they aren't segment registers.
Guys, could you help with that question, please?
pusha and push eax, ecx, edx, ebx, etc.
I wouldn't trust anything that tells you how many cycles an instruction takes, since that changes for each release of each model of each class of CPU (there are at least 6 different timings for chips marked as 'P4', probably more). Plus, Intel no longer publishes how many cycles each instruction takes, because it is no longer important: other things affect execution time more than cycle counts, making it virtually impossible to tell at compile time how many cycles something will take, even with a cycle timing chart.
Instead, get the Intel optimization guide (should be on the page with the manuals) or the AMD equivalent.
As for which registers are segment registers, look at the encodings in the Intel manual, volume 2B, appendix B:
GPRs:
000 EAX/AX/AL
001 ECX/CX/CL
010 EDX/DX/DL
011 EBX/BX/BL
100 ESP/SP/AH
101 EBP/BP/CH
110 ESI/SI/DH
111 EDI/DI/BH
all these can be used as normal registers (reg)
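The table above can be turned into a small lookup. This is only a sketch in Python (not assembler), meant to make the mapping concrete; `gpr_name` and `push_gpr_opcode` are made-up helper names. The one assumption beyond the table is the one-byte `PUSH r16/r32` encoding, which is `50h + reg field` in the Intel instruction reference.

```python
# Register names indexed by the 3-bit reg field, one tuple per operand size.
GPR_NAMES = {
    32: ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"),
    16: ("AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"),
    8: ("AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"),
}


def gpr_name(reg_field, size=32):
    """Return the register selected by a 3-bit reg field (hypothetical helper)."""
    if not 0 <= reg_field <= 7:
        raise ValueError("the reg field is only 3 bits wide")
    return GPR_NAMES[size][reg_field]


def push_gpr_opcode(name):
    """Opcode byte of the one-byte PUSH r16/r32 form: 0x50 + reg field."""
    name = name.upper()
    for size in (32, 16):  # the one-byte PUSH form has no 8-bit variant
        if name in GPR_NAMES[size]:
            return 0x50 + GPR_NAMES[size].index(name)
    raise ValueError(name + " is not a 16/32-bit general-purpose register")


if __name__ == "__main__":
    for field in range(8):
        print(format(field, "03b"), gpr_name(field, 32),
              gpr_name(field, 16), gpr_name(field, 8))
    print(hex(push_gpr_opcode("esi")))
```

So esp, ebp, esi and edi each have an ordinary reg encoding like eax does, which is another way of seeing that they are general-purpose registers and not segment registers.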
segregs (better known as segment registers, or seg2/seg3) are used to hold segments:
seg2/seg3
00/000 - ES
01/001 - CS
10/010 - SS
11/011 - DS
AVAILABLE IN seg3 ONLY:
100 - FS
101 - GS
110 - RESERVED
111 - RESERVED
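To show where those 2-bit and 3-bit fields end up, here is a rough Python sketch that builds the bytes for `push segreg` in 16/32-bit code. It assumes the layouts given in the Intel manual (where the fields are named sreg2 and sreg3): one byte `000 sreg2 110` for ES/CS/SS/DS, and `0F` followed by `10 sreg3 000` for FS/GS. The function names are hypothetical.

```python
# 3-bit segment register codes; the first four also fit the 2-bit field.
SREG3 = {"ES": 0b000, "CS": 0b001, "SS": 0b010, "DS": 0b011,
         "FS": 0b100, "GS": 0b101}


def is_segment_register(name):
    """True only for the six segment registers (hypothetical helper)."""
    return name.upper() in SREG3


def push_sreg_bytes(name):
    """Encode PUSH segreg for 16/32-bit code (sketch, not a full assembler)."""
    code = SREG3[name.upper()]  # KeyError for anything that is not a segreg
    if code <= 0b011:
        # One-byte form: 000 sreg2 110
        return bytes([(code << 3) | 0b110])
    # Two-byte form: 0F, then 10 sreg3 000
    return bytes([0x0F, 0b10000000 | (code << 3)])


if __name__ == "__main__":
    for reg in ("ES", "CS", "SS", "DS", "FS", "GS"):
        print(reg, push_sreg_bytes(reg).hex())
    print(is_segment_register("esp"))
```

Note that SS is in the table as well, so there are six segment registers in total, not five.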
The sad part is I wrote most of that from memory (I had to look up DS/SS).
Just to add a few points to [JAAman]'s post, the PUSHA instruction does *not* push 32-bit registers. Instead, it pushes the 16-bit registers: Accumulator (AX), Base (BX), Count (CX), Data (DX), Stack Pointer (SP), Base Pointer (BP), Source Index (SI) and, last but not least, Destination Index (DI).
The 32-bit equivalent of the PUSHA instruction is PUSHAD. Both have POP counterparts, POPA and POPAD, for the 16-bit and 32-bit registers respectively.
The combination of PUSHA and POPA took 17 clock cycles to execute on my PIII 800 MHz machine with the code segment aligned on a DWORD boundary. I checked it on a WORD boundary and it yielded the same result. The PUSHAD and POPAD versions took 2 clock cycles more on the same CPU, for 19 clock cycles in total.
Doing the same job as PUSHAD and POPAD manually, with consecutive PUSH instructions followed by POP instructions, took 20 clock cycles on the same machine. I have not checked sequential PUSHes and POPs of the 16-bit registers, but I guess they would take more clock cycles to execute due to partial register access stalls.
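For reference, a rough Python model of what PUSHA/PUSHAD put on the stack. The push order (AX, CX, DX, BX, SP, BP, SI, DI, with SP pushed as it was before the instruction started) is the one given in the Intel instruction reference; the function name and the dict layout are made up for illustration, and the same base names stand in for the 32-bit registers when `size=32`.

```python
# Order in which PUSHA/PUSHAD push the registers (first pushed = highest address).
PUSH_ORDER = ("AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI")


def pusha(regs, size=16):
    """Model PUSHA (size=16) or PUSHAD (size=32).

    regs maps the base register names to integer values and is updated in
    place. Returns the (name, value) pairs in the order they were pushed.
    """
    if size not in (16, 32):
        raise ValueError("operand size must be 16 or 32")
    width = size // 8              # bytes per pushed register
    mask = (1 << size) - 1         # truncate values to the operand size
    sp_before = regs["SP"]         # the value pushed for SP is the old one
    pushed = []
    for name in PUSH_ORDER:
        value = sp_before if name == "SP" else regs[name]
        pushed.append((name, value & mask))
    regs["SP"] = sp_before - width * len(PUSH_ORDER)
    return pushed


if __name__ == "__main__":
    regs = {"AX": 1, "CX": 2, "DX": 3, "BX": 4,
            "SP": 0x1000, "BP": 5, "SI": 6, "DI": 7}
    for name, value in pusha(regs, 16):
        print(name, hex(value))
    print("SP after:", hex(regs["SP"]))
```

With 16-bit operands the stack pointer drops by 16 bytes, with 32-bit operands by 32 bytes, which is the same amount of stack that eight separate PUSH instructions would use.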
XCHG, your post is slightly misleading:
There is no separate PUSHAD/POPAD instruction.
There is only one instruction each for pusha/popa. If your current operand size is 16 bits, it will store/retrieve the 16-bit registers; if your current operand size is 32 bits, it will store/retrieve the 32-bit registers. Both names are aliases for the same instruction:
0110.0000 this is pusha if your default size is 16bit, pushad if your default size is 32bit
0110.0001 this is popa if your default size is 16bit, popad if your default size is 32bit
0110.0110 this is the operand size override prefix. Many (not all) assemblers will automatically put this before your pusha/popa instruction if you use the wrong term for your current bits setting (if you use pushad under 'bits 16', or pusha under 'bits 32'). However, it is proper (just not common) to refer to both forms as pusha/popa, since they are the same instruction.
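The three bytes above are enough for a tiny decoder. A minimal sketch in Python, assuming only the encodings listed (60h, 61h, and the 66h prefix flipping the operand size between 16 and 32 bits); `decode_pusha_popa` is a made-up name and the function handles nothing but these two opcodes.

```python
OPERAND_SIZE_PREFIX = 0x66
# opcode -> (name with 16-bit operands, name with 32-bit operands)
MNEMONICS = {0x60: ("PUSHA", "PUSHAD"), 0x61: ("POPA", "POPAD")}


def decode_pusha_popa(code, default_bits):
    """Name the PUSHA/POPA form at the start of `code`.

    default_bits is the default operand size of the code segment (16 or 32).
    Returns (mnemonic, operand size in bits, instruction length in bytes).
    """
    if default_bits not in (16, 32):
        raise ValueError("default operand size must be 16 or 32")
    size = default_bits
    pos = 0
    if code[pos] == OPERAND_SIZE_PREFIX:
        size = 16 if default_bits == 32 else 32  # the prefix toggles the size
        pos += 1
    opcode = code[pos]
    if opcode not in MNEMONICS:
        raise ValueError("not a PUSHA/POPA opcode: " + hex(opcode))
    name16, name32 = MNEMONICS[opcode]
    return (name16 if size == 16 else name32, size, pos + 1)


if __name__ == "__main__":
    print(decode_pusha_popa(b"\x60", 16))
    print(decode_pusha_popa(b"\x60", 32))
    print(decode_pusha_popa(b"\x66\x60", 32))
```

The same byte 60h comes out as PUSHA or PUSHAD depending only on the default size and the prefix, which is the whole point: the mnemonic is a label for the operand size, not a different opcode.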
You should probably change that to PUSHAD/POPAD instead, just to be safe.