push & pop in 64bit mode

Post by nicola »

I'm trying to write my first function in 64-bit mode,
but 64-bit mode doesn't seem to accept the PUSHA/POPA/PUSHAD/POPAD instructions.
I'm using AMD Sempron 140 Single core 2.7GHz

Post by M2004 »

Those instructions are not supported in long mode. You need to write
macros instead that emulate them for the 64-bit registers, e.g. pushaq and popaq.

regards
Mac2004

Post by nicola »

In 64-bit mode there are so many registers.
Should we really create pushaq/popaq macros?
Doesn't that hurt performance?

Post by M2004 »

Assuming you are using an assembler, take a look
at its manual to see how to create macros.

regards
Mac2004

Post by Combuster »

If PUSHA (or POPA) is what you want, then just push everything. In practice, however, you won't need to push more than 15 GPRs, so every PUSHA wastes at least one push, if not more.
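To illustrate saving only what is needed, here is a minimal sketch in NASM syntax. The routine `add_two` is hypothetical and not from this thread; it assumes its two arguments arrive in rdi and rsi and that it only overwrites rbx and r12, so those are the only registers it saves.

```nasm
bits 64

; Hypothetical routine (not from the thread): it only overwrites rbx and r12,
; so only those two registers are saved and restored.
add_two:
    push rbx                ; save the registers this routine overwrites
    push r12
    mov  rbx, rdi           ; first argument (assumed to arrive in rdi)
    mov  r12, rsi           ; second argument (assumed to arrive in rsi)
    lea  rax, [rbx + r12]   ; result = first + second, returned in rax
    pop  r12                ; restore in reverse order of the pushes
    pop  rbx
    ret
```

That is two pushes and two pops instead of the fifteen of each that a PUSHA-style macro would emit.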

Post by M2004 »

Here are the macros I use (fasm syntax):


;**************************************************************************************
;PUSHAQ: 	Emulates the 'pushaq instruction' under long mode. 
;
;  Input:  --
;
;  Output: --
;
;**************************************************************************************
align 8
macro PUSHAQ 
     {
	;Save registers to the stack.
	;--------------------------------

	push rax		;save current rax
	push rbx		;save current rbx
	push rcx		;save current rcx
	push rdx		;save current rdx
	push rbp		;save current rbp
	push rdi		;save current rdi
	push rsi		;save current rsi
	push r8			;save current r8
	push r9			;save current r9
	push r10		;save current r10
	push r11		;save current r11
	push r12		;save current r12
	push r13		;save current r13
	push r14		;save current r14
	push r15		;save current r15

      }	;end of macro definition

;**************************************************************************************
;POPAQ: 	Emulates the 'popaq instruction' under long mode. 
;		
;  Input:  --
;
;  Output: --
;
;**************************************************************************************
align 8

macro POPAQ 
     {
	;Restore registers from the stack.
	;--------------------------------

	pop r15			;restore current r15
	pop r14			;restore current r14
	pop r13			;restore current r13
	pop r12			;restore current r12
	pop r11			;restore current r11
	pop r10			;restore current r10
	pop r9			;restore current r9
	pop r8			;restore current r8
	pop rsi			;restore current rsi
	pop rdi			;restore current rdi
	pop rbp			;restore current rbp
	pop rdx			;restore current rdx
	pop rcx			;restore current rcx
	pop rbx			;restore current rbx
	pop rax			;restore current rax

      }	;end of macro definition

Post by nicola »

Although it is bad for performance, I followed Mac2004 and wrote a NASM version of PUSHAQ:


%macro pushaq 0
    push rax      ;save current rax
    push rbx      ;save current rbx
    push rcx      ;save current rcx
    push rdx      ;save current rdx
    push rbp      ;save current rbp
    push rdi       ;save current rdi
    push rsi       ;save current rsi
    push r8        ;save current r8
    push r9        ;save current r9
    push r10      ;save current r10
    push r11      ;save current r11
    push r12      ;save current r12
    push r13      ;save current r13
    push r14      ;save current r14
    push r15      ;save current r15
%endmacro

It makes coding a little bit easier :)
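The post shows only the push side. A matching NASM `popaq` would presumably mirror the fasm POPAQ posted earlier; the following is a sketch under that assumption, popping in exactly the reverse order of the pushes above.

```nasm
bits 64

; Sketch of the counterpart to pushaq (assumed, not posted in the thread):
; registers come off the stack in the reverse order they were pushed.
%macro popaq 0
    pop r15       ;restore r15 (pushed last, so popped first)
    pop r14       ;restore r14
    pop r13       ;restore r13
    pop r12       ;restore r12
    pop r11       ;restore r11
    pop r10       ;restore r10
    pop r9        ;restore r9
    pop r8        ;restore r8
    pop rsi       ;restore rsi
    pop rdi       ;restore rdi
    pop rbp       ;restore rbp
    pop rdx       ;restore rdx
    pop rcx       ;restore rcx
    pop rbx       ;restore rbx
    pop rax       ;restore rax (pushed first, so popped last)
%endmacro
```

Used as a pair, `pushaq` goes at the top of a routine and `popaq` just before its `ret` (or `iretq` in an interrupt handler). The stack pointer must be back at the value it had right after `pushaq` when `popaq` runs.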

Post by gerryg400 »

If you want better performance, it is probably faster to decrement the stack pointer once by the size of the entire register set and then MOV each register into its stack slot.

i.e. instead of


   push reg1
   push reg2
   push reg3
   ...
   push reg15

try

   sub $120, %rsp
   mov reg1, 112(%rsp)
   mov reg2, 104(%rsp)
   mov reg3, 96(%rsp)
   ...
   mov reg15, 0(%rsp)
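
For comparison with the macros earlier in the thread, the same idea might look like this in NASM syntax. This is only a sketch: the macro name `pushaq_mov` is made up here, and the slot order is chosen so that the resulting stack layout matches what the PUSH-based pushaq produces (rax at the highest address, r15 at `[rsp]`).

```nasm
bits 64

; Hypothetical MOV-based variant of pushaq (name and layout are assumptions).
; One SUB reserves all 15 slots, then each register is stored independently.
%macro pushaq_mov 0
    sub  rsp, 15*8           ; reserve 120 bytes; note: SUB modifies RFLAGS
    mov  [rsp + 14*8], rax   ; same slot that the first PUSH would have used
    mov  [rsp + 13*8], rbx
    mov  [rsp + 12*8], rcx
    mov  [rsp + 11*8], rdx
    mov  [rsp + 10*8], rbp
    mov  [rsp +  9*8], rdi
    mov  [rsp +  8*8], rsi
    mov  [rsp +  7*8], r8
    mov  [rsp +  6*8], r9
    mov  [rsp +  5*8], r10
    mov  [rsp +  4*8], r11
    mov  [rsp +  3*8], r12
    mov  [rsp +  2*8], r13
    mov  [rsp +  1*8], r14
    mov  [rsp +  0*8], r15   ; same slot that the last PUSH would have used
%endmacro
```

Because the layout is identical, a POP-based popaq can still restore from it. One difference to keep in mind: PUSH leaves the flags untouched while SUB does not, so if the flags must survive, `lea rsp, [rsp - 15*8]` can replace the SUB. Whether this is actually faster on a given CPU is something to measure rather than assume.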

Post by nicola »

gerryg400's method using MOV and SUB RSP seems to take 15+1 ticks (15 MOVs and 1 SUB),
while the method using direct PUSH seems to take 15+15 ticks (15 MOVs and 15 DECs),
so it's 25% faster, I think.

Post by gerryg400 »

nicola wrote:
gerryg400's method using MOV and SUB RSP seems to take 15+1 ticks (15 MOVs and 1 SUB),
while the method using direct PUSH seems to take 15+15 ticks (15 MOVs and 15 DECs),
so it's 25% faster, I think.

It's not as simple as that. In this case a stand-alone MOV and PUSH would take the same amount of time because the micro-ops that make up the PUSH can be done in parallel.

The advantage comes from the fact that using MOV instead of PUSH reduces the dependency between one instruction and the next: each PUSH needs the RSP value produced by the previous PUSH, whereas the MOVs all use the same RSP. That allows the CPU to execute them in parallel and out of order.

NOTE: I'm no expert. I hope an expert chimes in soon if I've got this wrong!