ASM fine-tuning

Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

ASM fine-tuning

Post by Pype.Clicker »

Can someone remind me which of these is faster on current Pentium-class processors?

Option A:
    pusha

Option B:
    push ebx
    push esi
    push edi

Well, the idea is to fine-tune my task-switching code. I know pusha saves other registers too (eax, ecx, edx, ebp and esp, for those wondering what I'm talking about), but under my calling conventions those can normally be safely ignored, or they are already saved somewhere else.

With all this pairing, caching, etc., I really wonder which will be faster. If you have any info, let me know :)
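For what it's worth, the memory side of the comparison is easy to pin down. A minimal sketch in C (the names `pushad_regs`, `explicit_regs` and `stack_bytes` are made up for illustration): in 32-bit mode pusha/pushad stores eight registers, while the three explicit pushes store three.

```c
#include <assert.h>
#include <stddef.h>

/* Registers stored by PUSHAD in 32-bit mode, in push order.
   The "esp" slot holds the value ESP had before the instruction. */
static const char *const pushad_regs[] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Registers saved by the three explicit pushes from the question. */
static const char *const explicit_regs[] = { "ebx", "esi", "edi" };

/* Bytes written to the stack when saving `count` 32-bit registers. */
static size_t stack_bytes(size_t count)
{
    return count * 4;
}
```

So pusha writes 32 bytes against 12 for the three pushes. This says nothing about cycle counts, only about how much stack gets touched.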
crazybuddha

Re:ASM fine-tuning

Post by crazybuddha »

I don't have any idea to what extent the following table is accurate, but it might be interesting to know about:

http://www.quantasm.com/opcode_i.html

Regarding the actual time of PUSHA, there's no single answer. If you are at the point where you are trying to shave cycles, you need to read these:

http://www.agner.org/assem/
http://www.azillionmonkeys.com/qed/optimize.html

as well as Intel's manuals (which are presently offline).

Although usually optimization issues are prefaced with a reprimand about how unnecessary it is, nothing will teach you as much as trying to quantify and speed up your code performance, even if you do a lot of silly things in the process.
f2

Re:ASM fine-tuning

Post by f2 »

>> as well as Intel's manuals (which are presently offline). <<

Where would you get these?
crazybuddha

Re:ASM fine-tuning

Post by crazybuddha »

f2

Re:ASM fine-tuning

Post by f2 »

I thought that was the place. I ordered the books about a month and a half ago, and they STILL haven't come in!
Tim

Re:ASM fine-tuning

Post by Tim »

I'm not going to guess which is faster on a modern processor, but I'd say that for more than five registers, pusha will be quicker than a series of register pushes.

However, it looks like Pype.Clicker is only saving three registers (why, may I ask?) so the three push instructions will apparently be faster on all CPUs than a pusha.
crazybuddha

Re:ASM fine-tuning

Post by crazybuddha »

Don't quote me but I believe PUSHA will be significantly slower than 5 cycles on a Pentium unless going into a cached stack, which perhaps is a real problem given how it's being used.

My money is on the register pushes in any case.
Pype.Clicker

Re:ASM fine-tuning

Post by Pype.Clicker »

>> Although usually optimization issues are prefaced with a reprimand about how unnecessary it is, nothing will teach you as much as trying to quantify and speed up your code performance, even if you do a lot of silly things in the process. <<

Well, I have virtually no experience in quantifying code speed... do you have any good tutorials on clock-cycle counting? I've been told of some "time stamp counter" or something like that in the Pentium+ architecture... how exactly can it be used?
crazybuddha

Re:ASM fine-tuning

Post by crazybuddha »

RDTSC is the instruction. If your assembler doesn't support it, you can just hand-code the opcode bytes (0F 31). Do a Google search for the following document:

rdtscpm1.pdf
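
A rough sketch of what that looks like from C, assuming GCC or Clang inline assembly on an x86 target (the helper names `read_tsc`, `min_ticks` and `empty_routine` are hypothetical, not taken from any of the documents mentioned here). RDTSC is emitted as raw opcode bytes, the way you would hand-code it for an assembler that lacks the mnemonic:

```c
#include <assert.h>
#include <stdint.h>

/* Read the time stamp counter. RDTSC is emitted as raw opcode bytes
   (0F 31); it returns the low 32 bits in EAX and the high 32 in EDX.
   GCC/Clang inline-assembly syntax, x86 or x86-64 only. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(".byte 0x0f, 0x31" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: time fn() `runs` times and return the smallest
   tick count seen. Taking the minimum filters out runs that were
   disturbed by interrupts or cache misses. */
static uint64_t min_ticks(void (*fn)(void), int runs)
{
    uint64_t best = UINT64_MAX;
    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t start = read_tsc();
        fn();
        uint64_t elapsed = read_tsc() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Empty routine: timing it estimates the overhead of the harness. */
static void empty_routine(void) { }
```

Time `empty_routine` first to get the overhead of the harness, then subtract it from the figure for the code you actually care about. RDTSC is not a serializing instruction, so treat results for very short sequences as approximate.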

BTW, the intel docs appear to be mirrored here:

http://www.x86.org/intel.doc/inteldocs.htm

The Agner Fog document is really the only tutorial I'm aware of.