Can someone remind me which is fastest on an actual Pentium architecture?
[table][tr][td]pusha[/td][td]
push ebx
push esi
push edi[/td]
[/tr][/table]
Well, the idea is to fine-tune my task-switching code. I know there are other registers that pusha will save (i.e. eax, ecx, edx, ebp and esp, for those wondering what I'm talking about), but under my calling conventions they can normally be safely ignored or are already saved somewhere else.
With all this pairing, caching, etc., I really wonder which will be fastest. If you have any info, let me know.
ASM fine-tuning
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
Re:ASM fine-tuning
I don't have any idea to what extent the following table is accurate, but it might be interesting to know about:
http://www.quantasm.com/opcode_i.html
Regarding the actual time of PUSHA, there's no single answer. If you are at the point where you are trying to shave cycles, you need to read these:
http://www.agner.org/assem/
http://www.azillionmonkeys.com/qed/optimize.html
as well as Intel's manuals (which are presently offline).
Although usually optimization issues are prefaced with a reprimand about how unnecessary it is, nothing will teach you as much as trying to quantify and speed up your code performance, even if you do a lot of silly things in the process.
Re:ASM fine-tuning
>> as well as Intel's manuals (which are presently offline). <<
Where would you get these?
Re:ASM fine-tuning
I thought that was the place. I ordered the books about a month and a half ago, and they STILL haven't come in!
Re:ASM fine-tuning
I'm not going to guess which is faster on a modern processor, but I'd say that for more than five registers, pusha will be quicker than a series of register pushes.
However, it looks like Pype.Clicker is only saving three registers (why, may I ask?) so the three push instructions will apparently be faster on all CPUs than a pusha.
Re:ASM fine-tuning
Don't quote me, but I believe PUSHA will take significantly more than 5 cycles on a Pentium unless it is writing to a stack that is already in cache, which may be a real problem given how it's being used.
My money is on the register pushes in any case.
- Pype.Clicker
- Member
- Posts: 5964
- Joined: Wed Oct 18, 2006 2:31 am
- Location: In a galaxy, far, far away
Re:ASM fine-tuning
>> Although usually optimization issues are prefaced with a reprimand about how unnecessary it is, nothing will teach you as much as trying to quantify and speed up your code performance, even if you do a lot of silly things in the process. <<
Well, I have virtually no experience with quantifying code speed... do you have any nice tutorials about clock-cycle counting? I've been told about some "time stamp counter" or something like it in the Pentium+ architecture... how exactly can it be used?
Re:ASM fine-tuning
RDTSC is the instruction. If it's not supported in your assembler, you can just hand code it in. Do a google search for the following document:
rdtscpm1.pdf
BTW, the intel docs appear to be mirrored here:
http://www.x86.org/intel.doc/inteldocs.htm
The Agner Fog document is really the only tutorial I'm aware of.
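For what it's worth, here is a minimal sketch of how RDTSC is typically used to count cycles. It assumes GCC or Clang inline assembly on an x86 target. The names read_tsc, nothing and min_ticks are made up for illustration and do not come from any of the documents above. The instruction is emitted as raw bytes (0x0F 0x31) to show the "hand code it in" approach.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the time stamp counter (Pentium and later).
   RDTSC leaves the low 32 bits in EAX and the high 32 bits in EDX. */
static inline uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    /* 0x0F 0x31 is the RDTSC opcode, hand coded in case the
       assembler does not know the mnemonic. */
    __asm__ __volatile__(".byte 0x0f, 0x31" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Assumption: not an x86 build. Fall back to clock() so the sketch
       still compiles; the result is clock ticks, not CPU cycles. */
    return (uint64_t)clock();
#endif
}

/* Empty routine used to estimate the overhead of the measurement itself. */
static void nothing(void)
{
}

/* Hypothetical helper: run fn() `runs` times and return the smallest
   tick count seen, which filters out runs disturbed by cache misses
   or interrupts. */
static uint64_t min_ticks(void (*fn)(void), int runs)
{
    uint64_t best = UINT64_MAX;
    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t start = read_tsc();
        fn();
        uint64_t elapsed = read_tsc() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

To time a routine, subtract min_ticks(nothing, 100) from min_ticks(your_routine, 100), where your_routine is a placeholder for the code being measured. Taking the minimum over many runs is only one way to get a stable figure, and the first run will usually include cache misses.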