Opcode times

Posted: Sun Jun 27, 2004 11:24 am
by Jason
I've been looking around for some recent information on opcode times. It all seems to be based on old processors (Pentium 1, etc.). I have an AMD Athlon XP 2600, and there are only a couple of instructions I'm after:
1) Call gate in use on the same segment and of lesser protection
2) Ret in the same segment and of lesser protection

Any input would be much appreciated ;)

Thanks in advance :)

Re:Opcode times

Posted: Sun Jun 27, 2004 12:56 pm
by Curufir
The reason they don't publish them anymore is that they were never accurate once all the caching/pipelining effects had been taken into consideration. Just do some benchmarking against your clock/counter.
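A minimal sketch of what "benchmarking against your clock" could look like in C, assuming the standard clock() function is good enough as the clock. The names op_under_test, op_calls and bench_seconds_per_call are made up for illustration; in a real kernel the stand-in would be replaced by the far call through the gate (or the ret) being measured, and a cycle counter could replace clock().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Counts how often the stand-in ran; volatile so the compiler
 * cannot optimize the work away. */
volatile unsigned long op_calls = 0;

/* Hypothetical stand-in for the instruction sequence being measured. */
void op_under_test(void)
{
    op_calls = op_calls + 1;
}

/* Call fn() `iters` times per run, repeat for `runs` runs, and return
 * the CPU seconds per call of the fastest run. Keeping the minimum
 * filters out runs disturbed by interrupts or cold caches. The result
 * still includes loop and call overhead. */
double bench_seconds_per_call(void (*fn)(void), unsigned long iters,
                              unsigned runs)
{
    assert(fn != NULL && iters > 0 && runs > 0);

    double best = -1.0; /* negative means "no run recorded yet" */
    for (unsigned r = 0; r < runs; r++) {
        clock_t start = clock();
        for (unsigned long i = 0; i < iters; i++)
            fn();
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best / (double)iters;
}
```

Calling bench_seconds_per_call(op_under_test, 1000000UL, 5) and multiplying the result by the CPU's clock frequency gives a rough cycles-per-call figure. Timing an empty stand-in the same way and subtracting it removes most of the loop overhead.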

Re:Opcode times

Posted: Sun Jun 27, 2004 3:53 pm
by Pype.Clicker
Moreover, the actual number of clock cycles required for such operations will depend heavily on how those 'complex instructions' are encoded into micro-ROM instructions (they're more or less interpreted on modern RISC-cored CPUs ...)

Re:Opcode times

Posted: Sun Jun 27, 2004 11:25 pm
by Solar
If you want to optimize your code, check out the "IA-32 Intel Architecture Optimization Reference Manual". It's a companion to the Software Developer's Manuals. It explains branch prediction, pipelining, etc. I'm afraid there is no easy "here's the table" answer to the question of how to write the most efficient code.

Note, however, the old rule: "Premature optimization is the root of all evil" (D. Knuth). Before you start hacking Assembler "because it's faster", assert that:

* the code section you are optimizing actually is the one causing performance problems, and

* your Assembler actually is faster than compiled C.

In both cases, chances are the answer is "no"...
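
To put a number on the first check, here is a small sketch using Amdahl's law (named here for illustration, it is not something the posts above mention). The function name overall_speedup and its parameters are assumptions: fraction is the share of total run time spent in the section, as measured by a profiler, and local_speedup is how much faster the rewritten section is.

```c
#include <assert.h>

/* Amdahl's law: overall speedup of a program when a section that
 * takes `fraction` of the run time (0.0 to 1.0) is made
 * `local_speedup` times faster and the rest is left unchanged. */
double overall_speedup(double fraction, double local_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

For example, overall_speedup(0.05, 2.0) is about 1.026: doubling the speed of a routine that accounts for 5% of the run time makes the whole program only about 2.6% faster.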

Re:Opcode times

Posted: Mon Jun 28, 2004 2:45 am
by Candy
Solar wrote: I'm afraid there is no easy "here's the table" answer to the question of how to write the most efficient code.
Here's the table: (amd.com) http://www.amd.com/us-en/assets/content ... /22007.pdf

It's in appendix F.

:D

I just hate to see people give incorrect answers...

Note that they ARE correct about it being totally useless. It does exist though. The other 80% of the book is about how to optimize.

For more asm optimization stuff, look at this thread: (masmforum.com)
http://www.masmforum.com/viewtopic.php?t=3329&start=0

Re:Opcode times

Posted: Mon Jun 28, 2004 5:20 am
by Pype.Clicker
Great stuff, Candy ... all those "optimized long integer multiply" and other "optimized decimal-to-asciiz conversion" routines will certainly please those who're writing a stdlib replacement :)

Re:Opcode times

Posted: Mon Jun 28, 2004 5:59 am
by Solar
Candy wrote:
I just hate to see people give incorrect answers...
Note that:

1) That table does not list execution time in clock cycles, but execute latencies and decode type - which is a different ballgame, and to understand all implications you have to understand the architecture;

2) Numbers given are for AMD Athlon; are you sure it won't differ with the Athlon XP?

Re:Opcode times

Posted: Mon Jun 28, 2004 6:42 am
by Candy
Solar wrote: 1) That table does not list execution time in clock cycles, but execute latencies and decode type - which is a different ballgame, and to understand all implications you have to understand the architecture;
Not necessarily. The numbers CAN also be read as maximum clock cycles, not counting memory access latencies, just like on a 486, for example.
Solar wrote: 2) Numbers given are for AMD Athlon; are you sure it won't differ with the Athlon XP?
Athlon is an architecture (actually K7); Athlon XP is a die size plus marketing. The Morgan-series Durons are exactly the same but have less cache, the Spitfire Durons are the same at a larger die size (0.18 µm afaik), etc. The 32-bit Athlons are all the same (note, this is NOT true for the 64-bit ones), except for some minor differences in features (but not in latencies!). A processor won't have a latency for a feature it doesn't have, of course.
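
On point 1, reading the latency column as "maximum clock cycles" amounts to adding up the latencies of a sequence as if nothing overlapped. A rough sketch of that reading follows. The struct, the function name and the example entries are all made up for illustration, and the latency values are placeholders, NOT figures from Appendix F. Memory access latencies, decode stalls and branch mispredictions are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* One row as it might be read off a latency table. */
struct op_latency {
    const char *mnemonic; /* instruction name, for readability only */
    unsigned latency;     /* execute latency in cycles */
};

/* Sum the latencies as if every instruction waited for the previous
 * one to finish. Under the "max clock cycles" reading this is a
 * pessimistic estimate, since it assumes no overlap at all. */
unsigned worst_case_cycles(const struct op_latency *ops, size_t count)
{
    assert(ops != NULL || count == 0);

    unsigned total = 0;
    for (size_t i = 0; i < count; i++)
        total += ops[i].latency;
    return total;
}

/* Placeholder sequence: names and latencies are invented. */
const struct op_latency example_ops[] = {
    { "op_a", 1 },
    { "op_b", 1 },
    { "op_c", 4 },
};
```

With the placeholder rows above, worst_case_cycles(example_ops, 3) adds up to 6 cycles. Whether a real sequence gets anywhere near such a sum is exactly what the benchmarking advice earlier in the thread is for.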