Opcode times

Jason

Opcode times

Post by Jason »

I've been looking around for some recent information on opcode times. Everything seems to be based on old processors (Pentium 1, etc.). I have an AMD Athlon XP 2600, and there are only a couple of instructions I'm after:
1) Call gate in use on the same segment and of lesser protection
2) Ret in the same segment and of lesser protection

Any input would be much appreciated ;)

Thanks in advance :)
Curufir

Re:Opcode times

Post by Curufir »

The reason they don't publish them anymore is that they were never accurate once all the caching/pipelining stuff had been taken into consideration. Just do some benchmarking against your clock/counter.
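
A minimal sketch of what such a benchmark could look like in C, assuming GCC or Clang. On x86 it reads the time-stamp counter through the `__rdtsc()` intrinsic; elsewhere it falls back to `clock_gettime`. The names `read_counter`, `min_ticks`, `empty_op` and `busy_op` are made up for illustration, and `busy_op` is only a stand-in for whatever code you actually want to time.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* x86: read the time-stamp counter (RDTSC). */
uint64_t read_counter(void) { return __rdtsc(); }
#else
#include <time.h>
/* Elsewhere: fall back to a monotonic clock, in nanoseconds. */
uint64_t read_counter(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

typedef void (*op_fn)(void);

volatile uint32_t sink;

/* Does nothing: timing it gives the overhead of the harness itself. */
void empty_op(void) {}

/* Hypothetical stand-in for the code under test. */
void busy_op(void) {
    for (uint32_t i = 0; i < 1000; i++) sink += i;
}

/* Smallest tick count seen over `runs` timed calls of op.
   Taking the minimum filters out runs disturbed by interrupts
   or cold caches. */
uint64_t min_ticks(op_fn op, int runs) {
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = read_counter();
        op();
        uint64_t elapsed = read_counter() - start;
        if (elapsed < best) best = elapsed;
    }
    return best;
}
```

Comparing `min_ticks(busy_op, 1000)` with `min_ticks(empty_op, 1000)` gives a rough net figure. To time a call gate or a far ret you would swap `busy_op` for a routine that performs that transfer in your own kernel. The result only holds for the machine and conditions you measured on.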
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Opcode times

Post by Pype.Clicker »

Moreover, the actual number of clock cycles required for such operations will depend heavily on how those 'complex instructions' are encoded into micro-ROM instructions (they're more or less interpreted on modern RISC-cored CPUs ...)
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re:Opcode times

Post by Solar »

If you want to optimize your code, check out the "IA-32 Intel Architecture Optimization Reference Manual". It's a companion to the Software Developer Manuals. It explains all the branch-prediction, pipelining etc. - I'm afraid there is no easy "here's the table" answer to the question how to write utmost efficient code.

Note, however, the old rule: "Premature optimization is the root of all evil." (D. Knuth) Before you start hacking Assembler "because it's faster", assert that:

* the code section you are optimizing actually is the one causing performance problems, and

* your Assembler actually is faster than compiled C.

In both cases, chances are the answer is "no"...
Every good solution is obvious once you've found it.
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Opcode times

Post by Candy »

Solar wrote: I'm afraid there is no easy "here's the table" answer to the question how to write utmost efficient code.
Here's the table: (amd.com) http://www.amd.com/us-en/assets/content ... /22007.pdf

It's in appendix F.

:D

I just hate to see people give incorrect answers...

Note that they ARE correct about it being totally useless. It does exist though. The other 80% of the book is about how to optimize.

For more asm optimization stuff, look at this thread: (masmforum.com)
http://www.masmforum.com/viewtopic.php?t=3329&start=0
Pype.Clicker

Re:Opcode times

Post by Pype.Clicker »

great stuff, candy ... all those "optimized long integer multiply" and other "optimized decimal-to-asciiz conversion" routines will certainly please those who're writing a stdlib replacement :)
Solar

Re:Opcode times

Post by Solar »

Candy wrote:
I just hate to see people give incorrect answers...
Note that:

1) That table does not list execution time in clock cycles, but execute latencies and decode type - which is a different ballgame, and to understand all implications you have to understand the architecture;

2) Numbers given are for AMD Athlon; are you sure it won't differ with the Athlon XP?
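
To illustrate why a latency figure is not the same thing as "time per instruction", here is a toy model in C. It is a sketch under simplifying assumptions, not a description of the Athlon's real scheduler: the function names are made up, and the latency and issue-width numbers in the usage note are illustrative, not taken from the AMD manual.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles for n instructions that each depend on the previous result:
   nothing can overlap, so the latencies simply add up. */
uint64_t dependent_chain_cycles(uint64_t n, uint64_t latency) {
    return n * latency;
}

/* Cycles for n independent instructions on an idealized pipeline that
   can start `issue_per_cycle` of them every cycle: the last group
   starts after (groups - 1) cycles and finishes `latency` cycles later. */
uint64_t independent_cycles(uint64_t n, uint64_t latency,
                            uint64_t issue_per_cycle) {
    assert(issue_per_cycle > 0);
    if (n == 0) return 0;
    uint64_t groups = (n + issue_per_cycle - 1) / issue_per_cycle;
    return (groups - 1) + latency;
}
```

With an assumed latency of 4, `dependent_chain_cycles(100, 4)` gives 400, while `independent_cycles(100, 4, 3)` gives 37. The same table entry yields very different totals depending on how the surrounding code is arranged.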
Candy

Re:Opcode times

Post by Candy »

Solar wrote: 1) That table does not list execution time in clock cycles, but execute latencies and decode type - which is a different ballgame, and to understand all implications you have to understand the architecture;
Not necessarily. The numbers CAN also be seen as max clock cycles, not counting memory access latencies, just like on a 486, for example.
Solar wrote: 2) Numbers given are for AMD Athlon; are you sure it won't differ with the Athlon XP?
Athlon is an architecture (actually K7); Athlon XP is a die size plus marketing. The Morgan-series Durons are exactly the same but have less cache, the Spitfire Durons are the same at a larger die size (0.18 µm, afaik), etc. The 32-bit Athlons are all the same (note: this is NOT true for the 64-bit ones), except for some minor differences in features (but not in latencies!). A processor won't have a latency for a feature it doesn't have, of course.