JamesM wrote: That is interesting code, and your point may well be true -- but part of the point of the Bochs code is that none of it is in CPU-specific ASM.
So I'd think the question would be "what is the fastest way to decode instructions that can be written (somewhat cleanly) in pure C?" Which is probably a translation of those jump tables into "switch" statements. I am also unsure whether instruction decoding, instruction simulation, or the per-instruction cpu_loop overhead takes the most per-instruction time. I'm inclined to believe that it's the cpu_loop. If you single-step through a single instruction loop, there are at least 500 instructions of overhead for every emulated opcode. I haven't bothered to count precisely, but it's a lot.
Having some background in emulators, I can say that the fastest way to achieve this is in fact jump tables, as Brendan rightly says. Jump tables are best implemented in C as statically initialized arrays of function pointers. GCC implements switches fairly nastily, and function pointer arrays let you sidestep some of the differences in hardware. I'm afraid I can't go into more detail as I'm not sure exactly where the boundaries of my NDA lie, but function pointer tables, then switching on sub-opcodes, is definitely the way forward.
It will usually increase the memory footprint by no more than 512 KB-1 MB, but it will decrease decode latency.
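For illustration, here is a minimal sketch of that kind of dispatch: a statically initialized array of function pointers indexed by the first opcode byte, with a second table for two-byte (0x0F-prefixed) opcodes. This is not code from Bochs or from any particular emulator; all names (cpu_t, opcode_table, fetch_byte, the handlers) and the tiny instruction set are made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct cpu {
    uint32_t eip;
    uint8_t  mem[64];        /* toy guest memory */
    int      halted;
} cpu_t;

typedef void (*op_handler)(cpu_t *cpu);

static uint8_t fetch_byte(cpu_t *cpu) { return cpu->mem[cpu->eip++]; }

static void op_nop(cpu_t *cpu)     { (void)cpu; }
static void op_hlt(cpu_t *cpu)     { cpu->halted = 1; }
static void op_invalid(cpu_t *cpu) { cpu->halted = 1; /* a real emulator would raise #UD */ }
static void op_0f_stub(cpu_t *cpu) { (void)cpu; /* stand-in for some two-byte opcode */ }

/* Secondary table: sub-opcodes of the 0x0F two-byte escape. */
static const op_handler two_byte_table[256] = {
    [0x05] = op_0f_stub,
    /* everything else left NULL -> treated as invalid below */
};

static void op_two_byte(cpu_t *cpu)
{
    uint8_t sub = fetch_byte(cpu);
    op_handler h = two_byte_table[sub];
    (h ? h : op_invalid)(cpu);
}

/* Primary one-byte opcode table, computed statically at compile time. */
static const op_handler opcode_table[256] = {
    [0x0F] = op_two_byte,
    [0x90] = op_nop,
    [0xF4] = op_hlt,
};

void cpu_step(cpu_t *cpu)
{
    uint8_t op = fetch_byte(cpu);
    op_handler h = opcode_table[op];
    (h ? h : op_invalid)(cpu);      /* one indirect call per decoded opcode */
}

int main(void)
{
    cpu_t cpu = { .mem = { 0x90, 0x0F, 0x05, 0xF4 } };   /* nop; 0F 05; hlt */
    while (!cpu.halted)
        cpu_step(&cpu);
    printf("halted at eip=%u\n", (unsigned)cpu.eip);
    return 0;
}
```

The decode path is a single array lookup plus an indirect call, which is where the latency win comes from; the cost is the 256-entry (or larger) tables sitting in memory.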
Funny, I have had no issues with switches in GCC. My virtual machine (emulator) has multiple methods of decoding instructions; download it and check it out, see if you can optimize it further, or see which method is most efficient with which compiler. It compiles and works in MSVC and GCC (under Linux), and as far as I know it should work in most other compilers/environments as well. A well-formulated switch statement will be compiled into a jump table by any good compiler, GCC included (as long as optimizations are on, of course). I have found that in most cases inlined functions in a switch statement are faster (though of course this changes depending on the compiler, etc.).

My current emulator (x86) uses a function table (and a sub-opcode table for two-byte opcodes). This produces much cleaner, more easily modified code, runs almost as fast as a jump table, and for my purposes works plenty fast.

Please download the demo linked earlier, feel free to check out the code and see how it works, and run the benchmark under your compiler to see what performs best with optimizations on/off. Then you can look at the implementation and decide whether speed or neatness is more important, or whether you can get both. It was written in 2003, so it's a bit dated now, but it's still 100% valid, since it decodes instructions from memory in a virtual environment, aka an emulator. If you do download and check out the code, please let me know what was fastest on your machine/compiler, whether you had optimizations on or off, and whether you made any changes (e.g. ran the loops more or fewer times, wrote/ran a different application, etc.).
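For comparison with the function-pointer table above, here is an equally minimal, purely illustrative sketch of the switch-based approach; with optimizations on, GCC will typically lower a dense switch like this to a jump table, which is the point being made here. Again, the names and the toy instruction set are assumptions for the example, not taken from the demo being discussed.

```c
#include <stdint.h>

typedef struct cpu {
    uint32_t       eip;
    const uint8_t *mem;
    int            halted;
} cpu_t;

static inline uint8_t fetch_byte(cpu_t *cpu) { return cpu->mem[cpu->eip++]; }

void cpu_step_switch(cpu_t *cpu)
{
    switch (fetch_byte(cpu)) {
    case 0x90:                      /* NOP */
        break;
    case 0xF4:                      /* HLT */
        cpu->halted = 1;
        break;
    case 0x0F:                      /* two-byte escape: switch again on the sub-opcode */
        switch (fetch_byte(cpu)) {
        default:
            cpu->halted = 1;        /* unimplemented -> stop (would be #UD) */
            break;
        }
        break;
    default:
        cpu->halted = 1;            /* invalid one-byte opcode */
        break;
    }
}

int main(void)
{
    static const uint8_t program[] = { 0x90, 0x90, 0xF4 };   /* nop; nop; hlt */
    cpu_t cpu = { .mem = program };
    while (!cpu.halted)
        cpu_step_switch(&cpu);
    return 0;
}
```

The attraction of this style is exactly what the post describes: handler bodies can be inlined directly into the cases, and the compiler decides whether a jump table, a branch tree, or something else suits the target best. Which version wins is something only a benchmark on your own compiler and machine will settle.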