Candy is correct, at least for AMD CPUs: the bits which in a data/unified cache would be used for ECC are instead used in the L1 instruction cache to encode start/stop, prefix, and similar indicators; the instructions themselves are simply protected by parity, rather than by full ECC as data is. After all, if an instruction gets corrupted, you just fetch it again. While K8 can do 3 instruction decodes per cycle, and K10 can do 4, I believe this only holds if you have a primed instruction cache (or maybe also if your sequence of instructions is lucky, in that it's one of the layouts common enough that the processor was optimized to try it).
As for variable vs fixed length: given a fixed-length instruction set, extending it to be variable-length is easy; granted, the result may not have the highest coding efficiency, but then again, x86 doesn't either.
However, as an effective instruction set, x86 falls down in quite a few ways:
- Prefix bytes which can come in any order
- Prefix bytes in their entirety (though those which come in one place are much better!)
- Variable length opcodes
- Variable length postfixes
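To make the prefix complaint concrete, here is a rough sketch of what a decoder has to do before it can even look at the opcode. The byte values are the standard legacy x86 prefixes; the function name `opcode_offset` is made up for illustration, and the sketch deliberately ignores REX/VEX prefixes and the 15-byte instruction length limit.

```python
# Legacy x86 prefix bytes. They may appear in any order before the opcode.
LEGACY_PREFIXES = frozenset({
    0xF0, 0xF2, 0xF3,                    # LOCK, REPNE/REPNZ, REP/REPE
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,  # segment overrides (CS, SS, DS, ES, FS, GS)
    0x66, 0x67,                          # operand-size and address-size overrides
})

def opcode_offset(code: bytes, start: int = 0) -> int:
    """Return the index of the first non-prefix byte at or after `start`.

    Hypothetical helper: a simplified scan that skips legacy prefixes only.
    """
    i = start
    # Every byte must be inspected in turn; nothing up front says how many
    # prefixes there are, so the scan is inherently serial.
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        i += 1
    return i
```

For example, `opcode_offset(bytes([0x66, 0xF3, 0x90]))` and `opcode_offset(bytes([0xF3, 0x66, 0x90]))` both return 2: same prefixes, different order, and the decoder has to cope with either.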
Now, I need to elaborate on the third point. Variable-length opcodes aren't, in and of themselves, a problem, as long as the first coding word establishes the length of the instruction up front.
And yes, that's the deal: if you want an efficient variable-length architecture, make the length of the instruction very easy to find. Perhaps consider using 16-bit coding words; it gives you much more room to play with, and you'll still be able to beat x86 for code density!
(Remember: ARM Thumb2 averages 1.1x the size of x86 for the same source code. Thumb2 isn't the cleanest of variable-length RISC instruction sets; for a start, in the 32-bit instructions, the first 16 bits aren't pulling their weight due to a need for backwards compatibility.)
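As a sketch of the "length up front" idea with 16-bit coding words: assume a purely hypothetical encoding (not Thumb2, not any real ISA) in which the top two bits of the first word give the instruction length in words, minus one. The names `instruction_length` and `split_instructions` are invented for this example.

```python
def instruction_length(first_word: int) -> int:
    """Length in 16-bit words, read from the top two bits of the first word.

    Assumed encoding: 0b00 -> 1 word, 0b01 -> 2, 0b10 -> 3, 0b11 -> 4.
    """
    return ((first_word >> 14) & 0b11) + 1

def split_instructions(words):
    """Split a list of 16-bit words into a list of instructions."""
    instructions = []
    i = 0
    while i < len(words):
        # Only the first word of each instruction is examined to find its end.
        n = instruction_length(words[i])
        if i + n > len(words):
            raise ValueError("truncated instruction at word %d" % i)
        instructions.append(words[i:i + n])
        i += n
    return instructions
```

With this scheme, `split_instructions([0x0001, 0x4002, 0xFFFF, 0x8003, 0x0000, 0x1234])` yields one 1-word, one 2-word, and one 3-word instruction. The point is that finding the boundaries never needs more than a two-bit lookup per instruction, which is the kind of thing hardware can do for several candidate positions in parallel.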