Hi,
Combuster wrote:
Brendan wrote: How about an assembler with a basic peephole optimizer, that will also track instruction dependencies and rearrange "basic blocks" so that I don't need to write unmaintainable spaghetti code (unless I'm working on the small part of my project where I actually want to hand-optimize)?
Well, the assembler should first be able to allow parts to be optimized, while others are not.
Definitely, although I'd be tempted to go further: allow the programmer to explicitly select which optimizations are enabled or disabled for arbitrary pieces of code (potentially including which CPUs to optimize for).
Combuster wrote: There are always cases where you do not want reordering of instructions (I/O, locking), while you do like that in some other parts.
The assembler can split the code into "basic blocks", where all instructions within a basic block can be reordered. Certain instructions can't be part of a basic block: control flow instructions (JMP, CALL, Jcc, INT n, RET, etc), targets of control flow instructions (e.g. for "JMP FOO" an instruction that's after the label "FOO" can't be shifted before the label "FOO"), I/O port instructions, anything with a LOCK prefix, LFENCE, MFENCE, SFENCE, RDMSR, RDPMC, RDTSC, WBINVD, INVLPG, MOV to or from CRn, etc.
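To make the splitting rule concrete, here's a rough Python sketch (nothing like this exists yet; the names `is_barrier` and `split_basic_blocks` are made up for illustration). It assumes one instruction per line, NASM-style labels ending in ":", and a barrier list that only covers the instructions mentioned above, so it is nowhere near exhaustive.

```python
import re

# Mnemonics that must stay where they are. This only covers the
# instructions named in the post; a real assembler would need more.
CONTROL_FLOW = {"call", "ret", "int"}
BARRIERS = {"in", "out", "lfence", "mfence", "sfence",
            "rdmsr", "rdpmc", "rdtsc", "wbinvd", "invlpg"}


def is_barrier(line):
    """True if the instruction can't be part of a reorderable block."""
    tokens = line.lower().replace(",", " ").split()
    if not tokens:
        return False
    mnemonic = tokens[0]
    if mnemonic == "lock":
        return True
    # Every x86 mnemonic starting with "j" is a jump (JMP, Jcc, JECXZ).
    if mnemonic.startswith("j") or mnemonic in CONTROL_FLOW:
        return True
    if mnemonic in BARRIERS:
        return True
    # MOV to or from a control register (cr0, cr2, cr3, ...).
    if mnemonic == "mov" and any(re.fullmatch(r"cr\d", t) for t in tokens[1:]):
        return True
    return False


def split_basic_blocks(lines):
    """Return a list of (reorderable, instructions) tuples, in order."""
    result, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") or is_barrier(line):
            # Labels and barriers end the current block and stand alone.
            if current:
                result.append((True, current))
                current = []
            result.append((False, [line]))
        else:
            current.append(line)
    if current:
        result.append((True, current))
    return result
```

Only the entries marked reorderable would be handed to the scheduler; everything else is emitted exactly where the programmer put it.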
You'd also need to be very careful with reads and writes to memory. For example, consider the instructions "mov eax,[foo]" and "mov ebx,[ecx]" - the assembler can't change the order because it can't know that "[ecx]" and "[foo]" aren't the same address (and can't know that both instructions don't touch a memory mapped device).
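A conservative "can these two instructions be swapped?" test could look something like the Python sketch below. It's only an illustration under heavy assumptions: it models nothing but "mov dst, src" (so flags never come into it), only knows the eight 32-bit general purpose registers (no AL/AX aliasing), and the names `analyze_mov` and `can_swap` are invented for the example.

```python
import re

# Assumption: only full 32-bit registers; partial registers such as
# AL or AX alias EAX and are not modelled here.
REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def analyze_mov(line):
    """Return (reads, writes, touches_memory) for 'mov dst, src'."""
    mnemonic, operands = line.strip().lower().split(None, 1)
    if mnemonic != "mov":
        raise ValueError("this sketch only models MOV")
    dst, src = [op.strip() for op in operands.split(",")]
    reads, writes = set(), set()
    touches_memory = False
    for operand, is_dst in ((dst, True), (src, False)):
        if "[" in operand:
            touches_memory = True
            # Registers used to form the address are read.
            reads |= REGS & set(re.findall(r"[a-z]+", operand))
        elif operand in REGS:
            (writes if is_dst else reads).add(operand)
    return reads, writes, touches_memory


def can_swap(first, second):
    """Conservatively decide whether two MOVs may exchange places."""
    reads_a, writes_a, mem_a = analyze_mov(first)
    reads_b, writes_b, mem_b = analyze_mov(second)
    # Two memory accesses are never reordered: the addresses might be
    # the same, or either one might hit a memory mapped device.
    if mem_a and mem_b:
        return False
    # Register dependencies: read-after-write, write-after-write,
    # and write-after-read.
    if writes_a & (reads_b | writes_b) or writes_b & reads_a:
        return False
    return True
```

With this, `can_swap("mov eax,[foo]", "mov ebx,[ecx]")` comes back False, which is the case described above, while two MOVs that only touch unrelated registers are free to move.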
Combuster wrote: Even then, reordering of instructions is not always trivial. What bottleneck should the optimizer opt for? Maximizing pipes? Decoder fitting? AGI avoidance? Cache size? 586 or Netburst? Optimizing out one thing correctly can degrade performance in other places. So "basic" is something you should watch out for.
Even generic optimizations would be a huge bonus.
Consider, um, me. For most of the (assembly language) code I write I do "half-assed" hand-optimization, mostly caring about the algorithm and not really caring too much about the micro-optimizations. The micro-optimization I end up doing is mostly out of habit (I don't really think about what I'm doing, and I never actually benchmark/profile). This is partly because the code I write is almost always intended for a wide range of different CPUs (for example, there's no real reason to spend ages getting the best performance on Pentium 4 when it could be run on a K6 or Pentium M or 80486 or ...).
The idea (at least initially) would be for the assembler to do "half-assed" micro-optimizations for me, so that I can write easy to read (and easy to maintain) code instead of complicating the source code with my own "half-assed" micro-optimizations.
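As an example of the sort of generic, CPU-independent clean-up I mean, here's a tiny peephole pass sketched in Python. The two rules are just ones I picked for illustration (they're not a proposal for the real rule set), the function name `peephole` is invented, and it assumes 32-bit code: in 64-bit mode "mov eax, eax" zero-extends into RAX and is not a no-op.

```python
# Assumption: 32-bit code, where moving a register to itself does nothing.
REGS32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def peephole(lines):
    """Drop two kinds of useless instruction from a listing."""
    stripped = [line.strip() for line in lines if line.strip()]
    out = []
    for i, line in enumerate(stripped):
        parts = line.replace(",", " ").split()
        mnemonic = parts[0].lower()

        # Rule 1: "mov r, r" with the same 32-bit register on both sides.
        if (mnemonic == "mov" and len(parts) == 3
                and parts[1].lower() == parts[2].lower()
                and parts[1].lower() in REGS32):
            continue

        # Rule 2: "jmp X" immediately followed by the label "X:".
        # Label names are compared case-sensitively, as in NASM.
        if (mnemonic == "jmp" and len(parts) == 2
                and i + 1 < len(stripped)
                and stripped[i + 1] == parts[1] + ":"):
            continue

        out.append(line)
    return out
```

The point is that the source can stay readable (say, macro-generated code that occasionally produces a redundant move or a jump to the next line) and the assembler quietly tidies it up.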
Of course it would be entirely possible for the optimizer to take into account all sorts of things, but IMHO perfection isn't strictly necessary and the extra complexity may not be worth worrying about for an assembler's initial release (maybe in the second release?).
Cheers,
Brendan