Page 1 of 1

Compiler/code generator patterns reference?

Posted: Tue May 03, 2016 2:43 pm
by Artlav
Is there some sort of a reference book/site for code generation and translation patterns?

I've been looking at too much disassembled code lately, and got to notice some rather clever implementations of various HLL structures, like switch operator done with subtractions, for example.
This got me thinking back to my earlier forages into compiler writing, and having to figure out how to do all such statements in there, due to the lack of this very thing.

So, is there a reference material that would show how do you implement if/while/switch/etc operators, classes, sets and so on?
I've seen books on compiler construction, but they are a lot of text with a few scattered examples hidden in the depths.

While i'm looking for something a little more condensed.
I.e.

if value then A else B;
-Process value
-Emit unresolved conditional jump X
-Process A
-Emit unresolved unconditional jump Y
-Place current location into X
-Process B
-Place current location into Y

for (initial;final;increment){A;}
-Process initial
-Remember current position X
-Process final
-Emit unresolved conditional jump Y
-Process A
-Process increment
-Emit goto X
-Store current position to Y

Something to that effect.
Some of them are obvious, some aren't so much.

So, is there a reference like that around?

Re: Compiler/code generator patterns reference?

Posted: Wed May 04, 2016 12:08 am
by Boris
I don't think you can have a reference which will work for any language and any target architecture.
If you are looking for something very generic, why don't you look at LLVM work ? http://llvm.org/docs/LangRef.html

Re: Compiler/code generator patterns reference?

Posted: Wed May 04, 2016 8:08 am
by Artlav
Boris wrote:I don't think you can have a reference which will work for any language and any target architecture.
If you are looking for something very generic, why don't you look at LLVM work ? http://llvm.org/docs/LangRef.html
It does not have to be specific to any architecture. Shouldn't, even.
Some examples of assembler code are fine, but the idea is to document the patterns used.

I'm looking for an equivalent of a linked list Wikipaedia page, rather than Rosetta Code's list of linked list implementation in every language there is.
Description of the pattern rather than all the minutea of making a compiler.

On one hand, we have compiler construction books, like Crenshaw or Fraser, which list the entire code of a compiler with comments and descriptions.
Wading through it to find one little piece is a pain.
On the other hand we have Wikipaedia page on "case statement", which have zero information about implementing it, or articles on scripting language implementations, which are just "how to get started" pieces of code with no patterns described.

I want something in-between.

Re: Compiler/code generator patterns reference?

Posted: Wed May 04, 2016 8:00 pm
by Brendan
Hi,
Artlav wrote:Some examples of assembler code are fine, but the idea is to document the patterns used.
This sounds really nice in theory; but in practice I doubt compilers use these patterns.

For example; rather than doing this:
  • for (initial;final;increment){A;}
    -Process initial
    -Remember current position X
    -Process final
    -Emit unresolved conditional jump Y
    -Process A
    -Process increment
    -Emit goto X
    -Store current position to Y
A compiler is much more likely to convert "for" loops (e.g. by modifying the AST/Abstract Syntax Tree) from the original higher level form:

Code: Select all

    for (initial;condition;increment) { A; }
Into a lower level form like:

Code: Select all

    initial;
    if(!condition) goto endloop;
startloop:
    A;
    if(condition) goto startloop;
endloop:
And do the same for things like do, do/while, switch, etc, and converting things like array accesses into pointer arithmetic, etc; so that all the "higher level syntactical sugar" disappears and you're left with little more than (maybe) "if, else, goto, function call, function return, operators and literals".

And then do various optimisations (constant folding, dead code elimination, etc) on the "lower level AST".

And then convert "lower level AST" into "intermediate language" (e.g. SSA).

And then do various optimisations on "intermediate language".

And then (in the back-end) convert from "intermediate language" into something that more closely resembles target specific machine code (e.g. instruction selection, register allocation, etc).

And then do various optimisations on that (peephole, etc).

And then emit the resulting code.

Mostly what I'm saying is that there's no "convert from higher level form directly into something you emit", and it's much more like "many separate steps, where each step brings it closer to something you emit".


Cheers,

Brendan

Re: Compiler/code generator patterns reference?

Posted: Wed May 04, 2016 11:18 pm
by SpyderTL
I agree that there should be some sort of mapping list of high level constructs to assembly code, somewhere, even if you have to further optimize it with the surrounding code at a later time.

While I was googling, I found this link that has a list of assembly optimization rules.

http://mark.masmcode.com/

Re: Compiler/code generator patterns reference?

Posted: Sat May 07, 2016 8:49 am
by Schol-R-LEA
@Brendan: Quite right, though the question of code optimization and generation patterns is still relevant, just not in the way the OP was thinking.

Also, I would expect highly optimizing compilers to insert an optimization step before converting the high-level AST to the less abstract low-level AST, but most compilers won't (or at least won't use them unless given an option to do very aggressive optimizations) because such techniques are both difficult to write and have considerable time requirements. You would only use them for those few optimizations that rely on knowledge of the specific type of operation in question, such as converting a case (switch) to a jump table.

Re: Compiler/code generator patterns reference?

Posted: Sat May 07, 2016 9:46 am
by Brendan
Hi,
Schol-R-LEA wrote:Also, I would expect highly optimizing compilers to insert an optimization step before converting the high-level AST to the less abstract low-level AST, but most compilers won't (or at least won't use them unless given an option to do very aggressive optimizations) because such techniques are both difficult to write and have considerable time requirements. You would only use them for those few optimizations that rely on knowledge of the specific type of operation in question, such as converting a case (switch) to a jump table.
For the raw/high level AST; you can expect a lot of opportunities for things like constant folding and dead code elimination (especially for modern languages where constants and dead code elimination are relied on to avoid pre-processing - e.g. where "if(featureEnabled) { ..." might be used instead of "#ifdef FEATURE_ENABLED").

For this reason, I'd want to do a constant folding pass and a dead code elimination pass as soon as possible (on the high level AST) to avoid unnecessary work sooner (and make the compiler faster).


Cheers,

Brendan

Re: Compiler/code generator patterns reference?

Posted: Sat May 07, 2016 1:21 pm
by Schol-R-LEA
Makes sense, I agree. Dead code elimination is something you probably would want more than one pass of, though, with the high-level pass at the start and at the very least a peephole pass at the end. Different optimization strategies may cause dead code to be generated, and while it probably can be avoided in most case, I don't know if it always can be, so a polish pass during the peephole stage might be a good idea.

Mind you, this relates to some of the issues I am having with the design of my AOT/JIT pair, and the data format for relics itself. I want to do at least some high-level optimization in the AOT, preferably quite a few for code in the release-candidate and release states; if nothing else, the I expect the superoptimizer to trigger rebuilds for some code fairly often, so having the optimizations completed ahead of time would avoid cases where the rebuild time and memory usage exceeds what would be saved by specializing rebuilds.

However, a general high-level optimizer risks losing information that could be used by the JIT optimizer, so I don't want to discard the data entirely. I am thinking that I would have to pack the original AST into the relic, and possibly allow it to have more than one pre-optimized AST as well, and let the JIT decide which AST branches it wants to build from. I may even make it possible for the fetch system to request a specific optimization tree rather than have the whole set of ASTs transferred, but determining something so complicated would probably be going past the point of diminishing returns.

Re: Compiler/code generator patterns reference?

Posted: Sat May 07, 2016 2:32 pm
by Brendan
Hi,
Schol-R-LEA wrote:Makes sense, I agree. Dead code elimination is something you probably would want more than one pass of, though, with the high-level pass at the start and at the very least a peephole pass at the end. Different optimization strategies may cause dead code to be generated, and while it probably can be avoided in most case, I don't know if it always can be, so a polish pass during the peephole stage might be a good idea.

Mind you, this relates to some of the issues I am having with the design of my AOT/JIT pair, and the data format for relics itself. I want to do at least some high-level optimization in the AOT, preferably quite a few for code in the release-candidate and release states; if nothing else, the I expect the superoptimizer to trigger rebuilds for some code fairly often, so having the optimizations completed ahead of time would avoid cases where the rebuild time and memory usage exceeds what would be saved by specializing rebuilds.
You'll find that:
  • some optimisations are higher level/generic (constant folding, dead code elimination, common sub-expression elimination, etc) and belong in your AOT compiler (and don't make sense in your JIT)
  • some optimisations (register allocation, peephole, branch removal, etc) depend on low level/machine specific information and belong in your JIT (and don't make any sense in the AOT compiler)
Schol-R-LEA wrote:However, a general high-level optimizer risks losing information that could be used by the JIT optimizer, so I don't want to discard the data entirely. I am thinking that I would have to pack the original AST into the relic, and possibly allow it to have more than one pre-optimized AST as well, and let the JIT decide which AST branches it wants to build from. I may even make it possible for the fetch system to request a specific optimization tree rather than have the whole set of ASTs transferred, but determining something so complicated would probably be going past the point of diminishing returns.
Your AOT can insert "hints" - things like if a branch is likely to be taken/untaken, how likely each different case of a switch statement is, whether various pieces of code are expected to be executed often or rarely, etc. The idea is to do as much expensive analysis as possible during AOT (and generate the hints from that), because you know you can't afford to do anything expensive at run-time (because it'll cost more than you gain).

Note that (in my opinion) a JIT should use something like "machine code for a virtual machine", and shouldn't use AST at all. Mostly you want nice linear sequences of "instructions" (separated by control flow where unavoidable), and don't want to be bouncing around a tree data structure following parent/child/sibling pointers all the time.


Cheers,

Brendan

Re: Compiler/code generator patterns reference?

Posted: Sun May 08, 2016 9:47 am
by Schol-R-LEA
That's an aspect I hadn't considered; I was operating under the assumption that using the AST might actually be faster despite the additional processing time, since it is still relatively abstract and thus can be rather smaller, which means shorter load and transfer times (which I anticipate to be longer than the average processing time for an individual relic) and a smaller storage and main memory footprint.

I suppose this is something that will come down to performance metrics; I am willing to experiment with a few different designs before committing to one.

Keep in mind that the results of JIT compilation would be hierarchically cached, and even with cache misses and re-compile requests issued by the re-tuning decision engine, I don't expect that it would recompile very often.