Compiler/code generator patterns reference?

Programming, for all ages and all languages.
Post Reply
User avatar
Artlav
Member
Member
Posts: 178
Joined: Fri Aug 21, 2009 5:54 am
Location: Moscow, Russia
Contact:

Compiler/code generator patterns reference?

Post by Artlav »

Is there some sort of a reference book/site for code generation and translation patterns?

I've been looking at too much disassembled code lately, and got to notice some rather clever implementations of various HLL structures, like switch operator done with subtractions, for example.
This got me thinking back to my earlier forages into compiler writing, and having to figure out how to do all such statements in there, due to the lack of this very thing.

So, is there a reference material that would show how do you implement if/while/switch/etc operators, classes, sets and so on?
I've seen books on compiler construction, but they are a lot of text with a few scattered examples hidden in the depths.

While i'm looking for something a little more condensed.
I.e.

if value then A else B;
-Process value
-Emit unresolved conditional jump X
-Process A
-Emit unresolved unconditional jump Y
-Place current location into X
-Process B
-Place current location into Y

for (initial;final;increment){A;}
-Process initial
-Remember current position X
-Process final
-Emit unresolved conditional jump Y
-Process A
-Process increment
-Emit goto X
-Store current position to Y

Something to that effect.
Some of them are obvious, some aren't so much.

So, is there a reference like that around?
Boris
Member
Member
Posts: 145
Joined: Sat Nov 07, 2015 3:12 pm

Re: Compiler/code generator patterns reference?

Post by Boris »

I don't think you can have a reference which will work for any language and any target architecture.
If you are looking for something very generic, why don't you look at LLVM work ? http://llvm.org/docs/LangRef.html
User avatar
Artlav
Member
Member
Posts: 178
Joined: Fri Aug 21, 2009 5:54 am
Location: Moscow, Russia
Contact:

Re: Compiler/code generator patterns reference?

Post by Artlav »

Boris wrote:I don't think you can have a reference which will work for any language and any target architecture.
If you are looking for something very generic, why don't you look at LLVM work ? http://llvm.org/docs/LangRef.html
It does not have to be specific to any architecture. Shouldn't, even.
Some examples of assembler code are fine, but the idea is to document the patterns used.

I'm looking for an equivalent of a linked list Wikipaedia page, rather than Rosetta Code's list of linked list implementation in every language there is.
Description of the pattern rather than all the minutea of making a compiler.

On one hand, we have compiler construction books, like Crenshaw or Fraser, which list the entire code of a compiler with comments and descriptions.
Wading through it to find one little piece is a pain.
On the other hand we have Wikipaedia page on "case statement", which have zero information about implementing it, or articles on scripting language implementations, which are just "how to get started" pieces of code with no patterns described.

I want something in-between.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Compiler/code generator patterns reference?

Post by Brendan »

Hi,
Artlav wrote:Some examples of assembler code are fine, but the idea is to document the patterns used.
This sounds really nice in theory; but in practice I doubt compilers use these patterns.

For example; rather than doing this:
  • for (initial;final;increment){A;}
    -Process initial
    -Remember current position X
    -Process final
    -Emit unresolved conditional jump Y
    -Process A
    -Process increment
    -Emit goto X
    -Store current position to Y
A compiler is much more likely to convert "for" loops (e.g. by modifying the AST/Abstract Syntax Tree) from the original higher level form:

Code: Select all

    for (initial;condition;increment) { A; }
Into a lower level form like:

Code: Select all

    initial;
    if(!condition) goto endloop;
startloop:
    A;
    if(condition) goto startloop;
endloop:
And do the same for things like do, do/while, switch, etc, and converting things like array accesses into pointer arithmetic, etc; so that all the "higher level syntactical sugar" disappears and you're left with little more than (maybe) "if, else, goto, function call, function return, operators and literals".

And then do various optimisations (constant folding, dead code elimination, etc) on the "lower level AST".

And then convert "lower level AST" into "intermediate language" (e.g. SSA).

And then do various optimisations on "intermediate language".

And then (in the back-end) convert from "intermediate language" into something that more closely resembles target specific machine code (e.g. instruction selection, register allocation, etc).

And then do various optimisations on that (peephole, etc).

And then emit the resulting code.

Mostly what I'm saying is that there's no "convert from higher level form directly into something you emit", and it's much more like "many separate steps, where each step brings it closer to something you emit".


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
SpyderTL
Member
Member
Posts: 1074
Joined: Sun Sep 19, 2010 10:05 pm

Re: Compiler/code generator patterns reference?

Post by SpyderTL »

I agree that there should be some sort of mapping list of high level constructs to assembly code, somewhere, even if you have to further optimize it with the surrounding code at a later time.

While I was googling, I found this link that has a list of assembly optimization rules.

http://mark.masmcode.com/
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Compiler/code generator patterns reference?

Post by Schol-R-LEA »

@Brendan: Quite right, though the question of code optimization and generation patterns is still relevant, just not in the way the OP was thinking.

Also, I would expect highly optimizing compilers to insert an optimization step before converting the high-level AST to the less abstract low-level AST, but most compilers won't (or at least won't use them unless given an option to do very aggressive optimizations) because such techniques are both difficult to write and have considerable time requirements. You would only use them for those few optimizations that rely on knowledge of the specific type of operation in question, such as converting a case (switch) to a jump table.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Compiler/code generator patterns reference?

Post by Brendan »

Hi,
Schol-R-LEA wrote:Also, I would expect highly optimizing compilers to insert an optimization step before converting the high-level AST to the less abstract low-level AST, but most compilers won't (or at least won't use them unless given an option to do very aggressive optimizations) because such techniques are both difficult to write and have considerable time requirements. You would only use them for those few optimizations that rely on knowledge of the specific type of operation in question, such as converting a case (switch) to a jump table.
For the raw/high level AST; you can expect a lot of opportunities for things like constant folding and dead code elimination (especially for modern languages where constants and dead code elimination are relied on to avoid pre-processing - e.g. where "if(featureEnabled) { ..." might be used instead of "#ifdef FEATURE_ENABLED").

For this reason, I'd want to do a constant folding pass and a dead code elimination pass as soon as possible (on the high level AST) to avoid unnecessary work sooner (and make the compiler faster).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Compiler/code generator patterns reference?

Post by Schol-R-LEA »

Makes sense, I agree. Dead code elimination is something you probably would want more than one pass of, though, with the high-level pass at the start and at the very least a peephole pass at the end. Different optimization strategies may cause dead code to be generated, and while it probably can be avoided in most case, I don't know if it always can be, so a polish pass during the peephole stage might be a good idea.

Mind you, this relates to some of the issues I am having with the design of my AOT/JIT pair, and the data format for relics itself. I want to do at least some high-level optimization in the AOT, preferably quite a few for code in the release-candidate and release states; if nothing else, the I expect the superoptimizer to trigger rebuilds for some code fairly often, so having the optimizations completed ahead of time would avoid cases where the rebuild time and memory usage exceeds what would be saved by specializing rebuilds.

However, a general high-level optimizer risks losing information that could be used by the JIT optimizer, so I don't want to discard the data entirely. I am thinking that I would have to pack the original AST into the relic, and possibly allow it to have more than one pre-optimized AST as well, and let the JIT decide which AST branches it wants to build from. I may even make it possible for the fetch system to request a specific optimization tree rather than have the whole set of ASTs transferred, but determining something so complicated would probably be going past the point of diminishing returns.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Compiler/code generator patterns reference?

Post by Brendan »

Hi,
Schol-R-LEA wrote:Makes sense, I agree. Dead code elimination is something you probably would want more than one pass of, though, with the high-level pass at the start and at the very least a peephole pass at the end. Different optimization strategies may cause dead code to be generated, and while it probably can be avoided in most case, I don't know if it always can be, so a polish pass during the peephole stage might be a good idea.

Mind you, this relates to some of the issues I am having with the design of my AOT/JIT pair, and the data format for relics itself. I want to do at least some high-level optimization in the AOT, preferably quite a few for code in the release-candidate and release states; if nothing else, the I expect the superoptimizer to trigger rebuilds for some code fairly often, so having the optimizations completed ahead of time would avoid cases where the rebuild time and memory usage exceeds what would be saved by specializing rebuilds.
You'll find that:
  • some optimisations are higher level/generic (constant folding, dead code elimination, common sub-expression elimination, etc) and belong in your AOT compiler (and don't make sense in your JIT)
  • some optimisations (register allocation, peephole, branch removal, etc) depend on low level/machine specific information and belong in your JIT (and don't make any sense in the AOT compiler)
Schol-R-LEA wrote:However, a general high-level optimizer risks losing information that could be used by the JIT optimizer, so I don't want to discard the data entirely. I am thinking that I would have to pack the original AST into the relic, and possibly allow it to have more than one pre-optimized AST as well, and let the JIT decide which AST branches it wants to build from. I may even make it possible for the fetch system to request a specific optimization tree rather than have the whole set of ASTs transferred, but determining something so complicated would probably be going past the point of diminishing returns.
Your AOT can insert "hints" - things like if a branch is likely to be taken/untaken, how likely each different case of a switch statement is, whether various pieces of code are expected to be executed often or rarely, etc. The idea is to do as much expensive analysis as possible during AOT (and generate the hints from that), because you know you can't afford to do anything expensive at run-time (because it'll cost more than you gain).

Note that (in my opinion) a JIT should use something like "machine code for a virtual machine", and shouldn't use AST at all. Mostly you want nice linear sequences of "instructions" (separated by control flow where unavoidable), and don't want to be bouncing around a tree data structure following parent/child/sibling pointers all the time.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Compiler/code generator patterns reference?

Post by Schol-R-LEA »

That's an aspect I hadn't considered; I was operating under the assumption that using the AST might actually be faster despite the additional processing time, since it is still relatively abstract and thus can be rather smaller, which means shorter load and transfer times (which I anticipate to be longer than the average processing time for an individual relic) and a smaller storage and main memory footprint.

I suppose this is something that will come down to performance metrics; I am willing to experiment with a few different designs before committing to one.

Keep in mind that the results of JIT compilation would be hierarchically cached, and even with cache misses and re-compile requests issued by the re-tuning decision engine, I don't expect that it would recompile very often.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Post Reply