Schol-R-LEA wrote:
PS: Mystran, if you haven't already done so, you might want to take a look at the thesis paper, especially the part about Register Transfer Language. Remember my mini-rant about how it would make more sense to compile C into a Lisp-like AST language rather than compiling Lisp into C? Well, if I'm reading this right, that is what gcc actually does!
I'll take a look. I can't quite recall that rant, but I agree that Lisp is nicer intermediate form. I have a Lisp compiler on the works, which works by source-to-source transformations, mostly evil amounts of lambda's. Nothing really new though, quite similar to Orbit, Twobit and such.. Infact, my intermediate language isn't even quite Lisp: it's more or less just call-by-value lambda-calculus with primitive functions.
It occurs to me that if there were options to emit RTL code (or a binary representation thereof), and accept it as input later, you'd effective have a portable intermediate language similar to the 'slim binaries' used in Oberon: a partially compiled, (mostly) portable, language-independent representation which could be used with a JIT final compiler, which could furthermore do both safety checking and localized optimization. Put that in Java's pipe and smoke it!
I don't know. What I currently do is compile stuff into lambda-nodes of form: (lambda args call) where args is Scheme like argument list and call is of form (a b c ... d) where all the members are either symbols or lambda-expressions.
Symbol's can refer into lexical variables, constants, or global varialbes. All lexicals and constants are replaced with gensyms at this point so there's no problem with shadowing or anything.
Everything else in the language is modelled with primitives. Continuations are explicit, so no lambda ever returns: everything is a tail-call at this point. For example, I actually compile (if p t e) by compiling p normally, and giving it a continuation:
(lambda (p1) (:primitive:test
(lambda (b-1)
(b-1
(lambda () "compiled t")
(lambda () "compiled e")))
p1))
This is mostly irrelevant though, but at least it keeps the intermediate more regular.
At this point there are several things that can be done. Converting lexical variables into (env_index . var_index) is one thing. The resulting stuff is easy to interpret with an interpreter which simply works the lambda's call-part, and either calls a primitive or loops for another lambda. No tail-call problems, since we only ever loop.
On the other hand, if one inlines certain primitives, and makes other stuff like environments explicit, one get's a full compiler.
The CPS conversion done is the trivial one, but there's some beta-reduction heuristics to simplify the resulting form: namely, symbols as arguments to known procedures (say, other lambda nodes) can always be inlined, lambda-nodes in the same position can be moved inside the known procedure if they are referred to only once (although one might want to "logically" raise them again when generating code to avoid deep environment indexes).
Boolean short-circuiting and constant propagation aren't much harder: just (partially) evaluate some of the primitive calls.
What I plan to add is "single-assignment-elimination" like that in Twobit: convert single assignments of procedures in the beginning of procedures into a letrec-form (or defines), and convert singles assignments of regular variables in the beginning of procedure into let-forms.
Beyond the initial CPS conversion the code doesn't care at all what language it's compiling. Ofcourse code-gen has to know the calling convention and primitives depend on the language, but other than that I don't see a reasons you couldn't compile anything like this.
I'm sure someone else has thought of it before though - it's a pretty old idea - which means there are probably reasons why it would be practical.
The particular method I'm using is described in at least:
"Lambda, the ultimate label or a simple optimizing compiler for Scheme" (available at least from ACM, I remember there's another version somewhere else)
"ORBIT: an optimizing compiler for scheme" (ACM at least)
Twobit:
http://www.ccs.neu.edu/home/will/Twobit/
"Realistic compilation by program transformation" (several versions exist)
Sorry for not having time right now to get links for the papers. I am sure there are others.