Page 1 of 1

Intermediate representation

Posted: Tue Jan 29, 2008 11:25 am
by dc0d32
Hi,

I'm _trying_ to resume my CIL/JVM bytecode -> native compiler project. I'm in indecision about the intermediate representation to use for [future] optimizer plug ins to work on, and generate target code.

Currently I'm thinking of simple 3 address codes or so called quadruples with basic blocks [which is even partially implemented], with which sub-optimal code generation looks trivial.

As an alternative, regular binary trees can be used, which; I believe will provide more flexibility. We can even use basic blocks with each basic block as regular tree.

I've never worked on a compiler project before, nor have I really analyzed source of one. [well, I have looked at gcc & lcc sources, but they are way huge]. Thats where the indecision comes from.

Please enlighten me. I need that !

Thanks in advance

[I know, there is a lot of other stuff i need to worry about at the same time, like GC, threading, generics n the rest. I'm going by "Start slow, naive" law as they say :) ]

~Prashant

Posted: Tue Jan 29, 2008 12:31 pm
by Combuster
Basically, you have the constructs of one particular language, and you need to translate those to the constructs of another (assembly)

Some theory behind compilers and other stuff
Functional starting code for the university's compiler assignment (compiles a subset of java to assembly)

Posted: Tue Jan 29, 2008 1:19 pm
by dc0d32
Thanks for the quick reply Combuster!

I know neither Haskell nor (if I'm not mistaken) Dutch ! [ will try n learn it(Haskell) in near future though :D]

Since we're reading byte code, guess there is no need to worry about parsing and all that automaton theory. After all, byte code is one intermediate representation, only thing is that it can not be readily processed.
We are actually writing just the backend of a hypothetical java->asm compiler, along with an optimizer which optimizes the IR even further; doing what the java compiler's optimizer didn't.

~Prashant

Posted: Wed Jan 30, 2008 8:02 am
by babernat
One thing that could help you make your decision is whether or not you'd like to take advantage of any optimization in your process. Since jvm or clr bytecode is somewhat regular and produced for a virtual machine, it is reasonable to expect you could take advantage of some optimizations since you know which architecture you're compiling for. At the very least you would want to accomplish what something like jit compiling already does on the Java platform.

If you want to go down the optimization route, you should probably first translate things into a SSA because there are algorithms available for optimization using this form. As a hint you can look at

http://gcc.gnu.org/onlinedocs/gccint/Tr ... l#Tree-SSA

or pick up any compiler book covering optimization.

Posted: Wed Jan 30, 2008 9:04 am
by JamesM
Currently I'm thinking of simple 3 address codes or so called quadruples with basic blocks [which is even partially implemented], with which sub-optimal code generation looks trivial.
I could tell you more about that, but unfortunately my employer has lots of patents in that area... :S

Posted: Wed Jan 30, 2008 9:09 am
by Alboin
JamesM wrote:
Currently I'm thinking of simple 3 address codes or so called quadruples with basic blocks [which is even partially implemented], with which sub-optimal code generation looks trivial.
I could tell you more about that, but unfortunately my employer has lots of patents in that area... :S
Aren't patents public? (ie. not secret.)

Posted: Wed Jan 30, 2008 2:39 pm
by JamesM
Alboin wrote:
JamesM wrote:
Currently I'm thinking of simple 3 address codes or so called quadruples with basic blocks [which is even partially implemented], with which sub-optimal code generation looks trivial.
I could tell you more about that, but unfortunately my employer has lots of patents in that area... :S
Aren't patents public? (ie. not secret.)
Patents pending. And most of the docs are marked "secret", so that sort of tells me the information I need ;)