Hi,
I'm _trying_ to resume my CIL/JVM bytecode -> native compiler project. I'm in indecision about the intermediate representation to use for [future] optimizer plug ins to work on, and generate target code.
Currently I'm thinking of simple 3 address codes or so called quadruples with basic blocks [which is even partially implemented], with which sub-optimal code generation looks trivial.
As an alternative, regular binary trees can be used, which; I believe will provide more flexibility. We can even use basic blocks with each basic block as regular tree.
I've never worked on a compiler project before, nor have I really analyzed source of one. [well, I have looked at gcc & lcc sources, but they are way huge]. Thats where the indecision comes from.
Please enlighten me. I need that !
Thanks in advance
[I know, there is a lot of other stuff i need to worry about at the same time, like GC, threading, generics n the rest. I'm going by "Start slow, naive" law as they say ]
~Prashant
Intermediate representation
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Basically, you have the constructs of one particular language, and you need to translate those to the constructs of another (assembly)
Some theory behind compilers and other stuff
Functional starting code for the university's compiler assignment (compiles a subset of java to assembly)
Some theory behind compilers and other stuff
Functional starting code for the university's compiler assignment (compiles a subset of java to assembly)
Thanks for the quick reply Combuster!
I know neither Haskell nor (if I'm not mistaken) Dutch ! [ will try n learn it(Haskell) in near future though ]
Since we're reading byte code, guess there is no need to worry about parsing and all that automaton theory. After all, byte code is one intermediate representation, only thing is that it can not be readily processed.
We are actually writing just the backend of a hypothetical java->asm compiler, along with an optimizer which optimizes the IR even further; doing what the java compiler's optimizer didn't.
~Prashant
I know neither Haskell nor (if I'm not mistaken) Dutch ! [ will try n learn it(Haskell) in near future though ]
Since we're reading byte code, guess there is no need to worry about parsing and all that automaton theory. After all, byte code is one intermediate representation, only thing is that it can not be readily processed.
We are actually writing just the backend of a hypothetical java->asm compiler, along with an optimizer which optimizes the IR even further; doing what the java compiler's optimizer didn't.
~Prashant
One thing that could help you make your decision is whether or not you'd like to take advantage of any optimization in your process. Since jvm or clr bytecode is somewhat regular and produced for a virtual machine, it is reasonable to expect you could take advantage of some optimizations since you know which architecture you're compiling for. At the very least you would want to accomplish what something like jit compiling already does on the Java platform.
If you want to go down the optimization route, you should probably first translate things into a SSA because there are algorithms available for optimization using this form. As a hint you can look at
http://gcc.gnu.org/onlinedocs/gccint/Tr ... l#Tree-SSA
or pick up any compiler book covering optimization.
If you want to go down the optimization route, you should probably first translate things into a SSA because there are algorithms available for optimization using this form. As a hint you can look at
http://gcc.gnu.org/onlinedocs/gccint/Tr ... l#Tree-SSA
or pick up any compiler book covering optimization.
Thanks for all the fish.
Aren't patents public? (ie. not secret.)JamesM wrote:I could tell you more about that, but unfortunately my employer has lots of patents in that area... :SCurrently I'm thinking of simple 3 address codes or so called quadruples with basic blocks [which is even partially implemented], with which sub-optimal code generation looks trivial.
C8H10N4O2 | #446691 | Trust the nodes.
Patents pending. And most of the docs are marked "secret", so that sort of tells me the information I needAlboin wrote:Aren't patents public? (ie. not secret.)JamesM wrote:I could tell you more about that, but unfortunately my employer has lots of patents in that area... :SCurrently I'm thinking of simple 3 address codes or so called quadruples with basic blocks [which is even partially implemented], with which sub-optimal code generation looks trivial.