Page 1 of 4

Writing a compiler - how?

Posted: Wed Mar 14, 2012 3:03 pm
by tloszk
Hi there,

I'm new on this forum, so please, don't be angry with me if I ask something stupid. And sorry for my poor English, it is not my native tongue. :)

Well, I would like to write a compiler for a subset of C, but I don't know, where to start. It would be a "fun" project for me, nothing serious. What should I read/learn to do this? I would not use lexer/parser generator, because I would like to learn how a whole compiler build up. Oh, and I'm not newbie in programming. :)

Thanks for the answers,
tloszk

Re: Writing a compiler - how?

Posted: Wed Mar 14, 2012 9:41 pm
by bubach
Here's a copy of the classic compiler tutorial by Crenshaw:
http://compilers.iecc.com/crenshaw/

And here's a small C-like compiler that might give some insight:
http://exmortis.narod.ru/comp_src/bflat050.zip

Re: Writing a compiler - how?

Posted: Thu Mar 15, 2012 1:35 am
by Yoda
Look at Tiny C Compiler - http://bellard.org/tcc/
It is really TINY opensource compiler able to produce executables (linker is built-in). This project may be used as startup for yours.

Re: Writing a compiler - how?

Posted: Thu Mar 15, 2012 5:48 am
by brain
I'm currently writing a language interpreter for my os, parts of the concept are similar e.g. parsing, recursive descent, and evaluating constant expressions for optimization. you're welcome to read the source but its a basic interpreter not c, so ymmv...

Re: Writing a compiler - how?

Posted: Thu Mar 15, 2012 7:00 am
by Solar
tloszk wrote:What should I read/learn to do this? I would not use lexer/parser generator, because I would like to learn how a whole compiler build up.
I always marvel at how often a given implementation of a concept is confused for study material on the field in general.

+1 for the book links from berkus. The Dragon Book is to compiler building what K&R is to the C language.

Re: Writing a compiler - how?

Posted: Thu Mar 15, 2012 7:17 am
by bluemoon
+1 for Dragon Book. It's a must have for both student and collectors. 8)

Re: Writing a compiler - how?

Posted: Sun Mar 18, 2012 2:23 pm
by tloszk
Thanks for all the answers, as I thought, it is not an easy finger exercise. So I will read the Dragon Book, but until I did not do that, I won't write a single line of code, because it would be a piece of waste.

So thanks for you all! :)

Re: Writing a compiler - how?

Posted: Sun Mar 18, 2012 2:57 pm
by JamesM
tloszk wrote:Thanks for all the answers, as I thought, it is not an easy finger exercise. So I will read the Dragon Book, but until I did not do that, I won't write a single line of code, because it would be a piece of waste.

So thanks for you all! :)
Hi,

You might want to write some code as you go along - the dragon book covers all aspects.

Also, you may want to read up on lex and yacc - those tools can really help you once you know what lexing and parsing entail, to get a compiler up and running in a reasonable amount of time.

If you need any specific help, feel free to ask me - I'm a compiler engineer by trade! :)

Cheers,

James

Re: Writing a compiler - how?

Posted: Mon Apr 16, 2012 1:47 am
by Thomas

Re: Writing a compiler - how?

Posted: Mon Apr 16, 2012 11:13 pm
by MasterLee
Free Online Course:

Planned Launch April 23rd - duration 8 Weeks
https://www.coursera.org/course/compilers

Re: Writing a compiler - how?

Posted: Tue Apr 17, 2012 5:44 am
by ACcurrent
I recently wrote my own programming language which compiles to javascript. The first version took me about 20 minutes (though the compiler is in python) so I would not say this is a crazy project (although c is a hard language). The default Compiler Construction: Theory and practice by William A Barrett, Rodney M. Bates, David A. Gustafson and John D. Couch. Essentially the key to creating a compiler would be to first look for tokens like ;,{,:,\n which seperate lines. Then if there is a { or : look for keywords such as if, else etc. Try to determine if there is an assignment using the "=". Hunt for types like "int" and so on. After you've got through of lexing and parsing generating code is easy.

Re: Writing a compiler - how?

Posted: Tue Apr 17, 2012 6:25 am
by JamesM
Essentially the key to creating a compiler would be to first look for tokens like ;,{,:,\n which seperate lines. Then if there is a { or : look for keywords such as if, else etc. Try to determine if there is an assignment using the "=". Hunt for types like "int" and so on. After you've got through of lexing and parsing generating code is easy.
Argh! What?! No! Argh!!! Are you seriously suggesting that horrific ad-hoc approach is a substitute for proper lexing and parsing!?

Re: Writing a compiler - how?

Posted: Tue Apr 17, 2012 6:42 am
by Solar
@ACcurrent:

Seriously. That is basically violating everything that is written in any of the books recommended here. If you go about lexing a language procedurally like you are suggesting, you will get nowhere.

Especially since it's so completely unnecessary with C, for which complete BNF descriptions exist.

Re: Writing a compiler - how?

Posted: Tue Apr 17, 2012 6:49 am
by ACcurrent
It works so... I'm compromising speed as a compiler's speed does not really matter, its more the speed of the executable code that matters. And after all, the 1st version of JS# was written in 20 minutes. Though yes, I do agree on the fact that some times the method I described above is not really going to cut it. In fact I have found that using "ad-hoc" methods makes it much easier to write optimization loops. The code can get messy though and trudging through it can be a pain. Essentially what I mean is that the main challenge is lexing and tokenization. After one tokenizes the language the work becomes easy.

Re: Writing a compiler - how?

Posted: Tue Apr 17, 2012 7:12 am
by JamesM
ACcurrent wrote:It works so... I'm compromising speed as a compiler's speed does not really matter, its more the speed of the executable code that matters. And after all, the 1st version of JS# was written in 20 minutes. Though yes, I do agree on the fact that some times the method I described above is not really going to cut it. In fact I have found that using "ad-hoc" methods makes it much easier to write optimization loops. The code can get messy though and trudging through it can be a pain. Essentially what I mean is that the main challenge is lexing and tokenization. After one tokenizes the language the work becomes easy.
No. I'm sorry, I can support many opinions that I don't share, but this one is just plain wrong.

Not only that, but you advocated your utterly braindead solution to someone else, and that I cannot abide.

Think about it this way, Lexing matches regular languages. You can do that with regular expressions. Parsing matches context-free or context-sensitive languages. Many languages (including all C dialects, Java, and anything with the '(Type)x' cast notation) require help from the semantic analysis step to parse.

Once parsed, the AST then has to be semantically analysed for semantic flaws. Unless you don't care about error checking for some reason.

Then, that AST has to be converted into some simpler intermediate representation to flatten structure and prepare for optimisation. Let's assume you don't care about optimisation for the moment as that's more complex.

To prepare for code generation, you need a way of converting your IR into machine instructions for your chosen target. Not only that but you'll need to know all about the target's ABI and instruction set.

A nontrivial compiler is the most difficult program that exists on a computer, kernel included.. It cannot be done in 131 lines of the world's most crufty Python. What you implemented was an (extremely poor) source-to-source translator.