ACcurrent wrote: It works so... I'm compromising speed, as a compiler's speed does not really matter; it's more the speed of the executable code that matters. And after all, the 1st version of JS# was written in 20 minutes. Though yes, I do agree that sometimes the method I described above is not really going to cut it. In fact I have found that using "ad-hoc" methods makes it much easier to write optimization loops. The code can get messy, though, and trudging through it can be a pain. Essentially what I mean is that the main challenge is lexing and tokenization. After one tokenizes the language, the work becomes easy.
No. I'm sorry, I can support many opinions that I don't share, but this one is just plain wrong. Not only that, but you advocated your utterly braindead solution to someone else, and that I cannot abide.
Think about it this way: lexing matches regular languages, so you can do it with regular expressions. Parsing matches context-free or context-sensitive languages. Many languages (including all C dialects, Java, and anything with the '(Type)x' cast notation) require help from the semantic analysis step to parse.
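To make the distinction concrete, here is a rough sketch in Python (the same language as the original "compiler") of a toy tokenizer and recursive-descent parser; the token set and grammar are invented purely for illustration. The regular expressions handle the lexing, but balanced parentheses are not a regular language, so the parser needs its own recursion:

    import re

    # Token specification for a toy expression language: each token class is a
    # regular expression, which is exactly why lexing maps onto regular languages.
    TOKEN_SPEC = [
        ("NUMBER", r"\d+"),
        ("IDENT",  r"[A-Za-z_]\w*"),
        ("OP",     r"[+\-*/]"),
        ("LPAREN", r"\("),
        ("RPAREN", r"\)"),
        ("SKIP",   r"\s+"),
    ]
    TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

    def lex(source):
        """Yield (kind, text) pairs; pure regular-expression matching."""
        for m in TOKEN_RE.finditer(source):
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())

    # Nested parentheses are not regular, so the parser has to recurse itself --
    # a single regex cannot do this part.
    def parse_expr(tokens, pos=0):
        node, pos = parse_atom(tokens, pos)
        while pos < len(tokens) and tokens[pos][0] == "OP":
            op = tokens[pos][1]
            rhs, pos = parse_atom(tokens, pos + 1)
            node = (op, node, rhs)
        return node, pos

    def parse_atom(tokens, pos):
        kind, text = tokens[pos]
        if kind == "NUMBER":
            return ("num", int(text)), pos + 1
        if kind == "IDENT":
            return ("var", text), pos + 1
        if kind == "LPAREN":
            node, pos = parse_expr(tokens, pos + 1)
            assert tokens[pos][0] == "RPAREN", "unbalanced parentheses"
            return node, pos + 1
        raise SyntaxError(f"unexpected token {text!r}")

    if __name__ == "__main__":
        toks = list(lex("(a + 2) * 3"))
        ast, _ = parse_expr(toks)
        print(ast)   # ('*', ('+', ('var', 'a'), ('num', 2)), ('num', 3))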
Once parsed, the AST then has to be semantically analysed for errors the grammar cannot catch (type mismatches, undeclared names, and so on). Unless you don't care about error checking for some reason.
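As a minimal illustration of what that step looks like, here is a checker over the toy AST from the sketch above; it only reports undeclared variables, whereas a real analyser also does type checking, scope handling, and much more:

    # Walk the toy AST and report uses of variables that were never declared.
    def check_vars(node, declared, errors):
        kind = node[0]
        if kind == "var":
            if node[1] not in declared:
                errors.append(f"undeclared variable {node[1]!r}")
        elif kind == "num":
            pass
        else:                      # binary operator node: (op, lhs, rhs)
            check_vars(node[1], declared, errors)
            check_vars(node[2], declared, errors)

    errors = []
    check_vars(("*", ("+", ("var", "a"), ("num", 2)), ("num", 3)), {"x"}, errors)
    print(errors)   # ["undeclared variable 'a'"]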
Then that AST has to be converted into some simpler intermediate representation, flattening the tree structure and preparing for optimisation. Let's assume you don't care about optimisation for the moment, as that's more complex.
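For example, lowering the toy AST into flat three-address code could be sketched like this (again purely illustrative, not how any particular production compiler does it): every operation gets its own temporary, so later passes never have to walk a tree.

    from itertools import count

    def lower(node, code, temps):
        """Return the name holding node's value, appending (dst, op, lhs, rhs)
        three-address instructions to `code` as needed."""
        kind = node[0]
        if kind == "num":
            return str(node[1])
        if kind == "var":
            return node[1]
        lhs = lower(node[1], code, temps)
        rhs = lower(node[2], code, temps)
        dst = f"t{next(temps)}"
        code.append((dst, node[0], lhs, rhs))
        return dst

    code = []
    lower(("*", ("+", ("var", "a"), ("num", 2)), ("num", 3)), code, count(1))
    for dst, op, lhs, rhs in code:
        print(f"{dst} = {lhs} {op} {rhs}")
    # t1 = a + 2
    # t2 = t1 * 3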
To prepare for code generation, you need a way of converting your IR into machine instructions for your chosen target. Not only that, but you'll need to know all about the target's ABI and instruction set.
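Here is a deliberately naive sketch of that last step for the three-address code above: it spills everything to stack slots, does no register allocation, and assumes an x86-64 target with AT&T syntax and the System V convention of returning integer results in %rax. Even at this toy scale you already need to know the target's mnemonics, operand forms, and calling convention.

    # Map each variable and temporary to a stack slot and emit x86-64 (AT&T
    # syntax) for the three-address code produced above.
    OPS = {"+": "addq", "-": "subq", "*": "imulq"}

    def slot(name, slots):
        if name not in slots:
            slots[name] = -8 * (len(slots) + 1)   # next 8-byte slot below %rbp
        return f"{slots[name]}(%rbp)"

    def operand(name, slots):
        return f"${name}" if name.isdigit() else slot(name, slots)

    def codegen(code, slots):
        asm = []
        for dst, op, lhs, rhs in code:
            asm.append(f"    movq {operand(lhs, slots)}, %rax")
            asm.append(f"    {OPS[op]} {operand(rhs, slots)}, %rax")
            asm.append(f"    movq %rax, {slot(dst, slots)}")
        asm.append("    # result of the last instruction is left in %rax")
        return asm

    slots = {"a": -8}
    for line in codegen([("t1", "+", "a", "2"), ("t2", "*", "t1", "3")], slots):
        print(line)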
A nontrivial compiler is the most difficult program that exists on a computer, kernel included. It cannot be done in 131 lines of the world's most crufty Python. What you implemented was an (extremely poor) source-to-source translator.