Writing a compiler - how?

Programming, for all ages and all languages.
tloszk
Posts: 3
Joined: Mon Mar 12, 2012 9:05 am

Writing a compiler - how?

Post by tloszk »

Hi there,

I'm new on this forum, so please, don't be angry with me if I ask something stupid. And sorry for my poor English, it is not my native tongue. :)

Well, I would like to write a compiler for a subset of C, but I don't know, where to start. It would be a "fun" project for me, nothing serious. What should I read/learn to do this? I would not use lexer/parser generator, because I would like to learn how a whole compiler build up. Oh, and I'm not newbie in programming. :)

Thanks for the answers,
tloszk
User avatar
bubach
Member
Member
Posts: 1223
Joined: Sat Oct 23, 2004 11:00 pm
Location: Sweden
Contact:

Re: Writing a compiler - how?

Post by bubach »

Here's a copy of the classic compiler tutorial by Crenshaw:
http://compilers.iecc.com/crenshaw/

And here's a small C-like compiler that might give some insight:
http://exmortis.narod.ru/comp_src/bflat050.zip
"Simplicity is the ultimate sophistication."
http://bos.asmhackers.net/ - GitHub
User avatar
Yoda
Member
Member
Posts: 255
Joined: Tue Mar 09, 2010 8:57 am
Location: Moscow, Russia

Re: Writing a compiler - how?

Post by Yoda »

Look at Tiny C Compiler - http://bellard.org/tcc/
It is really TINY opensource compiler able to produce executables (linker is built-in). This project may be used as startup for yours.
Yet Other Developer of Architecture.
OS Boot Tools.
Russian national OSDev forum.
User avatar
brain
Member
Member
Posts: 234
Joined: Thu Nov 05, 2009 5:04 pm
Location: UK
Contact:

Re: Writing a compiler - how?

Post by brain »

I'm currently writing a language interpreter for my os, parts of the concept are similar e.g. parsing, recursive descent, and evaluating constant expressions for optimization. you're welcome to read the source but its a basic interpreter not c, so ymmv...
User avatar
Solar
Member
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: Writing a compiler - how?

Post by Solar »

tloszk wrote:What should I read/learn to do this? I would not use lexer/parser generator, because I would like to learn how a whole compiler build up.
I always marvel at how often a given implementation of a concept is confused for study material on the field in general.

+1 for the book links from berkus. The Dragon Book is to compiler building what K&R is to the C language.
Every good solution is obvious once you've found it.
User avatar
bluemoon
Member
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: Writing a compiler - how?

Post by bluemoon »

+1 for Dragon Book. It's a must have for both student and collectors. 8)
tloszk
Posts: 3
Joined: Mon Mar 12, 2012 9:05 am

Re: Writing a compiler - how?

Post by tloszk »

Thanks for all the answers, as I thought, it is not an easy finger exercise. So I will read the Dragon Book, but until I did not do that, I won't write a single line of code, because it would be a piece of waste.

So thanks for you all! :)
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: Writing a compiler - how?

Post by JamesM »

tloszk wrote:Thanks for all the answers, as I thought, it is not an easy finger exercise. So I will read the Dragon Book, but until I did not do that, I won't write a single line of code, because it would be a piece of waste.

So thanks for you all! :)
Hi,

You might want to write some code as you go along - the dragon book covers all aspects.

Also, you may want to read up on lex and yacc - those tools can really help you once you know what lexing and parsing entail, to get a compiler up and running in a reasonable amount of time.

If you need any specific help, feel free to ask me - I'm a compiler engineer by trade! :)

Cheers,

James
MasterLee
Member
Member
Posts: 90
Joined: Fri Mar 13, 2009 8:51 am

Re: Writing a compiler - how?

Post by MasterLee »

Free Online Course:

Planned Launch April 23rd - duration 8 Weeks
https://www.coursera.org/course/compilers
50₰
ACcurrent
Member
Member
Posts: 125
Joined: Thu Aug 11, 2011 12:04 am
Location: Watching You

Re: Writing a compiler - how?

Post by ACcurrent »

I recently wrote my own programming language which compiles to javascript. The first version took me about 20 minutes (though the compiler is in python) so I would not say this is a crazy project (although c is a hard language). The default Compiler Construction: Theory and practice by William A Barrett, Rodney M. Bates, David A. Gustafson and John D. Couch. Essentially the key to creating a compiler would be to first look for tokens like ;,{,:,\n which seperate lines. Then if there is a { or : look for keywords such as if, else etc. Try to determine if there is an assignment using the "=". Hunt for types like "int" and so on. After you've got through of lexing and parsing generating code is easy.
Get back to work!
Github
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: Writing a compiler - how?

Post by JamesM »

Essentially the key to creating a compiler would be to first look for tokens like ;,{,:,\n which seperate lines. Then if there is a { or : look for keywords such as if, else etc. Try to determine if there is an assignment using the "=". Hunt for types like "int" and so on. After you've got through of lexing and parsing generating code is easy.
Argh! What?! No! Argh!!! Are you seriously suggesting that horrific ad-hoc approach is a substitute for proper lexing and parsing!?
User avatar
Solar
Member
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: Writing a compiler - how?

Post by Solar »

@ACcurrent:

Seriously. That is basically violating everything that is written in any of the books recommended here. If you go about lexing a language procedurally like you are suggesting, you will get nowhere.

Especially since it's so completely unnecessary with C, for which complete BNF descriptions exist.
Every good solution is obvious once you've found it.
ACcurrent
Member
Member
Posts: 125
Joined: Thu Aug 11, 2011 12:04 am
Location: Watching You

Re: Writing a compiler - how?

Post by ACcurrent »

It works so... I'm compromising speed as a compiler's speed does not really matter, its more the speed of the executable code that matters. And after all, the 1st version of JS# was written in 20 minutes. Though yes, I do agree on the fact that some times the method I described above is not really going to cut it. In fact I have found that using "ad-hoc" methods makes it much easier to write optimization loops. The code can get messy though and trudging through it can be a pain. Essentially what I mean is that the main challenge is lexing and tokenization. After one tokenizes the language the work becomes easy.
Get back to work!
Github
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: Writing a compiler - how?

Post by JamesM »

ACcurrent wrote:It works so... I'm compromising speed as a compiler's speed does not really matter, its more the speed of the executable code that matters. And after all, the 1st version of JS# was written in 20 minutes. Though yes, I do agree on the fact that some times the method I described above is not really going to cut it. In fact I have found that using "ad-hoc" methods makes it much easier to write optimization loops. The code can get messy though and trudging through it can be a pain. Essentially what I mean is that the main challenge is lexing and tokenization. After one tokenizes the language the work becomes easy.
No. I'm sorry, I can support many opinions that I don't share, but this one is just plain wrong.

Not only that, but you advocated your utterly braindead solution to someone else, and that I cannot abide.

Think about it this way, Lexing matches regular languages. You can do that with regular expressions. Parsing matches context-free or context-sensitive languages. Many languages (including all C dialects, Java, and anything with the '(Type)x' cast notation) require help from the semantic analysis step to parse.

Once parsed, the AST then has to be semantically analysed for semantic flaws. Unless you don't care about error checking for some reason.

Then, that AST has to be converted into some simpler intermediate representation to flatten structure and prepare for optimisation. Let's assume you don't care about optimisation for the moment as that's more complex.

To prepare for code generation, you need a way of converting your IR into machine instructions for your chosen target. Not only that but you'll need to know all about the target's ABI and instruction set.

A nontrivial compiler is the most difficult program that exists on a computer, kernel included.. It cannot be done in 131 lines of the world's most crufty Python. What you implemented was an (extremely poor) source-to-source translator.
Post Reply