Writing a compiler - how?
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
Re: Writing a compiler - how?
This reminds me of when a friend of mine wrote a calculator program by iteratively searching through the input and evaluating +,-,*,/, etc.
Re: Writing a compiler - how?
You wrote a compiler in 20 minutes? I don't believe you, however simple it was.
It takes me more than 20 minutes to switch the computer on and have the first cup of coffee, let alone do any productive work.
It takes me more than 20 minutes to switch the computer on and have the first cup of coffee, let alone do any productive work.
Re: Writing a compiler - how?
Wow. Including version control setup, source comments, and manual? I am impressed. But then again, as was said before, JS# is a source-to-source converter, not a compiler...ACcurrent wrote:And after all, the 1st version of JS# was written in 20 minutes.
Every good solution is obvious once you've found it.
Re: Writing a compiler - how?
You are discussing different things (or different subsets of the same thing) I belive.
I agree with JamesM, that a full language cannot be parsed that way efficiently (more likely your code became a mess and won't work at all).
But ACcurrent probably referring to a partial language, and he is right about it could be enough and more efficient under some circumstances. For example a partial language (some descriptor language, like CSS or a language with extremely strict syntax like Sinclair-BASIC) can be parsed faster without lexing. I never saw anybody writing a lexer for a key=value configuration language for example, neither do I saw any C compiler without.
(For clarity I use the word "full language" contra "partial language" in a sense that it lacks either any of three control structures (or you cannot include their combinations in infinite depth), or complex expressions with precedence and unlimited number of sub-expressions. Sorry, I can't translate the right technical term for this. Dictionary translates it to "full", which sounds stupid, but I've no better.)
I agree with JamesM, that a full language cannot be parsed that way efficiently (more likely your code became a mess and won't work at all).
But ACcurrent probably referring to a partial language, and he is right about it could be enough and more efficient under some circumstances. For example a partial language (some descriptor language, like CSS or a language with extremely strict syntax like Sinclair-BASIC) can be parsed faster without lexing. I never saw anybody writing a lexer for a key=value configuration language for example, neither do I saw any C compiler without.
(For clarity I use the word "full language" contra "partial language" in a sense that it lacks either any of three control structures (or you cannot include their combinations in infinite depth), or complex expressions with precedence and unlimited number of sub-expressions. Sorry, I can't translate the right technical term for this. Dictionary translates it to "full", which sounds stupid, but I've no better.)
Re: Writing a compiler - how?
I don't believe in compiler from scratch in 20 minutes. It may be one of the two possibilities: either it is built from a ready pre-built blocks, or it was just a stub and the rest 2 days were spent to debug it and to stuff with the rest of features.
Re: Writing a compiler - how?
some nice links in this thread. writing a compiler is something i've been considering too. i already understand how to do the majority of it, but what i haven't wrapped my head around yet is how to parse expressions. i mean, things like handling operator precedence in mathematical expressions, etc... expressions can get incredibly complicated, and one little error in parsing breaks the entire program.
then i'd have to come up with a good way to create a "stack" of smaller operations, and do them in exactly the right order and then plug the various results into each other correctly. if i sat down and gave it some real serious thought i can probably come up with a solution, but it seems to be such a big task i haven't wanted to try.
everything else should be relatively easy by comparison. (keyword: relatively)
then i'd have to come up with a good way to create a "stack" of smaller operations, and do them in exactly the right order and then plug the various results into each other correctly. if i sat down and gave it some real serious thought i can probably come up with a solution, but it seems to be such a big task i haven't wanted to try.
everything else should be relatively easy by comparison. (keyword: relatively)
Re: Writing a compiler - how?
Well, you don't have to build a tree. It's easier to use polish notation (which uses only stacks), and the result will be the same as walking through the tree.berkus wrote:Well, when you build an expression tree
http://en.wikipedia.org/wiki/Polish_notation
Also there're lot of examples on the web in different programming languages how to do that.
Here's one in Javascript: http://scriptasylum.com/tutorials/infix ... stfix.html
Re: Writing a compiler - how?
@turdus,
I totally agree with you, - polish notation greatly simplifies parsing an expressions and you don't need anything else if you create hobby compiler.
But IMHO, building a tree provides more possibilities to optimize code for size, speed and memory usage.
I totally agree with you, - polish notation greatly simplifies parsing an expressions and you don't need anything else if you create hobby compiler.
But IMHO, building a tree provides more possibilities to optimize code for size, speed and memory usage.
Re: Writing a compiler - how?
Hi,
Actually RPN like parsing leads to form of parsing known as operator precedence parsing , it is actually an LR form of parsing and not an LL type . It is actually an efficient method of parsing expressions. But during later stages of development its considered far more elegant to have an AST and then make use of the visitor pattern for rest of the process . As we are hobbyist's we need not strictly follow the norm !
You guys might find this interesting : http://www.cs.usfca.edu/~galles/compilerdesign/
You might also find this thread interesting : http://forum.osdev.org/viewtopic.php?t=21349
Link : http://www.ipdatacorp.com/mmpd.html . DASM that comes with the MMURTL package makes use of operator precedence parsing for the most part and generates "okay code " and that's a good enough start !.
If you can figure out a good way to pass semantic and other info in a good way ( generally people use some form of trees for it ) to later stages of the compiler you are good to go
Hopefully something in my rambling is useful to you guys
--Thomas
Not really ... . It is strictly not required to create an expression tree for parsing .berkus wrote:That's an implicit tree (or what comes from breadth-first traversal of such tree), so I'm not buying that.turdus wrote:Well, you don't have to build a tree. It's easier to use polish notation (which uses only stacks), and the result will be the same as walking through the tree.berkus wrote:Well, when you build an expression tree
Actually RPN like parsing leads to form of parsing known as operator precedence parsing , it is actually an LR form of parsing and not an LL type . It is actually an efficient method of parsing expressions. But during later stages of development its considered far more elegant to have an AST and then make use of the visitor pattern for rest of the process . As we are hobbyist's we need not strictly follow the norm !
You guys might find this interesting : http://www.cs.usfca.edu/~galles/compilerdesign/
You might also find this thread interesting : http://forum.osdev.org/viewtopic.php?t=21349
Link : http://www.ipdatacorp.com/mmpd.html . DASM that comes with the MMURTL package makes use of operator precedence parsing for the most part and generates "okay code " and that's a good enough start !.
If you can figure out a good way to pass semantic and other info in a good way ( generally people use some form of trees for it ) to later stages of the compiler you are good to go
Hopefully something in my rambling is useful to you guys
--Thomas
- NickJohnson
- Member
- Posts: 1249
- Joined: Tue Mar 24, 2009 8:11 pm
- Location: Sunnyvale, California
- AndrewAPrice
- Member
- Posts: 2299
- Joined: Mon Jun 05, 2006 11:00 pm
- Location: USA (and Australia)
Re: Writing a compiler - how?
Expression trees, while not necessary, make things a lot more natural to parse. Grammars are represented by a hierarchy ofThomas wrote:Not really ... . It is strictly not required to create an expression tree for parsing.
rules (with tokens and lexemes as the leaves), so it's very intuitive to represent an instance of the grammar (your program) as a tree. For example:
Code: Select all
int a = someFunc(sin(a), cos(b + c * 2));
Code: Select all
- declare variable
- type: int
- name: a
- value:
- sumFunc
- parameter 1:
-sin
- a
- parameter 2:
- cos
- +
- b
- *
- c
- 2
My OS is Perception.
Re: Writing a compiler - how?
... I think some people are missing the point.
RPN is extremely easy to parse as there is no precedence. However, you need to have your language defined such that it uses RPN. You're shifting the burden of definition onto the programmer.
Infix expressions (or even prefix expressions as in s-exprs in lisp) are easier for a human to write and grok than pure reverse polish notation.
So your hands are tied parsing wise by what your language defines.
RPN is extremely easy to parse as there is no precedence. However, you need to have your language defined such that it uses RPN. You're shifting the burden of definition onto the programmer.
Infix expressions (or even prefix expressions as in s-exprs in lisp) are easier for a human to write and grok than pure reverse polish notation.
So your hands are tied parsing wise by what your language defines.
Re: Writing a compiler - how?
yeah a tree like that would make it pretty straightforward. even if not needed, i would definitely do that for my first compiler.MessiahAndrw wrote:Expression trees, while not necessary, make things a lot more natural to parse. Grammars are represented by a hierarchy ofThomas wrote:Not really ... . It is strictly not required to create an expression tree for parsing.
rules (with tokens and lexemes as the leaves), so it's very intuitive to represent an instance of the grammar (your program) as a tree. For example:
Code: Select all
int a = someFunc(sin(a), cos(b + c * 2));
A popular alternative is to output to an intermediate stack based language.Code: Select all
- declare variable - type: int - name: a - value: - sumFunc - parameter 1: -sin - a - parameter 2: - cos - + - b - * - c - 2
Re: Writing a compiler - how?
We are talking about that the parsing of expression doesn't need building a tree for stack approach like RPN. And that expression stack may easily (but not necessarily effectively) be implemented in common x86 architecture.MessiahAndrw wrote:Expression trees, while not necessary, make things a lot more natural to parse...
A popular alternative is to output to an intermediate stack based language.
Re: Writing a compiler - how?
Sorry, but you're totally wrong. Any infix notation with different precedence operators can be converted into predecende-free postfix notation without problems, and it's totally independent of the language.JamesM wrote:... I think some people are missing the point.
RPN is extremely easy to parse as there is no precedence. However, you need to have your language defined such that it uses RPN.
Have you checked the link I posted?