Writing a compiler - how?

NickJohnson · Post by **NickJohnson** » Tue Apr 17, 2012 8:31 am

This reminds me of when a friend of mine wrote a calculator program by iteratively searching through the input and evaluating +,-,*,/, etc.

iansjack · Post by **iansjack** » Tue Apr 17, 2012 8:42 am

You wrote a compiler in 20 minutes? I don't believe you, however simple it was.

It takes me more than 20 minutes to switch the computer on and have the first cup of coffee, let alone do any productive work.

Solar · Post by **Solar** » Tue Apr 17, 2012 9:18 am

ACcurrent wrote:And after all, the 1st version of JS# was written in 20 minutes.

Wow. Including version control setup, source comments, and manual? I am impressed. But then again, as was said before, JS# is a source-to-source converter, not a compiler...

turdus · Post by **turdus** » Tue Apr 17, 2012 9:19 am

You are discussing different things (or different subsets of the same thing) I belive.

I agree with JamesM, that a full language cannot be parsed that way efficiently (more likely your code became a mess and won't work at all).

But ACcurrent probably referring to a partial language, and he is right about it could be enough and more efficient under some circumstances. For example a partial language (some descriptor language, like CSS or a language with extremely strict syntax like Sinclair-BASIC) can be parsed faster without lexing. I never saw anybody writing a lexer for a key=value configuration language for example, neither do I saw any C compiler without.

(For clarity I use the word "full language" contra "partial language" in a sense that it lacks either any of three control structures (or you cannot include their combinations in infinite depth), or complex expressions with precedence and unlimited number of sub-expressions. Sorry, I can't translate the right technical term for this. Dictionary translates it to "full", which sounds stupid, but I've no better.)

Yoda · Post by **Yoda** » Wed Apr 18, 2012 6:10 am

I don't believe in compiler from scratch in 20 minutes. It may be one of the two possibilities: either it is built from a ready pre-built blocks, or it was just a stub and the rest 2 days were spent to debug it and to stuff with the rest of features.

miker00lz · Post by **miker00lz** » Mon Apr 23, 2012 10:36 pm

some nice links in this thread. writing a compiler is something i've been considering too. i already understand how to do the majority of it, but what i haven't wrapped my head around yet is how to parse expressions. i mean, things like handling operator precedence in mathematical expressions, etc... expressions can get incredibly complicated, and one little error in parsing breaks the entire program.

then i'd have to come up with a good way to create a "stack" of smaller operations, and do them in exactly the right order and then plug the various results into each other correctly. if i sat down and gave it some real serious thought i can probably come up with a solution, but it seems to be such a big task i haven't wanted to try.

everything else should be relatively easy by comparison. (keyword: relatively)

turdus · Post by **turdus** » Tue Apr 24, 2012 5:11 am

berkus wrote:Well, when you build an expression tree

Well, you don't have to build a tree. It's easier to use polish notation (which uses only stacks), and the result will be the same as walking through the tree.
http://en.wikipedia.org/wiki/Polish_notation
Also there're lot of examples on the web in different programming languages how to do that.
Here's one in Javascript: http://scriptasylum.com/tutorials/infix ... stfix.html

Yoda · Post by **Yoda** » Tue Apr 24, 2012 7:49 am

@turdus,
I totally agree with you, - polish notation greatly simplifies parsing an expressions and you don't need anything else if you create hobby compiler.
But IMHO, building a tree provides more possibilities to optimize code for size, speed and memory usage.

Thomas · Post by **Thomas** » Tue Apr 24, 2012 11:12 am

Hi,

berkus wrote:
turdus wrote:
berkus wrote:Well, when you build an expression tree
Well, you don't have to build a tree. It's easier to use polish notation (which uses only stacks), and the result will be the same as walking through the tree.
That's an implicit tree (or what comes from breadth-first traversal of such tree), so I'm not buying that.

Not really ... . It is strictly not required to create an expression tree for parsing .

Actually RPN like parsing leads to form of parsing known as operator precedence parsing , it is actually an LR form of parsing and not an LL type . It is actually an efficient method of parsing expressions. But during later stages of development its considered far more elegant to have an AST and then make use of the visitor pattern for rest of the process . As we are hobbyist's we need not strictly follow the norm !

You guys might find this interesting : http://www.cs.usfca.edu/~galles/compilerdesign/
You might also find this thread interesting : http://forum.osdev.org/viewtopic.php?t=21349

Link : http://www.ipdatacorp.com/mmpd.html . DASM that comes with the MMURTL package makes use of operator precedence parsing for the most part and generates "okay code " and that's a good enough start !.

If you can figure out a good way to pass semantic and other info in a good way ( generally people use some form of trees for it

) to later stages of the compiler you are good to go

Hopefully something in my rambling is useful to you guys
--Thomas

NickJohnson · Post by **NickJohnson** » Tue Apr 24, 2012 12:07 pm

https://www.coursera.org/course/compilers

AndrewAPrice · Post by **AndrewAPrice** » Mon Apr 30, 2012 10:12 pm

Thomas wrote:Not really ... . It is strictly not required to create an expression tree for parsing.

Expression trees, while not necessary, make things a lot more natural to parse. Grammars are represented by a hierarchy of
rules (with tokens and lexemes as the leaves), so it's very intuitive to represent an instance of the grammar (your program) as a tree. For example:

Code: Select all

int a = someFunc(sin(a), cos(b + c * 2));

Code: Select all

- declare variable
  - type: int
  - name: a
  - value:
     - sumFunc
       - parameter 1:
         -sin
           - a
       - parameter 2:
         - cos
           - +
              - b
              - *
                - c
                - 2

A popular alternative is to output to an intermediate stack based language.

JamesM · Post by **JamesM** » Tue May 01, 2012 2:15 am

... I think some people are missing the point.

RPN is extremely easy to parse as there is no precedence. However, you need to have your language defined such that it uses RPN. You're shifting the burden of definition onto the programmer.

Infix expressions (or even prefix expressions as in s-exprs in lisp) are easier for a human to write and grok than pure reverse polish notation.

So your hands are tied parsing wise by what your language defines.

miker00lz · Post by **miker00lz** » Wed May 02, 2012 1:05 am

MessiahAndrw wrote:
Thomas wrote:Not really ... . It is strictly not required to create an expression tree for parsing.
Expression trees, while not necessary, make things a lot more natural to parse. Grammars are represented by a hierarchy of
rules (with tokens and lexemes as the leaves), so it's very intuitive to represent an instance of the grammar (your program) as a tree. For example:
Code: Select all
int a = someFunc(sin(a), cos(b + c * 2));
Code: Select all
- declare variable
  - type: int
  - name: a
  - value:
     - sumFunc
       - parameter 1:
         -sin
           - a
       - parameter 2:
         - cos
           - +
              - b
              - *
                - c
                - 2
A popular alternative is to output to an intermediate stack based language.

yeah a tree like that would make it pretty straightforward. even if not needed, i would definitely do that for my first compiler.

Yoda · Post by **Yoda** » Wed May 02, 2012 7:28 am

MessiahAndrw wrote:Expression trees, while not necessary, make things a lot more natural to parse...
A popular alternative is to output to an intermediate stack based language.

We are talking about that the parsing of expression doesn't need building a tree for stack approach like RPN. And that expression stack may easily (but not necessarily effectively) be implemented in common x86 architecture.

turdus · Post by **turdus** » Wed May 02, 2012 8:29 am

JamesM wrote:... I think some people are missing the point.

RPN is extremely easy to parse as there is no precedence. However, you need to have your language defined such that it uses RPN.

Sorry, but you're totally wrong. Any infix notation with different precedence operators can be converted into predecende-free postfix notation without problems, and it's totally independent of the language.

Have you checked the link I posted?

OSDev.org

Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?

Re: Writing a compiler - how?