I've made a set of insanely simple LALR parser generator tools that you can use in your projects. I'll go ahead and explain how you can use it.
The idea is to eventually rewrite the tools in my own language and integrate these tools into my OS, so it's self-hosting and self-compiling. I have yet to get there, but if you want to use the tools to do the same, be my guest!
See my topic: Self hosting OS/Compiler from scratch
It took a while to create the parser generator. I've been wanting to write one for a long time, and I finally got my head wrapped around it. It's not just any parser, I wrote one years ago. This one should be able to handle most any programming language you throw at it. It only took 20 hours or so to complete, after I understood it for the most part.
Here's the URL: https://github.com/joshwyant/ParserGenerator
So without further ado...
First, you create your terminals and nonterminals:
Code: Select all
enum Terminal {
Unknown, Ident, Var, Equals, DoubleEquals, Eof
}
enum Nonterminal {
Init, Start, Expression, Factor
}
Code: Select all
partial class MyGrammar : LALRGrammar<Terminal, Nonterminal> {
public MyGrammar() : base(Unknown, Eof, Init, Start) {
var start = DefineProduction(Start);
var expression = DefineProduction(Expression);
var factor = DefineProduction(Factor);
ProductionRule t;
t = start % expression;
t = expression % Expression / Equals / Factor;
t = expression % Expression / DoubleEquals / Factor;
t = expression % Factor;
t = factor % Ident;
}
}
You're passing in the 4 system terminals and nonterminals to the base constructor (like "Start" and "Eof"), so the framework can handle those correctly.
Finally, you create a lexer. This one demonstrates the the typical complexity, except for whitespace:
Code: Select all
partial class MyGrammar {
public class Lexer : LexerBase {
Dictionary<string, Terminal> ReservedWords
= new Dictionary<string, Terminal> { "var", Var };
public Lexer(TextReader reader) : base(reader) { }
protected override IEnumerable<Token> Lex() {
while (Reader.HasNext()) {
var c = Reader.Peek();
if (char.IsLetter(c) || c == "_") {
while (char.IsLetterOrDigit(Reader.Peek()) || Reader.Peek() == "_")
Reader.Read();
if (ReservedWords.Contains(...))
yield return new Token(...);
else
yield return new Token(Ident);
}
else {
switch (Reader.Read()) {
case '=':
if (Reader.Peek() == '=') {
Reader.Read(); yield return new Token(DoubleEquals);
}
else yield return new Token(Equals);
break;
// ...
default:
yield return new Token(Terminal.Unknown);
break;
}
}
}
yield return new Token(Eof);
}
}
public override Lexer GetLexer(TextReader reader) {
return new Lexer(reader);
}
}
For a full lexer, see: https://github.com/joshwyant/ParserGene ... e/Lexer.cs
Now, you're ready to use your parser:
Code: Select all
var grammar = new MyGrammar();
var abstractSyntaxTree = grammar.Parse(".......");
Now, I don't believe there are many tools like this. You can even save, load, cache, or JIT compile the parse table to a function or save it to an assembly.
How would you generate code from the abstract syntax tree?
Well, semantic analysis comes first. You will have a lot of junk from your grammar in the tree that doesn't make sense in terms of semantics. So, you'll create elements for your syntax tree:
Code: Select all
class FullyQualifiedName {
public string[] NameParts;
}
class IfStatement : Statement {
public Expression Condition;
public Statement Body;
public Statement Else;
}
Code: Select all
var tokens = astNode.Flatten();
if (tokens.Peek() == If)
return scanIf(tokens);
...
Well, there you have it. If only I knew how to do this a long time ago, if only I either had the code in front of me, or someone explained it to me in plain english.
Have a nice day!
Josh
EDIT:
A fluent interface is also supported for the grammar. I actually like it better:
Code: Select all
partial class MyGrammar : LALRGrammar<Terminal, Nonterminal> {
public MyGrammar() : base(Unknown, Eof, Init, Start)
{
DefineProduction(Start)
.As(Expression);
DefineProduction(Expression)
.As(Expression, Equals, Factor)
.Or(Expression, DoubleEquals, Factor)
.Or(Factor);
DefineProduction(Factor)
.As(Ident);
}
}