ACcurrent wrote: It works so... I'm compromising speed, as a compiler's speed does not really matter; it's more the speed of the executable code that matters. And after all, the 1st version of JS# was written in 20 minutes. Though yes, I do agree that sometimes the method I described above is not really going to cut it. In fact I have found that using "ad-hoc" methods makes it much easier to write optimization loops. The code can get messy, though, and trudging through it can be a pain. Essentially what I mean is that the main challenge is lexing and tokenization. After one tokenizes the language, the work becomes easy.
No. I'm sorry, I can support many opinions that I don't share, but this one is just plain wrong. Not only that, but you advocated your utterly braindead solution to someone else, and that I cannot abide.
Think about it this way: lexing matches regular languages, so you can do it with regular expressions. Parsing matches context-free or context-sensitive languages. Many languages (including all C dialects, Java, and anything with the '(Type)x' cast notation) require help from the semantic analysis step to parse.
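To make the distinction concrete, here is a rough sketch in Python (the same language as the original "compiler") of a toy tokenizer and recursive-descent parser; the token set and grammar are invented purely for illustration. The regular expressions handle the lexing, but balanced parentheses are not a regular language, so the parser needs its own recursion:

    import re

    # Token specification for a toy expression language: each token class is a
    # regular expression, which is exactly why lexing maps onto regular languages.
    TOKEN_SPEC = [
        ("NUMBER", r"\d+"),
        ("IDENT",  r"[A-Za-z_]\w*"),
        ("OP",     r"[+\-*/]"),
        ("LPAREN", r"\("),
        ("RPAREN", r"\)"),
        ("SKIP",   r"\s+"),
    ]
    TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

    def lex(source):
        """Yield (kind, text) pairs; pure regular-expression matching."""
        for m in TOKEN_RE.finditer(source):
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())

    # Nested parentheses are not regular, so the parser has to recurse itself --
    # a single regex cannot do this part.
    def parse_expr(tokens, pos=0):
        node, pos = parse_atom(tokens, pos)
        while pos < len(tokens) and tokens[pos][0] == "OP":
            op = tokens[pos][1]
            rhs, pos = parse_atom(tokens, pos + 1)
            node = (op, node, rhs)
        return node, pos

    def parse_atom(tokens, pos):
        kind, text = tokens[pos]
        if kind == "NUMBER":
            return ("num", int(text)), pos + 1
        if kind == "IDENT":
            return ("var", text), pos + 1
        if kind == "LPAREN":
            node, pos = parse_expr(tokens, pos + 1)
            assert tokens[pos][0] == "RPAREN", "unbalanced parentheses"
            return node, pos + 1
        raise SyntaxError(f"unexpected token {text!r}")

    if __name__ == "__main__":
        toks = list(lex("(a + 2) * 3"))
        ast, _ = parse_expr(toks)
        print(ast)   # ('*', ('+', ('var', 'a'), ('num', 2)), ('num', 3))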
Once parsed, the AST then has to be semantically analysed for errors the grammar cannot catch (type mismatches, undeclared names, and so on). Unless you don't care about error checking for some reason.
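As a minimal illustration of what that step looks like, here is a checker over the toy AST from the sketch above; it only reports undeclared variables, whereas a real analyser also does type checking, scope handling, and much more:

    # Walk the toy AST and report uses of variables that were never declared.
    def check_vars(node, declared, errors):
        kind = node[0]
        if kind == "var":
            if node[1] not in declared:
                errors.append(f"undeclared variable {node[1]!r}")
        elif kind == "num":
            pass
        else:                      # binary operator node: (op, lhs, rhs)
            check_vars(node[1], declared, errors)
            check_vars(node[2], declared, errors)

    errors = []
    check_vars(("*", ("+", ("var", "a"), ("num", 2)), ("num", 3)), {"x"}, errors)
    print(errors)   # ["undeclared variable 'a'"]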
Then that AST has to be converted into some simpler intermediate representation, flattening the tree structure and preparing for optimisation. Let's assume you don't care about optimisation for the moment, as that's more complex.
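For example, lowering the toy AST into flat three-address code could be sketched like this (again purely illustrative, not how any particular production compiler does it): every operation gets its own temporary, so later passes never have to walk a tree.

    from itertools import count

    def lower(node, code, temps):
        """Return the name holding node's value, appending (dst, op, lhs, rhs)
        three-address instructions to `code` as needed."""
        kind = node[0]
        if kind == "num":
            return str(node[1])
        if kind == "var":
            return node[1]
        lhs = lower(node[1], code, temps)
        rhs = lower(node[2], code, temps)
        dst = f"t{next(temps)}"
        code.append((dst, node[0], lhs, rhs))
        return dst

    code = []
    lower(("*", ("+", ("var", "a"), ("num", 2)), ("num", 3)), code, count(1))
    for dst, op, lhs, rhs in code:
        print(f"{dst} = {lhs} {op} {rhs}")
    # t1 = a + 2
    # t2 = t1 * 3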
To prepare for code generation, you need a way of converting your IR into machine instructions for your chosen target. Not only that, but you'll need to know all about the target's ABI and instruction set.
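Here is a deliberately naive sketch of that last step for the three-address code above: it spills everything to stack slots, does no register allocation, and assumes an x86-64 target with AT&T syntax and the System V convention of returning integer results in %rax. Even at this toy scale you already need to know the target's mnemonics, operand forms, and calling convention.

    # Map each variable and temporary to a stack slot and emit x86-64 (AT&T
    # syntax) for the three-address code produced above.
    OPS = {"+": "addq", "-": "subq", "*": "imulq"}

    def slot(name, slots):
        if name not in slots:
            slots[name] = -8 * (len(slots) + 1)   # next 8-byte slot below %rbp
        return f"{slots[name]}(%rbp)"

    def operand(name, slots):
        return f"${name}" if name.isdigit() else slot(name, slots)

    def codegen(code, slots):
        asm = []
        for dst, op, lhs, rhs in code:
            asm.append(f"    movq {operand(lhs, slots)}, %rax")
            asm.append(f"    {OPS[op]} {operand(rhs, slots)}, %rax")
            asm.append(f"    movq %rax, {slot(dst, slots)}")
        asm.append("    # result of the last instruction is left in %rax")
        return asm

    slots = {"a": -8}
    for line in codegen([("t1", "+", "a", "2"), ("t2", "*", "t1", "3")], slots):
        print(line)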
A nontrivial compiler is the most difficult program that exists on a computer, kernel included. It cannot be done in 131 lines of the world's most crufty Python. What you implemented was an (extremely poor) source-to-source translator.