Brendan wrote:Wajideu wrote:For syntactical/semantical terminology, there really is no existing term to describe the conversion from tokens into an AST; so for that I used the term "deducing"; meaning to "trace the course or derivation of." or to "draw as a logical conclusion".
The conversion from tokens to AST is called "parsing". Note that it's silly to split it into separate steps as it's typically just a "foreach(token in list) { decide_where_it_goes; slap_it_into_AST; }".
Deducing is a subset of parsing. There's a sort of ambiguity in compiler terminology at the moment, where the "parser" only does half of the actual parsing. Another example is that assembling and linking are part of compiling, but the compiler itself does not assemble or link programs. This is why I chose a new term, "glossing", to describe the stage of syntactic/semantic analysis; and in my original model I used the word "computing" instead of "compiling".
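To make the "deducing" step concrete, here's a minimal sketch of tokens being folded into an AST. The Token/Node types and the toy grammar (just num ('+' num)*) are made up purely for illustration; they aren't part of the roadmap itself:

Code:
#include <stdio.h>
#include <stdlib.h>

typedef enum { TOK_NUM, TOK_PLUS, TOK_EOF } TokKind;
typedef struct { TokKind kind; int value; } Token;

typedef struct Node {
    TokKind op;              /* TOK_NUM for leaves, TOK_PLUS for interior nodes */
    int value;               /* only used by leaves                             */
    struct Node *lhs, *rhs;
} Node;

static Node *node(TokKind op, int value, Node *lhs, Node *rhs)
{
    Node *n = malloc(sizeof *n);
    n->op = op; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* "Deducing": walk the token list and decide where each token goes in the tree. */
static Node *deduce(const Token *t)
{
    Node *root = node(TOK_NUM, t->value, NULL, NULL);
    for (t++; t->kind == TOK_PLUS; t += 2)
        root = node(TOK_PLUS, 0, root, node(TOK_NUM, t[1].value, NULL, NULL));
    return root;
}

static int eval(const Node *n)       /* tiny sanity check of the tree we built */
{
    return n->op == TOK_NUM ? n->value : eval(n->lhs) + eval(n->rhs);
}

int main(void)
{
    Token toks[] = { {TOK_NUM,1}, {TOK_PLUS,0}, {TOK_NUM,2}, {TOK_PLUS,0}, {TOK_NUM,3}, {TOK_EOF,0} };
    printf("%d\n", eval(deduce(toks)));   /* prints 6 */
    return 0;
}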
Brendan wrote:Wajideu wrote:There is also no term to describe the process of pre-processing the AST and re-sending it through the lexical and syntactal analyzer; so I used the term "refining"; meaning to "remove impurities from a substance" or to "improve something by making small changes".
"Pre-processing" itself doesn't tell anyone anything (are you removing redundant information that should have never been put in the AST during parsing, or converting it from one format to another because the parser used the wrong format, or..?). Re-sending it through the lexical and syntactical analyser doesn't make sense (why would anyone want to do that, ever?).
Pre-processing generally entails the expansion of pre-processor directives. i.e. GCC first passes the .c files to the pre-processor, which expands #define and #include directives and removes any code excluded via #if/#ifdef/#ifndef blocks, outputting a single .i file. The process of tokenizing [matching, scanning, evaluating], deducing, and then re-expanding pre-processor directives repeatedly (for however many passes the language specification requires) until you are left with only a single file to be processed is what I refer to as the "refining loop".
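To pin down what I mean by the refining loop, here's the rough shape of it in code. All of the types and functions below are hypothetical placeholders named after the roadmap stages (nothing here is GCC's actual API); the bodies are omitted since only the control flow matters:

Code:
#include <stdbool.h>

/* Hypothetical stage interfaces, named after the roadmap stages. */
typedef struct TokenList TokenList;
typedef struct Ast Ast;

TokenList  *tokenize(const char *source);        /* matching, scanning, evaluating        */
Ast        *deduce(TokenList *tokens);           /* tokens -> AST                         */
bool        has_directives(const Ast *ast);      /* any #include/#define/#if left to do?  */
const char *expand_directives(const Ast *ast);   /* expand them, yielding new source text */

/* The "refining loop": tokenize, deduce, expand, and go around again until no
 * pre-processor directives remain -- the AST equivalent of GCC's .c -> .i pass. */
Ast *refine(const char *source)
{
    Ast *ast = deduce(tokenize(source));
    while (has_directives(ast)) {
        source = expand_directives(ast);
        ast = deduce(tokenize(source));
    }
    return ast;
}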
Brendan wrote:Wajideu wrote:There is also no term to describe the process of building an ASG from an AST. For this, I chose the word "explaining"; meaning to "make an idea, situation, or problem clear to someone by describing it in more detail or revealing relevant facts or ideas.".
It would've been easier to call it "conversion from AST to ASG". I don't see the point of bothering with ASG (e.g. nothing prevents you from detecting common sequences and inserting them back into the AST).
The reason I came up with specific names for all of the parts of the compilation process was to break it down into modules. Part of the reason writing a compiler is so difficult is that you have to do so much before you have anything functional to test, and if something is broken it's difficult as hell to fix. By following this roadmap, you can easily test each stage as you work, and you can easily swap out certain parts. i.e. if you're adding a new middle-end (say, before you were using RTL and now you want to add CIL for .NET language support), you know that the only code you need to touch is the optimizer and the generator.
Brendan wrote:Converting ASG into IL needs to be a step on its own.
I've been pondering that myself, but hadn't quite come to a decision on it.
Brendan wrote:Wajideu wrote:The entire process of deducing an AST from tokens, refining the AST, an explaining the AST in the form of an ASG I refer to as "glossing"; borrowed from "glossary", or "gloss"; meaning "a translation or explanation of a word or phrase." or to "provide an explanation, interpretation, or paraphrase for (a text, word, etc.).". All 3 stages are executed by a "glossator", borrowed from the term used to describe a person who creates glossaries.
That's nice. Stop doing it. "Glossing" typically means "applying gloss" (most often the application of lip gloss).
It's also typically used like, "I'll gloss over that later".
Brendan wrote:I can't see any sanity checking anywhere (e.g. type checking, etc).
That's part of the explaining stage (semantic analysis), where the ASG is created.
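To show what I mean: when the ASG is built, identical subtrees get shared, and because every node passes through one place, that's also a natural point to hang the type/sanity checks. Here's a toy sketch of the sharing; the GNode type and the linear lookup are invented for this post, a real compiler would hash-cons its nodes instead:

Code:
#include <stdio.h>
#include <stdlib.h>

/* A made-up expression node; letters are leaves, '+','*' are operators. */
typedef struct GNode {
    char op;
    struct GNode *lhs, *rhs;
} GNode;

static GNode *pool[256];
static int    pool_len;

/* Building the ASG: before creating a node, look for an identical one that
 * already exists and share it.  (A real compiler would hash here, and would
 * also run its type/sanity checks on each node as it's interned.)          */
static GNode *intern(char op, GNode *lhs, GNode *rhs)
{
    for (int i = 0; i < pool_len; i++)
        if (pool[i]->op == op && pool[i]->lhs == lhs && pool[i]->rhs == rhs)
            return pool[i];                     /* common sub-expression: reuse */
    GNode *n = malloc(sizeof *n);
    n->op = op; n->lhs = lhs; n->rhs = rhs;
    return pool[pool_len++] = n;
}

int main(void)
{
    /* (a + b) * (a + b): the AST has two "+" subtrees, the ASG has one. */
    GNode *a = intern('a', NULL, NULL);
    GNode *b = intern('b', NULL, NULL);
    GNode *prod = intern('*', intern('+', a, b), intern('+', a, b));
    printf("operands shared: %s, total nodes: %d\n",
           prod->lhs == prod->rhs ? "yes" : "no", pool_len);   /* yes, 4 */
    return 0;
}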
Brendan wrote:Wajideu wrote:"Optimizing" and "generating" are common terminology used to describe the middle-end of a compiler, and "assembling" hasn't changed.
There was no term to describe how the machine code is placed into the section of an object file, so for that, I used the term "formatting", meaning "to arrange or put into a format".
There is no term because no term is needed. It's not a separate step, it's just the code generator storing its generated code while it does its thing.
This is why there is no clear separation between the assembler and the linker, and why the binutils BFD library is such a mess. By having the assembler output only stubs, and having the formatter manage how those stubs are wrapped into an object file, you can completely hide the object format from the assembler.
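As a sketch of what I mean by hiding the object format: the assembler talks only to a small formatter interface and never sees what the container looks like. Everything below (the Formatter struct, the toy backend) is invented for illustration; a real backend would write ELF/COFF/Mach-O sections rather than print offsets:

Code:
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical formatter interface: the assembler emits stubs through these
 * callbacks and never learns which object format is actually being written. */
typedef struct Formatter {
    void (*begin_section)(const char *name);
    void (*emit)(const uint8_t *bytes, size_t len);
    void (*end_section)(void);
} Formatter;

/* One toy backend: just reports what it was asked to store.  A real backend
 * would build the object file here (roughly the job binutils pushes into
 * BFD and linker scripts today).                                            */
static size_t offset;
static void toy_begin(const char *name)            { offset = 0; printf("section %s\n", name); }
static void toy_emit(const uint8_t *b, size_t len) { (void)b; printf("  %zu bytes at offset %zu\n", len, offset); offset += len; }
static void toy_end(void)                          { printf("  total size %zu\n", offset); }

static const Formatter toy_formatter = { toy_begin, toy_emit, toy_end };

/* "Assembler" side: it only knows about the Formatter, not the object format. */
static void assemble(const Formatter *fmt)
{
    const uint8_t stub[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };  /* mov eax,42; ret */
    fmt->begin_section(".text");
    fmt->emit(stub, sizeof stub);
    fmt->end_section();
}

int main(void)
{
    assemble(&toy_formatter);
    return 0;
}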
Brendan wrote:Wajideu wrote:There was no term specifically used to describe the process of converting an object file into the final program or library, so for that I used "finalizing", since it's technically the final stage. The last 3 terms are self explanatory things that the linker is capable of doing.
For converting an object file into an executable or library; use "linking". It doesn't need to be explained in more detail than that (and doesn't belong in a list of steps for a compiler anyway).
"Linking" is yet another ambiguous term. There are several stages to the linking process, and it's broken down this far to make everything more modular.
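To give one concrete stage: the finalizer has to resolve symbols and patch the stubs the formatter left behind. Here's a toy relocation pass with made-up structures; real relocation records also carry a type and addend (and an x86 call would use a PC-relative displacement), which are skipped here to keep the sketch small:

Code:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Made-up relocation record: "patch 4 bytes at this offset with the address
 * of this symbol".                                                          */
typedef struct { size_t offset; const char *symbol; } Reloc;
typedef struct { const char *name; uint32_t address; } Symbol;

static uint32_t resolve(const Symbol *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].address;
    return 0;   /* a real linker would report "undefined reference" here */
}

int main(void)
{
    uint8_t text[]   = { 0xE8, 0x00, 0x00, 0x00, 0x00 };  /* call <stub to be patched> */
    Reloc  relocs[]  = { { 1, "puts" } };
    Symbol symbols[] = { { "puts", 0x00401200u } };

    /* The "finalizing" part of linking: walk the relocations and patch each stub. */
    for (size_t i = 0; i < sizeof relocs / sizeof relocs[0]; i++) {
        uint32_t addr = resolve(symbols, sizeof symbols / sizeof symbols[0], relocs[i].symbol);
        memcpy(&text[relocs[i].offset], &addr, sizeof addr);
    }

    uint32_t patched;
    memcpy(&patched, &text[1], sizeof patched);
    printf("patched call target: 0x%08X\n", patched);
    return 0;
}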
Brendan wrote:Wajideu wrote:Brendan wrote:How will you be planning to support whole program optimisation?
Optimization is only supposed to be done on the IL (intermediate language). i.e. GCC only performs optimization on the GENERIC, GIMPLE, and RTL ILs it uses.
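As an example of what an IL-level pass looks like, here's constant folding over a made-up three-address IL. The Insn structure is invented for this post; the point is only that the pass reads and rewrites IL, never source text or machine code:

Code:
#include <stdio.h>
#include <stdbool.h>

/* A made-up three-address IL: each instruction is either "dst = const" or
 * "dst = srcA + srcB" where srcA/srcB name earlier destinations.           */
typedef struct {
    char op;          /* '=' load constant, '+' add two temporaries */
    int  dst;
    int  value;       /* constant for '='                           */
    int  a, b;        /* operand temporaries for '+'                */
} Insn;

/* Constant-folding pass: if both operands of a '+' are already known
 * constants, rewrite the instruction into a constant load.          */
static void fold_constants(Insn *il, int count)
{
    int  konst[16];
    bool known[16] = { false };

    for (int i = 0; i < count; i++) {
        if (il[i].op == '+' && known[il[i].a] && known[il[i].b]) {
            il[i].op = '=';
            il[i].value = konst[il[i].a] + konst[il[i].b];
        }
        if (il[i].op == '=') {
            konst[il[i].dst] = il[i].value;
            known[il[i].dst] = true;
        }
    }
}

int main(void)
{
    Insn il[] = {
        { '=', 0, 40, 0, 0 },   /* t0 = 40      */
        { '=', 1,  2, 0, 0 },   /* t1 = 2       */
        { '+', 2,  0, 0, 1 },   /* t2 = t0 + t1 */
    };
    fold_constants(il, 3);
    for (int i = 0; i < 3; i++)
        printf("t%d = %d\n", il[i].dst, il[i].value);   /* t2 = 42 after folding */
    return 0;
}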
No peep-hole optimiser after conversion to machine code? Note that what you've got as one "optimiser" step is many steps that form the majority of the compiler's work.
GCC supports whole program optimisation by ramming its GIMPLE into an ELF section and letting the linker do optimisation and code generation. Visual C++ does something similar. In both cases it's mostly a hack caused by retrofitting something that the original tool-chain wasn't designed for.
You're not retrofitting, so why bother generating machine code in the compiler at all? Just use IL as your object file format and let the linker do some of the later optimisation stages and generate the native code.
That's a very good suggestion...
EDIT:
---------------------------------------------
I feel I should add that, technically speaking, in binutils the BFD library and the use of linker scripts are an implementation of a formatter.
===============================================================
As suggested by Brendan, I made a few changes to the roadmap. I replaced the explaining stage (semantic analysis) with 4 steps (there's a rough code sketch after the list):
- sanitizing - cleaning the AST and performing sanity checks; to "alter (something regarded as less acceptable) so as to make it more palatable."
- mentioning - handling the declaration of symbols, object binding, and assignment; to "refer to something briefly and without going into detail."
- expounding - building an ASG from the AST; to "present and explain (a theory or idea) systematically and in detail."
- delegating - translating the ASG into an IL; to "entrust (a task or responsibility) to another person, typically one who is less senior than oneself."
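In code terms, the old explaining stage splits into four functions chained together. The types, names, and signatures below are hypothetical, just to show the ordering and what flows between the steps:

Code:
/* Hypothetical interfaces for the four stages that replace "explaining".
 * Ast/Asg/Il and the function names exist only for this sketch.          */
typedef struct Ast Ast;
typedef struct Asg Asg;
typedef struct Il  Il;

Ast *sanitize(Ast *ast);   /* clean the AST, run the sanity/type checks        */
Ast *mention(Ast *ast);    /* declare symbols, bind objects, handle assignment */
Asg *expound(Ast *ast);    /* build the ASG from the (now annotated) AST       */
Il  *delegate(Asg *asg);   /* translate the ASG into the chosen IL             */

/* The replacement for the old "explaining" stage, run in order. */
Il *semantic_stages(Ast *ast)
{
    return delegate(expound(mention(sanitize(ast))));
}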
I'm not quite sure whether I should change the way optimizing is done. I'm more focused on breaking down what modern compilers do into a more manageable, modular state than on changing the way that modern compilers work.
Oh, and
Brendan wrote:Note that what you've got as one "optimiser" step is many steps that form the majority of the compiler's work.
This is why, in the roadmap, the generating stage loops back to the explaining (now delegating) stage. Optimizing is done through multiple passes that each perform a different type of optimization. This structure also allows the option of delegating one IL to another, similar to how GCC works across 4 levels (GENERIC, high GIMPLE, low GIMPLE, and RTL).
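Roughly, the loop looks like this: run the passes that apply at the current IL level, delegate (lower) to the next level, and repeat until the lowest IL is reached and handed to the generator. The level names only loosely mirror GCC's, and everything here is a made-up sketch rather than how any real compiler is structured:

Code:
#include <stdio.h>

/* Hypothetical IL levels, highest to lowest (loosely modelled on GCC's
 * GENERIC -> high GIMPLE -> low GIMPLE -> RTL lowering).               */
typedef enum { IL_GENERIC, IL_GIMPLE_HIGH, IL_GIMPLE_LOW, IL_RTL } Level;

static void optimize(Level l)   /* run the passes that apply at this level */
{
    printf("optimizing at IL level %d\n", l);
}

static Level delegate(Level l)  /* lower the program one IL level */
{
    printf("delegating level %d -> %d\n", l, l + 1);
    return (Level)(l + 1);
}

int main(void)
{
    Level l = IL_GENERIC;
    /* The generating stage loops back through delegating: each trip around
     * runs a different set of passes on a different IL, then lowers it.    */
    while (l != IL_RTL) {
        optimize(l);
        l = delegate(l);
    }
    optimize(l);                      /* final passes on the lowest IL */
    printf("generating machine code\n");
    return 0;
}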