Brendan wrote:Wajideu wrote:For syntactical/semantical terminology, there really is no existing term to describe the conversion from tokens into an AST; so for that I used the term "deducing"; meaning to "trace the course or derivation of." or to "draw as a logical conclusion".
The conversion from tokens to AST is called "parsing". Note that it's silly to split it into separate steps as it's typically just a "foreach(token in list) { decide_where_it_goes; slap_it_into_AST; }".
Deducing is a subset of parsing. There's a sort of ambiguity in compiler terminology at the moment, where the "parser" only does half of the actual parsing. Another example is that assembling and linking are part of compiling, but the compiler itself does not assemble or link programs. This is why I chose a new term, "glossing", to describe the stage of syntactic/semantic analysis; and in my original model I used the word "computing" instead of "compiling".
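To make the "deducing" step concrete, here's a minimal sketch of tokens being folded into an AST. The Token/Node types and the toy grammar (just num ('+' num)*) are made up purely for illustration; they aren't part of the roadmap itself:

Code:
#include <stdio.h>
#include <stdlib.h>

typedef enum { TOK_NUM, TOK_PLUS, TOK_EOF } TokKind;
typedef struct { TokKind kind; int value; } Token;

typedef struct Node {
    TokKind op;              /* TOK_NUM for leaves, TOK_PLUS for interior nodes */
    int value;               /* only used by leaves                             */
    struct Node *lhs, *rhs;
} Node;

static Node *node(TokKind op, int value, Node *lhs, Node *rhs)
{
    Node *n = malloc(sizeof *n);
    n->op = op; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* "Deducing": walk the token list and decide where each token goes in the tree. */
static Node *deduce(const Token *t)
{
    Node *root = node(TOK_NUM, t->value, NULL, NULL);
    for (t++; t->kind == TOK_PLUS; t += 2)
        root = node(TOK_PLUS, 0, root, node(TOK_NUM, t[1].value, NULL, NULL));
    return root;
}

static int eval(const Node *n)       /* tiny sanity check of the tree we built */
{
    return n->op == TOK_NUM ? n->value : eval(n->lhs) + eval(n->rhs);
}

int main(void)
{
    Token toks[] = { {TOK_NUM,1}, {TOK_PLUS,0}, {TOK_NUM,2}, {TOK_PLUS,0}, {TOK_NUM,3}, {TOK_EOF,0} };
    printf("%d\n", eval(deduce(toks)));   /* prints 6 */
    return 0;
}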
Brendan wrote:Wajideu wrote:There is also no term to describe the process of pre-processing the AST and re-sending it through the lexical and syntactal analyzer; so I used the term "refining"; meaning to "remove impurities from a substance" or to "improve something by making small changes".
"Pre-processing" itself doesn't tell anyone anything (are you removing redundant information that should have never been put in the AST during parsing, or converting it from one format to another because the parser used the wrong format, or..?). Re-sending it through the lexical and syntactical analyser doesn't make sense (why would anyone want to do that, ever?).
Pre-processing generally entails the expansion of pre-processor directives. i.e. GCC first passes the .c files to the pre-processor, which expands #define and #include directives and removes any code excluded via #if/#ifdef/#ifndef blocks, outputting a single .i file. The process of tokenizing [matching, scanning, evaluating], deducing, and then re-expanding pre-processor directives repeatedly (for however many passes the language specification requires) until you are left with only a single file to be processed is what I refer to as the "refining loop".
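To pin down what I mean by the refining loop, here's the rough shape of it in code. All of the types and functions below are hypothetical placeholders named after the roadmap stages (nothing here is GCC's actual API); the bodies are omitted since only the control flow matters:

Code:
#include <stdbool.h>

/* Hypothetical stage interfaces, named after the roadmap stages. */
typedef struct TokenList TokenList;
typedef struct Ast Ast;

TokenList  *tokenize(const char *source);        /* matching, scanning, evaluating        */
Ast        *deduce(TokenList *tokens);           /* tokens -> AST                         */
bool        has_directives(const Ast *ast);      /* any #include/#define/#if left to do?  */
const char *expand_directives(const Ast *ast);   /* expand them, yielding new source text */

/* The "refining loop": tokenize, deduce, expand, and go around again until no
 * pre-processor directives remain -- the AST equivalent of GCC's .c -> .i pass. */
Ast *refine(const char *source)
{
    Ast *ast = deduce(tokenize(source));
    while (has_directives(ast)) {
        source = expand_directives(ast);
        ast = deduce(tokenize(source));
    }
    return ast;
}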
Brendan wrote:Wajideu wrote:There is also no term to describe the process of building an ASG from an AST. For this, I chose the word "explaining"; meaning to "make an idea, situation, or problem clear to someone by describing it in more detail or revealing relevant facts or ideas.".
It would've been easier to call it "conversion from AST to ASG". I don't see the point of bothering with ASG (e.g. nothing prevents you from detecting common sequences and inserting them back into the AST).
The reason I came up with specific names for all of the parts of the compilation process was to break it down into modules. Part of the reason writing a compiler is so difficult is that you have to do so much before you have anything functional to test, and if something is broken it's difficult as hell to fix. By following this roadmap, you can easily test each stage as you work, and you can easily swap out certain parts. i.e. if you're adding a new middle-end (say, before you were using RTL and now you want to add CIL for .NET language support), you know that the only code you need to touch is the optimizer and the generator.
Brendan wrote:Converting ASG into IL needs to be a step on its own.
I've been pondering that myself, but hadn't quite come to a decision on it.
Brendan wrote:Wajideu wrote:The entire process of deducing an AST from tokens, refining the AST, an explaining the AST in the form of an ASG I refer to as "glossing"; borrowed from "glossary", or "gloss"; meaning "a translation or explanation of a word or phrase." or to "provide an explanation, interpretation, or paraphrase for (a text, word, etc.).". All 3 stages are executed by a "glossator", borrowed from the term used to describe a person who creates glossaries.
That's nice. Stop doing it. "Glossing" typically means "applying gloss" (most often the application of lip gloss).
It's also typically used like, "I'll gloss over that later".
Brendan wrote:I can't see any sanity checking anywhere (e.g. type checking, etc).
That's part of the explaining stage (semantic analysis), where the ASG is created.
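To show what I mean: when the ASG is built, identical subtrees get shared, and because every node passes through one place, that's also a natural point to hang the type/sanity checks. Here's a toy sketch of the sharing; the GNode type and the linear lookup are invented for this post, a real compiler would hash-cons its nodes instead:

Code:
#include <stdio.h>
#include <stdlib.h>

/* A made-up expression node; letters are leaves, '+','*' are operators. */
typedef struct GNode {
    char op;
    struct GNode *lhs, *rhs;
} GNode;

static GNode *pool[256];
static int    pool_len;

/* Building the ASG: before creating a node, look for an identical one that
 * already exists and share it.  (A real compiler would hash here, and would
 * also run its type/sanity checks on each node as it's interned.)          */
static GNode *intern(char op, GNode *lhs, GNode *rhs)
{
    for (int i = 0; i < pool_len; i++)
        if (pool[i]->op == op && pool[i]->lhs == lhs && pool[i]->rhs == rhs)
            return pool[i];                     /* common sub-expression: reuse */
    GNode *n = malloc(sizeof *n);
    n->op = op; n->lhs = lhs; n->rhs = rhs;
    return pool[pool_len++] = n;
}

int main(void)
{
    /* (a + b) * (a + b): the AST has two "+" subtrees, the ASG has one. */
    GNode *a = intern('a', NULL, NULL);
    GNode *b = intern('b', NULL, NULL);
    GNode *prod = intern('*', intern('+', a, b), intern('+', a, b));
    printf("operands shared: %s, total nodes: %d\n",
           prod->lhs == prod->rhs ? "yes" : "no", pool_len);   /* yes, 4 */
    return 0;
}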
Brendan wrote:Wajideu wrote:"Optimizing" and "generating" are common terminology used to describe the middle-end of a compiler, and "assembling" hasn't changed.
There was no term to describe how the machine code is placed into the section of an object file, so for that, I used the term "formatting", meaning "to arrange or put into a format".
There is no term because no term is needed. It's not a separate step, it's just the code generator storing its generated code while it does its thing.
This is why there is no clear separation between the assembler and the linker, and why the binutils BFD library is such a mess. By having the assembler output only stubs, and having the formatter manage how those stubs are wrapped into an object file, you can completely hide the object format from the assembler.
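As a sketch of what I mean by hiding the object format: the assembler talks only to a small formatter interface and never sees what the container looks like. Everything below (the Formatter struct, the toy backend) is invented for illustration; a real backend would write ELF/COFF/Mach-O sections rather than print offsets:

Code:
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical formatter interface: the assembler emits stubs through these
 * callbacks and never learns which object format is actually being written. */
typedef struct Formatter {
    void (*begin_section)(const char *name);
    void (*emit)(const uint8_t *bytes, size_t len);
    void (*end_section)(void);
} Formatter;

/* One toy backend: just reports what it was asked to store.  A real backend
 * would build the object file here (roughly the job binutils pushes into
 * BFD and linker scripts today).                                            */
static size_t offset;
static void toy_begin(const char *name)            { offset = 0; printf("section %s\n", name); }
static void toy_emit(const uint8_t *b, size_t len) { (void)b; printf("  %zu bytes at offset %zu\n", len, offset); offset += len; }
static void toy_end(void)                          { printf("  total size %zu\n", offset); }

static const Formatter toy_formatter = { toy_begin, toy_emit, toy_end };

/* "Assembler" side: it only knows about the Formatter, not the object format. */
static void assemble(const Formatter *fmt)
{
    const uint8_t stub[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };  /* mov eax,42; ret */
    fmt->begin_section(".text");
    fmt->emit(stub, sizeof stub);
    fmt->end_section();
}

int main(void)
{
    assemble(&toy_formatter);
    return 0;
}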
Brendan wrote:Wajideu wrote:There was no term specifically used to describe the process of converting an object file into the final program or library, so for that I used "finalizing", since it's technically the final stage. The last 3 terms are self explanatory things that the linker is capable of doing.
For converting an object file into an executable or library; use "linking". It doesn't need to be explained in more detail than that (and doesn't belong in a list of steps for a compiler anyway).
"Linking" is yet another ambiguous term. There are several stages to the linking process, and it's broken down this far to make everything more modular.
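To give one concrete stage: the finalizer has to resolve symbols and patch the stubs the formatter left behind. Here's a toy relocation pass with made-up structures; real relocation records also carry a type and addend (and an x86 call would use a PC-relative displacement), which are skipped here to keep the sketch small:

Code:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Made-up relocation record: "patch 4 bytes at this offset with the address
 * of this symbol".                                                          */
typedef struct { size_t offset; const char *symbol; } Reloc;
typedef struct { const char *name; uint32_t address; } Symbol;

static uint32_t resolve(const Symbol *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].address;
    return 0;   /* a real linker would report "undefined reference" here */
}

int main(void)
{
    uint8_t text[]   = { 0xE8, 0x00, 0x00, 0x00, 0x00 };  /* call <stub to be patched> */
    Reloc  relocs[]  = { { 1, "puts" } };
    Symbol symbols[] = { { "puts", 0x00401200u } };

    /* The "finalizing" part of linking: walk the relocations and patch each stub. */
    for (size_t i = 0; i < sizeof relocs / sizeof relocs[0]; i++) {
        uint32_t addr = resolve(symbols, sizeof symbols / sizeof symbols[0], relocs[i].symbol);
        memcpy(&text[relocs[i].offset], &addr, sizeof addr);
    }

    uint32_t patched;
    memcpy(&patched, &text[1], sizeof patched);
    printf("patched call target: 0x%08X\n", patched);
    return 0;
}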
Brendan wrote:Wajideu wrote:Brendan wrote:How will you be planning to support whole program optimisation?
Optimization is only supposed to be done on the IL (intermediate language). i.e. GCC only performs optimization on the GENERIC, GIMPLE, and RTL ILs it uses.
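As an example of what an IL-level pass looks like, here's constant folding over a made-up three-address IL. The Insn structure is invented for this post; the point is only that the pass reads and rewrites IL, never source text or machine code:

Code:
#include <stdio.h>
#include <stdbool.h>

/* A made-up three-address IL: each instruction is either "dst = const" or
 * "dst = srcA + srcB" where srcA/srcB name earlier destinations.           */
typedef struct {
    char op;          /* '=' load constant, '+' add two temporaries */
    int  dst;
    int  value;       /* constant for '='                           */
    int  a, b;        /* operand temporaries for '+'                */
} Insn;

/* Constant-folding pass: if both operands of a '+' are already known
 * constants, rewrite the instruction into a constant load.          */
static void fold_constants(Insn *il, int count)
{
    int  konst[16];
    bool known[16] = { false };

    for (int i = 0; i < count; i++) {
        if (il[i].op == '+' && known[il[i].a] && known[il[i].b]) {
            il[i].op = '=';
            il[i].value = konst[il[i].a] + konst[il[i].b];
        }
        if (il[i].op == '=') {
            konst[il[i].dst] = il[i].value;
            known[il[i].dst] = true;
        }
    }
}

int main(void)
{
    Insn il[] = {
        { '=', 0, 40, 0, 0 },   /* t0 = 40      */
        { '=', 1,  2, 0, 0 },   /* t1 = 2       */
        { '+', 2,  0, 0, 1 },   /* t2 = t0 + t1 */
    };
    fold_constants(il, 3);
    for (int i = 0; i < 3; i++)
        printf("t%d = %d\n", il[i].dst, il[i].value);   /* t2 = 42 after folding */
    return 0;
}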
No peep-hole optimiser after conversion to machine code? Note that what you've got as one "optimiser" step is many steps that form the majority of the compiler's work.
GCC supports whole program optimisation by ramming its GIMPLE into an ELF section and letting the linker do optimisation and code generation. Visual C++ does something similar. In both cases it's mostly a hack caused by retrofitting something that the original tool-chain wasn't designed for.
You're not retrofitting, so why bother generating machine code in the compiler at all? Just use IL as your object file format and let the linker do some of the later optimisation stages and generate the native code.
That's a very good suggestion...
EDIT:
---------------------------------------------
I feel I should add that, technically speaking, in binutils the BFD library and the use of linker scripts are an implementation of a formatter.
===============================================================
As suggested by Brendan, I made a few changes to the roadmap. I replaced the explaining stage (semantic analysis) with 4 steps (there's a rough code sketch after the list):
- sanitizing - cleaning the AST and performing sanity checks; to "alter (something regarded as less acceptable) so as to make it more palatable."
- mentioning - handling the declaration of symbols, object binding, and assignment; to "refer to something briefly and without going into detail."
- expounding - building an ASG from the AST; to "present and explain (a theory or idea) systematically and in detail."
- delegating - translating the ASG into an IL; to "entrust (a task or responsibility) to another person, typically one who is less senior than oneself."
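In code terms, the old explaining stage splits into four functions chained together. The types, names, and signatures below are hypothetical, just to show the ordering and what flows between the steps:

Code:
/* Hypothetical interfaces for the four stages that replace "explaining".
 * Ast/Asg/Il and the function names exist only for this sketch.          */
typedef struct Ast Ast;
typedef struct Asg Asg;
typedef struct Il  Il;

Ast *sanitize(Ast *ast);   /* clean the AST, run the sanity/type checks        */
Ast *mention(Ast *ast);    /* declare symbols, bind objects, handle assignment */
Asg *expound(Ast *ast);    /* build the ASG from the (now annotated) AST       */
Il  *delegate(Asg *asg);   /* translate the ASG into the chosen IL             */

/* The replacement for the old "explaining" stage, run in order. */
Il *semantic_stages(Ast *ast)
{
    return delegate(expound(mention(sanitize(ast))));
}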
I'm not quite sure whether I should change the way optimizing is done. I'm more focused on breaking down what modern compilers do into a more manageable, modular state than on changing the way that modern compilers work.
Oh, and
Brendan wrote:Note that what you've got as one "optimiser" step is many steps that form the majority of the compiler's work.
This is why, in the roadmap, the generating stage loops back to the explaining (now delegating) stage. Optimizing is done through multiple passes that each perform a different type of optimization. This structure also allows the option of delegating one IL to another, similar to how GCC works across 4 levels (GENERIC, high GIMPLE, low GIMPLE, and RTL).
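Roughly, the loop looks like this: run the passes that apply at the current IL level, delegate (lower) to the next level, and repeat until the lowest IL is reached and handed to the generator. The level names only loosely mirror GCC's, and everything here is a made-up sketch rather than how any real compiler is structured:

Code:
#include <stdio.h>

/* Hypothetical IL levels, highest to lowest (loosely modelled on GCC's
 * GENERIC -> high GIMPLE -> low GIMPLE -> RTL lowering).               */
typedef enum { IL_GENERIC, IL_GIMPLE_HIGH, IL_GIMPLE_LOW, IL_RTL } Level;

static void optimize(Level l)   /* run the passes that apply at this level */
{
    printf("optimizing at IL level %d\n", l);
}

static Level delegate(Level l)  /* lower the program one IL level */
{
    printf("delegating level %d -> %d\n", l, l + 1);
    return (Level)(l + 1);
}

int main(void)
{
    Level l = IL_GENERIC;
    /* The generating stage loops back through delegating: each trip around
     * runs a different set of passes on a different IL, then lowers it.    */
    while (l != IL_RTL) {
        optimize(l);
        l = delegate(l);
    }
    optimize(l);                      /* final passes on the lowest IL */
    printf("generating machine code\n");
    return 0;
}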