How to make a compiler

Programming, for all ages and all languages.
Post Reply
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: How to make a compiler

Post by Schol-R-LEA »

Actually, in terms of the time spent, the dominant part of most compilers is the lexical analyzer - which in most cases, amounts to a Deterministic Finite State Automaton, whether formally constructed or as an ad-hoc equivalent. Either way, I would have a lot of trouble seeing that as a graph coloring problem.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
alexfru
Member
Member
Posts: 1111
Joined: Tue Mar 04, 2014 5:27 am

Re: How to make a compiler

Post by alexfru »

mariuszp wrote:It's literally just lots of graph problems; graph coloring, vertex reachability, etc.
Only if your compiler is an optimizing one.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: How to make a compiler

Post by Korona »

Schol-R-LEA wrote:Actually, in terms of the time spent, the dominant part of most compilers is the lexical analyzer - which in most cases, amounts to a Deterministic Finite State Automaton, whether formally constructed or as an ad-hoc equivalent. Either way, I would have a lot of trouble seeing that as a graph coloring problem.
That's only true if you either use an existing backend or don't care about optimization. Lexical analysis is solved and there are so many algorithms to do it easily: recursive descend, PEG, LR parsers generators, or even Earley parsers that can handle any CFG.
mariuszp wrote:It's literally just lots of graph problems; graph coloring, vertex reachability, etc. Prove me wrong.
A LLVM-style Greedy RA is superior to graph coloring RAs in practice, prove me wrong. Graph coloring just cannot easily handle non-homogeneuos register files (and of course, it's also quite expensive). When I wrote an implementation (only a bit over 1k SLOC, can do live range splitting but not register spilling right now) inspired by the LLVM one, I was surprised (i) how natural it is, and (ii) how good the results are if you tune it a bit (for smallish functions with only a few basic blocks/variables, the result is often optimal). It's incredibly easy to tune -- when you see a missed optimization, it's usually straightforward to see how to extend the RA to handle it.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: How to make a compiler

Post by Schol-R-LEA »

mariuszp wrote:
Schol-R-LEA wrote:Actually, in terms of the time spent, the dominant part of most compilers is the lexical analyzer - which in most cases, amounts to a Deterministic Finite State Automaton, whether formally constructed or as an ad-hoc equivalent. Either way, I would have a lot of trouble seeing that as a graph coloring problem.
All problems in NP have a polynomial-time reduction to graph coloring.
facepalm I can't believe I fell for that. Yeah, that can describe more or less any practical algorithm, can't it (as P is a subset of NP).
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: How to make a compiler

Post by Korona »

Doesn't hold when you want to parse C++. (And context-sensitivity gives you a PSPACE-complete problem.)
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Qbyte
Member
Member
Posts: 51
Joined: Tue Jan 02, 2018 12:53 am
Location: Australia

Re: How to make a compiler

Post by Qbyte »

Schol-R-LEA wrote:Actually, in terms of the time spent, the dominant part of most compilers is the lexical analyzer - which in most cases, amounts to a Deterministic Finite State Automaton, whether formally constructed or as an ad-hoc equivalent. Either way, I would have a lot of trouble seeing that as a graph coloring problem.
The lexer is actually the simplest and least time consuming part of writing a compiler. There's also pretty much only two ways to design a lexer; one which transforms the entire source file into a token array in a single step which is then handed off to the parser, or one which generates tokens for the parser on the fly. I personally prefer the former design for reasons of performance and code simplicity.
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: How to make a compiler

Post by Schol-R-LEA »

Qbyte wrote:
Schol-R-LEA wrote:Actually, in terms of the time spent, the dominant part of most compilers is the lexical analyzer - which in most cases, amounts to a Deterministic Finite State Automaton, whether formally constructed or as an ad-hoc equivalent. Either way, I would have a lot of trouble seeing that as a graph coloring problem.
The lexer is actually the simplest and least time consuming part of writing a compiler. There's also pretty much only two ways to design a lexer; one which transforms the entire source file into a token array in a single step which is then handed off to the parser, or one which generates tokens for the parser on the fly. I personally prefer the former design for reasons of performance and code simplicity.
Sorry if I wasn't clear; I wasn't referring to the implementation time, but the running time during a compile. Depending on the number of potential transitions, and the amount of optimization the compiler performs, lexical analysis can consume up to 80% of the running time (the textbook Modern Compiler Design by Grune, et. al., explains why in detail).

Writing a lexical analyzer is pretty easy, comparatively. Writing an efficient one is another story, though as you said, it still is usually simpler and faster to implement than the other components.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
User avatar
Thomas
Member
Member
Posts: 281
Joined: Thu Jun 04, 2009 11:12 pm

Re: How to make a compiler

Post by Thomas »

Hi mariuszp,

With availability of lot of video courses online, you can do anything you set your mind to. You will need to invest some time and energy for it. With a combination of a book ( a good free one suggested already by others ) and online resources, you should be able to get it done. The classic text by Aho et el is another good reference. Hopefully you can find a low priced used copy online.

I think, these videos are very instructional.
https://courses.cs.washington.edu/cours ... index.html

--Thomas
User avatar
Thomas
Member
Member
Posts: 281
Joined: Thu Jun 04, 2009 11:12 pm

Re: How to make a compiler

Post by Thomas »

I remember enjoying reading this too.

https://holub.com/compiler/. The author has released the book and sources in public domain. Read this if you want to learn how to write a lex and yacc clone. However i did not like the design of the final compiler he came up with. There are far more modern and more modular ways to do it. But a good read for sure.

--Thomas
User avatar
Solar
Member
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany
Contact:

Re: How to make a compiler

Post by Solar »

Boost.Spirit is (IMHO) a strong C++ alternative to the C (lex/flex, yacc/bison) way of going about things.

Upsides:
  • Integration. No additional syntaxes, no additional tools, no additional building steps.
  • Direct generation of C++ object hierarchy.
  • Modularity and re-usability of subparsers. (For example, I've created a Boost.Spirit parser for "quoted strings" that includes the usual C escapes, octal, hexadecimal, and Unicode character constants, and generates a UTF-8 std::string. And re-used it in multiple projects since.)
  • In combination with Boost iterators, can give very good error messages on failed parses with very little effort.
Downsides:
  • Compile times. (Can be alleviated significantly by modularization, see above.)
  • "Classic" C++ template error messages (when building the parser) can be daunting. (Might get better with Contracts.)
If you're willing to really tackle the learning curve, Boost.Spirit can be very rewarding, for both quick off-the-cuff oneliners or complex parse trees. But you have to dig deep to familiarize yourself with its intricacies. As with many other of my favourite tools (Vim, LaTeX, C++ in general), you won't see a payoff if you're only scratching the surface. The benefits are to be found at the deep end. ;-)

Additionally, let me mention Crafting Interpreters by Bob Nystrom as an introduction on how to make an interpreter (which isn't that different from making a compiler, really). I found that piece to be more enlightening than many more theoretical approaches, especially if you're a beginner in the art and can't wrap your head around all the related lingo yet.
Every good solution is obvious once you've found it.
alexfru
Member
Member
Posts: 1111
Joined: Tue Mar 04, 2014 5:27 am

Re: How to make a compiler

Post by alexfru »

Schol-R-LEA wrote:Depending on the number of potential transitions, and the amount of optimization the compiler performs, lexical analysis can consume up to 80% of the running time (the textbook Modern Compiler Design by Grune, et. al., explains why in detail).
I presume, that figure of 80% corresponds to non- or nearly non-optimizing compilers.
The languages that allow writing code with exponential compile times (where the compiler needs to try this or that in order to resolve ambiguity, potentially many times nested) are probably a separate category of their own.
User avatar
nyc
Posts: 17
Joined: Sun Dec 29, 2019 10:59 pm
Libera.chat IRC: nyc

Re: How to make a compiler

Post by nyc »

Korona wrote:A LLVM-style Greedy RA is superior to graph coloring RAs in practice, prove me wrong. Graph coloring just cannot easily handle non-homogeneuos register files (and of course, it's also quite expensive). When I wrote an implementation (only a bit over 1k SLOC, can do live range splitting but not register spilling right now) inspired by the LLVM one, I was surprised (i) how natural it is, and (ii) how good the results are if you tune it a bit (for smallish functions with only a few basic blocks/variables, the result is often optimal). It's incredibly easy to tune -- when you see a missed optimization, it's usually straightforward to see how to extend the RA to handle it.
I like things like http://people.cs.pitt.edu/~soffa/research/Comp/gurr.ps as ways to make graph colouring or analysis go through, though I'm admittedly coming from the point of view of someone not really interested enough in compilers to code up an implementation of it or a more recent algorithm in the same vein.
Post Reply