embryo2 wrote: I have a strong feeling of the incompleteness not only of my picture, but also of the "standard" compiler development approach in universities and other computer science organizations. Some impression of a shaky foundation was backed by the way compiler development is taught. I see some theoretical roots in linguistics, I see the detailed parser development approach based on the grammar, and I see a lot of heuristics and experience-based advice for how to build your compiler. But I can't see the well-shaped theory behind it all. There was some profitable adoption from linguistics which ended up in the very useful grammar generalization, but all the rest just looks so pale in contrast with the neat parser foundation.
Your typical undergrad course, and to some extent a typical graduate-level intro course (they were actually the same class at CSU-East Bay, though the grad students got additional exercises), won't come close to even reviewing every topic in one semester or quarter, never mind going into much depth on one of the most heavily studied areas in computer science. The course I took, which was typical and perhaps a little better than most, did the standard coverage of the Church-Turing thesis, the Chomsky hierarchy of languages, and the relation between regular languages and regular expressions (material that was covered, though less thoroughly, in the department's sophomore Programming Languages Concepts course, which was supposed to be a prereq, though you could have fooled me from how some of the other students were acting); had us diagram several increasingly complex FSAs and write increasingly complex regexes, and then convert one to the other and back (also done in the PLC course I took, though apparently at least one of the other professors teaching it didn't cover it); covered how to define a deterministic FSA which can be used in designing a lexical analyzer; slid through a little material on converting an NFSA to a DFSA; spent a week or so on the basics of context-free grammars and normal-form descriptions thereof (again, the basics of this should have been covered in PLC, but apparently the idea of a consistent core curriculum is beyond most universities); managed to give an overview of the three or four most important of the literally dozens of parsing methods, but like most focused primarily on
LL(1) grammars suitable for top-down parsing by recursive descent; spent another week having us define a grammar for basic arithmetic expressions, including encoding the operator precedences, and showed us how to formally determine the ε-productions (that is, which non-terminals are nullable), followed by a week showing us how to eliminate the FIRST-FIRST and FIRST-FOLLOW conflicts we'd created; and then finally three weeks having us finish implementing our direct-emit, recursive-descent parser for said expressions, then extending the parser to handle simple conditionals, indefinite iteration, and calls to built-in routines, with optional support for actual procedure definitions and call-by-value as extra credit.
Grad students read the Dragon Book instead of
the one we covered, had to use their choice of parser generator (usually Bison) instead of writing the parser by hand, and the procedure part was required for them, but otherwise it was the same class.
Even as rushed and truncated as this was, most of the students were lost after week one. I had a leg up, in that I'd spent over a decade reading bits and pieces out of different textbooks on the subject and had gone over the original 8080 Small-C code with a fine-toothed comb (both Southern Connecticut University and CSU-East Bay had the reprint collections for DDJ in their stacks, so I read lots of the early articles from them), but then my ambition got the better of me and I got bogged down writing an excessively complex FSM implementation for the lexer. My compiler, which I still tweak from time to time, can be found
in my GitHub repo.
alexfru wrote: Perhaps the best thing one can do in order to get this is to actually implement the whole thing once or twice. A short course will definitely avoid certain areas or lack certain activities that compiler writers (and even software developers in general) would do in real life. A parser is a relatively small and simple thing (there are some ugly language features that complicate parsing at times), and IMHO spending much time studying parsing of various grammars is probably a large waste of resources, as there are more interesting and pressing problems of practical importance.
I can see how the emphasis on lexical analysis and parsing seems overdone, and I rather agree. Spending tons of time in a compiler course on parsing sounds, on the surface, a lot like spending a lot of time in an OS design course on the bootloader: it's a single part of a bigger picture, and not even an especially complex or central part at that.
However, there are several reasons for this overemphasis on these issues. First, there simply isn't time to really cover the full material in one course.
Second, even the basic material requires a huge pile of theory to actually understand (you can use some parts without it, but you're almost certain to get lost somewhere along the way if you try).
Third, lexical analysis and parsing are (by CS standards) solved problems; courses focus on them because they are known quantities with less ambiguity, whereas intermediate representations are heavy wizardry requiring a lot more information and time than a single course provides, and optimization is flat-out black magic, the sort of bleeding-edge material where the course notes would be obsolete before you even got to them, and which until the late 1990s was mostly taught word-of-mouth from professor to grad student like some sort of esoteric rite. As for automatic semantic analysis, last I checked that was still pretty much a research topic, and one where advances were slow to develop. Those heuristics for optimization and semantic analysis that do exist have filtered down to some of the people actually writing compilers professionally, though rarely do any practical techniques filter back up, as even open-source compiler developers tend to take a very proprietary view of their secrets.
Fourth, compiler courses generally haven't changed much since the mid-1980s, and in many cases are taught by the same professors who taught them then, too. There's a lot of resistance to modifying such a well-established model if there's no real need, and since the course is mostly for laying groundwork rather than vocational training (very few CS students will ever go on to write a compiler again), the need isn't seen by most of those making such decisions. While there is actually a lot of disagreement in the field on how best to teach compiler design (as demonstrated by the number of different textbooks), you would never know it given the uniformity with which courses are laid out in 99% of universities.