Hi,
Joshw wrote:For a long time I've always wanted to achieve the lofty goal of creating a completely self-hosting operating system and compiler from scratch. No external libraries would be used or ported.
I'm mostly attempting to do the same - design my language, build initial tools, write an OS in my language, then write native tools for that OS.
Joshw wrote:I'm still trying to figure out a general way to do register allocation that will work with x86 and ARM. Once I do, I'll be able to generate code for both those platforms with my compiler.
Why? It's much easier to write 2 different "back ends" (one for each architecture). Also note that when you're writing the initial tools (that you plan to throw away once you've done the OS and have native tools) it doesn't make much sense bothering with more than one architecture (until you start writing the OS's native compiler).
Joshw wrote:Any thoughts that might help me along the way? Maybe with parsing and register allocation? I tend not to understand the greek and formulas so well and I just want to figure out how to apply it in my code.
Yes - keep the language clean and simple. This makes it easier for you to write the compilers, but also makes it easier for other programmers to learn and use your language later (including making it easier for programmers to find bugs). If you must add fancy features, add features that help humans avoid mistakes and/or make it easier for people and compilers to ensure that code is correct. Don't add features that make things more complex, or more ambiguous, or make it harder to ensure code is correct (e.g. templates/generics, operator overloading, etc).
Remember that it's easy to add more complex features to a "too simple" language later on (without causing compatibility problems and breaking older code written for your language), but virtually impossible to remove features from a "too complex" language later.
Don't spend too much time optimising the initial compiler's code or implementing code optimisers in the initial compiler (and forget about things like doing register allocation properly). It's far too easy to get distracted and forget that you're planning to throw the initial compiler away.
Make sure you can dump everything to a file between passes, so that you can compare "before pass N" to "after pass N" to see what the compiler actually did during that pass. Also, implement the compiler as many small simple passes (rather than a few large complex passes). These things make it much easier to check that the compiler is doing the right thing (and to figure out where it's doing the wrong thing).
I'd also recommend researching existing languages, paying special attention to things that beginners frequently have trouble with. You'll find that the things beginners have trouble with are also the things experienced programmers have trouble with - the only difference is that beginners don't know what they've done wrong and ask for help. Experienced programmers still make the same mistakes but do know how to fix them without help. If you're designing a new language, you're able to design the language to prevent common mistakes. A simple example (for C) would be octal - sooner or later everyone does something like "century = year / 01000;" and has to spend some time trying to find their mistake, and you can prevent that by using a "less error prone" way of representing octal numbers (or maybe do what C# does and simply don't bother with octal at all).
In the same way, I'd also recommend ignoring what existing experienced programmers think are "good features" of their favourite language. There's a special type of "Stockholm syndrome" that happens with bad language features, where programmers think the feature is bad when they're a beginner (but aren't too sure because they're a beginner), then they learn to live with the bad feature and accept it, and then eventually learn to like the bad language feature (and even have trouble imagining a world without the bad feature). Basically, if an experienced programmer says a feature is bad then the feature probably is bad, and if a beginner says a feature is good then it probably is good; but if an experienced programmer says a feature is good or a beginner says a feature is bad, you can't trust them.
Cheers,
Brendan