Self hosting OS/Compiler from scratch

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
Post Reply
Joshw
Member
Member
Posts: 46
Joined: Wed Mar 05, 2008 4:41 pm
Location: San Francisco, California, USA
Contact:

Self hosting OS/Compiler from scratch

Post by Joshw »

Hello all,

For a long time I've always wanted to achieve the lofty goal of creating a completely self-hosting operating system and compiler from scratch. No external libraries would be used or ported.

I've been coming closer to this goal little by little, whenever I get bored and feel like working on my own projects again :)

When I was younger I wrote an x86 protected mode operating system with multithreading, ELF loading, a FAT32 file system, and graphical display thanks to the wiki :)

To demonstrate my understanding of machine architecture, I wrote this mips simulator a while back: https://code.google.com/p/mips-machine/

I have written a simple yet useful top down parser complete with .NET code generation: https://code.google.com/p/game-creator/. However, a recursive descent parser doesn't meet my needs. I always loved how simple and elegant C# is. My language should be able to correctly parse things such as:

Code: Select all

List<Nullable<int>> list;
f(g<a. b>(c));
So I'm currently learning how to generate a parser table for an LALR parser. I have attempted such a language in the past using the recursive descent method, but that's where I always hit a roadblock. The rest of the language parsed fine :)

I'm still trying to figure out a general way to do register allocation that will work with x86 and ARM. Once I do, I'll be able to generate code for both those platforms with my compiler.

My ultimate goal is to have a cross-platform, general purpose, self-hosting OS :)

Any thoughts that might help me along the way? Maybe with parsing and register allocation? I tend not to understand the greek and formulas so well and I just want to figure out how to apply it in my code.

Thanks!

Josh
Antti
Member
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: Self hosting OS/Compiler from scratch

Post by Antti »

Joshw wrote:creating a completely self-hosting operating system and compiler from scratch. No external libraries would be used or ported.
I appreciate your goal and the topic is perfect. I am also trying to do something like this. Projects like these are interesting and I wish you luck. Are you using C# or your own language based on C#? How do you bootstrap the system, i.e. what is the host platform and languages/compilers before you have your self-hosting OS/compiler?
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Self hosting OS/Compiler from scratch

Post by Brendan »

Hi,
Joshw wrote:For a long time I've always wanted to achieve the lofty goal of creating a completely self-hosting operating system and compiler from scratch. No external libraries would be used or ported.
I'm mostly attempting to do the same - design my language, build initial tools, write an OS in my language, then write native tools for that OS.
Joshw wrote:I'm still trying to figure out a general way to do register allocation that will work with x86 and ARM. Once I do, I'll be able to generate code for both those platforms with my compiler.
Why? It's much easier to write 2 different "back ends" (one for each architecture). Also note that when you're writing the initial tools (that you plan to throw away once you've done the OS and have native tools) it doesn't make much sense bothering with more than one architecture (until you start writing the OS's native compiler).
Joshw wrote:Any thoughts that might help me along the way? Maybe with parsing and register allocation? I tend not to understand the greek and formulas so well and I just want to figure out how to apply it in my code.
Yes - keep the language clean and simple. This makes it easier for you to write the compilers, but also makes it easier for other programmers to learn and use your language later (including making it easier for programmers to find bugs). If you must add fancy features, add features that help humans avoid mistakes and/or make it easier for people and compilers ensure that code is correct. Don't add features that make things more complex, or more ambiguous, or make it harder to ensure code is correct (e.g. templates/generics, operator overloading, etc).

Remember that it's easy to add more complex feature to a "too simple" language later on (without causing compatibility problems and breaking older code written for your language), but virtually impossible to remove features from a "too complex" language later.

Don't spend too much time optimising the initial compiler's code or implementing code optimisers in the initial compiler (and forget about things like doing register allocation properly). It's far too easy to get distracted and forget that you're planning to throw the initial compiler away.

Make sure you can dump everything to a file between passes, so that you can compare "before pass N" to "after pass N" to see what the compiler actually did during that pass. Also; implement the compiler as many small simple passes (rather than a few large complex passes). These things makes it much much easier to check that the compiler is doing the right thing (and figure out where it's doing the wrong thing).

I'd also recommend researching existing languages; paying special attention to things that beginners frequently have trouble with. You'll find that the things beginners have trouble with are also the things experienced programmers have trouble with - the only difference is that beginners don't know what they've done wrong and ask for help. Experience programmers still make the same mistakes but do know how to fix them without help. If you're designing a new language, you're able to design the language to prevent common mistakes. A simple example (for C) would be octal - sooner or later everyone does something like "century = year / 01000;" and has to spent some time trying to find their mistake, and you can prevent that by using a "less error prone" way of representing octal numbers (or maybe do what C# does and simply don't bother with octal at all).

In the same way, I'd also recommend ignoring what existing experienced programmers think are "good features" of their favourite language. There's a special type of "Stockholm syndrome" that happens with bad language features, where programmers think the feature is bad when they're a beginner (but aren't too sure because they're a beginner), then they learn to live with the bad feature and accept it, and then eventually learn to like the bad language feature (and even have trouble imagining a world without the bad feature). Basically, if an experienced programmer says a feature is bad then the feature probably is bad, and if a beginner says a feature is good then it probably is good; but if an experienced programmer says a feature is good or a beginner says a feature is bad you can't trust them. ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Joshw
Member
Member
Posts: 46
Joined: Wed Mar 05, 2008 4:41 pm
Location: San Francisco, California, USA
Contact:

Re: Self hosting OS/Compiler from scratch

Post by Joshw »

Antti wrote:Are you using C# or your own language based on C#? How do you bootstrap the system, i.e. what is the host platform and languages/compilers before you have your self-hosting OS/compiler?
Hi Antti, I'm using C# to write my compiler :)

Thanks so much for the tips Brendan! Guess I should get a working language and code generator first before I worry about making it too pretty and fancy :)
User avatar
mathematician
Member
Member
Posts: 437
Joined: Fri Dec 15, 2006 5:26 pm
Location: Church Stretton Uk

Re: Self hosting OS/Compiler from scratch

Post by mathematician »

Brendan wrote:Don't spend too much time optimising the initial compiler's code or implementing code optimisers in the initial compiler (and forget about things like doing register allocation properly). It's far too easy to get distracted and forget that you're planning to throw the initial compiler away.
Yes, except that you will be using the quick and dirty compiler for some years, and in the meantime you do not want to be getting the idea that your OS is unconscionably slow, when the real fault lies with the compiler.
The continuous image of a connected set is connected.
Joshw
Member
Member
Posts: 46
Joined: Wed Mar 05, 2008 4:41 pm
Location: San Francisco, California, USA
Contact:

Re: Self hosting OS/Compiler from scratch

Post by Joshw »

So I could maybe add optimizations as I go :)
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Self hosting OS/Compiler from scratch

Post by Brendan »

Hi,
mathematician wrote:
Brendan wrote:Don't spend too much time optimising the initial compiler's code or implementing code optimisers in the initial compiler (and forget about things like doing register allocation properly). It's far too easy to get distracted and forget that you're planning to throw the initial compiler away.
Yes, except that you will be using the quick and dirty compiler for some years, and in the meantime you do not want to be getting the idea that your OS is unconscionably slow, when the real fault lies with the compiler.
For kernel and device driver work, you're mostly going to need tools that allow at least some sort of assembly language (whether it's linking against object files created from an assembler like NASM or GAS, or inline assembly, or something else). The normal "profile/test to find the slow parts, then optimise the slow parts" approach would still apply; and the slowest parts could be replaced by hand optimised assembly if/when necessary (regardless of the reason for the slowness). Of course optimising assembly by hand shouldn't be much of a problem for anyone that's capable of writing tools to automatically optimise compiler generated assembly... ;)

However, speed wouldn't be the main goal for either the initial/"disposable" compiler or the initial kernel/OS (plenty of time to worry about performance after you've implemented good native tools); and (without any optimisation at all) if you don't end up with something fast enough to be used to develop the native/optimising compiler then you've probably done something very wrong (especially for reasonable hardware).

Also note that I'm not necessarily saying "don't bother doing implementing any optimisation in the initial compiler at all" (I'm only trying to say "don't get distracted"). A small amount of optimisation (e.g. a simple peep-hole pass) might be a good idea, but spending 5 years working on generating the best possible code doesn't make sense when you're only using the compiler for a few years and "good enough for now" is good enough.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Kevin
Member
Member
Posts: 1071
Joined: Sun Feb 01, 2009 6:11 am
Location: Germany
Contact:

Re: Self hosting OS/Compiler from scratch

Post by Kevin »

Brendan will disagree because he considers his OS too different for this to make sense, but I don't think you have to wait for your OS to be running before you can throw away your first compiler.

The only reason why you even need a throw-away compiler in the first place is that you're designing a new language, so the first compiler can't be written in its own language. As soon as your preliminary compiler implements a subset of the language that is powerful enough, you can start writing the real compiler. Not running on your own OS, but e.g. Linux. At this point, you can start working on optimisations etc. without having to throw them away later.

Then you use that compiler to start developing your OS, and as soon as its mature enough, porting your compiler will be easy enough. All you really need to change is a few functions for accessing files and similar things.
Brendan wrote:If you must add fancy features, add features that help humans avoid mistakes and/or make it easier for people and compilers ensure that code is correct. Don't add features that make things more complex, or more ambiguous, or make it harder to ensure code is correct (e.g. templates/generics, operator overloading, etc).
How are generics a featuer that make it harder to ensure correctness? Aren't they generally used where you had an untyped pointer before, so introducing type safety, which is one of the most important measures to ensure correctness?

How would you implement a list, for example?
Developer of tyndur - community OS of Lowlevel (German)
User avatar
bwat
Member
Member
Posts: 359
Joined: Fri Jul 03, 2009 6:21 am

Re: Self hosting OS/Compiler from scratch

Post by bwat »

The document referred to in the following link is a description of real-life language bootstrapping. The language (Erlang) has processes and message passing and it is being used in industrial settings. The document is only 17 pages, so you have no excuses not to give it a quick read. It is quite inspiring actually.

http://citeseerx.ist.psu.edu/viewdoc/su ... .1.34.3972
Every universe of discourse has its logical structure --- S. K. Langer.
User avatar
mathematician
Member
Member
Posts: 437
Joined: Fri Dec 15, 2006 5:26 pm
Location: Church Stretton Uk

Re: Self hosting OS/Compiler from scratch

Post by mathematician »

Kevin wrote:Brendan will disagree because he considers his OS too different for this to make sense, but I don't think you have to wait for your OS to be running before you can throw away your first compiler.

The only reason why you even need a throw-away compiler in the first place is that you're designing a new language, so the first compiler can't be written in its own language. As soon as your preliminary compiler implements a subset of the language that is powerful enough, you can start writing the real compiler. Not running on your own OS, but e.g. Linux. At this point, you can start working on optimisations etc. without having to throw them away later.
Well, when you wrote the initial compiler, in C let us say, you would obviously need some source code in the target language to test the compiler on. Probably the only practicable way of coming up with that source code, beyond a trivial "hello world" program, would be to start writing the compiler in its native language - pretty well as soon as you had started the C version.
The continuous image of a connected set is connected.
User avatar
bwat
Member
Member
Posts: 359
Joined: Fri Jul 03, 2009 6:21 am

Re: Self hosting OS/Compiler from scratch

Post by bwat »

Joshw wrote: I tend not to understand the greek and formulas so well and I just want to figure out how to apply it in my code.
I see from your link, which led to your CV, that you're reading the dragon book. That does have a fair amount of "Greek" in it. However, at the end of the day it's just symbol pushing. It's all logical and you do get it after a while. Maybe you'd like another more tutorial-oriented compiler book which takes you through the construction of a compiler, e.g., the tiger book? After that, the more theoretical treatment in the dragon book might be clearer. Most of us understand theory a bit better after a little practice.
Every universe of discourse has its logical structure --- S. K. Langer.
User avatar
bwat
Member
Member
Posts: 359
Joined: Fri Jul 03, 2009 6:21 am

Re: Self hosting OS/Compiler from scratch

Post by bwat »

mathematician wrote:Well, when you wrote the initial compiler, in C let us say, you would obviously need some source code in the target language to test the compiler on. Probably the only practicable way of coming up with that source code, beyond a trivial "hello world" program, would be to start writing the compiler in its native language - pretty well as soon as you had started the C version.
Auto-compilation isn't a particularly good test for anything other than throughput. Compiler tests are often large collections of many small pieces of code for testing individual fragments of the language, e.g. expressions, declarations, variable binding, etc. Then you have test routines for the run-time library. The principle of compositionality says that if all of these work, then the compiler works.

[edit: I've just counted the test cases in the last compiler I wrote. I came up with 1883 test case names]
Every universe of discourse has its logical structure --- S. K. Langer.
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Self hosting OS/Compiler from scratch

Post by Combuster »

Auto-compilation isn't a particularly good test for anything other than throughput.
Not exactly. If you let the recompiled compiler recompile itself you should get the exact same binary - you can use that identity to easily verify if there exists any progressive issues.

if you only test the smallest components and never the composition thereof, you don't know if the components pieced together perform as expected - which is exactly what autocompilation does as the complement to unit testing - revealing a completely different category of issues that might exist.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
bwat
Member
Member
Posts: 359
Joined: Fri Jul 03, 2009 6:21 am

Re: Self hosting OS/Compiler from scratch

Post by bwat »

Combuster wrote:
Auto-compilation isn't a particularly good test for anything other than throughput.
Not exactly. If you let the recompiled compiler recompile itself you should get the exact same binary - you can use that identity to easily verify if there exists any progressive issues.

if you only test the smallest components and never the composition thereof, you don't know if the components pieced together perform as expected - which is exactly what autocompilation does as the complement to unit testing - revealing a completely different category of issues that might exist.
No. Composition is a fragment of the language and therefore tested. See my initial post.
Every universe of discourse has its logical structure --- S. K. Langer.
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Self hosting OS/Compiler from scratch

Post by Combuster »

bwat wrote:No. Composition is a fragment of the language and therefore tested. See my initial post.
So you test composition, and yet you claim the compiler source itself is a bad example of composition.

Especially when this particular testcase maintains itself, and has sever other properties you can't test with any other testcase.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
Post Reply