OSDev.org

Posted: **Fri Jan 23, 2009 7:39 pm**

Lets make great compiler!!!!11111one

Okay, seriously, I have decided to write my own language and compiler for my OS. Yes, writing a compiler is hard and time consuming. However, I will be using LLVM, which makes writing a compiler much easier with functions like Module::getOrInsertFunction and similar simple constructs.

It would be kind of stupid to waste this opportunity to make something that will be useful for others, not only myself. My question to anyone who wants to put their two cents in: What, if anything, do you find lacking in C/Pascal/etc and would like to have in a low-level language for your OS?

If you are just going to say it is stupid, pointless, won't work etc. don't bother posting please. It was hard enough for me to convince myself to try this.

Posted: **Fri Jan 23, 2009 11:16 pm**

Basic compiler construction isn't that difficult. I've written a couple of compilers and an assembler before without much trouble. They can actually be quite simple if you don't include a lot of optimization techniques. At its easiest, a compiler can accept your own custom language (or an existing language) and convert it to ASM code, which isn't difficult in most cases. I'd suggest you look at some compiler generator toolkits such as ANTLR. I haven't used ANTLR, but it looks very nice. The YACC/Bison stuff makes me ill; instead I used GOLD Parser: http://www.devincook.com/goldparser/

Posted: **Fri Jan 23, 2009 11:50 pm**

I've been thinking of using LLVM (and/or Clang) as the system compiler to port to my OS when it's at the point of running programs. Or even using LLVM in the kernel as a JIT for a low level VM (like Singularity, but totally compatible with raw C, pointers and all).

I'm not sure what I'd want in a systems programming language that C99 can't already provide. There could be more operators for things like bit searching or a way to assert LOCK when using a pointer, but one can just as easily use inline ASM.
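As a rough illustration of the two operations mentioned above, this is how bit searching and a LOCK-style atomic update might look in C without inline ASM. It is only a sketch under assumptions: it needs a GCC- or Clang-compatible compiler (for `__builtin_ctz`) and C11 `<stdatomic.h>`, which postdates this thread. The function names `lowest_set_bit` and `counter_next` are invented for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Index of the lowest set bit, or -1 if no bit is set.
 * __builtin_ctz is a GCC/Clang extension (assumed toolchain); its result
 * is undefined for 0, hence the explicit check. */
static inline int lowest_set_bit(uint32_t value)
{
    return value ? __builtin_ctz(value) : -1;
}

/* Atomically increment a shared counter and return the previous value.
 * On x86 a compiler will usually emit a LOCK-prefixed instruction here. */
static inline uint32_t counter_next(_Atomic uint32_t *counter)
{
    return atomic_fetch_add(counter, 1u);
}
```

Whether intrinsics like these count as "the language providing it" or just as inline ASM with a nicer face is a matter of taste.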

Posted: **Sat Jan 24, 2009 12:06 am**

If C had templates I'd have a 'C'-gasm.

I'm pretty content otherwise.

Posted: **Sat Jan 24, 2009 12:13 am**

Alboin wrote: If C had templates I'd have a 'C'-gasm.

Ooh, that reminds me. Overloading with predictable name "mangling". I guess ultimately, the best of C99 plus C++ without classes.
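One way to picture overloading with predictable names is C11's `_Generic`, which did not exist when this was posted. In the sketch below every "overload" is an ordinary function whose symbol name the programmer chooses, and a macro dispatches on the static type of the first argument. The names `max_int`, `max_double` and `max_of` are hypothetical.

```c
/* Each "overload" has an explicit, predictable symbol name. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

static inline double max_double(double a, double b)
{
    return a > b ? a : b;
}

/* _Generic selects a function from the static type of the first argument;
 * the selection happens entirely at compile time. */
#define max_of(a, b) _Generic((a), \
    int:    max_int,               \
    double: max_double)((a), (b))
```

With this, `max_of(3, 7)` calls `max_int` and `max_of(2.5, 1.0)` calls `max_double`; a type with no entry is a compile-time error rather than a silent conversion.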

Posted: **Sat Jan 24, 2009 12:40 am**

Well, I was drafting this, but since the topic came up:

Hi all!

After several months of saying relatively little, I'd like to tell you about my project, and my progress. Then I'd like to ask your opinion on my latest work.

I guess I should begin by better introducing myself. I've just started my career as a programmer after finishing my CS degree. While I was in school, I figured that I should study the things I'd never bother to learn on my own, which is why I was just two credits shy of a math minor. Thus, I am just getting back to treating programming as an end in itself (rather than a means to an end).

I've wanted to write my own language since I was 16, but I didn't learn how to until a little later, and I lacked the time to do anything (due to the degree and work) until a few months ago. By then I had taught myself enough x86 assembly to implement whatever I wanted (poorly).

Of course, I still have plenty to learn. Once I was satisfied that I knew the basics of what I wanted in a language, I shifted my thoughts to being more domain specific. And I became curious about the requirements for an OS developing language.

So I started learning how to develop an OS. I spent my bus-ride into work (slightly less than two hours per day) reading the Intel and AMD manuals. I wrote a protected mode bootloader, and eventually came here when I wanted to expand beyond that.

Long story short, I wrote a "scaffold" OS that pretty much did the basics. The only component of any interest was the randomizing stuff (I figured that by randomizing certain things the chances of a bug surviving due to dumb luck would be minimized). As an aside, would anyone want me to make a wiki page on the topic of PRNGs?

Anyway, I learned the basics, then set out to write a language. But I've independently re-discovered someone else's work before. This prompted me to learn languages that have been used for OS development in the past (or that have a supposed feature I'd want). I learned Scheme (in lieu of LISP), Erlang, Forth, LLVM, and I've skimmed a few more. I should really relearn D (I learned it back in 2002).

I also tried to expose myself to different techniques, such as Quajects. In retrospect, I probably should have done this step years ago. My hope was that I'd find something spectacular for OS dev that isn't widely known (there aren't many OS devs not using C/C++/asm). After all that searching I didn't come back with anything of note.

I've decided on making yet another C-like language. At the moment, the two are so similar that I could almost accomplish my current goals by using typedefs, macros, and altering the C library. But since my primary interest is in language design, I'd much rather make my own than port/clean existing code (such as pcc).

I've already written a regular expression tokenizer and am currently working on an LALR(1) syntax analyzer (from scratch) in C. Soon I'll be writing the grammar/libraries to bind the four.

But before I go any further, I'd like to get some input.

Question 0: From my experience, I believe that the primary operations of a micro-kernel can be described as "moving data" in contrast to most applications, which can be described as doing data analysis/manipulation. In your experience, is this observation correct? If not, what circumstances negate my observation?

* If a lot of people say that data manipulation frequently occurs, I would consider adding anonymous functions, to allow map and fold. But I could be dissuaded from this.

* At the moment, I'm only going to write an x86, 64-bit version, although I will want to add more architectures later. For now, I'll need to write the long-mode code in assembly.

* By default integers will be unsigned.

* An integer's size will always be relative to the current computer. For example, "int" will refer to a 4-byte number on a 32-bit machine and an 8-byte number on a 64-bit machine. There will also be absolute integers (byte, word, dword, qword) for cases like ASCII strings that are just bytes.
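For comparison, the integer scheme in the bullets above can be approximated in C99/C11 with typedefs, which is roughly the "typedefs and macros" route mentioned earlier. This is a sketch, not the proposed language: `uint_native` is a made-up name, and using `uintptr_t` as the machine-word type is an assumption that holds on common 32-bit and 64-bit targets but is not guaranteed by the C standard.

```c
#include <stdint.h>

/* Fixed-size ("absolute") unsigned integers, named as in the post. */
typedef uint8_t  byte;
typedef uint16_t word;
typedef uint32_t dword;
typedef uint64_t qword;

/* Machine-relative unsigned integer (hypothetical name). uintptr_t can hold
 * a pointer, so it is 4 bytes on typical 32-bit targets and 8 bytes on
 * typical 64-bit targets. */
typedef uintptr_t uint_native;

/* Compile-time check that the absolute types have the advertised sizes. */
_Static_assert(sizeof(byte) == 1 && sizeof(word) == 2 &&
               sizeof(dword) == 4 && sizeof(qword) == 8,
               "absolute integer types must have fixed sizes");
```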

Question 1: Have you ever required an integer larger than the processor's bit length? If so, why and how much larger?

Question 2: Have you ever used floating point in your kernel? If yes, why?

* I've concluded that one distinguishing characteristic of a language is whether or not it keeps meta-data about variables. For an OS development language, I do not think meta-data is appropriate, with one exception: malloc should store how much space a pointer has. I must admit I'm still figuring out how this can be used in conjunction with pointer arithmetic, as well as a few edge cases. My hope is that this will help prevent buffer overflows.
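A minimal sketch of the size-tracking malloc idea, written as a wrapper around the hosted C allocator purely for illustration (a kernel would sit on its own allocator). All names here (`block_header`, `sized_alloc`, `sized_capacity`, `sized_in_bounds`, `sized_free`) are hypothetical. It also shows the open problem mentioned above: the size can only be recovered from the base pointer, so an interior pointer produced by pointer arithmetic loses the information.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored immediately before the block handed to the caller. */
typedef union {
    size_t      size;   /* usable bytes requested by the caller */
    max_align_t align;  /* keeps the payload suitably aligned (C11) */
} block_header;

void *sized_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(block_header))
        return NULL;                      /* header + size would overflow */
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                         /* payload starts after the header */
}

/* Only valid for the exact pointer returned by sized_alloc. */
size_t sized_capacity(const void *base)
{
    return ((const block_header *)base - 1)->size;
}

/* Nonzero if [offset, offset + len) lies inside the allocation. */
int sized_in_bounds(const void *base, size_t offset, size_t len)
{
    size_t cap = sized_capacity(base);
    return offset <= cap && len <= cap - offset;
}

void sized_free(void *base)
{
    if (base != NULL)
        free((block_header *)base - 1);
}
```

A copy routine could call `sized_in_bounds(dst, 0, n)` before writing `n` bytes and refuse the write if the check fails.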

* I want to create a "semi-debug" mode. This mode will turn on checks (deep down in the language) that you wouldn't dream of running in normal mode (mostly due to expense). My current thoughts on the topic are that there would be a "checklist" to avoid irrelevant cases. My hope is that this will aid debugging on real hardware.
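One possible reading of the "checklist" is a bitmask of check categories, so that only the expensive checks relevant to the bug being hunted are switched on. The sketch below does this with a runtime mask in plain C. The names (`enabled_checks`, `failed_checks`, `SEMI_CHECK`, `read_slot`) and the choice of categories are assumptions for illustration, and `fprintf` stands in for whatever logging a kernel actually has.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Categories of expensive checks; the "checklist" is a bitmask of these. */
enum check_kind {
    CHECK_BOUNDS    = 1u << 0,
    CHECK_NULL      = 1u << 1,
    CHECK_ALIGNMENT = 1u << 2
};

static unsigned enabled_checks = 0;  /* checks currently switched on */
static unsigned failed_checks  = 0;  /* number of violations observed */

/* Evaluate cond only if its category is on the checklist. */
#define SEMI_CHECK(kind, cond)                                   \
    do {                                                         \
        if ((enabled_checks & (kind)) && !(cond)) {              \
            failed_checks++;                                     \
            fprintf(stderr, "%s:%d: check failed: %s\n",         \
                    __FILE__, __LINE__, #cond);                  \
        }                                                        \
    } while (0)

/* Example user: a table read that reports violations in semi-debug mode
 * and falls back to 0 instead of reading out of bounds. */
uint32_t read_slot(const uint32_t *table, size_t len, size_t i)
{
    SEMI_CHECK(CHECK_NULL, table != NULL);
    SEMI_CHECK(CHECK_BOUNDS, i < len);
    return (table != NULL && i < len) ? table[i] : 0;
}
```

Setting `enabled_checks = CHECK_BOUNDS` reports only bounds violations; a compile-time mask would remove the disabled checks entirely at the cost of a rebuild.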

Question 3: How would you want language-level errors to be handled? Try/catch? Assert? Conditions? If+code? I have very little preference on this issue.

* I want to have a relatively large built-in library. Not that I want it to have a Java-size library, but I've maintained C code that has multiple implementations of the same generic data structure within a single codebase!

Question 4: My opinion of Object Orientation varies from time to time. At the moment, I think it is overrated. And I don't see much value in it being used to develop a kernel. Have you ever used object orientation in a kernel in such a way that would be non-trivial to implement in a non-OO language?

* Despite the previous question, I am still a fan of operator and function overloading, though I think binding should happen at compile time for an OS language. I must admit that I am afraid this feature could be terribly misused.

Question 5: Assuming and ignoring a Hardware Abstraction Layer, how much assembly have you used? Do you work with multiple architectures? How would you want to organize this?

Question 6: I haven't really talked much about parallel programming, but I do want it to be embedded into the core of my language. What features would you want? Right now I like Cilk as a base.

General Question: What would you change in C?

What I want is a language that is meant to control hardware, with a few different assumptions than C (ones that were made back in the early 70s due to low-powered hardware). This base language will more than likely lead to a more ambitious one as I use it (using a language with the intent of changing it, in my limited experience, is one of the best ways of creating a new language).

Thank you for any comments given. I tend to consider a lot of points of view, hence I will spend a considerable amount of time pondering your input. Any additional suggestions on topics that I have not brought up (or more than likely not thought of) would be appreciated. I plan on resuming work on this project in a couple of days. I'll post something when it is worth showing.

Posted: **Sat Jan 24, 2009 6:32 am**

Why would you want another compiler? God already gave you one, check LoseThos' website

Posted: **Sat Jan 24, 2009 11:54 am**

Design By Contract: I'm in favour of parts of this to a certain degree. But I'd rather see it as a programming style, rather than an ingrained part of the language. What big advantages do you perceive in implementing it directly in a system design language?

Aspect-Oriented Programming: Again, there are parts of this that can be done by the programmer in a C-like language. There are other parts that require run-time modifications to code. And that alone frightens me. Beyond that, I'm a big fan of W^X, and I'm not sure AOP outweighs that.

Pascal function declarations vs. C, I'm really not picky either way, and will probably just go with C.

I share your desire for generics, I think it is a time/code saver. But I'm still not sure about the syntax. What I'm thinking of doing is using object/function overloading, and then getting the compiler to do the rest. Kind of like inlining. But I must admit that I haven't thought this through yet.

With the exceptions of initializing a stack and changing the stack when context-switching, are there times when an OS programmer directly manipulates the stack?

Support for sane inline assembly syntax would be nice. Look at Ian Lance Taylor's emails on the GCC mailing list.

I don't have much time at the moment, but I'll read about this tomorrow and get back to you.

Posted: **Sat Jan 24, 2009 6:15 pm**

I like the ideas, and I'm beginning work. The syntax will be C based, and one of the main reasons I haven't gone with C++ is the lack of an "interface" definition or something similar.

Posted: **Sun Jan 25, 2009 10:00 am**

I'm not sure how I would get a computer to determine preconditions. In my experience, most conditions of an OS are subtle and rooted in manipulating the hardware, which I don't know how to automate well. And since I have doubt, my instincts instruct me to defer to the individual programmer's expertise.

W^X is a policy under which a memory page may be writable or executable, but never both; it is commonly enforced with the NX (Execute Disable) bit. AOP sounds like it would need a page with both write and execute permission, which I loathe permitting. But if it can be done at compile time, then that point doesn't matter.

I don't quite understand your example though. Why wouldn't I just use conditionals (run or compile time)?

I was going to leave those stack issues to the programmer.

One of the things I really like about C is that you can keep the language in your head. Now, I may invalidate that with a larger library, but I'd like to avoid having too many keywords/"magic".

Posted: **Sun Jan 25, 2009 12:48 pm**

If you're going to be using LLVM, the task of writing a compiler is greatly reduced. For example, in my just-in-time compiler for my own language, I only have one source file that parses the source code and converts it into my internal representation, and then another file to convert it into SSA form (which is what LLVM uses). The code generation and optimization parts take up about ten 1000-line source files. So you should be able to write a decent compiler without too much trouble as long as you use LLVM. However, the task becomes far harder if you want to generate your own code.

Posted: **Sun Jan 25, 2009 5:42 pm**

LLVM is one of the most interesting technologies I've seen. That being said, I'm also interested in building tools for a non-POSIX, self-hosted, experimental environment. That means I'm either going to have to spend a chunk of time porting existing tools to a changing environment, or I will choose to go without (even if the cost is additional difficulty).

I plan on writing a higher level language later on that is going to need compiler tools. Otherwise, I won't be able to get crazy features, such as eval, to work. This leads to my belief I will have to do without LLVM.

Berkus: I read Ian Lance Taylor's emails on the subject. I'm starting to see why you mentioned DbC. To get inline assembly and compiler to work together optimally, a system where input and output conditions are specified by the programmer may be the best course of action. I'm going to need to figure out a non-intrusive notation to accomplish this.
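For reference, GCC's existing extended asm already has the programmer state input and output conditions as operand constraints, though the notation is hardly non-intrusive. The sketch below assumes a GCC- or Clang-compatible compiler targeting x86 and falls back to plain C elsewhere; `rotate_left` is a made-up example function.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by count bits (count is taken modulo 32). */
static inline uint32_t rotate_left(uint32_t value, uint8_t count)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Constraints tell the compiler what the instruction needs:
     *   "+r"(value) : read and written, in any general register
     *   "c"(count)  : must be in ECX, because ROL takes its variable
     *                 count in CL
     *   "cc"        : the flags register is modified */
    __asm__("roll %%cl, %0" : "+r"(value) : "c"(count) : "cc");
    return value;
#else
    /* Portable fallback with the same result. */
    count &= 31;
    return (value << count) | (value >> ((32 - count) & 31));
#endif
}
```

Because the compiler knows exactly which registers are read, written and clobbered, it can schedule and allocate around the instruction instead of treating it as an opaque barrier.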

As an aside, I'd really appreciate it if someone could answer my earlier questions. I'm going to keep moving in any event, but a few minutes of your time may save me days, if not weeks, over the coming year.

I took today off to read a book on Pixar that I got for my birthday, but tomorrow I'm going to get back to work on the syntax analyzer (which will be released under the BSD license). One of the things I'm wondering is if anyone has ever used one to construct a shell? It occurred to me that it would be overkill, but it should make the job really simple.

Posted: **Sun Jan 25, 2009 7:56 pm**

I would like an assembly language with functions and classes.

I'm also working on a language similar to Lisp but different in many aspects (and also designed to be parsed easily). Here is an example of a program:

Code:

#include <string.h>

(function int main (int argc) (ptr (ptr (char)) argv)
(
   (var std::string name (std::console::in))
   (std::console::out (std::string::format "Hello %s!" name))
   
   (var itr argc)
   (while (>= ((- var 1) var) 0)
   (
       (std::console::out (std::string::format "Argument %i is %s." (+ (- argc itr) 1) (char[itr])))
   ))
))

Posted: **Sun Jan 25, 2009 10:50 pm**

At first I thought about avoiding infix notation because it is more expensive to parse and harder to code (not to mention precedence issues). But the more I thought about it, the more I realized that infix, with prefix functions, allowed all the same benefits which, in my opinion, outweighed the computational cost.

The best example I can think of is sum(a, b, c, ...). If I only had infix notation, I would have to say a+b+c+... But since both options are available, I can have the best of both worlds.
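In C terms, the prefix form could be sketched as a variadic function. C has no way to discover how many arguments were passed, so this version takes an explicit count as its first parameter, which is an assumption of the sketch rather than part of the sum(a, b, c, ...) notation above.

```c
#include <stdarg.h>

/* Prefix-style sum of `count` int arguments. */
int sum(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);

    return total;
}
```

So `sum(3, a, b, c)` computes the same value as the infix `a + b + c`, and both forms can coexist in one language.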

I used to think about altering an assembler. But I honestly can't think of any improvements that would actually be worth the trouble. But what do you mean by adding functions?

Language for Systems Programming
