Page 1 of 2

Sphinx C-- and 32bit OS

Posted: Tue Oct 05, 2004 11:22 am
by Wishmaster
Hi, i have heard that someone has written a 32bit OS in C--. But i dont know who ;) . I wanted to ask, wheter he can send me the source, because i try to create a 32bit OS in C-- :)
I read this at OSDEV.org.

Bye.

Re:Sphinx C-- and 32bit OS

Posted: Tue Oct 05, 2004 11:29 am
by aladdin
I think there is many 32b kernels in C++
try with google

Re:Sphinx C-- and 32bit OS

Posted: Tue Oct 05, 2004 11:43 am
by Dreamsmith
aladdin wrote:I think there is many 32b kernels in C++
Yes, but the question was about C--, not C++.

http://www.cminusminus.org/

Re:Sphinx C-- and 32bit OS

Posted: Tue Oct 05, 2004 1:30 pm
by Candy
Wishmaster wrote: Hi, i have heard that someone has written a 32bit OS in C--. But i dont know who ;) . I wanted to ask, wheter he can send me the source, because i try to create a 32bit OS in C-- :)
I read this at OSDEV.org.

Bye.
Just a thing I noticed, c-- seems to have a special explicit tail-call mechanism. Why is this?

Re:Sphinx C-- and 32bit OS

Posted: Tue Oct 05, 2004 2:01 pm
by df
iirc mmurtl was written in c--

certainly they ported and there is source for the c-- compiler in the mmurtl directory etc

Re:Sphinx C-- and 32bit OS

Posted: Tue Oct 05, 2004 3:23 pm
by Dreamsmith
Candy wrote:Just a thing I noticed, c-- seems to have a special explicit tail-call mechanism. Why is this?
To make the compiler's job easier? Fear that the optimizer might miss it otherwise?

Re:Sphinx C-- and 32bit OS

Posted: Tue Oct 05, 2004 3:34 pm
by Dreamsmith
Dreamsmith wrote:Yes, but the question was about C--, not C++.
Oops -- even more specifically, it was about Sphinx C--, which, as it turns out, is not the same as cminusminus.org's C--. Ignore my last link, try this instead:

http://www.goosee.com/cmm/

Re:Sphinx C-- and 32bit OS

Posted: Tue Oct 05, 2004 6:31 pm
by mystran
Tail-call eliminination (including the degenerate case called proper tail-recursion) is very nice for certain types of programming. The basic idea is the last thing a procedure does is call another procedure, there is no need for the stack frame of the caller to be saved. Tail-recursion is simply the case where the tail-call is made to the same procedure as the caller.

For example the programming language Scheme, which requires full tail-call elimination, does not need (nor have) any looping constructs, as you can use tail-recursion instead. In C the infinite recursion

Code: Select all

int foo() {
    return foo();
}
would eventually overflow the stack, while in a properly tail-recursive implementation it is does not: it is semantically identical to infinite loop. Indeed, this is how you write loops in Scheme.

If we support full tail-call elimination (and not just tail-recursion) we can futher program in a style where the function call in tail-position works as a "goto". The most obvious use for such a style is writing a finite-state-machine, which in Scheme (for example) can simply be written as a set of procedures each representing one state, and the state transitions are simply function calls. In a language like C, you'd either have to make the state's data, or use some kind of "trampoline" to make sure the stack is cleaned between the procedures.

There are a lot of other stuff which tail-call elimination simplifies.

Now, the reason why there's support in C--, is because not only Scheme, but also many other modern functional languages require tail-call elimination. And C-- targets the audience which wants to compile into a low-level language like C, but are not happy with C's limitations: including implementors of Scheme and other fuctional languages. After all, some of the ugliest parts of compiling stuff (register allocation, instruction scheduling, ...) are exactly the same no matter what language you compile, so having an intermediate language with a compiler that solves this problem is nice, and this is what C-- tries to be.

Re:Sphinx C-- and 32bit OS

Posted: Wed Oct 06, 2004 12:14 am
by Solar
I'm a bit confused here. I would expect any decent C/C++ compiler would be capable of optimizing away tail-recursion at a moderate optimizer setting?

Re:Sphinx C-- and 32bit OS

Posted: Wed Oct 06, 2004 12:41 am
by Dreamsmith
Solar wrote:I'm a bit confused here. I would expect any decent C/C++ compiler would be capable of optimizing away tail-recursion at a moderate optimizer setting?
Indeed, one would hope so. I assume the thrust of Candy's question was on the word explicit, as the benefits of tail call optimization are fairly well known. Why must it be explicitly requested in C--, rather than simply handled by the optimizer as in higher level languages?

I assume the answer is because C-- is meant to be a lower level language (really a compiler back-end), and thus expects the work of deciding when this optimization should be performed to be done at the higher level, i.e., it needs to be told explicitly when to perform this optimization because it's not its job to decide this, its job is to merely do what the higher level told it to.

Think of it this way, C-- is really meant to simply be a portable assembly language. You don't expect your assembler to perform those sorts of optimizations automatically.

Re:Sphinx C-- and 32bit OS

Posted: Wed Oct 06, 2004 8:01 am
by mystran
First, let's make it clear: tail-recursion is only a limited form of proper tail-calls. Optimizing tail-recursion (sometimes called local tail-recursion) away is rather trivial: simply set new values for local variables and jump to the code after function prolog.

Proper tail-calls in general though (sometimes called general tail-recursion) are much more complicated beast. When the compiler implements proper tail-calls, you can call any function without leaving a trace of yourself on the stack. In fact, to implement higher-order functions (like the Scheme lambda-expressions) you need to be able to perform calls to procedures you don't even know at compile time. Nontheless, these should be optimized too (for example Scheme requires this too).

There are several troubles with such a system though. First of all, since arguments passed to a function consume stack space, these should be popped before performing a tail-call. Normal C calling convention can't be used, because C uses "caller-pops", and the original caller (the one we eventually return to) can't even know how much stuff there's in the stack.

So at least we'd have to change calling convention to "callee-pops" instead. This introduces another problem though: to support variable number of arguments you also need to pass the number of arguments, so the callee knows how many arguments it should pop. Finally, we'd also like to change the calling convention to pass return address below the variables to avoid unnecessary suffling, but now we can't use the normal "CALL" instruction anymore. (Architectures with a return address register don't really have this problem as such.)

Finally, since you are writing arguments for the procedure you are calling into more or less the same space your own arguments reside in, you need to do something shuffling of the stack (or registers on some architectures). This can add some overhead extra overhead.

Ofcourse the alternative is to allocate space for the parameters all in stack and use a garbage collector to get rid of them, but that's not part of C--'s scope.

In short: supporting local tail-recursion is a simple optimization, yes. Supporting fully general tail-calls is much more complicated, and doesn't play well with C calling convention. I've not familiar enough with C-- to say about it specifically, but I assume it's because (1) they need non-tail-calls for C calling convention anyway, and (2) tail-calls can add a bit overhead, which some people might not want to pay.

Re:Sphinx C-- and 32bit OS

Posted: Wed Oct 06, 2004 1:47 pm
by Schol-R-LEA
That having been said, I know gcc optimizes tail recursion, and apparently v 3.2.1 and later can optimize tail calls between functions with the same stack frame footprint (so-called 'sibling' functions); see here for a brief discussion on this, or this thesis paper on the subject for more detail. So, while it can't do full TCO, it does about as much as any C compiler can on the x86 architecture.

Oh, if anyone out there is perplexed by all of this, you might want to read this nice, informal explanation of tail call optimization and why we're making such a big thing of it.

PS: Mystran, if you haven't already done so, you might want to take a look at the thesis paper, especially the part about Register Transfer Language. Remember my mini-rant about how it would make more sense to compile C into a Lisp-like AST language rather than compiling Lisp into C? Well, if I'm reading this right, that is what gcc actually does!

Well, sort of; the s-expression form apparently only exists for debugging the compilers. It makes sense, though; gcc supports compilers for several languages and code generators for several different architectures, so the intermediate representation would have to be abstracted away from both the source and the final form. Since Stallman did a lot of work on LispMs in the days of yore, I suppose it was only natural that he create an sexp form for the AST. I haven't read through the whole paper, but I mean to.

It occurs to me that if there were options to emit RTL code (or a binary representation thereof), and accept it as input later, you'd effective have a portable intermediate language similar to the 'slim binaries' used in Oberon: a partially compiled, (mostly) portable, language-independent representation which could be used with a JIT final compiler, which could furthermore do both safety checking and localized optimization. Put that in Java's pipe and smoke it!

I'm sure someone else has thought of it before though - it's a pretty old idea - which means there are probably reasons why it would be practical.

Re:Sphinx C-- and 32bit OS

Posted: Wed Oct 06, 2004 10:38 pm
by mystran
Schol-R-LEA wrote: PS: Mystran, if you haven't already done so, you might want to take a look at the thesis paper, especially the part about Register Transfer Language. Remember my mini-rant about how it would make more sense to compile C into a Lisp-like AST language rather than compiling Lisp into C? Well, if I'm reading this right, that is what gcc actually does!
I'll take a look. I can't quite recall that rant, but I agree that Lisp is nicer intermediate form. I have a Lisp compiler on the works, which works by source-to-source transformations, mostly evil amounts of lambda's. Nothing really new though, quite similar to Orbit, Twobit and such.. Infact, my intermediate language isn't even quite Lisp: it's more or less just call-by-value lambda-calculus with primitive functions.
It occurs to me that if there were options to emit RTL code (or a binary representation thereof), and accept it as input later, you'd effective have a portable intermediate language similar to the 'slim binaries' used in Oberon: a partially compiled, (mostly) portable, language-independent representation which could be used with a JIT final compiler, which could furthermore do both safety checking and localized optimization. Put that in Java's pipe and smoke it!
I don't know. What I currently do is compile stuff into lambda-nodes of form: (lambda args call) where args is Scheme like argument list and call is of form (a b c ... d) where all the members are either symbols or lambda-expressions.

Symbol's can refer into lexical variables, constants, or global varialbes. All lexicals and constants are replaced with gensyms at this point so there's no problem with shadowing or anything.

Everything else in the language is modelled with primitives. Continuations are explicit, so no lambda ever returns: everything is a tail-call at this point. For example, I actually compile (if p t e) by compiling p normally, and giving it a continuation:

(lambda (p1) (:primitive:test
(lambda (b-1)
(b-1
(lambda () "compiled t")
(lambda () "compiled e")))
p1))

This is mostly irrelevant though, but at least it keeps the intermediate more regular.

At this point there are several things that can be done. Converting lexical variables into (env_index . var_index) is one thing. The resulting stuff is easy to interpret with an interpreter which simply works the lambda's call-part, and either calls a primitive or loops for another lambda. No tail-call problems, since we only ever loop.

On the other hand, if one inlines certain primitives, and makes other stuff like environments explicit, one get's a full compiler.

The CPS conversion done is the trivial one, but there's some beta-reduction heuristics to simplify the resulting form: namely, symbols as arguments to known procedures (say, other lambda nodes) can always be inlined, lambda-nodes in the same position can be moved inside the known procedure if they are referred to only once (although one might want to "logically" raise them again when generating code to avoid deep environment indexes).

Boolean short-circuiting and constant propagation aren't much harder: just (partially) evaluate some of the primitive calls.

What I plan to add is "single-assignment-elimination" like that in Twobit: convert single assignments of procedures in the beginning of procedures into a letrec-form (or defines), and convert singles assignments of regular variables in the beginning of procedure into let-forms.

Beyond the initial CPS conversion the code doesn't care at all what language it's compiling. Ofcourse code-gen has to know the calling convention and primitives depend on the language, but other than that I don't see a reasons you couldn't compile anything like this.
I'm sure someone else has thought of it before though - it's a pretty old idea - which means there are probably reasons why it would be practical.
The particular method I'm using is described in at least:
"Lambda, the ultimate label or a simple optimizing compiler for Scheme" (available at least from ACM, I remember there's another version somewhere else)
"ORBIT: an optimizing compiler for scheme" (ACM at least)
Twobit: http://www.ccs.neu.edu/home/will/Twobit/
"Realistic compilation by program transformation" (several versions exist)

Sorry for not having time right now to get links for the papers. I am sure there are others.

Re:Sphinx C-- and 32bit OS

Posted: Fri Oct 08, 2004 1:31 am
by Wishmaster
I mean Sphinx C-- c--sphinx.narod.ru/indexe.htm .

I remember, i mean Addek has written an OS with C-- ::)

Could you give me the source, please ?

Bye.

Re:Sphinx C-- and 32bit OS

Posted: Fri Oct 08, 2004 1:33 am
by Wishmaster
Oops -- even more specifically, it was about Sphinx C--, which, as it turns out, is not the same as cminusminus.org's C--. Ignore my last link, try this instead:

http://www.goosee.com/cmm/
ups, i didn't read this ;)