Language Design: Sibilance

Schol-R-LEA · Post by **Schol-R-LEA** » Sat Feb 21, 2004 4:17 pm

Since there seems to be a lot of interest in systems programming in languages other than C or C++ in the OS forum I thought I'd start up a thread for one of the three most common ideas: a member of th Lisp family (the others, a Basic derivativeand an object-oriented language similar to SmallTalk, will have separate threads). I have given this putative language the tenative name 'Sibilance'.

To start things off, we need to know what people want in the language, and to consider which of those 'wish list items' is practical for the sort of low-level language we want. Does anyone have anything to start with?

Also, is there a specific dialect (Common Lisp, Scheme, Arc, XLISP, etc.) that you wish to base the design on, or any compromises between dialects? Keep in mind that only a very minimal subset will be possible, and that there will, of necessity, be some considerable changes and additions; In effect, this will be a new dialect in and of itself.

Reference Material:
The ANSI Common Lisp HyperSpec

Revised(5) Report on the Algorithmic Language Scheme

Arc: a New Lisp

Schol-R-LEA · Post by **Schol-R-LEA** » Mon Feb 23, 2004 9:46 pm

Memory management is the key issue, IMAO. Without automatic memory allocation and garbage collection, it is hard to have a Lisp-like language at all; the syntax may be in s-expressions, but it's semantics will probably be quite different.

Despite this, a reasonably Lisp-like language can be created using ordinary stack discipline for local environments, provided certain limitations are followed. First, all cons cell must be allocated in the environment they are used, either implicitly created with every by declaring them explicitly.

Next, each variable on the stack will need to have a tag with it to indicate type; furthermore, it would be useful if all items had the same stack 'footprint'. Since there will likely be a limit on the sizes and types of numbers, this is not a serious problem, though it is an unfortunate one. A more serious problem is how to work with strings under such constraints.

Also, since local constructs truly are local, you will not be able to create list structures and return them functionally unless all of the items in the list (both the the cons cells and the values they point to) are declared ahead of time in the global environment. Finally, some standard functions will have to limited, modified or avoided.

A part of my proposal is to have a set of very-low-level primitive from which the more conventional forms can be constructed as library functions and macros. Here is one prospective approach, where the low-level functions are all prefixed with a dollar sign:

[tt]($at item)[/tt]
Returns the memory address of [tt]item[/tt].

[tt]($ref addr)[/tt]
Returns the integer value of memory address [tt]addr[/tt].

[tt]($save addr item)[/tt]
Saves the value of [tt]item[/tt] to memory address [tt]addr[/tt].

[tt]($lambda args body)[/tt] or [tt]($fn args body)[/tt]
Creates a basic function, with the arguments stored in a stack frame.

[tt]($let declarations body)[/tt]
Creates an environment stack frame for [tt]declarations[/tt], and a block of code from [tt]body[/tt].

[tt]($set-car! cell new-car)[/tt]
Takes an existing cons cell [tt]cell[/tt] and changes the first pointer of the pair to the address of [tt]new-car[/tt].

[tt]($set-cdr! cell new-cdr)[/tt]
Takes an existing cons cell [tt]cell[/tt] and changes the second pointer of the pair to the address of [tt]new-cdr[/tt].

More on this later. Alternative formulations would be appreciated.

mystran · Post by **mystran** » Sun Feb 29, 2004 9:57 am

What I'm doing currently, is that I have a compiler under development for my own dialect (similar to Scheme) which features a special form called "inline".

The compiler works by first transforming the Lisp code into intermediate code which is quite similar to what you described. The idea of "inline" special form is that you can pass hand-crafted intermediate code similar to inline-assembly "through" the first translation process.

Then it "linearizes" it by generating continuations which explicitly bind temporaries to symbols. You end up with code that only has primitive functions, which take named variables as arguments, and continuations, which take the result of a single primitive function and bind it to a variable.

This is done explicitly, because it allows the next-transformation to transform the code into continuation passing style, where every primitive takes a continuation to call with the result of it's evaluation as an argument. At this point we never return, but instead explicitly call a "return-continuation". Sounds strange but is simple enough.

After this phase the code is feed to an optimizer, which can do things like eliminate duplicate variables, transform lambda (which explicit environment argument) into simple continuation (when possible) and whatever.

Finally, the code is "simplified" by transforming things like $apply into more primitive ops, until it's basicly assembly with a strange continuation-passing style syntax.

Got this compilation strategy really from Realistic Compilation by Program Transformation and it seems really nice. It's fairly easy to write (although my version is not quite as pure as the one described in the paper) and the structure is very simple, yet allows one to easily make modifications like adding new optimizations.

The "inline" special form (which is at this point remains unproven since the compiler is not ready yet) is my own addition, and my plan is to use it for implementing stuff like garbage collection with the language itself. You can still benefit from all the above translations and optimizations, and inline return a value just like everything else (until it's continualized ofcourse) so you are not limited in where you can put the intermediate code.

Given the presence of "inline", it would also be fairly easy to add frontends for other languages (as long as they use s-expressions) as normal lisp macros. So like usually in Lisp, you could mix several special-purpose languages with generic Lisp, but this time those special-purpose languages need not be limited by what the Lisp happens to provide, as they can access backend compiler directly.

I'll report (with code and all) later when I get this stuff bootstrapped.

Schol-R-LEA · Post by **Schol-R-LEA** » Tue Mar 30, 2004 12:41 pm

I kept meaning to get back to this but didn't get around to it until now. Also, what I have to say is mostly issues yuo are already aware of, Mystrans, and I was hesitant to go repeating the obvious. Still, I thought I bring the issues up before the thread vanished completely out of sight

Anyway, while I like the approach described in the paper you mention - in fact, it very neatly matches certain design ideas I've been considering, and I thank you for pointing it out to me - it still doesn't really address the problem issues of memory management and compile-time type determination. You still would have to change the way memory is handled (or not handled, as the case may be). For example, while this design does use stack frames for the local environments, AFAICT the individual variables are still represented by pointers to the actual objects in heap memory.

Mind you, there is a Scheme dialect already designed for systems work, called Pre-Scheme; it was designed for implementing the Scheme48 interpreter. If you haven't already, you might want to look at the paper which describes the dialect. It discusses at length the issues involved in modifying Scheme for use in low-level environments. I think it is interesting that both papers were primarily the work of Rich Kelsey, and that the Pre-Scheme paper came out almost a decade later; indeed, the PreScheme paper even references the earlier work. I suspect that the actual PreScheme compiler uses at least some of the techniques in question. Interestingly enough, the PreScheme compiler produces C code, as opposed to assembly, presumably for the sake of machine independence.

I say that we should take what we can from both of these papers, and apply what we can from each of them. You seem to be quite well along in doing just that, from what you've said. I'd be very interested in hearing how you've progressed with this.

mystran · Post by **mystran** » Thu Apr 01, 2004 2:06 pm

Indeed, I do know Pre-Scheme, and there's even s-expressions assembly language called COMFY-65 which is basicly structured assembly directly, and which I don't remember an url for.. (could search but too lazy).

In any case, I actually have to admit that I'm slightly on side-tracks currently. I'm working for "fool"-interpreter written in C, which I'm planning to pack into a library so you could embed it into any C-program easily. Notable features would be not including anything not necessarity use it in the first place (so files and other os-stuff comes as libraries..)

There are two reasons: firstly, I want a scripting-lisp to embed, and a few friends have even asked for a small one with customizable set of primitives with very small core (even librep has too much) and working with the original interpreter/compiler/whatever was tedious because it tooks days to compile simple stuff, thanks to slow interpreter running a slow compiler compiling everything to intermediate to be run with the same slow interpreter running on top mzscheme, which is interpreter, so I needed something faster anyway to work with.

Fool so far has about 1/3rd of the core primitives, (basicly pairs and integers work), handles tail-recursion, first-class continuations and functions, and lexical scoping (except set/def currently always use the global environment even when trying to set a local variable, would need to fix that next).

To add to the mix, I'm also writing a simple bug-tracking tool with PHP, since all open source bug-tracking tools are so BAD. So I can't promise the initial release of fool yet.. the cvs is accessible though, so send me mail if you are interested (i'll not give the address here, sorry).

In any case, once I'm done with Wastebug (the bug-tracker) to the point that it is usable, I'll finish fool to the point it's meaningful to release it, and once that's done I'll resume some lisp-coding and hopefully get back to compilation research too.

A final point I wanted to make, pay attention to what Paul Wadler has to say. He uses Haskell, which I don't like that much, but the monad stuff is quite interesting, and once you get it it's quite simple too, and quite usable for lisp as well.

mystran · Post by **mystran** » Tue May 04, 2004 8:45 pm

Ok, boosting this thread once more, and sorry for writing so scientifically, without explaining what stuff means.

A report: ?L has a new compiler, which doesn't have a code-generator yet, but performs the basic "required set" of optimizations, including evaluation of constant expressions, constant propagation, and limited form of partial-evaluation: ?-reductions are done to arguments that are symbols, conditional expressions (this is necessary for proper constant-propagation and boolean short-circuiting) or lambda-expressions only used once.

The basic idea is still converting stuff to CPS first, then doing source-to-source transformations, although I dropped the idea of separate cont/proc continuations (might re-introduce them in additional pass though) and had to try about 20 methods and read about 10 selected papers well before I got it to really work the way I wanted. There's still an issue with compiling local recursion into loops, because I want it to work with several mutually recursive local procedures too, but I have an idea how to fix it too.

The language is properly tail-recursive, as long as the code-gen keeps that property. If code-gen permits, the language also has first-class continuations.

My current plan is to allocate everything from the heap, and use a generational stop-and-copy collector, since that makes closure-analysis basicly a non-issue. It should be possible to out-perform any manual allocation system, and call/cc is free for everyone to use as they see fit without any performance penalty what-so-ever.

The core ?L is currently defined as:

(quote form)
- meanins is same as in almost every existing Lisp dialect
(lambda args . body)
- meaning is same as in Scheme.
(set var value)
- works like set! in Scheme, except returns the value.
(if pred then else)
- again like in Scheme (else is optional), but '() is considered false and everything else true, like is common in other Lisp's. If else is missing, '() is assumed.
(def var value)
- pretty much like a Scheme (define var val). Internally transformed into a form equivalent to a letrec by the compiler.
(begin . expr)
- again like Scheme.

Everything else is really just macros, which work like defmacro in common-lisp (except argument list is Scheme-style), and primitives, which are named with shorter policy than in Scheme.

The major difference with core Scheme is the notion of booleans, and the use of unhygienic macros.

At this point, I'd claim that while it's probably not realistic to write the low-level memory manager, all runtime, or the kernel-stubs for interrupts in ?L, I'd claim that pretty much everything else, including things like thread-scheduling could be done just fine. GC pauses (on major collections) are one question, but different heaps for different modules could make all heaps small-enough to be collectable fast. Futher, one could make some specific parts in C/asm to handle large structures that live long, but need to be teared down fast.

mystran · Post by **mystran** » Tue May 04, 2004 10:24 pm

Oh, I'd like to add that after thinking a bit, I'm considering writing an even more simple compiler, with essentially the same set of features, but without CPS conversion, and doing all the optimization stuff with a macro system. What I'm thinking of could be called an "optimizing interpreter". The idea then is that since everything fed to the interpreter is already optimized, compilation can be done in a VERY straight-forward way (without any special optimizations), and still produce great code.

Also, I'd like to try some stuff like lambda-lifting and such, but we'll see. Anyway, if everything fed to the compiler is already optimized, injecting some inline assembly there will become much more nice.

Schol-R-LEA · Post by **Schol-R-LEA** » Mon Aug 02, 2004 1:15 pm

Ihad an odd brainwave last night, which could well have been from lack of sleep getting to me, but...

Many of the current compilers for Lisp-like languages - Scheme in particular - use standard C as their target language, rather than assembly language, for the sake of portability. All well and good; it makes the compilers very portable, and their is a lot material now on efficiently compiling into C using continuation-passing style.

However, one of the things I've considered using in my OS designs is a system similar to the 'slim binaries' idea in Oberon: rather than compile to either full native code, or to a p-code, these versions of Oberon effectively stops the compilation at the stage of generating an abstract parse tree, which is the saved as a portable program file and used to generate native code at run time. While the specific design used in Oberon is too specific to that language, the idea is essentially sound; it has the portability and compactness of a p-code, while retaining the original program structure for efficient optimization.

However, it was pointed out to me that this would be meaningless in a Lisp - Lisp programs effectively are parse trees!

What has occurred to me is that compiling to C first involves an abstraction inversion - you are transforming the equivalent of a parse tree into code, then back again into a parse tree (during the C compilation, as an intermediate step). Why? In principle, the C code is no more portable than the Lisp code is; less so, in fact. It's just that C has been ported to virtually every system in the universe, and it is easier to piggyback on C than it is to reinvent the wheel. Still, the whole thing stikes me as somewhat peculiar.

Part of what struck me about this is that, assuming that one could view the abstract parse tree created by a C compiler (assuming it generates one - not all compilers do, though most produce something equivalent, such as an intermediate bytecode), it could be represented as a kind of Lisp variant; which means that there must be a way to have a lisp-like structure that can work with low-level operations as C does. It may not be a form that is convenient to program in, but if the logic of it holds, it must at least exist.

Am I making any sense at all, and if I am, does this have any useful implications beyond those already inherent in Pre-Scheme? I think that the idea of a system where C compiles into a form of Lisp is a hilarious idea, but that isn't really the point... what I'm trying to determine is, could this insight be used in the design of a low-level Lisp family language? Any thoughts or rebuttals?

OSDev.org

Language Design: Sibilance

Language Design: Sibilance

Re:Language Design: Sibilance

Re:Language Design: Sibilance

Re:Language Design: Sibilance

Re:Language Design: Sibilance

Re:Language Design: Sibilance

Re:Language Design: Sibilance

Re:Language Design: Sibilance