A (prototype) functional language for system programming
Posted: Tue Sep 22, 2009 7:27 pm
Over the last few weeks, I've been experimenting with different programming language paradigms. I've always really liked functional languages, because they can often express ideas so much more simply and clearly than imperative ones. I also like many of the features of Lisp, like code usable as data and vice versa, effective integration of new features and macros, and easy recursion over linked lists.
It would be great to have the expressive power of a functional language on the OS level, and I know there are hobby OSes written in Scheme, Haskell, etc. However, most functional languages are not great for osdev: they usually require garbage collection, bignum libraries, and complex runtimes; they also lack pointer arithmetic.
So I decided to design a language that both has full functional features and can be run easily on the bare hardware. I also added many features useful for both kernel and user level programming, such as iteration over variable sized arrays, less error prone manual memory allocation, and a subset of OO that I'll explain later. What resulted is probably best described as a semi-statically typed Lisp dialect with a heavy C influence - I'm currently calling it SLIP: SLIP has Lambdas, Iterators, and Pointers.
The type system is not complex on the surface, much like C's: (the comments are Lisp style)
However, every variable is actually an implicit pointer, and also has an implicit size - this is important to SLIP's iterators, and the extra stack space it requires is reasonable due to the low number of variables that should be used. An int type is actually an array of any number of integers (which are all 8 bit). That means you can do this:
You can compose types as constant-sized arrays of other types, which is why the integer type is only 8 bit. Functions can operate on int[4]s as atomic types, as well as int[42]s or whatever else is desired, as long as the arithmetic operators can handle it. The compiler will use special operators for above-native-sized integers, so pretty much anything will work. This differs from a bignum in the fact that it can overflow, doesn't dynamically allocate (and therefore can be on the stack), and should be at least somewhat faster. Composed integers can be implicitly casted to each other. It doesn't work well or at all for non-integers though.
One useful thing this allows is manipulation of all binary structures (e.g. GDT entries, bitmaps) as coherent integer types.
Pointers and arrays will be discussed later.
SLIP, like I claimed, has a very Lispy syntax, as well as anonymous functions:
The main difference is the (imo) better lambda syntax and static typing, which is barely noticeable. In fact, SLIP can infer types, at least for return values, because there is one return point.
SLIP is also able to work imperatively. The function seq is similar to Lisp's begin, and allows procedures. Variables are usually constant, but if the type is suffixed with an "!", it makes the variable actually variable:
SLIP also has iterators. These allow fast and flexible manipulation of arrays, while maintaining purity. The "seq" and "par" functions call a given function for every index of an array, sequentially and in parallel, respectively. Because each array also has its size stored with it, iterators become very flexible, and allow functions to manipulate aribtrary amounts of data. Note that seq is actually doing the same thing here as in the imperative style example - it runs each of the things in the list in front of it sequentially, whether they be ints or functions.
Of course, SLIP also has pointer arithmetic. Although all variables are actually pointers, the "=" operator (when used infix; = as prefix is equality) copies data instead of setting pointer positions. Instead, the "@" operator sets pointers:
Arrays/pointers have slices, just like in Python, which can be used for manipulating both pointers and the size of arrays.
Return values can be compound data types and arrays, provided their size is fixed:
This requires no dynamic memory allocation - the space for the return array is stored in the stack frame of the caller, and is written to by the callee. Just like the = operator copies any amount of data within a function, it also copies any amount of data from other functions. The EBP register is used for this purpose in the calling convention.
SLIP also has structures similar to those in C, but with an important and very useful modification. The two keywords "mod" and "acc" (modifier and accessor) are used for a feature I call "variable masquerading" (from IP masquerading and its similar concept) - a way to invisibly implant a function into a variable access. They define functions on top of structure elements, and act like "gatekeepers", modifying or even redirecting memory accesses, just like accessors and modifiers sometimes do in Java or other OO languages. However, these functions are not necessary, and will not cause wasted cycles like with implicit modifiers and accessors. Here's an example:
This feature allows the kind of changes to structure/object interfaces previously only possible with tedious generic modifiers and accessors factored into class design. The inside of a SLIP structure can be completely and invisibly refactored. There can also be different modifiers and accessors used by different files of code, by taking advantage of a header system.
I'd really like to know what people think of this prototype design. I think it would be a useful language in a lot of different situations, and a very expressive one. I'm not going to be rewriting my kernel in it, but I've designed it as if I were, and I may use it or a modification of it for other parts of my system. Its REPL may also be useful as a system shell, if some syntactic sugar were added to remove the outside parentheses, and a pipe system added. Any questions or comments are welcome; I haven't put too much actual work into this, so I won't be sad if it isn't implemented.
Also, if it wasn't clear, this is intended to be a compiled language with a Lisp-like REPL.
It would be great to have the expressive power of a functional language on the OS level, and I know there are hobby OSes written in Scheme, Haskell, etc. However, most functional languages are not great for osdev: they usually require garbage collection, bignum libraries, and complex runtimes; they also lack pointer arithmetic.
So I decided to design a language that both has full functional features and can be run easily on the bare hardware. I also added many features useful for both kernel and user level programming, such as iteration over variable sized arrays, less error prone manual memory allocation, and a subset of OO that I'll explain later. What resulted is probably best described as a semi-statically typed Lisp dialect with a heavy C influence - I'm currently calling it SLIP: SLIP has Lambdas, Iterators, and Pointers.
The type system is not complex on the surface, much like C's: (the comments are Lisp style)
Code: Select all
(def int:x = 0) ;; defines an integer x as type int
(def x = int:0) ;; same as before
(def (int x) = 0) ;; also the same; ":" constructs a list like in Haskell, and so does ()
(def x = (int 0)) ;; also also the same, for the same reason
Code: Select all
(def x = int:(1 2 3 4)) ;; similar to the C code "int *x = {1, 2, 3, 4};"
(size x) ;; equal to 4
Code: Select all
(type int[2] int16) ;; defines the type "int16" composed of two ints
(def int16:x = (0x1234 0x4321 0x1234))
(~ x) ;; equal to (0xEDCB 0xBCDE 0xEDCB) and 0xEDCBBDCEEDCB
Pointers and arrays will be discussed later.
SLIP, like I claimed, has a very Lispy syntax, as well as anonymous functions:
Code: Select all
;; Common Lisp:
(defun fact (n)
(* n (fact (- n 1)))
(defun apply (n f a)
(if (= n 0) a
(f (apply (- n 1) f a))))
(apply 3 (lambda (n) (* n 2)) 2) ;; should be 2^4 = 16
;; SLIP:
(fun fact (int:n)
(* n (fact (- n 1))))
(fun apply (int:n (int):f int:a)
(if (= n 0) a
(f (apply (- n 1) f a))))
(apply 3 (int:n):(* n 2) 2) ;; should be 2^4 = 16
SLIP is also able to work imperatively. The function seq is similar to Lisp's begin, and allows procedures. Variables are usually constant, but if the type is suffixed with an "!", it makes the variable actually variable:
Code: Select all
;; wrong
(fun int:main ()
(seq
(def int:x = 0)
(x = 5)))
;; right
(fun int:main ()
(seq
(def int!:x = 0)
(x = 5)))
Code: Select all
(fun inc_all (int:a)
(seq a (i):(+ a[i] 1)))
(inc_all int:(1 2 3 4)) ;; returns (2 3 4 5)
(inc_all int:(5 4 3 2 1)) ;; returns (6 5 4 3 2)
Code: Select all
(def hello = "hello, world!")
(def hello_ptr @ hello)
(def hello_ptr2 @ hello[2])
(def fixed_addr @ ptr:0xF8000000) ;; points to 0xF8000000, size is undefined
(print hello) ;; prints "hello, world!"
(print hello_ptr) ;; prints "hello, world!"
(print hello_ptr2) ;; prints "llo, world!"
Code: Select all
(def hello = "hello, world!")
(def hello2 @ hello[:5]) ;; contains "hello"
(def hello3 @ hello[7:]) ;; contains "world!"
Code: Select all
(fun char[5]:cake? ()
(just "a lie"))
(def x = cake?)
(print "the cake is %s\n" x) ;; prints "the cake is a lie"
SLIP also has structures similar to those in C, but with an important and very useful modification. The two keywords "mod" and "acc" (modifier and accessor) are used for a feature I call "variable masquerading" (from IP masquerading and its similar concept) - a way to invisibly implant a function into a variable access. They define functions on top of structure elements, and act like "gatekeepers", modifying or even redirecting memory accesses, just like accessors and modifiers sometimes do in Java or other OO languages. However, these functions are not necessary, and will not cause wasted cycles like with implicit modifiers and accessors. Here's an example:
Code: Select all
;; normal structure
(struct foo
(int a)
(int b))
(def x = foo!:(1 2)) ;; contructs a "foo" structure
(x.a = 77)
(print x.a) ;; prints "77"
;; partially masqueraded structure
(struct bar
(int a)
(int b))
(mod bar:a (n) (* n 2)) ;; all modifications to a in bar are multiplied by 2
(def y = bar!:(1 2)) ;; contructs a "bar" structure
(y.a = 77) ;; y.a actually set to 154
(print y.a) ;; prints "154"
;; more masqueraded structure
(struct baz
(pseudo int a) ;; the element a doesn't actually exist in memory, it is only masqueraded
(int! b))
(mod baz:a (n) (b = n)) ;; all modifications are redirected to b
(acc baz:a (n) (42)) ;; all accesses to a return 42
(def baz!:z)
(z.a = 66) ;; sets z.b to 66
(print z.a) ;; prints "42"
I'd really like to know what people think of this prototype design. I think it would be a useful language in a lot of different situations, and a very expressive one. I'm not going to be rewriting my kernel in it, but I've designed it as if I were, and I may use it or a modification of it for other parts of my system. Its REPL may also be useful as a system shell, if some syntactic sugar were added to remove the outside parentheses, and a pipe system added. Any questions or comments are welcome; I haven't put too much actual work into this, so I won't be sad if it isn't implemented.
Also, if it wasn't clear, this is intended to be a compiled language with a Lisp-like REPL.