
System programming subsets of Lisp, Smalltalk, Python et al.

Posted: Sat Jan 03, 2004 5:37 pm
by Schol-R-LEA
I have been giving quite a bit of thought to the question of how a very-high-level language such as Scheme or Smalltalk could be used effectively for OS development. Assuming one didn't want to use C or assembly to implement an interpreter-kernel (the classic approach used in Smalltalk-80, the UCSD p-System, and JavaOS), the alternative is to develop a compiled subset of the language that avoids the high-level constructs such as garbage collection, dynamic message-passing, or continuations. Other questions remain, however:
  • Is it feasible to do this?
  • Would doing so lose the desired qualities of the language?
  • Could a FORTH-style threaded interpreter be used in place of a compiler?
  • Would there be efficiency issues that would make this unworkable?
  • Could the subset be dynamically extended to support the full language after the kernel is in place, by means of libraries, system calls, etc., or would there have to be two separate implementations, a subset for kernel-level work and a full version for the rest?
I have been considering my own approach to these issues, and I will try to formulate them coherently RSN. I would like to hear if anyone else has been thinking along these lines, and what conclusions they've come to.
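To make the 'compiled subset' idea a bit more concrete, here is a rough sketch of the kind of code I imagine such a subset being limited to: only immediates, tail calls and raw memory access, so the compiler never has to emit a heap allocation or capture a continuation. (The peek-byte and poke-byte! primitives are invented purely for illustration; they aren't from any real system.)

Code:
;; Hypothetical "kernel subset" Scheme: no heap allocation, no call/cc,
;; just fixnums, tail recursion, and invented raw-memory primitives.
(define (copy-bytes! src dst count)
  (let loop ((i 0))
    (if (< i count)
        (begin
          (poke-byte! (+ dst i) (peek-byte (+ src i)))   ; raw memory access
          (loop (+ i 1))))))                             ; tail call compiles to a jump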

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Sun Jan 04, 2004 12:02 am
by Jamethiel
Schol-R-LEA wrote: I have been giving quite a bit of thought to the question of how a very-high-level language such as Scheme or Smalltalk could be used effectively for OS development.
I've been thinking about Common Lisp, myself.
the alternative is to develop a compiled subset of the language that avoids the high-level constructs such as garbage collection, dynamic message-passing, or continuations. Other questions remain, however:
I don't see why you would necessarily need to avoid garbage collection. I can see why it would be bad inside something critical like an interrupt handler, but you could probably get away with it for most of the rest of the kernel.
  • Is it feasible to do this?
Absolutely. Google for 'frodef movitz', and take a look.
  • Would doing so lose the desired qualities of the language?
That would depend on how you went about it and which qualities you desire.
  • Could a FORTH-style threaded interpreter be used in place of a compiler?
Well, the instant you compile down to token-threaded code you have a traditional bytecode interpreter.
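A toy sketch of what I mean (purely illustrative, nothing I've actually built): once the 'threaded code' is just a vector of tokens, the inner loop that walks it is nothing but fetch-and-dispatch.

Code:
;; Toy token-threaded / bytecode dispatch loop, for illustration only.
;; 'code' is a vector of tokens; the loop fetches a token and dispatches.
(define (run code)
  (let loop ((pc 0) (stack '()))
    (if (>= pc (vector-length code))
        (car stack)
        (case (vector-ref code pc)
          ((push) (loop (+ pc 2) (cons (vector-ref code (+ pc 1)) stack)))
          ((add)  (loop (+ pc 1) (cons (+ (car stack) (cadr stack)) (cddr stack))))
          (else   (error "unknown token" (vector-ref code pc)))))))

;; (run (vector 'push 2 'push 3 'add))  => 5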

And I have spent quite a number of hours going nuts trying to write a Lisp compiler in Forth. It might have gone better if I had a filesystem to store the Lisp code in...
  • Would there be efficiency issues that would make this unworkable?
Efficiency? On today's multi-hundred-MHz boxes? Please. Maybe on the 12MHz systems of yesterdecade, but not now.
  • Could the subset be dynamically extended to support the full language after the kernel is in place, by means of libraries, system calls, etc., or would there have to be two separate implementations, a subset for kernel-level work and a full version for the rest?
I think you'd need to restrict yourself when dealing with, say, interrupt handlers. But having the full language available when writing, say, filesystem drivers would be a major plus. I'm looking towards having a small core of the language permanently in memory and the rest of it be pageable, and having a loader with enough smarts to get that small core into memory in the first place. Then I just have to make sure that the system can handle a page fault without choking, and I'm all set.
I have been considering my own approach to these issues, and I will try to formulate them coherently RSN. I would like to hear if anyone else has been thinking along these lines, and what conclusions they've come to.
One of the things I've been thinking about is how to write a Lisp system entirely in Lisp. From the memory manager through the garbage collector through all the device drivers through the compiler to the UI.

What I figure is that a simple task-switching kernel could be written in ASM that just saves off all the registers into the current task structure, switches out the task structure, and loads up the registers again. Trigger that automatically on interrupts and exceptions, and then handle most of the control logic in Lisp.
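On the Lisp side, the dispatcher that the ASM stub hands control to could be as simple as a table of handlers indexed by interrupt number. Very rough sketch, and every name in it is made up:

Code:
;; Sketch: the ASM stub saves the registers into the current task
;; structure, then calls dispatch-interrupt with the vector number.
(define *interrupt-handlers* (make-vector 256 #f))

(define (install-interrupt-handler! n handler)
  (vector-set! *interrupt-handlers* n handler))

(define (dispatch-interrupt n saved-task)
  (let ((handler (vector-ref *interrupt-handlers* n)))
    (if handler
        (handler saved-task)              ; e.g. the scheduler on a timer tick
        (log-unhandled-interrupt n))))    ; assumed logging primitive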

Admittedly, a lot of this is at the stage of 'thought experiment', but it's one I've been thinking on for a while.

--Jamethiel

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Sun Jan 04, 2004 12:25 am
by darklife
I wonder if a piggy-back OS made in QBasic is a radically stupid idea. I understand that you couldn't make a truly useful OS this way, but you could write a virtual machine and manage threads within the QBasic code. A lot of DOS could be ignored, and you could even implement a home-grown file system using more than one disk partition. DOS could reside on a small FAT16 partition to boot the QBasic fake OS, and the QB OS could then manage its own file system in the bigger partition. A multitasking, multi-user system could be possible using the virtual machine. Better yet, code the virtual machine in assembler, but that wouldn't fit the subject :)
A few things would need low-level code, like the partition handling, unless only the QB file functions are used, and I'll bet that would be slow ;D
Sorry, I'm sure the name QBasic makes everyone here feel sick :-X

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Sun Jan 04, 2004 4:47 am
by Solar
Jamethiel wrote: Efficiency? On today's multi-hundred-MHz boxes? Please. Maybe on the 12MHz systems of yesterdecade, but not now.
You'd be surprised!

One, today PDAs and smartphones outnumber desktop PCs by orders of magnitude. Space and time are an issue on those thingies.

Two, let me tell you from personal experience that quite many people from the OS development background consider compiled C++ to be a non-option because it's less efficient than compiled C. I'll not re-start that thread here, but those are the people who probably suffer cardiac arrest when you tell 'em that your XYZ interpreter (or whatever) is nice for system work and that efficiency doesn't matter...

;-)

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Sun Jan 04, 2004 5:02 am
by Candy
Jamethiel wrote: Efficiency? On today's multi-hundred-MHz boxes? Please. Maybe on the 12MHz systems of yesterdecade, but not now.
Well, for a number of places it would be acceptable, if properly engineered. Users don't want a fast system after all, they want a responsive system. Windows might be fast, but it responds like a 500-ton freight ship cruising at 50 knots (if you don't get it, try to turn THAT around).

As for responsiveness, if you think it through, the Lisp/Smalltalk/etc. systems can be lots better than the "old" C systems, if only because (a) you thought about it and (b) you simply have less code.

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Sun Jan 04, 2004 4:49 pm
by darklife

Efficiency? On today's multi-hundred-MHz boxes? Please. Maybe on the 12MHz systems of yesterdecade, but not now.
And the landfills just keep filling up... lol.
I like the idea of making old computers useful again, because I started out on a slow computer myself: a 66MHz iAPX486 back when 300MHz systems were new. Of course my system is up to date now, but there are still unlucky people out there with old machines.
I guess it comes down to this:
Leave the fast systems software to the old computers and
make slow systems software for the new (XP anyone?) ;D

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Sun Jan 04, 2004 6:05 pm
by mystran
I've tried to start a thread somewhat like this a few times now but failed. Anyway..

I've been thinking about this a lot, and I'm currently thinking about a possible solution. I've got a basic interpreter for a Lisp dialect quite a lot like Scheme, and I've been playing around with compilation, and have come to the following conclusions:

With a garbage-collected language, you need something else to support the garbage collection, namely direct access to memory internals. If the memory allocator knows how to get more memory even when the garbage collector runs out of memory, then there's no reason why it wouldn't be possible to write even the garbage collector in the garbage-collected language itself. For example, if the Lisp dialect can guarantee that "cons" never calls the GC if called from within the GC, then you could write the GC itself in that Lisp, as long as it can access the raw allocation structures.
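Roughly what I have in mind is something like the sketch below: a cons that only triggers a collection when we're not already inside one. All the low-level names (raw-alloc, raw-set!, heap-nearly-full?, collect-garbage!) are assumed primitives, not real code from my interpreter.

Code:
;; Sketch: cons written in the GC'd language itself, guaranteed never
;; to start a collection while a collection is already running.
(define *in-gc?* #f)

(define (kernel-cons a d)
  (when (and (not *in-gc?*) (heap-nearly-full?))
    (set! *in-gc?* #t)
    (collect-garbage!)                    ; the GC itself is written in this Lisp
    (set! *in-gc?* #f))
  (let ((cell (raw-alloc 2)))             ; assumed: bump-allocates two words
    (raw-set! cell 0 a)
    (raw-set! cell 1 d)
    cell))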

Once you have GC, list-structured memory is no problem, and you can abandon the normal stack pretty much for good (except possibly for optimizing) and work with a list-structured stack instead, giving you continuations pretty much for free. Once you do this, the only reason to actually interpret something is to support "eval", which can however be replaced with a "compile&call" cycle if you package everything into functions. This is pretty basic stuff of course. So if you want "eval" you need either a compiler or an interpreter there, but that shouldn't be a problem.
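As a sketch of those two points (again, not working code from my interpreter; "compile" is assumed to exist and return a callable procedure):

Code:
;; A call frame is just an ordinary heap object chained to its parent,
;; so "capturing a continuation" is just keeping a reference to the chain.
(define (push-frame return-point env parent)
  (list return-point env parent))

(define (capture-continuation current-frame)
  current-frame)                          ; nothing to copy

;; "eval" as compile&call: wrap the expression in a thunk, compile, invoke.
(define (my-eval expr)
  ((compile `(lambda () ,expr))))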

So the question really is, "what is system programming?"
There needs to be enough basic code to start up the memory environment to a point where GC can be done. This could be written in ASM or C, of course. The language needs ways to access memory around the GC, but that's just a set of primitives or special forms or something, which are pretty much the compiler's problem, as inlining is probably a good idea.

Is this feasible in a kernel? As I see it, there is a problem: if thread scheduling is done in a garbage-collected language, you can pretty much forget real-time processing, and you could have a problem with some drivers that need low latency. This could be circumvented with some system that doesn't allow the GC to kick in during certain sections and/or allows pre-emption of the GC, but I'm currently thinking of another idea, which comes down to having a microkernel written in ASM/C. You could then run GC'd code at kernel level too if you wanted. With a clean design it probably wouldn't matter a lot (except for IPC time).

As for performance, I think that compiled code is a good idea, even if the compiler mostly does the parsing and constructs a sequence of calls to primitives, as you don't need to do so much analyzing at runtime. You can write an optimizing compiler for Lisp/Scheme. I'd go with something simpler than Common Lisp though. My own approach is something similar to Scheme:

The compiler basically has to generate calls for immediates, variable lookups, assignments and definitions, the quote, lambda and if special forms, and applications. Everything else can be done with macros at compile time. This doesn't necessarily have to mean slow code, but it means quite a small compiler, even with optimizations, which means that it might be feasible to put that compiler even into the kernel..
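In other words the core dispatch is tiny; something like this (heavily simplified, with all the compile-* helpers and the macro machinery assumed rather than shown):

Code:
;; Skeleton of the core compiler dispatch; the helpers are assumed.
(define (compile-expr expr env)
  (cond
    ((symbol? expr)      (compile-variable-ref expr env))
    ((not (pair? expr))  (compile-immediate expr))       ; numbers, strings, ...
    (else
     (case (car expr)
       ((quote)  (compile-immediate (cadr expr)))
       ((set!)   (compile-assignment (cadr expr) (caddr expr) env))
       ((define) (compile-definition (cadr expr) (caddr expr) env))
       ((lambda) (compile-lambda (cadr expr) (cddr expr) env))
       ((if)     (compile-if (cadr expr) (caddr expr) (cadddr expr) env))
       (else
        (if (macro? (car expr) env)
            (compile-expr (macro-expand expr env) env)   ; expand, then retry
            (compile-application expr env)))))))         ; ordinary call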

..just my random thoughts though..

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Mon Jan 05, 2004 10:40 pm
by MD
Solar wrote:
Jamethiel wrote: Efficiency? On today's multi-hundred-MHz boxes? Please. Maybe on the 12MHz systems of yesterdecade, but not now.
You'd be surprised!

One, today PDAs and smartphones outnumber desktop PCs by orders of magnitude. Space and time are an issue on those thingies.

Two, let me tell you from personal experience that quite many people from the OS development background consider compiled C++ to be a non-option because it's less efficient than compiled C. I'll not re-start that thread here, but those are the people who probably suffer cardiac arrest when you tell 'em that your XYZ interpreter (or whatever) is nice for system work and that efficiency doesn't matter...

;-)
Well, they don't know anything about FORTH, where it comes from and where it is still mostly used today (embedded systems with very scarce resources, real-time systems and such; there are also FORTH implementations for PDAs, palmtops and such). Threaded-code FORTH implementations (vs. native code) can be fast and very compact. For example, at Forth Inc. you can read the application note "Evolution of FedEx Package Trackers" (http://www.forth.com/Content/Stories/FedEx.htm) about a hand-held tracking device programmed with chipFORTH (http://www.forth.com/Content/Products/cFData.htm), a reentrant threaded-code implementation with an extremely fast real-time multitasking and multi-user executive (no need for a host OS).

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Tue Jan 06, 2004 2:45 am
by Solar
MD wrote:
Well, they don't know anything about FORTH...
...and precious little about C++, but that doesn't change the fact that they perceive it to be a problem. A problem perceived is almost as bad as a real problem, since almost no-one will make the effort to put something to the test which he "knows" to be inferior.

And there's somewhat of a difference between FORTH and the likes of Scheme and Smalltalk, now isn't there? ;-)

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Wed Jan 07, 2004 1:24 am
by MD
Solar wrote:
MD wrote: Well, they don't know anything about FORTH...
...and precious little about C++, but that doesn't change the fact that they perceive it to be a problem. A problem perceived is almost as bad as a real problem, since almost no-one will make the effort to put something to the test which he "knows" to be inferior.
Oh, I understand your point now and, yes, I agree. But putting an example in front of their noses to demonstrate that their assumptions are wrong can improve the situation. So I hope that our friend Schol-R-LEA will succeed.
Solar wrote: And there's somewhat of a difference between FORTH and the likes of Scheme and Smalltalk, now isn't there? ;-)
Yes, somewhat. But there are both Scheme and Smalltalk implementations based on stack VMs and threaded code, like FORTH. There are also OO extensions for FORTH (for the reader not familiar with FORTH: it is an extensible language, extensible from within the language itself), some class-based like Smalltalk, others prototype-based like Self or Java(ECMA)Script. There is a FORTH implementation, HolonFORTH (http://holonforth.com/), with an environment (browser, editor, ...) similar to Smalltalk's, but with a text-based interface (so don't expect a Squeak http://squeak.org/ in FORTH). There are also FORTH extensions for list processing.

There are at least a Scheme and a Smalltalk implementation in FORTH (IIRC, the Smalltalk was presented in a book and the Scheme is no longer available on the 'net), not to mention Prolog. Remember that implementing a language on top of an extensible language like FORTH (or Lisp) is different from implementing the same language in, say, C.

An example of such a sophisticated multi-language environment is Poplog http://www.poplog.org/ (POP-11, Common Lisp, Prolog, ML), which originated at the AI department of the University of Edinburgh. It was once an expensive commercial product used for AI applications and is now free software (after the commercial distributor was acquired by SPSS, which was interested solely in the data-mining product Clementine http://www.spss.com/spssbi/clementine/, written under Poplog). The host language is POP-11, which incidentally is an extensible language similar to FORTH (incidentally because it was developed in parallel, without knowledge of FORTH's existence): a sort of high-level FORTH with a smell of Lisp and a little Algol-like (think Pascal) syntactic sugar, but with an open stack, list-processing facilities, GC and such.

The POP family of languages is particularly notable if you are interested in system-level programming: there was an entire OS written in POP-2, the Multipop Timesharing System (see "Integrating Operating System Design with Language Design", http://www-robotics.cs.umass.edu/~pop/OS.html), written by Prof. Robin Popplestone http://www-robotics.cs.umass.edu/~pop/home.html and colleagues, with quite interesting and original (at least by today's standards) system-level implementation techniques, running on hardware with limited resources (no hardware memory protection, for example: a certain level of security was provided by the language).

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Wed Jan 07, 2004 2:25 am
by Solar
All fine and well. But today's kernel code slingers know their C, and usually cry bloody murder if you ask them to learn anything else just to help with your work. It's a matter of return-on-investment.

For example, there are various OS projects around I'd be interested in, one of them being Clicker. (Hi, Pype! ;-) ) But he's using some pseudo-dialect he's implemented for C, which I don't know, and being notoriously short on time I am not of a mind to learn it, so no Clicker for me.

Bottom line, the farther away your language is from C, and the less well known, the harder it is to coax people into helping you out. (And the smaller the prewritten code base you could tap into.)

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Wed Jan 07, 2004 3:55 pm
by voodoo1man
mystran wrote: With a garbage-collected language, you need something else to support the garbage collection, namely direct access to memory internals. If the memory allocator knows how to get more memory even when the garbage collector runs out of memory, then there's no reason why it wouldn't be possible to write even the garbage collector in the garbage-collected language itself. For example, if the Lisp dialect can guarantee that "cons" never calls the GC if called from within the GC, then you could write the GC itself in that Lisp, as long as it can access the raw allocation structures.
This sounds very much like how the GC was written for the T dialect of Scheme. See Olin Shivers' fascinating account of the project (it also has some pretty good pointers to OSes in high-level languages). Then there's the approach of a specially-defined subset of a language used for bootstrapping. Scheme48 is based on such a dialect, called "pre-scheme", and apparently at one time it was quite popular for programming robots. Richard Kelsey, who made the thing, also wrote a paper on it; you can find it here.
So the question really is, "what is system programming?"
There needs to be enough basic code to start up the memory environment to a point where GC can be done. This could be written in ASM or C, of course. The language needs ways to access memory around the GC, but that's just a set of primitives or special forms or something, which are pretty much the compiler's problem, as inlining is probably a good idea.
Which is why the original poster's question should have been about extensions, not subsets of HLLs for system programming.
Is this feasible in a kernel? As I see it, there is a problem: if thread scheduling is done in a garbage-collected language, you can pretty much forget real-time processing, and you could have a problem with some drivers that need low latency. This could be circumvented with some system that doesn't allow the GC to kick in during certain sections and/or allows pre-emption of the GC, but I'm currently thinking of another idea, which comes down to having a microkernel written in ASM/C. You could then run GC'd code at kernel level too if you wanted. With a clean design it probably wouldn't matter a lot (except for IPC time).
Well, the most logical way to do this, it seems to me, is to have the GC itself run as a separate thread (this might later help to implement parallel GC on multiprocessor systems) and to have the thread scheduler written to be non-consing (whether through a special coding style in the HLL, or in assembler, or something; although I much prefer the former). This wouldn't be too different from an explicit exokernel.
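A non-consing scheduler really can stay that way if it only walks pre-allocated structures; here's a rough sketch (the task table, task-runnable? and so on are invented purely for illustration):

Code:
;; Sketch: round-robin task selection over a pre-allocated vector.
;; It builds no lists and no new records, so (given a compiler that
;; turns the named let into a plain loop) it needn't allocate at all.
(define *tasks* (make-vector 32 #f))      ; filled in elsewhere
(define *current-index* 0)

(define (pick-next-task!)
  (let ((n (vector-length *tasks*)))
    (let loop ((tried 0) (i (modulo (+ *current-index* 1) n)))
      (cond ((= tried n) #f)              ; nothing runnable
            ((and (vector-ref *tasks* i) (task-runnable? (vector-ref *tasks* i)))
             (set! *current-index* i)
             (vector-ref *tasks* i))
            (else (loop (+ tried 1) (modulo (+ i 1) n)))))))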

(Sorry, apparently I'm being too verbose and I'll have to split my post into two pieces).

Re:System programming subsets of Lisp, Smalltalk, Python et.

Posted: Wed Jan 07, 2004 3:59 pm
by voodoo1man
(here's the second part)
As for performance, I think that compiled code is a good idea, even if the compiler mostly does the parsing and constructs a sequence of calls to primitives, as you don't need to do so much analyzing at runtime. You can write an optimizing compiler for Lisp/Scheme. I'd go with something simpler than Common Lisp though. My own approach is something similar to Scheme:
Well, the bulk of CL is just an add-on library, and between the various free implementations in their early bootstrap stages, I bet you can find a lot of it already written. Most of the control structures can be implemented meta-circularly in terms of a restricted set of constructs. Henry Baker wrote an excellent paper on the subject.
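For example (Scheme-style syntax-rules here just to keep it short; in CL these would be defmacros, and the real definitions need more care), the usual control structures reduce straight down to if:

Code:
;; Toy illustrations of control structures as macros over a tiny core.
(define-syntax my-when
  (syntax-rules ()
    ((_ test body ...) (if test (begin body ...) #f))))

(define-syntax my-unless
  (syntax-rules ()
    ((_ test body ...) (if test #f (begin body ...)))))

(define-syntax my-or
  (syntax-rules ()
    ((_) #f)
    ((_ e) e)
    ((_ e1 e2 ...) (let ((t e1)) (if t t (my-or e2 ...))))))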
The compiler basically has to generate calls for immediates, variable lookups, assignments and definitions, the quote, lambda and if special forms, and applications. Everything else can be done with macros at compile time. This doesn't necessarily have to mean slow code, but it means quite a small compiler, even with optimizations, which means that it might be feasible to put that compiler even into the kernel.
Steele used the same set of assumptions for his Rabbit Scheme compiler, but there's a limit to how much efficiency you can get out of this for the amount of work you do. For the (meager) benefits you get, I suspect that you're better off using a VM instead, which immediately gives you some nice benefits like (slightly better) portability, introspection, virtualization and easy persistence. Otherwise, re-using an existing optimizing compiler (or building your own - there are a lot of good new ideas out there right now that make it worthwhile) and using it in user space (because quite likely you'll want to do a lot of optimizations) is the best bet. Of course, this all depends on how much run-time code generation you want to do, and there's no reason you can't mix any combination of the three approaches.

I just recently found out about Movitz, which is an extension of Common Lisp for doing systems programming. It seems to be in very preliminary stages right now. It has its own (cross) compiler, and doesn't yet try to deal with garbage collection (which I think is a good thing, because in the absence of a host OS and the need to interface with C, a more novel GC strategy is likely to be far more efficient).