anon19287473 wrote:
If you want to write an OS in a non-native language, at least use LISP, then your OS can rewrite itself!
Arguably, the only native language for a processor is its own machine code. One could even claim that when you break each byte into two 4-bit parts in order to display it in hexadecimal, you are no longer working in the truly native format, though in this case the required conversion is trivial and bijective.
Once you introduce a macro-assembler which automatically selects the best possible (usually shortest) encoding for a given logical instruction (such as using the shortest possible offset encoding for a relative jump), you lose this bijection; now several "native" encodings map to the same logical instruction in the assembly language upon disassembly.
You can of course regain the bijection by considering only the semantics of the native encoding instead of its physical form. Alternatively, you could establish a bijection between the native encoding generated by said assembler and its disassembly, provided your assembler is deterministic. At this point stuff is still pretty theoretical and uninteresting....
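To make the lost bijection concrete, here is a rough sketch using the two x86 relative jump encodings (opcode EB with an 8-bit displacement, opcode E9 with a 32-bit one). The function names are made up for illustration, and as a simplification the displacement is treated as the logical operand, ignoring that it is measured from the end of the instruction.

```python
import struct

def assemble_jmp(disp):
    """Deterministic toy assembler: pick the shortest encoding that fits."""
    if -128 <= disp <= 127:
        return b"\xEB" + struct.pack("<b", disp)   # JMP rel8
    return b"\xE9" + struct.pack("<i", disp)       # JMP rel32

def disassemble_jmp(code):
    """Map either physical encoding back to one logical instruction."""
    if len(code) == 2 and code[0] == 0xEB:
        return ("jmp", struct.unpack("<b", code[1:])[0])
    if len(code) == 5 and code[0] == 0xE9:
        return ("jmp", struct.unpack("<i", code[1:])[0])
    raise ValueError("not a relative jump")

short_form = bytes.fromhex("eb05")
long_form = bytes.fromhex("e905000000")

# Two different byte sequences disassemble to the same logical instruction,
# so disassembly alone is many-to-one.
same_logical = disassemble_jmp(short_form) == disassemble_jmp(long_form)

# Restricted to what the deterministic assembler emits, the round trip holds.
round_trip = assemble_jmp(disassemble_jmp(short_form)[1]) == short_form
lossy = assemble_jmp(disassemble_jmp(long_form)[1]) != long_form
```

Only the canonical (shortest) form survives a disassemble/assemble round trip; the long form comes back as the short one.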
Now, what does this have to do with rewriting a system on the fly? Well, when you introduce offline components like a linker, stuff gets more interesting: consider a linker which takes care of replacing labels with real addresses and links the result into a flat binary. Now, starting from the original source with labels, then assembling and linking, we have no way to regain that information when disassembling... unless of course we store debugging symbols or some such...
However, for the purposes of dynamically replacing a system, we not only want to be able to disassemble the system, but also modify it and then assemble it again. Of course, if we keep the original source, we can cheat somewhat by modifying said source and replacing the previously compiled version with the newly modified and compiled one. However, if the system contains information which is not accessible at runtime, then modifying such information at runtime will likewise be impossible.
The nature of the language used is not really important. Rather, in order to actually avoid rewriting native code on the fly, we want to use late-binding for everything that is a candidate for runtime modification. If stuff like procedure addresses is hardcoded into other procedures, then we need to modify those procedures in order to replace another procedure... or implement some sort of kludge in order to reroute the calls. If, on the other hand, there is an indirection in said calls, then simply modifying this indirection will be enough.
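A minimal sketch of that indirection, assuming a hypothetical table named `procedures` that every call goes through (Python already binds global names late, so the explicit table is here only to make the mechanism visible):

```python
# Hypothetical dispatch table: callers look a procedure up by name on each call
# instead of holding its address.
procedures = {}

def call(name, *args):
    return procedures[name](*args)

def greet_v1(who):
    return "hello " + who

def greet_v2(who):
    return "hi there, " + who

def welcome(who):
    # Late-bound call: nothing about greet_v1 is hardcoded in here.
    return call("greet", who) + "!"

procedures["greet"] = greet_v1
before = welcome("world")

# Replacing the procedure only touches the table; welcome() is left alone.
procedures["greet"] = greet_v2
after = welcome("world")
```

Had `welcome` called `greet_v1` directly, replacing the greeting would have meant patching `welcome` as well.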
If Lisp is good for this, it is simply because (and only when) said Lisp is very late-binding. Smalltalk could easily do the same. There is even research on extending C++-like vtables to support dynamic late-binding object models, which allow methods to be replaced on the fly.
Doing a complete replacement of a system is of course not quite trivial, since one will normally need at least a small amount of "black box" runtime code to get any higher level language (including C) running. But if said runtime is only needed to get the actual system running, and any parts of it necessary during execution use some form of late-binding, then it becomes possible to replace those parts after startup, and we can consider the system able to rewrite itself as a whole.
--------------
I am actually doing some research on this stuff right now. The current approach I'm looking into basically considers a "byte code" form to be the "native" or canonical representation of code. With a fairly simple indirection mechanism one should be able to arrange for functions to be JIT compiled to native form the first time they are called. At any point in time, one can throw the native representation away, and instead resort to evaluating the bytecode directly, or JIT compile it again.
So why bother? Well, the point is to be able to garbage collect code. Now, if the native code is nothing but a cached version of the bytecode, and the bytecode is nothing but data, then one avoids having to disassemble native code to see what data it refers to. It also means that if something is statically bound at JIT compilation time and those assumptions (like locations) change later, all we have to do is throw the native code away, and it'll get recompiled automatically as the system attempts to execute it.
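Roughly, the idea could be sketched like this. Everything here is an illustrative assumption: the three-opcode stack bytecode, the `Function` class, and the use of a Python lambda as a stand-in for real native code.

```python
class Function:
    """Bytecode is the canonical form; `native` is only a disposable cache."""

    def __init__(self, bytecode):
        self.bytecode = bytecode   # plain data: list of (op, arg) pairs
        self.native = None         # cached compiled form, may be dropped anytime
        self.compile_count = 0

    def interpret(self, x):
        """Evaluate the bytecode directly, without any compiled form."""
        stack = [x]
        for op, arg in self.bytecode:
            if op == "push":
                stack.append(arg)
            else:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if op == "add" else a * b)
        return stack.pop()

    def compile(self):
        """Stand-in for a JIT: translate bytecode into a Python lambda."""
        self.compile_count += 1
        stack = ["x"]
        for op, arg in self.bytecode:
            if op == "push":
                stack.append(repr(arg))
            else:
                b, a = stack.pop(), stack.pop()
                stack.append(f"({a} {'+' if op == 'add' else '*'} {b})")
        return eval("lambda x: " + stack.pop())

    def __call__(self, x):
        if self.native is None:    # the indirection: compile on first call
            self.native = self.compile()
        return self.native(x)

    def invalidate(self):
        """Throw the cached code away; the next call recompiles it."""
        self.native = None

f = Function([("push", 2), ("mul", None), ("push", 1), ("add", None)])
first = f(5)            # compiles, then runs the cached form
f.invalidate()          # e.g. a JIT-time assumption no longer holds
second = f(5)           # recompiled automatically on this call
```

Since `native` can always be rebuilt from `bytecode`, a collector only has to trace the bytecode and can treat the compiled form as garbage whenever it likes.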
Finally, dumping the whole system state into a file becomes a lot easier if you don't need to try to dump native code. Instead, just dump the heap with the bytecode in it, and have the loader contain an interpreter. The JIT compiler can be bytecode in the heap just like everything else.
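As a toy illustration of such an image, assuming the heap is just a dictionary of named bytecode sequences and JSON is the on-disk format (both are my assumptions for the sketch, not a real design):

```python
import json
import os
import tempfile

def run(bytecode, x):
    """The interpreter that ships with the loader."""
    stack = [x]
    for op, arg in bytecode:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

def dump_image(heap, path):
    # No native code to worry about: the heap is plain data.
    with open(path, "w") as f:
        json.dump(heap, f)

def load_image(path):
    with open(path) as f:
        return json.load(f)

heap = {"double_plus_one": [["push", 2], ["mul", None], ["push", 1], ["add", None]]}

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "image.json")
    dump_image(heap, path)
    restored = load_image(path)

result = run(restored["double_plus_one"], 5)
```

The restored heap runs straight away under the interpreter; native code can be regenerated later on demand.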
....
Well, then the idea is to throw a REPL on it, get it to run in privileged mode on bare hardware, without relying on a static runtime after startup, give it direct access to memory, and have fun.
And after that, add a bytecode validator which can figure out if running some code would be dangerous, and you've got language-based security.
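A very rough sketch of what such a validator might check, assuming a made-up stack bytecode where only whitelisted opcodes are considered safe and every function takes one argument and leaves one result (a real validator would need to check much more, such as types and memory accesses):

```python
# Hypothetical whitelist: opcode -> (values popped, values pushed).
STACK_EFFECT = {"push": (0, 1), "add": (2, 1), "mul": (2, 1)}

def validate(bytecode, initial_depth=1):
    """Reject code before running it, without executing any instruction."""
    depth = initial_depth
    for index, (op, _arg) in enumerate(bytecode):
        if op not in STACK_EFFECT:
            return False, f"instruction {index}: opcode {op!r} is not allowed"
        pops, pushes = STACK_EFFECT[op]
        if depth < pops:
            return False, f"instruction {index}: stack underflow"
        depth += pushes - pops
    if depth != 1:
        return False, "must leave exactly one result on the stack"
    return True, "ok"

safe = validate([("push", 2), ("mul", None)])
underflow = validate([("add", None)])
forbidden = validate([("poke", 0xB8000)])   # raw memory write, not whitelisted
```

Code that fails validation never runs, so even in privileged mode it cannot reach an opcode like the hypothetical `poke`.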
And yeah, my current plan is to use something like Lisp. Actually Scheme without first-class continuations would be closer.