A VM OS

salil_bhagurkar
Member
Posts: 261
Joined: Mon Feb 19, 2007 10:40 am
Location: India

A VM OS

Post by salil_bhagurkar »

Hey all!

I have thought of designing a VM-based operating system. I already have a small working kernel. Instead of compiling programs for the target processor, I wish to define my own native code (assembly-like opcodes) in my OS, in which I will write applications.

How feasible is this idea? Have any of you tried it?
Compiling to my own native code could be tough (designing the instruction set, writing the assembler and all that), but I have a workaround for that. What I am concerned about is the performance of such a system.

We have JVMs, but they are still slow... so would such a system also be too slow?
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

Well, look into Singularity or a CIL-based OS (I know someone is working on one). Similar in concept, except that instead of using a virtual machine, you compile the intermediate code to native code when the program is loaded/run (or during run-time, or on the first run, saving the binary for consecutive runs, etc.). This means the OS will be very fast (as it's native code) and as safe as the compiler wants it to be (without even needing much security at the low level). A full OS that just runs in a VM will be slower than an OS written for the platform, but it can still be relatively fast if done properly.
grover
Posts: 17
Joined: Wed Apr 30, 2008 7:20 am

Post by grover »

Is your perception of slowness backed by measurements or by story telling?

Bytecode, P-Code, Intermediate Language (or whatever you want to call them) based languages and operating systems have lots of advantages over classical operating systems written in C or Assembly. These may not be obvious and do take a lot of effort to achieve, but are possible.

I would not suggest developing something like that from scratch. Instead, join one of the many operating system efforts based on current technology:

- Java: JNode
- .NET: SharpOS, Ensemble, Cosmos (I'm sure there are others around.)

Additionally, members from SharpOS, Ensemble and JNode have started an open discussion forum dubbed the Managed Operating System Group/Alliance, to discuss specific "problems" found while developing managed operating systems.
Dex
Member
Posts: 1444
Joined: Fri Jan 27, 2006 12:00 am
Contact:

Post by Dex »

tonyMac, a member of "Team DexOS", is working on such a project; see here:
http://dex.7.forumer.com/viewtopic.php?t=272
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Post by JamesM »

Dex wrote:tonyMac a member of "Team DexOS" is working on such a project, see here:
http://dex.7.forumer.com/viewtopic.php?t=272
Hmm, out of idle curiosity - your team member says he is creating a RISC-style VM instruction set, yet decides to implement only 4 general purpose registers.

What, out of pure interest, was the logic behind that? Although RISC instruction sets are not defined as having lots of GPRs per se, the fact that they lack complex addressing modes seems (merely from observing real-life examples like SPARC and MIPS) to lead to large register files being created to compensate and increase performance.

I was just wondering why the implementor decided to go with a simple addressing mode, small register file design?

Cheers,

James
Dex
Member
Posts: 1444
Joined: Fri Jan 27, 2006 12:00 am
Contact:

Post by Dex »

JamesM wrote:
Dex wrote:tonyMac a member of "Team DexOS" is working on such a project, see here:
http://dex.7.forumer.com/viewtopic.php?t=272
Hmm, out of idle curiosity - your team member says he is creating a RISC-style VM instruction set, yet decides to implement only 4 general purpose registers.

What, out of pure interest, was the logic behind that? Although RISC insn. sets are not defined as having lots of GPRs per-se, the fact that they lack complex addressing modes seems to (merely from observing real life examples like SPARC and MIPS) mean large register files are created to allow for this and increase performance.

I was just wondering why the implementor decided to go with a simple addressing mode, small register file design?

Cheers,

James
I will PM him a link to your question.
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Post by Schol-R-LEA »

JamesM wrote:
Dex wrote:tonyMac a member of "Team DexOS" is working on such a project, see here:
http://dex.7.forumer.com/viewtopic.php?t=272
Hmm, out of idle curiosity - your team member says he is creating a RISC-style VM instruction set, yet decides to implement only 4 general purpose registers.

What, out of pure interest, was the logic behind that? Although RISC insn. sets are not defined as having lots of GPRs per-se, the fact that they lack complex addressing modes seems to (merely from observing real life examples like SPARC and MIPS) mean large register files are created to allow for this and increase performance.

I was just wondering why the implementor decided to go with a simple addressing mode, small register file design?
While I cannot answer for the DexOS team, I will point out that, unlike with hardware systems, the difference in speed between registers and main memory in a pseudo-machine is negligible - it may not exist at all, in fact, depending on the implementation - and a smaller register set in a p-code is usually slightly faster on individual accesses (since the named registers can be implemented as global variables, rather than as elements in an array, as would most likely be the case for a larger register file). Note also that many p-machines are implemented as pure stack machines, including both the JVM and the classic UCSD Pascal p-System (based in part on the even earlier Pascal-P compiler): since there are no speed benefits in using 'registers', they can simplify things greatly this way.

I would say, however, that a RISCy bytecode is probably counter-productive, especially if you are designing it to be targeted by a specific language, à la JVM. One of the advantages of p-machines is that if you know which high-level operations are likely to be used heavily, you can implement them in the bytecode set; this lets you leverage the real machine implementation for both memory and speed savings.

It should be remembered that the original impetus for p-machines back in the 1940s was to reduce memory use by encoding common operations in a more compact form (specifically floating-point operations, back when even the most basic FP operations were difficult to implement). Bytecode does not need to be limited by the same restrictions as hardware, so the RISC principles - load/store architecture, simplified addressing modes, a limited instruction set providing only the most frequently used primitives - don't apply; they were aimed in part at eliminating the need for microcoding, but the whole point of a bytecode machine is to be a higher-level encoding.

Most of the other advantages of RISC are in reducing the complexity of compilers, by making the target code more regular - but if a bytecode is designed to support the language that is being compiled, it can make it simpler still by providing just those facilities the language relies on. If the bytecode is meant to be a general-purpose facility, such as CIL, it doesn't make as much of a difference - though encoding common primitives for, say, I/O, would still make a faster and more compact p-machine.

(And yes, I said 1940s - pseudo-machines don't just predate modern languages, they predate assemblers. Code was usually written in an assembly-like notation, but then translated into hex or octal by hand - with machines as expensive and cranky as they were, using the machine to translate its own programs was considered terribly wasteful, especially since it would have taken longer than the MTBF for anything non-trivial. Hence the need for a higher-level pseudo-machine when it came to something as complicated as FP.)
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

grover wrote:Is your perception of slowness backed by measurements or by story telling?

Bytecode, P-Code, Intermediate Language (or whatever you want to call them) based languages and operating systems have lots of advantages over classical operating systems written in C or Assembly. These may not be obvious and do take a lot of effort to achieve, but are possible.

I would not suggest to develop something like that from scratch. Instead join one of the many operating system efforts based in current technology:

- Java: JNode
- .NET: SharpOS, Ensemble, Cosmos (I'm sure there are others around.)

Additionally members from SharpOS, Ensemble and JNode have started an open discussion forum dubbed Managed Operating System Group/Alliance. To have a discussion forum for specific "problems" found while developing managed operating systems.

My perception is from testing. I have written multiple VMs with benchmarks; if you want one to play with, I think I still have the code lying around - I even wrote an assembler for it. I ran many tests with a couple of different methods for opcode dispatch (function table, switch statement, switch with inlines, etc.) to find the fastest way of getting the work done. I could go further and optimize it in assembly but, in general, it is still MUCH slower than the CPU running natively. Now, as I mentioned, if the intermediate language is compiled to machine code, then it will obviously have a big speed benefit; also, higher-level calls don't get affected as heavily (if you have a lot of high-level calls, rather than the RISC-type system described).
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Post by JamesM »

Schol-R-LEA wrote:
JamesM wrote:
Dex wrote:tonyMac a member of "Team DexOS" is working on such a project, see here:
http://dex.7.forumer.com/viewtopic.php?t=272
Hmm, out of idle curiosity - your team member says he is creating a RISC-style VM instruction set, yet decides to implement only 4 general purpose registers.

What, out of pure interest, was the logic behind that? Although RISC insn. sets are not defined as having lots of GPRs per-se, the fact that they lack complex addressing modes seems to (merely from observing real life examples like SPARC and MIPS) mean large register files are created to allow for this and increase performance.

I was just wondering why the implementor decided to go with a simple addressing mode, small register file design?
While I cannot answer for the DexOS team, I will point out that, unlike with hardware systems, the difference in speed between registers and main memory in a pseudo-machine is negligible - it may not exist at all, in fact, depending on the implementation - and a smaller register set in a p-code is usually slightly faster on individual accesses (since the named registers can be implemented as global variables, rather than as elements in an array, as would most likely be the case for a larger register file). Note also that many p-machines are implemented as pure stack machines, including both the JVM and the classic UCSD Pascal p-System (based in part on the even earlier Pascal-P compiler): since there are no speed benefits in using 'registers', they can simplify things greatly this way.

I would say, however, that a RISCy bytecode is probably counter-productive, especially if you are designing it to be targeted by a specific language, à la JVM. One of the advantages of p-machines is that if you know which high-level operations are likely to be used heavily, you can implement them in the bytecode set; this lets you leverage the real machine implementation for both memory and speed savings.

It should be remembered that the original impetus for p-machines back in the 1940s was to reduce memory use by encoding common operations in a more compact form (specifically floating-point operations, back when even the most basic FP operations were difficult to implement). Bytecode does not need to be limited by the same restrictions as hardware, so the RISC principles - load/store architecture, simplified addressing modes, a limited instruction set providing only the most frequently used primitives - don't apply; they were aimed in part at eliminating the need for microcoding, but the whole point of a bytecode machine is to be a higher-level encoding.

Most of the other advantages of RISC are in reducing the complexity of compilers, by making the target code more regular - but if a bytecode is designed to support the language that is being compiled, it can make it simpler still by providing just those facilities the language relies on. If the bytecode is meant to be a general-purpose facility, such as CIL, it doesn't make as much of a difference - though encoding common primitives for, say, I/O, would still make a faster and more compact p-machine.

(And yes, I said 1940s - pseudo-machines don't just predate modern languages, they predate assemblers. Code was usually written in an assembly-like notation, but then translated into hex or octal by hand - with machines as expensive and cranky as they were, using the machine to translate its own programs was considered terribly wasteful, especially since it would have taken longer than the MTBF for anything non-trivial. Hence the need for a higher-level pseudo-machine when it came to something as complicated as FP.)
Hi,

A good point, well made - for some reason, even though I program an architecture translator as my day job, I'd forgotten that every emulated register lives in an abstract register bank, and as such 'register' loads and stores actually access memory (as well as a target register, possibly).

I was a little tired after a long day at work when I wrote that! :-)

And, wrt the 1940s - back then they had grad students to do the assembly for them ;)

Cheers,

James
grover
Posts: 17
Joined: Wed Apr 30, 2008 7:20 am

Post by grover »

Ready4Dis wrote:My perception is from testing. I have written multiple VMs with benchmarks; if you want one to play with, I think I still have the code lying around - I even wrote an assembler for it. I ran many tests with a couple of different methods for opcode dispatch (function table, switch statement, switch with inlines, etc.) to find the fastest way of getting the work done. I could go further and optimize it in assembly but, in general, it is still MUCH slower than the CPU running natively. Now, as I mentioned, if the intermediate language is compiled to machine code, then it will obviously have a big speed benefit; also, higher-level calls don't get affected as heavily (if you have a lot of high-level calls, rather than the RISC-type system described).
Of course any sort of interpreter is slower than a JIT. This can obviously be seen from the languages/platforms I quoted, as each of them has at least one just-in-time compiler implementation.

The JIT has system knowledge which no classical compiler has. Its optimizations can easily surpass most ahead-of-time compilation scenarios.
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

grover wrote: Of course any sort of interpreter is slower than a JIT. This can obviously be seen from the languages/platforms I quoted, as each of them has at least one just-in-time compiler implementation.

The JIT has system knowledge which no classical compiler has. Its optimizations can easily surpass most ahead-of-time compilation scenarios.
That's what I said in my original statement! Why the argument? Instead of trying to argue and tell me my statements are incorrect, please read it again: I said that compiling the intermediate code to native is very fast (if done right), and that if you implement a true VM it will be slower. I don't see why everyone on this board feels the need to undermine someone without fully reading (understanding?) what they've said.
tonymac32
Posts: 1
Joined: Wed May 07, 2008 1:09 pm

Post by tonymac32 »

Hello all,

I'm the programmer making the VM, I'm surprised to see it mentioned here. In any case, my intent is to make a simple VM, mostly as a learning experience, and hoping it could be useful to someone else for similar use. I've kept the number of registers down to make it possible to run a VM on something other than X86, such as various micros.

Thank you for the constructive criticism; I'm open to any and all suggestions. My schedule is a bit tight right now due to college, but I may get some work done on it after a bit. I'm also working on a BASIC-like abstraction layer for FASM and some general-purpose I/O routines, so my time gets divided throughout.
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

tonymac32 wrote:Hello all,

I'm the programmer making the VM, I'm surprised to see it mentioned here. In any case, my intent is to make a simple VM, mostly as a learning experience, and hoping it could be useful to someone else for similar use. I've kept the number of registers down to make it possible to run a VM on something other than X86, such as various micros.

Thank you for the constructive criticism; I'm open to any and all suggestions. My schedule is a bit tight right now due to college, but I may get some work done on it after a bit. I'm also working on a BASIC-like abstraction layer for FASM and some general-purpose I/O routines, so my time gets divided throughout.
Well, I have a suggestion for working on the VM. I would not use registers; they limit you too much, and the point of a VM is to remove the limits imposed by the host machine. Simply use variables/pointers. If the code is being interpreted, a read/write of a memory address is the same speed as a read/write of a register, since both are simply in memory on the host. If you later decide to write a compiler for your VM -> x86 so you can run it natively, your backend compiler can easily assign a register to any variable it deems necessary (or with hinting of sorts). Then, if you later switch to a microchip that has very few, differently used registers, there is no need to change your source code; the backend compiler for the target chip would assign the appropriate variables to registers. I really see no benefit in even simulating registers in a VM, unless you plan on making hardware from it and are writing the VM for that specific reason. The only other reason I can think of is to save a bit on code size: for example, a push reg1 may only be a 1-byte operation, while push var1 takes 5 bytes - 1 for the opcode and 4 for the variable - or maybe more if you're supporting 64-bit machines (although I'd use separate opcodes for 32-bit and 64-bit variable pushes, so that may not be an issue).
SpooK
Member
Posts: 260
Joined: Sun Jun 18, 2006 7:21 pm

Post by SpooK »

Imagine performing multiple passes on RISC-type VM code for native compilation. The compiler can profile which simpler instructions combine to make the most efficient series of more complex instructions. If you happen to be targeting a RISC architecture, you'll have a better 1:1 correlation between your VM and the underlying machine. Moreover, atomic operations gain precision. Hard disk space is cheaper than RAM or caches.

Also, the ideas that drive the discussion of VM registers in this thread seem to suffer from "inside the box" thinking. Imagine using process-global memory (or even stack space) to hold a small (< 4KB) virtual register space. Let the native compiler optimize when real registers are used during instruction operations. If done right, you'll end up with a nice and tidy cache-friendly native binary.

I would like to reiterate that the above ideas are centered around native compilation as the favored practice. If you intend to run a strictly software-based VM OS, all I can recommend is to read up on JIT compilation and use it wherever you can.