OSDev.org

Posted: **Thu Sep 13, 2012 12:03 pm**

In your view, what is the place of virtualization in today's OSes?

I think this is one of the best ways to abstract the architecture away.
How about an OS where (almost) all programs are written in managed code and the OS includes a VM for their execution (as do LLVM, .NET or Java).

With a sufficiently well-made managed language, and APIs designed with this goal in mind, I am sure it is possible to run the same application whatever the platform is.

Android already demonstrated this, and another example is how JavaScript in web pages runs under almost any Operating System, whatever the architecture and processor happen to be.

So, is it really possible, and is it worth it?

EDIT:
- Maybe 'managed' is not the appropriate term. I meant any language that can be semi-compiled (as an interpreted language would be too slow and wouldn't allow closed-source programs), as Java does (or simply LLVM).
- I have already planned what the architecture would look like. I still have to work on it, but should get it finished soon.

Posted: **Thu Sep 13, 2012 12:43 pm**

AbstractYouShudNow wrote: as an interpreted language would be too slow and wouldn't allow closed-source programs

I am not an expert when it comes to this topic, but I think there are some ways to solve the closed-source issue. The source code could be encrypted or something like that. I am actually surprised this hasn't been done more often. Of course, it may also be that it has and I am just not aware of it.

Posted: **Tue Sep 18, 2012 11:44 am**

The problem with encryption is that it requires additional effort (time and resources) to decrypt, and it would be easy to get at the decrypted source code.

But I'm not really bothered by code having to be open-source. The actual issue I found is that it would be very easy for someone to modify the program to introduce malware, and that parsing the code would be slow, though it could be compiled once and cached, as the Python interpreter does...
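The "compile once and cache" idea can be sketched with Python's built-in `compile()`: parse the source only the first time, keyed by a hash of its text, and reuse the resulting code object afterwards. This is a minimal in-memory sketch; CPython itself persists bytecode to `.pyc` files on disk, which this does not model, and the `CompileCache` class is hypothetical.

```python
import hashlib
from types import CodeType
from typing import Dict


class CompileCache:
    """Hypothetical in-memory cache: source text -> compiled code object."""

    def __init__(self) -> None:
        self._cache: Dict[str, CodeType] = {}
        self.compilations = 0  # how many times we actually parsed and compiled

    def get(self, source: str, filename: str = "<program>") -> CodeType:
        # Key on a hash of the source, so any edit forces a recompile.
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        code = self._cache.get(key)
        if code is None:
            code = compile(source, filename, "exec")
            self.compilations += 1
            self._cache[key] = code
        return code

    def run(self, source: str) -> dict:
        # Execute the (possibly cached) code object in a fresh namespace.
        namespace: dict = {}
        exec(self.get(source), namespace)
        return namespace


if __name__ == "__main__":
    cache = CompileCache()
    for _ in range(3):
        ns = cache.run("result = sum(range(10))")
    print(ns["result"], cache.compilations)
```

Running the same source three times parses it only once; the parsing cost moves out of the "each execution" loop.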

Posted: **Tue Sep 18, 2012 12:45 pm**

Do not confuse encryption with checksums; they serve different purposes.
To protect against code modification, the usual (and quite effective) way is code signing (i.e. a checksum).

Posted: **Tue Sep 18, 2012 8:26 pm**

An interpreted language couldn't be encrypted, because it would need to be decrypted before being run, which means the machine running it would need the decryption key, which the user could easily extract. However, using a bytecode instead of a pure interpreter (who uses a pure interpreter nowadays anyway?) would let you obfuscate the source about as well as compiling it to machine code would.

@bluemoon: encryption and signature algorithms are distinct, but definitely related. "checksum" usually refers to a non-cryptographic checksum/hash, and you also need more than even a cryptographic hash to implement a proper signature or message authentication code.
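The difference can be shown with Python's standard `hashlib` and `hmac` modules. A bare SHA-256 digest detects accidental corruption, but anyone who modifies the code can simply recompute it; an HMAC needs a secret key, so only a key holder can produce a valid tag. This is a sketch with a made-up key. Note that a MAC is still not a signature: the verifier must hold the same secret, whereas real code signing typically uses an asymmetric scheme (such as RSA or Ed25519) so the machine running the code only needs a public key.

```python
import hashlib
import hmac


def plain_digest(blob: bytes) -> str:
    # Non-keyed hash: detects corruption, but an attacker can recompute it.
    return hashlib.sha256(blob).hexdigest()


def check_plain(blob: bytes, digest: str) -> bool:
    return plain_digest(blob) == digest


def mac_tag(key: bytes, blob: bytes) -> bytes:
    # Keyed MAC: producing a valid tag requires the secret key.
    return hmac.new(key, blob, hashlib.sha256).digest()


def verify(key: bytes, blob: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(mac_tag(key, blob), tag)


KEY = b"hypothetical-secret-key"  # illustration only
original = b"\x01\x02bytecode"
tampered = b"\x01\x02malware!"

if __name__ == "__main__":
    tag = mac_tag(KEY, original)
    print(verify(KEY, original, tag))  # True
    print(verify(KEY, tampered, tag))  # False
    # An attacker who ships tampered code together with a recomputed plain
    # digest passes the non-keyed check:
    print(check_plain(tampered, plain_digest(tampered)))  # True
```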

Posted: **Tue Sep 18, 2012 9:28 pm**

Hi,

One way of improving performance is to find things that could be done less often, and do them less often. For example, you might move something from inside a loop to outside it. With this in mind, here's a set of empty loops:

    for each time executable is installed {
        for each time executable is executed {
            for each piece of code in the executable {
            }
        }
    }

Here's the least possible overhead:

    compile_source();
    link_libraries();
    for each time executable is installed {
        copy_the_file();
        for each time executable is executed {
            load_the_file();
            for each piece of code in the executable {
                /* Do nothing - CPU does this itself */
            }
        }
    }

Here's the most possible overhead:

    for each time executable is installed {
        copy_the_file();
        for each time executable is executed {
            load_the_file();
            for each piece of code in the executable {
                interpret_piece_of_code();
            }
        }
    }

Here's the typical way native executables are handled:

    compile_source();
    link_some_libraries();
    for each time executable is installed {
        copy_the_file();
        for each time executable is executed {
            load_the_file();
            link_remaining_libraries();
            for each piece of code in the executable {
                /* Do nothing - CPU does this itself */
            }
        }
    }

Here's Java:

    compile_source_to_byte_code();
    for each time executable is installed {
        if( file_not_in_cache ) copy_the_file();
        for each time executable is executed {
            load_the_file();
            for each piece of code in the executable {
                if(piece_not_compiled) compile_byte_code_for_piece();
                execute_piece();
            }
        }
    }

Here's what I'd be tempted to do (note: intended for a distributed system where different computers may be different):

    compile_source_to_byte_code();
    for each time executable is installed {
        copy_the_file();
        for each time executable is executed {
            if( native_file_not_present ) {
                check_checksum();
                compile_byte_code_to_native();
            }
            load_the_file();
            for each piece of code in the executable {
                /* Do nothing - CPU does this itself */
            }
        }
    }
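The loops above can be turned into a rough cost model: count how many times each step runs for a given number of installs, executions per install, and pieces of code. This is a sketch under simplifying assumptions: every step has unit cost (in reality compiling costs far more than loading), the Java cache is assumed empty so the file is always copied, JIT-compiled pieces are assumed not to persist between executions (as in the Java loop), and the strategy names are hypothetical labels for the loops.

```python
from collections import Counter
from typing import Dict, List

# Which steps run at which nesting level of the loops. "first_run" models
# `if( native_file_not_present )`: assumed true only on the first execution
# after each install, because the native file is then kept.
STRATEGIES: Dict[str, Dict[str, List[str]]] = {
    "least_possible": {"once": ["compile_source", "link_libraries"],
                       "install": ["copy_the_file"],
                       "run": ["load_the_file"]},
    "interpreted": {"install": ["copy_the_file"],
                    "run": ["load_the_file"],
                    "piece": ["interpret_piece_of_code"]},
    "typical_native": {"once": ["compile_source", "link_some_libraries"],
                       "install": ["copy_the_file"],
                       "run": ["load_the_file", "link_remaining_libraries"]},
    "java_like": {"once": ["compile_source_to_byte_code"],
                  "install": ["copy_the_file"],
                  "run": ["load_the_file"],
                  "piece": ["compile_byte_code_for_piece", "execute_piece"]},
    "native_on_first_run": {"once": ["compile_source_to_byte_code"],
                            "install": ["copy_the_file"],
                            "first_run": ["check_checksum",
                                          "compile_byte_code_to_native"],
                            "run": ["load_the_file"]},
}


def count_steps(strategy: str, installs: int, runs: int, pieces: int) -> Counter:
    """Count how many times each step executes (every step has unit cost)."""
    # How many times the body of each loop level executes in total.
    multiplier = {
        "once": 1,
        "install": installs,
        "first_run": installs if runs > 0 else 0,
        "run": installs * runs,
        "piece": installs * runs * pieces,
    }
    counts: Counter = Counter()
    for level, steps in STRATEGIES[strategy].items():
        for step in steps:
            counts[step] += multiplier[level]
    return counts


if __name__ == "__main__":
    for name in STRATEGIES:
        total = sum(count_steps(name, installs=1, runs=100, pieces=10_000).values())
        print(f"{name:20} {total:>10}")
```

With one install, 100 runs and 10,000 pieces, the interpreted and per-run JIT loops are dominated by per-piece work, while compiling to native once per install costs only a few steps more than the least-overhead case.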

Now it's your turn - add whatever you like to the "set of empty loops" at the top!

Cheers,

Brendan

Posted: **Tue Nov 06, 2012 12:40 pm**

Bytecode was actually what I had in mind.

@Brendan: You don't even have to bother with that; do you know the LLVM library? It does all these things for you. You create your language (or choose an existing one) and write a front end that generates LLVM bytecode for each construct in the language. Then you can load this bytecode, and LLVM will read it itself and generate native code for the target of your choice. By the way, it also has a sister project, Clang, a compiler that can compile C, C++ and Objective-C into that bytecode (it is a very good cross-compiler!). LLVM is used by many well-known projects of various kinds: a video decoder, virtual machines, programming language compilers and implementations (Adobe's Hydra, a second Python implementation, a Google graphics engine used on Android). Also, Apple uses Clang's predecessor, LLVM-GCC (an adaptation of GCC that uses LLVM), as its default C compiler (on a Mac, $(CC) is set to llvm-gcc; I read this in the wiki chapter about cross-compilers).

As for the checksum, that's an excellent idea I hadn't thought of. Sorry, but I don't know much about executable formats yet.

Posted: **Tue Nov 06, 2012 2:16 pm**

AbstractYouShudNow wrote:@Brendan : You don't even have to bother about that, do you know the LLVM library ?

I don't believe there is anything Brendan is working on that he doesn't know about; he always does enough research.
By the way, I guess Brendan is looking for (or waiting for) something that cannot be done, or is difficult to do, with the traditional approach; that's why he's exploring new ways.

Posted: **Sun Nov 11, 2012 1:14 am**

LLVM, despite its name, is not a virtual machine. The IR it generates represents a machine with no concept of registers or memory*.

While it does contain an interpreter and a JIT as components, neither is really meant to fill the role of the JVM or the Python interpreter, for example. Neither component is designed to be heavily relied on; for example, the JIT has no JVM-like hotspot tracing. This means that if you're relying on a fast JIT, you miss opportunities for the optimizer to run.
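For readers unfamiliar with the term: a hotspot-style JIT counts how often each piece of code runs, interprets cold code, and only spends compile time on code that turns out to be hot. A toy sketch of that policy follows; the threshold of 3 is arbitrary, the `HotspotRunner` class is hypothetical, and Python's `eval` on a lambda stands in for a real compiler, so this shows the bookkeeping, not real code generation.

```python
from typing import Callable, Dict, List, Tuple


class HotspotRunner:
    """Toy tiered execution: interpret cold functions, 'compile' hot ones."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.calls: Dict[str, int] = {}
        self.compiled: Dict[str, Callable[[int], int]] = {}
        self.log: List[Tuple[str, str]] = []

    def call(self, name: str, expr: str, x: int) -> int:
        if name in self.compiled:
            # Already hot: reuse the compiled version with no extra cost.
            self.log.append(("native", name))
            return self.compiled[name](x)
        self.calls[name] = self.calls.get(name, 0) + 1
        if self.calls[name] >= self.threshold:
            # Just became hot: pay the "compile" cost once.
            self.compiled[name] = eval(f"lambda x: {expr}")
            self.log.append(("compile", name))
            return self.compiled[name](x)
        # Cold: "interpret" by evaluating the expression from source each time.
        self.log.append(("interpret", name))
        return eval(expr, {}, {"x": x})


if __name__ == "__main__":
    runner = HotspotRunner(threshold=3)
    print([runner.call("double", "x * 2", i) for i in range(5)])
    print([kind for kind, _ in runner.log])
```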

Also, the bitcode it generates is not portable between architectures. It might happen to be for a particular output, but that would be a fluke. And the emitted bitcode is also not inherently memory-safe, so you don't get any wins from reduced hardware memory security checks either (nor is it directly amenable to analysis, since if the bitcode comes from C, then it's as memory-unsafe as C).

*strictly speaking it has some concept of memory... "memory is a big undifferentiated blob of space" is the memory model, more or less. Not very restrictive, and therefore not good for JIT or virtual machines.
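To make the contrast concrete: in a managed VM the interpreter itself owns the memory model, so every load and store can be checked before it happens, which a flat "big blob" model does not give you. The following is a toy stack machine with invented opcodes (`PUSH`, `ADD`, `STORE`, `LOAD`, `HALT`) purely for illustration; it is not LLVM IR or any real bytecode format.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class Op(Enum):
    PUSH = auto()   # push an immediate value
    ADD = auto()    # pop two values, push their sum
    STORE = auto()  # pop address, pop value, store value at address
    LOAD = auto()   # pop address, push memory[address]
    HALT = auto()   # stop and return the top of the stack


class MemoryFault(Exception):
    pass


def run_program(program: List[Tuple], memory_size: int = 16) -> Optional[int]:
    memory = [0] * memory_size  # owned by the VM; programs never see host memory
    stack: List[int] = []
    for op, *args in program:
        if op is Op.PUSH:
            stack.append(args[0])
        elif op is Op.ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op in (Op.STORE, Op.LOAD):
            addr = stack.pop()
            # The check a flat, unmanaged memory model leaves to the programmer.
            # It also rejects negative addresses, which Python would wrap around.
            if not 0 <= addr < memory_size:
                raise MemoryFault(f"address {addr} out of bounds")
            if op is Op.STORE:
                memory[addr] = stack.pop()
            else:
                stack.append(memory[addr])
        elif op is Op.HALT:
            break
    return stack[-1] if stack else None


if __name__ == "__main__":
    ok = [(Op.PUSH, 2), (Op.PUSH, 3), (Op.ADD,), (Op.PUSH, 4), (Op.STORE,),
          (Op.PUSH, 4), (Op.LOAD,), (Op.HALT,)]
    print(run_program(ok))
    try:
        run_program([(Op.PUSH, 1), (Op.PUSH, 99), (Op.STORE,)])
    except MemoryFault as fault:
        print("rejected:", fault)
```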

Posted: **Mon Nov 19, 2012 12:42 pm**

Sorry, I didn't know LLVM in that depth. But I'm pretty sure one can take LLVM architecture as a basis and develop a clone with all the additional features we need, such as platform-independent bytecode, or a real VM.

Also, if every process runs in a VM, won't that definitely kill any hope of efficiency? I know Android systems already do that without losing much speed, but applications for PCs are much more complex than those for phones. I was actually thinking of a system with a VM for each type of program, starting the appropriate VM as a layer. My design is obviously much more complex than that, since I just love difficulty (that's why I'm using GAS instead of NASM), and it's just theory for the moment. Since the design is very complicated, I need to write megabytes of code to make even one step. But the cool thing is that once the heavy foundation is built, everything else follows by itself. That's a design approach I use a lot and find very productive, though it requires a strong mind. It is just like building complicated electrical components and then connecting them with wires; anyone can then plug in new components through new wires to extend the functionality.
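The "one VM per program type, started as a layer" idea could be sketched as a small registry that picks a VM by looking at an executable's header. The 4-byte magic strings and VM names below are hypothetical; the point is only that supporting a new kind of program means registering one more entry, in the spirit of plugging new components into existing wires.

```python
from typing import Callable, Dict

# Hypothetical: map a 4-byte magic header to the VM that can run that format.
VM_REGISTRY: Dict[bytes, Callable[[bytes], str]] = {}


def register_vm(magic: bytes):
    """Decorator that plugs a new VM into the registry."""
    if len(magic) != 4:
        raise ValueError("magic must be exactly 4 bytes")

    def wrap(factory: Callable[[bytes], str]) -> Callable[[bytes], str]:
        VM_REGISTRY[magic] = factory
        return factory

    return wrap


@register_vm(b"BYC1")
def bytecode_vm(image: bytes) -> str:
    return f"bytecode VM running {len(image) - 4} bytes"


@register_vm(b"SCR1")
def script_vm(image: bytes) -> str:
    return f"script VM running {len(image) - 4} bytes"


def launch(image: bytes) -> str:
    # Dispatch on the header; unknown formats are refused rather than guessed.
    magic = image[:4]
    vm = VM_REGISTRY.get(magic)
    if vm is None:
        raise ValueError(f"no VM registered for format {magic!r}")
    return vm(image)


if __name__ == "__main__":
    print(launch(b"BYC1" + bytes(10)))
    print(launch(b"SCR1abc"))
```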

On the other hand, LLVM is quite a big piece of code, and analysing it may take a while, but I'm sure we could eventually get a good basis from it. Don't you think?

Posted: **Mon Nov 19, 2012 1:32 pm**

But I'm pretty sure one can take LLVM architecture as a basis and develop a clone with all the additional features we need, such as platform-independent bytecode, or a real VM.

I'm pretty sure you can't. LLVM does not manage code and is not a virtual machine. It is the IR for a compiler and thus is simply the original source code in another form. It will not provide the basis for virtualisation any more than the original C code or its GCC-compiled equivalent would.

Posted: **Mon Nov 19, 2012 9:24 pm**

gerryg400 wrote:
But I'm pretty sure one can take LLVM architecture as a basis and develop a clone with all the additional features we need, such as platform-independent bytecode, or a real VM.
I'm pretty sure you can't. LLVM does not manage code and is not a virtual machine. It is the IR for a compiler and thus is simply the original source code in another form. It will not provide the basis for virtualisation any more than the original C code or its GCC-compiled equivalent would.

You might be able to develop a semi-managed bytecode as a backend target for the LLVM toolchain, although changing the IR to do this would be silly, and it's sort of independent of LLVM (even though it would be easiest to develop using LLVM's tools.)

The place of virtualization in Operating Systems