The place of virtualization in Operating Systems

AbstractYouShudNow
Member
Posts: 92
Joined: Tue Aug 14, 2012 8:51 am

The place of virtualization in Operating Systems

Post by AbstractYouShudNow »

In your view, what is the place of virtualization in today's operating systems?

I think this is one of the best ways to abstract the architecture away.
How about an OS where (almost) all programs are written in managed code and the OS includes a VM for their execution (as LLVM, .NET or Java do)?

With a sufficiently well-designed managed language, and APIs designed with this goal in mind, I am sure it is possible to run the same application on any platform.

Android already demonstrated this, and another example is how JavaScript in web pages runs under almost any Operating System, whatever the architecture and processor happen to be.

So, is it really possible, and is it worth it?

EDIT:
- Maybe 'managed' is not the appropriate term. I meant any language that can be semi-compiled (as an interpreted language would be too slow and wouldn't allow closed-source programs), as Java is (or simply LLVM).
- I have already planned what the architecture would look like. I still have to work on it, but should get it finished soon.
Antti
Member
Posts: 923
Joined: Thu Jul 05, 2012 5:12 am
Location: Finland

Re: The place of virtualization in Operating Systems

Post by Antti »

AbstractYouShudNow wrote: as an interpreted language would be too slow and wouldn't allow closed-source programs
I am not an expert on this topic, but I think there are some ways to solve the closed-source issue. The source code could be encrypted or something like that. I am actually surprised that this has not been implemented more often. Of course, it may also be that I am simply not aware of it.
AbstractYouShudNow
Member
Posts: 92
Joined: Tue Aug 14, 2012 8:51 am

Re: The place of virtualization in Operating Systems

Post by AbstractYouShudNow »

The problem with encryption is that it requires additional effort (time and resources) to decrypt the code, and it would still be easy to get the decrypted source code.

But I don't really mind the code having to be open source. The actual issues I found are that it would be very easy for someone to modify the program to introduce malware, and that parsing the code would be slow, though it could be compiled once and cached, as the Python interpreter does...
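
The caching idea can be illustrated with CPython itself, which stores compiled bytecode in `.pyc` files so that the source is not parsed again on every run. Here is a minimal sketch using only the standard library; the source string and the `<demo>` file name are made up for illustration:

```python
import marshal

# Hypothetical program source; in practice this would be read from a file.
source = "result = sum(range(10))"

# Parse and compile once. This is the expensive step a cache avoids.
code = compile(source, "<demo>", "exec")

# Serialise the code object. CPython's .pyc files hold marshalled code
# objects in a similar way (plus a header used to detect stale caches).
cached = marshal.dumps(code)

# A later run can load the cached bytes instead of parsing the source again.
namespace = {}
exec(marshal.loads(cached), namespace)
result = namespace["result"]
```

Note that a cache like this does nothing against tampering: the cached bytes can be modified just as easily as the source.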
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: The place of virtualization in Operating Systems

Post by bluemoon »

Do not mix up encryption with checksums; they serve different purposes.
To protect against code modification, the usual (and quite effective) way is code signing (i.e. a checksum).
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: The place of virtualization in Operating Systems

Post by NickJohnson »

An interpreted language couldn't be encrypted, because it would need to be decrypted before being run, which means the machine running it would need the decryption key, which could be easily extracted by the user. However, using a bytecode instead of a pure interpreter (who uses a pure interpreter nowadays anyway?) would mean you could obfuscate the source as well as you would have been able to by compiling it to machine code.

@bluemoon: encryption and signature algorithms are distinct, but definitely related. "checksum" usually refers to a non-cryptographic checksum/hash, and you also need more than even a cryptographic hash to implement a proper signature or message authentication code.
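
To make the distinction concrete, here is a small sketch with Python's standard library. It compares a non-cryptographic checksum (CRC-32), an unkeyed cryptographic hash (SHA-256) and a keyed MAC (HMAC-SHA-256). The payload and key are invented for the example, and a real code-signing scheme would typically use public-key signatures so that the verifier does not hold the signing secret.

```python
import hashlib
import hmac
import zlib

program = b"print('hello')"    # hypothetical program image
tampered = b"print('pwned')"   # attacker-modified copy
key = b"publisher-secret"      # hypothetical key the attacker lacks

# Unkeyed values: anyone who changes the file can simply recompute them,
# so shipping them next to the file does not prove who produced it.
crc = zlib.crc32(program)
digest = hashlib.sha256(program).hexdigest()
attacker_crc = zlib.crc32(tampered)
attacker_digest = hashlib.sha256(tampered).hexdigest()

# Keyed MAC: producing a valid tag requires the key.
tag = hmac.new(key, program, hashlib.sha256).digest()


def verify(data: bytes, received_tag: bytes) -> bool:
    """Return True only if the tag matches the data under our key."""
    expected = hmac.new(key, data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)


original_ok = verify(program, tag)    # genuine file, genuine tag
tampered_ok = verify(tampered, tag)   # modified file, old tag
```

The attacker can publish `attacker_crc` or `attacker_digest` alongside the modified file and a naive check would pass, whereas forging a tag that `verify` accepts requires the key.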
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: The place of virtualization in Operating Systems

Post by Brendan »

Hi,

One way of improving performance is to find things that could be done less often and do them less often. For example, you might shift something from inside a loop to outside the loop. With this in mind, here's a set of empty loops:

    for each time executable is installed {
        for each time executable is executed {
            for each piece of code in the executable {
            }
        }
    }
Here's the least possible overhead:

    compile_source();
    link_libraries();
    for each time executable is installed {
        copy_the_file();
        for each time executable is executed {
            load_the_file();
            for each piece of code in the executable {
                /* Do nothing - CPU does this itself */
            }
        }
    }
Here's the most possible overhead:

    for each time executable is installed {
        copy_the_file();
        for each time executable is executed {
            load_the_file();
            for each piece of code in the executable {
                interpret_piece_of_code();
            }
        }
    }
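
One way to compare the two extremes above is to count how often each step runs. The sketch below is a toy model, not a measurement: the step names mirror the pseudocode, and the install, run and piece counts are arbitrary assumptions.

```python
from collections import Counter


def least_overhead(installs: int, runs: int, pieces: int) -> Counter:
    """Count the steps when everything is compiled and linked up front."""
    steps = Counter(compile_source=1, link_libraries=1)
    for _ in range(installs):
        steps["copy_the_file"] += 1
        for _ in range(runs):
            steps["load_the_file"] += 1
            # Nothing per piece ('pieces' is unused on purpose):
            # the CPU executes native code directly.
    return steps


def most_overhead(installs: int, runs: int, pieces: int) -> Counter:
    """Count the steps when every piece is interpreted on every run."""
    steps = Counter()
    for _ in range(installs):
        steps["copy_the_file"] += 1
        for _ in range(runs):
            steps["load_the_file"] += 1
            for _ in range(pieces):
                steps["interpret_piece_of_code"] += 1
    return steps


# Arbitrary example: 2 installs, 100 runs per install, 1000 pieces of code.
native = least_overhead(2, 100, 1000)
interpreted = most_overhead(2, 100, 1000)
```

In this model the interpreter's extra work grows with installs × runs × pieces, while the native case pays its compile and link cost exactly once.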
Here's the typical way native executables are handled:

    compile_source();
    link_some_libraries();
    for each time executable is installed {
        copy_the_file();
        for each time executable is executed {
            load_the_file();
            link_remaining_libraries();
            for each piece of code in the executable {
                /* Do nothing - CPU does this itself */
            }
        }
    }
Here's Java:

    compile_source_to_byte_code();
    for each time executable is installed {
        if( file_not_in_cache ) copy_the_file();
        for each time executable is executed {
            load_the_file();
            for each piece of code in the executable {
                if(piece_not_compiled) compile_byte_code_for_piece();
                execute_piece();
            }
        }
    }
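
The "compile each piece on first use" pattern in the Java loop is essentially a cache in front of a compiler. Here is a minimal sketch, using Python's built-in `compile()` as a stand-in for a JIT; the `PieceCache` class and the example expressions are hypothetical and do not reflect how any real JVM is structured:

```python
class PieceCache:
    """Compile each piece of code the first time it runs, then reuse it."""

    def __init__(self):
        self._compiled = {}      # piece name -> compiled code object
        self.compile_count = 0   # how many times we paid the compile cost

    def execute(self, name, source, env):
        code = self._compiled.get(name)
        if code is None:
            # piece_not_compiled: pay the compile cost once.
            code = compile(source, name, "eval")
            self._compiled[name] = code
            self.compile_count += 1
        # execute_piece: evaluate the cached code object.
        return eval(code, {}, env)


cache = PieceCache()
pieces = {"square": "x * x", "double": "x + x"}

# Run every piece several times; each one is compiled only once.
results = [cache.execute(name, src, {"x": n})
           for n in range(5)
           for name, src in pieces.items()]
```

After the loop, `cache.compile_count` equals the number of distinct pieces, no matter how many times each one was executed.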

Here's what I'd be tempted to do (note: intended for a distributed system where different computers may be different):

    compile_source_to_byte_code();
    for each time executable is installed {
        copy_the_file();
        for each time executable is executed {
            if( native_file_not_present ) {
                check_checksum();
                compile_byte_code_to_native();
            }
            load_the_file();
            for each piece of code in the executable {
                /* Do nothing - CPU does this itself */
            }
        }
    }
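
The last scheme above (keep portable byte code, verify it, and translate it to native code only when no native file exists yet) could be sketched as follows. Everything here is an assumption for illustration: `translate` is a placeholder that merely tags the bytes, the `.native` cache file name is made up, and SHA-256 stands in for whatever integrity check the system would actually use.

```python
import hashlib
import tempfile
from pathlib import Path

translate_calls = 0


def translate(byte_code: bytes) -> bytes:
    """Placeholder for a real byte-code-to-native compiler."""
    global translate_calls
    translate_calls += 1
    return b"NATIVE:" + byte_code


def run(byte_code: bytes, expected_sha256: str, cache_dir: Path) -> bytes:
    """Return the native image, translating only if it is not cached yet."""
    native_file = cache_dir / (expected_sha256 + ".native")
    if not native_file.exists():
        # check_checksum(): refuse to translate modified byte code.
        if hashlib.sha256(byte_code).hexdigest() != expected_sha256:
            raise ValueError("byte code does not match its checksum")
        # compile_byte_code_to_native(): done once per machine.
        native_file.write_bytes(translate(byte_code))
    # load_the_file(): every run just loads the cached native image.
    return native_file.read_bytes()


byte_code = b"\x01\x02\x03"                       # hypothetical byte code
good_sum = hashlib.sha256(byte_code).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    first = run(byte_code, good_sum, Path(tmp))   # translates and caches
    second = run(byte_code, good_sum, Path(tmp))  # served from the cache
```

As in the pseudocode, the checksum is only checked before translation; the cached native file is trusted afterwards.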
Now it's your turn - add whatever you like to the "set of empty loops" at the top! :)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
AbstractYouShudNow
Member
Posts: 92
Joined: Tue Aug 14, 2012 8:51 am

Re: The place of virtualization in Operating Systems

Post by AbstractYouShudNow »

Bytecode is actually how I was thinking about the concept.

@Brendan: You don't even have to bother with that. Do you know the LLVM library? It does all these things for you. You create your language (or choose an existing one) and write a wrapper that generates LLVM bytecode for each construct in the language. Then you can load this bytecode, and LLVM will read it and generate native code for the target of your choice. By the way, it also has a companion project, Clang, a compiler that can compile C, C++ and Objective-C into that bytecode (it is a very good cross-compiler!). LLVM is used by many well-known and varied projects: a video decoder, virtual machines, and programming language compilers and tools (Adobe's Hydra, a second Python implementation, a Google graphics engine used on Android). Also, Apple uses Clang's predecessor, llvm-gcc (actually an adaptation of GCC that uses LLVM), as its default C compiler (on a Mac, $(CC) is set to llvm-gcc; I read this in the wiki chapter about cross-compilers).

As for the checksum, it's an excellent idea I hadn't thought of. Sorry, but I don't yet know much about executable formats :)
bluemoon
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: The place of virtualization in Operating Systems

Post by bluemoon »

AbstractYouShudNow wrote: @Brendan: You don't even have to bother with that. Do you know the LLVM library?
I don't believe there is anything relevant to what Brendan is working on that he does not know about. He always does enough research.
By the way, I guess Brendan is looking for (or waiting for) something that cannot be done, or is difficult to do, with the traditional approach; that's why he's exploring new ways.
linuxfood
Member
Posts: 38
Joined: Wed Dec 31, 2008 12:22 am

Re: The place of virtualization in Operating Systems

Post by linuxfood »

LLVM, despite its misleading name, is not a virtual machine. The IR that is generated represents a machine with no concept of registers or memory*.

While it does contain an interpreter and a JIT as components, neither is really meant to fill the role of the JVM or the Python interpreter, for example. Neither component is designed to be heavily relied on - for example, the JIT has no JVM-like hotspot tracing. This means that if you're relying on a fast JIT, you miss opportunities for the optimizer to run.
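
For readers unfamiliar with the term, hotspot-style JIT compilation roughly means: run code in a cheap mode first, count how often each function runs, and spend optimisation effort only on the functions that turn out to be hot. A toy sketch of that counter-based policy follows. The threshold, the `TieredFunction` class and the "optimised" variant are all invented for illustration and say nothing about how the JVM or LLVM actually implement this.

```python
HOT_THRESHOLD = 3  # arbitrary: calls before a function counts as hot


class TieredFunction:
    """Start with a slow version; switch to a fast one once it is hot."""

    def __init__(self, slow, make_fast):
        self._impl = slow            # stand-in for interpreted code
        self._make_fast = make_fast  # stand-in for the optimising compiler
        self.calls = 0
        self.optimised = False

    def __call__(self, *args):
        self.calls += 1
        if not self.optimised and self.calls >= HOT_THRESHOLD:
            # The function is hot: pay the compile cost once.
            self._impl = self._make_fast()
            self.optimised = True
        return self._impl(*args)


def slow_sum(n):
    """Naive loop, standing in for unoptimised code."""
    total = 0
    for i in range(n + 1):
        total += i
    return total


def make_fast_sum():
    """Return an 'optimised' equivalent using the closed-form formula."""
    return lambda n: n * (n + 1) // 2


tiered_sum = TieredFunction(slow_sum, make_fast_sum)
history = []
for _ in range(5):
    value = tiered_sum(100)
    history.append((value, tiered_sum.optimised))
```

The result is the same on every call; only the implementation behind it changes once the call counter crosses the threshold.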

Also, the bitcode it generates is not portable between architectures. It might be for a particular output, but that would be a fluke. And the emitted bitcode is not inherently memory safe either, so you don't get any wins from reduced hardware memory security checks (nor is it directly amenable to analysis, since if the bitcode is from C, then it's as memory-unsafe as C).



*strictly speaking it has some concept of memory... "memory is a big undifferentiated blob of space" is the memory model, more or less. Not very restrictive, and therefore not good for JIT or virtual machines.
AbstractYouShudNow
Member
Posts: 92
Joined: Tue Aug 14, 2012 8:51 am

Re: The place of virtualization in Operating Systems

Post by AbstractYouShudNow »

Sorry, I didn't know LLVM in that much depth. But I'm pretty sure one can take the LLVM architecture as a basis and develop a clone with all the additional features we need, such as platform-independent bytecode, or a real VM.

Also, if every process runs in a VM, won't that definitely kill any hope of efficiency? I know Android systems already do that without losing much speed, but applications for PCs are much more complex than ones for phones. I was actually thinking of a system with a VM for each type of program, which starts the appropriate VM as a layer. My design is obviously much more complex than that, since I just love difficulty (that's why I'm using GAS instead of NASM :D), and it's just theory for the moment. Since the design is very complicated, I need to write megabytes of code to make even one step forward. But the cool thing is that once the heavy foundation is built, everything else follows by itself. That's a design approach I use a lot, and it proves to be very productive, though it requires a strong mind. It is just like building complicated electrical components and then connecting them with wires. It then allows anyone to plug in new components through new wires to extend the functionality.

On the other hand, LLVM is quite a big piece of code, and analysing it may be hard, but I'm sure we can eventually get a good basis from it. Don't you think?
gerryg400
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: The place of virtualization in Operating Systems

Post by gerryg400 »

But I'm pretty sure one can take the LLVM architecture as a basis and develop a clone with all the additional features we need, such as platform-independent bytecode, or a real VM.
I'm pretty sure you can't. LLVM does not manage code and is not a virtual machine. It is the IR for a compiler and thus is simply the original source code in another form. It will not provide the basis for virtualisation any more than the original C code or its GCC-compiled equivalent would.
If a trainstation is where trains stop, what is a workstation ?
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: The place of virtualization in Operating Systems

Post by NickJohnson »

gerryg400 wrote:
But I'm pretty sure one can take the LLVM architecture as a basis and develop a clone with all the additional features we need, such as platform-independent bytecode, or a real VM.
I'm pretty sure you can't. LLVM does not manage code and is not a virtual machine. It is the IR for a compiler and thus is simply the original source code in another form. It will not provide the basis for virtualisation any more than the original C code or its GCC-compiled equivalent would.
You might be able to develop a semi-managed bytecode as a backend target for the LLVM toolchain, although changing the IR to do this would be silly, and it's sort of independent of LLVM (even though it would be easiest to develop using LLVM's tools.)