Why VMs are good, mmmmkay

Colonel Kernel · Post by **Colonel Kernel** » Sat Apr 02, 2005 11:47 pm

AR wrote: I am not an "everything should be open source" fanatic; what I meant is that you could pre-compile and include multiple binaries for different targets.

Ah, my bad. I misunderstood your point the first time.

AR wrote: I may not be aware of how libraries function, but I was under the impression that they had .text (read-only), local per-instance .data/.bss (read-write), and a global data section shared between all instances. If this isn't how it's done then, again, that's a flaw in the design.

That is how it's done, but in addition to that the heap is typically global to the entire process, and it plus the thread stacks are all read/write. Big problems occur when this memory gets stomped on.

Colonel Kernel wrote: And what part of the system is going to enforce the use of this mechanism...?

AR wrote: I imagine the Kernel would. You provide a buffer in Kernel space to the library which the library can interact with using a pipe. If that is the only way the library can share data across instances then the programmer is going to have to use it.

What you're describing is more like the client/server model in a microkernel rather than shared libraries. The good thing about this model is that it's quite robust... after all, that's why processes have separate address spaces. Unfortunately, performance would suffer way too much if every library had to communicate with the others and with the app via message-passing. The right balance between the two approaches is needed, and the balance shifts depending on the requirements of the application or system as a whole.

AR wrote: It is impossible for machine code to be obsolete unless you're going to invent a system where the code is interpreted by something that doesn't require hardware.

I was talking about the way applications are pre-compiled to machine code, distributed as binaries targeted to the specific platform, and run in a manner that gives the language run-time environment very little (or no) supervision over their execution and access to their own memory space.

Colonel Kernel · Post by **Colonel Kernel** » Sun Apr 03, 2005 12:10 am

Anyway, here's more fuel for the fire (possibly the last):

Why I like VMs (managed run-time environments? I'm not sure what to call them any more) #3: Reflection

Reflection allows for some pretty neat features at very little cost to the developer. Most of them involve the ability of the system to explore the structure of your types at run-time and use this information to transform instances of those types in some way.

A typical use of reflection is serialization. Let's say you have a big tree of objects that you want written out to a file. In C++, you could do something like make each class derive from some Serializable base class and use the Visitor pattern to traverse the tree, calling serialize() on each node, passing it an ostream. That's a lot of work for the person writing all those classes.
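To make the "lot of work" concrete, here is a minimal sketch of the hand-written C++ approach. The `Serializable` base class and `serialize()` follow the description above; the fields of `Thing`, the text format, and the helper `serializeSampleTree()` are assumptions made up for illustration, and a plain recursive traversal stands in for a full Visitor.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every serializable type derives from it
// and must hand-write serialize().
struct Serializable {
    virtual ~Serializable() = default;
    virtual void serialize(std::ostream& out) const = 0;
};

// A tree node. The name Thing follows the thread; the fields are invented.
struct Thing : Serializable {
    std::string name;
    int value = 0;
    std::vector<std::unique_ptr<Thing>> children;

    Thing(std::string n, int v) : name(std::move(n)), value(v) {}

    // Hand-written: every field has to be listed here and kept in sync
    // with the class definition whenever it changes.
    void serialize(std::ostream& out) const override {
        out << name << ' ' << value << ' ' << children.size() << '\n';
        for (const auto& child : children) {
            child->serialize(out);  // recurse into the subtree
        }
    }
};

// Builds a two-node tree and returns its serialized text.
std::string serializeSampleTree() {
    Thing root("root", 1);
    root.children.push_back(std::make_unique<Thing>("leaf", 2));
    std::ostringstream out;
    root.serialize(out);
    return out.str();
}
```

Every new class in the hierarchy needs its own `serialize()` (and a matching reader, omitted here), which is the per-class cost being described.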

With a run-time environment that supports reflection and type annotation, it becomes possible to make your entire class hierarchy serializable just by marking each class declaration with a "Serializable" tag (and usually each class needs a default constructor as well, so they can be de-serialized). If you have a tree made up of objects of such annotated classes, you can create some kind of Serializer object provided by the run-time libraries, passing it the type ID of the class for the root of the tree (let's call that class Thing for the sake of this example). Upon construction, the Serializer will reflect over Thing, and all the types of its fields, all the while generating code that will serialize any Thing instance to any kind of Stream (file, network, whatever). All the developer who wrote Thing had to do was provide a default constructor and mark it with Serializable.

That's binary serialization, but there are other possibilities as well. This could be used to bind objects in an object-relational database to their counterparts in memory. It can be (and is) used to serialize object graphs to XML so they can be sent in SOAP messages (and de-serialized at the other end). Typically all that is required to make this work on the part of the developer is to write those little tags.

Another really neat use for reflection is in the automatic generation of proxy/stub code for remote calls (RPC, RMI, whatever you want to call it). Same kind of thing -- you have some type Thing. You tell the system's remoting facilities about Thing, and it will generate the code for a Thing proxy for you automatically. No more need for an explicit IDL compilation step to pre-generate proxy/stub binaries. Plus, all code generated by the run-time is verified by the run-time for type safety as an added bonus.

All of this stuff is a big time-saver for someone trying to write an app. It makes powerful infrastructure very easy to use. I think this sort of thing is very difficult to achieve without the help of a sophisticated run-time/VM/whatever you want to call it, and this is why I think such things are the future for most application development.

Now, let the tennis match resume. ;D

Pype.Clicker · Post by **Pype.Clicker** » Sun Apr 03, 2005 2:25 am

Colonel Kernel wrote: Anyway, here's more fuel for the fire (possibly the last):

Why I like VMs (managed run-time environments? I'm not sure what to call them any more) #3: Reflection

Reflection allows for some pretty neat features at very little cost to the developer. Most of them involve the ability of the system to explore the structure of your types at run-time and use this information to transform instances of those types in some way.

A typical use of reflection is serialization. Let's say you have a big tree of objects that you want written out to a file. In C++, you could do something like make each class derive from some Serializable base class and use the Visitor pattern to traverse the tree, calling serialize() on each node, passing it an ostream. That's a lot of work for the person writing all those classes.

Beware ... here comes Pype-the-long-ears again with his pre-processor-hammer

If your environment doesn't have reflection, nothing prevents you from adding symbolic information or generating the serializer/deserializer at compile time. I mean, no human is required to translate the list of members into a raw stream (be it ASN.1, XML, bencoded or whatever you want).
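One way to do this with nothing but the C preprocessor is the "X-macro" idiom: the member list is written once and expanded several times. This is only a sketch of the idea, not necessarily the tooling meant here; the fields, the `name=value;` format and the helper `serializeSample()` are invented for the example.

```cpp
#include <sstream>
#include <string>

// The member list is written exactly once (hypothetical fields).
#define THING_FIELDS(X)  \
    X(int, id)           \
    X(double, weight)    \
    X(std::string, label)

struct Thing {
    // First expansion: turn the list into member declarations.
#define DECLARE_FIELD(type, name) type name{};
    THING_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Second expansion: turn the same list into "name=value;" output.
std::string serialize(const Thing& t) {
    std::ostringstream out;
#define WRITE_FIELD(type, name) out << #name << '=' << t.name << ';';
    THING_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
    return out.str();
}

// Fills in a sample object and returns its serialized form.
std::string serializeSample() {
    Thing t;
    t.id = 7;
    t.weight = 1.5;
    t.label = "crate";
    return serialize(t);
}
```

Adding a field means touching only `THING_FIELDS`; the declaration and the serializer follow automatically, at the price of macro-heavy code.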

Am I wrong in thinking that, from time to time, you still have to write code for deserialization support (e.g. when you need to *reconstruct* state rather than storing it)?

AR · Post by **AR** » Sun Apr 03, 2005 6:26 am

Colonel Kernel wrote:
AR wrote: I may not be aware of how libraries function, but I was under the impression that they had .text (read-only), local per-instance .data/.bss (read-write), and a global data section shared between all instances. If this isn't how it's done then, again, that's a flaw in the design.
That is how it's done, but in addition to that the heap is typically global to the entire process, and it plus the thread stacks are all read/write. Big problems occur when this memory gets stomped on.

Colonel Kernel wrote: And what part of the system is going to enforce the use of this mechanism...?

AR wrote: I imagine the Kernel would. You provide a buffer in Kernel space to the library which the library can interact with using a pipe. If that is the only way the library can share data across instances then the programmer is going to have to use it.
Colonel Kernel wrote: What you're describing is more like the client/server model in a microkernel rather than shared libraries. The good thing about this model is that it's quite robust... after all, that's why processes have separate address spaces. Unfortunately, performance would suffer way too much if every library had to communicate with the others and with the app via message-passing. The right balance between the two approaches is needed, and the balance shifts depending on the requirements of the application or system as a whole.

The point of this design is not that the local instance won't crash (the program would have crashed anyway) but that the other instances will not crash, as the global data is not present in the address space to be corrupted to begin with. Global data should not be frequently operated on, so the performance cost should not be too severe. You can also build an "automatic backup of the previous file version" into the filesystem manager to prevent any aftereffects. Although it is prudent to point out that a .Net app can 'crash' with a logic error and spew rubbish into the output as well.

AR wrote: It is impossible for machine code to be obsolete unless you're going to invent a system where the code is interpreted by something that doesn't require hardware.
Colonel Kernel wrote: I was talking about the way applications are pre-compiled to machine code, distributed as binaries targeted to the specific platform, and run in a manner that gives the language run-time environment very little (or no) supervision over their execution and access to their own memory space.

This brings me back to my original point in the original thread: you do not need a VM to supervise execution. Why? One question: how does the program do anything (i.e. draw, read files, get user input, etc.)? Answer: OS APIs [I certainly wouldn't want to write software on Windows if they removed Kernel32.dll, gdi32.dll and user32.dll: no UI or file access...]. If the security model built into the APIs is properly designed with capabilities, then the same level of security can be achieved.
BTW, Windows does provide ways for other programs to arbitrarily probe and modify a process's memory; that's how "trainers" for games work.

Something interesting I remembered: the memory protection that comes from not directly accessing memory is in VB6 as well. You "Set C = new Class1" to instantiate an object, and it is automatically destroyed when it goes out of scope, like pre-compiled garbage collection. VB6 doesn't have pointers (and I remember cursing at it at times when I wanted to directly manipulate the bits), so that "advantage" also exists. So really what it comes down to is that VM-enabled languages feature a certain set of abilities that are considered "required", but you do not need the VMs for those abilities. I could possibly create a VB.Net/C# "native code" compiler if I wanted to; the language, not the environment, determines the development time and the number/severity of bugs. A proper OS security model protects the system, and a pointerless language reduces potential memory corruption, so apart from reflection (although I've never found it useful in the projects I experimented with) and ease of portability across platforms [at the expense of being forced to program for the lowest common denominator], what else is left?

Solar · Post by **Solar** » Sun Apr 03, 2005 7:28 am

Colonel Kernel, one thing about the "safeness" of garbage collection. This one has not been verified by me (since I don't do Java ATM), but comes from a source I trust:

When you build a Swing application, register an event handler for some widget, and then let the widget go out-of-scope without un-registering the event handler, you have a memory leak in your Java VM. The event handler is still registered and prevents the widget (and the containing frame) from being GC'ed. You also no longer have a handle on your widget, so you can't un-register the event handler.

That was true, AFAIK, up to and including JVM 1.4 - not sure about 1.5.

That is meant to say, GC's are still a long way from perfect.

Add to that that they - and VM's in general - add significant overhead, and you get a clear picture IMHO. VM's (and GC) are a very nice thing, especially for the industrial-level RAD done today. However, there will always be a break-even point at which the overhead becomes a headache. I'd agree that this break-even point has moved much, much lower in recent years, what with even mobile phones having the oomph to run a Java VM. But they aren't the cure to all diseases, even more so as they add another level of abstraction (and possible failure) between HW and application.

Bottom line: I agree that VM's will play an even bigger role in the future. But they aren't the cure-all, just like the internet, thin clients, and XML were just that: Good solutions for a given problem, but not the solution to solve all problems.

Candy · Post by **Candy** » Sun Apr 03, 2005 8:50 am

As with all the things that are proclaimed to be silver bullets and kill vampires (only good), they also kill people (do bad things).

Colonel Kernel · Post by **Colonel Kernel** » Sun Apr 03, 2005 11:24 am

I forgot to mention that reflection allows for some really sophisticated development tools as well (designers, debuggers, etc.). If I'm looking at a class I didn't write in the watch window of my debugger, it's really useful to be able to peek inside all its fields, and to be able to write arbitrary expressions (including method calls) in the watch window and see what they evaluate to. For the person writing the debugger, a lot of this comes for free from the run-time.

Anyway...

AR wrote: This brings me back to my original point in the original thread: you do not need a VM to supervise execution. Why? One question: how does the program do anything (i.e. draw, read files, get user input, etc.)? Answer: OS APIs ...snip... If the security model built into the APIs is properly designed with capabilities, then the same level of security can be achieved.

I'm not talking about security. We left that behind in the other thread. I'm talking about type-safety, which is a different animal. I would hate to have to make a system call every time I want my program to write to its own memory.

What I'm talking about is preventing this:

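    Thing* thing = new Thing();
    // Assuming Thing and Foo are unrelated types, the compiler accepts this
    // cast without complaint, but using the result is undefined behavior.
    Foo* foo = reinterpret_cast<Foo*>( thing );
    foo->letsCrashRightNow();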

It would be stupid to do this on purpose, but I've seen plenty of cases where things along these lines happen by accident.
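For contrast, here is a sketch of what a checked conversion looks like. A managed run-time performs this kind of check on every cast; in C++ the closest equivalent is the opt-in `dynamic_cast`, which only works on polymorphic types. The common `Base` class and the helper functions are assumptions added so the example is self-contained.

```cpp
#include <memory>

// Polymorphic root: dynamic_cast needs at least one virtual function.
struct Base {
    virtual ~Base() = default;
};

struct Thing : Base {};

struct Foo : Base {
    int letsNotCrash() const { return 42; }
};

// Returns the result of the call if the object really is a Foo, or -1 if not.
int callIfFoo(Base* object) {
    // dynamic_cast consults run-time type information and yields nullptr
    // on a mismatch instead of handing back a mistyped pointer.
    if (Foo* foo = dynamic_cast<Foo*>(object)) {
        return foo->letsNotCrash();
    }
    return -1;
}

// A Thing is not a Foo, so the cast is refused.
int tryWithThing() {
    auto thing = std::make_unique<Thing>();
    return callIfFoo(thing.get());
}

// A real Foo passes the check.
int tryWithFoo() {
    auto foo = std::make_unique<Foo>();
    return callIfFoo(foo.get());
}
```

The difference is that nothing in C++ forces anyone to use the checked form, whereas a type-safe run-time leaves no unchecked alternative.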

AR wrote: Something interesting I remembered: the memory protection that comes from not directly accessing memory is in VB6 as well. You "Set C = new Class1" to instantiate an object, and it is automatically destroyed when it goes out of scope, like pre-compiled garbage collection.

VB6 uses COM under the hood, so those references are really pointers to reference-counted objects. Reference counting is OK, but has its problems (cycles of objects can't be freed, there is overhead whenever references are copied around, etc.). Also, to do any real work in VB6 you usually end up having to write a COM object in C++ anyway, and then things get ugly (I've been writing COM objects in C++ for 4 years... maybe I should not put it on my resume if I don't want to do it anymore).
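The cycle problem is easy to reproduce. The sketch below uses `std::shared_ptr` as a stand-in for COM-style reference counting (an assumption; it is not how VB6 is implemented). The `Node` type, the live-object counter and both helper functions are invented for the demonstration.

```cpp
#include <memory>

static int liveNodes = 0;  // number of Node objects currently alive

struct Node {
    std::shared_ptr<Node> strongPeer;  // owning (counted) reference
    std::weak_ptr<Node> weakPeer;      // non-owning reference
    Node() { ++liveNodes; }
    ~Node() { --liveNodes; }
};

// Two nodes that own each other: neither count ever reaches zero.
// Returns how many nodes were still alive after all outside references died.
int leakedAfterStrongCycle() {
    int before = liveNodes;
    std::weak_ptr<Node> observer;  // only here so the demo can clean up
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strongPeer = b;
        b->strongPeer = a;
        observer = a;
    }  // a and b are gone, but each node still holds the other
    int leaked = liveNodes - before;
    if (auto a = observer.lock()) {
        a->strongPeer.reset();  // break the cycle by hand
    }
    return leaked;
}

// Same shape, but one direction is weak, so both nodes are destroyed.
int leakedAfterWeakCycle() {
    int before = liveNodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->strongPeer = b;
        b->weakPeer = a;
    }
    return liveNodes - before;
}
```

A tracing collector has no such problem, because it frees whatever is unreachable from the program's roots regardless of how the dead objects point at each other.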

Remember, I never said GC requires a VM, just that it's one feature of these VMs that is quite compelling.

...continued...

Colonel Kernel · Post by **Colonel Kernel** » Sun Apr 03, 2005 11:30 am

...

Pype.Clicker wrote: If your environment doesn't have reflection, nothing prevents you from adding symbolic information or generating the serializer/deserializer at compile time.

Sure, but doing it with a pre-processor is a pain in the ***.

Plus, why have that code pre-generated if it doesn't need to be? Let's say you're writing class Thing and selling it as part of your spiffy 3rd party library. Perhaps you can't predict whether or not people will care about (de-)serializing Things, so you make it Serializable just in case. Your binary is now maybe a few bytes bigger. If you pre-compiled all that functionality in, it would be a lot more, all for a feature that people may or may not use. Generally speaking, doing this kind of stuff at compile-time takes away flexibility.

Pype.Clicker wrote: Am I wrong in thinking that, from time to time, you still have to write code for deserialization support (e.g. when you need to *reconstruct* state rather than storing it)?

It depends on what kind of state is being reconstructed. If all the fields of your object can be automatically serialized, then they can be automatically deserialized as well. But if you have a transient field that doesn't get serialized, like say an open File or something, then a freshly-deserialized object would need to open the File for itself (although I would argue that objects that require these kinds of resources probably shouldn't be serialized in the first place).

Solar wrote: When you build a Swing application, register an event handler for some widget, and then let the widget go out-of-scope without un-registering the event handler, you have a memory leak in your Java VM. The event handler is still registered and prevents the widget (and the containing frame) from being GC'ed. You also no longer have a handle on your widget, so you can't un-register the event handler.

The bit about being unable to unregister the event handler is expected, but if the widget itself isn't being GC'd after going out of scope, that sounds like a bug for sure.

The way it should work is: Thing registers a handler with Widget. Now Widget refers to Thing, and something else refers to Widget. If all references to Widget are dropped first, then it can be GC'd, even though it refers to Thing. If all (other) references to Thing are dropped first, it will stay alive as long as Widget is referred to, but as soon as Widget becomes a candidate for collection, then Thing is too. If this isn't what happens, then I hope they fix it. In my experience (2+ years developing with .NET), this kind of thing has never happened to me.
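The lifetime rules just described can be modelled in a few lines. This is only an approximation: `std::shared_ptr` frees objects immediately, whereas a tracing GC frees them at some later collection, and the two agree here only because the reference graph has no cycle. `Widget`, `Thing`, the counters and the helper functions are all hypothetical.

```cpp
#include <memory>
#include <utility>
#include <vector>

static int liveThings = 0;   // Thing objects currently alive
static int liveWidgets = 0;  // Widget objects currently alive

struct Thing {
    Thing() { ++liveThings; }
    ~Thing() { --liveThings; }
};

struct Widget {
    std::vector<std::shared_ptr<Thing>> handlers;  // Widget refers to Thing
    Widget() { ++liveWidgets; }
    ~Widget() { --liveWidgets; }
    void registerHandler(std::shared_ptr<Thing> handler) {
        handlers.push_back(std::move(handler));
    }
};

struct Snapshot {
    int thingsWhileWidgetAlive;
    int thingsAfterWidgetDropped;
    int widgetsAfterWidgetDropped;
};

// Case 1: all other references to Thing go first; Widget keeps it alive.
Snapshot dropThingThenWidget() {
    auto widget = std::make_shared<Widget>();
    auto thing = std::make_shared<Thing>();
    widget->registerHandler(thing);
    thing.reset();  // only Widget refers to Thing now
    Snapshot s{};
    s.thingsWhileWidgetAlive = liveThings;
    widget.reset();  // Widget becomes unreachable, and Thing with it
    s.thingsAfterWidgetDropped = liveThings;
    s.widgetsAfterWidgetDropped = liveWidgets;
    return s;
}

// Case 2: all references to Widget go first; it is freed even though it
// refers to Thing, and Thing survives through its other reference.
bool widgetFreedWhileThingStillReferenced() {
    auto widget = std::make_shared<Widget>();
    auto thing = std::make_shared<Thing>();
    widget->registerHandler(thing);
    widget.reset();
    return liveWidgets == 0 && liveThings == 1;
}
```

In neither case does the handler keep the widget alive, since the reference runs from Widget to Thing and not the other way round.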

Solar wrote: Bottom line: I agree that VM's will play an even bigger role in the future. But they aren't the cure-all, just like the internet, thin clients, and XML were just that: Good solutions for a given problem, but not the solution to solve all problems.

We're approaching the rational happy medium.

I didn't say VMs were a cure-all or the solution to all problems, but I think the class of problems they solve is rapidly expanding.

Mainly, I was just tired of hearing that VMs are for the "lazy" or less capable developers. Then again, the people that say this are the same people who would probably write everything in assembler if they could.

Candy wrote: As with all the things that are proclaimed to be silver bullets and kill vampires (only good), they also kill people (do bad things).

;D

AR · Post by **AR** » Sun Apr 03, 2005 7:53 pm

Colonel Kernel wrote: Mainly, I was just tired of hearing that VMs are for the "lazy" or less capable developers. Then again, the people that say this are the same people who would probably write everything in assembler if they could.

If you are again referring to my original post, that was referring to the OS Developer (ie. Microsoft) who wouldn't bother fixing their shoddy OS.

Colonel Kernel wrote: VB6 uses COM under the hood, so those references are really pointers to reference-counted objects. Reference counting is OK, but has its problems (cycles of objects can't be freed, there is overhead whenever references are copied around, etc.). Also, to do any real work in VB6 you usually end up having to write a COM object in C++ anyway, and then things get ugly (I've been writing COM objects in C++ for 4 years... maybe I should not put it on my resume if I don't want to do it anymore).

Remember, I never said GC requires a VM, just that it's one feature of these VMs that is quite compelling.

I was not implying that GCs did require it; I have been reading your entire posts. In regards to VB6 using COM, you are picking apart something that relates to the implementation, not the language. Let's take REALBasic then: the syntax is more OOP-like but still very similar, it is also cross-platform (Linux/Mac/Windows) and doesn't rely on COM (although it does still use reference counting), and the feature set is practically the same again. The point here is that a virtual machine is not required for memory protection, only a decent language.

That pretty much sums up all my points really. A VM is practically an emulator, and all the features it provides do not require the VM. Sure, the features may be easier to provide, but that is the language implementer's problem. Anything that can be done in a VM can be done directly with the hardware: "Virtual Machine" = "Emulated Physical Hardware", nothing more, nothing less. The features of the languages and runtime environment do not require the VM to exist.
