Cross Platform Virtual Machine

zeitue · Post by **zeitue** » Sun Jul 28, 2013 2:10 am

dozniak wrote:If I was starting my first VM I'd decide it's not going to be production final design and go with RISC fixed-width instructions first just for simplicity of it, then I'd evaluate how it performs and adjust the design according to my measurements.

OK, that's actually what I was leaning towards. I'm just unsure about a lot of things.
After reading Comparison_of_CPU_architectures, I'm thinking of something SuperH- or MIPS-like.
Do you know of any resources on how to build a VM, or on the instruction sets of current VMs like Dalvik, Dis, Parrot, ...?

dozniak · Post by **dozniak** » Sun Jul 28, 2013 2:30 am

google.com is quite good at finding those.

MIPS is probably a nice instruction set to start with, but since you're designing your own you can just use it as a reference, no need to copy it entirely.
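
To make the "RISC fixed-width" idea concrete, here is a minimal Python sketch of packing and unpacking a 32-bit instruction word. The field layout (8-bit opcode plus three 8-bit register fields), the opcode number, and the names `encode` and `decode` are assumptions made up for illustration; this is not the real MIPS format.

```python
# Hypothetical 32-bit fixed-width layout (an assumption, not MIPS):
#   bits 31..24 opcode | 23..16 rd | 15..8 rs | 7..0 rt
OP_ADD = 0x01  # illustrative opcode number


def encode(opcode, rd, rs, rt):
    """Pack four 8-bit fields into one 32-bit instruction word."""
    for field in (opcode, rd, rs, rt):
        if not 0 <= field <= 0xFF:
            raise ValueError("each field must fit in 8 bits")
    return (opcode << 24) | (rd << 16) | (rs << 8) | rt


def decode(word):
    """Unpack a 32-bit word into (opcode, rd, rs, rt)."""
    return (
        (word >> 24) & 0xFF,
        (word >> 16) & 0xFF,
        (word >> 8) & 0xFF,
        word & 0xFF,
    )


word = encode(OP_ADD, 3, 1, 2)  # "add r3, r1, r2" in this made-up encoding
```

Because every instruction is the same width, the decoder never has to look at the bits to find where the next instruction starts, which is most of the simplicity argument.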

zeitue · Post by **zeitue** » Mon Jul 29, 2013 1:13 am

dozniak wrote:google.com is quite good at finding those.

MIPS is probably a nice instruction set to start with, but since you're designing your own you can just use it as a reference, no need to copy it entirely.

The SuperH is also quite nice.
I've found Dalvik, the Android virtual machine, to be quite nice. It's a process virtual machine, but it still uses a virtual instruction set with opcodes and the whole bit.

NickJohnson · Post by **NickJohnson** » Mon Jul 29, 2013 2:06 am

You still seem to be mixing virtual machines like the JVM, Dalvik, etc. in with real architectures like MIPS. This is not a good idea, because these "machines" are designed for very different purposes and under very different constraints. When you design a hardware architecture, you are concerned about things like how the pipeline is divided, what kind of arithmetic/bitwise operations you can perform with reasonable transistor cost, and how many registers you can afford; when you design a virtual "architecture" meant only to run in software, you are concerned about what is needed from the underlying OS, what sort of high-level language features will be built in (like garbage collection), and how easily the bytecode can be JIT compiled to different real architectures.

Kevin · Post by **Kevin** » Mon Jul 29, 2013 2:31 am

zeitue wrote:I need to make sure that this will run fast on most major architectures
I'm thinking a CISC machine might run RISC slower because it might have to do more operations that it's not used to?
but a RISC machine may also be slow running CISC because of the conversion to more instructions?

You want high-level operations so that your code generator has more context and can produce good code for either CISC or RISC hosts. Trying to reassemble CISC instructions from a series of RISC ones is much harder (and wouldn't perform well, I guess).

zeitue · Post by **zeitue** » Mon Jul 29, 2013 11:43 am

NickJohnson wrote:You still seem to be mixing virtual machines like the JVM, Dalvik, etc. in with real architectures like MIPS. This is not a good idea, because these "machines" are designed for very different purposes and under very different constraints. When you design a hardware architecture, you are concerned about things like how the pipeline is divided, what kind of arithmetic/bitwise operations you can perform with reasonable transistor cost, and how many registers you can afford; when you design a virtual "architecture" meant only to run in software, you are concerned about what is needed from the underlying OS, what sort of high-level language features will be built in (like garbage collection), and how easily the bytecode can be JIT compiled to different real architectures.

I was thinking that there are some things that are similar between the two. I also figured that my CPU is only ever going to be virtual, never real, so I would have less to worry about. Am I wrong?
Additionally, I figured the instruction sets would be similar as well, but of course I expected higher-level features from virtual machines like the JVM.

zeitue · Post by **zeitue** » Mon Jul 29, 2013 12:12 pm

Kevin wrote:
zeitue wrote:I need to make sure that this will run fast on most major architectures
I'm thinking a CISC machine might run RISC slower because it might have to do more operations that it's not used to?
but a RISC machine may also be slow running CISC because of the conversion to more instructions?
You want high-level operations so that your code generator has more context and can produce good code for either CISC or RISC hosts. Trying to reassemble CISC instructions from a series of RISC ones is much harder (and wouldn't perform well, I guess).

I'm really not sure; it could go either way.
It could run better as CISC on all machines, though I think it depends on the conversion of instructions.

An example would be doing the same operation in RISC and CISC:

CISC wrote:mov ax, 10
mov bx, 5
mul bx, ax

RISC wrote:mov ax, 0
mov bx, 10
mov cx, 5
Begin: add ax, bx
loop Begin

The total clock cycles for the CISC version might be:
(2 movs × 1 cycle) + (1 mul × 30 cycles) = 32 cycles

and for the RISC version:
(3 movs × 1 cycle) + (5 adds × 1 cycle) + (5 loops × 1 cycle) = 13 cycles

Looking at this, the RISC version looks faster, but the real question is the cost of converting between the instruction sets.
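
The same estimate can be written as a small Python sketch. The per-instruction costs are just the guesses used in this post (1 cycle for mov/add/loop, 30 for mul), not measurements of any real CPU, and the names `CYCLES` and `total_cycles` are made up for illustration.

```python
# Per-instruction costs as assumed in this post; real CPUs differ.
CYCLES = {"mov": 1, "add": 1, "loop": 1, "mul": 30}


def total_cycles(counts):
    """Sum cycles for a mapping of mnemonic -> number of times executed."""
    return sum(CYCLES[op] * n for op, n in counts.items())


def add_loop_cycles(n):
    """Cost of the repeated-addition version for a multiplier of n."""
    return total_cycles({"mov": 3, "add": n, "loop": n})


cisc = total_cycles({"mov": 2, "mul": 1})  # 2 movs + 1 mul
risc = add_loop_cycles(5)                  # 3 movs + 5 adds + 5 loops
```

Under these assumed costs the loop version takes 3 + 2n cycles for a multiplier n, so it only wins for small n: it is still cheaper at n = 14 (31 cycles) but more expensive than the 32-cycle mul version from n = 15 (33 cycles) onwards.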

Kevin · Post by **Kevin** » Mon Jul 29, 2013 12:37 pm

The point is that getting from your CISC code with the mul instruction to your RISC code is trivial. Doing the opposite is hard.

You want to produce code for both CISC and RISC architectures from your IR, so assuming that one of CISC or RISC is a direct mapping (which I don't think is a useful assumption, because I think the IR should be even more high-level, but you're insisting on it, so let's just assume it for the moment) you have to do one conversion. And the only reasonable option is to use CISC as the source and generate RISC from it. The other way round is just insane, so you would end up using RISC code patterns even for CISC target platforms.

That means CISC (or preferably, as I said, an even higher-level representation) is clearly the superior option for an IR.
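
A rough Python sketch of why that direction is the easy one: lowering a single high-level `mul` into target code is a fixed template either way. The function name `lower_mul`, the scratch registers `tmp` and `cnt`, and the mnemonics are all hypothetical, chosen only to mirror the example above.

```python
def lower_mul(dst, src, has_native_mul):
    """Lower the IR operation 'dst = dst * src' for a hypothetical target.

    Assumes 'tmp' and 'cnt' are free scratch registers and that src is a
    non-negative integer at run time.
    """
    if has_native_mul:
        # Target has a multiplier: one IR op maps to one instruction.
        return [f"mul {dst}, {src}"]
    # No multiplier: expand into a repeated-addition loop.
    return [
        f"mov tmp, {dst}",
        f"mov cnt, {src}",
        f"mov {dst}, 0",
        "cmp cnt, 0",
        "jz done",            # multiplying by zero skips the loop
        f"again: add {dst}, tmp",
        "dec cnt",
        "jnz again",
        "done:",
    ]


native = lower_mul("ax", "bx", has_native_mul=True)
expanded = lower_mul("ax", "bx", has_native_mul=False)
```

Going the other way would mean recognising that nine-line loop (in every variation a compiler might emit) and proving it is a multiplication before you could replace it with `mul`.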

zeitue · Post by **zeitue** » Mon Jul 29, 2013 3:16 pm

Kevin wrote:The point is that getting from your CISC code with the mul instruction to your RISC code is trivial. Doing the opposite is hard.

You want to produce code for both CISC and RISC architectures from your IR, so assuming that one of CISC or RISC is a direct mapping (which I don't think is a useful assumption, because I think the IR should be even more high-level, but you're insisting on it, so let's just assume it for the moment) you have to do one conversion. And the only reasonable option is to use CISC as the source and generate RISC from it. The other way round is just insane, so you would end up using RISC code patterns even for CISC target platforms.

That means CISC (or preferably, as I said, an even higher-level representation) is clearly the superior option for an IR.

That's a good point. I see what you mean: going from RISC to CISC seems way too complex, and it's much smarter to go from CISC to RISC, so my virtual machine will be CISC.

What do you think about endianness? Big, little, or bi-endian? I'm kind of leaning towards big or bi-endian.

Also, what about instruction length? Should it be fixed or variable? I'm thinking 32-bit fixed-width instructions.
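
For a VM that only exists in software, endianness mostly comes down to how instruction words are laid out in the bytecode file, since the interpreter can read them with an explicit byte order on any host. A minimal Python sketch using the standard `struct` module; the example word and the helper name `fetch` are made up for illustration.

```python
import struct

word = 0x01030102  # an arbitrary 32-bit instruction word

big = struct.pack(">I", word)     # big-endian: most significant byte first
little = struct.pack("<I", word)  # little-endian: least significant byte first


def fetch(code, pc, byteorder=">"):
    """Read one 32-bit instruction from a bytes object at byte offset pc.

    byteorder is a struct prefix: ">" for big-endian, "<" for little-endian.
    """
    (insn,) = struct.unpack_from(byteorder + "I", code, pc)
    return insn
```

Either choice round-trips as long as the writer and the reader agree; a "bi-endian" VM would simply need to record which order a given file uses.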

Owen · Post by **Owen** » Mon Jul 29, 2013 7:29 pm

Time has shown, I think, that the core of x86 has just the right amount of CISCiness. Things like one memory operand per instruction are relatively easy to make fast, and improve code density which reduces memory bandwidth pressure.

If I were to design a hardware architecture today, it would be a "one memory address", three-operand architecture with 16/32-bit fixed-variable length instructions. That, I think experience dictates, gives the best tradeoff between code size and decoder area.
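
One possible reading of "16/32-bit fixed-variable length" is an encoding where every instruction is either one or two 16-bit halfwords, and the first halfword alone tells the decoder which. The Python sketch below assumes a made-up rule (top bit of the first big-endian halfword set means a second halfword follows); it is an illustration, not a description of any real ISA.

```python
import struct


def split_instructions(code):
    """Split a byte string into 16-bit or 32-bit instructions.

    Assumed rule (hypothetical): if the top bit of the first big-endian
    halfword is set, the instruction is 32 bits long, otherwise 16 bits.
    """
    out, pc = [], 0
    while pc < len(code):
        (first,) = struct.unpack_from(">H", code, pc)
        size = 4 if first & 0x8000 else 2
        if pc + size > len(code):
            raise ValueError("truncated instruction")
        out.append(code[pc:pc + size])
        pc += size
    return out


stream = bytes.fromhex("0001800200030004")
parts = split_instructions(stream)
```

The length decision needs only one bit of one halfword, which is what keeps the decoder small compared with an encoding where the length depends on several bytes.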

Don't think of multiply or divide instructions as CISC. Every architecture will have them somewhere, even if they're not implemented on smaller implementations.

NickJohnson · Post by **NickJohnson** » Mon Jul 29, 2013 7:36 pm

Owen wrote: Time has shown, I think, that the core of x86 has just the right amount of CISCiness.

...except that modern x86 chips dynamically recompile their instructions to RISC-like microcode internally. The x86 ISA may be dense, but it is not easy to parse, since the length of the instruction is not known until several of the bits are examined. And good luck feeding some of those crazy operand arrangements into an out-of-order scheduler in one piece.

Owen · Post by **Owen** » Mon Jul 29, 2013 8:10 pm

NickJohnson wrote:
Owen wrote: Time has shown, I think, that the core of x86 has just the right amount of CISCiness.
...except that modern x86 chips dynamically recompile their instructions to RISC-like microcode internally. The x86 ISA may be dense, but it is not easy to parse, since the length of the instruction is not known until several of the bits are examined. And good luck feeding some of those crazy operand arrangements into an out-of-order scheduler in one piece.

Actually, part of the reason it's a nightmare to parse is that it's not dense!

Saying they recompile it is stretching things a lot. One instruction goes in, two or three uops come out. The CISC encoding is essentially a compression scheme.

It's notable that the highest-clocked commercially available processor, the IBM z196, is also CISC in a similar way to x86.

Forget segmentation and some of the ill-thought-out instructions like enter, aam and such. The core two-operand ISA is remarkably clean and not too badly encoded... at least before all the later-concocted prefixes show up to the party.

Let's not forget that I wasn't suggesting anything like the x86 encoding.

zeitue · Post by **zeitue** » Mon Jul 29, 2013 10:29 pm

Owen wrote:Time has shown, I think, that the core of x86 has just the right amount of CISCiness. Things like one memory operand per instruction are relatively easy to make fast, and improve code density which reduces memory bandwidth pressure.

I think the problem with x86, though, is that it carries a lot of legacy baggage, like 16-bit real mode.

Owen wrote: If I were to design a hardware architecture today, it would be a "one memory address", three operand architecture with 16/32-bit fixed-variable length instructions. That, I think experience dictates, gives the best code size and decoder area tradeoff

What do you mean by one memory address? Are you referring to main memory, like physical memory?
How can it be fixed and variable at the same time? Or are there two different types of instruction set?

Owen wrote: Don't think of multiply or divide instructions as CISC. Every architecture will have them somewhere, even if they're not implemented on smaller implementations.

As I see it, of course things can be broken down into, or built up out of, multiple operations, though I think some instructions on CISC machines can't be broken down?

zeitue · Post by **zeitue** » Mon Jul 29, 2013 10:37 pm

NickJohnson wrote:
Owen wrote: Time has shown, I think, that the core of x86 has just the right amount of CISCiness.
...except that modern x86 chips dynamically recompile their instructions to RISC-like microcode internally. The x86 ISA may be dense, but it is not easy to parse, since the length of the instruction is not known until several of the bits are examined. And good luck feeding some of those crazy operand arrangements into an out-of-order scheduler in one piece.

Dynamic recompilation is pretty much what QEMU does in software, so what you're saying is that the new x86 CPUs like the i5, i7, and so on are like an ARM (RISC) CPU with a dynamic layer on top? Would that not make them less effective? And if they are built like this, is there a way to access the RISC layer underneath it? Or would that not be possible because it is done in microcode?

Owen · Post by **Owen** » Tue Jul 30, 2013 5:18 am

zeitue wrote:
Owen wrote: If I were to design a hardware architecture today, it would be a "one memory address", three operand architecture with 16/32-bit fixed-variable length instructions. That, I think experience dictates, gives the best code size and decoder area tradeoff

What do you mean by one memory address? Are you referring to main memory, like physical memory?

How can it be fixed and variable at the same time? Or are there two different types of instruction set?

x86 is a one-memory-address architecture: (nearly) every instruction can take a memory operand, but at most one.

zeitue wrote:
Owen wrote: Don't think of multiply or divide instructions as CISC. Every architecture will have them somewhere, even if they're not implemented on smaller implementations.
As I see it, of course things can be broken down into, or built up out of, multiple operations, though I think some instructions on CISC machines can't be broken down?

On modern implementations they're all broken down. The tricky ones are things like system control opcodes, where you have to serialize things around them, i.e. the whole pipeline gets drained, then they get executed, then the pipeline is refilled.

zeitue wrote:
NickJohnson wrote:
Owen wrote: Time has shown, I think, that the core of x86 has just the right amount of CISCiness.
...except that modern x86 chips dynamically recompile their instructions to RISC-like microcode internally. The x86 ISA may be dense, but it is not easy to parse, since the length of the instruction is not known until several of the bits are examined. And good luck feeding some of those crazy operand arrangements into an out-of-order scheduler in one piece.
Dynamic recompilation is pretty much what QEMU does in software, so what you're saying is that the new x86 CPUs like the i5, i7, and so on are like an ARM (RISC) CPU with a dynamic layer on top? Would that not make them less effective? And if they are built like this, is there a way to access the RISC layer underneath it? Or would that not be possible because it is done in microcode?

The instructions are broken down into "RISC like" micro-operations internally, that is, say, "inc dword ptr [eax]" would get broken down to

r0 <- LOAD DWORD EAX
r1 <- ADD r0, $1
STORE DWORD EAX, r1

before being issued to the out-of-order execution core.

Internally these instructions look nothing like RISC instructions: when they're decoded, both RISC and CISC instructions balloon into wide things (128 bits or more) which are essentially just a bundle of decoded control signals.
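
As a toy model of that breakdown (purely illustrative, and nothing like how real hardware represents micro-ops), the Python sketch below expands a memory increment into the three steps listed above and runs them against a dictionary standing in for memory. The names `crack_inc_mem` and `run`, the tuple format, and the temporaries `r0`/`r1` are assumptions for the example.

```python
def crack_inc_mem(addr_reg):
    """Expand 'inc dword ptr [addr_reg]' into load / add / store steps."""
    return [
        ("LOAD", "r0", addr_reg),   # r0 <- mem[addr_reg]
        ("ADDI", "r1", "r0", 1),    # r1 <- r0 + 1
        ("STORE", addr_reg, "r1"),  # mem[addr_reg] <- r1
    ]


def run(uops, regs, mem):
    """Execute the toy micro-ops in order, updating regs and mem in place."""
    for op in uops:
        if op[0] == "LOAD":
            regs[op[1]] = mem[regs[op[2]]]
        elif op[0] == "ADDI":
            regs[op[1]] = (regs[op[2]] + op[3]) & 0xFFFFFFFF  # 32-bit wrap
        elif op[0] == "STORE":
            mem[regs[op[1]]] = regs[op[2]]
        else:
            raise ValueError(f"unknown micro-op {op[0]}")


regs = {"EAX": 0x1000}
mem = {0x1000: 41}
run(crack_inc_mem("EAX"), regs, mem)
```

After the three steps the value at the address held in EAX has gone from 41 to 42, and EAX itself is unchanged, which is all the original single instruction promises.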

OSDev.org
