Cross Platform Virtual Machine

zeitue · Post by **zeitue** » Wed Jul 31, 2013 12:46 am

Owen wrote: The instructions are broken down into "RISC like" micro-operations internally, that is, say, "inc dword ptr [eax]" would get broken down to

r0 <- LOAD DWORD EAX

r1 <- ADD r0, $1

STORE DWORD EAX, r1

before being issued to the out-of-order execution core.

Internally these instructions look nothing like RISC instructions. Once decoded, both RISC and CISC instructions balloon into wide things (128 bits or more) that are essentially just a bundle of decoded control signals.
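As a rough illustration of the breakdown quoted above, here is a minimal C sketch that models the three micro-operations in software. The `MicroOp` struct, the `UopKind` names, and the `expand_inc_mem32`/`run_uops` helpers are all made up for this example; real hardware does not expose its micro-op format, and as the quote says, the decoded form is a bundle of control signals, not a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical micro-op kinds, modelled on the three steps in the quote. */
typedef enum { UOP_LOAD32, UOP_ADD_IMM, UOP_STORE32 } UopKind;

typedef struct {
    UopKind kind;
    int dst;        /* temp register written (unused for stores) */
    int src;        /* temp register read (unused for loads) */
    uint32_t addr;  /* memory address, here the value of EAX */
    uint32_t imm;   /* immediate operand for UOP_ADD_IMM */
} MicroOp;

/* Expand "inc dword ptr [eax]" into its three micro-ops. */
size_t expand_inc_mem32(uint32_t eax, MicroOp out[3]) {
    out[0] = (MicroOp){ UOP_LOAD32,  0, 0, eax, 0 };  /* r0 <- LOAD DWORD EAX */
    out[1] = (MicroOp){ UOP_ADD_IMM, 1, 0, 0,   1 };  /* r1 <- ADD r0, $1 */
    out[2] = (MicroOp){ UOP_STORE32, 0, 1, eax, 0 };  /* STORE DWORD EAX, r1 */
    return 3;
}

/* Execute micro-ops in order against a flat memory and a temp register array. */
void run_uops(const MicroOp *ops, size_t n,
              uint8_t *mem, size_t mem_size, uint32_t *temps) {
    for (size_t i = 0; i < n; i++) {
        const MicroOp *op = &ops[i];
        switch (op->kind) {
        case UOP_LOAD32:
            assert(mem_size >= 4 && op->addr <= mem_size - 4);
            memcpy(&temps[op->dst], mem + op->addr, 4);
            break;
        case UOP_ADD_IMM:
            temps[op->dst] = temps[op->src] + op->imm;
            break;
        case UOP_STORE32:
            assert(mem_size >= 4 && op->addr <= mem_size - 4);
            memcpy(mem + op->addr, &temps[op->src], 4);
            break;
        }
    }
}
```

This runs the micro-ops strictly in order; an out-of-order core would instead track the dependencies between r0 and r1 and issue each operation when its inputs are ready.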

Thanks for the explanation.

A question: given that my machine is a CISC type, do you think I can still go with a fixed-width instruction set?
How many instructions do you think would be suitable? x86 has over 1000 and MIPS only has 64.
Also, which endianness would you say is better suited? From what I've been reading, I would say big or bi.

dozniak · Post by **dozniak** » Wed Jul 31, 2013 1:00 am

You will have a hard time with big endian given that most architectures are/support little-endian.

Little endian has a nice property: to access only the lower byte or word of a bigger entity, you simply change the size of your read, without doing extra arithmetic. This may come in handy.
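To make that property concrete, here is a small C sketch that reads guest memory byte by byte, so the result does not depend on the host's own byte order. The helper names `read_le` and `read_be` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `size` bytes (1 to 4) at `p`, interpreting them as little-endian. */
uint32_t read_le(const uint8_t *p, size_t size) {
    assert(size >= 1 && size <= 4);
    uint32_t v = 0;
    for (size_t i = 0; i < size; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Read `size` bytes (1 to 4) at `p`, interpreting them as big-endian. */
uint32_t read_be(const uint8_t *p, size_t size) {
    assert(size >= 1 && size <= 4);
    uint32_t v = 0;
    for (size_t i = 0; i < size; i++)
        v = (v << 8) | p[i];
    return v;
}
```

With 0x12345678 stored little-endian (bytes 78 56 34 12), `read_le(p, 1)` gives 0x78 and `read_le(p, 2)` gives 0x5678 from the same address `p`. With the big-endian layout (bytes 12 34 56 78), the low byte sits at `p + 3` and the low 16 bits at `p + 2`, which is the extra arithmetic mentioned above.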

The little/big-endian question only matters much for fixed-size opcodes, as you will need to convert to native endian to decode. With variable-length opcodes you will still be decoding byte-by-byte, and then only particular arguments will need to be byteswapped. (If you go for a larger opcode size and hence word-by-word decode, you will have to byteswap every word you decode, which is even slower.)
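A minimal C sketch of the fixed-size case, assuming a hypothetical bytecode format made of 32-bit little-endian instruction words. The names `fetch_insn_le`, `bswap32`, and `host_is_little_endian` are illustrative, not taken from any existing VM. On a little-endian host the fetch is a plain load; on a big-endian host every fetched word pays for a byteswap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit word. */
uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Runtime check of the host byte order. */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Fetch one little-endian 32-bit instruction word at byte offset `pc`. */
uint32_t fetch_insn_le(const uint8_t *code, uint32_t pc) {
    assert(pc % 4 == 0);  /* this sketch assumes 4-byte aligned instructions */
    uint32_t word;
    memcpy(&word, code + pc, sizeof word);  /* native-order load */
    return host_is_little_endian() ? word : bswap32(word);
}
```

A real interpreter would more likely settle the byte order at compile time than test it on every fetch; the runtime check here just keeps the sketch self-contained.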

zeitue wrote: How many instructions do you think would be suitable?

As many as necessary for representing the required functionality. It can be anywhere between 1 and infinity.

zeitue · Post by **zeitue** » Wed Jul 31, 2013 1:37 am

dozniak wrote: You will have a hard time with big endian given that most architectures are/support little-endian.

Little endian has a nice property: to access only the lower byte or word of a bigger entity, you simply change the size of your read, without doing extra arithmetic. This may come in handy.

The little/big-endian question only matters much for fixed-size opcodes, as you will need to convert to native endian to decode. With variable-length opcodes you will still be decoding byte-by-byte, and then only particular arguments will need to be byteswapped. (If you go for a larger opcode size and hence word-by-word decode, you will have to byteswap every word you decode, which is even slower.)

zeitue wrote: How many instructions do you think would be suitable?
As many as necessary for representing the required functionality. It can be anywhere between 1 and infinity.

I'm planning for a fixed size instruction set and opcodes.
Can you explain a bit more why little endian is a better choice if you don't mind?

Thank you for the help

A bit of CPU information
8 Big, 6 Little, 8 Bi.

Architecture, Endianness
Alpha, Bi
ARM, Bi
ARM, Bi
eSi-RISC, Bi
M32R, Bi
MIPS, Bi
SuperH (SH), Bi
Itanium (IA-64), Bi (selectable)
MC68K, Big
System/360 / System/370 / z/Architecture, Big
AVR32, Big
DLX, Big
Mico32, Big
MMIX, Big
PA-RISC(HP/PA), Big
SPARC, Big → Bi
PowerPC, Big/Bi
Series 32000, Little
VAX, Little
x86, Little
x86-64, Little
S+core, Little
Blackfin, Little

zeitue · Post by **zeitue** » Wed Jul 31, 2013 2:23 am

On the endianness of the system: it seems like big endian is easier to work with but slower to emulate, which means I should go with little endian because my main goal is speed. Please correct me if I'm wrong.

Each byte-order system has its advantages. Little-endian machines let you read the lowest-byte first, without reading the others. You can check whether a number is odd or even (last bit is 0) very easily, which is cool if you're into that kind of thing. Big-endian systems store data in memory the same way we humans think about data (left-to-right), which makes low-level debugging easier.

But why didn't everyone just agree to one system? Why do certain computers have to try and be different?

Let me answer a question with a question: Why doesn't everyone speak the same language? Why are some languages written left-to-right, and others right-to-left?

Sometimes communication systems develop independently, and later need to interact.

dozniak · Post by **dozniak** » Wed Jul 31, 2013 2:30 am

zeitue wrote:
Each byte-order system has its advantages. Little-endian machines let you read the lowest-byte first, without reading the others. You can check whether a number is odd or even (last bit is 0) very easily, which is cool if you're into that kind of thing. Big-endian systems store data in memory the same way we humans think about data (left-to-right), which makes low-level debugging easier.

But why didn't everyone just agree to one system? Why do certain computers have to try and be different?

Let me answer a question with a question: Why doesn't everyone speak the same language? Why are some languages written left-to-right, and others right-to-left?

Sometimes communication systems develop independently, and later need to interact.

This quote is pretty good for the explanation.

Big endian is not easier to work with; it's just that the bytes in a memory dump look like a normal hex number as you would write it. With proper debugging tools you can just as easily work with PDP-endian numbers. This is a question of setting up tooling and knowing how to use it.

Most of the systems you will be emulating on are little endian by default (this includes x86/ARM and some MIPSes). I'm not expecting you to run the emulator on IBM zSeries or SPARC any time soon.
And if you employ AOT/JIT it doesn't really matter.

zeitue · Post by **zeitue** » Wed Jul 31, 2013 2:48 am

dozniak wrote: Big endian is not easier to work with; it's just that the bytes in a memory dump look like a normal hex number as you would write it. With proper debugging tools you can just as easily work with PDP-endian numbers. This is a question of setting up tooling and knowing how to use it.

Couldn't the bytes in the memory dump then be converted to big endian to gain the same advantages?
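One way to picture the tooling point: a dump routine can print each little-endian word most significant byte first, so it reads like a normal hex number. A minimal C sketch, where `format_le_word` is a made-up helper name and the word size is assumed to be 32 bits:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format the 4 bytes at `p` (stored little-endian) as the hex number a
   person would write, most significant byte first.
   `out` must hold at least 9 chars (8 hex digits plus the terminator). */
void format_le_word(const uint8_t *p, char *out, size_t out_size) {
    assert(out_size >= 9);
    snprintf(out, out_size, "%02X%02X%02X%02X",
             (unsigned)p[3], (unsigned)p[2], (unsigned)p[1], (unsigned)p[0]);
}
```

For memory bytes 78 56 34 12 this yields the string "12345678". The raw bytes stay little-endian; only the presentation changes.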

dozniak wrote: Most of the systems you will be emulating on are little endian by default (this includes x86/ARM and some MIPSes). I'm not expecting you to run the emulator on IBM zSeries or SPARC any time soon.
And if you employ AOT/JIT it doesn't really matter.

Agreed, though I do intend to be able to run on all systems and make this a kind of universal compatibility layer (dreaming big on that one).
So my machine will be a little-endian CISC. I think it should have 16 or 32 registers too, because that seems to be common for everything but 32-bit x86 and Blackfin. Are more registers still useful even if they greatly exceed the host's, though?

dozniak · Post by **dozniak** » Wed Jul 31, 2013 5:35 am

Well, some of the best VMs employ an infinite virtual register file, so think for yourself.
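For readers wondering what an "infinite" register file can look like in practice, here is one possible C sketch: the file is simply backing storage that grows on demand, so a register number is limited only by memory. `VRegFile` and the `vreg_*` functions are invented for this example and are not taken from any particular VM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t *regs;  /* backing storage, grown on demand */
    size_t count;    /* number of registers currently allocated */
} VRegFile;

/* Make sure register `index` exists; new registers start at zero.
   Returns 1 on success, 0 if the index is too large or allocation fails. */
int vreg_reserve(VRegFile *f, size_t index) {
    assert(f != NULL);
    if (index < f->count) return 1;
    if (index >= SIZE_MAX / (2 * sizeof(uint64_t))) return 0;
    size_t new_count = f->count ? f->count : 16;
    while (new_count <= index) new_count *= 2;
    uint64_t *grown = realloc(f->regs, new_count * sizeof *grown);
    if (!grown) return 0;
    for (size_t i = f->count; i < new_count; i++) grown[i] = 0;
    f->regs = grown;
    f->count = new_count;
    return 1;
}

int vreg_write(VRegFile *f, size_t index, uint64_t value) {
    if (!vreg_reserve(f, index)) return 0;
    f->regs[index] = value;
    return 1;
}

/* Registers that were never written read as zero. */
uint64_t vreg_read(const VRegFile *f, size_t index) {
    assert(f != NULL);
    return index < f->count ? f->regs[index] : 0;
}

void vreg_free(VRegFile *f) {
    free(f->regs);
    f->regs = NULL;
    f->count = 0;
}
```

Starting from a zero-initialised `VRegFile`, writing register 1000 grows the file to 1024 entries. A JIT or AOT translator would then map these virtual registers onto the host's real registers and spill the rest to memory.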

I really recommend you study VM designs first. To the point where you can reason about efficiencies and deficiencies of each one. Of course this implies you implement something yourself too, but mostly as POCs and small tests.

zeitue · Post by **zeitue** » Wed Jul 31, 2013 10:30 pm

dozniak wrote: Well, some of the best VMs employ an infinite virtual register file, so think for yourself.

I really recommend you study VM designs first. To the point where you can reason about efficiencies and deficiencies of each one. Of course this implies you implement something yourself too, but mostly as POCs and small tests.

Thanks for the advice. I've been studying many types of virtual machines.
I'm currently looking for books that might help me.
As for the number of registers, I think it should be a fixed number, for two reasons.

1. I want to have fixed-width instructions with a maximum of three operands, and I'm not sure how I could represent an infinite number of registers when the encoding only has room for 32, for example.
2. Real machines don't have infinitely many registers, from what I know. Correct me if I'm wrong.
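The first point can be made concrete with a sketch of one possible 32-bit fixed-width encoding. The layout below (8-bit opcode, three 5-bit register fields, 9 spare bits) is purely an assumption for illustration, as are the `Insn`, `insn_encode`, and `insn_decode` names. A 5-bit field can name exactly 32 registers, so an unbounded register file would need wider or variable-length operand fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one 32-bit instruction word:
   bits 31..24 opcode, 23..19 rd, 18..14 rs1, 13..9 rs2, 8..0 unused. */
typedef struct {
    uint8_t opcode;
    uint8_t rd;   /* destination register, 0..31 */
    uint8_t rs1;  /* first source register, 0..31 */
    uint8_t rs2;  /* second source register, 0..31 */
} Insn;

uint32_t insn_encode(Insn in) {
    assert(in.rd < 32 && in.rs1 < 32 && in.rs2 < 32);
    return ((uint32_t)in.opcode << 24) | ((uint32_t)in.rd << 19) |
           ((uint32_t)in.rs1 << 14) | ((uint32_t)in.rs2 << 9);
}

Insn insn_decode(uint32_t word) {
    Insn out;
    out.opcode = (uint8_t)(word >> 24);
    out.rd  = (uint8_t)((word >> 19) & 0x1F);
    out.rs1 = (uint8_t)((word >> 14) & 0x1F);
    out.rs2 = (uint8_t)((word >> 9) & 0x1F);
    return out;
}
```

For example, opcode 0x01 with rd = 3, rs1 = 4, rs2 = 31 encodes to 0x01193E00 and decodes back to the same fields.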

OSDev.org
