Yet another bytecode design (Antti's bytecode)

embryo · Post by **embryo** » Mon Jul 14, 2014 1:54 am

Antti wrote:I am planning to design and implement a simple bytecode specification. Here is the initial plan.

I'd prefer to design the VM first. With a VM in place, the bytecode has solid ground to stand on, and many bytecode design decisions can be justified by the VM's requirements. If the bytecode comes first, it imposes severe constraints on any potential VM, which does not look very efficient.

As an exercise, though, bytecode design is interesting whether it comes first or second.

max · Post by **max** » Mon Jul 14, 2014 9:08 am

embryo wrote:
Antti wrote:I am planning to design and implement a simple bytecode specification. Here is the initial plan.
I'd prefer to design the VM first. With a VM in place, the bytecode has solid ground to stand on, and many bytecode design decisions can be justified by the VM's requirements. If the bytecode comes first, it imposes severe constraints on any potential VM, which does not look very efficient.

As an exercise, though, bytecode design is interesting whether it comes first or second.

IMHO, the VM should implement the bytecode specification, not vice versa. The bytecode shouldn't be adapted to fit special needs that a particular VM has. This later allows executing the bytecode on anything that implements the specification.

AndrewAPrice · Post by **AndrewAPrice** » Mon Jul 14, 2014 10:17 am

max wrote:IMHO, the VM should implement the bytecode specification, not vice versa. The bytecode shouldn't be adapted to fit special needs that a particular VM has. This later allows executing the bytecode on anything that implements the specification.

It depends on what you're trying to do.

If you're trying to design something like the .NET runtime, then you would design the language and at least the requirements of the VM first, and then figure out a bytecode that could support them.

If you're trying to design some portable bytecode that can have many implementations and source languages, then you'd start with the bytecode.

In my case, I started with a source programming language then figured out how I could represent it in a compiled bytecode format. I designed the language first, then the bytecode (keeping in mind to make it easy to interpret), then the VM. I made my bytecode a low-level representation of the high level language - I replaced loops with conditional jumps, but I kept the type system.
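As a minimal sketch of what "replacing loops with conditional jumps" can look like, the Python snippet below runs a hand-lowered `while` loop on a tiny interpreter. The opcode names (`PUSH`, `LOAD`, `STORE`, `ADD`, `LT`, `JMPF`, `JMP`, `HALT`) and the stack-based layout are assumptions made for illustration, not the bytecode described in this post, and the sketch does not model the preserved type system.

```python
# Hypothetical mini-bytecode: a `while` loop lowered to conditional jumps.
# Source being lowered:
#     i = 0; total = 0
#     while i < 5: total = total + i; i = i + 1

PROGRAM = [
    ("PUSH", 0), ("STORE", "i"),                                      # 0-1   i = 0
    ("PUSH", 0), ("STORE", "total"),                                  # 2-3   total = 0
    ("LOAD", "i"), ("PUSH", 5), ("LT",),                              # 4-6   test: i < 5
    ("JMPF", 17),                                                     # 7     leave loop if false
    ("LOAD", "total"), ("LOAD", "i"), ("ADD",), ("STORE", "total"),   # 8-11  total += i
    ("LOAD", "i"), ("PUSH", 1), ("ADD",), ("STORE", "i"),             # 12-15 i += 1
    ("JMP", 4),                                                       # 16    back to the test
    ("HALT",),                                                        # 17
]


def run(program):
    """Interpret the program and return the final variable table."""
    stack, variables, pc = [], {}, 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "LOAD":
            stack.append(variables[args[0]])
        elif op == "STORE":
            variables[args[0]] = stack.pop()
        elif op == "ADD":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "LT":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs < rhs)
        elif op == "JMPF":              # jump only when the popped test is false
            if not stack.pop():
                pc = args[0]
        elif op == "JMP":
            pc = args[0]
        elif op == "HALT":
            return variables
        else:
            raise ValueError(f"unknown opcode {op!r}")


result = run(PROGRAM)
```

The loop itself disappears from the bytecode: only the test at index 4, the forward `JMPF`, and the backward `JMP` remain.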

Schol-R-LEA · Post by **Schol-R-LEA** » Mon Jul 14, 2014 1:15 pm

There are still plenty of questions to resolve about the bytecode ISA, or at least questions you haven't answered for us even if you have resolved them yourself. For example:

What is the total addressing range?
Are instruction addresses absolute or relative?
What is the total size of the register file? Are all the registers general-purpose, or are some dedicated to specific purposes (e.g., instruction pointer, stack pointer, frame/base pointer, etc)?
Will the special-purpose registers, if any, be mapped to the general register file, or will they be separate?
If the special-purpose registers are separate, would you require them to be accessed only by special-purpose instructions?
If all the registers are general, will there be any conventions for how they are used (like in MIPS and to a lesser extent ARM)?
Will you have a Zero register (that is, a register which is permanently set to zero), like in MIPS?
How will you move data from register to register? From register to memory, and vice versa? From memory to memory?
Will you have any immediate format instructions, and if not, how will you initialize memory values?
Will you use a load/store architecture, or will you allow arbitrary memory instructions? (That is to say, will all of the arithmetic and logical operations be done only in registers, or will they be able to work to and from memory directly?)
How will you handle nilary (zero-operand) and unary (one-operand) instructions in the instruction stream?
How are signed integers represented? (2's-complement would be the obvious solution, but it isn't a given.)
How will you indicate CPU conditions, if at all?
How will you handle arithmetic overflows and underflows?
Will multiplication and division require pairs of registers, or will you have a separate double-size register for those operations?
Will you have special instructions and/or a special memory range for I/O (a la the x86), or will it all be memory-mapped (like most newer designs)?
Will you have anything like a (simulated) interrupt mechanism? Software interrupts (traps)? Exception interrupts (e.g., division by zero)?
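Several of these questions (signed representation, condition flags, overflow handling) interact with each other. A minimal sketch in Python, assuming 32-bit registers, two's-complement integers, and hypothetical carry and overflow flags, none of which the draft has confirmed:

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - 0x100000000 if value & SIGN_BIT else value


def add32(a, b):
    """Add two 32-bit patterns and return (result, carry, overflow)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    carry = full > MASK32   # unsigned result did not fit in 32 bits
    # Signed overflow: both operands share a sign that the result lacks.
    overflow = bool(~(a ^ b) & (a ^ result) & SIGN_BIT)
    return result, carry, overflow
```

The same adder serves signed and unsigned arithmetic; only the flag a program checks afterwards differs, which is one reason two's complement is the obvious choice.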

As has already been said, it would be a good idea to look into existing Instruction Set Architecture designs other than the x86, both real (MIPS, ARM, M68K) and virtual (p-code, JVM, LLVM, .NET CLI), to get a broader idea of what can be done in an ISA, and get a feel for what works and what doesn't and why.

Finally, consider this: most bytecode systems are stack-based (that is, the majority of the instructions operate directly on the values at or near the top of the stack), but hardly any hardware implementations are. Why? What impact does the use of memory to simulate registers (which in a hardware CPU are usually an order of magnitude or more faster to access than memory) have on this design issue? What impact does the design and compilation of high-level languages (e.g., Pascal, Java, Python) have on the choice to use stack-machine bytecodes instead of register-machine bytecodes? And how does designing a VM versus a real CPU alter the decision to use complex-action instructions (e.g., memory-to-memory block moves), rather than relying on simpler ones?
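To make the stack-versus-register question concrete, here is a minimal sketch in Python that evaluates `a + b * c` on two toy interpreters. Both instruction sets (`LOAD`, `MUL`, `ADD`, and the register names `r0` to `r2`) are hypothetical and exist only for this comparison.

```python
def run_stack(code, env):
    """Stack machine: operands are implicit, taken from the top of the stack."""
    stack = []
    for op, *args in code:
        if op == "LOAD":
            stack.append(env[args[0]])
        elif op == "MUL":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        elif op == "ADD":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def run_register(code, env):
    """Register machine: every instruction names its destination and sources."""
    regs = {}
    for op, dst, *src in code:
        if op == "LOAD":
            regs[dst] = env[src[0]]
        elif op == "MUL":
            regs[dst] = regs[src[0]] * regs[src[1]]
        elif op == "ADD":
            regs[dst] = regs[src[0]] + regs[src[1]]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs["r0"]   # convention in this sketch: the result ends up in r0


STACK_CODE = [("LOAD", "a"), ("LOAD", "b"), ("LOAD", "c"), ("MUL",), ("ADD",)]
REGISTER_CODE = [
    ("LOAD", "r0", "a"), ("LOAD", "r1", "b"), ("LOAD", "r2", "c"),
    ("MUL", "r1", "r1", "r2"),
    ("ADD", "r0", "r0", "r1"),
]

env = {"a": 2, "b": 3, "c": 4}
stack_result = run_stack(STACK_CODE, env)
register_result = run_register(REGISTER_CODE, env)
```

Both runs produce the same value. The stack form needs no operand fields on `MUL` and `ADD` and falls out of a post-order walk of the expression tree, while the register form has to name every operand and therefore needs some register allocation in the compiler.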

Antti · Post by **Antti** » Tue Jul 15, 2014 3:51 am

I will answer the questions after I have the draft specification. It should not take too long. I was planning to have two different address spaces: code and data.

Code: Select all

    CODE    (2 * 0x100000000) bytes
    DATA    (4 * 0x100000000) bytes

CODE is not even readable. The addressable unit for data is 32 bits wide.

Code: Select all

    MOV R0, 0x01
    MOV R1, [R0]    ; R1 = 32-bit unsigned value from byte address 0x000000004 (DATA)

    XOR R2, R2
    JMP R0+R2       ; Jump to byte address 0x0000000002 (CODE)

This is a little bit scary...
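The address arithmetic behind the comments in that listing can be sketched in Python. The data unit size is stated above; the 16-bit code unit is inferred from the `JMP` example (unit address 1 lands on byte address 2), and the 32-bit width of a unit address is an assumption based on the `0x100000000` factor. The function names are hypothetical.

```python
CODE_UNIT_BYTES = 2          # inferred: code unit address 1 -> byte address 2
DATA_UNIT_BYTES = 4          # stated: the data unit is 32 bits wide
ADDRESS_COUNT = 0x100000000  # assumed: unit addresses are 32 bits wide


def code_byte_address(unit_address):
    """Translate a CODE unit address into a byte offset in the code space."""
    if not 0 <= unit_address < ADDRESS_COUNT:
        raise ValueError("code address out of range")
    return unit_address * CODE_UNIT_BYTES


def data_byte_address(unit_address):
    """Translate a DATA unit address into a byte offset in the data space."""
    if not 0 <= unit_address < ADDRESS_COUNT:
        raise ValueError("data address out of range")
    return unit_address * DATA_UNIT_BYTES


# Total sizes match the table above: 2 * 0x100000000 and 4 * 0x100000000 bytes.
CODE_SIZE_BYTES = ADDRESS_COUNT * CODE_UNIT_BYTES
DATA_SIZE_BYTES = ADDRESS_COUNT * DATA_UNIT_BYTES
```

Under these assumptions the same register value `1` means byte 2 when used by `JMP` and byte 4 when used by `MOV R1, [R0]`, which is the effect the listing illustrates.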

embryo · Post by **embryo** » Tue Jul 15, 2014 4:16 am

max wrote:IMHO, the VM should implement the bytecode specification, not vice versa. The bytecode shouldn't be adapted to fit special needs that a particular VM has. This later allows executing the bytecode on anything that implements the specification.

Bytecode is a slave entity, it is not used by a programmer directly. But its purpose is to provide a well-defined interface for its interpreters. That last fact can mislead you about the importance of the bytecode. Without understanding the environment in which the bytecode will be used, it is impossible to design a useful bytecode. But it is still possible to design a nice one.

Schol-R-LEA · Post by **Schol-R-LEA** » Tue Jul 15, 2014 6:25 am

Antti wrote: I was planning to have two different address spaces: code and data.

You might want to look up Harvard Architecture to get an idea of the advantages and disadvantages of this approach.

Antti · Post by **Antti** » Wed Jul 16, 2014 12:32 am

embryo wrote:Bytecode is a slave entity, it is not used by a programmer directly.

I will use it. Of course I am not writing it manually but using assembly. A high-level language will be introduced much later.

One good thing is that I cannot fail. Even if this does not work, I will still learn a lot and be better prepared for the second attempt. Like I said, I am trying to get this done in one go. This is not a project that keeps evolving once it is ready. There will be no version 1, version 2, etc. There are exactly two versions: draft and final.

embryo · Post by **embryo** » Wed Jul 16, 2014 4:01 am

Antti wrote:I will use it. Of course I am not writing it manually but using assembly. A high-level language will be introduced much later.

One good thing is that I cannot fail. Even if this does not work, I will still learn a lot and be better prepared for the second attempt.

Yes, I agree. As a learning exercise this is really interesting and should bear useful fruit! The future will show the bigger picture, when the next learning attempt touches a language or a VM. Learning by trial and error is almost standard for many people, and without it there wouldn't be a new Einstein.

onlyonemac · Post by **onlyonemac** » Thu Jul 17, 2014 2:20 pm

MessiahAndrw wrote:You could memory map your ALU, your logic unit (e.g. 0x1234 equals 0x1233 if 0x1232 is true else it equals 0x1231), your instruction pointer, your memory indirection unit (reading/writing 0x87 reads and writes at the address stored in 0x88), etc. and have a fully functioning computer. I think you would have efficiency issues with dealing with functions that can be recursive, as everything would be based around absolute addresses.

Actually, I'm going to register-map my ALU. My instruction set has two instructions: one to copy a value from one register to another (the registers are numbered, and some implement special functions like addition) and another to copy one register to the memory location specified by another. That way I don't have to use memory-mapped registers and all the absolute addressing inefficiencies.
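A minimal sketch of such a two-instruction machine in Python, assuming hypothetical register numbers for the adder (8 and 9 as inputs, 10 as the sum) and 32-bit wrap-around. How constants are introduced and how memory is read back is not part of the description above, so the sketch simply presets registers directly.

```python
class Machine:
    # Hypothetical register numbers for the register-mapped adder.
    ADD_A, ADD_B, ADD_SUM = 8, 9, 10

    def __init__(self, memory_words=16):
        self.regs = [0] * 16
        self.memory = [0] * memory_words

    def read(self, reg):
        """Read a register; reading ADD_SUM yields ADD_A + ADD_B."""
        if reg == self.ADD_SUM:
            return (self.regs[self.ADD_A] + self.regs[self.ADD_B]) & 0xFFFFFFFF
        return self.regs[reg]

    def move(self, dst, src):
        """Instruction 1: copy register src into register dst."""
        self.regs[dst] = self.read(src)

    def store(self, src, addr_reg):
        """Instruction 2: copy register src to memory[register addr_reg]."""
        self.memory[self.read(addr_reg)] = self.read(src)


m = Machine()
m.regs[0], m.regs[1], m.regs[2] = 20, 22, 5   # preset inputs (no immediates here)
m.move(Machine.ADD_A, 0)      # adder input A = r0
m.move(Machine.ADD_B, 1)      # adder input B = r1
m.move(3, Machine.ADD_SUM)    # r3 = r0 + r1, computed by reading the sum register
m.store(3, 2)                 # memory[r2] = r3
```

All arithmetic happens as a side effect of moving data, so adding a new operation means adding special registers rather than new opcodes.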

MessiahAndrw wrote:...including a stack register that pops/pushes on reads and writes.

That's exactly how I was going to implement the stack! Plus I would need another "stack pointer" register to allow direct modification of the stack pointer (such as during context switches and so on).
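A minimal Python sketch of that pair of registers, assuming a downward-growing stack in a small word-addressed memory. The class and method names are hypothetical; in the design described above these would be two special register numbers rather than methods.

```python
class StackPort:
    """A push/pop stack register plus a directly accessible stack pointer."""

    def __init__(self, memory_words=16):
        self.memory = [0] * memory_words
        self.sp = memory_words   # assumed: the stack grows toward address 0

    def write_stack(self, value):
        """A write to the stack register pushes the value."""
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1
        self.memory[self.sp] = value

    def read_stack(self):
        """A read from the stack register pops the top value."""
        if self.sp == len(self.memory):
            raise IndexError("stack underflow")
        value = self.memory[self.sp]
        self.sp += 1
        return value


port = StackPort()
port.write_stack(1)
port.write_stack(2)
saved_sp = port.sp          # direct read of the pointer, e.g. for a context switch
top = port.read_stack()     # pops 2
port.sp = saved_sp          # direct write restores the earlier stack state
again = port.read_stack()   # pops 2 again
below = port.read_stack()   # pops 1
```

The separate pointer register is what lets a scheduler swap whole stacks; the push/pop register alone can only move the pointer one slot at a time.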

OSDev.org
