bytecode design

Mikemk · Post by **Mikemk** » Fri Apr 12, 2013 10:59 am

One of the main design aspects of my os is compatibility, and, more importantly, portability. As such, i find it logical to use a bytecode which can be interpreted on both intel and arm processors for applications.
My questions are do you have any suggestions, and what resources are their for making the compiler? (books, tutorials, ...)
My current idea is:
Each operation is 16 bytes:
PIIIRRRRDDDDDDDD
P - Prefix, set special conditions on the instruction (for example, a call instruction could use an external prefix for functions in external files, a native prefix for functions written in the processor's bytecode, etc.)
I - Instruction, see below
R - register, most operations will need to use at least one register (similiar to a pointer). Function definitions can be addressed as a register.
D - either a 64 bit immediate value, or two 32 bit registers, depending on instruction and prefix.

My design for the interpreter is the following psuedocode (in nasm):

Code: Select all

dd instruction 0 ; to filled in at runtime
dd register 0
dq data 0
bytecode_interpret:
call load_next_instruction
mov eax, [instruction] ; read the instruction
and eax, 0xFFFFFF ; and just the instruction
cmp eax, <maxvalue> ; make sure it's a valid opcode
jg .error ; invalid opcode
shl eax, 1 ; two bytes per jump
add eax, instructions ; add offset to base address
call far cs:eax ; execute the instruction
.error:
;do something and end the program.

instructions:
jmp i1
jmp i2
...

i1:
; do something
retf
i2:
; do something else
retf

The file format will be something along the lines of:

Code: Select all

-----------------------
| Header              | 512 bytes
-----------------------
| Symbol table        | length defined in header, defines registers needed at program start (such as main())
-----------------------
| Text section        | contains definitions for code registers, read/execute only
-----------------------
| Data section        | contains definitions for data registers, read/write unless otherwise specified
-----------------------
| Comments            | contains comments and file attributes not needed for execution (Such as version, program name, etc.)
-----------------------

iansjack · Post by **iansjack** » Fri Apr 12, 2013 11:51 am

16 bytes seems a bit long (no pun intended). Do you really need 3 bytes to specify an instruction? That's quite a number of distinct instructions. And 4 bytes to specify a register? Again, an incredibly large number of registers.

bluemoon · Post by **bluemoon** » Fri Apr 12, 2013 11:58 am

m12 wrote:One of the main design aspects of my os is compatibility, and, more importantly, portability.

I'm just curious what make "source level portability" not an option?

For cross-target support, there are more than byte code solution:
1. source code level portability, obviously
2. byte code, eg java
3. also byte code solution, but uses LLVM and do install-time machine code generation
4. bundle prebuilt modules, don't laugh at it, mac used this approach and the executable is usually small compared to other resources anyway.
5. package manager and download needed version, this usually deal with updates as well [however the core of the OS may be shipped as (4)]

Mikemk · Post by **Mikemk** » Fri Apr 12, 2013 1:12 pm

bluemoon wrote:
m12 wrote:One of the main design aspects of my os is compatibility, and, more importantly, portability.
I'm just curious what make "source level portability" not an option?

For cross-target support, there are more than byte code solution:
1. source code level portability, obviously
2. byte code, eg java
3. also byte code solution, but uses LLVM and do install-time machine code generation
4. bundle prebuilt modules, don't laugh at it, mac used this approach and the executable is usually small compared to other resources anyway.
5. package manager and download needed version, this usually deal with updates as well [however the core of the OS may be shipped as (4)]

if you use a higher level language such as java or c# to achieve 1, it'll already output a bytecode. Most potential application developers aren't going to want to be using c or c++ to write their apps
How is interpreting the java bytecode any more efficient than one actually meant to work with the os?
3 is a good idea - I may have my bytecode be recompiled to the native processor bytecode on install. One of the reasons I chose to use a bytecode over the native machine code is because it's potentially more secure. However, if my os outputs the machine code, then I'll have a somewhat higher degree of control.
I'm already intending to use 5. I don't quite understand 4 (How am I supposed to know what software the user will want?)

bluemoon · Post by **bluemoon** » Fri Apr 12, 2013 1:40 pm

For (4) I meant bundle all possible (or supported) architectures, aka universal binary. One of the headache from install-time optimization is that the compiler should be very stable, you want to avoid the embarrassing situation that the user upgraded the compiler, fixed some bug and breaks any nasty application - on the other hand a pre-built application should always work.

dozniak · Post by **dozniak** » Fri Apr 12, 2013 4:40 pm

Why not take LLVM and the bitcode? At higher levels it can be machine-independent. The whole framework is there, with the optimization passes etc.

Mikemk · Post by **Mikemk** » Fri Apr 12, 2013 4:50 pm

I'd rather not include third party objects in the os.

FallenAvatar · Post by **FallenAvatar** » Fri Apr 12, 2013 6:28 pm

m12: I'm looking to develop my OS to use bytecode as well. However, I'm not sure how close your goals are to mine. So i will describe those briefly, then describe my plan to meet those goals, followed by listing my advice for how you should go about designing your bytecode.

Goals:

Protect the user from viruses (and trojans, and hacks, and blah blah blah)
Not prevent an intelligent user from doing anything
Complete Process Isolation (From kernel, other processes, hardware, etc)
Portability

My Plan:
I will use Install-time-compilation for ALL executables (with signing, verification, etc) to compile byte code to machine code. The reason for this instead of bundled executables or any other option is that:

Verify that the executable does not contain any "protected" instructions
Prevent "self-mutating" executables (Virus is the main reason here)
Allowing vendor verification (That the executable is from who it claims to be, without modification)

The idea is that "technically" everything could be run in ring-0 (for performance reasons) because we trust our own installer/compiler. Also, the signing and such will be optional, but only for "development" builds, with a very BIG warning to the user. The idea is security, with minimal performance loss for the user, very little productivity loss for dev's, and a relatively secure system.

Advice:
I have yet to start designing my bytecode, however, I have some places you can look for help along the way, and maybe some extra thoughts/knowledge that might help you. For bytecode design, the obvious places to look first would be MSIL and JVM specs. See how they define an instruction. Do they use a fixed length? Variable length? (I'm betting variable) What instructions do the actually support?

I can speak about MSIL much more reliably here, but I would assume JVM is the same or similar in most regards. MSIL has a very distinct set of instructions, that generally, mimic assembly instructions, with several additions. The reason for this, as I understand it, is that assembly is usually 1 assembly instruction = 1 op-code for the CPU. MSIL tries to follow this as closely as possible for performace, minimization of surface area for bugs, and their own sanity.

So MSIL has a lot of instructions that translate 1-for-1 with op-codes for the above reasons. Anything beyond that is too support the CLR and DLR of .Net. The main idea to take away from this is the following:

Save yourself as many headaches as possible (Don't reinvent the wheel, ADD a, b means a+b)
Look at your LCD (Lowest common denominator) What instruction do x86 and ARM both support? This is your baseline
What will devs need a lot? (64-bit math on a 32-bit machine, 128-bit floating point arithmetic, etc) Start adding instructions to support
Figure out what each platform supports natively (SSE, SSE2,3,4, FPU, etc.) to make these 1-to-1 conversions
Figure out how to emulate the above instructions on other CPUs that don't support them natively.
For everything left over, figure out why CPU vendors do not support it (Overhead to create vs. ease of "doing it" [maybe 2-3 instructions instead of 1?])
If you must add a completely new "feature" figure out how to do it (Feature = op-code in your bytecode)

Once you have that list, count how many instructions you need. Round up to a power of 2, and if less than some % (up to you) of op-codes availble, increase by a power of 2. This is your base instruction set, and this will determine the size of your binary files/op-codes/etc. Once you have that, then you can start saying, "Well, we have 129 Op-Codes, so we clearly need 8 bits to fully declare the op-code." That is your base op-code size.

Now start figuring out how many ways you can reference data "in" an op-code. You cleary have x number of "registers" (emulated, or physical) so you must be able to reference those. What about literal values? Should a literal be 8-bit? 16? 32? 64? What about relative addressing? Maybe the next word could define a signed literal? Or maybe the JMP op-code supports jumping to a 48bit address? etc. These are things you need to evaluate after you have your "op-code list" semi-done.

For an example of how to do things, I really recommend checking out the Specs for the DCPU-16. Not because its "perfect", its far from it. But it provides a good "virtual" op-code system that does the bare minimum required. Look at how the instructions are formed. Use it where it makes sense, throw it away where it doesn't.

Just my 2-cents,
- Monk

Mikemk · Post by **Mikemk** » Fri Apr 12, 2013 7:17 pm

@tjmonk, add ease of ui customization to that list, and our goals are pretty close to identical.
Also, thank you for the really good pointers. My bytecode design is actually based on the dalvik vm bytecode, with the fixed instruction length of arm. (Not sure if dvm is fixed length)
I think the basic instructions I need to support are:
move, swap, add, subtract, multiply, divide, jump, conditional jump, call
so, 9 necessary operations, round up to 16, plus whatever I choose to add.
one byte should be sufficient for the first version, but better yet, 4 bit prefix, and twelve bits for the instruction.
I can then chop off a byte from registers, and chop off the third register field, shortening the bytecode to eight bytes. For immediate values, I can define a special register, indicating that the next instruction is immediate value data.
For 150000 lines of code, assuming roughly 3 operations each, I'm looking at about 5.14MB of per application, plus data (icons, readmes, etc.)
15 MB isn't too bad for a 2D tablet game is it? Of course, that's a best case scenario, but...

FallenAvatar · Post by **FallenAvatar** » Fri Apr 12, 2013 7:59 pm

m12: I see a number of problems with your response (Feel free to justify a problem I see and say it is necessary, that's the game we play

)

m12 wrote:@tjmonk, add ease of ui customization to that list, and our goals are pretty close to identical.

This is indeed a design decision I plan to solve as well. But it is just that, a Design Decision, not an architecture one.

m12 wrote: I think the basic instructions I need to support are:
move, swap, add, subtract, multiply, divide, jump, conditional jump, call
so, 9 necessary operations, round up to 16, plus whatever I choose to add.

If that is all your bytecode can support, I would recommend just using assembly + actual op-codes + verification. And you are missing some crucial ones, namely cli/sti (for time critical code), int (for interrupts/syscalls), among many others, unles you have a CLR type enviroment, where all syscalls are inside an library(s) provided by your OS that has access to protected instructions.

Also, I missed a goal of mine, but it is that "untrusted" executables from other OS can be executed, namely Linux (GNOME only to start with) and Windows (Win32 api only to start with) Where binary apps can be run. These can be secure by allowing a "registry" that is not global as Micro$oft's is. But, continuing...

m12 wrote: one byte should be sufficient for the first version, but better yet, 4 bit prefix, and twelve bits for the instruction.

You have 9 op-codes atm. Why plan for 65536???? (4+12 = 16 bits 2^16....)

m12 wrote: I can then chop off a byte from registers, and chop off the third register field, shortening the bytecode to eight bytes. For immediate values, I can define a special register, indicating that the next instruction is immediate value data.

I think you are missing the point. For a VM/bytecode implementation, you don't fix the size of op-codes. You make then as small as possible in most cases, and in the "ab-normal" cases make them bigger. NOW, in CPU creation, the design is done such that the circuits such as those that are used the most are fastest (<=1 cycle) and others are 2 or more cycles. You need to understand this and build it into your design.

If you take an instruction that is 1 cycle to compute, but make it 4 bytes big in your code (when the CPU only needs 1 or 2 bytes) you are adding memory accesses on top of the compute time. (A single instruction might be 1 cycle, but a RAM [not cache] access might be 100+ cycles. Don't even mention HD or network access here hahaha) So you have to keep all of this in mind. And even more is that you need to realize hardware is designed in such a way that it handles the common case far better than the ab-normal case.

Chainging the "playing field" drastically is possible, but you must do it in a way that convinces others to follow along... (This is how M$ has the market share they do...)

m12 wrote: For 150000 lines of code, assuming roughly 3 operations each, I'm looking at about 5.14MB of per application, plus data (icons, readmes, etc.)
15 MB isn't too bad for a 2D tablet game is it? Of course, that's a best case scenario, but...

A 5.14MB App??? o.O I would not worry about that, And more importanly, 1 line of code != 1 line/instruction of assembly. In reality, it is incredibly common that one line of code >= 3 lines of assembly.

[[All numbers are "pulled from my rear" but based on experience/knowledge/reality as best as I am capable of....]]

- Monk

Mikemk · Post by **Mikemk** » Fri Apr 12, 2013 9:35 pm

tjmonk15 wrote:m12: I see a number of problems with your response (Feel free to justify a problem I see and say it is necessary, that's the game we play )

OK

m12 wrote:@tjmonk, add ease of ui customization to that list, and our goals are pretty close to identical.
This is indeed a design decision I plan to solve as well. But it is just that, a Design Decision, not an architecture one.

true

m12 wrote: I think the basic instructions I need to support are:
move, swap, add, subtract, multiply, divide, jump, conditional jump, call
so, 9 necessary operations, round up to 16, plus whatever I choose to add.
If that is all your bytecode can support, I would recommend just using assembly + actual op-codes + verification. And you are missing some crucial ones, namely cli/sti (for time critical code), int (for interrupts/syscalls), among many others, unles you have a CLR type enviroment, where all syscalls are inside an library(s) provided by your OS that has access to protected instructions.

No, it's just the bare necessities. The environment will be more like clr or jre.

Also, I missed a goal of mine, but it is that "untrusted" executables from other OS can be executed, namely Linux (GNOME only to start with) and Windows (Win32 api only to start with) Where binary apps can be run. These can be secure by allowing a "registry" that is not global as Micro$oft's is. But, continuing...

I'd like to implement this as well, but I have a different method. I'll allow copyright holders to provide a wrapper layer for their apis in my bytecode, and I'll include a port command to convert executables to the bytecode. This however isn't immediately relevant, and probably needs to be thought out better.

m12 wrote: one byte should be sufficient for the first version, but better yet, 4 bit prefix, and twelve bits for the instruction.
You have 9 op-codes atm. Why plan for 65536???? (4+12 = 16 bits 2^16....)

It's 4096 opcodes. The prefix serves a completely different (and necessary) purpose. I'm not allowed to prepare for the future?

m12 wrote: I can then chop off a byte from registers, and chop off the third register field, shortening the bytecode to eight bytes. For immediate values, I can define a special register, indicating that the next instruction is immediate value data.
I think you are missing the point. For a VM/bytecode implementation, you don't fix the size of op-codes. You make then as small as possible in most cases, and in the "ab-normal" cases make them bigger. NOW, in CPU creation, the design is done such that the circuits such as those that are used the most are fastest (<=1 cycle) and others are 2 or more cycles. You need to understand this and build it into your design.

If you take an instruction that is 1 cycle to compute, but make it 4 bytes big in your code (when the CPU only needs 1 or 2 bytes) you are adding memory accesses on top of the compute time. (A single instruction might be 1 cycle, but a RAM [not cache] access might be 100+ cycles. Don't even mention HD or network access here hahaha) So you have to keep all of this in mind. And even more is that you need to realize hardware is designed in such a way that it handles the common case far better than the ab-normal case.

I'll admit I didn't think of this.

Really, it depends on how I implement it what the best case would be. if I choose to at install time convert it to native machine code, the bytecode size will hardly matter. if I choose to run the bytecode, though, then yes, this would be a good thing to keep in mind.

Chainging the "playing field" drastically is possible, but you must do it in a way that convinces others to follow along... (This is how M$ has the market share they do...)

<offtopic>
Have you taken a look recently at the market shares? For the devices I'm targeting, it's Google 75%, Apple 15%, Microsoft 2%, Others 8%
</offtopic>
I'm not gonna give an actual comment just yet on this.

m12 wrote: For 150000 lines of code, assuming roughly 3 operations each, I'm looking at about 5.14MB of per application, plus data (icons, readmes, etc.)
15 MB isn't too bad for a 2D tablet game is it? Of course, that's a best case scenario, but...
A 5.14MB App??? o.O I would not worry about that, And more importanly, 1 line of code != 1 line/instruction of assembly. In reality, it is incredibly common that one line of code >= 3 lines of assembly.

[[All numbers are "pulled from my rear" but based on experience/knowledge/reality as best as I am capable of....]]

I was also making my best guess. Instructions such as

Code: Select all

x %= x - function(x, y, z, class.staticfunction(z, y, x), new int[6][7][8]); // or your similiar line of code

would most certainly take up several (at least thirteen in this case) instructions.
There's no reason that

Code: Select all

x = y;

should take up more than one instruction though: mov x, y

Thank you for the discussion, btw. And for the (very) valid points.

Opcode · Post by **Opcode** » Fri Apr 12, 2013 11:02 pm

How about building a huffman tree based on the occurances of the opcodes in the executable. Then at the beginning of this file you store the table so you can walk the bytecode. Or you could define a universal tree based on average opcode "hits". This will of course take a performance hit, but I don't know what you Size v. Speed envelope is.

Mikemk · Post by **Mikemk** » Fri Apr 12, 2013 11:21 pm

Opcode wrote:How about building a huffman tree based on the occurances of the opcodes in the executable. Then at the beginning of this file you store the table so you can walk the bytecode. Or you could define a universal tree based on average opcode "hits". This will of course take a performance hit, but I don't know what you Size v. Speed envelope is.

greater speed is preferable.

But a tree based design may work
possible opcodes for each level include 0 through ef. opcodes f0 through ff represent that the next byte forms a new level, allowing a variable number of bytes, with the number of opcodes for each level being 240 * 16^n where n is the level number starting at 0

I will now make a table of opcodes.

dozniak · Post by **dozniak** » Sat Apr 13, 2013 3:20 am

Check out inferno/dis from plan9, I think they've been solving a lot of this for years.

jnc100 · Post by **jnc100** » Sat Apr 13, 2013 3:25 am

Will you insist on safe access to data or allow dereferencing abritrary pointers? If the former, you will need a concept of an object model at the bytecode level, and support for instructions such as loading and storing to array locations, fields of an object and static data, and creation and deletion of objects and arrays. If the latter, I don't see that there is anything to be gained from an additional level of interpretation/compilation where each instruction basically has a 1:1 mapping with the relevant machine code anyway.

Regards,
John.

OSDev.org

bytecode design

bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design

Re: bytecode design