My new project thing: EPBP
Posted: Fri Sep 26, 2008 10:28 am
I have begun a new project, and I would like to get some opinions on it.
It is called EPBP, which expands to EPBP Portable Bytecode Project. The website is at http://epbp.earlz.biz.tm
It is a bytecode architecture designed to resemble "real" processors, and as such it is rather low-level. It is even designed to replace Java, but not to be anything really like it. It works at a low level, but it is optimized to be emulated (or JIT compiled).
It has 256 registers (due to a sudden realization, this may drop to 128 later) and, in this early version, is 32-bit. Floating-point instructions are mingled with the integer instructions; there is no separate set of float instructions. (This is the opposite of the x86, which uses separate instructions such as fld/fst for floats and mov for ints. I have mov for both floats and ints.)
[edited]
The registers are not like those in hardware CPUs, though. The register mechanism is really only a "shortcut" to a range of memory. Basically, a register bank address is set, and each register is an offset from that bank address. For example, if the register bank is at 0x1000, the memory address of register 4 will be 0x1000+(4*SIZE_REGISTER).
[/edited]
There are 256 floating-point registers as well. They are addressed the same way as the general registers.
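To make the register bank idea concrete, here is a minimal sketch in Python (chosen only for illustration). It assumes `SIZE_REGISTER` is 4 bytes, since this version is 32-bit, and that data memory is a flat byte array. The helper names `register_address`, `read_register`, and `write_register` are made up for this example and are not part of EPBP.

```python
import struct

SIZE_REGISTER = 4    # assumption: 32-bit registers occupy 4 bytes each
NUM_REGISTERS = 256  # register count given in the post


def register_address(bank_base, index):
    """Return the data-memory address that backs register `index`."""
    if not 0 <= index < NUM_REGISTERS:
        raise ValueError("register index out of range")
    return bank_base + index * SIZE_REGISTER


def write_register(memory, bank_base, index, value):
    """Store a 32-bit value into the register's slot (little-endian)."""
    address = register_address(bank_base, index)
    struct.pack_into("<I", memory, address, value & 0xFFFFFFFF)


def read_register(memory, bank_base, index):
    """Load the 32-bit value held in the register's slot."""
    address = register_address(bank_base, index)
    return struct.unpack_from("<I", memory, address)[0]


memory = bytearray(0x2000)                     # hypothetical data memory
print(hex(register_address(0x1000, 4)))        # prints 0x1010
write_register(memory, 0x1000, 4, 0xDEADBEEF)
# Register 4 is just bytes 0x1010..0x1013 of data memory.
print(memory[0x1010:0x1014].hex())             # prints efbeadde
```

Under these assumptions, moving the bank address would give a different set of 256 registers without copying anything.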
To keep emulation fast, all math operations do exactly what they say: only math. No condition flags are set or evaluated (except by the if-true and if-false instructions, discussed later).
For getting outside of the CPU, rather than using in/out or special memory areas, there is only one mechanism, called xcall. xcall takes two arguments: a module number and a function number. This makes the virtual machine quite easy to extend. Module numbers 0 and 1 are reserved for the core EPBP "library." It includes things such as simple printf/getc and malloc, basically what almost every application expects to need.
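One way a host could dispatch xcall is a two-level lookup table, sketched below in Python. Everything here is an assumption made for illustration: the table layout, the way arguments are passed (a plain list), the function numbers, and the names `MODULES`, `register_module`, and `core_putc`. Only the two xcall arguments and the reserved modules 0 and 1 come from the description above.

```python
import sys


def core_putc(args):
    """Hypothetical core-library function: write one character."""
    sys.stdout.write(chr(args[0]))
    return 0


# module number -> {function number -> host handler}
MODULES = {
    0: {0: core_putc},  # modules 0 and 1 are reserved for the core library
    1: {},
}


def register_module(number, functions):
    """Let the host add an extension module under an unreserved number."""
    if number in (0, 1):
        raise ValueError("modules 0 and 1 are reserved for the core library")
    MODULES[number] = dict(functions)


def xcall(module, function, args):
    """Look up the handler by (module, function) and invoke it."""
    try:
        handler = MODULES[module][function]
    except KeyError:
        raise LookupError(
            f"no xcall handler for module {module}, function {function}"
        ) from None
    return handler(args)


# A made-up extension module with a single "add" function.
register_module(2, {0: lambda args: args[0] + args[1]})
print(xcall(2, 0, [2, 3]))  # prints 5
```

With a table like this, extending the virtual machine means adding entries on the host side, with no new opcodes needed.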
For conditionals, rather than having multiple flags and branching based on them, EPBP has only one flag register, called the Truth Register (TR). Compare instructions test whether a condition holds between two values; if it does, TR is set, and otherwise it is reset. This is easiest to show by example:
The following jumps to _greater if r1 is greater than r3, and otherwise jumps to _not_greater:
```
cgt r1,r3          ; is r1 greater than r3? if so, set TR, else reset TR
jit _greater       ; Jump If True
jif _not_greater   ; Jump If False
```
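For anyone thinking about how an emulator would handle this, here is a rough Python sketch of the three instructions above. The tuple encoding of instructions and the function name `branch_target` are invented for the example, and it assumes cgt compares plain integers (signedness is not specified here).

```python
def branch_target(program, regs):
    """Run cgt/jit/jif in order and return the label that gets jumped to."""
    tr = False  # the single Truth Register
    for op, *args in program:
        if op == "cgt":
            left, right = args
            tr = regs[left] > regs[right]  # set TR if true, reset otherwise
        elif op == "jit" and tr:           # Jump If True
            return args[0]
        elif op == "jif" and not tr:       # Jump If False
            return args[0]
    return None  # fell off the end without jumping


program = [
    ("cgt", 1, 3),
    ("jit", "_greater"),
    ("jif", "_not_greater"),
]

regs = [0] * 256
regs[1], regs[3] = 7, 5
print(branch_target(program, regs))  # prints _greater
```

The only state the compare leaves behind is one boolean, so math instructions never have to compute flags.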
Note: Instructions work similarly to x86. Operands go from right to left. For example, to move r3 into r1 you would write:
```
mov r1,r3
```
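It is that way with the cgt instruction because it's almost as if you could replace the comma of any opcode with a math operator: cgt r1,r3 reads as r1 > r3, and mov r1,r3 reads as r1 = r3.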
Like the x86, the EPBP architecture is little-endian.
The opcode memory is separate from the data memory. Anything can be done with data memory, but the code memory is neither readable nor writable. Function pointers are very cumbersome, but possible in a hackish way. This is a clear disadvantage, but I feel the trade-off is worth it. It means a program can very easily be compiled once into native code and run without having to carry the virtual machine or a JIT compiler with it, as long as xcalls are emulated in an expected way and any endianness problems are handled. This also makes the EPBP architecture more secure by design: invalid call addresses are not possible, so there is no need to check for them.
Now, having an invisible code segment has one inherent problem: what about calls and rets? If they push and pop values on the programmer-visible stack, can't those values just be hacked so that ret goes to any address? Well, I thought about that, and after a long session of thinking through all the possible ways, I came up with the idea of the Call Stack.
The call stack is semi-invisible to the programmer. It is writable only by special instructions, and only with immediate, constant values. This lets the interpreter/compiler check these constants and be sure they point to valid code addresses.
Whenever a call is done, a value is pushed onto the call stack. Whenever a ret is done, a value is popped off the call stack. It is still possible to edit this stack and crash your program by screwing up the program flow. There are instructions such as popc (pop code stack) and pushc (push code stack), which make it very possible to mess up a program's flow very quickly. But all these code addresses are checked to be valid at runtime initialization. This means your program might go crazy, but you're not going to get a GPF (general protection fault).
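Here is a small Python sketch of how the load-time check and the separate call stack could fit together. It is only an illustration under my own assumptions: instructions are tuples, code addresses are instruction indexes, and the nop and halt opcodes, the underflow handling, and the names `validate` and `run` are made up for the example.

```python
def validate(program):
    """Load-time check: every call/pushc immediate must be a code address."""
    for index, (op, *args) in enumerate(program):
        if op in ("call", "pushc"):
            target = args[0]
            if not 0 <= target < len(program):
                raise ValueError(
                    f"instruction {index}: invalid code address {target}"
                )


def run(program):
    """Execute the program and return the list of visited addresses."""
    validate(program)  # done once, so the loop below never checks targets
    call_stack = []    # kept outside data memory, invisible to loads/stores
    trace = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        trace.append(pc)
        if op == "call":
            call_stack.append(pc + 1)  # return address
            pc = args[0]
        elif op == "pushc":
            call_stack.append(args[0])  # constant, already validated
            pc += 1
        elif op in ("popc", "ret"):
            if not call_stack:
                raise RuntimeError("call stack underflow")
            address = call_stack.pop()
            pc = address if op == "ret" else pc + 1
        elif op == "halt":
            break
        else:  # nop and anything else this sketch does not model
            pc += 1
    return trace


program = [
    ("call", 2),  # 0: call the routine at address 2
    ("halt",),    # 1: execution resumes here after ret
    ("nop",),     # 2: routine body
    ("ret",),     # 3: return to the address on the call stack
]
print(run(program))  # prints [0, 2, 3, 1]
```

A program can still scramble its own flow with pushc and popc in this model, but every address that reaches the call stack has already passed the check.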
If you have any ideas or opinions about this project, please post them! I'm dying to hear some outside feedback about it.