My new project thing: EPBP

earlz · Post by **earlz** » Fri Sep 26, 2008 10:28 am

I have begun a new project and would like to get some opinions on it.

The name of it is EPBP, which expands to EPBP Portable Bytecode Project. The website is at http://epbp.earlz.biz.tm

It is a bytecode architecture designed to resemble "real" processors, and as such is rather low-level. It is even designed to replace Java, but not to be anything really like it. It works at a low level, but it is optimized to be emulated (or JIT compiled).

It has 256 registers (due to a sudden realization, this may be 128 registers later) and in this early version is 32 bit. Floating point instructions are mingled with the integer instructions; there is no separation between the two. (This is the opposite of x86, which uses separate instructions such as fld/fst for floats and mov for ints. I have mov for both floats and ints.)
[edited]
The registers are not like those in hardware CPUs though. The register mechanism is really only a "shortcut" to a range of memory. Basically, the register bank address is set and then the registers are offsets from this bank address. For example, if the register bank is at 0x1000, the memory address of register 4 will be 0x1000+(4*SIZE_REGISTER)
[/edited]
There are 256 floating point registers as well. They are addressed the same way as the general registers.
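
A rough sketch of the bank-relative addressing described above. The 4-byte SIZE_REGISTER is an assumption based on the 32-bit version, and the function name is made up for illustration.

```python
SIZE_REGISTER = 4  # assumed: 4 bytes per register in the 32-bit version


def register_address(bank_base: int, index: int, size: int = SIZE_REGISTER) -> int:
    """Return the data-memory address that backs register `index`."""
    if not 0 <= index < 256:
        raise ValueError("register index out of range")
    return bank_base + index * size


print(hex(register_address(0x1000, 4)))  # 0x1010
```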

For emulation optimization, all math operations do exactly what they say, only math. No conditional flags are set or evaluated (except for if true and if false instructions, discussed later).

For getting outside of the CPU, rather than using in/out or special memory areas, there is only a thing called xcall. xcall takes two arguments: one is the module number, the other is the function number. This makes the virtual machine quite easy to extend. Module numbers 0 and 1 are reserved. They are used for the core EPBP "library", which includes things such as simple printf/getc and malloc, basically just what almost every application expects to need.
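
One way an interpreter could dispatch xcall is a table keyed by (module, function). This is only a sketch: the table layout, the error type, and the assignment of module 0, function 0 to "print a string" are all assumptions, not part of the EPBP design.

```python
from typing import Callable, Dict, Tuple

# (module number, function number) -> host function
XcallTable = Dict[Tuple[int, int], Callable[..., object]]


class XcallError(Exception):
    """Raised when a program requests an xcall the host does not provide."""


def make_core_table(output: list) -> XcallTable:
    # Module 0, function 0 as "print a string" is a made-up assignment.
    return {(0, 0): output.append}


def xcall(table: XcallTable, module: int, function: int, *args):
    handler = table.get((module, function))
    if handler is None:
        raise XcallError(f"no handler for module {module}, function {function}")
    return handler(*args)
```

Extending the VM then amounts to adding entries to the table, and a restricted host simply leaves entries out.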

For conditionals, rather than having multiple flags and branching based on them, there is only one flag register in EPBP, called the Truth Register (TR). There are compare instructions that test whether a condition holds between two values and, if so, set the truth value. This is easiest to show by example, so here:
This will jump to _greater if r1 is greater than r3, and if not greater than jump to _not_greater

```
cgt r1,r3 ;is r1 greater than r3 ? if so, then set TR, else reset TR
jit _greater ;Jump If True
jif _not_greater ;Jump If False
```
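
For illustration, here is how an interpreter might handle those three instructions. The tuple-based program format and the `run` function are assumptions made for this sketch. Note that the only state the compare touches is the single TR flag.

```python
def run(program, regs):
    """Interpret cgt/jit/jif and return the label of the jump that was taken."""
    tr = False  # the Truth Register
    for op, *args in program:
        if op == "cgt":
            a, b = args
            tr = regs[a] > regs[b]  # set TR if greater, else reset it
        elif op == "jit":
            if tr:
                return args[0]
        elif op == "jif":
            if not tr:
                return args[0]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return None


program = [("cgt", 1, 3), ("jit", "_greater"), ("jif", "_not_greater")]
```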

Note: Instructions work similarly to x86. Operands are read from right to left,
so to move r3 into r1 you would write

```
mov r1,r3
```

It is the same way with the cgt instruction, because it's almost as if you could just replace the comma of any opcode with a math operator:
cgt r1,r3 reads as r1 > r3 and mov r1,r3 reads as r1 = r3

The EPBP arch is like that of the x86 and is little endian.

The opcode memory is separate from the data memory. Anything can be done with data memory, but the code memory is neither readable nor writable. Function pointers are very cumbersome to do but, in a hackish way, possible. This is a clear disadvantage, but I feel the trade-off is worth it. It means a program can very easily be compiled once into native code and run without having to carry the virtual machine or a JIT compiler with it, as long as xcalls are emulated in the expected way and any endianness problems are checked. This also makes the EPBP arch more secure by design: no invalid call addresses are possible, so there is no need to check for them.

Now, having an invisible code segment has one inherent problem. What about calls and rets? If they push and pop values off of the programmer-visible stack, can't these values just be hacked so that ret goes to any address? Well, I thought about that, and after a long session of thinking through all the possible ways, I came up with the idea of the Call Stack.
The call stack is programmer semi-invisible. It is only writable by special instructions and only by immediate, constant values. This makes it so the interpreter/compiler can check these constant values and be sure they point to valid code addresses.

Whenever a call is done, a value is pushed onto the call stack. Whenever a ret is done, a value is popped off it. It is still possible to edit this stack and crash your program by screwing up the program flow: there are instructions such as popc (pop code stack) and pushc (push code stack), which make it very possible to mess up a program's flow very quickly. But all these code addresses are checked to be valid at runtime initialization. This means your program might go crazy, but you're not going to get a GPF.
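
A minimal sketch of the two-stack idea. The `Machine` class and its field names are hypothetical. The point is only that return addresses live on a stack that ordinary data pushes never touch.

```python
class Machine:
    def __init__(self):
        self.data_stack = []  # programmer-visible stack
        self.call_stack = []  # only call/ret (and pushc/popc) touch this
        self.pc = 0

    def call(self, target: int) -> None:
        self.call_stack.append(self.pc + 1)  # return to the next instruction
        self.pc = target

    def ret(self) -> None:
        if not self.call_stack:
            raise RuntimeError("call stack underflow")
        self.pc = self.call_stack.pop()


m = Machine()
m.pc = 3
m.call(10)
m.data_stack.append(0xDEADBEEF)  # scribbling on the data stack...
m.ret()                          # ...cannot change where ret goes
print(m.pc)  # 4
```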

If you have any ideas or opinions about this project, please post them! I'm dying to hear some outside input about it.

pcmattman · Post by **pcmattman** » Fri Sep 26, 2008 5:10 pm

Sounds interesting, but...

the memory address of register 2 will be 0x1000+(4*SIZE_REGISTER)

I fail to see the logic in that...

Troy Martin · Post by **Troy Martin** » Fri Sep 26, 2008 10:59 pm

pcmattman wrote:Sounds interesting, but...

the memory address of register 2 will be 0x1000+(4*SIZE_REGISTER)
I fail to see the logic in that...

Yeah, aren't registers normally built into the processor and not in memory? Bad plan.

Will post more tomorrow morning. I'm tired.

pcmattman · Post by **pcmattman** » Fri Sep 26, 2008 11:03 pm

I was meaning more the (4*SIZE_REGISTER) when it's register 2.

I actually quite like the concept of in-memory registers. Especially considering it's a bytecode project.

earlz · Post by **earlz** » Fri Sep 26, 2008 11:10 pm

[EDIT]
Yeah, that would actually be 2, lmao. Not too good at counting.
[/EDIT]

Well, it is designed to be emulated well. If the host CPU has only 8 registers (i.e., x86) then it would be pointless to act like EPBP had more registers to access. The main point is, hardly any CPU out there has 256 registers (or 128), so in emulation these must be emulated registers that are actually stored in memory. If we were to say "they are actual hardware registers", this might lead programmers to make unnecessary attempts at optimization by using registers exclusively. It is designed to work equally well on all platforms, not just those with a lot of CPU registers. If you had ever attempted to make an emulator, you would see how registers have basically the same access time as memory in emulation. (Though some things can be faster with registers, such as skipping memory address checks, and they can be aligned on a 4-byte boundary.)

and if your next question is "why have this 'fake register' thing at all?" then here is my answer:
It is usually not convenient to only have memory and immediate values to use. What if you want to access a 32bit address in several instructions? This would clearly be a waste of bytecode space. So, registers are done this way for programmer convenience and for bytecode density.

[EDIT]
Btw, the current work right now is designing the opcode map, which is much harder than I first imagined. I am not even assigning actual numbers right now. Instead I am creating a list of all the proposed opcodes and setting priorities from there, based on how well optimized the length and such of each one is, and on whether it is really needed or should leave that space open for other, better opcodes.

There are about 60 proposed opcodes at the moment, though only about 20 are priority level 1.
[/EDIT]

mystran · Post by **mystran** » Tue Oct 14, 2008 6:32 am

IMHO the idea of using a lot of registers is a sensible one for a VM. On the other hand, fixing them to a given address range is not. Allowing direct memory access to the virtual registers is not a great idea either.

If you specify 256 (or whatever) registers, but only allow them to be accessed as registers (like real registers on a real ISA), then a JIT compiler can map as many of them into real registers as it can (doing register allocation, of course), and implement the rest in a way that is practical for the architecture it's targeting. On x86 for example, you'd probably want the rest of the virtual registers to be pushed onto the stack (but not necessarily in order; rather do register allocation treating the virtual registers as if they were variables, and push them to the stack to free more real registers where necessary).
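
To make the mapping concrete, here is a toy sketch that gives the most frequently used virtual registers a host register and spills the rest to stack slots. It is a simple usage-count heuristic, not a real allocator such as linear scan or graph colouring, and the function name and host register names are just illustrative.

```python
from collections import Counter


def allocate(virtual_uses, host_registers):
    """Map virtual register numbers to ("reg", name) or ("stack", slot)."""
    counts = Counter(virtual_uses)
    ranked = [vreg for vreg, _ in counts.most_common()]  # most used first
    mapping = {}
    for position, vreg in enumerate(ranked):
        if position < len(host_registers):
            mapping[vreg] = ("reg", host_registers[position])
        else:
            mapping[vreg] = ("stack", position - len(host_registers))
    return mapping


# Virtual registers 1, 2 and 3 competing for two host registers.
print(allocate([1, 1, 1, 2, 2, 3], ["eax", "ebx"]))
```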

edit: also consider a stack machine with offsets from the current stack-top towards the bottom.. it's often much easier to write a compiler to target such a system. It's also easier to compile such bytecode further into native.

lollynoob · Post by **lollynoob** » Tue Oct 14, 2008 8:12 pm

you know those BRILLIANT IDEAS that people(nerds) always come up with
but in reality the idea is pretty dumb

i think this is one of those ideas

Troy Martin · Post by **Troy Martin** » Tue Oct 14, 2008 8:31 pm

lollynoob, I think I speak on behalf of everyone here by saying this: You are an insensitive jerk.

that people(nerds) always come up with

Says the guy posting in an OS development forum.

01000101 · Post by **01000101** » Tue Oct 14, 2008 9:10 pm

[rant]

Troy Martin wrote:lollynoob, I think I speak on behalf of everyone here by saying this: You are an insensitive jerk.

that people(nerds) always come up with
Says the guy posting in an OS development forum.

I somewhat resent that statement. I hang around an OS development forum, but I do not consider myself a 'nerd'. I find non-tech related activities far more enjoyable, such as skateboarding, BBQs, working out, or being with my girlfriend. I also would not place all tech ideas as coming from programming cubicle-dwellers.

I doubt you could speak for everyone in the first place, and secondly, some credit must be given to someone who speaks their mind without reserve for feelings. If everyone shielded everyone else from their real thoughts, everyone would be living in blissful ignorance.

Now, if someone says something negative about your idea/project/etc. and the majority says otherwise, then what is the worry? But if that one person sparks agreement, then maybe the idea/project should be re-evaluated/assessed to find and troubleshoot the flaws (notice I did not say quit/give up).
[/rant]

What exactly is the practical use for this project? Is it meant to aid in emulation of CPU-level instructions?

Troy Martin · Post by **Troy Martin** » Tue Oct 14, 2008 9:19 pm

[replytorant]
I think I was also trying to point out that we're not nerds, we're very computer-literate. Somehow.
[/replytorant]

I think the point of the project is similar to my "9900 CPU" project, to help aid in the learning of the inner workings of CPUs and assembly language. That and to teach the author about said things too (also a goal of my project.)

earlz · Post by **earlz** » Wed Oct 15, 2008 11:45 am

Wow, lots of replies...

to rant/reply: no response

For the register bit, I have changed it so that you can change where the registers are stored using chbank (for integer registers) and chfbank (for float registers). So now you can move all the registers with a single instruction (great for context switching). Also, I have decided that the SR (stack register) is not included in these registers, for practical reasons: if you were to change the register context, it would render the stack inaccessible and useless without using arbitrary addresses (which are frowned upon) for "temporary storage".
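
A sketch of why chbank makes context switching cheap: switching banks changes one base address and each context's register values stay where they were in memory. The `Vm` class, the 4-byte register size, and the little-endian storage are assumptions for illustration. Only the integer bank is shown.

```python
SIZE_REGISTER = 4  # assumed: 32-bit registers


class Vm:
    def __init__(self, memory_size: int):
        self.memory = bytearray(memory_size)
        self.bank = 0  # base address of the integer register bank

    def chbank(self, address: int) -> None:
        self.bank = address

    def _addr(self, index: int) -> int:
        return self.bank + index * SIZE_REGISTER

    def write_reg(self, index: int, value: int) -> None:
        a = self._addr(index)
        self.memory[a:a + SIZE_REGISTER] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def read_reg(self, index: int) -> int:
        a = self._addr(index)
        return int.from_bytes(self.memory[a:a + SIZE_REGISTER], "little")


vm = Vm(0x3000)
vm.chbank(0x1000); vm.write_reg(1, 11)  # context A
vm.chbank(0x2000); vm.write_reg(1, 22)  # context B
vm.chbank(0x1000)                       # switch back to A
print(vm.read_reg(1))  # 11
```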

Ok, and for the whole stack machine thing. I don't think that is the goal of my project, plus if I did that I would have to trash all my current code and study a cpu arch I have never used before. Also, I would like asm programming in my arch to be mildly feasible to do. I also feel that in an interpretation approach that the stack machine would be slower than a register machine.(though it would be faster in a JIT compile approach). If you can prove me wrong on this, then go ahead.. lol.. Just what I see is a waste of memory access in stack machines..

The practical use is like that of Java. The big advantages of EPBP over Java, though, are these:

1. Not tied to a certain language. (Java technically isn't tied to just the Java language, but the way the bytecode is designed, it very much favors it so much as to get in the way of porting other languages)

2. No copyright issues. With Java there are numerous limitations in their license. This (iirc) includes a section that basically states "if you change the code of the OpenJDK, you can not make the binary publicly accessible", which gets in the way of porting OpenJDK to a system and then distributing it as a compiled package. EPBP is under the very unrestrictive 3-clause BSD license.

Although I would like to say something about how modules and what not are more portable with EPBP, I don't know about Java's module support, so I don't know if it's better or worse.

The way EPBP's module technique works also makes it possible to use EPBP on limited devices, for instance those without any user input methods, graphics, or sound, assuming the EPBP application does not rely exclusively on those IO methods.

Also, I feel that EPBP programs should be perfectly capable of running from a browser. The Xcall method could of course be limited so that viruses could be prevented (disallow file IO, etc.). The way EPBP is designed for safe code also means a single check could be done on the code, and if it passes that check, the program should be impossible to crash other than through a bad Xcall extension.
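
The single up-front check could look roughly like this. The tuple-based program format, the set of opcodes that carry a code address, and the use of instruction indices as addresses are all assumptions for the sketch. It relies on code addresses only ever appearing as immediate constants, as described for the call stack.

```python
# Assumed set of opcodes whose first operand is a code address.
CODE_ADDRESS_OPS = {"jit", "jif", "call", "pushc"}


def verify(program) -> bool:
    """Return True if every code-address immediate points inside the program."""
    length = len(program)
    for op, *args in program:
        if op in CODE_ADDRESS_OPS:
            target = args[0]
            if not isinstance(target, int) or not 0 <= target < length:
                return False
    return True


print(verify([("cgt", 1, 3), ("jit", 2), ("ret",)]))  # True
print(verify([("call", 7)]))                          # False
```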

This project is not necessarily meant to aid in the learning of CPUs and assembly (though that is a fun experience also.. lol), but more to help bridge the portability gap between the Windows, Mac, and Unix worlds, and to promote more development targeted at a portable world. This can make it possible to create commercial EPBP applications and have them instantly work on any of these OSs, assuming they use nothing but portable, non-proprietary EPBP Xcall modules (which is where there can be problems).

SlowByte · Post by **SlowByte** » Thu Oct 30, 2008 3:07 pm

OP: If you're serious about this, study these:
http://www.parrot.org/ (a register-based virtual machine designed to be interpreted)
http://www.lua.org/ (fast register-based VM)
http://llvm.org/ (a virtual machine designed to be JITted/compiled)

Virtlink · Post by **Virtlink** » Fri Oct 31, 2008 6:08 pm

hckr83 wrote:Ok, and for the whole stack machine thing. I don't think that is the goal of my project, plus if I did that I would have to trash all my current code and study a cpu arch I have never used before. Also, I would like asm programming in my arch to be mildly feasible to do. I also feel that in an interpretation approach that the stack machine would be slower than a register machine.(though it would be faster in a JIT compile approach). If you can prove me wrong on this, then go ahead.. lol.. Just what I see is a waste of memory access in stack machines..

First of all, I want to say that what you want is probably entirely possible. However, as far as I know, Java bytecode is stack-based. This would also be logical, since you don't know the architecture the application will run on. In Java, the bytecode is compiled to machine code just before execution, and this compiler may optimize as much as it wants. For example, when a method uses three local variables, the compiler may decide not to put them on the stack but to store them in hardware registers. Any code reading from or writing to those local variables would use the hardware registers instead of the stack.

And furthermore, say I want to add two values. How do you (or the executor/compiler/emulator thing) know which registers to read and where to put the result? When you have 256 possible registers, each register operand needs a full byte, which means that the instruction will be four bytes long, at least. Just a thought...
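
The arithmetic behind "four bytes" is one opcode byte plus three one-byte register numbers. A quick sketch of such an encoding follows. The layout and the opcode value 0x01 are hypothetical, since EPBP's opcode map has not been assigned yet.

```python
import struct

ADD = 0x01  # hypothetical opcode number


def encode_three_register(opcode: int, dst: int, src_a: int, src_b: int) -> bytes:
    # "B" is an unsigned byte, so each field must be in 0..255.
    return struct.pack("<BBBB", opcode, dst, src_a, src_b)


encoded = encode_three_register(ADD, 3, 1, 2)
print(len(encoded))  # 4
```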

And the user will be limited to a maximum of 256 local variables and function arguments, for all methods together? Because once the user calls another method, you'll need a place to put the local variables and such. Most likely, this will be the stack.

Stack-based languages are actually very simple: you push two values on the stack, and when doing addition, it pops the two values, adds them together and pushes the result back on the stack. When you want to call a method, you first push the arguments on the stack just as in C.
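
The addition described above, as a tiny interpreter. The instruction names and tuple format are made up for the sketch. Note that add needs no operands at all, which is where the compact encoding comes from.

```python
def run_stack(program):
    """Execute push/add instructions and return the final stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


print(run_stack([("push", 2), ("push", 3), ("add",)]))  # [5]
```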

- Virtlink
