Ideas for high level assembly language

Posted: Wed Jun 11, 2008 8:48 pm
by AndrewAPrice
I'm thinking of writing a curly-bracket high-level assembly language for my console (see my other post). It's going to be a preprocessor which generates low-level assembly.

So far I've worked out how to do:
- basic C preprocessor commands (#include #define #ifdef #ifndef #else #endif)
- comments (// /* */ ;)
- functions (parameters, local variables, pre-declaring (e.g. if they're in another file))
- inline functions
- handling signed, unsigned, float, ascii values, compile time inline calculations
- operations such as =, +, -, etc. (only one operation per line)
- structures
- if / else conditions, while and do while loops
- local variables

Here is a dump of my text file which shows my ideas for the language (not really designed to be neat or readable):

Code: Select all

First pass commands: (same as the C preprocessor)
---------------------------------------------------------------
#include
#define
#ifdef
#ifndef
#else
#endif

Comments
---------------------------------------------------------------
// comment until the end of the line
; same as //
/* comment block */

Functions
---------------------------------------------------------------
// predeclare function for later use (e.g. if it's in another file)
declare function_name(a0, a1, a2);

//                 v-- parameters
//                      v--registers to save to stack
//                              v--temporary name for register
func function_name(a0, a1, a2 = myReg) : (a3,a4,a5)
{
} // return

// cannot predefine these
inline func function_name(a0, a1, a2) : (a3,a4,a5)
{
}

// calling functions:
function_name(a0, 483s, @3.0f) // @ means do not push the
			       // register that parameter is
			       // using to the stack

Values
---------------------------------------------------------------
48923423 - raw number, unsigned
438238s - signed number
433838u - unsigned number
49312.f - float
0x3434 - hexadecimal unsigned number
'a' - ASCII character converted to unsigned number
*493 - converts to [493]

can do inline calculations - e.g. 5s * 10s / 100s ONLY if all
types are the same type (unsigned, signed, or float)

Operations
---------------------------------------------------------------
= 	move
t++	increment
t--	decrement
t+=	add
t-=	subtract

where t is u(unsigned), s(signed), or f(float)
e.g.:
	c5 += c7
	c5 = 50.0f

Structures:
---------------------------------------------------------------
struct struct_name
{
    membername : 2; // 2 = size of member in bytes
    membername2 : 1;
}

struct struct2
{
    struct1 : struct_name;
    anothervar : 1;
}

to access it you would do something like:

address = 0x550 + struct2.struct1.membername

address = struct2.struct1.membername

Conditions and loops:
----------------------------------------------------------------
// first register is to test, second register stores temporary
// value of address (at least 32-bit register recommended)
// e.g.
//  (@b5) - use register b5 but don't push old value to stack
//  (b5) - use register b5 and temporarily push its old
//          value to the stack

if(a0) : (b5) // if it's not zero
{
}
else // if it's zero
{
}

if(!a0) : (b5) // if it's zero
{
}
else // if it's not zero
{
}

while(a0) : (b5) // loop while not zero
{
}

while(!a0) : (b5) // loop while zero
{
}

do
{
}
while(a0) : (b5) // do while not zero

do
{
}
while(!a0) : (b5) // do while zero

Scoping:
----------------------------------------------------------------
local(a0, a1 = myReg, @a3 = myReg2) // pushes those registers to
		      		    // the stack so they can be
			            // temporarily used (myReg is
				    // a keyword for the register)
{
}
Comments, suggestions, features I should include? I've looked up similar HLAs and preprocessors but none seem to generate "neat" code (I'm biased here, since I'm used to reading C/C++ code).

Posted: Wed Jun 11, 2008 11:22 pm
by Ready4Dis
I think you should just stick to the 'typical' way of declaring variables, rather than the whole :1, :2 thing. It will confuse people who are used to C/C++, which use that syntax for bit sizes rather than byte sizes. I would just make a nice clean variable naming convention, like s8, u8, s16, u16, s32, u32, etc., so it is not confused with pre-existing implementations.

I also suggest, to stick with typical ASM behavior, not requiring functions to be pre-declared. It's simple enough to scan the entire file for functions prior to generating code, so function a at the top of the source could call function x near the bottom without having to declare it. This is how normal assemblers work, and it works fine. Also, for declaring functions, I would make it so you need to declare the variable size and type. Also, how do you plan on working with pointers rather than variables (or arrays)? Also, why do you need to explicitly declare which temporary register to use?
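The scan-first idea could look roughly like the sketch below. It is written in Python purely for illustration; the regular expression and the name `collect_functions` are assumptions and not part of any actual implementation. A first pass records every `func` definition, so a call to `helper` inside `main` can be resolved even though `helper` is defined later in the file.

```python
import re

# Hypothetical pattern for a definition such as: func name(a0, a1 = myReg) : (a3)
FUNC_DEF = re.compile(r"^\s*(?:inline\s+)?func\s+(\w+)\s*\(([^)]*)\)", re.MULTILINE)

def collect_functions(source):
    """First pass: map every function name to its list of parameter registers."""
    table = {}
    for match in FUNC_DEF.finditer(source):
        name, params = match.group(1), match.group(2)
        # Drop an optional "= alias" so only the register name is stored.
        table[name] = [p.split("=")[0].strip() for p in params.split(",") if p.strip()]
    return table

source = """
func main() : (a3)
{
    helper(a0, a1)
}

func helper(a0, a1 = myReg) : (a4)
{
}
"""

functions = collect_functions(source)
print(functions)
```

The second pass (code generation) would then look calls up in `functions` instead of relying on a `declare` line.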

How can you pass @3.0f without using the stack? Does it assume a specific register is used instead? Why not just have methods of passing parameters as a function type declaration (like C, stdcall, fastcall, etc). A stdcall will use the stack for all values, fastcall can use eax, ebx, ecx, edx for the first 4 variables, etc.

What about doing calculations on bytes that are numerical, rather than 'a'? What if I wanted to deal with single-byte numbers, like 25*25, or x:1 = 3? How do you do that, since there is no default char type and you can't type the value 3 in ASCII?

I like the overall concept; there are just a few things to work out and make clear. If you get serious and need any help, let me know. I actually have some code to parse most of the syntax that you use. I was thinking about doing a similar thing (which is what my parser was for), although I didn't get too far due to lack of time.

Posted: Thu Jun 12, 2008 1:38 am
by JamesM
I must say, this doesn't look like assembly to me at all. It does, in fact, look an awful lot like C--...

Posted: Thu Jun 12, 2008 2:07 am
by AndrewAPrice
Ready4Dis wrote:I think you should just stick to the 'typical' way of declaring variables, rather than the whole :1, :2 thing. It will confuse people who are used to C/C++, which use that syntax for bit sizes rather than byte sizes. I would just make a nice clean variable naming convention, like s8, u8, s16, u16, s32, u32, etc., so it is not confused with pre-existing implementations.
I know what you mean. The reason I did it that way is because structs are basically a collection of labels and offsets defining an area of memory:
labelname : bytes until next label
labelname : bytes until next label
etc..
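Seen that way, resolving something like struct2.struct1.membername is just a running sum of sizes. Here is a minimal sketch in Python (not the preprocessor's own code; the dictionary layout and the function name `layout` are assumptions made for illustration) using the two structs from the first post:

```python
def layout(structs, name):
    """Return ({dotted member path: byte offset}, total size in bytes)."""
    offsets = {}
    position = 0
    for member, size in structs[name]:
        offsets[member] = position
        if isinstance(size, str):
            # The member is itself a struct: splice in its members' offsets.
            inner_offsets, inner_size = layout(structs, size)
            for path, offset in inner_offsets.items():
                offsets[member + "." + path] = position + offset
            position += inner_size
        else:
            position += size
    return offsets, position

# Each member is (label, bytes until next label) or (label, nested struct name).
structs = {
    "struct_name": [("membername", 2), ("membername2", 1)],
    "struct2": [("struct1", "struct_name"), ("anothervar", 1)],
}

offsets, size = layout(structs, "struct2")
print(offsets, size)
```

With these definitions `struct2.struct1.membername2` sits at offset 2 and `anothervar` at offset 3, so an expression like `0x550 + struct2.anothervar` could be folded to a plain address at preprocessing time.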
Ready4Dis wrote: I also suggest, to stick with typical ASM behavior, not requiring functions to be pre-declared. It's simple enough to scan the entire file for functions prior to generating code, so function a at the top of the source could call function x near the bottom without having to declare it.
I got rid of that. Basically it was in there so you could have libraries in binary form, and you could tell the assembler "this function exists, and takes these parameters", so that when it assembled the call it emitted the instructions for passing the parameters but left a placeholder for the address to be filled in at link time. But for that I need a linker, and I don't want to rewrite my assembler to store labels externally. :(
Ready4Dis wrote:Also, for declaring functions, I would make it so you need to declare the variable size and type.
You declare the registers you use (in the example a0, a1, a2 are all 16-bit registers). If you wanted the parameter to accept a 32-bit value you'd use a 32-bit register. Any register can hold an unsigned integer, a signed integer, or a float (I'm using my console's architecture, so what I'm saying might not work on others), which is why when you do maths you have to specify which type of operation: s+=, u+=, f+=. There are assembly instructions to convert the value inside a register into a different type (signed integer<->unsigned integer<->float).
Ready4Dis wrote:Also, how do you plan on working with pointers rather than variables (or arrays)?
The ways data can move on the CPU are:
32 bit raw value->register (down converted into register's size)
register<->register
register<->memory location
register<->memory location contained inside a register
So to move a value into a pointer you'd have to do the following:

Code: Select all

 // move integer '10' into 10 bytes past label "some label"
b0 = some_label // set up pointer
b1 = 10u
b0 += b1 // increase pointer by 10

*b0 = b1 // copy 10 into the memory address contained in b0 (it will take up 32 bits since b1 is a 32 bit register)
Addresses are just treated as if they are unsigned integers.
Ready4Dis wrote:Also, why do you need to explicitly declare which temporary register to use?
You list the registers you are going to use because only those will be pushed on to the stack when you enter the function. This is mostly for optimisation, since the system has a lot of registers and I don't want to push each one on to the stack if I have a function which only needs to change one or two registers.

It allows users to optimise function calls themselves, which is especially important since the assembler is targeting an emulated platform and I want to make programs as efficient as possible.
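To make the save list concrete, here is a rough Python sketch of what the preprocessor might emit for `func function_name(...) : (a3, a4, a5)`. The mnemonics `push`, `pop` and `ret` and the helper name `emit_function` are assumptions for illustration; the console's real instruction names are not given in this thread.

```python
def emit_function(name, saved_registers, body):
    """Wrap a function body with pushes/pops for only the listed registers."""
    lines = [name + ":"]
    lines += ["    push " + reg for reg in saved_registers]
    lines += ["    " + line for line in body]
    # Restore in reverse order so each pop matches its push.
    lines += ["    pop " + reg for reg in reversed(saved_registers)]
    lines.append("    ret")
    return lines

listing = emit_function("function_name", ["a3", "a4", "a5"], ["; body goes here"])
print("\n".join(listing))
```

A function that touches only one register would then cost one push and one pop, rather than saving the whole register file.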
Ready4Dis wrote:Why not just have methods of passing parameters as a function type declaration (like C, stdcall, fastcall, etc). A stdcall will use the stack for all values, fastcall can use eax, ebx, ecx, edx for the first 4 variables, etc.
I do this because it's high-level assembly, not a high-level language. There are no 'variables', just registers, labels to addresses, and the stack. The programmer is still in control of which registers to use at all times.
Ready4Dis wrote:How can you pass @3.0f without using the stack? Does it assume a specific register is used instead?
Yes :) In the example the parameter is using register a2, so @3.0f will be copied into a2 before entering the function. If the @ wasn't specified, a2 would be pushed before setting its value to 3.0f, then popped upon return (another speed-freak optimisation).
Ready4Dis wrote:What about doing calculations on bytes that are numerical, rather than 'a'? What if I wanted to deal with single-byte numbers, like 25*25, or x:1 = 3? How do you do that, since there is no default char type and you can't type the value 3 in ASCII?
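A hedged sketch of how the call `function_name(a0, 483s, @3.0f)` might expand, written in Python for illustration. The mnemonics (`push`, `mov`, `call`, `pop`) and the name `emit_call` are made up, and the code assumes that an argument which already names its parameter register needs no instructions at all; the thread does not say how that case is handled.

```python
def emit_call(name, param_registers, args):
    """Expand a call: load each argument into its parameter register,
    saving the register's old value first unless the argument starts with @."""
    lines, saved = [], []
    for reg, arg in zip(param_registers, args):
        no_save = arg.startswith("@")
        value = arg.lstrip("@")
        if value == reg:
            continue  # assumption: the value is already in the right register
        if not no_save:
            lines.append("push " + reg)
            saved.append(reg)
        lines.append("mov " + reg + ", " + value)
    lines.append("call " + name)
    # Restore saved registers in reverse order after the call returns.
    lines += ["pop " + reg for reg in reversed(saved)]
    return lines

expansion = emit_call("function_name", ["a0", "a1", "a2"], ["a0", "483s", "@3.0f"])
print("\n".join(expansion))
```

Here a1 is saved and restored around the call because 483s was passed without @, while a2 is simply overwritten with 3.0f.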
There are 8 bit registers, so you can simply move the value 3 into an 8 bit register, then copy the 8 bit register into memory.

You're in control of every register, so you can't do complex maths like
c0 = (c1 * c2) + (c3 * c4)
since it will require temporary registers to store the results of (c1 * c2) and (c3 * c4).

It's impossible to do:
c0 u+= u2
since you can only increment a register with a register. You would have to do:
c1 = u2
c0 += c1

Something like:
c0 = u25 * u25 / u2
would be possible since the preprocessor could simplify that during compile time to:
c0 = u312
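A minimal sketch of that compile-time simplification, in Python rather than the preprocessor's C++. It uses the prefix form from the example (u25, s5, f1.5), requires every literal to share one type, and assumes integer division truncates toward zero; the function names are invented for illustration, and no range or overflow checks are attempted.

```python
def divide(a, b, kind):
    """Divide two constants; integers truncate toward zero (an assumption)."""
    if kind == "f":
        return a / b
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient

def fold(expression):
    """Fold a space-separated constant expression into a single literal."""
    parts = expression.split()
    if len(parts) % 2 == 0:
        raise ValueError("expected: literal (operator literal)*")
    operands, operators = parts[0::2], parts[1::2]
    kinds = {operand[0] for operand in operands}
    if len(kinds) != 1 or not kinds <= set("usf"):
        raise ValueError("all literals must share one type prefix (u, s or f)")
    kind = kinds.pop()
    values = [(float if kind == "f" else int)(operand[1:]) for operand in operands]

    # * and / apply to the current term; + and - start a new signed term.
    terms, signs = [values[0]], [1]
    for op, value in zip(operators, values[1:]):
        if op == "*":
            terms[-1] *= value
        elif op == "/":
            terms[-1] = divide(terms[-1], value, kind)
        elif op in ("+", "-"):
            terms.append(value)
            signs.append(1 if op == "+" else -1)
        else:
            raise ValueError("unknown operator " + op)
    return kind + str(sum(sign * term for sign, term in zip(signs, terms)))

print(fold("u25 * u25 / u2"))  # prints u312
```

Mixing types, as in `fold("u25 * s2")`, raises `ValueError`, matching the rule that inline calculations are only allowed when all values are the same type.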

EDIT: I've also designed how member functions for structs would work (basically it has the same effect as a normal function with one of the parameters being the address of the struct instance we're working with). It's optional, but it encourages object-oriented design and reduces naming conflicts. This could also be extended to include struct inheritance.

Posted: Thu Jun 12, 2008 5:49 am
by Ready4Dis
Ok, it makes a little more sense now that I realize you are talking about actual registers when saying a0, a1, etc. I thought those were variables, which is why I was unclear on what you were trying to accomplish.

For using functions declared in another binary, you can still use declares, or some form of [extern FuncName], which is typical in assemblers. There's no need to use an external file for references, and no reason to force declares either.

I understand what you are saying about a collection of labels, but isn't it just as easy to use the type that you want rather than the # of bytes? It just makes more sense to people who have programmed in C (which you are making this resemble). It isn't much different, and even putting a limit of one variable per line (no comma separation) would be better (in my opinion!).

Again, the confusion over the registers threw me off with the declaration of the functions; it makes plenty of sense now why you simply put the register name. However, I don't see why you can't have variables. Even an assembler has variables (TestVar dd 0, etc). Making them more readable, like u32 TestVar, would be much simpler. And you can then do c5 = TestVar, which would just use the address of TestVar as the parameter (aka, variable).

I was saying that you said you need to declare the type of # to be processed (48923423 - raw number, unsigned; 438238s - signed number; etc.). None of those can be used for 8-bit values, so how do you use 8-bit values in an expression? Or does signed number simply mean it will be a signed 8-bit number (based on the register size)?

So far looks like you've got most of it planned out, so good luck!

Posted: Thu Jun 12, 2008 7:00 am
by AndrewAPrice
Ready4Dis wrote:I understand what you are saying about a collection of labels, but isn't it just as easy to use the type that you want rather than the # of bytes? It just makes more sense to people who have programmed in C (which you are making this resemble). It isn't much different, and even putting a limit of one variable per line (no comma separation) would be better (in my opinion!).
I'm thinking of using the preprocessor to define common types:
BYTE, WORD, DWORD, QWORD or even stick with C (char, short, int, long long). I prefer the former since it emphasises that it merely defines the size and not the type of value it holds.
Ready4Dis wrote:And you can then do c5 = TestVar, which would just use the address of TestVar as the parameter (aka, variable).
That was my idea. You can assign register names in functions and in local blocks to make the code more readable, and variables in memory already have their labels defined anyway. I haven't considered storing local variables on the stack or in memory because it is a register-saturated architecture. If you need over 1 KB of local variables then you can push them to the stack (using a local block), but if that was the case I would consider splitting the function in 2 anyway.
Ready4Dis wrote:I was saying that you said you need to declare the type of # to be processed (48923423 - raw number, unsigned; 438238s - signed number; etc.). None of those can be used for 8-bit values, so how do you use 8-bit values in an expression? Or does signed number simply mean it will be a signed 8-bit number (based on the register size)?
The type is based on its context.

There are only 4 situations when a number can be entered:
- When declaring a default value for a variable, in which case it assumes the number entered is the same size as the variable (rounded down to an 8-, 16-, 32-, or 64-bit value). e.g. defining an 8-bit variable with the value 50s will make the compiler assume it's a signed 8-bit integer.
- When passing addresses around, in which case everything is unsigned 32-bit.
- When writing to an IO device - most values are unsigned 32-bit unless the hardware says otherwise.
- When copying a value into a register it's based on the destination register. e.g. copying 18.43f into a 32-bit register will assume you mean a 32-bit float.

Of course you can override this in the full specification (by doing 10s8, 18.43f32) but it leads to messy code.
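The context rule plus the explicit override can be sketched as a small literal parser. This is a Python illustration only: the regular expression, the tuple layout and the name `parse_literal` are assumptions, and hexadecimal and character literals are left out for brevity.

```python
import re

# digits, optional type suffix (u, s or f), optional explicit size in bits
LITERAL = re.compile(r"^(\d+(?:\.\d*)?)([usf])?(\d+)?$")

def parse_literal(text, context_bits):
    """Return (kind, bits, value); the size comes from context unless given."""
    match = LITERAL.match(text)
    if not match:
        raise ValueError("not a numeric literal: " + text)
    digits, kind, bits = match.groups()
    kind = kind or "u"  # a raw number is unsigned
    value = float(digits) if kind == "f" else int(digits)
    return kind, int(bits) if bits else context_bits, value

print(parse_literal("50s", 8))        # size taken from an 8-bit variable
print(parse_literal("18.43f32", 64))  # explicit size wins over the context
```

The caller passes in the size implied by the destination (variable, register, address or IO port), so the override suffix is only needed in the rare cases where that guess is wrong.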

Posted: Thu Jun 12, 2008 7:35 am
by Ready4Dis
Sounds good. I am used to x86 development: not so many registers there, and much more memory :). I do dabble in microcontroller programming as well, but mostly in C. I know some microcontrollers have many registers (I think most of the Atmel AVRs I use have something like 32 general purpose registers, way more than the 8 in an x86-32). Like I said, it sounds like you've got most of it planned out pretty well, so good luck on your venture. If you must choose, I would use byte, word, etc. I am not a fan of C's char, short, int, long, etc., since they don't actually define the size. I know some microcontrollers have 12-bit words, and some have odd sizes for everything, so those names don't make it clear. That's why I define my own sizes like u8, s8, u16, s16, etc., so I know exactly the size I am working on, no interpretation required! So if I am working on a microcontroller that deals with 12-bit numbers, I can tell exactly by using u12 and s12, and I know where they overflow :). I know that's a specific case, but it always annoyed me with C/C++, especially when transitioning between compilers that used different sizes. The whole uint16_t, int16_t approach works as well, since you KNOW the size you're using, so I don't mind using that either, but I do prefer something with the # actually in the name so there is 100% certainty of its size.

Posted: Fri Jun 13, 2008 10:33 am
by Ready4Dis
Oh just noticed a BIG problem in your design, was thinking about it today.

You said when you declare your functions, you use @ to designate that you do not wish to save the register. The problem is, people calling the function may or may not realize this. It would be better for the USER (where they call the function) to prepend a @ saying not to save the register. However, this is almost as bad, because they need the full function declaration to determine which register is the proper one in this case as well. Either way this can cause a problem, but the second approach is safer (since most people won't use it unless they know what they are doing, rather than using it without realizing what it's doing!).

Posted: Fri Jun 13, 2008 5:09 pm
by AndrewAPrice
Sorry, I intended ! and @ to be used when calling the function.

I have function calling working (300+ lines of code to parse the function declaration :shock: ), but local names for registers aren't working yet.

But first I'll work on the emulator and fix some bugs I just found.

Posted: Fri Jun 13, 2008 11:46 pm
by Ready4Dis
Yeah, my mistake, I read it a bit incorrectly; you did say it was for function calling. Sounds like you're well on your way. When you are done, any chance we could get the sources to check it out? I am interested in modifying it to work on an x86. I was going to write a high level assembler myself, but there's no need to do the work twice since I like your design so far. One thing I was curious about in your design: do you plan to be able to use the stack if required (or wanted)?

Posted: Sat Jun 14, 2008 6:53 pm
by AndrewAPrice
Ready4Dis wrote:When you are done, any chance we could get the sources to check it out?
Sure. The preprocessor is written in portable C++.
Ready4Dis wrote:I am interested in modifying it to work on an x86. I was going to write a high level assembler myself, but there's no need to do the work twice since I like your design so far. One thing I was curious about in your design: do you plan to be able to use the stack if required (or wanted)?
The stack is only used for pushing old values so registers can be modified, and when the registers aren't needed any more their old value is popped back in.

It can easily be modified so that when calling a function the parameters are pushed to the stack instead. Some changes will have to be made so local variable names represent areas in the stack rather than registers.