Method of Assembly

Posted: Tue Feb 05, 2008 4:35 pm
by Alboin
I've been thinking of a method of assembly.

Normally, when an assembler assembles a file, it parses the file and then uses tables and such to decide how to translate it to machine code.

What I've been thinking of doing is using a form of context-free language that then assembles the code based upon how it evaluates, e.g.

Code:

(reg "EAX" 0)
(reg "AX" 0)
(reg "AL" 0)
(reg "AH" 4)

(modrm m16 (0x06 a))
(modrm reg16 (bin (0x18 5) ((reg a) 3)))

(instr "AAA" 0x37)

(instr "ADD" "AL" imm8 (0x04 a))
(instr "ADD" rm16 imm16 (0x81 (modrm a) b))
In the code, the first argument is the function name, the last is the definition, and everything in between is the function's operands. (Regular expressions, if you will. ATM, they're just types that are known by the system, e.g. m16="[0x12345]".)
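As a rough illustration of how a definition body such as (bin (0x18 5) ((reg a) 3)) might evaluate, here is a minimal Python sketch. It assumes that each pair in a bin form means (value, width in bits) and that fields are packed most significant first; that is one reading of the notation, not something the post spells out. The name pack_bits is hypothetical.

```python
def pack_bits(*fields):
    """Pack (value, width) pairs into one integer, most significant field first.

    Assumed reading of the post's `bin` form; `pack_bits` is a made-up name.
    """
    result = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value:#x} does not fit in {width} bits")
        result = (result << width) | value
    return result


# (bin (0x18 5) ((reg a) 3)) evaluated with register number 4,
# the number the post's table lists for "AH".
modrm_byte = pack_bits((0x18, 5), (4, 3))
print(hex(modrm_byte))
```

With those inputs the five-bit field 0x18 supplies mod=11 and reg=000, and the three-bit field supplies the r/m register number.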

The lispy code could be compiled into bytecode, which could be called from the main assembler, e.g.

Code:

char *assembled_code = eval_cfg(cfl_vm, "instr", line);
(line is a line of assembly, e.g. "add ax,33442".)

Because the definitions are overloaded, eval_cfg would simply pick the definition whose operands match the given string, evaluate it, and return the result.
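The overload resolution described here could be sketched in Python roughly as follows. Everything in it is an assumption made for illustration: the names eval_line, DEFINITIONS, imm, literal and reg16 are hypothetical, operand types are modelled as small matcher functions rather than regular expressions, AX uses the standard x86 register number 0, and prefixes and memory operands are ignored.

```python
REGS16 = {"ax": 0}  # assumption: standard x86 register number for AX


def imm(bits):
    """Matcher for an unsigned decimal immediate that fits in `bits` bits."""
    def match(text):
        if text.isdecimal() and int(text) < (1 << bits):
            return int(text)
        return None
    return match


def literal(name):
    """Matcher for one fixed operand such as "al"."""
    return lambda text: name if text == name else None


def reg16(text):
    """Matcher for a 16-bit register; returns its number or None."""
    return REGS16.get(text)


# (mnemonic, operand matchers, emitter), tried in order: the first match wins.
DEFINITIONS = [
    ("aaa", [], lambda: bytes([0x37])),
    ("add", [literal("al"), imm(8)], lambda _al, a: bytes([0x04, a])),
    ("add", [reg16, imm(16)],
     lambda a, b: bytes([0x81, 0xC0 | a]) + b.to_bytes(2, "little")),
]


def eval_line(line):
    """Assemble one line by evaluating the first definition that matches."""
    mnemonic, _, rest = line.strip().lower().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    for name, matchers, emit in DEFINITIONS:
        if name != mnemonic or len(matchers) != len(operands):
            continue
        values = [m(op) for m, op in zip(matchers, operands)]
        if all(v is not None for v in values):
            return emit(*values)
    raise ValueError(f"no definition matches {line!r}")


print(eval_line("add ax,33442").hex(" "))
```

Because the definitions are tried in order, the more specific "AL" form is listed before the general 16-bit form, much as the post's definition list does.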

What do you think? Have I lost my mind? 8)
Alboin

Posted: Tue Feb 05, 2008 5:12 pm
by Combuster
You just hit an old project of mine spot-on. :D

The main idea was that you would be able to add new languages to an assembler in an easy manner, preferably without recompiling. The advantages would be that you could choose your favorite syntax for lexing purposes and, in my case, use Intel syntax for all architectures that normally have AT&T syntax (and for you, possibly vice versa).

I even envisioned that semantics could be associated with assembler instructions, which would allow a generic optimizing compiler to be built, as well as other tools that port code directly to other architectures. That would in turn allow us to pick a random language with a working compiler and instantly have it produce optimized code for any platform imaginable. But that's plain daydreaming, as I figured it wouldn't get to completion given the other projects I do besides it.

I might even still have some prototype code somewhere. At least it included a scriptable linker (at that point it output PE code, with a message provided on the command line for when the result is run from DOS mode).

Oh the old times when I had time to waste.

Posted: Tue Feb 05, 2008 7:48 pm
by babernat
A couple of things.

So we're on the same page, you're saying this is a CFL because of the following lines?

Code:

(modrm m16 (0x06 a)) 
(modrm reg16 (bin (0x18 5) ((reg a) 3))) 
and

Code:

(instr "ADD" "AL" imm8 (0x04 a)) 
(instr "ADD" rm16 imm16 (0x81 (modrm a) b)) 
Most of the other lines seem regular to me.

Here are some of my initial thoughts after not having thought about this as much as you all have. My experience compiling these kinds of languages is non-existent but anyway...

It's an interesting concept. What would you hope to gain by doing things this way? I guess I'm asking what this provides over simply writing it in C, Lisp, or a mix of the two.

It may be a little difficult to write a parser that generates correct code from this, because you have to take into account weird things like naming conventions in the generated code, which are an artifact of going from a CFL to an RL. By this I mean I don't think it'll be as simple as walking a parse tree and "blindly" generating code.

On the flip side, this could be more "efficient" than C in certain cases. One thing that comes to mind is multiprocessor/multicore scenarios. Because of the way Lisp-style languages are parsed, etc., you end up with something that's easier to run in parallel. This is a known feature, but the exact name and details escape me right now.

Definitely an interesting concept. I'll have to think about it some more.

Posted: Tue Feb 05, 2008 8:02 pm
by Alboin
I'm sorry for my tainted mind. The lisp-ish code you see is simply the syntax I chose for my definition as it is easy to parse. (Easier than a {} language.)

Here's a maybe 'better' definition... ;)

Think C preprocessor. Now give your macros the ability to overload on regexps. (Like C++ function overloading.) Now evaluate. This is essentially the idea, as assembly is essentially substitution. (i.e. the same as a #define in C, only a tad more complex.)
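The "macros overloaded on regexps" view can be sketched as plain text substitution in Python. This is only an illustration under stated assumptions: the MACROS table and the expand function are hypothetical, the output uses NASM-style db directives, AX is taken to be register number 0 (hence the 0xC0 ModRM byte), and immediate ranges are not checked.

```python
import re

# Each macro is (pattern over the whole line, replacement template).
# Two rules share the name "add": the regexp decides which one applies.
MACROS = [
    (re.compile(r"aaa"), r"db 0x37"),
    (re.compile(r"add\s+al\s*,\s*(\d+)"), r"db 0x04, \1"),
    (re.compile(r"add\s+ax\s*,\s*(\d+)"), r"db 0x81, 0xC0, \1 & 0xFF, \1 >> 8"),
]


def expand(line):
    """Rewrite one assembly line using the first macro whose pattern matches."""
    text = line.strip().lower()
    for pattern, template in MACROS:
        match = pattern.fullmatch(text)
        if match:
            # Match.expand substitutes \1 with the captured operand text.
            return match.expand(template)
    raise ValueError(f"no macro matches {line!r}")


print(expand("add al, 5"))
print(expand("add ax,33442"))
```

Here the result is still text, as with #define; a real assembler would go one step further and evaluate the expanded expressions into bytes.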

Another way to think about it is that you're just making a very small functional scripting language, and writing the definitions for each assembly item to be assembled.

Posted: Tue Feb 05, 2008 8:08 pm
by babernat
Ah OK, that makes more sense. Thanks for clearing that up.

Posted: Tue Feb 05, 2008 9:43 pm
by iammisc
Personally, I don't think this makes much of a difference. I once implemented a simple assembly language in gas for a virtual machine I made and it was really quite easy. It might take a few more hours, but since it is in C, I think you'll probably save hours when compiling a whole lot of files.

Posted: Tue Feb 05, 2008 10:04 pm
by Alboin
iammisc wrote: Personally, I don't think this makes much of a difference. I once implemented a simple assembly language in gas for a virtual machine I made and it was really quite easy. It might take a few more hours, but since it is in C, I think you'll probably save hours when compiling a whole lot of files.
This is, IMO, a lot cleaner than having massive tables. (See the NASM source. They have actual Perl scripts to create C sources based off of a master table.)

Moreover, it should be a lot more extensible, as one could simply compile their language's definitions (to bytecode, as said above) and use them without having to download and compile another assembler.

Also, if taken to Combuster's extent, optimizations could be performed for assembly languages not even known to the assembler.

A rock is a great tool, but a diamond toothed saw is much nicer. ;)

Posted: Wed Feb 06, 2008 12:37 am
by binutils
http://lambda-the-ultimate.org/node/2146

Maybe some kind of ML-family PL (PS: Coq is written in OCaml), using lc to output assembly.