Im Tired of NASM & MASM & TASM & all those other

Posted: Fri May 30, 2008 5:39 pm
by TannerGooding
I finally got so fed up with all the assemblers, including NASM, that won't compile my code the way I need it, and the fact that I am horrible at C++ makes it even worse. So I decided to go and write my own assembler in C# using a string parser and binary reader/writer. It's completely Intel syntax compatible, as documented in the Intel Architecture Developer's Manual. No more compile problems for me. I just have to finish working out the bugs and get it approved by SourceForge. Then I'll post a link to it. Tell me what you think... :)

Posted: Fri May 30, 2008 5:44 pm
by Alboin
You're going to have a hard time porting it to your OS. ;)

Posted: Fri May 30, 2008 6:33 pm
by kmcguire
TannerGooding wrote: So I decided to go and write my own assembler in C# using a string parser and binary reader/writer. It's completely Intel syntax compatible, as documented in the Intel Architecture Developer's Manual. No more compile problems for me. I just have to finish working out the bugs and get it approved by SourceForge. Then I'll post a link to it. Tell me what you think...
That is cool. Does it actually emit machine code yet? What sort of bugs does it have right now? That is awesome if you get it polished up.

Posted: Fri May 30, 2008 7:55 pm
by TannerGooding
As of now the assembler reads and writes files using the System.IO.BinaryReader and System.IO.BinaryWriter classes and byte[] arrays. It has the complete Intel instruction set up to the year 1997. The instructions are accessed through two types: the public struct Instruction and the public class InstructionSet. Instruction contains byte Opcode, string Mnemonic, byte[] Arguments, string[] ValidArgumentType, and string[] InstructionFormat. These members give a complete reference to all the properties of an instruction as listed in the Intel Architecture Developer's Manual. InstructionSet stores each Intel instruction as an Instruction[] array, because each mnemonic allows different arguments. These arrays are grouped into 7 Instruction[][] arrays: the x86, x386, MMX, FPU, System, Prefix, and Pseudo instruction sets. By default the assembler runs through them in the order Pseudo, Prefix, x86, x386, FPU, MMX. This order can be changed by editing the properties file.

I have written very minimalistic programs using this already. I still must add support for comments, which I plan to have written to an external file, and for labels. It also contains a very minimalistic hex editor, which currently only writes in decimal form...
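For readers who want to picture the layout described above, here is a minimal sketch in Python rather than C#. It is not CISBA's actual source: the field names mirror the ones listed in the post, but the sample entries, the `find_forms` helper, and the way operand types and formats are spelled are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instruction:
    """One encodable form of a mnemonic (fields named after the post)."""
    opcode: Optional[int]                  # primary opcode byte, None for pseudo-ops
    mnemonic: str
    arguments: Tuple[int, ...]             # operand bytes, empty until assembled
    valid_argument_types: Tuple[str, ...]  # e.g. ("r32", "imm32"); spelling is assumed
    instruction_format: Tuple[str, ...]    # e.g. ("opcode+rd", "id"); spelling is assumed


# group name -> mnemonic -> every form of that mnemonic (illustrative subset only)
INSTRUCTION_SETS = {
    "Pseudo": {"DB": [Instruction(None, "DB", (), ("imm8",), ("ib",))]},
    "Prefix": {"LOCK": [Instruction(0xF0, "LOCK", (), (), ("opcode",))]},
    "x86": {
        "NOP": [Instruction(0x90, "NOP", (), (), ("opcode",))],
        "INT": [Instruction(0xCD, "INT", (), ("imm8",), ("opcode", "ib"))],
        "MOV": [
            Instruction(0xB0, "MOV", (), ("r8", "imm8"), ("opcode+rb", "ib")),
            Instruction(0xB8, "MOV", (), ("r32", "imm32"), ("opcode+rd", "id")),
        ],
    },
    "x386": {},
    "FPU": {},
    "MMX": {},
}

# Default search order as given in the post; a properties file could override it.
SEARCH_ORDER = ["Pseudo", "Prefix", "x86", "x386", "FPU", "MMX"]


def find_forms(mnemonic, order=SEARCH_ORDER):
    """Return (group, forms) for the first group that knows the mnemonic."""
    key = mnemonic.upper()
    for group in order:
        forms = INSTRUCTION_SETS.get(group, {}).get(key)
        if forms:
            return group, forms
    raise KeyError(f"unknown mnemonic: {mnemonic}")
```

Keeping one list of forms per mnemonic is what lets the assembler pick a form by matching the operand types it parsed against `valid_argument_types`.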

Posted: Sat May 31, 2008 6:28 pm
by kmcguire
Wow, that sounds like the alternative to a macro-assembler. If I understand you right, then we both built our assemblers by creating a large table of all the instructions (and their variations), versus the macro-assembler, where each mnemonic has an associated function, called a macro, whose code makes decisions about what bits and bytes to emit.

I really like the table idea, as I feel it is slightly more powerful and easier than hard-coding with macros. It just might be a tad slower, but that's okay, I think.
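A rough way to see the difference between the two styles, sketched in Python. Nothing here comes from either poster's assembler: the function and table names are made up, and only the one-byte "opcode + register" forms (valid in 32-bit mode) are covered.

```python
# Register numbers used by the "opcode + rd" encodings.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


# Macro style: every mnemonic gets its own function that decides what to emit.
def emit_push(operand):
    return bytes([0x50 + REG32[operand]])


def emit_pop(operand):
    return bytes([0x58 + REG32[operand]])


MACROS = {"push": emit_push, "pop": emit_pop}


# Table style: one generic routine, and the per-instruction knowledge is data.
TABLE = {"push": 0x50, "pop": 0x58, "inc": 0x40, "dec": 0x48}


def emit_from_table(mnemonic, operand):
    return bytes([TABLE[mnemonic] + REG32[operand]])
```

Adding `inc` and `dec` to the table took two entries and no new code, which is the convenience being praised; the extra dictionary lookup is the "tad slower" part.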

Posted: Sun Jun 01, 2008 10:59 am
by bewing
I am beginning to write my assembler, too. I am also going the "large table" route. It took me most of a day to take the latest PDF version of the Instruction Set manual, and cut-and-paste all of the opcodes, mnemonics & usage info out of it, into a table. It seems to me that such a table can be easily shared between an assembler, a debugger, and a disassembler. And more opcodes can be very conveniently added to the table. So it's best not to hardcode the table into the assembler in any way.
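To illustrate the sharing idea, here is a small Python sketch in which one table drives both an assembler and a disassembler. It is only a toy under stated assumptions: it covers a handful of single-byte, operand-free instructions, and the `assemble` and `disassemble` names are hypothetical.

```python
# One shared table of (mnemonic, opcode) pairs for single-byte instructions.
OPCODE_TABLE = [
    ("nop", 0x90),
    ("ret", 0xC3),
    ("hlt", 0xF4),
    ("cli", 0xFA),
    ("sti", 0xFB),
]

# Both directions are derived from the same data, so they cannot drift apart.
ENCODE = {mnemonic: opcode for mnemonic, opcode in OPCODE_TABLE}
DECODE = {opcode: mnemonic for mnemonic, opcode in OPCODE_TABLE}


def assemble(source):
    """Turn one mnemonic per line into machine code bytes."""
    lines = (line.strip().lower() for line in source.splitlines())
    return bytes(ENCODE[line] for line in lines if line)


def disassemble(code):
    """Turn bytes back into text, falling back to 'db' for unknown bytes."""
    return "\n".join(DECODE.get(byte, f"db 0x{byte:02X}") for byte in code)
```

A debugger could reuse `DECODE` in the same way, and adding an opcode means touching only `OPCODE_TABLE`.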

Posted: Sun Jun 01, 2008 11:27 am
by suthers
That's exactly how I feel. I'm writing my own too, but in C++...
Jules

Posted: Sun Jun 01, 2008 1:56 pm
by TannerGooding
http://sourceforge.net/project/showfile ... _id=229633

This is CISBA (C# Intel Syntax Binary Assembler) release 0.0.1
It contains a basic hex editor and the source code and licensing for the project. The project is still in the pre-alpha stage and currently will not assemble any files. The source code shows how the assembler will work, and depending on how fast I can get the instructions added to the tables, the first official beta release should be out within the next month. If you have any questions or suggestions for the code, please contact me at [email protected] or just reply with a post here.

Posted: Sun Jun 01, 2008 2:37 pm
by SpooK
TannerGooding wrote: I finally got so fed up with all the assemblers, including NASM, that won't compile my code the way I need it, and the fact that I am horrible at C++ makes it even worse. So I decided to go and write my own assembler in C# using a string parser and binary reader/writer. It's completely Intel syntax compatible, as documented in the Intel Architecture Developer's Manual. No more compile problems for me. I just have to finish working out the bugs and get it approved by SourceForge. Then I'll post a link to it. Tell me what you think... :)
Have you ever stopped to think that it may not be the assemblers... and that it may possibly be your lack of understanding???

It is easy to give up and blame things because you don't understand them, or haven't learned them completely/correctly.

NASM/FASM/TASM/MASM/GoASM/POASM are tried and true assemblers... you may want to investigate and evaluate why your code will not assemble.

Your other option is to continue with your assembler... and deal with the fact that your lack of understanding will spill into the design and cause even more problems for you.

Your call.

Posted: Sun Jun 01, 2008 6:56 pm
by kmcguire
SpooK wrote: Have you ever stopped to think that it may not be the assemblers... and that it may possibly be your lack of understanding???

It is easy to give up and blame things because you don't understand them, or haven't learned them completely/correctly.

NASM/FASM/TASM/MASM/GoASM/POASM are tried and true assemblers... you may want to investigate and evaluate why your code will not assemble.

Your other option is to continue with your assembler... and deal with the fact that your lack of understanding will spill into the design and cause even more problems for you.

Your call.
Hey, slow down there. The wheel has been reinvented millions of times and there is no way you are going to convince someone not to do it again. I see no harm in someone trying to write an assembler because they feel the current selection of assemblers are not doing what they want.

And his lack of understanding will teach him without you mortifying his idea or plans with some moot choice speech about bad design and problems. You are currently hanging out on an operating system development forum, and I expect you to waste some time lest you think writing an assembler is inferior erudition.
bewing wrote: It seems to me that such a table can be easily shared between an assembler, a debugger, and a disassembler.
Yeah, that is what I have read and figured too.

Posted: Mon Jun 02, 2008 11:53 am
by bewing
I agree with kmcguire, and disagree with SpooK. OSdeving is all about reinventing the wheel. That is what we do here.

And 30 years ago, I was using an assembler with an executable size of about 40K. The size of the source code (in C) was about the same. That is what I want for my OS. If anyone can point me to a public domain assembler that is smaller than 50K of source, I will happily use it in my OS. I don't want to spend a couple months writing an assembler if I don't have to. If anyone wants to point out to me which of the assemblers listed above fit that size criterion, I'm all ears. *listens to the silence*

Posted: Mon Jun 02, 2008 1:00 pm
by Combuster
bewing wrote: If anyone wants to point out to me which of the assemblers listed above fit that size criterion, I'm all ears.
I've (jointly) written a Java compiler that's 37k in source (it got me an A+ for languages and compilers :)), so assemblers in the same size category are a distinct possibility.

However, I've just looked at YASM's keyword map for the x86 (the one that defines the bytes for each opcode, numbers for registers, and CPU types by identifiers) and it's 43k already, which strongly suggests that the main reason it would be hard to shove an assembler into 40k is the sheer bloat of the instruction set. :(

Posted: Mon Jun 02, 2008 1:26 pm
by Korona
My assembler consists of 6.8k source lines of code.
For now it can only assemble commonly used 32-bit protected mode instructions, and it has somewhat strange semantics:

Code: Select all

assign i, 0
proc make_pd_entry ; defines a "goto label" for the assembler, not a label or something that is put into the output file
	; reserve space for the system pde
	bits32 coredat_systemTables - CORE_DISP + i * PT_BYTES + 3
assign i, i + 1
jumpif make_pd_entry, i < SYSMEM_PG_TABLES
The code is very bloated; each instruction is represented by a class:

Code: Select all

final class NatAddRm32Gpr32 extends NativeInst {
	private RmSibAddress address;
	
	public constructor(RmSibAddress address) {
		this.address = address;
	}
	
_pre:ifdef VM_WITHOUT_COLLECTOR
	// inherited from Korona.Object
	public void __destruct() {
		address.__destruct();
		Collector.delete(_nat:addressof(this));
	}
_pre:endif
}
If the instruction data was stored in a more compact structure it would be possible to write an assembler for the entire instruction set in under 15k or 20k lines of source code.

Posted: Mon Jun 02, 2008 5:45 pm
by bewing
@Combuster -- yes, as said above, I created a file of all the opcodes, and it is 22K -- but also, I am intending to keep that as a completely separate data file. It will never be encoded into the assembler itself. The assembler will be data driven, from an opcode standpoint. All mnemonics will be looked up in this external file, and converted to the proper bytes according to the translation specified in the file.
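A data-driven lookup of this kind could be sketched as follows in Python. The file format is invented for the example (bewing's actual file layout is not shown in the thread), as are the `load_opcode_table` and `encode` names; the sample text stands in for the external file, and only an optional 8-bit immediate is handled.

```python
# Stand-in for the external opcode file. Format (assumed for this sketch):
#   mnemonic | operand kind | encoding tokens
# where a hex token is emitted literally and "ib" means "immediate byte".
SAMPLE_FILE = """\
# mnemonic | operands | encoding
nop  |      | 90
ret  |      | C3
int  | imm8 | CD ib
push | imm8 | 6A ib
"""


def load_opcode_table(lines):
    """Parse the table from any iterable of lines (an open file works too)."""
    table = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        mnemonic, operands, encoding = (field.strip() for field in line.split("|"))
        table[(mnemonic.lower(), operands)] = encoding.split()
    return table


def encode(table, mnemonic, imm=None):
    """Look the mnemonic up in the table and translate its encoding tokens."""
    operands = "" if imm is None else "imm8"
    out = bytearray()
    for token in table[(mnemonic.lower(), operands)]:
        if token == "ib":
            out.append(imm & 0xFF)
        else:
            out.append(int(token, 16))
    return bytes(out)
```

For example, `encode(load_opcode_table(SAMPLE_FILE.splitlines()), "int", 0x80)` yields the two bytes `CD 80`, and supporting a new opcode means adding a line to the file rather than changing the assembler.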

@Korona -- 6.8K sounds about right to me. :D

Posted: Tue Jun 03, 2008 4:14 am
by Brendan
Hi,

It seems to me that the most obvious limitation of current 80x86 assemblers isn't the assembler itself - it's the optimizer/s. Basically "optimizing" means choosing the version of the instruction with the shortest encoding, and *nothing* else. It's no surprise that compilers frequently do a better job.
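As a concrete picture of "choosing the shortest encoding", here is a Python sketch for a single case, `add eax, imm`. The function names are made up; the two encodings are the short sign-extended form (83 /0 ib) and the accumulator form (05 id).

```python
import struct


def add_eax_imm_candidates(value):
    """List every valid encoding of 'add eax, value' this sketch knows about."""
    candidates = []
    if -128 <= value <= 127:
        # 83 /0 ib: ModRM 0xC0 selects eax, immediate is a sign-extended byte.
        candidates.append(bytes([0x83, 0xC0]) + struct.pack("<b", value))
    # 05 id: accumulator form with a full 32-bit little-endian immediate.
    candidates.append(bytes([0x05]) + struct.pack("<I", value & 0xFFFFFFFF))
    return candidates


def encode_add_eax_imm(value):
    """The whole 'optimization': keep the shortest candidate."""
    return min(add_eax_imm_candidates(value), key=len)
```

With this, `add eax,2` comes out as 3 bytes instead of 5, which is the only kind of decision being made.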

How about an assembler with a basic peephole optimizer that will also track instruction dependencies and rearrange "basic blocks", so that I don't need to write unmaintainable spaghetti code (unless I'm working on the small part of my project where I actually want to hand-optimize)? I'm guessing most assembly language programmers know what I mean by "spaghetti code", e.g.:

Code: Select all

   mov eax,[foo]
   mov ebx,[bar]
   add eax,2
   add ebx,3
   shr eax,1
   shr ebx,2
Instead of:

Code: Select all

   mov eax,[foo]
   add eax,2
   shr eax,1

   mov ebx,[bar]
   add ebx,3
   shr ebx,2
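
The rearrangement from the second listing to the first could be sketched like this in Python. It is a deliberately naive illustration, not a real scheduler: registers touched by each instruction are supplied by hand, memory and flag dependencies are ignored, and the `split_into_chains` and `interleave` names are hypothetical.

```python
from itertools import zip_longest

# The readable version: (text, registers the instruction reads or writes).
READABLE = [
    ("mov eax,[foo]", ("eax",)),
    ("add eax,2", ("eax",)),
    ("shr eax,1", ("eax",)),
    ("mov ebx,[bar]", ("ebx",)),
    ("add ebx,3", ("ebx",)),
    ("shr ebx,2", ("ebx",)),
]


def split_into_chains(instructions):
    """Group instructions into independent chains by the registers they touch."""
    chains = []
    owner = {}                                # register -> index of its chain
    for text, regs in instructions:
        hits = {owner[reg] for reg in regs if reg in owner}
        if len(hits) > 1:
            raise ValueError("chains merge here; outside the scope of this sketch")
        index = hits.pop() if hits else len(chains)
        if index == len(chains):
            chains.append([])
        chains[index].append(text)
        for reg in regs:
            owner[reg] = index
    return chains


def interleave(chains):
    """Emit one instruction from each chain in turn, keeping per-chain order."""
    result = []
    for row in zip_longest(*chains):
        result.extend(text for text in row if text is not None)
    return result
```

Running `interleave(split_into_chains(READABLE))` reproduces the interleaved ordering from the first listing, so the programmer could keep the readable source and leave the shuffling to the assembler.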
Side note: Out of curiosity, how many people here realize that Intel's newest 80x86 CPU architecture (Atom) does not do "out of order" execution? :shock:

Just thinking out loud.... ;)


Cheers,

Brendan