
Writing an Assembler in C++

Posted: Mon Dec 27, 2010 7:47 pm
by brodeur235
Hello OSDevers,

I've recently decided to start work on writing an assembler. I chose C++ because that's what I'm good at. I have completed all of the parsing code, and now comes the part where I have to start learning before I can continue banging on the keys. I'm not sure where to go next, but I need to know how to do symbol tables, how to encode operations for different object file output formats, etc. I want to learn this; it's the reason I started this project, but I cannot find a good resource. I remember once having a reference showing exactly which asm mnemonic (such as "mov") went with which machine-language byte sequence. I feel like this would be very helpful, but I also need to know about symbol tables for file formats other than flat binary. Resources, links, and any prompt help are greatly appreciated,

Brodeur235


Here is what I've done so far for the syntax parsing part of the code. It works and was fairly simple to do; I just thought I'd include it for the sake of wasting server space. (just kidding).

Code:

/*
Nick's Experimental Assembler (NEA)
*/

#include <string>
#include <vector>
#include <fstream>
#include <iostream>
using namespace std;

/*
globals
*/

// cumulative stream to keep track of errors, syntax or otherwise
ofstream err_str("nea.errors");
// assists with error checking by keeping track of the current line being parsed/assembled
int line_num = 0;

// prototypes
void get_next_op(ifstream& fins, vector<string>& tokens);
void parse_op(const string& op, vector<string>& tokens);
void skip_white_space(const string& op, int& index);
string get_token(const string& op, int& index);

int main()
{
        vector<string> t;
        ifstream file("source.nea");
        
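        // Loop only while characters remain, so a trailing newline or a
        // missing file does not produce a phantom empty op.
        while(file.peek() != ifstream::traits_type::eof()) {
                get_next_op(file,t);
                
                for(unsigned int i = 0; i < t.size(); i++)
                        cout << t[i] << " ";
                cout << endl;
                
                t.clear();
        }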
        file.close();
        
        err_str.close();
        
        return 0;
}

/*
Parsing operations
*/

/*
@param1: stream to read next op from
@param2: vector representation of tokens used as return value
*/
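void get_next_op(ifstream& fins, vector<string>& tokens)
{
        string op = "";
        string line = "";
        // Join physical lines that end in a backslash into one logical op.
        // getline() fails on a closed stream or at end of file, which ends the loop.
        while( getline(fins,line) ) {
                line_num++; // count every physical line that was read
                // An empty line cannot end in a backslash; testing empty() first
                // avoids indexing line[line.size()-1] on an empty string.
                if( line.empty() || line[line.size()-1] != '\\' ) {
                        op += " " + line;
                        break;
                }
                // Drop the trailing backslash and keep reading the continuation.
                op += " " + line.substr(0,line.size()-1);
        }
        parse_op(op, tokens);
        
        return;
}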

/*
@param1: string representation of op to parse for operation & operands
@param2: vector representation of tokens used as return value
*/
void parse_op(const string& op, vector<string>& tokens)
{
        int index = 0;
        while(index < op.size()) {
                skip_white_space(op, index);
                if(index >= op.size()) break;
                tokens.push_back( get_token(op, index) );
        }
        return;
}

/*
@param1: string representation of op
@param2: index in op, updated to past white space for return value
*/
void skip_white_space(const string& op, int& index)
{
        // check the bound before reading op[index]
        while( index < (int)op.size() && (op[index]==',' || op[index]==' ' || op[index]=='\t') )
                index++;
        return;
}

/*
@param1: string representation of op to gather token from
@param2: index in op, updated past token for return value
@return: token
*/
string get_token(const string& op, int& index)
{
        string token = "";
        
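        // if it's a literal, take special action
        if(op[index]=='"') {
                bool closed = false;
                token += op[index];
                index++;
                while(index < (int)op.size()) {
                        // a backslash escapes the next character, so \" stays inside
                        // the literal and \\ does not hide a closing quote
                        if(op[index]=='\\' && index+1 < (int)op.size()) {
                                token += op[index];
                                index++;
                        }
                        else if(op[index]=='"') {
                                closed = true;
                                break;
                        }
                        token += op[index];
                        index++;
                }
                if(closed)
                        index++; // step past the closing quote
                else
                        err_str << "[" << line_num << "] Literal started with a \", but never closed with another quotation." << endl;
                token += '"';
                return token;
        }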
        
        // Otherwise, treat as usual, i.e. whitespace and commas delimit the token's end.
        // The bound is checked before op[index] is read.
        while( index < (int)op.size() && op[index]!=' ' && op[index]!='\t' && op[index]!=',' ) {
                token += op[index];
                index++;
        }
        return token;
}

Re: Writing an Assembler in C++

Posted: Tue Dec 28, 2010 6:45 am
by fronty
You will find lots of stuff in a compiler book which you probably won't care about when writing a simple(ish) assembler, but it won't hurt to read one. Berkus already mentioned Aho et al., but I wouldn't recommend it; I'd suggest Engineering a Compiler by Cooper et al. instead.

Re: Writing an Assembler in C++

Posted: Tue Dec 28, 2010 12:34 pm
by Thomas
Hi,
"Good resource is Aho and Ullman book."
Have you read that book at least once? From what I remember, there is not even a single chapter about writing an assembler. The best resource is System Software by Leland L. Beck. The author targets something known as SIC (a hypothetical computer), but the concepts still apply. The first few chapters cover the following:
(a) Writing a simple 2 pass assembler
(b) Writing a single pass assembler
(c) Writing a macro processor: the algorithm given in the textbook is not that great. The textbook assumes that the language you use does not support recursion, but you can come up with a better algorithm quite easily.
(d) Writing an integrated macro processor + assembler
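To make item (a) a bit more concrete, here is a rough sketch of a two-pass assembler built around a symbol table. It is not taken from Beck's book. The three-instruction set (nop, ret, and jmp with a 32-bit relative displacement), the rule that a label sits alone on its line and ends in ':', and all function names are assumptions made for illustration only. Each line is a vector of tokens, such as the tokenizer in the first post produces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::vector<std::string> Line;                    // tokens of one source line
typedef std::map<std::string, std::uint32_t> SymbolTable; // label name -> byte offset

// Hypothetical toy instruction set: encoded size of each supported mnemonic.
std::uint32_t instruction_size(const std::string& mnemonic)
{
    if (mnemonic == "nop" || mnemonic == "ret") return 1;
    if (mnemonic == "jmp") return 5; // opcode 0xE9 plus a 32-bit displacement
    throw std::runtime_error("unknown mnemonic: " + mnemonic);
}

// Assumption of this sketch: a label is a token that ends in ':'.
bool is_label(const std::string& token)
{
    return token.size() > 1 && token[token.size() - 1] == ':';
}

std::string label_name(const std::string& token)
{
    return token.substr(0, token.size() - 1); // strip the trailing ':'
}

// Pass 1: walk the program and record the offset of every label.
// No bytes are emitted; only instruction sizes are needed.
SymbolTable collect_symbols(const std::vector<Line>& program)
{
    SymbolTable symbols;
    std::uint32_t offset = 0;
    for (std::size_t i = 0; i < program.size(); ++i) {
        const Line& line = program[i];
        if (line.empty()) continue;
        if (is_label(line[0]))
            symbols[label_name(line[0])] = offset;
        else
            offset += instruction_size(line[0]);
    }
    return symbols;
}

// Pass 2: emit bytes, resolving label operands through the symbol table.
std::vector<std::uint8_t> emit(const std::vector<Line>& program, const SymbolTable& symbols)
{
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < program.size(); ++i) {
        const Line& line = program[i];
        if (line.empty()) continue;
        if (is_label(line[0])) {
            // Both passes must agree on the layout, or resolved addresses would be wrong.
            assert(symbols.at(label_name(line[0])) == out.size());
            continue;
        }
        if (line[0] == "nop") {
            out.push_back(0x90);
        } else if (line[0] == "ret") {
            out.push_back(0xC3);
        } else if (line[0] == "jmp") {
            if (line.size() < 2) throw std::runtime_error("jmp needs a label operand");
            SymbolTable::const_iterator it = symbols.find(line[1]);
            if (it == symbols.end()) throw std::runtime_error("undefined label: " + line[1]);
            // The displacement is measured from the end of the 5-byte jmp.
            std::uint32_t next = static_cast<std::uint32_t>(out.size()) + 5;
            std::uint32_t rel = it->second - next; // unsigned wrap-around handles backward jumps
            out.push_back(0xE9);
            for (int shift = 0; shift < 32; shift += 8) // little-endian byte order
                out.push_back(static_cast<std::uint8_t>(rel >> shift));
        } else {
            throw std::runtime_error("unknown mnemonic: " + line[0]);
        }
    }
    return out;
}

std::vector<std::uint8_t> assemble(const std::vector<Line>& program)
{
    return emit(program, collect_symbols(program));
}
```

The first pass exists so that a forward reference such as "jmp end" can be resolved before "end:" has been seen. A single-pass design, as in item (b), would instead have to remember unresolved references and patch them later. The sketch silently lets a duplicate label overwrite the earlier one; a real assembler would report that as an error.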
An assembler in its simplest form is little more than a simple look up table. You really do not require a complicated parser for an assembler.
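As a rough illustration of the lookup-table idea, the sketch below maps a handful of operand-less x86 mnemonics to their one-byte opcodes. The function names opcode_table and encode are made up for this example, and the table covers only six instructions. Anything with operands needs more than a plain table lookup, so treat this as a starting point and not a complete encoder.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: operand-less x86 mnemonics and their one-byte opcodes.
const std::map<std::string, std::uint8_t>& opcode_table()
{
    static const std::map<std::string, std::uint8_t> table = {
        {"nop", 0x90}, {"ret", 0xC3}, {"hlt", 0xF4},
        {"cli", 0xFA}, {"sti", 0xFB}, {"int3", 0xCC},
    };
    return table;
}

// Appends the encoding of `mnemonic` to `out`.
// Returns false, leaving `out` untouched, if the mnemonic is not in the table.
bool encode(const std::string& mnemonic, std::vector<std::uint8_t>& out)
{
    // Precondition assumed by this sketch: the tokenizer never yields empty tokens.
    assert(!mnemonic.empty());

    const std::map<std::string, std::uint8_t>& table = opcode_table();
    std::map<std::string, std::uint8_t>::const_iterator it = table.find(mnemonic);
    if (it == table.end())
        return false;
    out.push_back(it->second);
    return true;
}
```

Calling encode("nop", out) appends the byte 0x90 to out, while an unknown mnemonic returns false so the caller can log it with the current line number.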

I have heard good reviews of this book as well: http://www.amazon.com/Systems-Programmi ... 185&sr=1-1

--Thomas