Writing an Assembler in C++
Posted: Mon Dec 27, 2010 7:47 pm
Hello OSDevers,
I've recently decided to start writing an assembler. I chose C++ because that's what I'm good at. I have completed all of the parsing code, and now comes the part where I have to learn more before I can keep programming. I'm not sure where to go next, but I need to know how to implement symbol tables, how to encode instructions, how to emit the different object file formats, etc. I want to learn this; it's the reason I started this project, but I cannot find a good resource. I remember once having a reference that showed exactly which machine-code byte encoding corresponds to each assembly mnemonic (such as "mov"). I feel like that would be very helpful, but I also need to know about symbol tables for output formats other than flat binary. Resources, links, and any prompt help are greatly appreciated,
Brodeur235
Here is what I've done so far for the syntax parsing part of the code. It works and was fairly simple to do; I just thought I'd include it for the sake of wasting server space. (just kidding).
Code:
/*
Nick's Experimental Assembler (NEA)
*/
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
using namespace std;
/*
globals
*/
// cumulative stream to keep track of errors, syntax or otherwise
ofstream err_str("nea.errors");
// assists with error checking by keeping track of the current line being parsed/assembled
int line_num = 0;
// prototypes
void get_next_op(ifstream& fins, vector<string>& tokens);
void parse_op(const string& op, vector<string>& tokens);
void skip_white_space(const string& op, int& index);
string get_token(const string& op, int& index);
int main()
{
    vector<string> t;
    ifstream file("source.nea");
    // peek() reports end-of-file before we try to read, so no empty op is parsed after the last line
    while(file.peek() != ifstream::traits_type::eof()) {
        get_next_op(file,t);
        for(unsigned int i = 0; i < t.size(); i++)
            cout << t[i] << " ";
        cout << endl;
        t.clear();
    }
    file.close();
    err_str.close();
    return 0;
}
/*
Parsing operations
*/
/*
@param1: stream to read next op from
@param2: vector representation of tokens used as return value
*/
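void get_next_op(ifstream& fins, vector<string>& tokens)
{
    string op = "";
    string line = "";
    // keep reading physical lines while each one ends in a backslash (line continuation)
    while( getline(fins,line) ) {
        // count every physical line so error messages report the right place
        line_num++;
        op += " " + line;
        // test for an empty line first so line[line.size()-1] is never out of range
        if( line.empty() || line[line.size()-1] != '\\' )
            break;
        // drop the trailing backslash before the continued line is appended
        op = op.substr(0,op.size()-1);
    }
    parse_op(op, tokens);
    return;
}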
/*
@param1: string representation of op to parse for operation & operands
@param2: vector representation of tokens used as return value
*/
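void parse_op(const string& op, vector<string>& tokens)
{
    int index = 0;
    // convert once so the comparisons below do not mix signed and unsigned values
    const int size = static_cast<int>(op.size());
    while(index < size) {
        skip_white_space(op, index);
        if(index >= size) break;
        tokens.push_back( get_token(op, index) );
    }
    return;
}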
/*
@param1: string representation of op
@param2: index in op, updated to past white space (and commas) for return value
*/
void skip_white_space(const string& op, int& index)
{
    const int size = static_cast<int>(op.size());
    // check the bound before reading op[index]
    while( index < size && (op[index]==',' || op[index]==' ' || op[index]=='\t') )
        index++;
    return;
}
/*
@param1: string representation of op to gather token from
@param2: index in op, updated past token for return value
@return: token
*/
string get_token(const string& op, int& index)
{
    const int size = static_cast<int>(op.size());
    string token = "";
    // if it's a literal, take special action
    if(index < size && op[index]=='"') {
        token += op[index];
        index++;
        // copy up to the closing quote; a quote preceded by a backslash is escaped and does not close the literal
        while(index < size && !(op[index]=='"' && op[index-1]!='\\')) {
            token += op[index];
            index++;
        }
        // running off the end of the op means the closing quote is missing
        if(index >= size)
            err_str << "[" << line_num << "] Literal started with a \", but never closed with another quotation." << endl;
        token += '"';
        index++;
        return token;
    }
    // Otherwise, treat as usual, i.e. whitespace and commas delimit the token's end.
    while( index < size && op[index]!=' ' && op[index]!='\t' && op[index]!=',' ) {
        token += op[index];
        index++;
    }
    return token;
}
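For anyone who wants to try the tokenizing rules without a source file, here is a minimal sketch of the same idea as one function that works on a string. The name tokenize is hypothetical and is not part of the listing above. It assumes the same delimiters (space, tab, comma) and the same double-quoted literals. Unlike the listing, it does not report unterminated literals and does not append a missing closing quote.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the listing above): splits one logical
// line into tokens. Spaces, tabs and commas separate tokens; a token that
// starts with a double quote runs to the matching closing quote.
std::vector<std::string> tokenize(const std::string& op)
{
    std::vector<std::string> tokens;
    const std::string delims = " \t,";
    std::size_t i = 0;
    while (i < op.size()) {
        // Skip any run of delimiters.
        i = op.find_first_not_of(delims, i);
        if (i == std::string::npos) break;

        const std::size_t start = i;
        if (op[i] == '"') {
            // String literal: scan to the closing quote, stepping over
            // backslash-escaped characters so that \" does not end it.
            ++i;
            while (i < op.size() && op[i] != '"') {
                if (op[i] == '\\' && i + 1 < op.size()) ++i;
                ++i;
            }
            if (i < op.size()) ++i;  // include the closing quote if present
        } else {
            // Ordinary token: runs to the next delimiter or the end of the line.
            i = op.find_first_of(delims, i);
            if (i == std::string::npos) i = op.size();
        }
        tokens.push_back(op.substr(start, i - start));
    }
    return tokens;
}
```

With this sketch, tokenize("mov ax, bx") gives the three tokens mov, ax and bx, and a quoted literal such as "a, b" stays together as one token even though it contains a comma and a space.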