PORTABLE OBJECT CODE -- plz post ur comments
-
- Member
- Posts: 566
- Joined: Tue Jun 20, 2006 9:17 am
Dont worry......
Parsing is nt that hard....,you can make use of paser generators like
yacc or bison....,May be you can make you generic code more readable
by allowing higher level constructs(This requires a Parser)......this is what Ryndall Hyde has done with hla....But it'is written in hla itself and very diffcult to port....... , if you make something like that in C or pascal ... it will be far
easier to port .... and also compiler's can target your translator as a target
than the bare machine......This makes making portable compilers quite
easier...... I presented this as a paper ...but they did'nt like it very much.............
yacc or bison....,May be you can make you generic code more readable
by allowing higher level constructs(This requires a Parser)......this is what Ryndall Hyde has done with hla....But it'is written in hla itself and very diffcult to port....... , if you make something like that in C or pascal ... it will be far
easier to port .... and also compiler's can target your translator as a target
than the bare machine......This makes making portable compilers quite
easier...... I presented this as a paper ...but they did'nt like it very much.............
Hi,
For e.g. you'd code something like:
And the compiler might generate:
Would produce something like:
Of course this is extremely unoptimized (and not tested)...
If the language had no registers at all, then you'd produce something like this:
This is also extremely unoptimized, but it'd be easier for the compiler to see that SI, DI and BP are unused and decide to use SI:DI instead of "[variable]" and "[variable+2]".
Cheers,
Brendan
A better idea would be for the language to have no registers at all (where all variables are in memory), and for the compiler/optimizer to automatically decide to use CPU registers instead of the variables in memory where appropriate.hckr83 wrote:ok so to code for optimizations on some archs, you use 100s of registers, so on those few archs that do support, it will be fast...but what do you do when the target arch only has 16?
My idea is to use memory in order to store missing registers..
For e.g. you'd code something like:
Code: Select all
mov [variable],0xFOODBABE
add [variable],3
Code: Select all
mov eax,0xFOODBABE
add eax,3
The generic code:hckr83 wrote:lets say you wanted to program on 32bit for the generic assembly, but wanted to compile to use a 16bit arch. For example sake, we'll use 8086
Code: Select all
;generic
mov dword r0,0xFFFF FFFE
add dword r0,1
sub dword r0,2
div dword r0,3
mul dword r0,4
Code: Select all
;produced
mov ax,0xFFFF
mov bx,0xFFFF
add ax,1
adc bx,0
sub ax,2
sbb bx,0
push cx
push dx
push ax
mov ax,bx ;ax = high word
mov dx,0 ;dx:ax = high word
mov cx,3
div cx ;ax = high word / 3, dx = remainder
pop ax ;ax = low word
add ax,dx
mov dx,0
adc dx,0 ;dx:ax = low word + remainder
div cx ;ax = (low word + remainder) / 3
pop dx
pop cx
push dx
push cx
push si
push di
mov cx,4
mul cx ;dx:ax = low word * 4
mov si,ax
mov di,dx ;di:si = low word * 4
mov ax,bx
mul cx ;dx:ax = high word * 4
mov bx,di
add bx,ax
mov ax,si ;bx:ax = ((high word * 4) << 16) + (low word * 4)
pop di
pop si
pop cx
pop dx
If the language had no registers at all, then you'd produce something like this:
Code: Select all
;produced
mov word [variable],0xFFFF
mov word [variable+2],0xFFFF
add word [variable],1
adc word [variable+2],0
sub word [variable],2
sbb word [variable+2],0
mov ax,[variable+2]
mov dx,0
mov cx,3
div cx ;ax = high word / 3, dx = remainder
mov [variable+2],ax
mov ax,[variable]
add ax,dx
mov dx,0
adc dx,0
div cx ;ax = (low word + remainder) / 3
mov [variable],ax
mov ax,[variable]
mov cx,4
mul cx ;dx:ax = low dword * 4
mov [variable],ax
mov ax,[variable+2]
mov [variable+2],dx
mul cx
add [variable+2],ax
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
hmmm...that'd be smart...
though optimizers are hard to make..lol
really though, what if you knew you wanted something stored in a register for speed, and though your not using it right hten, you will use it later..something no optimizer can detect...
hmm...what about a "register" flag or something you could put for a certain address of memory where "if possible put this in a register"
though optimizers are hard to make..lol
really though, what if you knew you wanted something stored in a register for speed, and though your not using it right hten, you will use it later..something no optimizer can detect...
hmm...what about a "register" flag or something you could put for a certain address of memory where "if possible put this in a register"
Hi,
Think of it like an excercise in caching, where CPU registers are used to cache the most frequently used variables. As soon as you're caching something that isn't the most frequently used, then you start to lose efficiency.
BTW you'd be surprised what an optimizer could detect. It'd recommend doing some research into common optimizations (e.g. see this wiki page) before designing your language, so you can decide which features of your language will make it easier or harder to optimize later.
One thing I've been considering is replacing complete functions with table lookups. For example, consider a function like this:
This could be optimized to:
Of course to do this type of thing you need to implement an interpretter to generate the table, and it'd be nice if you could tell the compiler the exact range of a variable (so that if "b" only ever contains a value from 0 to 3, the optimizer could generate a 1024 byte table instead of a full 65536 byte table).
Optimizing is hard, and optimizing well is a lot harder. My suggestion is to forget about optimizing while you're implementing the compiler, but leave places where you could insert optimizers later (e.g. between "tokenising" and "compiling", and between "compiling" and "assembling").
Cheers,
Brendan
Using registers is faster, and not using registers is slower. Putting something in a register now that you intend to use much later on means that the register can't be used to speed up all the code between "now" and "later on". In this case you could improve performance by pushing the register on the stack, using the register to speed things up, then popping it off the stack much later on.hckr83 wrote:really though, what if you knew you wanted something stored in a register for speed, and though your not using it right hten, you will use it later..something no optimizer can detect...
Think of it like an excercise in caching, where CPU registers are used to cache the most frequently used variables. As soon as you're caching something that isn't the most frequently used, then you start to lose efficiency.
BTW you'd be surprised what an optimizer could detect. It'd recommend doing some research into common optimizations (e.g. see this wiki page) before designing your language, so you can decide which features of your language will make it easier or harder to optimize later.
One thing I've been considering is replacing complete functions with table lookups. For example, consider a function like this:
Code: Select all
int myFunction(char a, char b) {
int result;
result = ((a * b) / (a + b)) * b;
result &= (a * b) / MY_CONSTANT;
return result;
}
Code: Select all
movzx eax,byte [a]
mov ah,[b]
mov eax,[__myFunction_table + eax * 4]
ret
If your optimizer/s are any good, the "register" flag should be considered a suggestion to the optimizer (and mostly ignored), just like it is in C. It's more important to have it's opposite ("volatile") which tells the optimizer that the variable can't be cached in a register (which the optimizer must not ignore).hckr83 wrote:hmm...what about a "register" flag or something you could put for a certain address of memory where "if possible put this in a register"
Optimizing is hard, and optimizing well is a lot harder. My suggestion is to forget about optimizing while you're implementing the compiler, but leave places where you could insert optimizers later (e.g. between "tokenising" and "compiling", and between "compiling" and "assembling").
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Making the tokenizer is what I can't do...I have tried like 10 times to make something convert a plain text file in a language convert to tokens...it just doesn't work for me...
hmm...
what about a registers only language?
I think it simplifies things a LOT if you only have to worry about registers or memory...kinda one or the other..
course, I don't think registers only could have many uses practically.. So probably memory only would be best..what about storing addresses in registers though? would that be best done by the optimizer?
another bit, what about having portable bits? like be able to choose between one amount of bits to make the code in, and then that can be compiled to any other amount of bits? course...this also brings up the problem of endianess..and also the way the computer stores memory(most significant byte first or least significant byte first)
I just don't see anyway to combat those problems without major speed penalties..
hmm...
what about a registers only language?
I think it simplifies things a LOT if you only have to worry about registers or memory...kinda one or the other..
course, I don't think registers only could have many uses practically.. So probably memory only would be best..what about storing addresses in registers though? would that be best done by the optimizer?
another bit, what about having portable bits? like be able to choose between one amount of bits to make the code in, and then that can be compiled to any other amount of bits? course...this also brings up the problem of endianess..and also the way the computer stores memory(most significant byte first or least significant byte first)
I just don't see anyway to combat those problems without major speed penalties..
Tokenizers aren't really that difficult. You could even use flex to do it for you.hckr83 wrote:Making the tokenizer is what I can't do...I have tried like 10 times to make something convert a plain text file in a language convert to tokens...it just doesn't work for me...
If you write it yourself, you might want to look into finite state automata's as they do help.
C8H10N4O2 | #446691 | Trust the nodes.
Hi,
First you need some output routines which would send the tokenized data somewhere. For now just send the tokens to the screen and worry about the details later:
Next you want some framework for your tokenizer. I'm going to assume it's a line-oriented language (like assembly) and your tokenizer is sent lines of ASCII by the code that reads the source file/s from disk:
Now you probably want to ignore any white space between things. That's easy enough:
Every language has numerical constants, so let's add them next:
Then there's identifiers (labels, keywords, etc):
You'd probably also want to handle operators, punctuation, etc. I'll only do a few:
Handling string constants would be a good idea. Also if your language has comments like assembly you could probably stop tokenizing when you get a comment character:
Lastly, there's keywords. At the moment a keyword would end up being tokenized as an identifier. If your compiler handles several languages (e.g. C and inline assembly, or NASM and AT&T) it's probably better to leave them for the parser to sort out. Otherwise, you could start adding them:
Most of the code above is fairly dodgy (untested, and intended as an example only), but it should be able to tokenize something like this:
Of course it'd also accept some gibberish like this:
That's OK - the next stage of your compiler can work out what is gibberish and what isn't.
There's also a lot missing from the example code. For e.g. you'd want to be able to handle larger integers and floating point constants (I store all numerical constants as arbitrary precision floating point). It's also a good idea to use an array of keywords and write a function that checks each entry in the array to see if a string is a keyword or not, instead of using a "strcasecmp" for each possible keyword (it's easier to maintain that way, and means you can use hash tables or something to improve performance later on). You'd also want to add a generic error handling function, so that the compiler will tell you which line in which file it didn't like.
After writing your tokenizer you'd want to replace those output routines. For this I tend to add the output to a buffer, and send the buffer to another thread when it gets full or when there's nothing left to tokenize (but that's just how I do things - most people would just collect it all in memory).
I'd also be tempted to write the reverse - some code to convert the tokens back into ASCII. This just makes it easier to debug your compiler later.
Cheers,
Brendan
A quick crash course....hckr83 wrote:Making the tokenizer is what I can't do...I have tried like 10 times to make something convert a plain text file in a language convert to tokens...it just doesn't work for me...
First you need some output routines which would send the tokenized data somewhere. For now just send the tokens to the screen and worry about the details later:
Code: Select all
void sendSimpleToken(unsigned short token) {
printf("%d\n", token);
}
void sendStringToken(unsigned short token, char *string) {
printf("%d '%s'\n", token, string);
}
void sendNumberToken(unsigned short token, long int number) {
printf("%d %d\n", token, number);
}
Code: Select all
void tokenize(char *input) {
while(*input != 0) {
printf("Syntax error!\n");
}
}
Code: Select all
void tokenize(char *input) {
while(*input != 0) {
if( (*input = ' ') || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
input++;
}
printf("Syntax error!\n");
}
}
Code: Select all
enum {
TOKEN_NUMBER,
}
void tokenize(char *input) {
long int number;
char *temp;
while(*input != 0) {
if( (*input = ' ') || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
input++;
} else {
number = strtol(input, &temp, 0);
if(*temp != input) {
sendNumberToken(TOKEN_NUMBER, number);
}
}
printf("Syntax error!\n");
}
}
Code: Select all
enum {
TOKEN_NUMBER,
TOKEN_IDENTIFIER,
}
void tokenize(char *input) {
long int number;
char *temp;
char identifier[256];
int i;
while(*input != 0) {
if( (*input = ' ') || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
input++;
} else {
number = strtol(input, &temp, 0);
if(temp != input) {
sendNumberToken(TOKEN_NUMBER, number);
input = temp;
} else {
i = 0;
while( (input[i] > 'A') && (input[i] < 'Z') ) {
identifier[i] = input[i];
i++;
}
if(i != 0) {
identifier[i] = 0;
sendStringToken(TOKEN_IDENTIFIER, identifier);
input += i;
}
}
}
printf("Syntax error!\n");
}
}
Code: Select all
enum {
TOKEN_NUMBER,
TOKEN_IDENTIFIER,
TOKEN_LEFTBRACKET,
TOKEN_RIGHTBRACKET,
TOKEN_MULTIPLY,
TOKEN_ADD,
}
void tokenize(char *input) {
long int number;
char *temp;
char identifier[256];
int i;
while(*input != 0) {
if( (*input = ' ') || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
input++;
} else if(*input == '*') {
sendSimpleToken(TOKEN_MULTIPLY);
} else if(*input == '+') {
sendSimpleToken(TOKEN_ADD);
} else if(*input == '(') {
sendSimpleToken(TOKEN_LEFTBRACKET);
} else if(*input == ')') {
sendSimpleToken(TOKEN_RIGHTBRACKET);
} else {
number = strtol(input, &temp, 0);
if(temp != input) {
sendNumberToken(TOKEN_NUMBER, number);
input = temp;
} else {
i = 0;
while( (input[i] > 'A') && (input[i] < 'Z') ) {
identifier[i] = input[i];
i++;
}
if(i != 0) {
identifier[i] = 0;
sendStringToken(TOKEN_IDENTIFIER, identifier);
input += i;
}
}
}
printf("Syntax error!\n");
}
}
Code: Select all
enum {
TOKEN_NUMBER,
TOKEN_IDENTIFIER,
TOKEN_STRING,
TOKEN_LEFTBRACKET,
TOKEN_RIGHTBRACKET,
TOKEN_MULTIPLY,
TOKEN_ADD,
}
void tokenize(char *input) {
long int number;
char *temp;
char identifier[256];
int i;
while(*input != 0) {
if( (*input = ' ') || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
input++;
} else if(*input == ';') {
return;
} else if(*input == '*') {
sendSimpleToken(TOKEN_MULTIPLY);
} else if(*input == '+') {
sendSimpleToken(TOKEN_ADD);
} else if(*input == '(') {
sendSimpleToken(TOKEN_LEFTBRACKET);
} else if(*input == ')') {
sendSimpleToken(TOKEN_RIGHTBRACKET);
} else if(*input == '"') {
i = 1;
while( (input[i] != '"') && (input[i] != 0) {
i++;
}
if(input[i] == 0) {
printf("Hey - there's no right quote character on that string!\n");
exit(EXIT_FAILURE);
}
input[i] = 0;
sendStringToken(TOKEN_STRING, input);
input += i + 1;
} else {
number = strtol(input, &temp, 0);
if(temp != input) {
sendNumberToken(TOKEN_NUMBER, number);
input = temp;
} else {
i = 0;
while( (input[i] > 'A') && (input[i] < 'Z') ) {
identifier[i] = input[i];
i++;
}
if(i != 0) {
identifier[i] = 0;
sendStringToken(TOKEN_IDENTIFIER, identifier);
input += i;
}
}
}
printf("Syntax error!\n");
exit(EXIT_FAILURE);
}
}
Code: Select all
enum {
TOKEN_NUMBER,
TOKEN_IDENTIFIER,
TOKEN_STRING,
TOKEN_LEFTBRACKET,
TOKEN_RIGHTBRACKET,
TOKEN_MULTIPLY,
TOKEN_ADD,
TOKEN_FOO_KEYWORD,
TOKEN_BAR_KEYWORD,
}
void tokenize(char *input) {
long int number;
char *temp;
char identifier[256];
int i;
while(*input != 0) {
if( (*input = ' ') || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
input++;
} else if(*input == ';') {
return;
} else if(*input == '*') {
sendSimpleToken(TOKEN_MULTIPLY);
} else if(*input == '+') {
sendSimpleToken(TOKEN_ADD);
} else if(*input == '(') {
sendSimpleToken(TOKEN_LEFTBRACKET);
} else if(*input == ')') {
sendSimpleToken(TOKEN_RIGHTBRACKET);
} else if(*input == '"') {
i = 1;
while( (input[i] != '"') && (input[i] != 0) {
i++;
}
if(input[i] == 0) {
printf("Hey - there's no right quote character on that string!\n");
exit(EXIT_FAILURE);
}
input[i] = 0;
sendStringToken(TOKEN_STRING, input);
input += i + 1;
} else {
number = strtol(input, &temp, 0);
if(temp != input) {
sendNumberToken(TOKEN_NUMBER, number);
input = temp;
} else {
i = 0;
while( (input[i] > 'A') && (input[i] < 'Z') ) {
identifier[i] = input[i];
i++;
}
if(i != 0) {
identifier[i] = 0;
if(strcasecmp(identifier, "FOO") == 0) {
sendSimpleToken(TOKEN_FOO_KEYWORD);
} else if(strcasecmp(identifier, "BAR") == 0) {
sendSimpleToken(TOKEN_BAR_KEYWORD);
} else {
sendStringToken(TOKEN_IDENTIFIER, identifier);
}
input += i;
}
}
}
printf("Syntax error!\n");
exit(EXIT_FAILURE);
}
}
Code: Select all
; A comment
FOO 1 * (3 + 4)
BAR "Hello World"
Code: Select all
(FOO BAR) FOO + BAR "Hello" * 3
There's also a lot missing from the example code. For e.g. you'd want to be able to handle larger integers and floating point constants (I store all numerical constants as arbitrary precision floating point). It's also a good idea to use an array of keywords and write a function that checks each entry in the array to see if a string is a keyword or not, instead of using a "strcasecmp" for each possible keyword (it's easier to maintain that way, and means you can use hash tables or something to improve performance later on). You'd also want to add a generic error handling function, so that the compiler will tell you which line in which file it didn't like.
After writing your tokenizer you'd want to replace those output routines. For this I tend to add the output to a buffer, and send the buffer to another thread when it gets full or when there's nothing left to tokenize (but that's just how I do things - most people would just collect it all in memory).
I'd also be tempted to write the reverse - some code to convert the tokens back into ASCII. This just makes it easier to debug your compiler later.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
I completely agree with the premise, but I say that the conclusion logically contradicts the premise.Brendan wrote:Optimizing is hard, and optimizing well is a lot harder. My suggestion is to forget about optimizing while you're implementing the compiler, but leave places where you could insert optimizers later
If you have something like the old Whitesmith C compiler that will generate 100 lines of asm from a particular source code -- and GCC that will emit 500 lines of asm for the same code -- and if you ignore optimization completely and get 3000 lines of asm for the same code out of yours -- where an asm hacker can do it all in 40 lines of asm ...
you will NEVER significantly compress the code down later with an optimizer (because optimizing really is damned hard!), for any of the inefficient compilers. The one that will optimize best, later, is the one that was the best optimized to begin with.
And every piece of code that you compile with your inefficient compiler will be inefficient -- bogging down any system you ever run that code on.
Developer time is NOT always the most important factor. There are two pieces of software that are the fulcrum of the entire system -- the kernel, and the compiler. Spending the extra time to polish both of those foundation stones into slick shiny jewels will produce endless rewards, later.
- Kevin McGuire
- Member
- Posts: 843
- Joined: Tue Nov 09, 2004 12:00 am
- Location: United States
- Contact:
umm... ok. I agree with what you are saying, but not in the context of this thread and the thread's original post's author. You have to slow down and realize that the guy is actually trying to learn how to write a compiler not revolutionize the world with a multi-million dollar compiler, which I think is the point Brendan had in mind. Notice he said he was having trouble learning to parse syntax of some language I think that is grounds for him to think and start small. Since as you know every prototype gets rebuilt.bewing wrote:I completely agree with the premise, but I say that the conclusion logically contradicts the premise.Brendan wrote:Optimizing is hard, and optimizing well is a lot harder. My suggestion is to forget about optimizing while you're implementing the compiler, but leave places where you could insert optimizers later
If you have something like the old Whitesmith C compiler that will generate 100 lines of asm from a particular source code -- and GCC that will emit 500 lines of asm for the same code -- and if you ignore optimization completely and get 3000 lines of asm for the same code out of yours -- where an asm hacker can do it all in 40 lines of asm ...
you will NEVER significantly compress the code down later with an optimizer (because optimizing really is damned hard!), for any of the inefficient compilers. The one that will optimize best, later, is the one that was the best optimized to begin with.
And every piece of code that you compile with your inefficient compiler will be inefficient -- bogging down any system you ever run that code on.
Developer time is NOT always the most important factor. There are two pieces of software that are the fulcrum of the entire system -- the kernel, and the compiler. Spending the extra time to polish both of those foundation stones into slick shiny jewels will produce endless rewards, later.
...but I say that the conclusion logically contradicts the premise... if we were writing a production compiler instead of for a educational factor.
You are correct, and my post was not as well phrased as it could have been. *My* OS is dead-serious, and sometimes I have trouble remembering that many/most people here are doing this for fun or as a hobby. My post was also in reply to several intermediate posts in this thread that I did not bother going back to quote.Kevin McGuire wrote: umm... ok. I agree with what you are saying, but not in the context of this thread and the thread's original post's author. You have to slow down and realize that the guy is actually trying to learn how to write a compiler not revolutionize the world with a multi-million dollar compiler, which I think is the point Brendan had in mind. Notice he said he was having trouble learning to parse syntax of some language I think that is grounds for him to think and start small. Since as you know every prototype gets rebuilt.
Hi,
For most high level languages you'd preprocess and tokenize, convert to some sort of intermediate language, optimize (dead code elimination, loop optimizations, function inlining, etc), compile from intermediate language to intermediate assembly, optimize again (peephole, cache management, instruction scheduling), then assemble, and output a file.
Ignoring preprocessing and all the optimizing stages, this only leaves optimizing register allocation and (depending on how good your peephole optimizer is) some instruction selection in the "compile from intermediate language to intermediate assembly" stage.
Of course you'd want to add the preprocessing and optimizing stages later, but until the other stages are working right they just get in the way and make things harder IMHO....
Cheers,
Brendan
Even in a more general context I'm not sure I agree...bewing wrote:You are correct, and my post was not as well phrased as it could have been. *My* OS is dead-serious, and sometimes I have trouble remembering that many/most people here are doing this for fun or as a hobby. My post was also in reply to several intermediate posts in this thread that I did not bother going back to quote.Kevin McGuire wrote: umm... ok. I agree with what you are saying, but not in the context of this thread and the thread's original post's author. You have to slow down and realize that the guy is actually trying to learn how to write a compiler not revolutionize the world with a multi-million dollar compiler, which I think is the point Brendan had in mind. Notice he said he was having trouble learning to parse syntax of some language I think that is grounds for him to think and start small. Since as you know every prototype gets rebuilt.
For most high level languages you'd preprocess and tokenize, convert to some sort of intermediate language, optimize (dead code elimination, loop optimizations, function inlining, etc), compile from intermediate language to intermediate assembly, optimize again (peephole, cache management, instruction scheduling), then assemble, and output a file.
Ignoring preprocessing and all the optimizing stages, this only leaves optimizing register allocation and (depending on how good your peephole optimizer is) some instruction selection in the "compile from intermediate language to intermediate assembly" stage.
Of course you'd want to add the preprocessing and optimizing stages later, but until the other stages are working right they just get in the way and make things harder IMHO....
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
-
- Member
- Posts: 566
- Joined: Tue Jun 20, 2006 9:17 am
Breif guide on tokeninzing
We have system software lab......., I have done a fair bit of reaseach on tokenizing ....... , Here are the few methods that will help you in tokenzing
1) Simplest and the dirtiest :
make use of the function str_tok function available in the C - libraty
(This is not recomended ..check the man pages.....)
2) Make an adhoc lexer : Something which i initially did.......
like this (pseudo code ..not entirely correct)
char token[80];
char *exp_ptr=token;
while(!end_of_file)
{
ch=fgetc(inp_file);
if(ch==' ')ch=fgetc(inp_file); // skip white space.....
*exp_ptr=ch;
while(ch!=' ')
{
exp_ptr++;
*exp_ptr=ch;
}
*exp_ptr='\0';
return token;
}
3) Much better method is to hard code a DFA (deterministic finite automata) table into your lexer.....This is cleaner and better than the
earler methods.there are many ways of doing this
one way is to use a array for implementing the transistion function,
(see adrew s appel )or else you can again hardcode the transistion function also...
4) Easier way is to use lex ... which will automatically generate and implement a DFA (NFA) for you .......It s used like this
{%
/*put declarations here*/
%}
%%
/*
eg
a(a|b)* { printf("ab string");}
/* put rules here*/
%%
/* other functions*/
I recommend method three coz ... i am wriing a compiler to lean not to
outperform gcc or microsoft.... My compiler is not yet compleate...In fact
it's not even begun well.....:)Gee ..it works sporadically......It uses method 2... Yea i am pragmatic guy...
1) Simplest and the dirtiest :
make use of the function str_tok function available in the C - libraty
(This is not recomended ..check the man pages.....)
2) Make an adhoc lexer : Something which i initially did.......
like this (pseudo code ..not entirely correct)
char token[80];
char *exp_ptr=token;
while(!end_of_file)
{
ch=fgetc(inp_file);
if(ch==' ')ch=fgetc(inp_file); // skip white space.....
*exp_ptr=ch;
while(ch!=' ')
{
exp_ptr++;
*exp_ptr=ch;
}
*exp_ptr='\0';
return token;
}
3) Much better method is to hard code a DFA (deterministic finite automata) table into your lexer.....This is cleaner and better than the
earler methods.there are many ways of doing this
one way is to use a array for implementing the transistion function,
(see adrew s appel )or else you can again hardcode the transistion function also...
4) Easier way is to use lex ... which will automatically generate and implement a DFA (NFA) for you .......It s used like this
{%
/*put declarations here*/
%}
%%
/*
eg
a(a|b)* { printf("ab string");}
/* put rules here*/
%%
/* other functions*/
I recommend method three coz ... i am wriing a compiler to lean not to
outperform gcc or microsoft.... My compiler is not yet compleate...In fact
it's not even begun well.....:)Gee ..it works sporadically......It uses method 2... Yea i am pragmatic guy...
-
- Member
- Posts: 566
- Joined: Tue Jun 20, 2006 9:17 am
Slight correction....
It's actually strtok not str_tok() ....also plse forgive my spell errors and
other grammar mistakes(I apologize for my ambigous grammar !!! ,
which makes parsing difficult ..... )....
other grammar mistakes(I apologize for my ambigous grammar !!! ,
which makes parsing difficult ..... )....
-
- Member
- Posts: 566
- Joined: Tue Jun 20, 2006 9:17 am
Hi Guys..................
<Stupidity Deleted >
Last edited by DeletedAccount on Sat Apr 11, 2009 7:13 am, edited 1 time in total.