A good maths parser

Programming, for all ages and all languages.
User avatar
Karlosoft
Member
Member
Posts: 277
Joined: Thu Feb 14, 2008 10:46 am
Location: Italy
Contact:

A good maths parser

Post by Karlosoft »

Hi, I'm looking for a good parser of math expressions. I have to use it in my assembler. It should be light and fast, I just have to solve simple expressions as 2*(4+2^5). I could write it but so I will spend a lot of time, and I've a lots of other (more important) things to do. Does anyone know one?
User avatar
Thomas
Member
Member
Posts: 281
Joined: Thu Jun 04, 2009 11:12 pm

Re: A good maths parser

Post by Thomas »

Hi ,
its simple to write a simple expression parser. It would be something like

E -> T ( "+"|" - " T) *
T -> F ( "*"| "/" F ) *
F -> number
E -> ( E ) | nothing

I don't follow rules :mrgreen: , * = Kleene Closure , | = or , quotes denote terminals , nothing = epsilon or lamda in some textbooks.It would translate into something like

Code: Select all

void  parse_expression() 
{
    
  token = get_token() ;  
  if ( token == "(" ) 
  {
      parse_expression() ; 
      verify(next_token == ") ") ; 
      token = get_token() ; 
  }
  else 
  {
    parse_term() ; 
    while ( next_token == "+" || next_token == "-" ) 
    {
      token = get_token() ; 
       parse_term() ; 
    } 
  }
}

void parse_term()
{
   parse_factor() ; 
   while ( next_token == "*" || next_token == "/" ) 
   {
      token = get_token() ; 
       parse_factor() ; 
   } 
}
void parse_factor()
{
  /* verify that current token is a valid number  */ 
}


The above pseudo code shows how it is done using recursive descent parsing.Other simple way to parse expressions is to use the method of operator precedence.Its very similar to Dijikstra's shunting yard algorithm. Its similar to creating a postfix version of the expression and then evaluating it using a stack.I will post later on operator precedence parsing :D

-- Thomas
User avatar
Thomas
Member
Member
Posts: 281
Joined: Thu Jun 04, 2009 11:12 pm

Re: A good maths parser

Post by Thomas »

You may use this source as reference

Code: Select all

/* the token id’s*/
#define TOKEN_END_OF_INPUT 100
#define TOKEN_NUMBER 101
#define TOKEN_VARIABLE 102
#define TOKEN_EQUAL 103
#define TOKEN_LESS 104
#define TOKEN_GREATER 105
#define TOKEN_OR 106
#define TOKEN_AND 107
#define TOKEN_PLUS 108
#define TOKEN_MINUS 109
#define TOKEN_COMMA 110
#define TOKEN_SEMI 111
#define TOKEN_MUL 112
#define TOKEN_DIV 113
#define TOKEN_LP 114
#define TOKEN_RP 115
#define TOKEN_MOD 116
#define TOKEN_CR 129

#define TOKEN_LET 117
#define TOKEN_PRINT 118
#define TOKEN_IF 119
#define TOKEN_THEN 120
#define TOKEN_ELSE 121
#define TOKEN_FOR 122
#define TOKEN_NEXT 123
#define TOKEN_GOTO 124
#define TOKEN_GOSUB 125
#define TOKEN_RETURN 126
#define TOKEN_CALL 127
#define TOKEN_STRING 128
#define TOKEN_INPUT 130
#define TOKEN_TO 131
#define TOKEN_REM 132
#define TOKEN_END 133

/* program restrictions*/
#define MAX_NUMBER_LEN 10
#define MAX_VAR_LEN 20
#define MAX_TOKEN_LEN 200

using namespace std;
/* main data structures used by the intrepreter*/
struct Token
{
char token[MAX_TOKEN_LEN];
int type;
};
struct for_state
{
string line_start;
string for_var;
int to;
};
list source_file;
list :: iterator source_ptr;
stack for_stack;
map varTab;
stack gosub_stack;

/* functions related to lexical analysis*/
void open_source_file(char *filename);
void init_tokenizer();
void display_source_file();
void print_source_file();
Token get_token();
void quit_error(char *msg);
Token current_token;
void match(int type,bool &correct);

/*function related to parsing*/
int factor();
int term();
int expression();
void print_statement();
void let_statement();
void input_statement();
void rem_statement();
void if_statement();
void next_statement();
void goto_statement();
void gosub_statement();
void return_statement();
void jump_to_line(char *lineno);
void for_statement();
void end_statement();
void line();

int main(int argc, char *argv[])
{

if(argc != 2 )
{
cout<<”\n ThomasBasic Intrepreter : Usage “<<argv[0]<<” “;
exit(EXIT_FAILURE);
}
open_source_file(argv[1]);
// (yea for debugging !)display_source_file();
init_tokenizer();
/*
for the production
program -> (line)*

*/
while(current_token.type != TOKEN_END_OF_INPUT )
{

line();

}

return EXIT_SUCCESS;
}
/* initilizes the tokenizer*/
void init_tokenizer()
{
source_ptr = source_file.begin();
current_token= get_token();

}

/*opens the input file and saves it into a linked list*/
void open_source_file(char *filename)
{
FILE *input_file = fopen(filename,”r”);
assert(input_file);
char ch;
/* first condition is needed so that we do not inset the end of
file marker
*/
while(((ch = fgetc(input_file))!= NULL)&& !feof(input_file))
{
source_file.push_back(tolower(ch));

}
/* insert final delimiter*/
source_file.push_back(”);

}
/* for debugging display the inputted source file*/
void display_source_file()
{
list ::iterator p = source_file.begin();
while(p != source_file.end())
{
cout<<*p;
p++;

}
cout<<”\n”;
}
/* skip white space*/
void skip_blanks()
{
while((*source_ptr != ”) && ((*source_ptr == ‘ ‘)|| (*source_ptr == ‘\r’))) source_ptr++;
}

/* ah hoc lexer , gets a token from the stream and identifies it*/
Token get_token()
{
Token tok;
char *tok_ptr = tok.token;
int num_len = 0;
int var_len = 0;
int const_str_len =0;
skip_blanks();
if( *source_ptr == ”)
{
*tok_ptr =”;
tok.type=TOKEN_END_OF_INPUT;
}
else if (*source_ptr == ‘+’)
{
*tok_ptr++ = ‘+’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_PLUS;
}
else if (*source_ptr == ‘-’)
{
*tok_ptr++ = ‘-’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_MINUS;
}
else if (*source_ptr == ‘<’)
{
*tok_ptr++ = ”)
{
*tok_ptr++ = ‘>’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_GREATER;
}
else if (*source_ptr == ‘=’)
{
*tok_ptr++ = ‘=’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_EQUAL;
}
else if (*source_ptr == ‘*’)
{
*tok_ptr++ = ‘*’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_MUL;
}
else if (*source_ptr == ‘/’)
{
*tok_ptr++ = ‘/’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_DIV;
}
else if (*source_ptr == ‘(‘)
{
*tok_ptr++ = ‘(‘;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_LP;
}
else if (*source_ptr == ‘)’)
{
*tok_ptr++ = ‘)’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_RP;
}
else if (*source_ptr == ‘,’)
{
*tok_ptr++ = ‘,’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_COMMA;
}
else if (*source_ptr == ‘;’)
{
*tok_ptr++ = ‘;’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_SEMI;
}
else if (*source_ptr == ‘%’)
{
*tok_ptr++ = ‘%’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_MOD;
}
else if (*source_ptr == ‘&’)
{
*tok_ptr++ = ‘&’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_AND;
}
else if (*source_ptr == ‘|’)
{
*tok_ptr++ = ‘|’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_OR;
}
else if (*source_ptr == ‘\n’)
{
*tok_ptr++ = ‘\n’;
*tok_ptr = ”;
*source_ptr++;
tok.type = TOKEN_CR;
}
else if(isdigit(*source_ptr))
{
while(isdigit(*source_ptr) && (num_len < MAX_NUMBER_LEN+1 ))
{
++num_len;
*tok_ptr++ = *source_ptr++;
}
if(num_len == MAX_NUMBER_LEN+1 )
{
quit_error(“Number length exceeded”);
}
else
{
tok.type = TOKEN_NUMBER;
*tok_ptr = ”;
}
}
else if((*source_ptr == ‘”‘) )
{
*tok_ptr++ = *source_ptr++;
while((*source_ptr != ‘”‘)&& (const_str_len < MAX_TOKEN_LEN-2))
{
*tok_ptr++ = *source_ptr++;
++const_str_len;
}
if(const_str_len == MAX_TOKEN_LEN-1)
{
quit_error(“constant string too long”);
}
else
{
*tok_ptr++= *source_ptr++;
*tok_ptr = ”;

tok.type = TOKEN_STRING;
}
}
else if(isalpha(*source_ptr))
{
while(isalpha(*source_ptr) && (var_len < MAX_VAR_LEN))
{
++var_len;
*tok_ptr++ = *source_ptr++;
}
if( var_len == MAX_VAR_LEN)
{
quit_error(“Token name length exceeded!”);

}
else
{

*tok_ptr = ”;
if(strcmp(tok.token,”print”) == 0 )tok.type = TOKEN_PRINT;
else if(strcmp(tok.token,”let”)==0)tok.type = TOKEN_LET;
else if(strcmp(tok.token,”if”)==0)tok.type = TOKEN_IF;
else if(strcmp(tok.token,”goto”)==0)tok.type = TOKEN_GOTO;
else if(strcmp(tok.token,”gosub”)==0)tok.type = TOKEN_GOSUB;
else if(strcmp(tok.token,”then”)==0)tok.type = TOKEN_THEN;
else if(strcmp(tok.token,”for”)==0)tok.type = TOKEN_FOR;
else if(strcmp(tok.token,”next”)==0)tok.type = TOKEN_NEXT;
else if(strcmp(tok.token,”return”)==0)tok.type = TOKEN_RETURN;
else if(strcmp(tok.token,”else”)==0)tok.type = TOKEN_ELSE;
else if(strcmp(tok.token,”input”)==0)tok.type = TOKEN_INPUT;
else if(strcmp(tok.token,”to”)==0)tok.type = TOKEN_TO;
else if(strcmp(tok.token,”rem”)==0)tok.type = TOKEN_REM;
else if(strcmp(tok.token,”end”)==0)tok.type = TOKEN_END;
else tok.type= TOKEN_VARIABLE;
skip_blanks();
}
}
else
{
quit_error(“Unidentified token!”);
}
return tok;
}
/* display message and quit*/
void quit_error(char *msg)
{
cout<– Error!–”<<msg<<”–<”<<”\n”;
exit(EXIT_FAILURE);
}

/* match the input stream for the given token and advance to next token*/
void match(int type,bool &correct)
{

if(current_token.type == type ) correct = true;
else correct = false;
if( correct) current_token = get_token();

}
/* look up the variable from the variable table*/
int lookup_var(string varname)
{
map :: iterator p = varTab.begin();
if( (p = varTab.find(varname))!= varTab.end())
return varTab[varname];
else
quit_error(“Variable not found!”);
}
/*
for the prodcution
factor -> number | “(” expression”)”|variable
*/
int factor(void)
{
int t;
bool correct;
switch(current_token.type)
{
case TOKEN_NUMBER:
t = atoi(current_token.token);
match(TOKEN_NUMBER,correct);
if(!correct)quit_error(“Expected Number!”);
break;
case TOKEN_LP:
match(TOKEN_LP,correct);
if(!correct)quit_error(“Expected (“);
t = expression();
match(TOKEN_RP,correct);
if(!correct)quit_error(“Expected )”);
break;

case TOKEN_VARIABLE:
t = lookup_var(string(current_token.token));
match(TOKEN_VARIABLE,correct);
if(!correct)quit_error(“Expected Variable”);
break;
default:
quit_error(“Factor Error!”);
break;
}
return t;
}

/*
for the production
term ->(factor (“*” | ” / ” | “%” ) )* factor
*/
int term()
{
int t1,t2,oper;
t1 = factor();

oper = current_token.type;

while((oper == TOKEN_MUL) || (oper == TOKEN_DIV) || (oper == TOKEN_MOD))
{
current_token = get_token();

t2 = factor();

switch(oper)
{
case TOKEN_MUL:
t1 *= t2;
break;
case TOKEN_DIV:
t1 /= t2;
break;
case TOKEN_MOD:
t1 %= t2;
break;
}
oper = current_token.type;
}
return t1;
}
/*
for the production
expression -> (term ( “+” | “-”" | “&” | “|” | ))* term
*/
int expression()
{
int t1,t2,oper;
t1 = term();
oper = current_token.type;
while ((oper == TOKEN_PLUS) || (oper == TOKEN_MINUS) || (oper == TOKEN_AND) || (oper == TOKEN_OR))
{
current_token = get_token();
t2 = term();
switch(oper)
{
case TOKEN_PLUS:
t1+=t2;
break;
case TOKEN_MINUS:
t1 -=t2;
break;
case TOKEN_AND:
t1 &= t2;
break;
case TOKEN_OR:
t1 |= t2;
break;
}
oper = current_token.type;
}
return t1;

}
/*
for the production
relational -> expression ( ” ” | “=” ) expression
*/
int relational()
{
int t1,t2,oper;
t1 = expression();
oper = current_token.type;
while( (oper == TOKEN_LESS) || (oper == TOKEN_GREATER) || (oper == TOKEN_EQUAL))
{
current_token = get_token();
t2 = expression();
switch(oper)
{
case TOKEN_LESS:
t1 = t1 t2;
break;
case TOKEN_EQUAL:
t1 = t1 == t2;
break;
}
oper = current_token.type;
}
return t1;
}
/*
for the production
statement – > ifstatement | rem_statement | input_statement | let_statement | for_statemet
statement – > goto_statement | gosub_statement | return_statement | if_statement |end_satement
statemet – > next_statement | print_statement
*/
void statement()
{
switch(current_token.type)
{
case TOKEN_PRINT:
print_statement();
break;

case TOKEN_REM:
rem_statement();
break;
case TOKEN_INPUT:
input_statement();
break;
case TOKEN_LET:
let_statement();
break;
case TOKEN_FOR:
for_statement();
break;
case TOKEN_NEXT:
next_statement();
break;
case TOKEN_GOTO:
goto_statement();
break;
case TOKEN_GOSUB:
gosub_statement();
break;
case TOKEN_RETURN:
return_statement();
break;
case TOKEN_IF:
if_statement();
break;
case TOKEN_END:
end_statement();
break;
default:
quit_error(current_token.token);
}
}
/*
this is a bad practice … this should have been done by the lexer.
for the production
rem_statemet – > “rem” any_number_of_crap “\n”
*/
void rem_statement()
{
bool correct;
match(TOKEN_REM,correct);
if(!correct)quit_error(“expected rem”);
while((current_token.type != TOKEN_CR) && (current_token.type != TOKEN_END_OF_INPUT))
{
current_token = get_token();
}
if(current_token.type != TOKEN_END_OF_INPUT ) current_token = get_token();
}
/*
for the prodution

print_statement -> “print ” ( expression ( “,” | nothing ))* expression “\n”
*/

void print_statement()
{
bool correct;
match(TOKEN_PRINT,correct);
if(!correct)quit_error(“Expected print”);
do
{
if(current_token.type == TOKEN_STRING)
{
char *tok_ptr = current_token.token;
tok_ptr++;
while((*tok_ptr != ‘”‘) && (*tok_ptr != ”))
cout<<*tok_ptr++;
current_token = get_token();
}
else if(current_token.type == TOKEN_COMMA)
{
cout<<” “;
current_token = get_token();
}
else if(current_token.type == TOKEN_SEMI )
{
current_token = get_token();
}
else if((current_token.type == TOKEN_VARIABLE) ||(current_token.type == TOKEN_NUMBER) || (current_token.type == TOKEN_LP ))
{
cout<<expression();
}
else
{
break;
}
}while((current_token.type != TOKEN_CR) && (current_token.type != TOKEN_END_OF_INPUT));
cout< “let” variable “=” expression “\n”
*/
void let_statement()
{
bool correct;
string var_name;

match(TOKEN_LET,correct);
if(!correct)quit_error(“Expected let”);

if(current_token.type == TOKEN_VARIABLE)
{
var_name = string(current_token.token);
match(TOKEN_VARIABLE,correct);
if(!correct)quit_error(“Expected variable”);
}

match(TOKEN_EQUAL,correct);
if(!correct)quit_error(“Expected =”);
varTab[var_name] = expression();
if( (current_token.type == TOKEN_CR))
{
current_token = get_token();
}
else if ( (current_token.type != TOKEN_END_OF_INPUT))
{
quit_error(“Stray inputs after let”);
}

}
/*
for the production
input_statement -> “input” variable “\n”
*/

void input_statement()
{
bool correct;
int value;
match(TOKEN_INPUT,correct);
if(!correct)quit_error(“Expected input”);
if(current_token.type == TOKEN_VARIABLE)
{
fflush(stdin);
string var_name = string(current_token.token);
cin>>value;
varTab[var_name] = value;
current_token = get_token();
}
else
{
quit_error(“Expected variable!”);
}
if(current_token.type == TOKEN_CR ) current_token = get_token();
else if(current_token.type != TOKEN_END_OF_INPUT) quit_error(“Expected CR”);
}

/*
jump to the given line number
*/

void jump_to_line(char *line_no)
{
init_tokenizer();
while(strcmp(current_token.token,line_no) != 0)
{
do
{
do
{
current_token = get_token();
}while((current_token.type != TOKEN_CR) && (current_token.type) != TOKEN_END_OF_INPUT);
if(current_token.type == TOKEN_CR) current_token = get_token();

}while(current_token.type != TOKEN_NUMBER);
}
}
/*
goto_statement -> “goto” linenumber
*/
void goto_statement()
{
bool correct;
char temp_token[80];
match(TOKEN_GOTO,correct);
if(current_token.type != TOKEN_NUMBER)
{
quit_error(“Expected number!”);
}
else
{
strcpy(temp_token,current_token.token);
jump_to_line(temp_token);
}
}
/*
forstatemet -> “for” variable “=” expression “to” expression “\n”
*/
void for_statement()
{
int to;
bool correct;
string var_name;
for_state this_for_state;
match(TOKEN_FOR,correct);
if(!correct)quit_error(“Expected for in for!”);
if(current_token.type == TOKEN_VARIABLE)
{
var_name = string(current_token.token);
current_token = get_token();
}
else
{
quit_error(“Expected variable in for”);
}
match(TOKEN_EQUAL ,correct);
if(!correct)quit_error(“Expected =”);
varTab[var_name] = expression();
match(TOKEN_TO,correct);
if(!correct)quit_error(“Expected to in for”);
to = expression();
match(TOKEN_CR,correct);
if(!correct)quit_error(“Expected CR”);

if(current_token.type != TOKEN_NUMBER)
quit_error(“Expected number after for “);
this_for_state.line_start = string(current_token.token);
this_for_state.for_var = var_name;
this_for_state.to = to;
for_stack.push(this_for_state);
}

/*
if_statement -> “if” relational then statement B
B -> “\n” | “else” statement “\n”
*/
void if_statement()
{
int condition;
bool correct;
match(TOKEN_IF,correct);
if(!correct)quit_error(“Expected if”);
condition = relational();
match(TOKEN_THEN,correct);
if(!correct)quit_error(“Expected then in if”);
if(condition)
{
statement();
}
else
{
do
{
current_token = get_token();
}while((current_token.type != TOKEN_CR) && (current_token.type != TOKEN_END_OF_INPUT) && (current_token.type != TOKEN_ELSE));
if(current_token.type == TOKEN_ELSE)
{
current_token = get_token();
statement();
}
else if(current_token.type == TOKEN_CR)
{
current_token = get_token();
}
}

}
/*
next_statment -> “next” variable “\n”
*/
void next_statement()
{
int var;
bool correct;
for_state current_state;
match(TOKEN_NEXT,correct);
if(!correct)quit_error(“Expected next”);
if(current_token.type == TOKEN_VARIABLE)
{
current_state = for_stack.top();
if((current_state.for_var.compare(current_token.token) == 0 ) && (!for_stack.empty()))
{
var = lookup_var(string(current_token.token));
var++;
varTab[string(current_token.token)] = var;
current_token = get_token();
if( var “gosub” line no
*/

void gosub_statement()
{
bool correct;
string var_name;
match(TOKEN_GOSUB,correct);
if(!correct)quit_error(“expected gosub”);
if(current_token.type == TOKEN_NUMBER)
{
var_name = string(current_token.token);
gosub_stack.push(var_name);
jump_to_line((char*)var_name.c_str());

}
else
{
quit_error(“Expected number here”);
}
}
/*
return -> “return”
*/
void return_statement()
{
bool correct;
match(TOKEN_RETURN,correct);
if(!correct)quit_error(“expected return”);
if(!gosub_stack.empty())
{
string to_jump = gosub_stack.top();
gosub_stack.pop();
jump_to_line((char *)to_jump.c_str());
}
}
/*
end_statement -> “end”
*/
void end_statement()
{
bool correct;
match(TOKEN_END,correct);
if(!correct)quit_error(“expected END”);
else
exit(EXIT_SUCCESS);
}
/*
line -> number statement
*/
void line()
{

bool correct;
match(TOKEN_NUMBER,correct);
if(!correct)quit_error(“expected line number!”);
statement();
}
User avatar
NickJohnson
Member
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: A good maths parser

Post by NickJohnson »

You can transform it into postfix notation using the Shunting-Yard algorithm, then run it using a stack like this. I wouldn't expect it to take much more than 50 lines of C.
User avatar
qw
Member
Member
Posts: 792
Joined: Mon Jan 26, 2009 2:48 am

Re: A good maths parser

Post by qw »

Jack Crenshaw's "Let's build a compiler" tutorials are built around an expression evaluator.
User avatar
Thomas
Member
Member
Posts: 281
Joined: Thu Jun 04, 2009 11:12 pm

Re: A good maths parser

Post by Thomas »

Hi ,

Operator Precedence Parsing in Brief
There are two things to be considered while writing an operator precedence grammar . They are
  • A non-terminal should never yield a nothing,that is lamda or epsilon should never occur on the right hand side .eg there should not be anything like A -> lamda
  • Two non-terminals should not be adjacent in any production , eg there should not be a production of the form S -> A B , where A and B are non terminals
.However Operator precedence parsing is quite nifty and easily implemented.

Lets assume that there exists a precedence table , Precd , it takes two terminals as input and returns the precedence relationship,precd ( a , b) returns < , if a has lesser precedence than b,= if precedence is equal ,> for greater . { note that precedence table also includes the end of input symbol }
The algorithm is as follows.

Code: Select all

Assumptions : input_pointer,points to input string,oper_stack  is the operator stack (initially having just end_of_input symbol in it ) 
while( *input_pointer != end_of_input ) 
{ 
 if ( oper_stack == end_of_input and  input_pointer == end_of_input) 
 {
   /* done parsing , Success ! */
 }
 else
 {
     a = oper_stack.top() ; 
     b = *input_pointer ; 
    if ( precd(a ,b)  == '=' || precd(a,b) == '<' ) 
    {
       oper_stack.push(b) ; 
       input_pointer++ ;
    }
    else if ( precd(a,b) == '>') 
    {
        do 
        {
           oper_stack.pop() ;
           a = oper_stack.top();
        }while(precd(a,b) == '<' );
    }
    else
    {
       // Error ! 
    }
  }
}
This just shows the parsing, you need to write code for different actions in this skeletal code.Please let me know in case of any trouble.
--Thomas
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: A good maths parser

Post by Brendan »

Hi,

Probably the hardest part of writing a math parser is getting it right. For example, consider "((4/3)-1)*3" - a simple parser (using floating point) might say the answer is 0.999999999998 (for e.g.), which can cause problems later (especially if the result is converted to an integer). Worse, some math parses use integers internally (where 4/3 = 1). NASM is one example where the final answer it gives is 0 (but it's certainly not the only one).

To be honest, computers really are bad at (precise) maths. Floating point always has rounding errors (larger floating point formats with more significant digits are more precise but the answers are still wrong); and for complex/lengthy calculations these errors are compounded.

The only way to get it right is to use relational numbers (e.g. where you keep track of a numerator, a divisor and an exponent). The problem is that to implement relational numbers you need to write your own operators (code to add, subtract, multiply, divide, etc) which takes much longer than writing the math parser itself.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Thomas
Member
Member
Posts: 281
Joined: Thu Jun 04, 2009 11:12 pm

Re: A good maths parser

Post by Thomas »

Hi,
I would like to add that the set of real numbers is not recusively enumerable,no matter what you do you are never going to get 100% accuracy !

--Thomas
User avatar
thepowersgang
Member
Member
Posts: 734
Joined: Tue Dec 25, 2007 6:03 am
Libera.chat IRC: thePowersGang
Location: Perth, Western Australia
Contact:

Re: A good maths parser

Post by thepowersgang »

Of course you aren't, but you want to be able to have a correct result for something that should return an integer value.

E.g. 1/3*3 = 1, but if you use floats it comes out as 0.999984741 (using a 16-bit fraction)
Kernel Development, It's the brain surgery of programming.
Acess2 OS (c) | Tifflin OS (rust) | mrustc - Rust compiler
Currently Working on: mrustc
User avatar
AndrewAPrice
Member
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: A good maths parser

Post by AndrewAPrice »

Brendan wrote:The only way to get it right is to use relational numbers (e.g. where you keep track of a numerator, a divisor and an exponent). The problem is that to implement relational numbers you need to write your own operators (code to add, subtract, multiply, divide, etc) which takes much longer than writing the math parser itself.
I agree. Don't calculate the actual answer until you call ToInt()/ToFloat() for output, but store the number internally as an optimized tree of operations that results in the answer. I haven't found library that does this though.
My OS is Perception.
User avatar
Owen
Member
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom
Contact:

Re: A good maths parser

Post by Owen »

Or use an arbitrary precision maths package.
User avatar
Karlosoft
Member
Member
Posts: 277
Joined: Thu Feb 14, 2008 10:46 am
Location: Italy
Contact:

Re: A good maths parser

Post by Karlosoft »

I don't know if I will have to use not integer values. I mean by the time this parser is just used to solve the expressions with the memory address (something as mov ax,*label-label2+4 where label and label2 are changed with address before the parser run.
So it's very difficult that an address is 100.75 :)
User avatar
AndrewAPrice
Member
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: A good maths parser

Post by AndrewAPrice »

Owen wrote:Or use an arbitrary precision maths package.
How would it deal with math involving recursive or even irrational numbers without loosing precision if the computer can't understand the calculations involved at a high-level?

A poorly designed arbitrary precision library will be endlessly allocating memory if it tries to square root 2.
My OS is Perception.
User avatar
Owen
Member
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom
Contact:

Re: A good maths parser

Post by Owen »

It will eventually, of course, truncate.

But do you want absolute precision? I think most people prefer their calculations to be perfomed identically whether they are compile time constants or runtime calculated (Aside of small differences when using floating point).

Arbitrary* precision maths packages like GCC's GMP and MPFR help here, as they allow you to say "I want an 128-bit integer", or "I want an 80-bit floating point value" while working on a processor which lacks native support for said sizes (e.g. cross compiling), and allow you to write one piece of general code (Just needing to ensure the entering constants are the correct types).

(*Note I say arbitrary, I.E. under program control, not just normal bignums)
User avatar
AndrewAPrice
Member
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: A good maths parser

Post by AndrewAPrice »

I think you're getting away from the point Brendan and I made. No matter how precise you're wanting to store a number, if you calculate 1 / 3 * 6 after each operation then you'll get 1.99999999999999999999999999999 unless in memory you store that number is equal to "1 / 3" then when you multiple by 6 it'll cancel out the "/ 3" and turn it into "* 2", in which case when you output the number (when the actual calculation occurs) it will return 2.

You can never predict where such accuracy may be needed over the speed, but I'm sure there are areas that would benefit from it that could run for years on end without the data being interfered with by rounding errors.
My OS is Perception.
Post Reply