Writing an Interpreter

beyondsociety · Post by **beyondsociety** » Sun May 11, 2003 10:41 pm

I am starting to write a compiler, and someone on this forum suggested creating a byte code interpreter to help with constructing it.

How would I go about writing a simple interpreter?

df · Post by df » Mon May 12, 2003 1:18 am

a nice big switch statement?

you need to decide if you're going to output pcode or real cpu code.

a register based system is the easiest to write an interp for.

when i get home from work I'll post some info on my interp I have.

AGI1122 · Post by **AGI1122** » Mon May 12, 2003 1:57 am

Sarien? Or are you guys talking about something completely different?

Perica · Post by **Perica** » Mon May 12, 2003 2:54 am

Beyond Infinity lazy · Mon May 12, 2003 4:22 am

Nay, Perica, they are talking about compilers and interpreters of HLLs. This can easily be deduced from beyondsociety's inquiry.

df · Post by df » Mon May 12, 2003 10:59 am

below is an old core to one of my interpreters.
it was a register based machine.
basically each instruction was a 32bit number:

top 8 bits were the opcode, the next 8 were register 1, then the next 8 were register 2. the remaining bits were some status bits.
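As a rough sketch of how those fields could be pulled out of a 32-bit word in C: the helper names below are made up for illustration (the real GET_OPCODE / GET_REGA / GET_REGB macros are not shown in this thread), and the register number is assumed to sit in the low 4 bits of its byte, since the machine has 16 registers.

```c
#include <stdint.h>

/* Hypothetical helpers; the original macros are not shown in the thread. */

/* Top 8 bits: the opcode. */
static uint32_t decode_opcode(uint32_t word)
{
    return (word >> 24) & 0xFFu;
}

/* Next 8 bits: register 1 (assumed: register number in the low 4 bits). */
static uint32_t decode_reg_a(uint32_t word)
{
    return (word >> 16) & 0x0Fu;
}

/* Next 8 bits: register 2 (same assumption as above). */
static uint32_t decode_reg_b(uint32_t word)
{
    return (word >> 8) & 0x0Fu;
}
```

For example, the word 0x01030500 would decode as opcode 1 with register 1 = r3 and register 2 = r5.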

all opcodes ran register to register, with the exception of the load/store opcode.

it ran 'compiled' scripts in a 64kb data block (all code/text etc. must sit inside 64kb. that was my restriction, since this version didn't have a VM doing memory interfacing in it, which my current one has).

anyway, this gives you an idea of the code my compiler output. (I cut some commands out, since my message was too long)

Code:

void run_opcode(vContext *vCPU)
{
   UINT32   op;
   UINT32   r1, r2, r3, r4;

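   /* fetch the whole 32bit instruction word at byte offset IP (a UINT8 read would only see one byte of it) */
   op = *(UINT32*)( (UINT8*)vCPU->ptrMem + vCPU->cpu.reg[ REG_IP ] );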

   switch(GET_OPCODE(op))
   {
      case op_NULL:
         vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
         break;

      case op_ADD:
         r1 = vCPU->cpu.reg[ GET_REGA(op) ];
         r2 = vCPU->cpu.reg[ GET_REGB(op) ];
         vCPU->cpu.reg[ GET_REGA(op) ] = r1 + r2;
         
         if( r1 > vCPU->cpu.reg[ GET_REGA(op) ])
            vCPU->cpu.reg[ REG_FLAGS ] |= FLAG_OVERFLOW;
         else
            vCPU->cpu.reg[ REG_FLAGS ] &= ~FLAG_OVERFLOW;

         vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
         break;

      case op_SUB:
         r1 = vCPU->cpu.reg[ GET_REGA(op) ];
         r2 = vCPU->cpu.reg[ GET_REGB(op) ];
         vCPU->cpu.reg[ GET_REGA(op) ] = r1 - r2;
         
         if( r1 < vCPU->cpu.reg[ GET_REGA(op) ])
            vCPU->cpu.reg[ REG_FLAGS ] |= FLAG_OVERFLOW;
         else
            vCPU->cpu.reg[ REG_FLAGS ] &= ~FLAG_OVERFLOW;

         vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
         break;

      case op_MUL:
         r1 = vCPU->cpu.reg[ GET_REGA(op) ];
         r2 = vCPU->cpu.reg[ GET_REGB(op) ];
         vCPU->cpu.reg[ GET_REGA(op) ] = r1 * r2;
         /* compute if overflow */

         vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
         break;

      case op_DIV:
         r1 = vCPU->cpu.reg[ GET_REGA(op) ];
         r2 = vCPU->cpu.reg[ GET_REGB(op) ];
         vCPU->cpu.reg[ GET_REGA(op) ] = r1 / r2;
         vCPU->cpu.reg[ GET_REGB(op) ] = r1 % r2;
         /* compute if overflow */

         vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
         break;

      case op_CMP:
         if( vCPU->cpu.reg[ GET_REGA(op) ] == vCPU->cpu.reg[ GET_REGB(op) ] )
            vCPU->cpu.reg[ REG_FLAGS ] |= FLAG_EQUAL;
         else
            vCPU->cpu.reg[ REG_FLAGS ] &= ~FLAG_EQUAL;

         vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
         break;

      case op_MOV:
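         // dont change reg0! (but still step past the instruction word,
         // otherwise the interpreter spins on it forever)
         if( GET_REGA(op) == 0)
         {
            vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
            break;
         }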

         r1 = vCPU->cpu.reg[ GET_REGA(op) ]; // dest
         r2 = vCPU->cpu.reg[ GET_REGB(op) ]; // source

         vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);

         // source
         switch( GET_MOD1(op) )
         {
            // mov r1, r2
            case 0:
               r4 = r2;
               break;

            // mov r1, [r2]
            case MOD_MEM:
               r4 = ( ((UINT32*)vCPU->ptrMem)[ r2 ] );
               break;
            
            // mov r1, 0xDEADBEEF
            case MOD_NUM:
               r4 = ( ((UINT32*)vCPU->ptrMem)[ vCPU->cpu.reg[ REG_IP ] ] );
               vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
               break;

            // mov r1, [0xDEADBEEF]
            case MOD_MEM+MOD_NUM:
               r4 = ( ((UINT32*)vCPU->ptrMem)[ vCPU->cpu.reg[ REG_IP ] ] );
               vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);

               r4 = ( ((UINT32*)vCPU->ptrMem)[ r4] );
               break;
         }

         // move r4 into { r1|[r1]|[xx] }


         // destination
         switch( GET_MOD1(op) )
         {
            // mov r1, r4
            case 0:
               // index by the register number, not by the register's old value (r1)
               vCPU->cpu.reg[ GET_REGA(op) ] = r4;
               break;

            // mov [r1], r4
            case MOD_MEM:
               // r1 already holds the address, so store straight to it
               ((UINT32*)vCPU->ptrMem)[ r1 ] = r4;
               break;
            
            // mov 0xDEADBEEF, r4
            // illegal!
            case MOD_NUM:
               //r3 = ( ((UINT32*)vCPU->ptrMem)[ vCPU->cpu.reg[ REG_IP ] ] );
               //vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
               // signal illegal!
               break;

            // mov [0xDEADBEEF], r4
            case MOD_MEM+MOD_NUM:
               r3 = ( ((UINT32*)vCPU->ptrMem)[ vCPU->cpu.reg[ REG_IP ] ] );
               vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);

               ((UINT32*)vCPU->ptrMem)[ r3 ] = r4;
               break;
         }
         break;
         
      case op_JE:
         r1 = vCPU->cpu.reg[ GET_REGA(op) ];
         vCPU->cpu.reg[ REG_IP ] += sizeof(UINT32);
         
         // test with &, not &=, so the other flag bits are left alone
         if( (vCPU->cpu.reg[ REG_FLAGS ] & FLAG_EQUAL)==FLAG_EQUAL )
             vCPU->cpu.reg[ REG_IP ] = r1;         
         break;
   }
}
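The overflow checks in op_ADD and op_SUB above rely on unsigned 32-bit wraparound: after a wrapped add the result is smaller than the first operand, and after a wrapped subtract it is larger. One possible way to fill in the "compute if overflow" placeholder for op_MUL is to widen to 64 bits first. The sketch below uses made-up helper names and standard `<stdint.h>` types rather than the UINT32 typedef from the listing, so treat it as an illustration and not as df's code.

```c
#include <stdint.h>

/* Hypothetical helper names; each returns 1 if the 32-bit result wrapped. */

/* a + b wrapped if the truncated sum ended up smaller than a. */
static int add_wraps(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b) < a;
}

/* a - b wrapped (borrowed) if the truncated difference is larger than a. */
static int sub_wraps(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) > a;
}

/* a * b wrapped if the full 64-bit product does not fit in 32 bits. */
static int mul_wraps(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b > UINT32_MAX;
}
```

The interpreter would then set or clear FLAG_OVERFLOW from the return value, the same way the add and sub cases already do.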

beyondsociety · Post by **beyondsociety** » Mon May 12, 2003 1:39 pm

df: what cpu is this snippet of code for? just wondering because it looks familiar.

Also, I would like to take a look at all the code for this interpreter you wrote to get an idea of what an interpreter consists of.

What's the difference between pcode and real cpu code?

df · Post by df » Mon May 12, 2003 3:02 pm

well that code is just a virtual cpu i made up. 16 registers. very simple stuff. since it's all register to register operands, implementation is really basic.

i guess there isn't a great deal of difference between pcode and real cpu code in a lot of ways. the original 'pcode' was for a pascal compiler back in the 1970s.

you could have code in your pcode for list->next, list->prev, or encode really complex stuff into an operand, etc.

i don't know if there are any hard/fast rules for pcode. VB4 and earlier used to compile to PCODE.

these kinds of 'cpu's are really simple to construct.

beyondsociety · Post by **beyondsociety** » Tue May 13, 2003 1:20 am

Do you have an opcode list or instruction set for this made-up virtual cpu?

df · Post by df » Tue May 13, 2003 11:20 am

yeah I have an opcode list.

Code:

/*

opcodes

00 - null
01 - add
02 - sub
03 - mul
04 - div
05 - cmp
06 - mov
07 - je
08 - xor
09 - and
10 - or
11 - not
12 - neg

16 regs

'' all reg, reg

push =
mov   [r13],r1
sub r13, 4

pop =
add r13, 4
mov r1,[r13]


xxxxxxxx 00zq1111 00zq2222 bbbbbbbb

x = opcode   
1 = reg1
2 = reg2
z = numeric flag
    0 - no num
    1 - 32bit num follows
q = memory flag
   0 - reg
   1 - memory
b = undefined.   

reg0 is always ZERO
reg13 is SIP
reg14 is flags
reg15 is IP

*/
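The push and pop expansions in that list can be sketched in C roughly as follows. This assumes r13 holds a byte address into the 64kb block and that the stack grows downward, as the mov/sub and add/mov pairs suggest. The `toy_` names and the bounds checks are mine, and the word is stored in host byte order to keep the example short.

```c
#include <stdint.h>
#include <string.h>

enum { TOY_MEM_SIZE = 64 * 1024, TOY_REG_SP = 13 };

/* Hypothetical machine state: 16 registers plus 64kb of memory. */
typedef struct {
    uint32_t reg[16];
    uint8_t  mem[TOY_MEM_SIZE];
} toy_machine;

/* push = mov [r13], value ; sub r13, 4. Returns 0 on success, -1 if r13 is out of range. */
static int toy_push(toy_machine *m, uint32_t value)
{
    uint32_t sp = m->reg[TOY_REG_SP];
    if (sp < 4 || sp > (uint32_t)(TOY_MEM_SIZE - 4))
        return -1;
    memcpy(&m->mem[sp], &value, sizeof value);
    m->reg[TOY_REG_SP] = sp - 4;
    return 0;
}

/* pop = add r13, 4 ; mov value, [r13]. Returns 0 on success, -1 if the stack is empty. */
static int toy_pop(toy_machine *m, uint32_t *value)
{
    uint32_t sp = m->reg[TOY_REG_SP];
    if (sp > (uint32_t)(TOY_MEM_SIZE - 8))
        return -1;
    sp += 4;
    memcpy(value, &m->mem[sp], sizeof *value);
    m->reg[TOY_REG_SP] = sp;
    return 0;
}
```

With r13 initialised to the last word of the block (TOY_MEM_SIZE - 4), a push followed by a pop returns the same value and leaves r13 where it started.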

beyondsociety · Post by **beyondsociety** » Wed May 14, 2003 11:36 am

How does the code you posted lookup the opcodes?

Do I have to put a list of opcodes into a buffer or table and load it first before I run the GET_OPCODE function?

df · Post by df » Wed May 14, 2003 1:31 pm

what??

i load my 'binary' into memory. set up my cpu registers...
and run it?

i don't get your question. i know opcode 1 is add.. so I run the add function when I get to it...

i allocate say 64kb

Code:

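unsigned char *x;   /* unsigned, so opcodes above 0x7F don't come out negative */

x = malloc(1024*64);
read_into_file(x);  /* my loader (not shown): reads the compiled file into x */

// ok, buffer x contains my code..

/* 'register' is a C keyword, so the register array is called reg here */
opcode = x[ reg[instruction_pointer] ];

switch(opcode)
{
   /* one case per opcode, e.g. case 1 is add */
}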

maybe i'm reading your question wrong, I dunno.
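To make the load-then-switch idea concrete, here is a small self-contained sketch in C. It is not the code from the interpreter discussed above: the `toy_` names, the halt opcode 0xFF, and the choice to store instruction words most significant byte first are all assumptions made for the example. Only null, add and sub are handled.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_MEM_SIZE = 64 * 1024, TOY_REG_COUNT = 16, TOY_REG_IP = 15 };
enum { TOY_OP_NULL = 0x00, TOY_OP_ADD = 0x01, TOY_OP_SUB = 0x02,
       TOY_OP_HALT = 0xFF /* hypothetical: not in the opcode list above */ };

typedef struct {
    uint32_t reg[TOY_REG_COUNT];
    uint8_t *mem;
} toy_cpu;

/* Fetch, decode and execute until halt. Returns 0 on halt, -1 on a bad opcode or IP. */
static int toy_run(toy_cpu *cpu)
{
    for (;;) {
        uint32_t ip = cpu->reg[TOY_REG_IP];
        const uint8_t *p;
        uint32_t word, a, b;

        if (ip > (uint32_t)(TOY_MEM_SIZE - 4))
            return -1;                      /* IP ran off the end of memory */
        p = cpu->mem + ip;
        word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
        a = (word >> 16) & 0x0Fu;           /* register 1 */
        b = (word >> 8) & 0x0Fu;            /* register 2 */
        cpu->reg[TOY_REG_IP] = ip + 4;

        switch (word >> 24) {               /* the opcode picks the case directly */
        case TOY_OP_NULL: break;
        case TOY_OP_ADD:  cpu->reg[a] += cpu->reg[b]; break;
        case TOY_OP_SUB:  cpu->reg[a] -= cpu->reg[b]; break;
        case TOY_OP_HALT: return 0;
        default:          return -1;
        }
        cpu->reg[0] = 0;                    /* reg0 always reads as zero */
    }
}

/* Copy a program into fresh 64kb memory, preset r1 and r2, run, and report r1. */
static int toy_load_and_run(const uint8_t *program, size_t len,
                            uint32_t r1, uint32_t r2, uint32_t *result)
{
    toy_cpu cpu;
    int status;

    if (len > TOY_MEM_SIZE)
        return -1;
    memset(&cpu, 0, sizeof cpu);
    cpu.mem = calloc(TOY_MEM_SIZE, 1);
    if (cpu.mem == NULL)
        return -1;
    memcpy(cpu.mem, program, len);
    cpu.reg[1] = r1;
    cpu.reg[2] = r2;
    status = toy_run(&cpu);
    *result = cpu.reg[1];
    free(cpu.mem);
    return status;
}
```

There is no lookup table to load first: the opcode byte is just a number, and the switch maps it to the code that implements it. A two-instruction program such as `01 01 02 00  FF 00 00 00` (add r1, r2 then halt) leaves r1 + r2 in r1.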

df · Post by df » Thu May 15, 2003 3:30 pm

I uploaded all the code for that interpreter I had in my examples above.

There are probably lots of bugs in it, but it's fairly simple.
unrar it into a directory and run virt.exe; it will load and run x1.bin (a small test file).

ftp://ftp.mega-tokyo.com/pub/my_stuff/temp/int.rar

it's designed to work regardless of the endianness of its host cpu.
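A common way to get that kind of host independence is to build each 32-bit word from individual bytes with shifts, instead of casting the buffer to a 32-bit pointer. The sketch below assumes the file stores words most significant byte first; the thread does not say which order the real format uses, and the function names are made up.

```c
#include <stdint.h>

/* Hypothetical helpers: read/write a 32-bit word stored most significant byte first. */

static uint32_t load_u32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void store_u32_be(uint8_t *p, uint32_t value)
{
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}
```

Because only byte accesses and shifts are involved, the same binary file decodes to the same values on little-endian and big-endian hosts, and there are no alignment requirements on the buffer.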

I compiled it under VC6, but there is nothing fancy in it, so it should compile under pretty much any 32bit compiler...

OSDev.org
