Re: bytecode design
Posted: Wed Apr 24, 2013 3:24 am
The CIL bytecode (aka MSIL) used in .NET assemblies is actually really nicely designed. It is high-level enough to give the Just-In-Time compiler as much freedom as possible (e.g. to optimize), while its instructions are still smaller than most instructions in real machine code. Contrary to what tjmonk15 says, there is no one-to-one correspondence between CIL instructions and machine code. However, naturally, the basic arithmetic, comparison, jump and bitwise operations (add, compare, branch, xor) are present in almost all instruction sets, including CIL.
CIL is a stack-based language: most operands are put on the (imaginary) evaluation stack. This saves a lot of space in the instruction encoding, since those operands are implicit. For example, a method might have the following CIL content:
Code:
{
    ldc.i4.5    // 0x1B: Load constant 5
    ldc.i4.1    // 0x17: Load constant 1
    add         // 0x58: Add
    ret         // 0x2A: Return
}
This method has a bytecode content of just 4 bytes. The first two instructions each push a value on the evaluation stack, the third pops two values and pushes their sum, and the last one pops a value and returns it from the method. It is not very hard to convert stack-based bytecode to something that uses registers.
In CIL an opcode consists of one or two bytes. If the first byte is one of the reserved prefix values (such as 0xFE), then a second byte follows. This is mostly used to encode a short and a long variant of the same instruction. The opcode can also be followed by one or more explicit operands, stored in little-endian byte order. For example:
Code:
// Push pre-defined constant
ldc.i4.1             // 0x17
// Push signed 8-bit constant (sign-extended to 32 bits)
ldc.i4.s 0xAB        // 0x1F 0xAB
// Push 32-bit constant
ldc.i4 0xDEADBEEF    // 0x20 0xEF 0xBE 0xAD 0xDE
// Push method argument by pre-defined index
ldarg.0              // 0x02
// Push method argument by 8-bit index
ldarg.s 4            // 0x0E 0x04
// Push method argument by 16-bit index
ldarg 0x123          // 0xFE 0x09 0x23 0x01
If you want to know more about CIL bytecode, I suggest you read ECMA-335. It is a very readable specification. The instructions are in Partition III.
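PS: to make the encoding rules and the evaluation stack a bit more concrete, here is a rough Python sketch of a decoder and a tiny interpreter. It is only an illustration and not how any real JIT works. It knows just the handful of opcodes quoted in this post, treats 0xFE as the only prefix byte, and does not wrap arithmetic to 32 bits. The names ONE_BYTE, TWO_BYTE, decode and run are made up for this example.

```python
import struct

# Hypothetical tables covering only the opcodes quoted in this post.
# opcode byte -> (mnemonic, struct format of the inline operand, implied operand)
ONE_BYTE = {
    0x02: ("ldarg.0", None, 0),
    0x0E: ("ldarg.s", "<B", None),   # unsigned 8-bit argument index
    0x17: ("ldc.i4.1", None, 1),
    0x1B: ("ldc.i4.5", None, 5),
    0x1F: ("ldc.i4.s", "<b", None),  # signed 8-bit constant
    0x20: ("ldc.i4", "<i", None),    # signed 32-bit constant, little-endian
    0x2A: ("ret", None, None),
    0x58: ("add", None, None),
}
# Second opcode byte, looked up after the 0xFE prefix.
TWO_BYTE = {
    0x09: ("ldarg", "<H", None),     # unsigned 16-bit argument index
}


def decode(code):
    """Yield (mnemonic, operand) pairs from a bytes object."""
    pos = 0
    while pos < len(code):
        table = ONE_BYTE
        if code[pos] == 0xFE:        # prefix byte: the opcode continues
            table = TWO_BYTE
            pos += 1
        name, fmt, operand = table[code[pos]]
        pos += 1
        if fmt is not None:          # explicit operand follows the opcode
            (operand,) = struct.unpack_from(fmt, code, pos)
            pos += struct.calcsize(fmt)
        yield name, operand


def run(code, args=()):
    """Execute the decoded instructions on an explicit evaluation stack."""
    stack = []
    for name, operand in decode(code):
        if name.startswith("ldc.i4"):
            stack.append(operand)
        elif name.startswith("ldarg"):
            stack.append(args[operand])
        elif name == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)   # simplification: no 32-bit wrap-around
        elif name == "ret":
            return stack.pop()
    raise ValueError("method ended without ret")


method = bytes([0x1B, 0x17, 0x58, 0x2A])   # the 4-byte method from this post
print(list(decode(method)))
print(run(method))                          # prints 6
```

Note that decoding ldc.i4.s 0xAB this way gives -85 rather than 171, because the 8-bit operand is signed.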