However, I could use some assistance, or at the very least criticism and advice. The current design, which is written in R6RS Scheme, calls for a separate instruction data file and register definition file for each supported iteration of the ISA; that is to say, there is one pair of files for the 8086, another for the 80186, another for the 80286, and so on. The register file format consists of a series of lists, each containing a list of the register's name and aliases followed by the width of the register in bytes (this does mean that architectures whose registers aren't a whole number of bytes wide would be unsupported, but how many such architectures are still in use?). For example, the file 'i8086.regs' reads as follows:
'(("AX" "Accumulator") 2)
'(("AH" "Accumulator-Upper-Half") 1)
'(("AL" "Accumulator-Lower-Half") 1)
'(("BX" "Base-Register" "Index") 2)
'(("BH" "Base-Upper-Half" "Index-Upper-Half") 1)
'(("BL" "Base-Lower-Half" "Index-Lower-Half") 1)
'(("CX" "Counter") 2)
'(("CH" "Counter-Upper-Half") 1)
'(("CL" "Counter-Lower-Half") 1)
'(("DX" "Data-Register") 2)
'(("DH" "Data-Upper-Half") 1)
'(("DL" "Data-Lower-Half") 1)
'(("DI" "Dest-Index") 2)
'(("SI" "Source-Index") 2)
'(("BP" "Base-Pointer" "Stack-Frame-Pointer") 2)
'(("SP" "Stack") 2)
'(("IP" "Instruction-Pointer") 2)
'(("FLAGS") 2)
'(("CS" "Code-Segment") 2)
'(("DS" "Data-Segment") 2)
'(("SS" "Stack-Segment") 2)
'(("ES" "Extra-Segment") 2)
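As one way of consuming this format, here is a minimal R6RS sketch that reads the register entries and builds a lookup table from every name and alias to the canonical name and byte width. It assumes the file is read as data with `read` rather than evaluated, so each leading quote arrives as a `(quote ...)` wrapper that has to be stripped. The procedure names (`read-register-entries`, `make-register-table`, `register-info`) are hypothetical, and a string port stands in for the real file.

```scheme
(import (rnrs))

;; Hypothetical loader for a .regs file. Each entry has the shape
;; '((name alias ...) width-in-bytes). Because entries are written with a
;; leading quote, `read` returns each one as (quote ((name ...) width)).
(define (strip-quote datum)
  (if (and (pair? datum) (eq? (car datum) 'quote))
      (cadr datum)
      datum))

;; Read every entry from a textual port until end of file.
(define (read-register-entries port)
  (let loop ((acc '()))
    (let ((datum (read port)))
      (if (eof-object? datum)
          (reverse acc)
          (loop (cons (strip-quote datum) acc))))))

;; Map every name and alias to a pair (canonical-name . width-in-bytes),
;; where the canonical name is the first string in the names list.
(define (make-register-table entries)
  (let ((table (make-hashtable string-hash string=?)))
    (for-each
     (lambda (entry)
       (let ((names (car entry))
             (info (cons (caar entry) (cadr entry))))
         (for-each (lambda (name) (hashtable-set! table name info))
                   names)))
     entries)
    table))

;; Look up a register by any of its names; returns #f if unknown.
(define (register-info table name)
  (hashtable-ref table name #f))

;; Usage: a string port stands in for (open-input-file "i8086.regs").
(define regs
  (make-register-table
   (read-register-entries
    (open-string-input-port
     (string-append "'((\"AX\" \"Accumulator\") 2) "
                    "'((\"AL\" \"Accumulator-Lower-Half\") 1)")))))

(register-info regs "Accumulator") ; => ("AX" . 2)
```

Keying every alias to the same pair lets the assembler accept "Accumulator" or "AX" interchangeably, and if the quotes are ever dropped from the file, `strip-quote` simply passes the entries through unchanged.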
The format for instructions is a good deal more complex. It again consists of a series of lists, each with three outer fields: the list of mnemonics, the list of the possible encodings (field lists) of the opcode, and a text description of the instruction. Each entry in the fields section is itself a list consisting (in the case of the x86 ISA) of the four common opcode prefixes, the primary opcode (which I will describe shortly), the MOD-R/M type, an enum indicating whether the instruction can accept the LOCK prefix (and under what circumstances), and an enum indicating the minimum security ring of the instruction. The opcode field is itself broken down into the size of the opcode field in bits, the opcode itself, and the sub-fields that select various conditions or states (it gets complicated; really complicated). A few examples would be:
'(("AAA" "ASCII-Adjust-After-Addition")
((NONE NONE NONE NONE (8 #x37 (NONE)) NONE NO RING-3))
"ASCII Adjust AL After Addition")
'(("ADD")
((NONE NONE NONE NONE (6 #x00 (D W)) reg REG-DEST-ONLY RING-3)
(NONE NONE NONE NONE (7 #x04 (W)) NONE NO RING-3)
(NONE NONE NONE NONE (6 #x80 (S W)) 0 ALLOWED RING-3))
"Add")
'(("ADDPD")
((#x66 #x0F NONE NONE (8 #x58 (NONE)) reg NO RING-3))
"Add Packed Double-FP Values")
'(("BOUND" "Check-Array-Bounds")
((NONE NONE NONE NONE (8 #x62 (D)) reg NO RING-3))
"Check Array Index Against Bounds")
'(("CMOVB" "CMOVNAE" "CMOVC")
((NONE #x0F NONE NONE (5 #x40 (B)) reg NO RING-3))
"Conditional Move - below/not above or equal/carry (CF=1)")
As complex as this is, I am not sure that I have captured enough information about this ISA to make a truly table-driven assembler. I have repeatedly had to expand the format already to handle edge cases I hadn't foreseen, and given the absurd, irrational complexity of the x86 design and the difficulty of following the manuals and the various (often contradictory) web pages documenting it, I could easily miss something important. Even now, I am uncertain enough about how to represent the multiplicity of argument formats that I have not tried to add that information to the data files; I am hoping I won't need to do so explicitly.
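Since the worries here are about missing information and about argument formats, here is a minimal R6RS sketch of two things the existing data already supports: checking that each encoding has the eight fields described for the instruction format and assembling the opcode byte from its `(size opcode (sub-fields))` spec, and deriving operand forms from the D and W bits instead of storing them. It rests on assumptions not stated in the data: that the size is the number of fixed bits, that every named sub-field such as D, W or S is one bit packed into the low bits in the order listed, and 8086 semantics for D and W. All procedure names are hypothetical, and a multi-bit sub-field such as a condition code would need an explicit width per sub-field.

```scheme
(import (rnrs))

;; --- Checking and encoding one variant (hypothetical helper names) ---
;; Assumed variant layout: four prefix slots, opcode spec
;; (size opcode (sub-fields)), ModR/M type, LOCK enum, ring enum.
(define (valid-variant? variant)
  (and (list? variant)
       (= (length variant) 8)
       (let ((spec (list-ref variant 4)))
         (and (list? spec) (= (length spec) 3)))))

;; Sub-field names of a spec; (NONE) means there are none.
(define (subfield-names spec)
  (let ((names (caddr spec)))
    (if (equal? names '(NONE)) '() names)))

;; Assumption: size counts fixed bits and each sub-field is one bit,
;; so fixed bits plus sub-field bits must total 8 for a one-byte opcode.
(define (spec-consistent? spec)
  (= (+ (car spec) (length (subfield-names spec))) 8))

;; OR sub-field bits into the low end of the opcode, first-listed field
;; highest: (D W) puts D in bit 1 and W in bit 0. field-values is an alist
;; such as ((D . 1) (W . 0)); missing sub-fields default to 0.
(define (encode-opcode spec field-values)
  (let loop ((names (subfield-names spec)) (bits 0))
    (if (null? names)
        (bitwise-ior (cadr spec) bits)
        (let* ((entry (assq (car names) field-values))
               (bit (if entry (cdr entry) 0)))
          (loop (cdr names)
                (bitwise-ior (bitwise-arithmetic-shift-left bits 1) bit))))))

;; --- Deriving operand forms from D and W (8086 semantics) ---
;; W=0 -> 8-bit operands, W=1 -> 16-bit operands.
;; D=1 -> ModR/M reg field is the destination: (reg r/m).
;; D=0 -> ModR/M reg field is the source:      (r/m reg).
(define (operand-forms d w)
  (let* ((size (if (= w 1) 16 8))
         (reg (list 'reg size))
         (rm (list 'r/m size)))
    (if (= d 1) (list reg rm) (list rm reg))))

;; Expand a spec whose sub-fields are (D W) into every
;; (opcode-byte operand operand) entry.
(define (expand-dw-variant spec)
  (apply append
         (map (lambda (d)
                (map (lambda (w)
                       (cons (encode-opcode spec (list (cons 'D d) (cons 'W w)))
                             (operand-forms d w)))
                     '(0 1)))
              '(0 1))))

(encode-opcode '(6 #x80 (S W)) '((S . 1) (W . 1)))  ; => 131, i.e. #x83
(expand-dw-variant '(6 #x00 (D W)))
;; => ((0 (r/m 8) (reg 8)) (1 (r/m 16) (reg 16))
;;     (2 (reg 8) (r/m 8)) (3 (reg 16) (r/m 16)))
```

Running `valid-variant?` and `spec-consistent?` over every entry is a cheap way to catch transcription slips, such as a misplaced parenthesis that folds the trailing fields into the opcode spec. If derivations like this cover the regular ModR/M families, only the irregular instructions would need explicit operand data.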
Thus, against my better judgment, I am asking the good folks at this forum for three things: first, a review of the code and data, and of the data formats, to see if there is something I have overlooked; second, advice on how best to represent the needed data; and third, assistance in entering the volumes of instruction data I am trying to cope with (most of my information is coming from http://ref.x86asm.net/, an excellent if somewhat opaque reference site to which I am greatly indebted). I have barely scratched the surface at this point, and the outrageous number of instructions and their variants is threatening to drive me crazier than I already am. Can anyone give me some good advice on this matter?