Cjreek wrote: Hi,
I'm looking for a programming language (except assembler) which is able to create 16 bit flat binary code.
Is there any language like this?
I asked Google, but unfortunately I didn't find anything satisfying.
Since you asked about a programming language (not its compiler): I have been designing a low-level language called RealC, which is meant for just that. It is optimized for programming x86 processors in 16-, 32- or 64-bit mode. In 16-bit code it can address memory with both 16-bit and 32-bit offsets. It was originally intended for Unreal Mode 16-bit code, but it can just as easily and directly produce standard Real Mode code.
It isn't specifically designed to tweak or use segment registers like DS, ES, FS, etc., but you can assign values to them in the same way you would do in assembly, like $DS=$AX;
It's a mixture of the easiest features of assembly language and C, and it can embed NASM-syntax assembly. It mostly uses registers as variables, plus db, dw, dd and dq variables, and writing in it quickly makes assembly much easier to debug, optimize and understand. For now you still have to translate the code to assembly by hand, but even so you will notice that it makes your low-level programming more effective.
It's a standardized C-like pseudocode with the potential to be compiled into .ASM files for you to use. It can "compile" single source files with undeclared symbols, so you can update individual pieces of code instead of a whole source project (but be careful with functions that define parameters, although they aren't really implemented yet).
The good thing about it is that it's also designed to never emit an instruction you didn't ask for (only raw strings of bytes exist, like in assembly). That means that for now you can only use basic, asm-style expressions, but you will see that most of the time this actually makes the code more readable than having the compiler solve a highly nested statement.
It's also very intuitive if you know basic assembly (general-purpose registers, etc.).
The following is a sort of abstract of its specification; these are the elements you can currently use (standardized ones, guaranteed to stay available for future code):
General-purpose Registers:
$AL, $AH, $AX, $EAX, $BL, $BH, $BX, $EBX, $CL, $CH, $CX, $ECX, $DL, $DH, $DX, $EDX, $ESI, $SI, $EDI, $DI, $EBP, $BP, $ESP, $SP
_________________________
Segment registers (values can be assigned only through general-purpose registers):
$DS, $ES, $FS, $GS, $SS
_________________________
Inline assembly:
asm{
.... code block ....
}
asm ;single asm statement instruction
;newline-terminated
_________________________
ASM-Like directives:
Variables:
Basic types:
db -- byte
dw -- word
dd -- doubleword
dq -- quadword
Usage of variables:
byte[labelname];
word[labelname];
dword[labelname];
qword[labelname];
byte[$EAX];
byte[$EBX];
byte[$ECX];
byte[$EDX];
byte[$ESI];
byte[$EDI];
byte[$ESP];
byte[$EBP];
word[$EAX];
word[$EBX];
word[$ECX];
word[$EDX];
word[$ESI];
word[$EDI];
word[$ESP];
word[$EBP];
dword[$EAX];
dword[$EBX];
dword[$ECX];
dword[$EDX];
dword[$ESI];
dword[$EDI];
dword[$ESP];
dword[$EBP];
...etc...
byte[$AX+somenumber];
byte[$EBX+somenumber];
byte[$CX+somenumber];
byte[$EDX+somenumber];
byte[$SI+somenumber];
byte[$EDI+somenumber];
byte[$SP+somenumber];
byte[$EBP+somenumber];
word[$EAX+somenumber];
word[$BX+somenumber];
word[$ECX+somenumber];
word[$DX+somenumber];
word[$ESI+somenumber];
word[$DI+somenumber];
word[$ESP+somenumber];
word[$BP+somenumber];
dword[$AX+somenumber];
dword[$EBX+somenumber];
dword[$CX+somenumber];
dword[$EDX+somenumber];
dword[$SI+somenumber];
dword[$EDI+somenumber];
dword[$SP+somenumber];
dword[$EBP+somenumber];
Assigning label values, addresses:
$EAX=label_or_pointer_to_variable;
Object code:
org ?
bits 16
bits 32
File inclusion:
incbin
Global definitions:
equ
_________________________
Preprocessor:
#include -- translated to %include
#define -- translated to equ
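To make the two preprocessor mappings above concrete, here is a rough C sketch of how a translator could rewrite a single line. This is only an illustration under assumptions: translate_directive is a hypothetical helper, not part of any existing RealC tool, and it assumes #define is always written as a name followed by a value.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite one RealC preprocessor line as NASM.
   Returns 1 and fills out on success, 0 if the line is not a
   directive this sketch understands. */
int translate_directive(const char *line, char *out, size_t outsz)
{
    char name[64];
    char value[128];

    /* #include "file"  ->  %include "file" */
    if (strncmp(line, "#include", 8) == 0) {
        snprintf(out, outsz, "%%include%s", line + 8);
        return 1;
    }

    /* #define NAME value  ->  NAME equ value */
    if (sscanf(line, "#define %63s %127[^\n]", name, value) == 2) {
        snprintf(out, outsz, "%s equ %s", name, value);
        return 1;
    }

    return 0;
}
```

With this sketch, the line #define VIDEO_SEG 0xB800 would come out as VIDEO_SEG equ 0xB800, and any line that is not one of the two directives is left for the rest of the translator.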
_________________________
Special operators
>>> -- rotate right
<<< -- rotate left
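For readers coming from C, which has no rotate operator, the following sketch shows what these two operators would compute on a 32-bit register, assuming they map directly onto the x86 ror and rol instructions. The names rotr32 and rotl32 are made up for this illustration.

```c
#include <stdint.h>

/* Rotate left: bits shifted out at the top re-enter at the bottom.
   The count is reduced modulo 32 and a zero count is handled
   separately so the code never shifts by the full width. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}
```

So something like $EAX = $EAX <<< 8 would, under that assumption, behave like rotl32(eax, 8) and turn 0x12345678 into 0x34567812.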
_________________________
Numbers
You can use binary numbers with a trailing b,
like 010100101010101b, very useful for bitmasks. (GCC only accepts the 0b-prefixed form, as an extension.)
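As an illustration of how such a b-suffixed literal could be read, here is a small C sketch. parse_bin_literal is a hypothetical name, and the limit of 32 significant bits is an assumption of this sketch, not something the RealC specification states.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: parse a literal such as "0101b".
   Returns 1 and stores the value in *out on success, 0 when the
   text is malformed or does not fit in 32 bits. */
int parse_bin_literal(const char *s, uint32_t *out)
{
    uint32_t value = 0;
    size_t digits = 0;

    while (*s == '0' || *s == '1') {
        if (value & 0x80000000u)      /* next shift would lose a bit */
            return 0;
        value = (value << 1) | (uint32_t)(*s - '0');
        digits++;
        s++;
    }

    /* Need at least one digit, then exactly one 'b' or 'B'. */
    if (digits == 0 || (*s != 'b' && *s != 'B') || s[1] != '\0')
        return 0;

    *out = value;
    return 1;
}
```

For the literal shown above, 010100101010101b, this yields 0x2955.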
_________________________
C instructions
goto -- very important because it translates into
a standard jmp instruction pointing to a label or a
number for offset.
while, do-while, if, else if, switch, break, continue, among other common ones, as well as standard bitwise, logical and arithmetic operators.
_________________________
An example to convert an ASCII string to a binary value (see
here for some more examples of syntax in .CSM files).
Parameters (set up by the caller):
$BL takes the numeric base (2 for binary, 10 for decimal, 8 for octal, 16 for hexadecimal, etc.)
$ESI is the location of the zero-terminated string
Return values:
$EAX returns the numeric value as a plain binary integer, ready to be used normally.
function /*$EAX */str2num(/*$BL numbase, $ESI strbuff*/)
{
  asm push ebx
  asm push ecx
  asm push edx
  asm push esi
  $EAX = 0;
  $ECX = 0;
  while( byte[$ESI] != 0x00 )
  {
    $BH = byte[$ESI]; ;//get the character
    $ESI++; ;//advance string pointer
    ;//Convert ASCII to binary value:
    ;;
    if($BH >= '0' && $BH <= '9')
      $BH -= 0x30;
    else if($BH >= 'a' && $BH <= 'z')
      $BH -= (0x61-10);
    else if($BH >= 'A' && $BH <= 'Z')
      $BH -= (0x41-10);
    $CL = $BL; ;//take base in ECX
    ; $EAX *= $ECX; ;//get result in EDX:EAX
    asm mul ecx
    $EDX = $BH; ;//put the binary value in EDX
    $EAX += $EDX; ;//add it to the value
  }
  asm pop esi
  asm pop edx
  asm pop ecx
  asm pop ebx
}
Now see how it gets translated. It looks confusing, but translating it from RealC is much easier than it looks, and when a basic compiler is finished it should be even easier:
;$EAX str2num($BL numbase, $ESI strbuff)
;;
str2num:
  push ebx
  push ecx
  push edx
  push esi;
  xor eax,eax;
  xor ecx,ecx;
  .while0:
    cmp byte[esi],0
    jz .while0_end;
    mov bh,[esi]; ;//get the character
    inc esi; ;//advance string pointer
    ;//Convert ASCII to binary value:
    ;;
    cmp bh,'0'
    jb .not_if_ln47;
    cmp bh,'9'
    ja .not_if_ln47;
    sub bh,0x30;
    jmp .if_ln47__end;
    .not_if_ln47:
    cmp bh,'a'
    jb .not_if_ln50;
    cmp bh,'z'
    ja .not_if_ln50;
    sub bh,(0x61-10);
    jmp .if_ln47__end;
    .not_if_ln50:
    cmp bh,'A'
    jb .not_if_ln53;
    cmp bh,'Z'
    ja .not_if_ln53;
    sub bh,(0x41-10);
    jmp .if_ln47__end;
    .not_if_ln53:
    .if_ln47__end:
    mov cl,bl; ;//take base in ECX
    mul ecx; ;//get result in EDX:EAX
    ;EAX*ECX == EDX:EAX
    movzx edx,bh; ;//put the binary value in EDX
    add eax,edx; ;//add it to the value
    jmp .while0;
  .while0_end:
  pop esi
  pop edx
  pop ecx
  pop ebx;
  ret
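For comparison, and as a way to check the results on a normal host compiler, the same algorithm can be sketched in plain C. This is only a reference model of the routine above, not RealC output. Like the original, it does not reject characters that are not valid digits for the given base, and the result wraps at 32 bits just as the low half of mul ecx does.

```c
#include <stdint.h>

/* Reference model of str2num: numbase plays the role of $BL,
   strbuff the role of $ESI, and the return value that of $EAX. */
uint32_t str2num(uint8_t numbase, const char *strbuff)
{
    uint32_t value = 0;

    while (*strbuff != '\0') {
        uint8_t digit = (uint8_t)*strbuff++;   /* get the character */

        /* Convert ASCII to a digit value, as in the RealC version. */
        if (digit >= '0' && digit <= '9')
            digit -= '0';
        else if (digit >= 'a' && digit <= 'z')
            digit -= 'a' - 10;
        else if (digit >= 'A' && digit <= 'Z')
            digit -= 'A' - 10;

        /* value = value * base + digit, truncated to 32 bits. */
        value = value * numbase + digit;
    }
    return value;
}
```

For example, str2num(16, "ff") gives 255 and str2num(2, "1010") gives 10, which is what the assembly routine should leave in EAX for the same inputs.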
Note that the semicolons after assembly instructions in the ASM source, while not necessary, make it easier to see where a single instruction ends or, more importantly, where a contiguous group of related instructions for one operation ends.
You can still use all of the instructions and directives that NASM/YASM understand inside an asm{} block or asm single-line statement.
As you can see, the source code is translated into assembly files, so you can assemble it and later link it with object code produced by any other toolchain that can link this way, such as GCC.
This gives you truly full control over your low-level source code and your own optimizations, and lets you know exactly what you are doing. That's very valuable, especially when you are starting to sort out the right algorithm implementations for your code. RealC code is almost a mirror of the instructions that will actually be generated; only loops, ifs, switches and similar constructs generate the extra code needed to make them work.
You can even place code outside functions and you don't need a main(), so, as in assembly, you have to be careful where you put code so that execution doesn't run into it unintentionally.