Jezze wrote:
I totally agree with you there, SolDMG. C is so close to the perfect language for me that there isn't much I would change in the language itself, besides adding more syntactic restrictions and something better for defining data structures than plain structs and/or bitfields combined with enums or defines for register definitions. Where I think the problem lies today is in the tools: they are just too big and too bloated with options.
Exactly my point, plus there is an actual user group for it. There are thousands (maybe millions) of C programs that people actually use. Every production-grade operating system kernel (Linux, BSD, Windows, OS X, Minix, GNU/Mach?) is basically written in C. No one is waiting for another C-like language, or another language at all; we already have hundreds of languages. What we don't have are competent tools for the right job. Sure, the GNU Compiler Collection is moldable to an extent and the LLVM project does an amazing job, but it's getting kind of bloated as well. That's also why I support open standards that anyone can use: tools can then be continuously written and replaced every now and then, so we keep the code fresh, small and thus usable.
Assembler syntax
Re: Assembler syntax
My post is up there, not down here.
Re: Assembler syntax
Bencz wrote:
Hi!
Why don't you generate an ".obj" file using the OMF object format?
http://en.wikipedia.org/wiki/Relocatabl ... ule_Format
Why would I support a pretty much dead format?
Bencz wrote:
In the code generator of your C compiler, you can make a struct holding both the machine code and the assembly text. In that struct you can put both syntaxes, AT&T and Intel, and let the user choose whether to generate machine code or assembly text.
Code:
enum { push_eax = 0, /* ... */ };
struct instruction { const char *machine_code; const char *intel; const char *att; };
struct instruction instru[] = { { "50", "push eax", "pushl %eax" }, /* ... */ };
That is what I am going to do, after reading this. This is actually a pretty good idea.
Re: Assembler syntax
That's what I did in my C compiler...
But my compiler generates a Win32 EXE.
Re: Assembler syntax
Bencz wrote:
That's what I did in my C compiler...
But my compiler generates a Win32 EXE.
Oh... Well, my toolchain will probably support a lot of formats. ROMF just isn't used anymore; PE and ELF are "the name of the game", so to speak. You probably already knew that, though.
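Since the thread keeps returning to which object formats a toolchain should accept, here is a minimal sketch of how such a tool might tell ELF, MZ/PE and OMF input apart by their first bytes. The names `detect_format` and `obj_format` are hypothetical, and the checks are deliberately shallow: a PE image begins with a DOS MZ stub (the real PE header is located through an offset stored in that stub), and OMF object files usually start with a THEADR (80h) or LHEADR (82h) record, so a single byte is only a hint.

```c
#include <stddef.h>
#include <string.h>

typedef enum { FMT_UNKNOWN, FMT_ELF, FMT_MZ_PE, FMT_OMF } obj_format;

/* Guess the container format from the leading bytes of a file.
   A real tool would validate the full header before trusting this. */
obj_format detect_format(const unsigned char *buf, size_t len) {
    /* ELF magic: 0x7F 'E' 'L' 'F' (split literal so \x7f does not swallow the E) */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;
    /* DOS MZ header; PE executables start with one */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_MZ_PE;
    /* OMF THEADR (0x80) or LHEADR (0x82) record type */
    if (len >= 1 && (buf[0] == 0x80 || buf[0] == 0x82))
        return FMT_OMF;
    return FMT_UNKNOWN;
}
```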
Re: Assembler syntax
Bencz wrote:
Why don't you generate an ".obj" file using the OMF object format?
In practice, OMF comes with a number of incompatible and not universally supported extensions, plus misinterpretations by various implementors, which is why very few OMF tools are actually compatible at the object-file level. Besides that, it's quite complex on its own.
I've decided to use ELF and have been happy with the decision. There's even a tiny 16-bit extension (supported by GNU as and NASM) that allows one to compile 16-bit code into ELF (just 2 more relocation types to reflect 16-bit relocations), and I'm taking advantage of that in my compiler. The compiler supports 16-bit and 32-bit modes, generates assembly code for NASM and then links the resulting ELF files. One format for 16 and 32 bits, one assembler for DOS, Windows and Linux. It's a simple and sane format, at least for x86 and static linking. About the only thing I dislike about it is the size of the symbol table: every symbol takes 16 bytes (in 32-bit ELF) plus whatever is needed for its name. That's a bit too much, IMHO.
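For reference, the 16 bytes per symbol come from the fixed-size ELF32 symbol table entry; each name additionally lives, NUL-terminated, in the string table. A minimal sketch that mirrors the `Elf32_Sym` layout (glibc's `<elf.h>` declares the real one) and estimates the per-file cost. `elf32_sym` and `elf32_symbol_cost` are hypothetical names, and the estimate ignores the mandatory null symbol at index 0, the leading NUL of the string table, and any string sharing an assembler might do. Note that 64-bit ELF entries are 24 bytes.

```c
#include <stdint.h>

/* Same field order and widths as Elf32_Sym in the System V ABI. */
typedef struct {
    uint32_t st_name;  /* offset of the name in the string table */
    uint32_t st_value; /* symbol value, e.g. offset within its section */
    uint32_t st_size;  /* size of the object, 0 if unknown */
    uint8_t  st_info;  /* binding (high 4 bits) and type (low 4 bits) */
    uint8_t  st_other; /* visibility */
    uint16_t st_shndx; /* index of the defining section */
} elf32_sym;

/* Approximate bytes spent on n symbols whose names total name_bytes
   characters: one 16-byte entry each, plus each name and its NUL. */
static uint32_t elf32_symbol_cost(uint32_t n, uint32_t name_bytes) {
    return n * (uint32_t)sizeof(elf32_sym) + name_bytes + n;
}
```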
Re: Assembler syntax
Hi,
Bencz wrote:
In the code generator of your C compiler, you can make a struct holding both the machine code and the assembly text. In that struct you can put both syntaxes, AT&T and Intel, and let the user choose whether to generate machine code or assembly text.
Code:
enum { push_eax = 0, /* ... */ };
struct instruction { const char *machine_code; const char *intel; const char *att; };
struct instruction instru[] = { { "50", "push eax", "pushl %eax" }, /* ... */ };
Have you got any idea how large that table is going to be?
For just one addressing mode (e.g. with SIB, like "mov rax,[rbx+rcx*4+offset]") there's an 8-bit REX prefix, an 8-bit ModRM then an 8-bit SIB; which means there's 2**(8+8+8) = 16777216 unique encodings for that addressing mode. For each instruction there's probably 3 opcodes/addressing modes on average; and there's probably over 200 instructions. As a rough estimate, your lookup table is probably going to need about 1 billion entries. If you assume 32 bytes per entry you'd be looking at a total size of about 32 GiB.
You will need to generate at least part of the instruction using code - e.g. maybe one table containing the instruction's mnemonic (without operands) and a list of "opcode and addressing mode" pairs; then use the addressing mode (from the first table) to figure out how to generate the operands (possibly using several smaller lookup tables).
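As a rough illustration of generating operands with code rather than looking them up, here is a minimal sketch for one case: a load of the form `mov r32, [base + index*scale + disp32]` (opcode 8B /r). It assumes 32-bit protected mode, so there is no REX prefix (unlike the 64-bit rax example above), and it always emits a 32-bit displacement; the function and enum names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers as encoded in the ModRM reg/rm and SIB fields. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Encode "mov reg, [base + index*scale + disp]" in 32-bit mode.
   out must have room for 7 bytes. Returns the length, or 0 if the
   operands cannot be encoded in this form. */
size_t encode_mov_load_sib(uint8_t *out, int reg, int base, int index,
                           int scale, int32_t disp) {
    int ss;
    switch (scale) {           /* SIB scale field: log2(scale) */
    case 1: ss = 0; break;
    case 2: ss = 1; break;
    case 4: ss = 2; break;
    case 8: ss = 3; break;
    default: return 0;
    }
    if (reg < 0 || reg > 7 || base < 0 || base > 7 || index < 0 || index > 7)
        return 0;
    if (index == ESP)          /* index=100b means "no index" in SIB */
        return 0;
    uint32_t d = (uint32_t)disp;
    out[0] = 0x8B;                                   /* MOV r32, r/m32 */
    out[1] = (uint8_t)(0x80 | (reg << 3) | 4);       /* mod=10 (disp32), rm=100 (SIB) */
    out[2] = (uint8_t)((ss << 6) | (index << 3) | base);
    out[3] = (uint8_t)(d & 0xFF);                    /* disp32, little-endian */
    out[4] = (uint8_t)((d >> 8) & 0xFF);
    out[5] = (uint8_t)((d >> 16) & 0xFF);
    out[6] = (uint8_t)(d >> 24);
    return 7;
}
```

For example, `encode_mov_load_sib(buf, EAX, EBX, ECX, 4, 0x10)` yields 8B 84 8B 10 00 00 00, i.e. `mov eax, [ebx+ecx*4+0x10]`. A full encoder would also choose the shortest displacement (none, disp8 or disp32) and handle the ESP/EBP special cases, which is where most of the table-free logic ends up.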
Of course you could also do the smart thing; and generate machine code instead of assembly so that you don't need to bother generating text, then parsing the text and assembling. If anyone wants plain text, there are plenty of decent disassemblers floating around for both Intel syntax and AT&T syntax.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Assembler syntax
Brendan wrote:
Bencz wrote:
In the code generator of your C compiler, you can make a struct holding both the machine code and the assembly text. In that struct you can put both syntaxes, AT&T and Intel, and let the user choose whether to generate machine code or assembly text.
Code:
enum { push_eax = 0, /* ... */ };
struct instruction { const char *machine_code; const char *intel; const char *att; };
struct instruction instru[] = { { "50", "push eax", "pushl %eax" }, /* ... */ };
Have you got any idea how large that table is going to be?
A small and simple x86 assembler should take several KLOCs of C code, roughly 3-5 KLOCs.
Re: Assembler syntax
Here's my instructions table... it works very well for me.
Code:
instructions op[] = {
{ "8D85%08X", "lea eax,[ebp+]" },
{ "50", "push eax" },
{ "51", "push ecx" },
{ "55", "push ebp" },
{ "58", "pop eax" },
{ "59", "pop ecx" },
{ "03C1", "add eax,ecx" },
{ "05%08X", "add eax" },
{ "0101", "add [ecx],eax" },
{ "660101", "add [ecx],ax" },
{ "0001", "add [ecx],al" },
{ "83C4%02X", "add1 esp" }, // byte operand
{ "81C4%08X", "add4 esp" }, // int operand
{ "8300%02X", "add1 dwordptr[eax]" },
{ "8100%08X", "add4 dwordptr[eax]" },
{ "8301%02X", "add1 dwordptr[ecx]" },
{ "8101%08X", "add4 dwordptr[ecx]" },
{ "8000%02X", "add byteptr[eax]" },
{ "8001%02X", "add byteptr[ecx]" },
{ "48", "dec eax" },
{ "2BC1", "sub eax,ecx" },
{ "2901", "sub [ecx],eax" },
{ "662901", "sub [ecx],ax" },
{ "2801", "sub [ecx],al" },
{ "83EC%02X", "sub1 esp" },
{ "81EC%08X", "sub4 esp" },
{ "8328%02X", "sub1 dwordptr[eax]" },
{ "8128%08X", "sub4 dwordptr[eax]" },
{ "8329%02X", "sub1 dwordptr[ecx]" },
{ "8129%08X", "sub4 dwordptr[ecx]" },
{ "8028%02X", "sub byteptr[eax]" },
{ "8029%02X", "sub byteptr[ecx]" },
{ "0FAFC1", "imul eax,ecx" },
{ "69C0%08X", "imul eax,eax" },
{ "99", "cdq" }, // Convert Doubleword to Quadword (sign-extends eax into edx:eax).
{ "F7F9", "idiv ecx" },
{ "3BC8", "cmp ecx,eax" },
{ "83F8%02X", "cmp1 eax" },
{ "81F8%08X", "cmp4 eax" },
{ "80FC%02X", "cmp ah" },
{ "F6C4%02X", "test ah" },
{ "23C1", "and eax,ecx" },
{ "80E4%02X", "and ah" },
{ "09C0", "or eax,eax" },
{ "0BC1", "or eax,ecx" },
{ "0901", "or [ecx],eax" },
{ "660901", "or [ecx],ax" },
{ "0801", "or [ecx],al" },
{ "31C0", "xor eax,eax" },
{ "33C1", "xor eax,ecx" },
{ "80F4%02X", "xor ah" },
{ "D3E0", "shl eax,cl" },
{ "D3E8", "shr eax,cl" },
{ "F7D8", "neg eax" },
{ "89D0", "mov eax,edx" },
{ "8BC8", "mov ecx,eax" },
{ "B8%08X", "mov eax" },
{ "B8V%06X_", "mov eax_v" },
{ "B8X%06X_", "mov eax_x" },
{ "B8fn_%04X_", "mov eax_fn" },
{ "B8FN_%04X_", "mov eax_FN" },
{ "B9%08X", "mov ecx" },
{ "B9V%06X_", "mov ecx_v" },
{ "BA%08X", "mov edx" },
{ "C700%08X", "mov dwordptr[eax]" },
{ "C700V%06X_", "mov dwordptr[eax]_v"},
{ "66C700%04X", "mov wordptr[eax]" },
{ "C600%02X", "mov byteptr[eax]" },
{ "8B00", "mov eax,[eax]" },
{ "8B01", "mov eax,[ecx]" },
{ "89E5", "mov ebp,esp" },
{ "8901", "mov [ecx],eax" },
{ "668901", "mov [ecx],ax" },
{ "8801", "mov [ecx],al" },
{ "0FBF00", "movsx eax,wordptr[eax]"},
{ "0FBE00", "movsx eax,byteptr[eax]"},
{ "91", "xchg eax,ecx" },
{ "74%02X", "jz " },
{ "75%02X", "jnz " },
{ "E9ln_%04X_", "jmp " },
{ "0F85ln_%04X_", "jne " },
{ "0F82%08X", "jb " },
{ "7C%02X", "jl " },
{ "7D%02X", "jge " },
{ "7E%02X", "jle " },
{ "7F%02X", "jg " },
{ "E8fn_%04X_", "call " },
{ "FF10", "call dwordptr[eax]" },
{ "FF15X%06X_", "call dwordptr[]" },
{ "C9", "leave" },
{ "C3", "ret" },
{ "0F94C0", "sete al" },
{ "0F95C0", "setne al" },
{ "D9E0", "fchs" },
{ "D9C9", "fxch st(1)" },
{ "DD00", "fld qwordptr[eax]" },
{ "DD01", "fld qwordptr[ecx]" },
{ "DD5C2400", "fst qwordptr[esp]" }, // note: DD /3 encodes fstp (pops ST0); a non-popping fst would be DD542400
{ "DFE0", "fstsw" },
{ "DD18", "fstp qwordptr[eax]" },
{ "DD19", "fstp qwordptr[ecx]" },
{ "DEC1", "faddp st(1),st" }, // +=
{ "DEE9", "fsubrp st(1),st" }, // -=
{ "DEC9", "fmulp st(1),st" },
{ "DEF9", "fdivrp st(1),st" },
{ "DAE9", "fucompp" },
{ "DB1C24", "fistp dwordptr[esp]" },
{ "DC25V%06X_", "fsub qwordptr[]_v" },
};
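Each entry above pairs a hex template with a mnemonic. Whatever consumes the table has to turn the fixed hex digits into bytes and append the operand, and x86 stores multi-byte immediates least significant byte first, whereas a "%08X" conversion prints the most significant digit first. A minimal sketch of that step, not the author's actual code: `encode_entry` and `hex_digit` are hypothetical names, the caller is assumed to split the template at the '%' and know the operand width, and the placeholder forms such as `V%06X_` or `fn_%04X_` (whose meaning isn't explained in the post) are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical encoder for one table entry.
   opcode_hex: the fixed part of the template, e.g. "83C4" from "83C4%02X".
   imm, imm_size: the operand and its width in bytes (0 to 4).
   Returns the number of bytes written to out, or 0 if the input is malformed. */
size_t encode_entry(uint8_t *out, const char *opcode_hex, uint32_t imm,
                    int imm_size) {
    size_t n = 0;
    if (imm_size < 0 || imm_size > 4)
        return 0;
    while (opcode_hex[0] != '\0') {
        int hi = hex_digit(opcode_hex[0]);
        int lo = hex_digit(opcode_hex[1]); /* '\0' here means an odd digit count */
        if (hi < 0 || lo < 0)
            return 0;
        out[n++] = (uint8_t)((hi << 4) | lo);
        opcode_hex += 2;
    }
    /* x86 immediates are little-endian: least significant byte first. */
    for (int i = 0; i < imm_size; i++)
        out[n++] = (uint8_t)(imm >> (8 * i));
    return n;
}
```

With this, the "add eax" entry and an operand of 0x10 become 05 10 00 00 00, not 05 00 00 00 10.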
Re: Assembler syntax
Hi,
Bencz wrote:
Here's my instructions table... it works very well for me.
It's not even slightly close to "works very well".
My guess is that your compiler does no optimisation at all and just uses the CPU as a stack machine (constantly pushing and popping while half of the CPU's registers aren't used); and the generated code probably runs about 100 times slower than it should.
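To make the "stack machine" pattern concrete, below is a minimal sketch of naive expression code generation in which every subexpression leaves its result in eax and the left operand of a sum is parked on the stack, using the push eax / pop ecx / add eax,ecx forms from the table above. The `node` type and `gen` function are hypothetical; this illustrates the pattern Brendan guesses at, not Bencz's actual compiler.

```c
#include <stdio.h>
#include <string.h>

/* A toy expression tree: an integer constant or a sum of two subtrees. */
typedef struct node {
    int is_add;
    int value;               /* used when is_add == 0 */
    const struct node *lhs;  /* used when is_add == 1 */
    const struct node *rhs;
} node;

/* Append stack-machine style assembly for n to out, which must already
   hold a NUL-terminated string (initially empty) in a buffer of cap bytes. */
static void gen(const node *n, char *out, size_t cap) {
    size_t len = strlen(out);
    if (!n->is_add) {
        snprintf(out + len, cap - len, "mov eax, %d\n", n->value);
        return;
    }
    gen(n->lhs, out, cap);                   /* left result in eax */
    len = strlen(out);
    snprintf(out + len, cap - len, "push eax\n");
    gen(n->rhs, out, cap);                   /* right result in eax */
    len = strlen(out);
    snprintf(out + len, cap - len, "pop ecx\nadd eax, ecx\n");
}
```

For 1 + 2 this emits five instructions, two of them stack traffic, where `mov eax, 1` followed by `add eax, 2` would do; even simple constant folding or register allocation removes most of that overhead.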
Cheers,
Brendan
Re: Assembler syntax
I'm not worried about it; it's just for study.
Re: Assembler syntax
When I developed this compiler, my main goal was to study the EXE format, so I wasn't at all worried about optimizing the code.
Re: Assembler syntax
For people still wondering which syntax I'm leaning towards more and more: it's Intel syntax with AT&T-like directives, and an exclamation mark starts a comment. The latter can be changed really easily, though, if people don't like it. A sample boot sector would look like this, where "bpb.inc" contains the BIOS parameter block that gets inserted at the bpb label:
Code:
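! A boot sector.
.bits16
.org 0x7c00
! Include the BPB.
.include "bpb.inc"
start:
jmp short boot
nop
bpb:
.insert _bpb
! Print the NUL-terminated string at DS:SI with BIOS teletype output.
print:
mov ah, 0Eh
repeat:
lodsb
or al, 0
jz done
int 0x10
jmp repeat
done:
ret
boot:
! The BIOS doesn't guarantee DS or the direction flag, so set them
! before lodsb reads the message (segment 0 matches .org 0x7c00).
xor ax, ax
mov ds, ax
cld
! Print a message.
mov si, msg
call print
! Halt the system (loop in case an NMI wakes the CPU).
cli
halt:
hlt
jmp halt
msg:
.ascii "Hallo wereld!"
.hex 0x0A
.hex 0x0D
.dec 0
! Pad the binary to 510 bytes with 0 so the boot signature lands at offset 510.
.size 510 0
! Boot signature.
.hex 0x55
.hex 0xAA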
I do realize I'm REALLY (almost discriminatingly to other probably better written assemblers) re-inventing the wheel here.
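One practical consequence of picking '!' as the comment character: the boot sector itself contains `.ascii "Hallo wereld!"`, so the lexer must only treat '!' as a comment outside string literals. A minimal sketch of a line classifier for the proposed syntax, assuming one label, directive or instruction per line and string literals without escaped quotes; `code_length` and `classify` are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { LINE_BLANK, LINE_DIRECTIVE, LINE_LABEL, LINE_INSTRUCTION } line_kind;

/* Length of the part of line before the first '!' that is not inside
   a "..." string literal, so .ascii "Hallo wereld!" keeps its '!'. */
size_t code_length(const char *line) {
    int in_string = 0;
    size_t i;
    for (i = 0; line[i] != '\0'; i++) {
        if (line[i] == '"')
            in_string = !in_string;
        else if (line[i] == '!' && !in_string)
            break;
    }
    return i;
}

/* Classify one source line after stripping the comment and whitespace. */
line_kind classify(const char *line) {
    size_t start = 0, end = code_length(line);
    while (start < end && isspace((unsigned char)line[start]))
        start++;
    while (end > start && isspace((unsigned char)line[end - 1]))
        end--;
    if (start == end)
        return LINE_BLANK;         /* empty or comment-only line */
    if (line[start] == '.')
        return LINE_DIRECTIVE;     /* .org, .include, .ascii, ... */
    if (line[end - 1] == ':')
        return LINE_LABEL;         /* start:, print:, ... */
    return LINE_INSTRUCTION;
}
```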
Re: Assembler syntax
If you are not using the characters ; or # or // for another purpose, why are you creating a new comment character?
"God! Not Unix" - Richard Stallman
Website: venom Dev
OS project: venom OS
Hexadecimal Editor: hexed
Re: Assembler syntax
I didn't read every post in this thread, so I'm not sure what you've decided on, but I'd highly suggest not creating your own assembly language. It just defeats the purpose. Adding in things like macros, structures, enumerations, local labels, etc. is a good idea though.
Brendan wrote:
I remember a presentation (which was actually C++ syntax, and may have been about a code sanitiser that Google built out of parts of the LLVM project) where they investigated where programmers' time is spent and found that most programmers spend about 20% of their time just diddling with white-space.
I know this is an old post, but I figured I'd reply that this often depends on the person. K&R users are probably far more likely to spend time messing around with whitespace than Allman-style users, because Allman style is much easier to comment. People who use spaces probably spend more time on whitespace than people who use tabs, as well. I tend to use GNU style these days, with pre-ANSI C function declarations.
It took some time to get used to at first, but it's very easy to comment (especially function arguments) and the indentation better reflects how the code is actually parsed.
On a side note, one idea I came up with that I think would be a good extension to the C language is a behavior declaration. It'd allow better namespacing and class-like functionality.
Re: Assembler syntax
b.zaar wrote:
If you are not using the characters ; or # or // for another purpose, why are you creating a new comment character?
It looks nice, I guess. And what would you want it to be? I'm just trying to get as much feedback as possible.