C/C++ Compiler

This forum is for OS project announcements including project openings, new releases, update notices, test requests, and job openings (both paying and volunteer).
~
Member
Posts: 1226
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

C/C++ Compiler

Post by ~ »

COMPILER-2018-12-27.zip

http://sourceforge.net/p/c-compiler/

I've been developing a C compiler throughout 2018.
The idea is a compiler capable of handling any C/C++ code for any compiler transparently, without the real need of including header files but exposing transparently the whole environment just like in JavaScript, and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.

Now that there are only 4 days left in the year, I am announcing it.

I just managed to implement a set of basic text processing functions to recognize C keywords and some syntax.

The most important thing was developing a file called C.ASM, which helps generate code that implements the C conventions for calling functions, declaring function bodies, and declaring local or global variables.

The most complex thing I managed was making cout << "Hello"; work in pure assembly using MSVCIRT.DLL (C:\COMPILER\C\X86\EXAMPLES\HAND_ASM\CPP\OOP_CPP\01_00.cpp).


I would like to keep developing it, but next year I will work on implementing full 32-bit paging functionality for a formal memory allocator based on quickly finding free/used/reserved pages. I will work on my compiler more slowly, as I need to use it, until I find time, some other year, to study how to process complex expressions and text in general for source code and structured binaries.
Last edited by ~ on Fri Dec 28, 2018 12:30 am, edited 1 time in total.
YouTube:
http://youtube.com/@AltComp126

My x86 emulator/kernel project and software tools/documentation:
http://master.dl.sourceforge.net/projec ... 7z?viasf=1
alexfru
Member
Posts: 1111
Joined: Tue Mar 04, 2014 5:27 am

Re: C/C++ Compiler

Post by alexfru »

~ wrote:The idea is a compiler capable of ... and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.
FYI, gcc has an option to put functions into individual sections. So, if your .c (and therefore .o) file has 10 functions and only one is being pulled at link time, you will get just that one function, not all 10.
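As a rough illustration of the option mentioned above, here is a minimal sketch assuming GCC with GNU ld. The file name util.c, the function names, and the separate main.c are hypothetical. Note that -ffunction-sections only places each function in its own section; it is the linker's --gc-sections option that then discards the sections nothing references.

```c
/* util.c (hypothetical file): two functions in one translation unit.
 *
 * Build sketch, assuming GCC and GNU ld:
 *   gcc -c -ffunction-sections util.c main.c
 *   gcc -Wl,--gc-sections -o demo util.o main.o
 *
 * With -ffunction-sections each function gets its own section
 * (.text.used_function, .text.unused_function). If main.c only calls
 * used_function(), --gc-sections lets the linker drop the other one.
 */

/* Called from the hypothetical main.c, so its section is kept. */
int used_function(int x)
{
    return x * 2;
}

/* Never referenced, so its section is eligible for removal at link time. */
int unused_function(int x)
{
    return x + 1000;
}
```

Without these two options, a linker still pulls only the needed object files out of a static library, but each object file comes in whole.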
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: C/C++ Compiler

Post by Schol-R-LEA »

Having seen the last version of it you posted, I am approaching this with trepidation. I fear that bleeding eyes may be in my near future.

EDIT #1: I am currently (at 2150 EST on 2018-12-27) on my fourth attempt to download a 20MB file from the unstable piece of effluvia that is Archefire. Tilde, you do know you can attach files to posts, right? Also, 20MB for a hand-coded compiler? Please tell me you didn't include the executables and object files, because seriously, that would be just rude.

EDIT #2: Yep, it's all there. sigh Why do I even bother...
Last edited by Schol-R-LEA on Thu Dec 27, 2018 8:59 pm, edited 3 times in total.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: C/C++ Compiler

Post by Solar »

~ wrote:The idea is a compiler capable of handling any C/C++ code for any compiler transparently, without the real need of including header files but exposing transparently the whole environment just like in JavaScript...
Note that this concept breaks the language specification.

For example, unless I explicitly...

Code: Select all

#include <stdio.h>
...there may not be a declaration of a function named printf(). This is a requirement of the C language.

And as include statements are part of the language specification, not implementing include functionality is also breaking the language (and most, if not all, existing code).

Also, I assume that "the whole environment" refers to the respective standard libraries for C and C++. Aside from wondering how you intend to provide these in this "header-less environment" of yours, note that the very purpose of C / C++, or any real programming language actually, is to interface with third-party libraries that the compiler vendor may never have heard of.

The way to interface with these third-party libraries, in C/C++, is through header files that provide the declarations necessary for the compiler to actually do its job.

I.e., you're implementing a new language that is neither C nor C++, and will not work with existing code, only with code explicitly written for your "language that isn't really C or C++".
~ wrote:...and only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers.
If you think this is how existing compilers / linkers work, you are sorely mistaken. Which makes me question your qualification to make a project like this happen in the first place.

I've written a partial implementation of the C standard library. I've worked as a professional C++ coder for the last, oh, about 16 years now. I'd say what you're presenting here is...
  • ...based on a faulty understanding of how existing C/C++ toolchains work,
  • ...not solving any problems anybody (except you?) is actually having with C/C++,
  • ...far outside the scope you, or even a team of a handful of "you's", can pull off in any realistic time scale.
You want to toy with a custom compiler for a custom language, which might be quite similar to C/C++ even, go right ahead.

But please don't call it "a C/C++ compiler", as it isn't. These two languages explicitly forbid what you are describing, and there is absolutely no need for doing it in the first place.

I'd be happy to explain the various details I hinted at.
~ wrote:...to study how to process complex expressions and text in general for source code and structured binaries.
If you have to study that yet, stay away from C++. Seriously. That language isn't just "C plus a bit", it's the litmus test for compiler builders, as C++ is among the ugliest beasts imaginable as far as parsing the language is concerned. C isn't quite that simple to begin with, but it's a walk in the park compared to C++.

You're setting yourself up for a train wreck. Try lower hanging fruit...
Every good solution is obvious once you've found it.
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: C/C++ Compiler

Post by Schol-R-LEA »

Solar wrote:You're setting yourself up for a train wreck. Try lower hanging fruit...
Clearly you haven't read the code yet. The train is well and truly wrecked already.

Seriously, the few parts of it that are intelligible at all show a massive, deliberate ignorance of every rule of writing clear and concise code which I know of, and not a shred of knowledge about compiler design can be found in any part of it.

I did note that ~ took absolutely no heed of the previous advice, as almost everything I critiqued about the early cuts of the program is not only still there, but greatly expanded upon. Reading this makes me want to cry in frustration and horror.

~, I am going to be blunt: STOP. You don't know what you are doing.

Go read a book on compiler design - any book on compiler design, because, honestly, even a bad one would be better than what you think you are doing now. You are not merely trying to reinvent the wheel; you are trying to reinvent the high-tech wheels from a Mars rover using a screwdriver and bits of bubble gum, and it isn't working.

As I have said before, there is no other subject in computer science that has been as thoroughly studied as compilers and interpreters. You are hurting yourself by not learning more about it before trying to write one.
Last edited by Schol-R-LEA on Thu Dec 27, 2018 9:29 pm, edited 4 times in total.
iansjack
Member
Posts: 4685
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: C/C++ Compiler

Post by iansjack »

only link the functions that are actually used, not entire code libraries unnecessarily like with existing compilers
With that lack of understanding you are going to have a tough time writing anything approaching a compiler.

It's not true for static linking, let alone the situation when dynamic libraries are used.
nullplan
Member
Posts: 1766
Joined: Wed Aug 30, 2017 8:24 am

Re: C/C++ Compiler

Post by nullplan »

iansjack wrote:It's not true for static linking, let alone the situation when dynamic libraries are used.
Well, if dynamic libraries are used, the entire library is linked in, but hey, at least the text sections are shared across processes. If the kernel supports that. Not the data sections, though. And of course you have to pay for the position-independent code and the relocations at every start (or, with lazy binding, when you call a function). Oh, and the text section sharing doesn't help if a process is the only user of a library (version).

I am not a fan of dynamic linking.
Carpe diem!
iansjack
Member
Posts: 4685
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: C/C++ Compiler

Post by iansjack »

I look at the number of running processes on my Linux installation. I look at how many of them use standard C library functions.

It's a no-brainer.

(And, to be strictly accurate, dynamic libraries are not linked in to the executable.)
MichaelFarthing
Member
Posts: 167
Joined: Thu Mar 10, 2016 7:35 am
Location: Lancaster, England, Disunited Kingdom

Re: C/C++ Compiler

Post by MichaelFarthing »

iansjack wrote:I look at the number of running processes on my Linux installation. I look at how many of them use standard C library functions.

It's a no-brainer.
Hm. Well I agree with the words here, Ian, but not I'm afraid with the meaning you were wanting to convey! :| :)
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: C/C++ Compiler

Post by Schol-R-LEA »

Ordinarily, I would want to try and bring this thread back onto the original topic, but given what that topic was, I can understand why no one wants to go back to it...

Still, let's at least give ~ some help. He clearly needs it. I'll start with the obvious and necessary part which Tilde doesn't seem to have yet: the grammar.

So, I'll write a simple grammar for the lexical analyzer. The grammar for the parser can wait; in many ways, the lexer is more crucial, as it is where most compilers spend 80% or more of their time. I don't expect ~ to write a high-performance Deterministic Finite Automaton for it the way professional compilers do, but at least knowing what you are looking for will help.

Actually, to make it even simpler, let's start with the lexer for the preprocessor, which should really sort of be a separate thing from the compiler (mostly, it is useful for them to share some symbols but that's getting ahead of things).

So, a regular grammar for the lexemes of a subset of the C preprocessor in Extended Backus-Naur Form:

Code: Select all

token ::= keyword | identifier | paren
keyword ::= "#"("include" | "define" | "if" | "ifdef" | "ifndef" | "elif" | "else" | "endif" | "pragma")
identifier ::= alpha {alphanum}
alpha ::= "A" | "a" | "B" | "b" | "C" | "c" ... | "Z" | "z"
alphanum ::= alpha | digit
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
paren ::= lparen | rparen
lparen ::= "(" 
rparen ::= ")"
I am ignoring the contents of the body of a #define, as a simple preprocessor won't actually look at the body of the defined macros. A more complete preprocessor will, but I am trying to keep things simple here.

For those unfamiliar with EBNF, these are a set of what are called 'production rules', which describe in a compact way what makes up a given type of grammar element. This particular grammar translates to:
  • A preprocessor token is either a keyword, an identifier, or a parenthesis.
  • A keyword is a hash character ("#") followed by one of a set of literals: "include", "define", "if", "ifdef", "ifndef", "elif", "else", "endif", or "pragma".
  • An identifier consists of a letter, followed by zero or more letters or digits.
  • A parenthesis is either a left parenthesis "(" or a right parenthesis ")".
Now, at this point you might be asking why you would go to the trouble of making something like this. The reason is simple: you can use this as a guide for how to code the lexer itself, either directly for a simple ad-hoc lexer, or for defining the states of a Deterministic Finite State Automaton for a more formal lexer. I can discuss that in greater detail later if Tilde wants.
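To make the "simple ad-hoc lexer" idea concrete, here is a minimal sketch in C that follows the preprocessor token grammar above rule by rule. All names (Token, TokenKind, next_token, and so on) are made up for illustration, and it deliberately keeps the grammar's simplifications: no underscores in identifiers, and the directive name must follow the "#" immediately.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOK_END, TOK_KEYWORD, TOK_IDENTIFIER, TOK_LPAREN, TOK_RPAREN, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* The literals allowed after "#" by the keyword rule. */
static const char *const directives[] = {
    "include", "define", "if", "ifdef", "ifndef",
    "elif", "else", "endif", "pragma"
};

/* identifier ::= alpha {alphanum}; returns the match length, 0 if none. */
static size_t scan_word(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* True if the n characters at s spell exactly one of the directives. */
static int is_directive(const char *s, size_t n)
{
    size_t i;
    for (i = 0; i < sizeof directives / sizeof directives[0]; i++)
        if (strlen(directives[i]) == n && memcmp(directives[i], s, n) == 0)
            return 1;
    return 0;
}

/* token ::= keyword | identifier | paren
 * Skips whitespace, classifies one token and advances *cursor past it. */
Token next_token(const char **cursor)
{
    const char *s = *cursor;
    Token tok;

    while (isspace((unsigned char)*s))
        s++;

    tok.start = s;
    tok.length = 1;

    if (*s == '\0') {
        tok.kind = TOK_END;
        tok.length = 0;
    } else if (*s == '(') {
        tok.kind = TOK_LPAREN;
    } else if (*s == ')') {
        tok.kind = TOK_RPAREN;
    } else if (*s == '#') {
        /* Scan the whole word first so "ifdef" is never cut short at "if". */
        size_t n = scan_word(s + 1);
        tok.kind = is_directive(s + 1, n) ? TOK_KEYWORD : TOK_ERROR;
        tok.length = 1 + n;
    } else if (isalpha((unsigned char)*s)) {
        tok.kind = TOK_IDENTIFIER;
        tok.length = scan_word(s);
    } else {
        tok.kind = TOK_ERROR;
    }

    *cursor = s + tok.length;
    return tok;
}
```

Calling next_token() repeatedly on a string such as "#ifdef FOO (x1)" yields a keyword, an identifier, a left parenthesis, another identifier, a right parenthesis, and finally TOK_END. Each branch of the if-chain corresponds to one production, which is the sense in which the grammar guides the code.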

The lexer for the C code itself is a good deal more complicated; to give you a leg up on that, I will write out the EBNF for basic number recognition for you as well:

Code: Select all

number ::= "0" | non-zero-digit [integer] | "0" octal-integer | "0x" hex-integer | fp-number |  signed-number 
non-zero-octal-digit ::= "1" | "2" | "3" | "4" | "5" | "6" | "7"
octal-digit ::= "0" | non-zero-octal-digit
non-zero-digit ::= non-zero-octal-digit | "8" | "9"
digit ::= "0" | non-zero-digit
non-zero-hex-digit ::= non-zero-digit | "A" | "a" |"B" | "b" | "C" | "c" | "D" | "d" | "E" | "e" | "F" | "f"
hex-digit ::= "0" | non-zero-hex-digit
integer ::= digit [{digit}]
octal-integer ::= octal-digit [{octal-digit}]
hex-integer ::= hex-digit [{hex-digit}]
fp-number ::= "." integer | integer "." integer
signed-number ::= ("+" | "-" ) (integer | fp-number)
Once again, I'm deliberately ignoring some things like exponential notation, in order to keep it simple. Note also that as it is now, it would not be strictly deterministic, as there are some potential issues with the definition of fp-number; this can be ironed out later.
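The number grammar can be turned into code in the same way. The following is a sketch only, with hypothetical names (NumberKind, classify_number), that classifies a complete string according to the rules above. It inherits the grammar's quirks on purpose: no exponents, only a lowercase "0x" prefix, and a sign is accepted only in front of a decimal integer or fp-number.

```c
#include <stddef.h>

typedef enum {
    NUM_INVALID, NUM_DECIMAL, NUM_OCTAL, NUM_HEX, NUM_FLOAT
} NumberKind;

static int is_dec_digit(int c)   { return c >= '0' && c <= '9'; }
static int is_octal_digit(int c) { return c >= '0' && c <= '7'; }
static int is_hex_digit(int c)
{
    return is_dec_digit(c) || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
}

/* Length of the leading run of characters accepted by pred. */
static size_t span(const char *s, int (*pred)(int))
{
    size_t n = 0;
    while (s[n] != '\0' && pred((unsigned char)s[n]))
        n++;
    return n;
}

/* Matches integer, "." integer, or integer "." integer over the whole string. */
static NumberKind decimal_or_float(const char *s)
{
    size_t whole = span(s, is_dec_digit);
    const char *frac;
    size_t frac_len;

    if (s[whole] == '\0')
        return whole > 0 ? NUM_DECIMAL : NUM_INVALID;
    if (s[whole] != '.')
        return NUM_INVALID;

    frac = s + whole + 1;
    frac_len = span(frac, is_dec_digit);
    return (frac_len > 0 && frac[frac_len] == '\0') ? NUM_FLOAT : NUM_INVALID;
}

NumberKind classify_number(const char *s)
{
    NumberKind kind;

    /* signed-number ::= ("+" | "-") (integer | fp-number) */
    if (s[0] == '+' || s[0] == '-')
        return decimal_or_float(s + 1);

    /* "0x" hex-integer */
    if (s[0] == '0' && s[1] == 'x') {
        size_t n = span(s + 2, is_hex_digit);
        return (n > 0 && s[2 + n] == '\0') ? NUM_HEX : NUM_INVALID;
    }

    kind = decimal_or_float(s);

    /* An unsigned digit string with a leading zero and more digits must be
     * "0" octal-integer, so every remaining digit has to be an octal digit. */
    if (kind == NUM_DECIMAL && s[0] == '0' && s[1] != '\0')
        return s[1 + span(s + 1, is_octal_digit)] == '\0' ? NUM_OCTAL
                                                          : NUM_INVALID;
    return kind;
}
```

Checking for the "0x" prefix and the leading zero before falling back to the decimal rules is one way to resolve the overlap between alternatives that the grammar leaves open.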

I might go over the actual preprocessor grammar (that is, the grammar for parsing the preprocessor directives) later, once I am convinced that ~ has actually understood this post and why it is relevant.
~
Member
Posts: 1226
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: C/C++ Compiler

Post by ~ »

How to Code an stdcall or cdecl Function

Here, if we set .ret_bytes to anything other than 0, the function becomes stdcall, automatically popping parameter bytes from the stack.

With this skeleton we can see how easy it is to make a C/C++ translator, and how easy it is to write C-like functions with local variables, stack parameters, return value in WIDEAX, even by hand or with an integrated compiler translator for regular functions.

Look how easy it is to assign local labels to parameters and variables, just as if we were programming in plain C:

Code: Select all

;Inputs (push order):
;       Param 1
;       Param 0
;
;;
C_function_skeleton:
 ;Create stack frame:
 ;;
  push widebp
  pushfwide
  mov widebp,widesp
  add widebp,wideword_sz*3   ;Go past saved flags, WIDEBP,
  ;;                         ;and return address to
                             ;directly access parameters

 ;Stack parameters:
 ;;
  %xdefine .Param0   wideword[widebp]
  %xdefine .Param1   wideword[widebp+wideword_sz]



 ;Variables start:
 ;;
  %xdefine .Var0 wideword[widebp-((wideword_sz*3)+(wideword_sz*1))]
  %xdefine .Var0_byte byte[widebp-((wideword_sz*3)+(wideword_sz*1))]
  %xdefine .Var1 wideword[widebp-((wideword_sz*3)+(wideword_sz*2))]
  sub widesp,wideword_sz*2

 ;Number of parameter bytes to discard by the function on return.
 ;If this is NOT 0, the function is stdcall, otherwise it's cdecl:
 ;;
  %xdefine .ret_bytes wideword_sz*2


 ;Save used registers:
 ;;
  push widecx
  push widedx
  push widedi

 ;Code start:
 ;;
  mov .Var0,53  ;Initialize Var0 and copy it to Var1
  push .Var0
  pop .Var1
 
 ;Code end:
 ;;

 ;Restore used registers:
 ;;
  pop widedi
  pop widedx
  pop widecx

 add widesp,wideword_sz*2      ;Release local stack variables


 ;Discard stack frame:
 ;;
  popfwide
  pop widebp
retwide .ret_bytes  ;Return discarding N parameter bytes
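
For comparison, this is roughly the C function that the skeleton above corresponds to. It is an illustrative sketch, not part of C.ASM: the function name and the int types are assumptions, and the return statement is an addition, since the skeleton never loads a value into WIDEAX.

```c
/* Rough C counterpart of the assembly skeleton (illustrative only).
 *
 * On 32-bit x86, C compilers normally default to cdecl, where the caller
 * removes the parameters from the stack. stdcall (the callee removes them,
 * like "retwide .ret_bytes" with a non-zero count) is requested with
 * compiler-specific syntax such as __attribute__((stdcall)) in GCC or
 * __stdcall in MSVC. */
int c_function_skeleton(int param0, int param1)
{
    int var0;       /* .Var0 */
    int var1;       /* .Var1 */

    (void)param0;   /* the skeleton names its parameters but never reads them */
    (void)param1;

    var0 = 53;      /* mov .Var0,53 */
    var1 = var0;    /* push .Var0 / pop .Var1 */

    return var1;    /* assumption: added so the sketch has a defined result */
}
```

The prologue and epilogue of the assembly version (saving WIDEBP and the flags, reserving and releasing the locals, saving and restoring registers) have no visible counterpart here because a C compiler generates them itself.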
iansjack
Member
Posts: 4685
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: C/C++ Compiler

Post by iansjack »

Note that, if working in 64-bits, most ABIs use registers rather than the stack to pass parameters/return values.

This is one of the problems with trying to write assembler code that is portable to 32 and 64 bits. It seems to me that, in chasing the rather illusory goal of compatibility, you are ending up with an inefficient implementation.
~
Member
Posts: 1226
Joined: Tue Mar 06, 2007 11:17 am
Libera.chat IRC: ArcheFire

Re: C/C++ Compiler

Post by ~ »

Defining Typed Variables for Immediate Access in Assembler

Instead of making the usual code:

Code: Select all

;Declaration:
;;
 variable0 dd 0


;Low-level access:
;;
 mov dword[variable0],0



We can do the C-like code:

Code: Select all

;Declaration:
;;
 %xdefine variable0 dword[_variable0]
 %macro variable0. 2   ;Manual type-cast macro for this variable
   ;%1 can be char, short, etc., which
   ;in turn expand to byte, word, dword, wideword
   ;specific to 16, 32 or 64-bit mode:
   ;;
    mov %1[_variable0],%2
 %endmacro
 _variable0 dd 0


;High-level access:
;;
 mov variable0,0

This is equivalent to declaring an int on 32-bit platforms, or an int32_t/uint32_t.
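
For readers more at home in C, here is a sketch of what the two kinds of access above look like when the compiler tracks the type. The function names are hypothetical; only variable0 comes from the assembly listing.

```c
#include <stdint.h>
#include <string.h>

/* Counterpart of "_variable0 dd 0": one 32-bit object with a fixed type. */
static int32_t variable0 = 0;

/* Counterpart of "mov variable0,0": an access at the declared width. */
void store_variable0(int32_t value)
{
    variable0 = value;
}

/* Counterpart of the "variable0." cast macro used with a byte-sized type:
 * only the first byte in memory is overwritten (the low byte on x86,
 * which is little-endian). memcpy keeps the narrow store well defined. */
void store_variable0_first_byte(uint8_t value)
{
    memcpy(&variable0, &value, sizeof value);
}

/* Read the variable back at its declared width. */
int32_t load_variable0(void)
{
    return variable0;
}
```

In C the access width follows from the declared type, so the full-width store needs no size keyword, while the narrower store has to be spelled out explicitly, much as the macro does with its first argument.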

-----------------------------
iansjack wrote:Note that, if working in 64-bits, most ABIs use registers rather than the stack to pass parameters/return values.

This is one of the problems with trying to write assembler code that is portable to 32 and 64 bits. It seems to me that, in chasing the rather illusory goal of compatibility, you are ending up with an inefficient implementation.
You can make functions specific to an ABI. Those would assemble with that style in all modes, and then you can make other functions that don't follow any format. You can arrange your code elegantly to choose the best internal functions for each environment the program will be built for.
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: C/C++ Compiler

Post by Schol-R-LEA »

Should I ever look into this project again, I foresee nasal demons and bleeding eyes... seriously, you seem to be determined to break both ABI compatibility (for any and all OSes other than perhaps your own) and the C language standard at every turn.

At this point, you'd be better off designing a whole new language - and yes, I am fully aware of how large a project that would be - rather than folding, spindling, and mutilating the existing language on a whim. While you can make a dialect of C if you like - it isn't as if there's some sort of Parser Police who would hunt you down for it - if you then claim that it is still compliant C, then you are going to anger anyone else using your hybrid language expecting it to follow the standard language's rules.

As for ABI compliance, well, if you are bound and determined to ensure that your compiler will never interop with any existing libraries, ever, then you do you, I guess, but don't be surprised when it blows up in your face.

Of course, all of this is predicated on your success in compiler development, and given what you've done with that so far, I think everyone save you can see that this is several bridges too far.

Let me ask you again: have you read any of the books or tutorials (or watched any of the videos) we've recommended in the past, and are you applying any of what they say? Please, just give us some sort of answer on this, since as things stand, we don't know what you know, and don't know how to give you any more advice.

I do know that, from your own statements, you've read the old Crenshaw "Let's Build a Compiler" tutorial, yet so far you seem to be ignoring most of what it says, which puzzles me to no end. It speaks to something I've said before: if you write and act like a crackpot, and there is no evidence that you know or understand what you are saying, we can only conclude that you are a crackpot even if you aren't because that's what the evidence is pointing to.

That having been said, I have noticed that you haven't updated the ZIP file with your source code (I'll address the matter of misusing Sourceforge shortly), which means that even if you have made significant progress on your compiler (hopefully including fixing all of the problems I have previously mentioned), we can't see any of that, so we can only base our opinions of your current progress on a single archive file from a year ago.

On that topic, you seem to have misunderstood the point of version control repositories such as Sourceforge. While it is possible - and distressingly common - for projects to upload a single archive there for quick download, the real intent of a site such as SF or Github is to serve as a host for your VCS repo. We've discussed this topic at length on this forum, as well as in the wiki, but let me repeat the point: IF YOU DON'T USE A VCS, EVENTUALLY YOU WILL LOSE YOUR WORK, or worse, be unable to perform a regression on a hidden bug and have to scrap a whole section of existing code. While there are other ways to solve such problems, version control software is designed to facilitate this, and using it should be a no-brainer.

Uploading a single archive file with everything in it doesn't count as version control. You need to use something like Subversion, Git, or Mercurial, and use it consistently. You need to ensure that only the source and resource files get included in the repo, not the object files and executables. You need to have the individual source files visible to anyone browsing your repo, so they can review it (and maybe contribute bug fixes and additional code, should someone want to bother). What you have now is a flat-out misuse of Sourceforge.

Take a look at some other people's code repos, both on SF and on other hosts, and see how they do things correctly. On Sourceforge, a good example of how to use it with SVN is FreeDOS, which I expect you are at least passingly familiar with. For Git and Github, try Mezzano OS. You should be able to see right away how they differ from what you are doing. Note that both of these hosting platforms have services which automatically pack the latest code release into either a ZIP or a tarball, while at the same time allowing access to individual files through the version control systems.

Please, for your own sake, try emulating their approaches for this and any other projects you're working on.
dseller
Member
Posts: 84
Joined: Thu Jul 03, 2014 5:18 am
Location: The Netherlands

Re: C/C++ Compiler

Post by dseller »

Any compiler that doesn't use a lexer and parser component is bound to be ridiculously complex and full of redundant code. I am really curious to see if this compiler will ever actually work, and produce a valid executable from any arbitrary piece of C code that it processes.

Also, this post:
Defining Typed Variables for Immediate Access in Assembler
I am absolutely 100% not understanding what those assembly snippets/macros have to do with writing a C compiler :| Unless of course it has no real backend and it would emit assembler, which would be a proper design decision from ~. At least that would restrict his scope somewhat.