Everybody, please excuse the upcoming negativity, but I (and not only I) have tried to talk to ~ before, and he remained unresponsive. At this point I consider his advertising his "magic compiler" to be bordering on spam.
We've always been giving OS projects some flak if they come up with beautiful utopias, to see if they have an idea about the necessary groundwork. I take the liberty of doing the same for this project (where even the "utopia" seems to be rather vaguely defined). If my reply crosses the line, I apologize in advance and will hold no grudge if mods take it down.
~ wrote: The plan of this project is writing a C/C++ compiler that can at least be compiled with MinGW or Open Watcom. We can only use the language that those compilers can understand in common.
MinGW is an environment, not a compiler...
The language all halfway-competent C++ compilers "understand in common" is C++. The same goes for C, although it should be noted that several of the compilers you mention in the course of your post don't even get as far as C99.
Seeing how a compiler needs nothing special in the way of library support (basic file I/O is fully sufficient), plain old standard compliance should be enough.
The resulting compiler needs to be able to understand old and new versions of the language, needs to be able to understand idioms from all versions of Visual C++, Turbo C++, Borland C++, Open Watcom, and GCC mainly.
We have talked about this before. I still get the impression that you are confusing and commingling several things when you talk of "idioms".
When software written for one compiler doesn't compile on another, that is either 1) a (potentially dangerous) bug in the software, 2) an issue of the build system (which is quite separate from anything the compiler has influence on), 3) an issue with third-party headers / libraries, or 4) some explicit usage of compiler extensions.
I shudder to think of an attempt to bring all the extensions of Visual C++ and GCC into one compiler. Let alone those of the other three, which, to put it bluntly, are so outdated as to be irrelevant.
(Turbo C++ --> Borland C++ --> Embarcadero C++Builder, and that is using the Clang compiler core by now. Open Watcom has not even reached the C++11 standard yet, and is rather unlikely to ever get there.)
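To make point 4) concrete, here is a minimal sketch of what "explicit usage of compiler extensions" usually looks like in code that is meant to be portable. The names MY_NOINLINE, add and compiler_family are made up for illustration; the predefined macros (_MSC_VER, __GNUC__, __WATCOMC__) and the two attribute spellings are the real ones. Note that it is the source that picks the right spelling per compiler; no compiler has to understand the other vendors' extensions.

```cpp
// Hypothetical portability shim: one macro, one spelling per compiler.
#if defined(_MSC_VER)
#  define MY_NOINLINE __declspec(noinline)       // Visual C++ spelling
#elif defined(__GNUC__)                          // GCC, and also Clang
#  define MY_NOINLINE __attribute__((noinline))  // GNU spelling
#else
#  define MY_NOINLINE                            // anything else: no extension used
#endif

// An ordinary function that merely asks not to be inlined.
MY_NOINLINE int add(int a, int b) { return a + b; }

// Reports which branch of the detection logic was taken at compile time.
const char* compiler_family() {
#if defined(_MSC_VER)
    return "MSVC-compatible";
#elif defined(__GNUC__)
    return "GCC-compatible";
#elif defined(__WATCOMC__)
    return "Open Watcom";
#else
    return "unknown";
#endif
}
```

If such a shim is missing and the code fails on another compiler, that is a portability bug in the software, not a deficiency of the compiler.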
The goal is that we can easily port software with a compiler that is capable of compiling recent software under old platforms like DOS and Windows 98.
Again, you are confusing things. "Recent software" will make use of recent OS features. If a piece of software uses recent versions of, e.g., DirectX, or OpenGL, or any other API, you would have to port those new APIs to these old platforms. A good many of these APIs are proprietary, so there is nothing for you to compile, let alone port. And you would need drivers for recent hardware to run on those old platforms, unless you expect recent software using recent APIs to run on 20+ years old hardware.
None of these issues are addressed by a compiler, no matter how advanced it might be. At this point I challenge you to demonstrate that you even know what you are talking about, and to give an architectural overview of how you expect things to work out.
Not just words, you tried that often enough. Paint us a picture. Hardware, OS layers, drivers, APIs.
No fancy code should be present in the compiler even if it becomes fully capable of understanding such code. This is to make it possible to always be able to compile it directly in old platforms and in exchange have a modern and capable compiler.
Do you know what a "self-hosted" compiler is?
It is a compiler that can compile itself. That is one of the first major milestones in a compiler's development. Once you have achieved that, you are no longer depending on what other compilers can, or cannot, do.
A new platform B gets supported by adding the necessary logic (e.g. binary formats) to the compiler backend, then compiling the compiler for platform B on platform A, getting a binary for platform B that is then capable of compiling on B. This is called "bootstrapping".
Again, you seem to be ignorant of some very basic concepts, which casts doubt on whether you should be embarking on this project, let alone asking for contributions, at this point in time.
So far the code is only able to open a file and record the start/end offsets of each line, and also to count the number of lines present in the given source file.
Which is about the level of sophistication displayed by the "Quickstart 1 - A word counter" tutorial example of Boost.Spirit / Lex. Except that you have so far ignored the existence of something like lexer / parser generators, despite multiple hints and pointers in that direction.
The code should also compile under Linux.
Believe me, that is the least of your problems.
Currently what I need to do next is to parse the source code to detect comments, strings and end-of-instruction/end-of-parameter characters like ; and , ...
Everyone, note how ~ is eschewing several decades of experience in how compilers are usually built, and is not only trying to reinvent the wheel, but to reinvent the concept of roundness in the process. He is actually going at this bottom-up. No syntax tree, no lexer / parser...
(Note, at this point, that rather experienced people have attempted to get full C++ compliance with such generated parsers, and -- after years of learning -- have found that it cannot really be done, that such generated parsers still need manual tweaking. Note that this is manual work added on top of generated parsers...)
I have, for some time, dabbled in maintaining AStyle, a C/C++/Java reformatter. It does (or did, back then) the "state machine" parsing you seem to be aiming for. It wasn't pretty, it was fragile as all hell, and next to unmaintainable. (I tip my hat to Jim Pattee, who took over, and had more success in improving things than I had.) You don't want to walk that road for a "real" compiler. Trust me. I've been there.
(It works, somewhat, as long as you are only going for source that is halfway-decent to begin with. Unfortunately C, and especially C++, allow some things that will make your toenails curl, but which are still 100% legit and must be parsed correctly.)
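A small illustration of the kind of thing I mean, as a sketch only. The code below strips comments in the order the standard's translation phases demand: first splice lines (a backslash directly followed by a newline disappears), then replace each comment with a single space, leaving string and character literals alone. The function names splice_lines and strip_comments are mine, and the sketch deliberately ignores trigraphs, raw string literals and C++14 digit separators, each of which would break it.

```cpp
#include <cstddef>
#include <string>

// Phase 2: delete every backslash that is immediately followed by a newline.
std::string splice_lines(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // skip the newline as well
            continue;
        }
        out += src[i];
    }
    return out;
}

// Phase 3 (partial): replace each comment with one space, copying string
// and character literals through untouched.
std::string strip_comments(const std::string& src) {
    const std::string s = splice_lines(src);
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const char c = s[i];
        if (c == '"' || c == '\'') {
            // Copy the literal verbatim, honouring backslash escapes.
            const char quote = c;
            out += s[i++];
            while (i < s.size() && s[i] != quote) {
                if (s[i] == '\\' && i + 1 < s.size()) out += s[i++];
                out += s[i++];
            }
            if (i < s.size()) out += s[i++];  // closing quote
        } else if (c == '/' && i + 1 < s.size() && s[i + 1] == '/') {
            // Line comment: runs up to, but not including, the newline.
            while (i < s.size() && s[i] != '\n') ++i;
            out += ' ';
        } else if (c == '/' && i + 1 < s.size() && s[i + 1] == '*') {
            // Block comment: runs up to the first "*/" or to end of input.
            i += 2;
            while (i + 1 < s.size() && !(s[i] == '*' && s[i + 1] == '/')) ++i;
            i = (i + 1 < s.size()) ? i + 2 : s.size();
            out += ' ';
        } else {
            out += s[i++];
        }
    }
    return out;
}
```

Feed it a line reading `int a; // x \` followed by a line reading `int b;`, and the declaration of b vanishes together with the comment, because the trailing backslash splices the second line into the first. That is the required behaviour, and a scanner working from per-line start/end offsets will get it wrong.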
You can help me by telling me how to implement things...
I have pointed you to the Dragon Book before, and to GNU Bison (for C) and Boost.Spirit (for C++). I have also pointed out that you will fail at playing catch-up with C++ single-handedly. I have so far refrained from pointing you to various style guides, seeing how your function names have reached a rather extensive state of illegibility before your source even starts doing things.