Hi,
Kevin wrote:Brendan will disagree because he considers his OS too different for this to make sense, but I don't think you have to wait for your OS to be running before you can throw away your first compiler.
Yes. For my case:
- the languages are different
- the libraries are different
- the source file format is not "plain text"
- the output file format is very different to ELF object files (no object files, no linking)
- the tools aren't used the same way (automated file format conversion, no make, no link)
- the file IO is very different (versioning file system, all files are immutable)
- processes use a different conceptual model (use of thread specific storage and asynchronous communication as the basis for a "shared nothing" concurrency model)
- it's intended as a distributed architecture (e.g. the local computer supervises work done by other/remote computers, including load balancing and "start alternative slave/s on slave failure"; where that work may be a pre-compile phase on one remote computer that feeds multiple back-ends, with the ability to generate N different output executables optimised for N different targets on M different remote computers in parallel)
Kevin wrote:The only reason why you even need a throw-away compiler in the first place is that you're designing a new language, so the first compiler can't be written in its own language. As soon as your preliminary compiler implements a subset of the language that is powerful enough, you can start writing the real compiler. Not running on your own OS, but e.g. Linux. At this point, you can start working on optimisations etc. without having to throw them away later.
Then you use that compiler to start developing your OS, and as soon as it's mature enough, porting your compiler will be easy enough. All you really need to change is a few functions for accessing files and similar things.
If you're able to run the native compiler on Linux, then you should start by splitting your "tools + OS" project into 2 separate projects and then forget about the OS until after the "tools" project is finished (and possibly forget about the OS completely, given that there'd be at least 5 well-established "work-alike" OS projects you could join and/or fork instead).
Kevin wrote:Brendan wrote:If you must add fancy features, add features that help humans avoid mistakes and/or make it easier for people and compilers ensure that code is correct. Don't add features that make things more complex, or more ambiguous, or make it harder to ensure code is correct (e.g. templates/generics, operator overloading, etc).
How are generics a feature that makes it harder to ensure correctness? Aren't they generally used where you had an untyped pointer before, thus introducing type safety, which is one of the most important measures to ensure correctness?
How would you implement a list, for example?
A list of how many items, where each item in the list contains what, and for which use cases?
For a random example, let's assume it's an extremely large list of "name and contact details" structures; where you expect to frequently use the name to search for contact details; but never really expect to use the contact details to search for the name.
The first thing I'd do is split this into 2 structures - one for "hot data" (used frequently for searches) and another for "cold data" (not used for searches). The "hot data" would consist of the name and a reference to the "cold data". I'd create a hash from the name; then I'd have one thread per CPU per NUMA domain per computer, and use "name_hash % number_of_threads" to determine which thread is responsible for which names. Each of these threads would then use "name_hash / number_of_threads" to select one of its own sub-lists.
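In rough C, the shape of this would be something like the following (the thread count, sub-list count and field sizes are made-up numbers, purely for illustration):
Code:
/* Minimal sketch of the hot/cold split and the hash-based sharding
   described above; all constants are made-up. */
#include <stdint.h>

#define TOTAL_THREADS        64     /* assumed: threads across all CPUs/NUMA domains/computers */
#define SUBLISTS_PER_THREAD  1024   /* assumed sub-list count per thread */

struct cold_data {
    char contact_details[256];      /* rarely searched; kept out of the hot path */
};

struct hot_data {
    struct hot_data *next;          /* full pointers for now; shrunk to partial pointers below */
    struct cold_data *cold;
    char name[];                    /* UTF-8 bytes, variable length */
};

/* Which thread owns a name, and which of that thread's sub-lists it goes in */
static uint32_t owner_thread(uint32_t name_hash)
{
    return name_hash % TOTAL_THREADS;
}

static uint32_t sublist_index(uint32_t name_hash)
{
    return (name_hash / TOTAL_THREADS) % SUBLISTS_PER_THREAD;
}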
For cache locality (to minimise TLB misses and cache misses), the "hot data structures" for each thread all need to be close to each other. For this reason (regardless of how you store the sub-lists), you're going to need a special "hot data virtual memory pool". You do not want to do something completely retarded (like just use a generic "malloc()" that's shared by all sub-lists for all threads in a process, and completely destroys cache locality). To be more correct; I'd actually want to have a separate "hot data memory pool" for each individual sub-list for each thread (plus another memory pool for each thread for its "cold data").
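As a very rough sketch (using "mmap()" as a stand-in for whatever the OS actually provides for reserving a virtual memory pool), a per-sub-list bump allocator might look like:
Code:
/* Per-sub-list "hot data" pool: every entry in one sub-list comes from
   the same contiguous region, so entries share pages and cache lines.
   mmap() is only a stand-in for reserving a virtual memory pool. */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HOT_POOL_SIZE (128 * 1024)  /* matches the 128 KiB limit derived below */

struct hot_pool {
    uint8_t *base;                  /* start of this sub-list's region */
    size_t   used;                  /* bump offset for the next allocation */
};

static int hot_pool_init(struct hot_pool *p)
{
    p->base = mmap(NULL, HOT_POOL_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    p->used = 0;
    return (p->base == MAP_FAILED) ? -1 : 0;
}

/* Allocate "size" bytes, 2-byte aligned (the alignment that the
   "(offset to next entry) >> 1" trick below relies on) */
static void *hot_pool_alloc(struct hot_pool *p, size_t size)
{
    size_t offset = (p->used + 1) & ~(size_t)1;
    if (offset + size > HOT_POOL_SIZE) {
        return NULL;                /* this sub-list's pool is full */
    }
    p->used = offset + size;
    return p->base + offset;
}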
For the sub-lists themselves; I'd probably just have singly linked lists of "hot data structures". This means having some sort of "next" field, some sort of "reference to cold data" field and the "name" field in that structure. Because the name field would be (UTF-8) bytes (no alignment needed), and because I'd be trying to pack as many "hot data structures" as possible into the fewest cache lines; I wouldn't use full (64-bit) pointers for the "next" field or for the "reference to cold data" field (e.g. 2*8 bytes plus up to 7 bytes for alignment). Instead I'd use some form of partial pointers.
For the "next" field; because we're using a separate "hot data memory pool" for each individual sub-list we only need an offset from the start of the virtual memory pool, and if we assume some sort of alignment we don't need to store "always zero due to alignment" bits of the offset. This means that we probably only need a 16-bit "(offset to next entry) >> 1" for the next field. This does limit it to 128 KiB per sub-list (or assuming an average of 16 bytes per entry, about 8000 entries); but that doesn't really matter - with a large enough hash size (and good enough hash function) the number of sub-lists will be large enough to avoid the need to store too many entries per sub-list anyway.
In a similar way; for the "reference to cold data" we only need an offset from the start of that memory pool, but we don't care much about cache locality for the cold data and could maybe align entries on 64-byte boundaries if we want. This means we can use something like a 32-bit "offset of cold data >> 6" as a reference to the cold data. That would mean that our cold data virtual memory pool (that is shared by all of one thread's sub-lists) would be limited to 256 GiB; which is more than we're ever likely to use (maybe some of those bits could be used for something else, like a "male or female" flag or something, or maybe we should align cold data entries on a 32-byte or 16-byte boundary instead). Even though this is a 32-bit field I wouldn't bother aligning the field itself on a 4-byte boundary, as it's only used to find the cold data (which doesn't happen frequently) - the "next" field would be aligned on a 2-byte boundary, and that's good enough alignment for the "32-bit cold data reference".
That means that both the "next" and "cold data reference" fields cost us 6 bytes plus one optional byte of padding (rather than the original "2*8 bytes plus up to 7 bytes for alignment"); which means more entries per cache line and fewer cache misses.
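Putting the pieces together; the entry layout and the partial pointer decoding would look something like this (the "NEXT_NONE" sentinel and the packing pragma are details I've made up for the sketch):
Code:
/* The packed "hot data" entry: a 16-bit partial pointer for "next",
   a 32-bit partial pointer for the cold data, then the name bytes.
   NEXT_NONE is a made-up sentinel meaning "end of sub-list". */
#include <stdint.h>

#define NEXT_NONE 0xFFFF

#pragma pack(push, 1)               /* no padding - the 32-bit field is deliberately
                                       left 2-byte aligned, as described above */
struct hot_entry {
    uint16_t next;                  /* (offset of next entry in this sub-list's pool) >> 1 */
    uint32_t cold_ref;              /* (offset of cold data in this thread's cold pool) >> 6 */
    char     name[];                /* UTF-8 bytes, no alignment needed */
};
#pragma pack(pop)                   /* sizeof(struct hot_entry) == 6 */

/* Follow the "next" partial pointer within one sub-list's hot pool */
static struct hot_entry *next_entry(uint8_t *hot_pool_base, const struct hot_entry *e)
{
    if (e->next == NEXT_NONE) {
        return NULL;
    }
    return (struct hot_entry *)(hot_pool_base + ((uintptr_t)e->next << 1));
}

/* Expand the 32-bit cold reference (64-byte aligned entries, so up to
   256 GiB of cold data per thread) into a real pointer */
static void *cold_data_of(uint8_t *cold_pool_base, const struct hot_entry *e)
{
    return cold_pool_base + ((uint64_t)e->cold_ref << 6);
}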
Now; how would you implement this same list? Would you use a generic "list" template that doesn't scale (either not thread-safe and/or suffering lock contention), that uses a generic memory allocator and has extremely bad cache locality, and that also uses 64-bit pointers just in case it doesn't quite suck badly enough already?
Cheers,
Brendan