Hi,
MessiahAndrw wrote:Brendan wrote:I've been thinking similar things for a while now. Mostly; there are benefits from structured editors (even with "plain text" as the source code file format), and there are benefits to using a binary source code file format (even with a normal/non-structured editor); but by combining both (a structured editor with a binary source code file format) it's more than just the sum of the parts - you get the flexibility to add many more benefits.
I'm tossing between a binary or text format for storage.
A binary format is obviously smaller and you can compress the data. Or we could store the source code as text, which doesn't have to look anything like what the user sees in the editor, only because it would be nice to keep it compatible with existing revision control software - merging two people's changes to a single file, for example.
Most of the data is naturally a hierarchical tree, and requires "conversions" to store as a sequence of bytes (a file). The main problem with text is that these conversions (between "hierarchical tree" and "sequence of bytes") are very slow.
For example, I stored my AST (in memory) using a generic structure for each node, sort of like this:
Code: Select all
typedef struct AST_node {
    uint32_t type;                      /* what kind of node this is */
    struct AST_node *parentNode;        /* parent node (NULL for the root) */
    struct AST_node *firstChildNode;    /* first child (NULL if no children) */
    struct AST_node *nextSiblingNode;   /* next sibling (NULL if this is the last child) */
    int dataLength;                     /* number of bytes in "data" */
    uint8_t *data;                      /* node's payload (name, literal, comment text, ...) */
} AST_NODE;
To serialise it, each node was stored as a "length, type, data" record; and to rebuild the tree structure I added 2 flags to the type field - one for "this node is a first child" and one for "this node is a last child". This meant the conversion code could be completely generic (it doesn't care what any node means) and fast.
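A rough sketch of that serialiser (the exact flag bit values here are made up; only the "length, type, data" record and the two flags are how it actually worked):
Code: Select all
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag bits in the top of the "type" field (illustrative values only) */
#define NODE_FLAG_FIRST_CHILD 0x80000000u
#define NODE_FLAG_LAST_CHILD  0x40000000u

/* Write one node as a "length, type, data" record, then its children in
   depth-first order. The two flags are enough to rebuild the parent/sibling
   links when loading, so this code never needs to know what any node means. */
static void serialise_node(FILE *out, const AST_NODE *node, int isFirst, int isLast) {
    uint32_t length = (uint32_t)node->dataLength;
    uint32_t type = node->type;

    if (isFirst) type |= NODE_FLAG_FIRST_CHILD;
    if (isLast)  type |= NODE_FLAG_LAST_CHILD;

    fwrite(&length, sizeof(length), 1, out);
    fwrite(&type,   sizeof(type),   1, out);
    fwrite(node->data, 1, node->dataLength, out);

    for (const AST_NODE *child = node->firstChildNode; child != NULL; child = child->nextSiblingNode) {
        serialise_node(out, child,
                       child == node->firstChildNode,
                       child->nextSiblingNode == NULL);
    }
}
The loader does the reverse: read records in order, start a new child list when the "first child" flag is set, and pop back to the parent after a "last child" record's subtree ends.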
The other problem is that text can't be indexed in a sane way. In memory, I had a "list of types" (each with a pointer to the top AST node for the type's definition) and a "list of symbols" (each with a pointer to the top AST node for the symbol's definition). In the file I had the same, except the pointers were "offset in file of the top AST node" instead. This means that if you want to find the function "foo" in the file, you search the symbol table, find the "offset in file of function foo's top AST node", and can decode the nodes for "function foo" alone (without loading most of the file from disk and without parsing all of the data for all of the nodes).
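In other words, something like this (the field names and exact layout here are made up for illustration; the idea is the same):
Code: Select all
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk symbol index entry (not the actual file format) */
typedef struct {
    uint32_t nodeOffset;   /* file offset of the symbol's top AST node */
    uint32_t nameLength;   /* bytes in the name that follows this header */
} FILE_SYMBOL_ENTRY;

/* Scan a symbol table (already loaded into memory) for "name" and return the
   file offset of its top AST node, or 0 if not found. Only that subtree then
   needs to be read and decoded - not the whole file. */
uint32_t find_symbol_offset(const uint8_t *table, size_t tableSize, const char *name) {
    size_t pos = 0;
    size_t nameLen = strlen(name);

    while (pos + sizeof(FILE_SYMBOL_ENTRY) <= tableSize) {
        FILE_SYMBOL_ENTRY entry;
        memcpy(&entry, table + pos, sizeof(entry));
        pos += sizeof(entry);
        if (pos + entry.nameLength > tableSize) break;     /* corrupt table */
        if (entry.nameLength == nameLen && memcmp(table + pos, name, nameLen) == 0) {
            return entry.nodeOffset;
        }
        pos += entry.nameLength;
    }
    return 0;
}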
MessiahAndrw wrote:Some kind of text serialization would be nice for other reasons - pasting code onto this forum or paste bin.
Yes; but that can be done in several ways (e.g. an "export/import as text" feature in the IDE, a separate stand-alone utility that converts between file formats and/or extracts specific pieces by name, etc). Note that if you do use "plain text" it won't be useful for cut & paste anyway, because it will be littered with things people don't want to see. For example, instead of seeing this in the text file:
Code: Select all
int foo (int y) {
    if(y == 0) return 1;    // Just in case
    return y/2;
}
The text file might contain this:
Code: Select all
<function><hint_inlined><hint_no_side_effects>int foo (int y)<new_child>if<hint_likely_not_taken><new_child><condition>(y == 0)<last_child><new_child><statement>return 1<comment>Just in case<last_child><last_child><new_child><statement>return y/2<last_child><last_child>
MessiahAndrw wrote:The precheck is a great idea. I don't want to prevent type errors during editing (it would make it difficult to write code if your code always had to be in a compilable state), but the editor could certainly underline type errors during type-time.
For me, an error was mostly the same as a comment (e.g. just a string of "anything" characters), except it had a different value for the node's "type". Of course it makes sense to index the errors so you can quickly find the locations of all of them; and so that (e.g.) when the user defines a new symbol (function name, variable name) you can search the list of errors and determine whether any of them are no longer errors. It also means the compiler can quickly generate a list of errors without loading or decoding most of the file.
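As a rough sketch (assuming the simplest case, where an error node's data is just the name of an unresolved symbol - the real checks would be more involved):
Code: Select all
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory error index entry (illustrative names only) */
typedef struct ERROR_ENTRY {
    struct ERROR_ENTRY *next;
    AST_NODE *errorNode;       /* node whose "type" marks it as an error */
} ERROR_ENTRY;

/* Called when the user defines a new symbol: re-check each recorded error and
   drop any "unknown symbol" error that the new definition resolves. */
void recheck_errors_for_new_symbol(ERROR_ENTRY **listHead, const char *newSymbolName) {
    ERROR_ENTRY **link = listHead;
    size_t nameLen = strlen(newSymbolName);

    while (*link != NULL) {
        ERROR_ENTRY *entry = *link;
        if ((size_t)entry->errorNode->dataLength == nameLen &&
            memcmp(entry->errorNode->data, newSymbolName, nameLen) == 0) {
            /* The error text was just the undefined symbol's name; it can now be
               re-parsed as a reference to the new symbol, so remove it from the index. */
            *link = entry->next;
            free(entry);
        } else {
            link = &entry->next;
        }
    }
}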
MessiahAndrw wrote:I think the pre-optimisation stuff is interesting. My main motivation isn't faster build times, but if this stuff is made simple to implement, I'll take it.
Yes, I want to get rid of tools like make. You should be able to just open a project in the IDE and click 'build', with an equivalent that can be automated from the command line. You should be able to add dependencies easily in the IDE (either another project in source form, or a pre-built binary library), and when you build a project it should make sure that the dependencies are built and up to date first.
I want to streamline everything. Instead of wasting time learning/writing/maintaining makefiles, having a directory of separate source files and separate object files, having a separate linker, and having slow compile times; I want "single file contains source for single executable" with very fast compile times.
MessiahAndrw wrote:I love the idea of live coding. If we are able to execute the same AST that the editor manipulates, why can't we modify it while it's running in debug/interpret mode? Imagine a game loop and being able to adjust the physics constants as it's running to see what feels right. (Interpreting the AST is in the editor only, the language will be compiled when you actually build it.)
There's no reason you can't have a virtual machine that executes/interprets the AST and does allow live coding (in addition to a compiler that generates fast code but can't support live coding).
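As a very rough sketch (the node types here are made up, since the real ones depend on the language), an interpreter that walks the same in-memory AST_NODE tree the editor manipulates could look like this - because both share the tree, the editor can change a constant's data while a loop is running and the next evaluation picks up the new value:
Code: Select all
#include <string.h>

/* Hypothetical node types for illustration only */
enum { NODE_INT_LITERAL = 1, NODE_ADD = 2, NODE_MUL = 3 };

/* Recursively evaluate a (tiny, hypothetical) expression subtree */
static long eval(const AST_NODE *node) {
    switch (node->type) {
        case NODE_INT_LITERAL: {
            long value = 0;
            size_t n = (size_t)node->dataLength;
            if (n > sizeof(value)) n = sizeof(value);
            memcpy(&value, node->data, n);       /* literal's value stored in "data" */
            return value;
        }
        case NODE_ADD: {
            long sum = 0;
            for (const AST_NODE *c = node->firstChildNode; c != NULL; c = c->nextSiblingNode)
                sum += eval(c);
            return sum;
        }
        case NODE_MUL: {
            long product = 1;
            for (const AST_NODE *c = node->firstChildNode; c != NULL; c = c->nextSiblingNode)
                product *= eval(c);
            return product;
        }
        default:
            return 0;    /* unknown node type */
    }
}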
Cheers,
Brendan