Hi,
mreiland wrote:Hey Brendan, I'm actually really interested in that utility, is there a chance I could get a look at the old version to give me an idea of how I would go about automating things? Obviously my needs are much simpler than yours, but it would be nice to see the specifics of how someone else solved that problem.
The basic idea is relatively simple, but first let me explain something about my project's source code.
Once upon a time computers didn't have much memory (e.g. 64 KiB of RAM, or less). Back then, you simply couldn't fit the compiler and all the data it needs in memory at the same time. To get around that people designed toolchains so that large projects could be broken up into smaller pieces and compiled separately, then all the separate pieces could be combined. For example, you might have 20 different C source files, compile them into object files separately, then link all the object files together.
Ironically, this ancient practice isn't very smart on modern machines (where there's plenty of RAM). Not only are you starting up the compiler 20 times instead of once, you're doing a lot more file IO (storing the object files, then loading them all again). In theory you get to avoid recompiling individual source files that didn't change (unless you change a header file or something and all object files need to be recompiled), but in practice (even if only one source file changed) this benefit is negligible. It also means that the compiler can't optimise properly because it can only see a small part of the whole program at a time.
So, I don't do that. For each binary I have a file (called "0index.asm" or "0index.c") that includes all the other source files; and the compiler and linker (or the assembler) do the entire binary in one go. Of course this also means it's easy for a utility to look at the first source file ("0index.asm" or "0index.c") and find all of the dependencies...
The first thing the build utility does is recursively scan the project's directory looking for "important" files, like "0index.asm" or "0index.c". For each important file it finds, it creates a "thread data structure" and spawns a thread. Each thread opens its initial file, parses the file's header, and searches for anything it "#includes". To replace something like "make" you have to know what command/s to use to compile or assemble the source code, and what the resulting output file/s will be called. That's why each of the "0index.asm" and "0index.c" files needs a small header.
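That first pass can be sketched in a few lines. This is a minimal illustration (in Python, just to keep it short), and the regular expression covers only the simple `#include "..."` / `%include "..."` forms; the real header format is whatever you design:

```python
# Sketch of the first pass: walk the project tree looking for the "important"
# files ("0index.asm" / "0index.c"), and pull the included file names out of
# a source file's text. The include syntax handled here is an assumption.
import os
import re

INCLUDE_RE = re.compile(r'^\s*(?:#include|%include)\s+"([^"]+)"', re.MULTILINE)

def find_index_files(root):
    """Recursively collect every "important" file under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in ("0index.asm", "0index.c"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

def find_includes(source_text):
    """Return the file names mentioned by #include/%include directives."""
    return INCLUDE_RE.findall(source_text)
```

Each spawned thread would call `find_includes()` on its own file (and recurse into whatever that pulls in) to build its dependency list.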
Once each thread knows what its dependencies and output files are, it does a few sanity checks - determining whether other threads want to create the same output binaries, whether there are any circular dependencies, etc. While doing this, a thread determines which other threads it might need to wait for.
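The circular-dependency check is a standard graph-colouring walk. A minimal sketch (the data shape - a dict from each target to the targets it waits for - is my own choice, not the utility's actual structures):

```python
# Detect circular dependencies with a white/grey/black depth-first search:
# a "grey" node reached again while still on the stack means a cycle.
def has_cycle(deps):
    """deps maps each target to a list of targets it depends on."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = GREY
        for child in deps.get(node, ()):
            s = state.get(child, WHITE)
            if s == GREY:
                return True            # back-edge: circular dependency
            if s == WHITE and visit(child):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(deps))
```

If `has_cycle()` fires, the build utility can report the problem and refuse to start any compiles, exactly like "make" does for circular rules.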
Then each thread looks at file modification times. The idea here is to determine which output files definitely need to be updated and which output files might not need to be updated. This is simple - if the latest source file modification time is later than the earliest output file modification time, then the thread definitely does need to rebuild its output file/s. In this case the thread sets a flag in its "thread data structure" so other threads can see that.
Next, if a thread might not need to update its output file/s, it checks any other threads that it depends on to determine if any of them will be updated (this doesn't need to happen when a thread already knows its output file/s definitely need to be updated). After this a thread knows if it has to do something or not; and if it doesn't have to update its output file/s it simply stops.
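Those two stages - the raw timestamp comparison, then propagating "will be rebuilt" through the dependency graph - can be sketched like this (function names and data shapes are illustrative, not the utility's):

```python
# Stage 1: pure mtime comparison. Definitely rebuild if the newest source is
# newer than the oldest output (a missing output counts as infinitely old).
def mtimes_say_rebuild(source_mtimes, output_mtimes):
    if not output_mtimes:
        return True
    return max(source_mtimes) > min(output_mtimes)

# Stage 2: a target must rebuild if its own mtime check fired, or if anything
# it depends on will rebuild.
def must_rebuild(target, deps, definitely):
    """deps: target -> list of prerequisite targets.
    definitely: target -> True if stage 1 flagged it."""
    if definitely.get(target, False):
        return True
    return any(must_rebuild(d, deps, definitely) for d in deps.get(target, ()))
```

In the threaded version, `definitely` corresponds to the flag each thread sets in its "thread data structure" for the others to read.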
After this we're left with threads that do have to update their output files. The first thing a thread needs to do is wait for any other threads that it depends on to complete. After that, a thread can start the compiler or assembler to generate the output files.
Of course there's a bunch of little details I've skipped. For example, you need to redirect any child process' "stdout" and "stderr" to the build utility, and buffer any output from each thread so it can be displayed in a sane order.
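Capturing and buffering a child's output is the key detail. A minimal sketch using Python's subprocess module (the real utility would do this with pipes around fork/exec, but the idea is the same):

```python
# Run one compile/assemble step with stdout and stderr merged and buffered,
# so each thread's output can later be printed as one contiguous chunk
# instead of interleaving with other threads' output.
import subprocess

def run_buffered(cmd):
    """Run a command; return (exit_code, combined_output_text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout
```

Each thread collects its `(exit_code, text)` pairs, and a single printer (holding a lock, or running after the thread finishes) emits them in a sane order.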
Once you've got this working it's easy to extend. The first thing you'd want to look at is include files and headers. When you're scanning the project's directory you can remember any sub-directories called "inc"; and use this information (combined with the list of dependencies) to automatically generate the include file path. For example, if "/foo/bar/0index.asm" contains the line "%include "thing.inc""; then you can search for "/foo/bar/thing.inc", then "/foo/inc/thing.inc", then "/inc/thing.inc" and add the correct directory to the include path that you pass to the assembler.
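That search - the source file's own directory first, then each ancestor's "inc" sub-directory - can be sketched as follows. The exact search order is my reading of the example above, and the `exists` parameter is just there so the function is easy to test:

```python
# Resolve an include: try the directory containing the source file, then walk
# up the tree trying "<ancestor>/inc/<name>" at each level.
import os

def resolve_include(start_dir, name, exists=os.path.exists):
    """Return the directory to add to the include path, or None."""
    if exists(os.path.join(start_dir, name)):
        return start_dir
    d = os.path.dirname(start_dir)
    while True:
        candidate = os.path.join(d, "inc")
        if exists(os.path.join(candidate, name)):
            return candidate
        parent = os.path.dirname(d)
        if parent == d:            # reached the filesystem root
            return None
        d = parent
```

The returned directory is what you'd pass to the assembler (e.g. NASM's `-i` option) for that binary.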
The next step is to add support for "scripts" to the build utility. The problem with scripts is you can't easily determine which files they depend on, so I just add a "dependency list" to the script's header. You've already got the ability to execute commands (to run compilers and assemblers) and that's all a script is (just run a list of commands one at a time). You've also already got the automatic dependency resolution stuff. For example, if a script generates the file "/foo/bar/hello.asm" and this file is included by "/foo/bar/0index.asm", then the thread responsible for "/foo/bar/0index.asm" knows that it has to wait for the script's thread to complete before it assembles "/foo/bar/0index.asm".
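Parsing such a script header is straightforward. The "# DEPENDS:" / "# OUTPUT:" keywords here are invented for illustration; the real header format is whatever you decide on:

```python
# Pull a dependency list and an output list out of comment lines at the top
# of a script; the header ends at the first non-comment line.
def parse_script_header(text):
    """Return (dependencies, outputs) declared in the script's header."""
    deps, outs = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# DEPENDS:"):
            deps += line[len("# DEPENDS:"):].split()
        elif line.startswith("# OUTPUT:"):
            outs += line[len("# OUTPUT:"):].split()
        elif line and not line.startswith("#"):
            break                  # first real command: header is finished
    return deps, outs
```

The outputs feed straight into the existing dependency resolution: any thread whose inputs appear in a script's output list waits for that script's thread.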
Then you start looking at backups. It's easy to get the build utility to "tar and gzip" the entire project's directory, but you can be smarter than that. You already know which files will be generated by compilers, assemblers and scripts; and there's no point including any of those files in the backup, so you can tell "tar" to exclude them. Because none of the output files will be included in the backup, you can start the "tar and gzip" process as soon as the other threads have parsed headers; and you don't have to wait until the other threads have finished updating their output binaries. Of course the problem with generating backups like this is that you'll fill up your hard drive with millions of backups (on a good day I'll press F12 about 50 times, and 50 backups per day starts to add up over a few years). However, it's easy enough to start the "tar and gzip" process, then (while it's running) do some maintenance of your "backup" directory and delete unnecessary backups. I give the backups a file name based on the time and date, and then keep:
- all backups from the current day and the previous day, plus
- one backup per day for the most recent 2 weeks, plus
- one backup per week for the most recent 2 months, plus
- one backup per month
Any other backups get discarded.
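The retention rules above can be sketched as a pure function over the backup timestamps. This is my reading of the rules (newest backup wins within each day/week/month bucket), not the utility's actual code:

```python
# Decide which backups to keep: everything from today and yesterday, plus the
# newest backup per day (last 2 weeks), per week (last ~2 months), per month.
import datetime

def backups_to_keep(stamps, now):
    """stamps: list of datetimes, one per backup. Returns the set to keep."""
    keep = set()
    buckets = {}           # (granularity, bucket key) -> newest in bucket
    for s in sorted(stamps):
        age = (now.date() - s.date()).days
        if age <= 1:
            keep.add(s)                                 # today + yesterday
        if age <= 14:
            buckets[("day", s.date())] = s              # one per day
        if age <= 61:
            iso = s.isocalendar()
            buckets[("week", (iso[0], iso[1]))] = s     # one per week
        buckets[("month", (s.year, s.month))] = s       # one per month
    keep.update(buckets.values())
    return keep
```

Anything not in the returned set gets deleted while the new "tar and gzip" is still running.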
The next thing to think about is documentation. If you're like me you'll end up with a bunch of utilities in C (to create floppy images, create CD images, convert font data into a different format, compress files in a way your boot code likes, etc); and eventually you'll want to provide documentation on how to install/use the OS for end users. Then there's specifications - I end up creating lots of specifications for different file formats, different APIs, etc; and they're not too different from documentation. I really don't like existing tools (like LaTeX, docbook, etc), so I add that to my build utility too.
Basically, when the build utility scans the project directory looking for "special" files; it also notices anything called "*.txt" in a "doc" or "spec" directory; and for each text file it finds it starts a thread to process it. These threads convert the plain text source files into fancy HTML web pages (for example, here's the [url=http://bcos.hopto.org/www2/doc/util/build.html]documentation for the previous version of the build utility[/url], which was processed by the previous version of the build utility).
Now, one of the things that annoys me with most normal utilities is that you can move/rename or delete files and you'll end up with old output files. For example, let's say you've got a file "/doc/foo.txt" which gets converted into "/www/doc/foo.html", and you rename the source file to "/doc/bar.txt". Now you get a new "/www/doc/bar.html" file, but you've also got an obsolete "/www/doc/foo.html" that's left lying around. The next feature for the build utility is a "cleaner" thread - while everything else is happening, find all these obsolete/orphaned files and remove them!
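The cleaner is essentially a set difference: every thread already declared its output files, so anything on disk in the output tree that nobody claims is an orphan. A sketch (paths are illustrative):

```python
# Find orphaned output files: files that exist in the output tree but are not
# produced by any current source file, so they can be safely deleted.
def find_orphans(actual_outputs, expected_outputs):
    """Both arguments are iterables of paths; returns a sorted list."""
    return sorted(set(actual_outputs) - set(expected_outputs))
```

The cleaner thread gathers `actual_outputs` by walking the output directories, collects `expected_outputs` from every other thread's data structure, and deletes the difference.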
Now let's talk about source code. If your build utility is already generating fancy HTML documentation and specifications, why not make it parse assembly and C (and scripts) too and generate fancy web pages for them too? There's plenty of utilities that do this (e.g. doxygen), but they're a pain in the neck (maintenance) and I don't like the way the resulting web pages look. Also, an "all in one" utility can be much more powerful than any of these stand-alone utilities. For example, I can have a comment like "
// For more information see [s:foo, bar]." in some source code, and the utility will automatically find the section labelled "bar" in the specification "foo.txt" and generate an HTML link that looks like "
// For more information see Section 1.2.3 The Thingy, in Some Specification" (which is easy to do because the utility already parses specifications - one thread just asks the other thread for the information), and if I change anything in the specification the web page for the source code will be updated to reflect the changes. To make this more fun I also added titles and subtitles to the headers (e.g. the header in an "0index.asm" might say that the title is "80x86 Floppy Boot Loader").
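Resolving those cross-references is simple once one thread has parsed the specification into a table of sections. A sketch, where the "[s:spec, label]" syntax comes from the example above but the table layout is my own invention:

```python
# Expand "[s:spec, label]" cross-references in a comment, using a table that
# a specification-parsing thread has already built.
import re

XREF_RE = re.compile(r'\[s:(\w+),\s*(\w+)\]')

def resolve_xrefs(comment, specs):
    """specs: {spec name: {section label: (number, title, spec title)}}."""
    def sub(m):
        spec, label = m.group(1), m.group(2)
        number, title, spec_title = specs[spec][label]
        return "Section %s %s, in %s" % (number, title, spec_title)
    return XREF_RE.sub(sub, comment)
```

Because the lookup happens at build time, renumbering a section in the specification automatically updates every source-code web page that references it.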
Finally, there's navigating through the project. A bunch of disorganised web pages is just a mess. To fix that I have "association files" which organise the web pages into a hierarchical structure. Any file called "index.txt" is converted into a fancy HTML page with HTML links to its parent and all its children (but it's very similar to the way documentation and specifications are parsed and converted to fancy HTML). On top of that, no web site is complete without a site map - I have a thread that auto-generates a project map from the titles of all the other pages (following the "hierarchical tree" navigation system).
That's about it for the previous version of my build utility. For the new version a lot more features are planned - performance improvements (the thing already has its own file cache), source code reformatting, a project-wide glossary, better HTML pages (CSS + HTML4 rather than plain HTML3), better support for C source code, etc.
Cheers,
Brendan