
Re: Workflow questions

Posted: Mon Nov 28, 2011 6:09 pm
by Rusky
First of all, my benchmark was poorly representative. Revising it to behave more like what Brendan described, it comes down to this:

Baseline:

Code: Select all

$ time g++ -o test *.cc
real	0m1.343s
user	0m1.136s
sys	0m0.176s
Brendan-style:

Code: Select all

$ time cat *.cc | g++ -o test -x c++ -
real	0m0.759s
user	0m0.676s
sys	0m0.072s
Makefile:

Code: Select all

$ make clean
$ time make -j6
real	0m0.735s
user	0m1.720s
sys	0m0.156s
$ time make -j6
real	0m0.014s
user	0m0.000s
sys	0m0.008s
$ touch file.cc printer.cc parser.cc
$ time make -j6
real	0m0.546s
user	0m0.704s
sys	0m0.120s
Depending on the run, Brendan-style and Makefile-from-scratch are generally the same, though Makefile-from-scratch is often a little faster (I'd need a bigger project to find anything statistically significant). However, for anything else, e.g. rebuilding half the project (to simulate a touched header file), Makefile-style is significantly faster: 30% or more.
Brendan wrote:Now do 50 of those at the same time ("When your build utility is running 150 threads and up to 50 external processes, "extra parallelism" isn't going to gain you anything."); and touch a header file that is used by everything.
As you can see, parallelism gains you a lot, especially when you've touched a major header file. Pick an appropriate job count for your dev machine, give make a -jN, and you win. Three characters of typing is a lot less than writing your own thread-managing build system in C, and you end up with a better system anyway.
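For example (a sketch, assuming GNU make and coreutils' nproc; the -l flag tells make to hold off starting new jobs while the load average is already high, which covers the "CPUs are busy with other things" case):

Code: Select all

$ make -j$(nproc) -l$(nproc)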
Brendan wrote:If all things are built in similar ways (e.g. with GCC using the same arguments), then a well-written Makefile wouldn't take much maintaining. If your project consists of lots of things that are created in lots of different ways then it becomes a mess.
If your project consists of lots of things that are created in lots of different ways, how is a custom build tool going to be any less of a mess? In a Makefile, on the other hand, you just add file- or directory-specific rules. I don't see it getting any simpler than that.
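A minimal sketch of what I mean (the file and directory names are made up):

Code: Select all

# default rule for most objects
%.o: %.cc
	g++ -O2 -c $< -o $@

# directory-specific override: everything under drivers/ gets extra flags
drivers/%.o: drivers/%.cc
	g++ -O2 -ffreestanding -fno-exceptions -c $< -o $@

# file-specific override: one awkward file needs different options
parser.o: parser.cc
	g++ -O0 -g -c $< -o $@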
Brendan wrote:
Rusky wrote:Is it somehow more of a waste to write a simple Makefile than it is to maintain a custom make replacement written in C? Is it somehow more of a waste to use existing, well-tested tools that store, with virtually no effort on your part, much more of your project's history in a better format?
A lot of people make the mistake of thinking my build utility is just a make replacement. The reality is that "make replacement" is only a small amount of the build utility's code. If you combined make, docbook and doxygen you'd be a lot closer (but still be lacking support for a lot of things).
Other than replacing make (with harmful requirements imposed on project/build structure), implementing in C what would be more easily done with a few make targets and a real version control system, and duplicating various documentation and parsing utilities (which I don't see your problem with), what does it do? It reorders error messages, which is nice if you ever get a million of them at once; it guesses include directories, which is nice if you ever have more than one or two; it implements an ad-hoc version of make's dependency system just for scripts, which is nice if you don't have a more general one that can be automated. Oh, and it's all implemented in multithreaded C. Awesome.
Brendan wrote:Obfuscating code by hiding common algorithms in libraries is something I don't do.
You call libraries of common code obfuscation, I call their absence harmful duplication of code. The Linux kernel, for example, has generic code like allocators, data structures, atomics, etc. in a single location. Kernel modules don't re-implement any of that, and if they did, maintenance would be impossible.
Brendan wrote:
JackScott wrote:I believe (Solar may correct me here) that context basically means which component of the operating system. Say if I fix a bug in my keyboard handler, and then Ctrl-C process termination no longer works, I want to be able to take a guess that it's in the keyboard handler and not the process control code. Revision control means I can see what code has changed in both components since it was last working, and thus figure out where to focus the bug-fixing efforts.
Ok - that makes sense.

I test often, and I don't have Alzheimer’s.
Alzheimer's is not a prerequisite for forgetting exactly why some line is the way it is. While good testing and commenting are great, version control can tell you exactly which lines changed from what, to what, and for which reasons, even when you don't have a test case or perfect memory for whatever it is you changed last month. Version control also makes what information you could gather anyway far more accessible than searching through archives (full text search through the whole project's history? done.)
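For example (a sketch, assuming git; the file name and search string are only illustrations):

Code: Select all

$ git log -p -- kernel/keyboard.c        # every change to that file, with diffs and log messages
$ git log -S ctrl_c                      # every commit whose diff adds or removes "ctrl_c"
$ git grep ctrl_c $(git rev-list --all)  # full-text search across every revision in the history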
Brendan wrote:
Solar wrote:
  • easy revert to last checked-in version;
Occasionally (if I'm not sure what I'm doing) I might make a copy of a file before doing something, so that I can switch back to the original if I screw things up. Usually this is for refactoring the code though - so I can have the old/original in one window and use it as a reference while modifying/refactoring.
Version control is a better way to make those copies for several reasons. They're presented in context with the other files that they actually worked with and a log message, they don't clutter up your working directory (or get duplicated into backups unnecessarily), they can stay around forever because there's no mental overhead, and they can be stored in separate branches for when you're experimenting with different implementations.
Brendan wrote: A decision chart (for "single developer" projects only):
  • Do you want to create a branch?
    • No:
      • Ok then.
    • Yes:
      • Will you want to merge later?
        • No:
          • Then just create a copy of the project's directory.
        • Yes (or maybe):
          • Then you need to spend more time deciding what the future of your project should be instead.
:)
That last point is utter crap. Especially with the ease of branching and merging you get from distributed version control, there's absolutely no reason not to make a new branch for each and every feature. It has nothing to do with indecision about the future of the project, and everything to do with avoiding lots of ad-hoc backup archives, files, etc. With tools like gitk, you can even visually browse your project's history, and branches make that much more clear.
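A quick sketch of that per-feature workflow (the branch name is made up):

Code: Select all

$ git checkout -b new-scheduler    # start the experiment on its own branch
# ...hack and commit as often as you like...
$ git checkout master              # the known-good tree is always one command away
$ git merge new-scheduler          # keep the feature...
$ git branch -D new-scheduler      # ...or throw the whole experiment away
$ gitk --all                       # browse every branch and its history visually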
Brendan wrote:Can I "drag and drop" the repo into my FTP server's directory if I felt like letting someone download the entire project?
[...]
With git, you can just drag the entire directory; but the person who might want to download it walks away without downloading anything because it's too much hassle.
Most developers *ahem* already use version control, so they don't need an extra tool. However, version control does not preclude a "make dist" target that builds an archive of the project. The problem is that those archives are not a good enough substitute for real version control.
Brendan wrote:My internet connection is too slow for that - press F12, wait for 5 minutes while it uploads.
[...]
When I press F12 everything completes in less than half a second and I'm trying to reduce it further. Yesterday I pressed "F12" about 100 times. Those ten seconds (which doesn't even count creating the diff in the first place) would've added up to about 16 minutes throughout the day. Screw that. ;)
Version control does not mean F12 has to upload anything. Distributed version control like Git commits locally, so you could just do a "git commit -am 'auto-generated build id'" each build and have exactly what you do now with archives, but better. :)
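A sketch of what an F12 hook could do instead (the commit message format is just an example, and nothing here touches the network):

Code: Select all

#!/bin/sh
# build, then snapshot the whole tree in the local repository
make -j6 && git add -A && git commit -q -m "auto build $(date +%Y%m%d-%H%M%S)"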

Re: Workflow questions

Posted: Mon Nov 28, 2011 9:31 pm
by Brendan
Hi,
Rusky wrote:
Brendan wrote:Now do 50 of those at the same time ("When your build utility is running 150 threads and up to 50 external processes, "extra parallelism" isn't going to gain you anything."); and touch a header file that is used by everything.
As you can see, parallelism gains you a lot, especially when you've touched a major header file. Pick an appropriate number of threads for your dev machine, give make a -jN, and you win. 3 characters is a lot less than a system to manage build threads yourself in C, for a better system in the end.
If all CPUs are idle, then "make -j" will help. If all CPUs are busy doing other things then "make -j" won't help at all (and will probably just make things worse). I'm doing lots of things at the same time, so you can assume all my CPUs will be busy, "make -j" won't help, and your benchmarks aren't representative because they continue to ignore this.

When I press F12, what matters the most is the worst case time (everything rebuilt, including the entire web site). If the worst case time is acceptable (doesn't cause more than a few seconds of distraction) then faster (less things being rebuilt) will be fine too. My OS project is very modular. Let's do some maths...

I'll have around 50 threads purely for generating HTML pages (for documentation, specifications, etc) plus another few threads for housekeeping (backup, directory cleaning, generating a site map, etc). The backup thread starts an external process to do the "tar/gzip" while it cleans up the "/backup" directory. That's 53 threads and 1 external process to start with.

For code, there's around 8 boot loaders ("first stage" boot), 4 boot modules ("second stage" boot) and 3 kernel setup modules ("third stage" boot) - all of these are separate binaries. Then the micro-kernel is modular (e.g. separate modules/binaries for physical memory manager, virtual memory manager, scheduler, etc). With 8 separate binaries for 32-bit 80x86, 8 separate modules for 64-bit 80x86, and one extra "with PAE" 32-bit virtual memory manager module you end up with 17 more binaries (and that's under-estimated). That's a total of 29 separate binaries (8+4+8+8+1) to build (one external process each), plus 29 threads to convert source code into HTML. That takes the total up to 82 threads and 30 external processes.

In addition to that there's various utilities - things to convert data from one file type to another, utilities to create floppy and CD images, etc. Let's say there's 8 of these utilities that need to be compiled (not including the build utility itself), plus 9 threads to convert their source code into HTML (including the build utility). Another 9 threads and another 8 external processes bring the total to 91 threads and 38 external processes.

Then there's some scripts to do various things (running those utilities we built earlier, creating tidy little "tar/gzips" of end-user documentation and OS installation files for the general public, etc). 20 of these scripts means 20 more threads to convert the scripts into HTML, plus 20 more processes to execute the scripts (sort of). Another 20 threads and another 20 external processes bring the total to 111 threads and 58 external processes.

Of course I only scratched the surface for the OS's code. Let's allow for 100 device drivers, 20 "services" (VFS, file system, network stack, font engine, etc) and 20 applications. That's going to be another 140 threads and another 140 external processes.

The final total is around 251 threads and 198 external processes. They aren't all happening in parallel (some things can't start until other things have completed) but I've done a lot of work to make sure as much as possible can happen in parallel (mostly it's only the later things that have to wait, after all the binaries and utilities have been done and I'm combining separate binaries into a single "boot image" or something).

Now, what would happen if each "*.c" and each "*.asm" were compiled/assembled separately? Let's assume most of them are built from around 30 files (definitely conservative once you look at things like GUIs). That means instead of having 169 processes for compiling/assembling we'd be looking at 5070 processes (the totals would be 251 threads and 5099 processes). I've only got 16 CPUs. Can you see how "doing more in parallel" might not help?
Rusky wrote:
Brendan wrote:If all things are built in similar ways (e.g. with GCC using the same arguments), then a well-written Makefile wouldn't take much maintaining. If your project consists of lots of things that are created in lots of different ways then it becomes a mess.
If your project consists of lots of things that are created in lots of different ways, how is a custom build tool going to be any less of a mess? In a Makefile on the other hand, you just add file or directory-specific rules. I don't see it getting any simpler than that.
Show me a makefile rule to handle CSS files.

When a "source" CSS file changes, the makefile has to make sure that the main web site is updated correctly, and also the end-user manual's "tar.gz" (which is just a copy of the HTML pages for OS installation, etc intended for easy off-line viewing).

WARNING: This might be harder than it sounds at first (if you don't know why, do some research into the way web browsers cache CSS files)...
Rusky wrote:
Brendan wrote:
Rusky wrote:Is it somehow more of a waste to write a simple Makefile than it is to maintain a custom make replacement written in C? Is it somehow more of a waste to use existing, well-tested tools that store, with virtually no effort on your part, much more of your project's history in a better format?
A lot of people make the mistake of thinking my build utility is just a make replacement. The reality is that "make replacement" is only a small amount of the build utility's code. If you combined make, docbook and doxygen you'd be a lot closer (but still be lacking support for a lot of things).
Other than replacing make (with harmful requirements imposed on project/build structure), implementing in C what would be more easily done with a few make targets and a real version control system, and duplicating various documentation and parsing utilities (which I don't see your problem with), what does it do? It reorders error messages, which is nice if you ever get a million of them at once; it guesses include directories, which is nice if you ever have more than one or two; it implements an ad-hoc version of make's dependency system just for scripts, which is nice if you don't have a more general one that can be automated. Oh, and it's all implemented in multithreaded C. Awesome.
What you call "harmful requirements" I consider beneficial restrictions. I designed it for me and my project, not as a general tool for other people or other projects.

I honestly feel sorry for you if you don't understand how much time it saves and how much better the results are. If you're happy with build times that are an order of magnitude slower, with web pages that get screwed up when you change something as simple as a document title, with spending half an hour each week messing with makefiles and docbook and doxygen (and getting crappy results because there's no real integration between them) and still having to manually maintain things like a site map; then there's probably little hope for you.

The funny thing is that I was a bit like you once. I started with makefiles and "manually maintained" web pages full of broken links. The first version of the build utility was only intended to make generating parts of the web site easier. It grew. Going back to coping with many different tools that all try to do specific jobs (and inevitably fail to varying degrees) seems like torture to me now.
Rusky wrote:
Brendan wrote:Obfuscating code by hiding common algorithms in libraries is something I don't do.
You call libraries of common code obfuscation, I call their absence harmful duplication of code. The Linux kernel, for example, has generic code like allocators, data structures, atomics, etc. in a single location. Kernel modules don't re-implement any of that, and if they did, maintenance would be impossible.
If it can't be done with a macro in a header file, then it belongs in a formal specification for a standardised protocol/API/interface. Linux people seem to frown on standards though (the evil "third-party binaries" might attack and eat their sandwiches).

Rusky wrote:
Brendan wrote:
JackScott wrote:I believe (Solar may correct me here) that context basically means which component of the operating system. Say if I fix a bug in my keyboard handler, and then Ctrl-C process termination no longer works, I want to be able to take a guess that it's in the keyboard handler and not the process control code. Revision control means I can see what code has changed in both components since it was last working, and thus figure out where to focus the bug-fixing efforts.
Ok - that makes sense.

I test often, and I don't have Alzheimer’s.
Alzheimer's is not a prerequisite for forgetting exactly why some line is the way it is. While good testing and commenting are great, version control can tell you exactly which lines changed from what, to what, and for which reasons, even when you don't have a test case or perfect memory for whatever it is you changed last month. Version control also makes what information you could gather anyway far more accessible than searching through archives (full text search through the whole project's history? done.)
What matters to me is how it is now and how it should be. I've never really needed to care how it was in the past. Maybe "excess modularity" has helped me avoid regressions? I don't know.

Rusky wrote:
Brendan wrote:
Solar wrote:
  • easy revert to last checked-in version;
Occasionally (if I'm not sure what I'm doing) I might make a copy of a file before doing something, so that I can switch back to the original if I screw things up. Usually this is for refactoring the code though - so I can have the old/original in one window and use it as a reference while modifying/refactoring.
Version control is a better way to make those copies for several reasons. They're presented in context with the other files that they actually worked with and a log message, they don't clutter up your working directory (or get duplicated into backups unnecessarily), they can stay around forever because there's no mental overhead, and they can be stored in separate branches for when you're experimenting with different implementations.
Just how many months does it take you to refactor 100 lines of code? I'm guessing I'd have the job done before you've finished playing with your version control system.
Rusky wrote:
Brendan wrote: A decision chart (for "single developer" projects only):
  • Do you want to create a branch?
    • No:
      • Ok then.
    • Yes:
      • Will you want to merge later?
        • No:
          • Then just create a copy of the project's directory.
        • Yes (or maybe):
          • Then you need to spend more time deciding what the future of your project should be instead.
:)
That last point is utter crap. Especially with the ease of branching and merging you get from distributed version control, there's absolutely no reason not to make a new branch for each and every feature. It has nothing to do with indecision about the future of the project, and everything to do with avoiding lots of ad-hoc backup archives, files, etc. With tools like gitk, you can even visually browse your project's history, and branches make that much more clear.
If you know (without any doubt) that the new feature is going to be part of the final product (or the next release of the final product), then why do you need to keep the old code in a separate branch (unless it's not a single-developer project)?

Note: I don't have "lots of ad-hoc" backup archives. I have one directory containing a nice orderly collection of them (where extraneous backups are auto-deleted). In the last 5 years or so I've looked at this directory once to give someone else a tarball of the entire project, occasionally copied the latest backup to a USB stick or CD or another computer (I probably should do that more often and store a copy off-site), and sometimes I just like to look at it to see that it's all still working like it should. I have never used it to revert to a previous version of my project, or to find out what I changed when, or anything else like that. The backup is purely for the purpose of backing it up, and that's how it's used.
Brendan wrote:Can I "drag and drop" the repo into my FTP server's directory if I felt like letting someone download the entire project?
[...]
With git, you can just drag the entire directory; but the person who might want to download it walks away without downloading anything because it's too much hassle.
Most developers *ahem* already use version control, so they don't need an extra tool. However, version control does not preclude a "make dist" target that builds an archive of the project. The problem is that those archives are not a good enough substitute for real version control.

I agree - archives aren't a good substitute for version control, in the same way that text editors are a bad substitute for bitmap graphics editors, or a car is a bad substitute for a boat. I dare you to add SVN and GIT to this wikipedia page.
Rusky wrote:
Brendan wrote:My internet connection is too slow for that - press F12, wait for 5 minutes while it uploads.
[...]
When I press F12 everything completes in less than half a second and I'm trying to reduce it further. Yesterday I pressed "F12" about 100 times. Those ten seconds (which doesn't even count creating the diff in the first place) would've added up to about 16 minutes throughout the day. Screw that. ;)
Version control does not mean F12 has to upload anything. Distributed version control like Git commits locally, so you could just do a "git commit -am 'auto-generated build id'" each build and have exactly what you do now with archives, but better. :)
If I install Git, continue pressing F12 and never bother to waste my time doing "git commit -am 'auto-generated build id'" for the nice fuzzy feeling I've never needed; would you be "happy enough" and stop trying to convince me to adopt whatever you think you need for your project?


Cheers,

Brendan

Re: Workflow questions

Posted: Tue Nov 29, 2011 2:23 am
by Solar
Brendan wrote:If all things are built in similar ways (e.g. with GCC using the same arguments), then a well-written Makefile wouldn't take much maintaining. If your project consists of lots of things that are created in lots of different ways then it becomes a mess.
Just you wait until I've got the new tutorial done. You'll be surprised. ;-)
Brendan wrote:Not sure why I'd want multiple development machines - they're all connected to the same KVM anyway.
I have the PDCLib sources on my laptop at home, on the netbook I take with me whenever the laptop is too cumbersome or I might not have power (since the batteries on my laptop died long ago), and I have them on my office machine to show off, and because I tend to prefer my own docs over "man something". All those copies are kept in sync via SVN.
Brendan wrote:My internet connection is too slow for that - press F12, wait for 5 minutes while it uploads.
A modern VCS only transmits the changes. That shouldn't take longer than sending an e-mail...
Brendan wrote:Can I "drag and drop" the repo into my FTP server's directory if I felt like letting someone download the entire project?
The entire project, or the entire project including history?

The answer is, of course, "it depends", on the VCS you are using. Personally, I host my stuff at LCube, which provides me with a nice web frontend for installing as many repos and Trac frontends as I want. Trac provides the user with a browsable repo frontend. Note the "Zip Archive" link at the bottom. Automatically updated whenever I "commit" on any of my machines. I could link to my PUT() macro and ask someone for his opinion. I could also link to the solution to bug #47 which touched on two different files.

The list goes on.

Not having a VCS is your decision, of course, and from what I see about your constraints, I might even agree that it "works" for you.

But generally speaking, not using VCS is like walking on crutches. You get where you want to go, and you might even be fast, but you'll have blisters on your hands by the end of the day... 8)

Re: Workflow questions

Posted: Tue Nov 29, 2011 3:33 am
by rdos
Brendan wrote:A decision chart (for "single developer" projects only):
  • Do you want to create a branch?
    • No:
      • Ok then.
    • Yes:
      • Will you want to merge later?
        • No:
          • Then just create a copy of the project's directory.
        • Yes (or maybe):
          • Then you need to spend more time deciding what the future of your project should be instead.
:)
Mostly true, but I once did a branch and join. This was when I updated my scheduler with SMP support. That operation wouldn't have been successful without version control, because I had to do it multiple times, and had to revert multiple times because of bugs on single-core processors that I was unable to solve. Eventually I got it working, and then I needed to do a join so I wouldn't have to manually redo changes that had been made earlier as part of the non-working SMP code. Maybe you will never need to do anything like this, and it is a little extreme, but there are situations when version control is useful on single-developer projects. :mrgreen:

Re: Workflow questions

Posted: Tue Nov 29, 2011 3:42 am
by SDS
Brendan wrote: Show me a makefile rule to handle CSS files.

When a "source" CSS file changes, the makefile has to make sure that the main web site is updated correctly, and also the end-user manual's "tar.gz" (which is just a copy of the HTML pages for OS installation, etc intended for easy off-line viewing).

WARNING: This might be harder than it sounds at first (if you don't know why, do some research into the way web browsers cache CSS files)...
I must admit that I am not familiar with CSS and its intricacies. I do, however, have a substantial makefile system which produces a variety of different formats of documentation using Sphinx.

The elements required for constructing a functioning makefile are:
  • A deterministic dependency structure
  • A consistent set of rules for dealing with files (i.e. the same file is not magically treated differently in different compiles for no reason).
If these are not fulfilled, then I am not sure that the build system is within my comprehension.
Brendan wrote:What you call "harmful requirements" I consider beneficial restrictions. I designed it for me and my project, not as a general tool for other people or other projects.
This is perfectly reasonable, even if it isn't the choice I would make.
Brendan wrote:I honestly feel sorry for you if you don't understand how much time it saves and how much better the results are. If you're happy with build times that are an order of magnitude slower, with web pages that get screwed up when you change something as simple as a document title, with spending half an hour each week messing with makefiles and docbook and doxygen (and getting crappy results because there's no real integration between them) and still having to manually maintain things like a site map; then there's probably little hope for you.

The funny thing is that I was a bit like you once. I started with makefiles and "manually maintained" web pages full of broken links. The first version of the build utility was only intended to make generating parts of the web site easier. It grew. Going back to coping with many different tools that all try to do specific jobs (and inevitably fail to varying degrees) seems like torture to me now.
To be honest, if I was spending half an hour each week tinkering with makefiles, docbook and doxygen, I would have ditched those tools. If I had to manually maintain complicated websites rather than automatically generating them, I would also do the same.

I think it is a little presumptuous of you to assume that because you are unable to get make to work effectively in a reasonably complicated system, then everyone else struggles with the same issues. I haven't touched our make file in a year (except to add some funky python-script preprocessing of certain files, but I would have to add that logic to any build system...)
Brendan wrote:Just how many months does it take you to refactor 100 lines of code? I'm guessing I'd have the job done before you've finished playing with your version control system.
*sigh*. Sadly it is very hard to explain how beneficial VCS is until you no longer need it explained.

I currently maintain three major branches of our computational code. One is the main branch, and works nicely. One is clean and standards compliant (sadly the ensemble of fortran compilers we are required to support is not, so this can't be the main code), and one has a somewhat-complete restructuring of how the main algorithm is implemented.

It is important that I can move code between these branches, and keep track of it easily. It is not something that I would have dared to do if I was not already completely fluent in using the VCS. I prefer the general approach - find tools which work on all sizes of project.
Brendan wrote:
  • Do you want to create a branch?
    • No:
      • Ok then.
    • Yes:
      • Will you want to merge later?
        • No:
          • Then just create a copy of the project's directory.
        • Yes (or maybe):
          • Then you need to spend more time deciding what the future of your project should be instead.
:)
I liked Solar's bastardisation of Star Wars the other day: "Try. There is no Do or Do Not". Effective development relies on trying out different ideas and testing them relative to each other. If you restrict yourself to one rigid, entirely pre-planned path you won't have the flexibility to do this.
Brendan wrote:Can I "drag and drop" the repo into my FTP server's directory if I felt like letting someone download the entire project?
Of course. There is absolutely nothing to prevent you creating an archive of your project in exactly the same way you already do, or sharing as much or as little of your projects history as you would choose.

I note that implicit in this is that you do want to share your code - which undermines your notion of a purely one-developer one-user project...
Brendan wrote:If I install Git, continue pressing F12 and never bother to waste my time doing "git commit -am 'auto-generated build id'" for the nice fuzzy feeling I've never needed; would you be "happy enough" and stop trying to convince me to adopt whatever you think you need for your project?
Please feel free to use the tools you want to - or even to develop them yourself. That's great!

Please, however, don't tell me what I cannot effectively do with more standard tools, or imply that other workflows don't have significant benefits. This just makes you sound uninformed or incompetent - neither of which I really believe.

Re: Workflow questions

Posted: Tue Nov 29, 2011 5:07 am
by Solar
Brendan wrote:Show me a makefile rule to handle CSS files.
Since CSS is interpreted, not compiled, you won't need a makefile rule to handle CSS files.
Brendan wrote:When a "source" CSS file changes, the makefile has to make sure that the main web site is updated correctly...
Ah, you mean a makefile rule to handle the update of the main web site...

You won't like my solution, which would be to keep the whole website under version control and use the VCS to update it...

Code: Select all

WEBSITE_REPO_TRUNK := http://www.brendans-domain.org/brendan_project/trunk/website
WEBSITE_DEST := /var/srv/www/brendan_project/htdocs

.PHONY: website

website:
	svn export $(WEBSITE_REPO_TRUNK) $(WEBSITE_DEST)
	apachectl restart
Since you wouldn't want to update your website every time you did a minor touch to your CSS, I wouldn't make the rule depend on anything, or part of "make all", but rather have it called explicitly by a cronjob at some convenient time during the week.
Brendan wrote:...and also the end-user manual's "tar.gz" (which is just a copy of the HTML pages for OS installation, etc intended for easy off-line viewing).
Note that this is a completely different ballgame from updating the website...

Code: Select all

manual.tar.gz: $(HTMLFILES) $(CSSFILES)
	tar czf $@ $^

Re: Workflow questions

Posted: Tue Nov 29, 2011 5:55 am
by Brendan
Hi,
SDS wrote:
Brendan wrote: Show me a makefile rule to handle CSS files.

When a "source" CSS file changes, the makefile has to make sure that the main web site is updated correctly, and also the end-user manual's "tar.gz" (which is just a copy of the HTML pages for OS installation, etc intended for easy off-line viewing).

WARNING: This might be harder than it sounds at first (if you don't know why, do some research into the way web browsers cache CSS files)...
I must admit that I am not familiar with CSS and its intricacies. I do, however, have a substantial makefile system which produces a variety of different formats of documentation using Sphinx.

The elements required for constructing a functioning makefile are:
  • A deterministic dependency structure
  • A consistent set of rules for dealing with files (i.e. the same file is not magically treated differently in different compiles for no reason).
If these are not fulfilled, then I am not sure that the build system is within my comprehension.
The problem with CSS files is that browsers will cache them, and when you change the CSS there's no way to tell the browser that its cached copy is obsolete. Even reloading the entire page doesn't work (in the browsers I tested).

The normal trick web developers use to work around this is to change the CSS file's name whenever the CSS file's contents change. That way the old CSS file is still in the browser's cache, but the web page/s reference a different CSS file so the browser knows it has to fetch the new CSS file. For an actual web site this isn't so hard - most web sites dynamically generate pages anyway (even when there's no sane reason to do it) and it's easy to use something simple like SSI to avoid the need to change each HTML file. Of course that won't work for "off-line viewing" where the end user downloads an archive of HTML files (and no web server is involved).

When the source CSS file changes; you have to auto-generate a new unique name for it (and hopefully delete the old/obsolete copies of the CSS file), then regenerate every web page that uses that CSS file so they reference the new CSS file and not the old one. The new version of my build utility does this (it uses "base64 encoded seconds since 1970" as the new CSS file's name).

Of course I've got one main CSS file that's used by all web pages, plus 5 smaller CSS files for different types of web pages (one for documents and specifications, one for C, one for assembly and one for scripts) that override some of the things in the main CSS file. A makefile rule that has "*.css" as a dependency would be inefficient (e.g. if you change the CSS file that's only used by scripts, then you'd end up updating all web pages rather than just a few of them).

Basically, what I'm after is a makefile rule that has a "random" file name as a dependency.
SDS wrote:
Brendan wrote:The funny thing is that I was a bit like you once. I started with makefiles and "manually maintained" web pages full of broken links. The first version of the build utility was only intended to make generating parts of the web site easier. It grew. Going back to coping with many different tools that all try to do specific jobs (and inevitably fail to varying degrees) seems like torture to me now.
To be honest, if I was spending half an hour each week tinkering with makefiles, docbook and doxygen, I would have ditched those tools. If I had to manually maintain complicated websites rather than automatically generating them, I would also do the same.

I think it is a little presumptuous of you to assume that because you are unable to get make to work effectively in a reasonably complicated system, then everyone else struggles with the same issues. I haven't touched our make file in a year (except to add some funky python-script preprocessing of certain files, but I would have to add that logic to any build system...)
Like I said - I was using Makefiles once upon a time. Makefiles alone wouldn't be too bad. It's "makefiles and HTML generators and scripting languages and markup languages and all the different configuration files and commands you need to learn to make them happy". I'm a low-level programmer (assembly and C) - I don't want to waste 2 weeks relearning python, just so I can write "funky script preprocessors".
SDS wrote:I currently maintain three major branches of our computational code. One is the main branch, and works nicely. One is clean and standards compliant (sadly the ensemble of fortran compilers we are required to support is not, so this can't be the main code), and one has a somewhat-complete restructuring of how the main algorithm is implemented.
The key word there is "our". As soon as there's more than one developer you're screwed without version control.

From your description, if I was the only developer (and in complete control of the direction the project takes) the first thing I'd be doing is dropping support for the non-standards compliant fortran compilers and wiping out that branch; then I'd be deciding if the "somewhat-complete restructuring of the main algorithm" is better or worse and wiping out one of the remaining 2 branches. Of course after that I'd create a new version of the project in C (and wipe out the last branch). Then I'd finish my breakfast.
SDS wrote:I liked Solars bastardisation of Star Wars the other day. "Try. There is no Do or Do Not". Effective development relies on trying out different ideas and testing them relative to each other. If you restrict yourself to one rigid entirely-pre-planned path you won't have the flexibility to do this.
Let's say I want to play around with a radically different scheduler design. I grab the old "scheduler module" directory and copy it, open the new version's "0index.asm" and change the output binary's file name in the header, then add the new/experimental "scheduler module" to the list of files that get included in the OS's boot image. Now the end-user can choose which scheduler module they want to use when the OS boots (although I wouldn't make an official release like that until the new/experimental scheduler is stable).
SDS wrote:Please, however, don't tell me what I cannot effectively do with more standard tools, or imply that other workflows don't have significant benefits. This just makes you sound uninformed or incompetent - neither of which I really believe.
I haven't claimed that other workflows don't have significant benefits for other people (that I'm aware of). I've only claimed that other workflows don't have significant benefits for me (while trying to justify/explain my choices).


Cheers,

Brendan

Re: Workflow questions

Posted: Tue Nov 29, 2011 6:08 am
by Brendan
Hi,
Solar wrote:Since you wouldn't want to update your website every time you did a minor touch to your CSS, I wouldn't make the rule depend on anything, or part of "make all", but rather have it called explicitly by a cronjob at some convenient time during the week.
If I did a minor change to my CSS, then I'd want the entire web site updated within 2 seconds just so I can see how it looks.
Solar wrote:

Code: Select all

manual.tar.gz: $(HTMLFILES) $(CSSFILES)
	tar czf $@ $^
Probably best to read the part about CSS files in my previous post (or, how does "$(HTMLFILES)" cope with file name changes?)... ;)


Cheers,

Brendan

Re: Workflow questions

Posted: Tue Nov 29, 2011 6:19 am
by Solar
Brendan wrote:If I did a minor change to my CSS, then I'd want the entire web site updated within 2 seconds just so I can see how it looks.
Ah... so you're not speaking about updating the website (i.e., the "production server"), but the test environment.

Actually, my take on the issue (consider that I never did any serious CSS work) would be to embed the CSS in the HTML page as long as I am testing things, and putting the updated CSS into a separate file only after I am done testing. Shouldn't that solve the issue?

I could write up a Makefile that does the renaming and updating as you required, probably with the help of a shell script or two. But that feels too much like a workaround for a workaround for shitty browsers... i.e., "won't fix, report to upstream". :twisted:

Edit: Somewhat similar is the issue of LaTeX, which requires multiple compiler runs to get the final result. The number of runs depends on the LaTeX features you are using. Of course you could write a complex Makefile to resolve the issue, and people have done just that. But the "correct" solution is that modern LaTeX distributions come with a tool called "latexmk", which does this kind of LaTeX-specific stuff internally (i.e. under maintenance of people who know about how LaTeX works). The Makefile only contains one simple rule that calls "latexmk". Solving LaTeX's problems in your Makefile is a workaround (since LaTeX's peculiarities are not really the domain of 'make'), "latexmk" is the solution.
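(For reference, a minimal sketch of that single rule, assuming latexmk is installed:)

Code: Select all

%.pdf: %.tex
	latexmk -pdf $<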

Re: Workflow questions

Posted: Tue Nov 29, 2011 6:43 am
by Solar
This might be a good short-term solution for your CSS woes: http://www.stefanhayden.com/blog/2006/0 ... hing-hack/

Re: Workflow questions

Posted: Tue Nov 29, 2011 6:44 am
by bluemoon
The rename part is easy with a Makefile and Subversion; I do this for my "make tarball":

Code: Select all

SVN_REV := $(shell svnversion -n .)

(snipped)

tarball:
	@echo TAR xxxxx-$(SVN_REV).tgz
	@tar czf xxxxx-$(SVN_REV).tgz $(SOME_FILES)
The SVN revision should give you a unique token string. With a little effort it's possible to rename the .css files and pattern-replace their references within the HTML files.
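A rough sketch of that idea (untested; it uses a content hash instead of the SVN revision so the name only changes when the CSS itself does, and the file names and the @CSS@ placeholder in the HTML templates are made up):

Code: Select all

CSS_HASH := $(shell md5sum main.css | cut -c1-8)

# copy the stylesheet to a name that changes whenever its contents change
www/main-$(CSS_HASH).css: main.css
	cp $< $@

# regenerate each page so it references the newly named stylesheet
www/%.html: %.html.in www/main-$(CSS_HASH).css
	sed 's/@CSS@/main-$(CSS_HASH).css/g' $< > $@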

Re: Workflow questions

Posted: Tue Nov 29, 2011 9:42 am
by Rusky
Brendan wrote:The problem with CSS files is that browsers will cache them, and when you change the CSS there's no way to tell the browser that its cached copy is obsolete. Even reloading the entire page doesn't work (in the browsers I tested).
Which browsers did you test??

*points browser at local html/css page*
*changes body background-color in the css file*
*hits f5*
Oh look, it changed.
*tries again in Chrome, Firefox, Safari, and Internet Explorer on various machines*
Works on all of them.

If you are testing from an actual server, you can use the ?<unique> trick, but it would be better just to configure the server correctly for web dev.
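(For reference, a throwaway sketch of that trick, assuming sed and made-up file names:)

Code: Select all

$ sed "s/main\.css/main.css?v=$(date +%s)/g" index.html.in > www/index.html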
Brendan wrote:... 251 threads and 198 external processes. ...
Now, what would happen if each "*.c" and each "*.asm" where compiled/assembled separately? ... 5070 processes ...
As my benchmarks showed, even in the worst case of recompiling everything, make -j will be about the same as, or barely faster than, your solution. But in any case, most of your assumptions are flawed: having more threads doesn't mean anything if all the threads are shorter.

Your hyper-modular structure has bought you some of the ability to skip unnecessary compilation, but I have a hard time believing you're not still doing a lot more of it.
Brendan wrote:If you're happy with build times that are an order of magnitude slower, ... various false assumptions ... then there's probably little hope for you.
Whose benchmarks are ignoring reality now? An order of magnitude slower? When?
Brendan wrote:If it can't be done with a macro in a header file, then it belongs in a formal specification for a standardised protocol/API/interface. ... meaningless accusation ...
What does a formal specification have to do with separate compilation? Even if you could benefit from a standardized protocol for btrees and linked lists, why must they and their support code be duplicated between binaries through recompilation or reimplementation? (note: I'm not talking about shared libraries)
Brendan wrote:If you know (without any doubt) that the new feature is going to be part of the final product (or the next release of the final product), then why do you need to keep the old code in a separate branch (unless it's not a single-developer project)?
[...]
the first thing I'd be doing is dropping support for the non-standards compliant fortran compilers and wiping out that branch; then I'd be deciding if the "somewhat-complete restructuring of the main algorithm" is better or worse and wiping out one of the remaining 2 branches. Of course after that I'd create a new version of the project in C (and wipe out the last branch). Then I'd finish my breakfast.
Usually, you can't drop old code because that would, or could- you don't usually know, break compatibility with dependent software (yes, yes, not for your one-man hobby kernel, but in the general case).
Usually, you can't drop support for something just because you dislike it (yes, yes, not for your project where you control the toolchain, but in the general case).
Usually you can't just rewrite a project because you don't like it (yes, yes, you have no deadlines, but in the general case).
Usually, you can't know whether a new feature will be part of the final product until you've implemented it, tested it, etc. (yes, yes, you're perfect and clairvoyant and wonderful, but in the general case).
Brendan wrote:Let's say I want to play around with a radically different scheduler design. I grab the old "scheduler module" directory and copy it, open the new version's "0index.asm" and change the output binary's file name in the header, then add the new/experimental "scheduler module" to the list of files that get included in the OS's boot image. Now the end-user can choose which scheduler module they want to use when the OS boots (although I wouldn't make an official release like that until the new/experimental scheduler is stable).
This is very often a good way to do things, and your design helps a lot here, but it's also occasionally impossible, difficult, or not the best way. Version control branches are, however, also helpful in the cases mentioned above.
Brendan wrote:I agree - archives aren't a good substitute for version control, in the same way that text editors are a bad substitute for bitmap graphics editors, or a car is a bad substitute for a boat. I dare you to add SVN and GIT to this wikipedia page.
Version control is actually a very good tool for backups. It just so happens to be a superset of what most of those backup programs do. It's definitely a superset of your backup system. It stores snapshots of the project back in time, but compresses them much more efficiently than a simple collection of archives could ever hope to do. It automates off-site backups the way you've been saying you would like to. It handles merging between snapshots, even if you in your perfection will never need that. Et cetera.
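A sketch of the backup side (assuming git, where "offsite" is whatever remote you've set up):

Code: Select all

$ git gc                                  # repack history; old snapshots are delta-compressed
$ git push offsite --all                  # off-site backup: only the changes travel
$ git bundle create backup.bundle --all   # or a single-file backup for a USB stick or CD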
Brendan wrote:I've only claimed that other workflows don't have significant benefits for me (while trying to justify/explain my choices).
Personally, I couldn't care less what you use for your build system. I'm not trying to convince you to change. What I am doing is giving and supporting "specific suggestions on how best to organize a workflow." You implied, by posting in this thread, that your system would be beneficial in the general case- something I disagree with.

Re: Workflow questions

Posted: Wed Nov 30, 2011 1:08 am
by Brendan
Hi,
Rusky wrote:Which browsers did you test??
Firefox and Internet Explorer.
Rusky wrote:If you are testing from an actual server, you can use the ?<unique> trick, but it would be better just to configure the server correctly for web dev.
I am testing on an actual server. It is configured correctly. When users download the "tar/gzipped off-line version" of the same pages the "?<unique>" trick won't work for them (unless I expect all the end-users to install and configure a web server).
Rusky wrote:
Brendan wrote:... 251 threads and 198 external processes. ...
Now, what would happen if each "*.c" and each "*.asm" where compiled/assembled separately? ... 5070 processes ...
As my benchmarks showed, even in the worst case of recompiling everything, make -j will be about the same as, or barely faster than, your solution. But in any case, most of your assumptions are flawed- having more threads doesn't mean anything if all the threads are shorter.
As your benchmark showed, for recompiling everything, using 4 CPUs instead of one is less than twice as fast (and nowhere near 4 times faster), and even that lame "lack of decent speedup" is only going to happen when those other 3 CPUs are idle.

Let's look at it a different way. What if I had a huge monolithic kernel where there's only one binary to build, and I'm using a 4-CPU system. In this case "make -j6" should be a huge improvement. Why "make -j6" and not "make -j20" or "make -j123456"? This is obvious - you can't do more in parallel when you've run out of CPUs. Ok; so what if I had a huge monolithic kernel, and I'm using a 4-CPU system, but half of those CPUs are busy doing other things? In this case you'd probably want to drop back to "make -j3" because you've only really got 2 CPUs that aren't busy doing other things.

Now let's split that huge monolithic kernel into 50 separate pieces. Would you do "make -j3" for each piece in parallel, and end up trying to do "3*50 = 150" things in parallel with the 2 CPUs that aren't busy? Would you expect that to be faster than doing "make -j1" and only trying to do 50 things in parallel with the 2 CPUs that aren't busy?

What I'm continuing to say is that 50 separate "make -j6" processes (where each of them runs in parallel, and each is responsible for compiling one group of "*.c" files and creating one binary) is pointless and silly.

Now...

Perhaps I've misunderstood what you're trying to say. Perhaps you've somehow shifted the discussion from "compile each *.c separately" to "let make control *everything*" (with one and only one "make -j6"). This is an entirely different discussion - in theory this would make sense. In practice, in my case, it would be horribly inefficient if "make" attempted to control which pieces of the web site are rebuilt when - it'd be hard to do better than "if any *.txt changes, regenerate all web pages".
Rusky wrote:
Brendan wrote:If it can't be done with a macro in a header file, then it belongs in a formal specification for a standardised protocol/API/interface. ... meaningless accusation ...
What does a formal specification have to do with separate compilation? Even if you could benefit from a standardized protocol for btrees and linked lists, why must they and their support code be duplicated between binaries through recompilation or reimplementation? (note: I'm not talking about shared libraries)
For me, anything complex enough (and used often enough) to justify putting in a library should be put in a "service" in a different process/binary instead; where only the standardised protocol/API/interface (which should be published as a formal specification) is needed to use it; and where anyone can replace that service with their own implementation based on the published formal specification, even if the original implementation and all software that uses it are all closed-source binaries made by completely different people.
Rusky wrote:
Brendan wrote:If you know (without any doubt) that the new feature is going to be part of the final product (or the next release of the final product), then why do you need to keep the old code in a separate branch (unless it's not a single-developer project)?
[...]
the first thing I'd be doing is dropping support for the non-standards compliant fortran compilers and wiping out that branch; then I'd be deciding if the "somewhat-complete restructuring of the main algorithm" is better or worse and wiping out one of the remaining 2 branches. Of course after that I'd create a new version of the project in C (and wipe out the last branch). Then I'd finish my breakfast.
Usually, you can't drop old code because that would, or could- you don't usually know, break compatibility with dependent software (yes, yes, not for your one-man hobby kernel, but in the general case).
Usually, you can't drop support for something just because you dislike it (yes, yes, not for your project where you control the toolchain, but in the general case).
Usually you can't just rewrite a project because you don't like it (yes, yes, you have no deadlines, but in the general case).
Usually, you can't know whether a new feature will be part of the final product until you've implemented it, tested it, etc. (yes, yes, you're perfect and clairvoyant and wonderful, but in the general case).
Looks like we finally agree on something - none of that applies in my specific case.
Rusky wrote:
Brendan wrote:Let's say I want to play around with a radically different scheduler design. I grab the old "scheduler module" directory and copy it, open the new version's "0index.asm" and change the output binary's file name in the header, then add the new/experimental "scheduler module" to the list of files that get included in the OS's boot image. Now the end-user can choose which scheduler module they want to use when the OS boots (although I wouldn't make an official release like that until the new/experimental scheduler is stable).
This very often a good way to do things, and your design helps a lot here, but it's also occasionally impossible, difficult, or not the best way. Version control branches are, however, also helpful in the cases mentioned above.
Looks like we're agreeing here too - what I do works for me (and may not work for things that don't matter in my specific case).
Rusky wrote:Personally, I couldn't care less what you use for your build system. I'm not trying to convince you to change. What I am doing is giving and supporting "specific suggestions on how best to organize a workflow." You implied, by posting in this thread, that your system would be beneficial in the general case- something I disagree with.
I never implied (by posting in this thread or otherwise) that my system would be beneficial in the general case. I only posted "Here's what *I* do". The unstated intent was to allow the original poster to see what I do and steal/borrow any ideas that they think might work for them. If you look at my first post in this topic it's easy enough to see 3 things - one key (F12 in my case) that starts the build process (no need to type in a command, click on a script, etc), the idea of spending time designing something to build the project and take care of other maintenance tasks (which needn't necessarily be a purpose-designed utility in C but could be a varied assortment of scripts, makefiles, configuration files and whatever existing tools you like), and the idea of having (real and virtual) test machines ready to go at a moment's notice (network boot - no messing about with USB sticks, CD-ROMs, floppies, etc).

I only wrote the second post in this topic (the one that goes into details about how my build utility works) because the original poster asked me for details on how my build utility works. In hindsight, it may have been better if I'd split the original topic and created a "How the BCOS Build Utility Works" topic instead. It's a pity foresight isn't so easy.


Cheers,

Brendan

Re: Workflow questions

Posted: Wed Nov 30, 2011 8:25 am
by Rusky
Brendan wrote:When users download the "tar/gzipped off-line version" of the same pages the "?<unique>" trick won't work for them.
Are you saying that we're experiencing entirely different behavior from the same browsers? The test I described in my previous post works equally well when you archive the files. From what I'm seeing, you shouldn't need any trick.
Brendan wrote:Perhaps I've misunderstood what you're trying to say. Perhaps you've somehow shifted the discussion from "compile each *.c separately" to "let make control *everything*" (with one and only one "make -j6"). This is an entirely different discussion - in theory this would make sense. In practice, in my case, it would be horribly inefficient if "make" attempted to control which pieces of the web site are rebuilt when - it'd be hard to do better than "if any *.txt changes, regenerate all web pages".
Letting make control everything has been implicit in what I've been saying the whole time - we've been kind of talking past each other here... And considering I don't know how your website is laid out, I can't really say anything on the efficiency of using make's dependency system for it (although as I'm sure you've guessed, I'm skeptical of your claims here :P).
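For what it's worth, here's a minimal sketch of the kind of per-page rule I have in mind ("txt2html" stands in for whatever generator you actually use, and the paths are made up):

Code: Select all

PAGES := $(patsubst doc/%.txt,www/doc/%.html,$(wildcard doc/*.txt))

pages: $(PAGES)

# each page is regenerated only when its own source (or the main stylesheet) changes
www/doc/%.html: doc/%.txt main.css
	txt2html $< > $@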
Brendan wrote:For me, anything complex enough (and used often enough) to justify putting in a library should be put in a "service" in a different process/binary instead; where only the standardised protocol/API/interface (which should be published as a formal specification) is needed to use it; and where anyone can replace that service with their own implementation based on the published formal specification, even if the original implementation and all software that uses it are all closed-source binaries made by completely different people.
What of the example data structures I've been giving? In general I agree with your position on this, but it would be highly impractical to move those into separate services.
Brendan wrote:I never implied (by posting in this thread or otherwise), that my system would be beneficial in the general case.
I guess this quote from your first post wasn't meant to imply that your build tool is a better solution than make, then?
Brendan wrote:Then, I write a large and complicated "build utility" (in C, with pthreads) that handles *everything* extremely quickly.
Because that's what I got out of it.

Re: Workflow questions

Posted: Wed Nov 30, 2011 9:49 pm
by Brendan
Hi,
Rusky wrote:
Brendan wrote:When users download the "tar/gzipped off-line version" of the same pages the "?<unique>" trick won't work for them.
Are you saying that we're experiencing entirely different behavior from the same browsers? The test I described in my previous post works equally well when you archive the files. From what I'm seeing, you shouldn't need any trick.
If it wasn't a problem, I wouldn't have noticed a problem, wouldn't have googled for a solution, and wouldn't have found plenty of people with the same problem and plenty of web sites describing the problem and offering solutions to it. It might be something that only affects some versions of some browsers and not others, and there might be ways of configuring (or mis-configuring) web servers to avoid it (e.g. prevent all caching of all CSS files, regardless of whether they've changed or not). To be honest, it's easier for me to work around the problem in the build utility than to do extensive research (e.g. work out which browsers are affected, gather statistics about which browsers people are using to access my site, etc.).
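For reference, the "?<unique>" trick being discussed boils down to something like this in make/shell terms (a sketch only - the file names and the md5sum-based hash are purely illustrative, and it isn't what my build utility actually does):

Code: Select all

# Sketch of the "?<unique>" cache-busting trick (illustrative only, not
# what my build utility does): stamp each generated page's CSS link with
# a hash of the CSS file, so browsers re-fetch it only when it changes.
CSS_HASH := $(shell md5sum www/main.css | cut -c1-8)

.PHONY: stamp-css
stamp-css: $(wildcard www/doc/*.html)
	sed -i 's|main\.css[^"]*|main.css?$(CSS_HASH)|g' $^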
Rusky wrote:
Brendan wrote:Perhaps I've misunderstood what you're trying to say. Perhaps you've somehow shifted the discussion from "compile each *.c separately" to "let make control *everything*" (with one and only one "make -j6"). This is an entirely different discussion - in theory this would make sense. In practice, in my case, it would be horribly inefficient if "make" attempted to control which pieces of the web site are rebuilt when - it'd be hard to do better than "if any *.txt changes, regenerate all web pages".
Letting make control everything has been implicit in what I've been saying the whole time - we've been kind of talking past each other here... And considering I don't know how your website is laid out, I can't really say anything about the efficiency of using make's dependency system for it (although, as I'm sure you've guessed, I'm skeptical of your claims here :P).
For dependencies, for web page generation alone (not including building binaries, etc):
  • All pages depend on the build utility itself (if the build utility was recompiled, then all web pages need to be regenerated)
  • All pages depend on a corresponding file (e.g. "www/doc/foo.html" would depend on "doc/foo.txt"), excluding the site map
  • All pages (may) depend on any specification (as any page can have some "[s:spec]" markup that creates an HTML link to a specification page or a section within it)
  • All pages (may) depend on any documentation (as any page can have some "[d:spec]" markup that creates an HTML link to a documentation page)
  • If "glossary.txt" exists, then all pages depend on "glossary.txt" (if "glossary.txt" doesn't exist then the "global glossary" feature is disabled)
  • All pages depend on their parent "index.txt" file (if any)
In addition to those:
  • All "association" pages and the site map depend on the main CSS file
  • All assembly and "include" pages depend on the main CSS file and another (smaller) CSS file
  • All C pages depend on the main CSS file and another (smaller) CSS file
  • All script pages depend on the main CSS file and another (smaller) CSS file
  • The site map depends on everything except the glossary
  • Any "index.html" file depends on all of its children (which could be anything)
  • Any HTML file that is part of a group of "*_c.html" and "*_h.html" files depends on all "*.c" and "*.h" files in that group (for example, "www/myUtility/foo_c.html" and "www/myUtility/bar_c.html" would both depend on "myUtility/foo.c" and "myUtility/bar.c"). Note: "0index.c" is parsed to determine what is part of the group and what isn't. For example, "myUtility/unused.c" may not be part of the group and may not be converted to HTML at all
  • Any HTML file that is part of a group of "*.asm" and "*.inc" files depends on all other "*.asm" and "*.inc" files in that group (where "group" is similar to above); plus any external "*.inc" files (e.g. the web page for "pc_bios/someBootloader/foo.asm" may depend on "pc_bios/inc/*.inc" or "inc/*.inc")
Finally, don't forget that C and assembly source code may be auto-generated. For example, "pc_bios/someBootloader/foo.asm" may depend on a script that depends on "myUtility.exe" which depends on "myUtility/*.c".
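To give a rough idea of what even a small fraction of that looks like as make rules (a sketch only - the paths and the "webgen" tool name are made up, not my real layout or utility, and it doesn't attempt the "any page may link to any specification/documentation page" rules at all):

Code: Select all

# Each page depends on its own source, its parent index.txt, the global
# glossary (if present) and the build tool itself. "webgen" is a made-up
# placeholder; the paths are illustrative.
WEBGEN := ./webgen
# empty if there's no global glossary, which disables that dependency
GLOSSARY := $(wildcard glossary.txt)

www/doc/%.html: doc/%.txt doc/index.txt $(GLOSSARY) $(WEBGEN)
	$(WEBGEN) $< -o $@

# The site map depends on (roughly) everything.
www/sitemap.html: $(wildcard doc/*.txt) $(WEBGEN)
	$(WEBGEN) --sitemap doc -o $@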

For all of the above, there are 2 different types of "depends on". A file may depend on another file's header (e.g. title) and nothing else; or a file may depend on the complete contents of another file. This is important because it determines how much work needs to be done. For example, if "inc/foo.inc" didn't change but its parent "index.txt" did, then you need to parse the header in "inc/foo.inc" but nothing else (because "index.txt" needs the title from "inc/foo.inc"); and if "pc_bios/someBootloader/foo.asm" changed (and it depends on "inc/foo.inc") then you'd have to parse the entire "inc/foo.inc" (because "pc_bios/someBootloader/foo.asm" may need to know about any defines, macros, structures, etc. in "inc/foo.inc" for cross-linking purposes). Of course if 20 different things all depend on "inc/foo.inc" you'd parse "inc/foo.inc" no more than once (and only parse the header if you can).
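The closest make equivalent to the "header only" kind of dependency that I know of is the old "stamp file that only changes when the interesting part changes" trick - again a sketch, with illustrative paths:

Code: Select all

# Sketch of a "depends on the header only" dependency (illustrative
# paths): keep the title line of inc/foo.inc in a stamp file that is
# only updated when the title itself changes, so editing the body of
# foo.inc doesn't force the parent index page to be regenerated.
inc/foo.inc.title: inc/foo.inc
	head -n 1 $< > $@.tmp
	cmp -s $@.tmp $@ 2>/dev/null || cp $@.tmp $@
	rm -f $@.tmp

# The index page depends on the stamp, not on foo.inc itself.
# (If the title didn't change the stamp stays old, so the recipe above
# re-runs each time, but nothing that only depends on the stamp does.)
www/inc/index.html: inc/index.txt inc/foo.inc.title

Doing that for the header of every file (and keeping track of which consumers need only the header and which need the full contents) is the sort of bookkeeping the build utility handles internally.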

In addition to all of that, there's the "cleaner" (which removes HTML pages that have become obsolete due to source files being renamed, moved or deleted). The cleaner depends on the absence of files rather than on the presence of any files.
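In make/shell terms that part would be something along these lines (a sketch with illustrative paths, not the actual cleaner):

Code: Select all

# Sketch of a "cleaner" pass (illustrative paths): delete any generated
# page whose corresponding source file no longer exists - i.e. the
# dependency is on the absence of a file, not its presence.
.PHONY: clean-obsolete
clean-obsolete:
	@find www/doc -name '*.html' | while read -r page; do \
	    src="doc/$${page#www/doc/}"; src="$${src%.html}.txt"; \
	    [ -e "$$src" ] || rm -v -- "$$page"; \
	done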

For web page generation, using multiple instances of the same utility or multiple different utilities leads to duplicated work and/or extra work. For example, if one utility is responsible for generating "index.txt" and needs to parse part of "inc/foo.inc" because of that, and another utility is responsible for "pc_bios/someBootloader/foo.asm" and needs to parse "inc/foo.inc" because of that, then you end up parsing "inc/foo.inc" twice instead of once (or 3 times instead of once if "inc/foo.inc" was changed too). To reduce that duplication of work you could do extra work (e.g. create an "inc/foo_inc.info" file that is easier to parse than "inc/foo.inc"), but I'm not sure that you'd gain much with this approach (other than a whole new set of dependencies to worry about).
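In make terms the "pre-digested file" idea would look something like this (a sketch only - "extract_info" is a made-up placeholder, and the consumer pages are just examples):

Code: Select all

# Sketch of the "pre-digested" approach ("extract_info" is a made-up
# placeholder): parse inc/foo.inc once into a summary that's cheap to
# read, and make both consumers depend on the summary instead.
inc/foo_inc.info: inc/foo.inc
	./extract_info $< > $@

# These lines only add the extra prerequisite; the pages themselves are
# built by whatever rule normally builds them.
www/inc/index.html: inc/foo_inc.info
www/pc_bios/someBootloader/foo_asm.html: inc/foo_inc.info

Of course "inc/foo_inc.info" is then one more generated file with its own dependencies to get right - the "whole new set of dependencies" I mentioned.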

Also note that if you're using one utility to handle all web page generation (including all the dependency stuff above), then that utility has almost all of the information needed (everything except output binary modification times) to determine which binaries need to be rebuilt, and all the logic needed to determine the order things need to happen. Basically, if you used a makefile to handle building binaries (and not web pages) and one utility to handle generating web pages (and not binaries), then most of the work "make" would do (but not the work make would delegate) would be duplicated.
Rusky wrote:
Brendan wrote:For me, anything complex enough (and used often enough) to justify putting in a library should be put in a "service" in a different process/binary instead; where only the standardised protocol/API/interface (which should be published as a formal specification) is needed to use it; and where anyone can replace that service with their own implementation based on the published formal specification, even if the original implementation and all software that uses it are all closed-source binaries made by completely different people.
What of the example data structures I've been giving? In general I agree with your position on this, but it would be highly impractical to move those into separate services.
I couldn't find any "example data structures I've been giving". From your "Linux kernel, for example, has generic code like" examples:
  • allocators - not sure which "allocators". For allocating ranges of the physical address space (not RAM), physical memory pages and virtual memory pages, use the kernel API/s. For allocating IO port ranges and IRQs, it's all contained within a "device manager" module (nothing else ever allocates these things). For anything else, it either doesn't exist, is confined to a specific module, or you create your own implementation to suit your specific case.
  • data structures - no idea which ones. The only data structures that are used by more than one binary are those that are covered by IPC protocols (and the specifications and include/header files that define them) or the kernel's APIs (and the specifications and include/header files that define them).
  • atomics - reimplemented where needed (although I doubt using something like a "lock add" instruction counts as "reimplementing an atomic" from an assembly language programmer's point of view).
Rusky wrote:
Brendan wrote:I never implied (by posting in this thread or otherwise), that my system would be beneficial in the general case.
I guess this quote from your first post wasn't meant to imply that your build tool is a better solution than make, then?
Brendan wrote:Then, I write a large and complicated "build utility" (in C, with pthreads) that handles *everything* extremely quickly.
Because that's what I got out of it.
That quote implies that my utility does things extremely quickly. It does not imply that any other alternatives (including but not limited to "make", docbook, doxygen, a bunch of perl scripts, a well-trained monkey, etc.) don't also do things extremely quickly, or quicker than my utility, or slower than my utility.


Cheers,

Brendan