Hi,
Schol-R-LEA wrote:Brendan wrote:My theory is that ideally it would all be designed more like an online game.
A group of people can join the same Minecraft server and collaborate to build a complex machine (in redstone, etc) in real-time; with no revision control, no compiler, no build system and no time waiting for the complex machine to be processed before it can be used. Why can't a group of programmers join the same server and collaborate to build a complex piece of source code in real-time; with no revision control, no compiler, no build system and no time waiting for the source code to be processed before it can be executed?
I honestly can't tell if you are being sarcastic here. While this idea is intriguing - and frankly, even at its worst it would still be an improvement over some of the
sh(1)-shows I have had the misfortune to be involved in - it sounds pretty much like the exact opposite from all the things you've advocated in the past about having a strong, central hand in control of the development and ensuring things got done. I am not saying they are actually contradictory goals, but I think meshing them might be difficult.
I'm quite serious.
Let's start with something more traditional. For almost all source code (in almost all languages) there's a set of "top level things" (e.g. typedefs, global data, global functions, class definitions, ...). If the language were designed with a specific keyword for each of these things and didn't allow nesting (e.g. no type definitions in the middle of a function, etc), then you could scan through the source code relatively quickly to find the start and end of each top level thing and build a list of them; then determine the signature/type of each top level thing in parallel; then convert each top level thing into IR in parallel (while doing all the sanity checks and partial optimisation).
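To make that concrete, here's a minimal sketch (plain C with POSIX threads, all names hypothetical) of the idea: once the scanner has built the list of top level things, each one can be converted to IR independently and in parallel.

Code:
/* A minimal sketch of the "list of top level things" idea: each entry can be
 * converted to IR independently. Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <stddef.h>

typedef enum { TL_TYPEDEF, TL_GLOBAL_DATA, TL_FUNCTION } ToplevelKind;

typedef struct {
    ToplevelKind kind;
    const char  *name;
    size_t       src_start;   /* byte offset where this thing starts */
    size_t       src_end;     /* byte offset where it ends           */
} ToplevelThing;

/* Stand-in for "sanity check + partially optimise + emit IR". */
static void *convert_to_ir(void *arg)
{
    ToplevelThing *t = arg;
    printf("generating IR for %s (bytes %zu..%zu)\n",
           t->name, t->src_start, t->src_end);
    return NULL;
}

int main(void)
{
    /* In the real system this list would come from a quick scan for the
     * per-kind keywords; here it is hardcoded for illustration. */
    ToplevelThing things[] = {
        { TL_TYPEDEF,     "vec3_t",  0,   120 },
        { TL_GLOBAL_DATA, "origin",  121, 180 },
        { TL_FUNCTION,    "vec_add", 181, 400 },
    };
    enum { N = sizeof(things) / sizeof(things[0]) };
    pthread_t tid[N];

    for (int i = 0; i < N; i++)
        pthread_create(&tid[i], NULL, convert_to_ir, &things[i]);
    for (int i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    return 0;
}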
Now, let's get rid of the "plain text source code" insanity and have a single binary file containing all of the source code, which also keeps track of various pieces of metadata (the list of top level things, which top level things depend on which other top level things, etc), and also caches the "already sanity checked and partially optimised" IR for each top level thing. When a top level thing (e.g. a global function) is modified you'd invalidate its previously cached IR (and, if its signature changed, also invalidate the IR for anything that depended on it), put the invalidated top level thing/s in a queue of top level things that need to be regenerated, and have threads running in the background regenerating whatever was invalidated. This means that when any code is changed it doesn't take long to regenerate the IR for it (even without a huge "compile farm" regenerating the IR for each separate top level thing in parallel, although that wouldn't hurt either), simply because you're not regenerating all the IR for the entire program (and it'd be finer granularity than people currently get from "make" and object files).
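Something like the following (again just a sketch with made-up names, and single-threaded so it stays self-contained) shows the invalidation side of that: dropping the cached IR for a modified top level thing, queueing it, and also queueing its dependents when its signature changed.

Code:
/* Sketch of cached-IR invalidation for top level things (hypothetical names). */
#include <stdbool.h>
#include <stdio.h>

#define MAX_THINGS 64

typedef struct {
    const char *name;
    bool        ir_valid;                 /* cached IR still usable?     */
    int         dep_count;                /* things that depend on this  */
    int         dependents[MAX_THINGS];   /* indices into the same table */
} ToplevelThing;

static ToplevelThing table[MAX_THINGS];
static int queue[MAX_THINGS], queue_len;

static void enqueue(int idx)
{
    if (table[idx].ir_valid) {
        table[idx].ir_valid = false;      /* drop the cached IR */
        queue[queue_len++] = idx;
    }
}

/* Called when the source of one top level thing is modified. */
static void invalidate(int idx, bool signature_changed)
{
    enqueue(idx);
    if (signature_changed)
        for (int i = 0; i < table[idx].dep_count; i++)
            enqueue(table[idx].dependents[i]);
}

/* In the real system this would run on background threads; here it just
 * drains the queue so the example is self-contained. */
static void regenerate_pending(void)
{
    for (int i = 0; i < queue_len; i++) {
        printf("regenerating IR for %s\n", table[queue[i]].name);
        table[queue[i]].ir_valid = true;
    }
    queue_len = 0;
}

int main(void)
{
    table[0] = (ToplevelThing){ "vec_add",  true, 1, { 1 } };
    table[1] = (ToplevelThing){ "renderer", true, 0, { 0 } };

    invalidate(0, true);   /* vec_add's signature changed ...             */
    regenerate_pending();  /* ... so both vec_add and renderer get redone */
    return 0;
}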
Now let's add "client/server" and multi-user to it. When the client is started it displays some kind of diagram showing the top level things and their dependencies (information the server already knows/caches), plus a few buttons to add a new top level thing, delete an existing top level thing, etc. When the user clicks on a top level thing in the diagram, the server gives the client the source for it and keeps track of the fact that the client is viewing that top level thing. If the user starts editing anything, the client asks the server to lock the top level thing (so that only one person can modify the same top level thing at a time); if that succeeds (nobody else is editing it), the client tells the server about the changes the user makes, and the server broadcasts those changes to any other clients that it knows are viewing that top level thing. Let's also assume the following (a rough sketch of the resulting client/server messages follows the list):
- While the source for a top level thing is being modified, the client converts it into tokens and does syntax checking before sending anything to the server, so only "syntactically valid tokens" are sent and the server merely double-checks that they're valid.
- When a client asks the server to lock a top level thing for writing, the server stores a copy of the current version; and (if the user doesn't cancel their changes and does commit them) the server stores the replaced old version of the top level thing somewhere (likely with some other info - who modified it, when, and maybe a commit message), so that people can roll back changes later.
- As part of tokenising, the client would convert names (function names, variable names, type names, etc) into a "name ID" to put in the token. This means that when anything is renamed the server only needs to update a "which name to use for which name ID" structure and inform the other clients, so renaming anything costs almost nothing (no need for programmers to update all the source code where the old name was used, no need for the server to regenerate IR everywhere, etc).
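As promised, here's a rough sketch of the messages and server-side state that description implies (hypothetical names and fields, not a worked-out protocol): one write lock per top level thing, and renames that only touch the "name ID -> name" table.

Code:
/* Sketch of server-side handling for lock and rename messages. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    MSG_LOCK_THING,     /* client wants to start editing a top level thing */
    MSG_EDIT_TOKENS,    /* syntactically valid tokens, already checked     */
    MSG_COMMIT,         /* keep the changes (old version gets archived)    */
    MSG_CANCEL,         /* throw the changes away                          */
    MSG_RENAME          /* change the string behind one name ID            */
} MsgType;

typedef struct {
    MsgType  type;
    uint32_t thing_id;   /* which top level thing                          */
    uint32_t name_id;    /* only used by MSG_RENAME                        */
    char     name[32];   /* new name, only used by MSG_RENAME              */
} Message;

#define MAX_THINGS 16
#define MAX_NAMES  64

static int  lock_owner[MAX_THINGS];    /* -1 = unlocked, else client id    */
static char name_table[MAX_NAMES][32]; /* "which name to use for which ID" */

static void handle_message(int client_id, const Message *m)
{
    switch (m->type) {
    case MSG_LOCK_THING:
        /* Only one client may edit a given top level thing at a time. */
        if (lock_owner[m->thing_id] == -1 || lock_owner[m->thing_id] == client_id) {
            lock_owner[m->thing_id] = client_id;
            printf("client %d: lock on thing %u granted\n", client_id, m->thing_id);
        } else {
            printf("client %d: lock on thing %u denied\n", client_id, m->thing_id);
        }
        break;
    case MSG_RENAME:
        /* Renaming never touches the stored tokens (they only hold name IDs);
         * the server just updates the table and tells the other clients.    */
        strncpy(name_table[m->name_id], m->name, sizeof(name_table[0]) - 1);
        printf("broadcast: name %u is now '%s'\n", m->name_id, name_table[m->name_id]);
        break;
    default:
        /* MSG_EDIT_TOKENS/MSG_COMMIT/MSG_CANCEL omitted from this sketch. */
        break;
    }
}

int main(void)
{
    for (int i = 0; i < MAX_THINGS; i++)
        lock_owner[i] = -1;

    handle_message(1, &(Message){ .type = MSG_LOCK_THING, .thing_id = 3 });
    handle_message(2, &(Message){ .type = MSG_LOCK_THING, .thing_id = 3 });
    handle_message(1, &(Message){ .type = MSG_RENAME, .name_id = 7,
                                  .name = "vector_add" });
    return 0;
}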
Of course with the server only dealing with "already tokenised and checked for syntax" source code, the previous "doesn't take long to regenerate the IR because you're not regenerating all the IR for the entire program" becomes even quicker. Specifically, for most cases (e.g. only a single function was modified) it'd be so fast that it can appear to be "almost instant".
So... now we've got an environment where multiple programmers can collaborate, and which auto-generates sanity checked and partially optimised IR in an "often almost instant" way. The next step is to add an interpreter to execute that sanity checked and partially optimised IR, so that the code can be executed "almost instantly" too. Of course you'd also have a proper compiler running in the background doing the full "whole program optimisation" thing to generate an actual executable; but programmers wouldn't need to wait for that to finish before they can test the code in the interpreter (with all the additional debugging support that implies).
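To illustrate the interpreter half of that, here's a toy stack-machine "IR" and the loop that executes it (nothing like the real IR, just enough to show that running the cached IR directly is cheap and doesn't need to wait for the optimising compiler).

Code:
/* Toy stack-machine IR and interpreter loop (illustration only). */
#include <stdio.h>

typedef enum { IR_PUSH, IR_ADD, IR_MUL, IR_PRINT, IR_HALT } IrOp;

typedef struct {
    IrOp op;
    int  arg;      /* only used by IR_PUSH */
} IrInstr;

static void interpret(const IrInstr *code)
{
    int stack[64], sp = 0;
    for (const IrInstr *ip = code; ; ip++) {
        switch (ip->op) {
        case IR_PUSH:  stack[sp++] = ip->arg;             break;
        case IR_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
        case IR_MUL:   sp--; stack[sp - 1] *= stack[sp];  break;
        case IR_PRINT: printf("%d\n", stack[sp - 1]);     break;
        case IR_HALT:  return;
        }
    }
}

int main(void)
{
    /* IR for "print (2 + 3) * 4" - in the real system this would be the
     * already sanity checked, partially optimised IR cached by the server. */
    const IrInstr program[] = {
        { IR_PUSH, 2 }, { IR_PUSH, 3 }, { IR_ADD, 0 },
        { IR_PUSH, 4 }, { IR_MUL, 0 }, { IR_PRINT, 0 }, { IR_HALT, 0 }
    };
    interpret(program);
    return 0;
}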
However, often (especially with multiple people simultaneously editing source code) you'd have to assume that the program isn't in a state where it can be executed, and testing a whole program is fairly inconvenient anyway (especially for things like services or libraries that don't have a nice user interface or anything). The obvious solution is to add a new kind of "top level thing" for unit tests; so that programmers can test a small piece of the program (execute individual unit test/s only, using the interpreter/debugger) even when other unrelated pieces aren't in a consistent state.
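A small sketch of how that could work (hypothetical names): the test runner only executes a unit test when everything that test depends on currently has consistent IR, so unrelated broken pieces don't block testing.

Code:
/* Sketch of "unit tests as another kind of top level thing". */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    bool      (*run)(void);        /* would be interpreted IR in practice */
    bool       *deps[4];           /* "is this dependency consistent?"    */
} UnitTest;

static bool vec_add_ok  = true;    /* vec_add currently has valid IR      */
static bool renderer_ok = false;   /* renderer is mid-edit, inconsistent  */

static bool test_vec_add(void)  { return 2 + 3 == 5; }
static bool test_renderer(void) { return true; }

int main(void)
{
    UnitTest tests[] = {
        { "test_vec_add",  test_vec_add,  { &vec_add_ok,  NULL } },
        { "test_renderer", test_renderer, { &renderer_ok, NULL } },
    };

    for (int i = 0; i < 2; i++) {
        bool ready = true;
        for (int d = 0; d < 4 && tests[i].deps[d]; d++)
            ready = ready && *tests[i].deps[d];
        if (!ready)
            printf("%s: skipped (dependencies not consistent)\n", tests[i].name);
        else
            printf("%s: %s\n", tests[i].name, tests[i].run() ? "pass" : "FAIL");
    }
    return 0;
}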
Note that I've mostly been assuming a simple language like C, where only one layer of "top level things" is enough; but there's no real reason why you couldn't have 2 or more layers (e.g. top level packages, second level classes, third level methods). Also, I should admit that I mostly focus on "an application is a collection of processes working together" (where the source code project for each process is relatively small and has a well defined purpose) and don't care much about "an application is a single massive blob of millions of lines of code" (which would be harder/slower to deal with).
Of course it wouldn't be easy to design or implement a system like this, and there would also be multiple "unexpected problems" and/or additional tricks or features that my simple overview doesn't cover, and you'd probably want to augment it with other ideas (starting with some kind of chat system so users can communicate with each other); but (with enough work) I think it's entirely possible in practice.
In other words (as far as I can tell), you can have a group of programmers join the same server and collaborate to build a complex piece of source code in real-time; with no revision control, no compiler, no build system and no time waiting for the source code to be processed before it can be executed.
The only thing that's really stopping this is that people are obsessed with recycling horrible ideas from half a century ago (e.g. "source code as a collection of plain text files").
Schol-R-LEA wrote:I am also not sure if most corporate IT departments would sign off on this approach, despite the fact that I have (as I mentioned) seen far worse being made Official Company Policy in the past. The management of companies, both big and small, is mostly ego-driven; managers rarely like the idea of a system that doesn't let them put their fingerprints all over everything, even (or perhaps especially) when they have no idea of what is going on. The very fact that the myth of 'Waterfall development' - which was never a workable model (it was introduced as a straw-man argument), and hence was never actually used by the developers of any project, anywhere - persists in management circles even now, is proof that IT management and planning are rarely hindered by reality.
I'm not suggesting that the entire world would immediately abandon everything they're using as soon as the first ever attempt at this kind of tool is released.
Schol-R-LEA wrote:This brings me to one problem I see with this, at least as you are presenting it: I have seen the problems that arise from 'cloud' (ugh) based development systems, the ShillFarce system in particular. While the fact that none of those I've seen in professional use have done it well doesn't mean it is a bad idea, but I am wary of it.
I am particularly concerned about the idea of a centralized, rather than distributed, approach to hosting such a system; the usual result of this is that the code and data are in effect held for ransom by the company doing the hosting, and once committed to a host or system, you rarely have any viable exit strategy except to start over from scratch. It also requires you to trust that they are competent at keeping everything up, load-balanced, secured, and backed up; while some are pretty good about this, a lot more do a lackluster or worse job of it. To be fair, this is a sore spot for me personally, as this was exactly the scenario I saw at my last position, and dealing with this day after day was part of what led to my meltdown towards the end.
There's no need for "cloud" and no need for a third party to host the server. Anyone who feels like it could run their own server on their own private LAN, and anyone who feels like it (and has a publicly accessible IP address) can run a server that isn't restricted to a private LAN. Of course there's also no reason a third party couldn't provide "server as a service" either. It could be the same as (e.g.) Git, where anyone can run a server but companies like GitLab and GitHub also exist.
Schol-R-LEA wrote:But I don't see any reason why the system would need to be centralized to that degree. Indeed, I suspect you goals would be better served by a peer-to-peer approach, so that a) a given developer can continue working even if entirely offline, or unable to reach a server and/or other devs for some other reason (which happens a lot more often than you would expect, even in the best systems), withe the system automatically updating everywhere (as a 'branch') once the developer is connected again; b) it would shift the system from a single primary point of failure to many smaller ones, where no failure of a single one would be fatal; and c) it would distribute the loading automatically, so that while each node would be sending and receiving broadcast updates, none of them would act as a bottleneck for the others.
You're right - it could be more "peer-to-peer" (and at a minimum I'd want to support things like redundancy); it's just easier to describe (and probably easier to design and implement) as "client/server" because of the synchronisation involved (complex peer-to-peer systems tend to get messy due to the consensus problem).
Schol-R-LEA wrote:Finally, I am assuming that all of this applies to the 'Development server', and that while all of this is going on there is an equally automatic process - presumably managed by someone, but requiring little intervention from most of the individual devs - for pushing the program through a series of unit, integration, and all-up tests, and some way for the project leads to decide which stages of the project to pass to UX testing, acceptance ('alpha') and release ('beta') testing. Surely you don't mean to have the developers working directly on the user release version... do you? Because that's some pretty mid-1990s web dev, cowboy Bovine Excrement right there. No sane developers - not even web developers - work directly in production these days, and trust me, there are good reasons for this.
Yes, just "development server" - you'd still do things like fork the project for each release, and have UX testing and alpha/beta/release candidate versions, etc. For automated testing, I'd want that integrated as much as possible, ideally so that test results can be fed straight back to the programmer/user soon after they commit changes.
Schol-R-LEA wrote:Brendan wrote:with no revision control, no compiler, no build system and no time waiting for the source code to be processed before it can be executed?
I suspect that you don't mean that the tools wouldn't be there, but that the tools would be mostly invisible, running in an online mode, operating automatically without direct action from the programmers. I can certainly get behind this generally, as it more or less parallels my own ideas. As I would see it, the 'compiler' would operate in a compile-and-go parti-operation mode, recompiling the code as it is being edited and maintaining a dynamic set of possible outcomes; since this is in fact how some Lisp 'interpreters' actually work, it is a natural fit to my ideas.
The underlying functionality (at least most of it, possibly combined with some that can't be found in existing tools) would still be there; but the individual tools themselves wouldn't exist in a recognisable way. It'd be like grabbing GCC, make, Git, an IDE (Eclipse?) and some kind of server (Apache), throwing them all into a wood chipper, and then gluing the chips of wood together.
Schol-R-LEA wrote:Similarly, since a part of my goal is to have something similar to Xanadu, I mean to have 'revision control' as an inherent part of the storage system; the storage system, which records changes to a document as a series of branching paths and never deletes the earlier versions, would serve as a rolling log of changes (smaller series of changes would be journalled to some degree for efficiency's sake, but the sequence would always be present), with the 'version control' only accessed directly if you needed to roll something back or bump a milestone - the VCS would mostly just be an application for browsing the document's history, and annotating different 'branches' as 'main branch' and so only. The biggest complication would be in integrating parallel work (which is going to happen, is going to be necessary, even for the kind of collaboration you are talking about; if nothing else, you need to have 'maintenance' branches, as you will still have people working with older versions which need to get bugs fixed and such-like. The main role of the lead developer would be to decide which changes get merged into the final 'release'.
For what I described you'd still be able to do the equivalent of creating a "diff" between different versions of one branch and then trying to apply that diff to the current version of a different branch (with the same problems people get now when "try" doesn't mean "succeed").
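For example (purely hypothetical structures), a "diff" at top level thing granularity might just record which version of each thing it was made against, and refuse to apply when the target branch has diverged - the same situation as a failed patch today.

Code:
/* Sketch of applying a diff at top-level-thing granularity. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned thing_id;
    unsigned old_version;   /* version the change was made against */
    unsigned new_version;   /* version it produces                 */
} ThingDiff;

#define MAX_THINGS 16
static unsigned branch_version[MAX_THINGS];   /* current version per thing */

static bool apply_diff(const ThingDiff *d)
{
    if (branch_version[d->thing_id] != d->old_version) {
        printf("conflict on thing %u (branch has version %u, diff expects %u)\n",
               d->thing_id, branch_version[d->thing_id], d->old_version);
        return false;       /* same situation as a failed patch today */
    }
    branch_version[d->thing_id] = d->new_version;
    printf("thing %u updated to version %u\n", d->thing_id, d->new_version);
    return true;
}

int main(void)
{
    branch_version[3] = 7;   /* this branch already diverged on thing 3 */
    branch_version[4] = 2;

    ThingDiff changes[] = { { 3, 5, 6 }, { 4, 2, 3 } };
    for (int i = 0; i < 2; i++)
        apply_diff(&changes[i]);
    return 0;
}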
Cheers,
Brendan