Brendan wrote:
I remember seeing studies about users and web server response times that indicated an 8 second limit between requesting something and seeing it. Basically (for web), if the user can't see it within 8 seconds they get impatient and/or assume something is wrong.
Using that as a rough basis for GUI apps, that gives a maximum of 8 seconds to compile to native, start it running, and then get a window displayed. You could probably split that in halves - up to 4 seconds for compiling, then up to 4 seconds to start the resulting executable and display a window. That's probably not the best way to split the 8 seconds (shouldn't take too long to start a process and display a window). However; I personally think 8 seconds is a large amount of time. It sounds short, but when you're waiting it's not. For an example, after reaching the end of this sentence, sit still (don't touch anything or move a muscle) for an entire 8 seconds.
We can easily agree that 8 seconds is a looong time, and indeed too long.
Brendan wrote:
Now let's consider a reasonably large application, like Bochs. That's a little more than 4 MiB of (compressed) source code and probably takes around 10 minutes to compile on my machine; but my machine is slightly faster than an old 80486. If we assume the user has half the CPU power that I do (e.g. a quad-core i7 rather than a pair of quad-core Xeons) we could assume it might take 20 minutes (or 1200 seconds). I want that cut down to about 4 seconds, or about 600 times faster than GCC. Sounds easy enough!
Now think about how "compiling" normally works. Lots of separate little source files (and all the headers they include) with disk heads bouncing all over the place trying to read them. If that's not bad enough, you're writing individual object files after compiling and then reading them back during linking. It's like it's been designed as a VFS stress test. On top of that there's the overhead of process creation (and termination) to get the compiler running for each source file; then there's a bunch of CPU time spent on parsing plain text into "tokens", doing syntax/sanity checks, building a list of user defined types, getting a symbol table, etc. Mostly; it's all incredibly stupid.
You might be wondering why this is "standard practice". There's 2 very good reasons for this....
The first reason is that it's impossible to find a computer with enough memory; so we have to break "compiling" into several separate steps (pre-process, compile, assembly, then link); and we also have to break the source code into many separate files. That way you only need to fit a small part of the "input", a small part of the "output" and only part of the compiler's code and data in memory at the same time (and you can use disk space to store all the intermediate files). It's the only way we can make a compiler that actually works on a typical computer that only has 20 KiB of RAM. Of course this was back in 1970. We have computers with more RAM now, so maybe this isn't a good reason at all now.
The second reason is that there's no way on earth we'll ever see a CPU that's fast enough to do things like tokenising and maintaining lists of user defined types and symbols while a person is typing. I mean, do you want to press a key and wait 30 seconds? That's not going to be anywhere near practical; so we have to push all that processing onto the compiler, and then compile the code as a batch job over the weekend. It's the only way. At least, it was the only way. Hmm, maybe this isn't a good reason either.
By forgetting about "standard practice" it'd be easy to slash compile times by a factor of 10. Instead of compiling something like Bochs in 1200 seconds that brings it down to 120 seconds. Still far too slow, but a little closer.
As I stated before, I don't mind my compile times being long, as long as the incremental compile time is low (and thereby the debug cycle time). A low incremental compile time can be achieved, for instance, by compiling small portions of the code at a time. This may be troublesome whenever you need to recompile the whole program, but that is something you should only do rarely, e.g. whenever you take a break, etc.
That being said, we can easily agree that compilers and some languages are indeed designed for how the world was back in 1970. However, providing a nice type system and static analysis is something that was not entirely possible back in 1970, and it is one of the main areas to look out for in compilers in the years to come. This is where bad programmers like me get help and guidance; I'm merely a human being, and hence I do write buggy code, so whenever there's a tool that tells me I did something wrong, I welcome it.
Clang is currently working on a set of helpful tools, called sanitizers, which do a lot of static and compiler-enabled runtime analysis of the code. Their goal is to catch 80% of bugs with static analysis and the rest via runtime analysis.
Chandler Carruth did a presentation about this at Going Native last year. If you guys haven't checked it out, consider doing so; the talk can be found at:
http://channel9.msdn.com/Events/GoingNa ... -s-Dragons
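As a small illustration of the runtime side (my own sketch, not from the talk): the sanitizers are enabled with flags such as -fsanitize=address, which instrument the generated code so that, for example, an out-of-bounds heap write is reported at the faulting access.
Code: Select all
// Build with AddressSanitizer enabled, e.g.: clang++ -g -fsanitize=address overflow.cpp
// (the file name is made up; the flag itself is real)
#include <cstdio>

int main() {
    int* buffer = new int[4];
    buffer[4] = 1;   // one-past-the-end write; ASan reports a heap-buffer-overflow here
    std::printf("%d\n", buffer[0]);
    delete[] buffer;
    return 0;
}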
Brendan wrote:
Here's a known, provable fact: assembly language is very fast, and C++ is very slow.
I'm not talking about the performance of the resulting code though - I'm talking about the speed of the tools. Most assemblers are IO bound, even with the overhead of parsing plain text. Most C++ compilers spend 3 days of CPU time just trying to figure out what "a * b" actually means; because let's face it, the compiler can't look at a tiny piece of code in isolation and know exactly what it does with no chance of being surprised/misled, and therefore can't look at a group of tiny pieces of code (an expression?) and know what it does with no chance of being surprised/misled, or...
I get the point that assembly language is indeed very fast to compile, but I actually don't think that's a huuge advantage.
I'd rather have a slowly compiling language which offers me the support, warnings and errors that I need in order to ensure program correctness, than a language which compiles super fast. Because I can compile my slow high-level language to some low-level language which is reasonably fast to compile.
Brendan wrote:
I'm fairly sure you get what I'm saying here - once you fix the "standard practice stupidity" problem, complex languages are the next largest cause of long build times. To get fast compile times you can't use complex languages. By using a very simple language you could easily slash compile times by a factor of 10 (once you've fixed "standard practice"). Instead of compiling something like Bochs in 120 seconds that might bring it down to 12 seconds. Still far too slow, but a little closer.
Sadly, there's "simple" and there's "too simple". To get compile times down that much you're going to have to get rid of a lot of good things, like type checking (in case someone does "unsigned int = x && 1;"), and making sure you don't allow variables to be set to out of range values (in case someone does "char c = EOF;"), and making sure grammar is sane (in case someone forgot a "break;" after a case statement) and all sorts of other things that catch obvious bugs and make programmer's lives easier.
The end user doesn't need all that sanity checking though - it's only really useful to the programmer. What you want is a tool that the programmer uses that does all the sanity checking, and for users to download/install files that have already passed all the sanity checks. Note: This also speeds things up a little for programmers like me, who use the compiler as a sanity checking tool and often compile without executing the result at all.
Clang provides a tool for checking the source code only; it's called ClangCheck. Alternatively, there's a flag to Clang, '-fsyntax-only', which does the checking only.
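For example (file name made up, flags and tools are real), you can run all the checks with no code generation at all:
Code: Select all
// Using the compiler purely as a checking tool; no object file is produced:
//   clang++ -fsyntax-only -Wall -Wextra sanity.cpp
// or via the ClangCheck tool (the trailing '--' means "no compilation database"):
//   clang-check sanity.cpp --
// Both parse the file, run semantic analysis and emit the usual diagnostics,
// but stop before code generation.
#include <cstddef>
#include <string>

std::size_t length_of(const std::string& s) {
    return s.size();
}

int main() {
    return static_cast<int>(length_of("hello"));
}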
Brendan wrote:
Of course if the programmers have to run some sort of sanity checking tool anyway; then that tool could also do "pre-compiling" work. Things like stripping comments; and doing constant folding, dead code elimination, inlining functions that are only used once, common subexpression elimination, etc. It could even insert various hints that the end-user's compiler can use, like how likely it is that an "if" statement's condition will be true or whatever.
That way the end-user's compiler doesn't just get a very simple language that it can compile very quickly, it gets a very simple language where half the work is already done. With half the work already done, something like Bochs wouldn't just be compiled in 12 seconds - it'd be down to 6 seconds.
Most compilers have intermediate representation languages, and from what I'm hearing, it sounds like you're almost designing one. Clang provides a 'portable' low-level target language (namely the LLVM IR), which can at execution time be compiled to native assembly or run on a virtual machine. I'm curious how your approach differs from this.
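For reference, here is roughly what that pipeline looks like with Clang/LLVM (file name made up; the commands are the standard LLVM tools):
Code: Select all
// Typical Clang/LLVM workflow for a file hello_ir.cpp:
//   clang++ -S -emit-llvm hello_ir.cpp -o hello_ir.ll   # C++ -> textual LLVM IR
//   llc hello_ir.ll -o hello_ir.s                       # LLVM IR -> native assembly
//   lli hello_ir.ll                                     # or execute the IR directly (JIT)
#include <cstdio>

int main() {
    std::printf("compiled via LLVM IR\n");
    return 0;
}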
Brendan wrote:
Now let's think about your typical executables - often they're a big blob of code for a single process. For my OS (where the OS decides which computer on the LAN a process should be executed on for load balancing purposes) this isn't ideal. Instead of having one process per application, you want that broken up into several processes per application so that the load can be distributed across several computers. For a GUI application you'd expect at least 2 processes (user interface and "back end"). For something like Bochs I'd be wanting lots of processes (one for each emulated device, and one for the virtual CPU/s and RAM). When something is split into several processes, those processes can be run on different computers, but those executable files can also be compiled on different computers in parallel. That doesn't help when there's only one computer though.
The other thing is that if you're splitting applications into multiple processes, you notice that some can be used by multiple applications. A spell checker is a good example - there's no real reason you can't have one spell checker "service" that's used by many different applications (at that same time). You could have 3 different emulators all sharing the same "virtual PCI device" executables. Mostly anything that does a reasonable amount of processing (enough to justify the IPC costs) can be ripped out and implemented as a service.
You also notice that not all processes are needed when the application starts - you might compile and start a word processor "user interface" process, then (after it started) find out it wants a word processor "back end" process and compile that, then find out it wants a spell checker that's already been compiled because some other application used it before. Something like Bochs might have a "virtual machine manager" that starts first, and while the user is thinking (about whether to drag and drop the "NE2000 ethernet" icon into the virtual machine they're configuring) the OS could be compiling the process that emulates a sound card.
Basically, by ignoring "standard practice", using careful design and doing a few clever tricks; I think I can get just about anything compiled fast enough.
I love the idea of splitting a program into services or tools and sharing these between several programs, and I'm indeed doing so with my projects. Just to quote Doug McIlroy:
"Write programs that do one thing and do it well. Write programs to work together."
As for the lazy compilation of units, I kinda like that, at least if it doesn't result in unpredictable hang times, which I guess can be solved by taking the closure of the callable functions and compiling just-in-time.
Also, for your project, are you considering having a compiler service running?
Brendan wrote:
skeen wrote:
Brendan wrote:
So you see a line of code that says "a = c * b;" and you decide it'd be a little nicer in alphabetical order and change it to "a = b * c;". It's just multiplication, so obviously this is fine. You commit it to whatever versioning system (commit note is "Minor code clean ups") and forget about it.
4 weeks later a release is made. When you get the bug report from frustrated users in 6 weeks time (which mostly all just say "code doesn't work right" with no useful details) and then spend 4 entire days trying to find a problem before you eventually discover the matrix multiplication bug; then you can pat yourself on the back and say "that wasn't complex at all!"..
First of all, I don't see why changing the order of the arguments should break the code. Multiplication has the commutative property, and so should its implementation; hence changing the order of the arguments shouldn't have any effect at all on the output.
Matrix multiplication is not commutative (except for a few special cases).
You're right, and I should stop answering these forum threads at 2am.
However, this just means that because mathematicians haven't been consistent about the commutativity of multiplication, you can't assume that multiplication is commutative whenever you see it, and you'll have to look up what the types are. And I guess that you'll have to do the same thing with 'matrix_mul', assuming you're supporting function overloading.
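To make the point concrete, here's a small sketch (my own example) showing numerically that swapping the operands of a matrix multiplication changes the result:
Code: Select all
#include <cstdio>

// Multiply two 2x2 matrices: r = a * b.
void mul2x2(const int a[2][2], const int b[2][2], int r[2][2]) {
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            r[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
}

int main() {
    const int a[2][2] = { {1, 2}, {3, 4} };
    const int b[2][2] = { {0, 1}, {1, 0} };
    int ab[2][2], ba[2][2];
    mul2x2(a, b, ab);   // A*B = { {2, 1}, {4, 3} }
    mul2x2(b, a, ba);   // B*A = { {3, 4}, {1, 2} }
    std::printf("A*B and B*A differ: %s\n", ab[0][0] != ba[0][0] ? "yes" : "no");
    return 0;
}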
Brendan wrote:
skeen wrote:
But assuming that your implementation of the multiplication operator does not do what it should, then I don't see how this is different from "a = matrix_mul(c,b)" ==> "a = matrix_mul(b,c)"; if the implementation of this function doesn't have the commutative property either, then you have the exact same issue.
You don't have the exact same issue. Anyone that knows matrix multiplication isn't commutative wouldn't have created the bug (as you've made it obvious that it's not normal multiplication), and anyone that doesn't know what a matrix is wouldn't have created the bug (as you've made it obvious that it's not normal multiplication). Basically, the bug is still possible, but the chance of the bug actually happening is now a lot lower (as the number of people that know about matrices but also don't know matrix multiplication isn't commutative is very low).
I guess that I'm one of those people, at least at 2am! However, in C++ what you could do is have the sizes of a matrix given by template arguments; that would 1. help the compiler generate faster code (as the matrix is now static in size), and 2. yield a compilation error at the assignment (because you're assigning matrices of different dimensions), as the sketch below shows.
However, I agree the issue is still there whenever you need dynamically sized matrices; but the issue has more to do with the assumption that multiplication is commutative (or that multiplication is only defined for integers and floats) than with the code itself, at least in my opinion.
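A minimal sketch of the static-dimension idea (names made up): with the dimensions as template parameters, a dimension mismatch becomes a compile-time error instead of a run-time surprise.
Code: Select all
#include <cstddef>

// Matrix whose dimensions are template arguments, so sizes are known statically.
template<std::size_t Rows, std::size_t Cols>
struct Matrix {
    double m[Rows][Cols];
};

// Multiplication is only defined when the inner dimensions agree,
// and the result type carries the correct outer dimensions.
template<std::size_t R, std::size_t N, std::size_t C>
Matrix<R, C> operator*(const Matrix<R, N>& a, const Matrix<N, C>& b) {
    Matrix<R, C> r = {};
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j)
            for (std::size_t k = 0; k < N; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

int main() {
    Matrix<2, 3> a = {};
    Matrix<3, 4> b = {};
    Matrix<2, 4> ab = a * b;    // fine: (2x3) * (3x4) = (2x4)
    // Matrix<2, 4> ba = b * a; // error: no operator* for (3x4) * (2x3)
    // Matrix<4, 2> x  = a * b; // error: cannot assign Matrix<2,4> to Matrix<4,2>
    (void)ab;
    return 0;
}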
Brendan wrote:
skeen wrote:
Brendan wrote:
What you're saying is that templates are good as long as you only use existing templates and nobody ever writes them, maintains them, improves them or debugs them.
My point is that when you're using existing templates, you don't have to maintain, improve or debug them, and that's good!
Why are you a programmer at all? Surely it's easier to use software that someone else wrote for you; so you don't have to maintain, improve or debug anything!
I agree, it would be easier to get someone else to write my software for me! So, uhm, will you work for free? Because I doubt all the software to my liking has already been created.
Also, I like to think of myself as the factory worker a little further up the conveyor, where the product is being assembled, rather than further down the conveyor, where the pieces are being made. That is, if somebody else has already created the piece I need, then I'll use it rather than spend time recreating it. So in short: I put pieces together, that's my thing.
So why am I building an OS when I could just piece something together? Well, because it's a hobby OS.
Brendan wrote:
skeen wrote:
However, I do think you're exaggerating how advanced it is to write templates; for containers it's mostly replacing your payload data type with T. For template meta-programming, then yes, things may seem somewhat advanced, but this is mainly because you leave OOP and procedural programming behind and venture into the world of functional programming (which is truly a challenge if you've never been there before).
If all you do is replace your payload data type with T, then you've probably ended up with a template that works with some data types (e.g. integers) and fails with others (e.g. strings); and while you may think that was easy, some other sucker that tries to use it only to find it doesn't work probably won't think it's so easy.
Why would it break with strings? The idea of a container is mainly that you store the data and are able to retrieve it later; whether I hand you back an int or a string doesn't matter a lot.
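As a concrete sketch of "replace the payload type with T" (my own example): a minimal container that stores T by value works for both int and std::string; the failure Brendan is hinting at typically appears when such a container is built on raw memcpy/realloc, which breaks non-trivially-copyable types like std::string.
Code: Select all
#include <cassert>
#include <string>
#include <vector>

// Minimal generic stack: the payload type is simply the parameter T.
// Storing T by value (here via std::vector) keeps it correct for any
// copyable type; a memcpy/realloc-based version would not be.
template<typename T>
class Stack {
public:
    void push(const T& value) { data_.push_back(value); }
    T pop() {
        T top = data_.back();
        data_.pop_back();
        return top;
    }
    bool empty() const { return data_.empty(); }
private:
    std::vector<T> data_;
};

int main() {
    Stack<int> ints;
    ints.push(42);
    assert(ints.pop() == 42);

    Stack<std::string> strings;
    strings.push("hello");
    assert(strings.pop() == "hello");
    return 0;
}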
Brendan wrote:
skeen wrote:
Brendan wrote:
Ah, I see. Before using "std::list" you send an email to the person that wrote it and let them know which pieces of your data are hot and which pieces are cold; and they optimise the generic implementation for your specific case (and "de-optimise" it for other people's specific cases). That way, you don't have to look at or understand the template, or deal with the complexity of templates, because you're still only using existing templates. Nice!
You do not send an email to the person who wrote the list; instead you hint the implementation with information, i.e. hint at which pieces of the data are cold/hot. One of the things I'm currently working on is using the profiler's output to generate a header file, which is used to hint my generic libraries about optimization possibilities in each given usage context.
C++11 provides a powerful way to do this, namely user-defined literals, which are pretty much strings that can be parsed at compile time and used to create objects. The literals can, for instance, carry profiler information.
Given that C (and I assume C++ too) isn't even allowed to do basic things like reorder a structure's members; it must be very simple to use a simple template and a simple user defined literal and make the compiler split a single structure into 2 completely different structures.
C++ allows you to 'mark' structure members as reorderable, namely via access specifiers. Example:
Code: Select all
struct reorder_members
{
    // 'a' and 'b' are in two separate access-specifier blocks, so the
    // compiler may reorder those blocks relative to each other.
    public: int a;
    public: int b;
};
The compiler is not allowed to reorder members within an access-specifier block, but it is allowed to reorder members belonging to different access-specifier blocks (section 9.2.12 of the ISO standard, if you care).
As for splitting data into two different structures: it's 'just' a matter of doing compile-time reflection on the given data structure and generating compile-time data structures for the hot-section and cold-section data (remember that C++ templates are Turing complete, and hence you can generate code for pretty much anything).
The template for doing this is not super simple, but with the introduction of constexpr functions (C++11) and their relaxation (C++14), it has become a lot easier and can now be done using the procedural paradigm.
Also, if this kind of thing is used frequently, it could be a candidate for the standard library (likely via Study Group 7, Reflection), whereby you wouldn't have to implement it yourself; you'd just use it, or make use of standard data structures which use it.
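To give a flavour of what I mean (a sketch only, with made-up names, and without real reflection): a compile-time hint string, e.g. emitted into a generated header by the profiler, can be parsed with C++14 constexpr functions and a user-defined literal, and then used to select between a hot and a cold layout at compile time.
Code: Select all
#include <cstddef>

// constexpr string equality, written procedurally thanks to the relaxed
// constexpr rules of C++14.
constexpr bool hint_equals(const char* a, const char* b) {
    for (std::size_t i = 0; ; ++i) {
        if (a[i] != b[i]) return false;
        if (a[i] == '\0') return true;
    }
}

// A user-defined literal so the hint reads naturally: "hot"_is_hot.
constexpr bool operator"" _is_hot(const char* s, std::size_t) {
    return hint_equals(s, "hot");
}

// Two layouts for the same logical record.
struct particle_hot  { float x, y, z; };           // touched every frame
struct particle_cold { int id; long long stamp; }; // rarely touched bookkeeping

// Pick a layout from a compile-time boolean.
template<bool IsHot> struct select_layout       { using type = particle_cold; };
template<>           struct select_layout<true> { using type = particle_hot;  };

// In a real system this literal would come from the profiler-generated header.
using particle_layout = select_layout<"hot"_is_hot>::type;

int main() {
    particle_layout p = {};   // resolves to particle_hot here
    (void)p;
    return 0;
}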
Brendan wrote:
skeen wrote:
I truly agree that one should be able to understand the code by inspecting a few lines at a time. Nobody has the ability to comprehend the entire complexity of large software projects, and hence we need to hide complexity to the best of our ability. However, I do not think that "a * b" is any worse than "matrix_mul(a, b)", and as for your example, one could also use a macro to replace "matrix_mul" with something that does addition.
Hiding complexity just makes finding the complexity more complex! You don't want to hide it, you want to shift it to an obvious place so that it's out of the way but not hidden. A single '*' character doesn't tell you anything is different or where the operator was overloaded.
Note: For my language, there's no pre-processor and no macros either. For a tiny piece of code like "a * b" (and without any other information whatsoever) you know it's definitely normal multiplication, and (if it compiles without error) you also know that both 'a' and 'b' use a numerical type (integer or floating point) and neither can be a pointer, boolean, structure, etc.
I agree that having the complexity visible is good, assuming you can deal with the entire complexity of the system; however, in large-scale software you simply cannot deal with the complexity of the entire system, and hence you are forced to hide it. This makes it more complex to find the complexity, true, but why would you even want to deal with it when you have an interface?
If the interface isn't fulfilled, then you file a ticket; easy as that. You don't have to deal with the implementation, just the interface! The implementation is somebody else's problem.
As for the multiplication operator: while it doesn't tell you where the operator was overloaded, neither does a function name, and the lookup (for both) can easily be done via automated tools. To quote yourself, "We have computers with more RAM now", so there's not really a reason not to do so.
And as for the argument that maybe the language is just so complex that you need tools to deal with it; well, you need an assembler to deal with assembly.