Hi,
AlexHully wrote:The language is managed at the compilation time. Not at runtime. So we get the best of both worlds.
That is not managed at all.
Let's look at compilers and give them a rating from 1 to 100 representing how good they are at detecting bugs at compile time; where 1 means "only able to detect syntax errors" and 100 means "able to detect all possible bugs". Assemblers would get an extremely low rating (maybe 1). C compilers are better because there's some type checking and other features (maybe a rating of 15). A compiler for a language like Rust might get a much higher rating (maybe 50).
What you want is a compiler that gets a rating of 100 on this scale. Except for that; it's no different to Rust or C or assembly.
Ironically, I also want a compiler that's able to detect as many bugs as possible during compile time because it's far better to tell the developer there's a bug and refuse to compile before software is released than it is to blow up in the end user's face at run-time after software is released. This is also part of the reason why I think managed environments (e.g. Java, which can blow up in the end user's face when the JVM detects a variety of problems) are a worthless joke - they don't detect problems until after it's too late.
The difference between you and me is that I've spent time researching what is/isn't possible and I know that getting a rating of 100 on that scale is impossible, and you probably haven't even started thinking about how you're going to achieve it. With this in mind; prepare yourself for a crash course in "anti-crash" technology!
Imagine there's a line of code like this: "x = myArray[i];". You have to guarantee that the range of possible values of i is within the bounds of the array (e.g. if it's an array of 100 things, then you have to guarantee that i can only hold a value from 0 to 99). This mostly means the compiler must track the maximum possible ranges of all variables.
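To make that concrete, here's a rough sketch in C of the sort of per-variable interval a compiler might track, and the check it would need to do for that array access (the Range type and index_is_provably_safe() are just names I made up for illustration):
Code: Select all
#include <stdio.h>

/* Hypothetical sketch: the interval a compiler might track for each variable */
typedef struct {
    long min;   /* smallest value the variable could possibly hold here */
    long max;   /* largest value the variable could possibly hold here */
} Range;

/* "x = myArray[i];" with an array of array_length things is only provably
   safe if the tracked range of i fits entirely inside the array's bounds */
static int index_is_provably_safe(Range i, long array_length) {
    return (i.min >= 0) && (i.max < array_length);
}

int main(void) {
    Range i = { 0, 99 };
    printf("provably safe: %d\n", index_is_provably_safe(i, 100));   /* prints 1 */
    i.max = 100;
    printf("provably safe: %d\n", index_is_provably_safe(i, 100));   /* prints 0 */
    return 0;
}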
Now imagine "i = abs(1000/y); x = myArray[i];". Division is annoying. If it's an array of 100 things, then this code is safe if i ranges from 0 to 99; and knowing that y ranges from -99999999 to +99999999 does not help to determine whether i ranges from 0 to 99. You need to know that the possible values of y exclude the range from -10 to +10 (because values in that range cause i to be greater than 99, or cause a division by zero). Because of division, for a compiler to track the maximum possible ranges of all variables it must actually track the ranges of possible negative values and the ranges of possible positive values separately.
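Just to show what "track the negative ranges and positive ranges separately" could look like, here's a rough sketch in C; the SplitRange type, its field names and the hard-coded "abs(1000/y)" are all things I made up for illustration, not how a real compiler would structure it:
Code: Select all
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: a value tracked as two separate sub-ranges, one for
   its possible negative values and one for its possible positive values */
typedef struct {
    int has_neg; long neg_min, neg_max;   /* e.g. [-99999999, -11] */
    int has_pos; long pos_min, pos_max;   /* e.g. [+11, +99999999] */
} SplitRange;

/* Work out the range of "i = abs(1000/y)" (integer division) from the split
   range of y; returns 0 if it can't prove that y is never zero */
static int range_of_abs_1000_div_y(const SplitRange *y, long *lo, long *hi) {
    long best_lo = 1000, best_hi = 0;
    if (!y->has_neg && !y->has_pos) return 0;
    if (y->has_neg) {
        if (y->neg_max >= 0) return 0;               /* y might be zero */
        long a = labs(1000 / y->neg_min);            /* biggest |y| -> smallest i */
        long b = labs(1000 / y->neg_max);            /* smallest |y| -> biggest i */
        if (a < best_lo) best_lo = a;
        if (b > best_hi) best_hi = b;
    }
    if (y->has_pos) {
        if (y->pos_min <= 0) return 0;               /* y might be zero */
        long a = 1000 / y->pos_max;
        long b = 1000 / y->pos_min;
        if (a < best_lo) best_lo = a;
        if (b > best_hi) best_hi = b;
    }
    *lo = best_lo;
    *hi = best_hi;
    return 1;
}

int main(void) {
    /* y is known to exclude -10..+10, so it's the union of two sub-ranges */
    SplitRange y = { 1, -99999999, -11, 1, 11, 99999999 };
    long lo, hi;
    if (range_of_abs_1000_div_y(&y, &lo, &hi)) {
        printf("i is in [%ld, %ld]\n", lo, hi);      /* prints [0, 90] */
    }
    return 0;
}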
Now imagine this:
Code: Select all
if( (myFloat > 100000.0) && (myFloat < 100000000.0) ) {
    m = myFloat + 3.0;
    n = m - myFloat;
    o = n / 3.0;
    p = o - 1;
    i = p * 123.0;
    x = myArray[i];
}
Because of the "if" you know that:
- m will range from approximately 100003.0 to approximately 100000003.0
- n will range from approximately 3.0 to approximately 3.0
- o will range from approximately 1.0 to approximately 1.0
- p will range from approximately 0.0 to approximately 0.0
- i will range from approximately 0.0 to approximately 0.0
However, single precision floating point doesn't have infinite precision and (for large values) "myFloat + 3.0" may equal myFloat; which actually means that n may be zero, o may be zero, p may be -1 and i may be -123. For other large values of myFloat the addition may be rounded the other way (e.g. "myFloat + 3.0" may become "myFloat + 4.0"), so n may be 4.0, o may be approximately 1.333333, p may be approximately 0.333333 and i may be approximately 41.
Basically; "approximately 0.0" isn't good enough. For a compiler to realise this code is not safe it must keep track of the precision of all variables (in addition to the range of possible positive values and the range of possible negative values).
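If you want to see this for yourself, here's a small test program; I'm assuming everything is done in single precision ("float"), and the two large values of myFloat are just ones I picked inside the allowed range where the rounding goes wrong in each direction:
Code: Select all
#include <stdio.h>

/* Show what "approximately 0.0" really means in single precision, for values
   of myFloat that all pass the "(myFloat > 100000.0) && (myFloat < 100000000.0)" test */
static void test(float myFloat) {
    float m = myFloat + 3.0f;
    float n = m - myFloat;
    float o = n / 3.0f;
    float p = o - 1.0f;
    float i = p * 123.0f;
    printf("myFloat=%.1f  n=%.1f  i=%f\n", myFloat, n, i);
}

int main(void) {
    test(123456.0f);      /* n=3.0 and i=0.0, as "expected" */
    test(40000000.0f);    /* "+ 3.0" becomes "+ 4.0": n=4.0, i is about +41 */
    test(80000000.0f);    /* "+ 3.0" vanishes completely: n=0.0, i is -123 */
    return 0;
}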
Now consider this:
Code: Select all
m = myDouble + 1.0;
n = m - myDouble;
o = n - 1;
if( (myDouble > 100000.0) && (myDouble < 100000000.0) ) {
    i = o * 123.0;
    x = myArray[i];
}
This code is safe, because double precision floating point does have enough precision for the compiler to know that when myDouble is in the range from 100000.0 to 100000000.0 the value in o will be zero. However, if myDouble could be anything from -infinity to +infinity at the start of the program, then before the "if" you have to assume that the possible values of o include "NaN" (because "infinity - infinity" is NaN). So how does the compiler know that o will be zero after the "if"?
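As an aside, if you want to check both halves of that (that o really is zero for a value that passes the "if", and that infinity turns o into NaN), a quick test like this shows it; the value 123456.789 is just an arbitrary in-range example:
Code: Select all
#include <stdio.h>
#include <math.h>

int main(void) {
    /* A value that passes the "if": the additions and subtractions are exact
       in double precision for this value, so o really is zero */
    double myDouble = 123456.789;
    double m = myDouble + 1.0;
    double n = m - myDouble;
    double o = n - 1.0;
    printf("o = %g\n", o);                  /* prints 0 */

    /* Before the "if", myDouble could be +infinity; then "m - myDouble" is
       "infinity - infinity", which is NaN, and the NaN taints o too */
    myDouble = INFINITY;
    m = myDouble + 1.0;
    n = m - myDouble;
    o = n - 1.0;
    printf("o = %g\n", o);                  /* prints nan (or -nan) */
    return 0;
}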
There are 2 answers. The first answer (keep track of the relationships between all variables) is extremely impractical. The second answer is that the compiler has to transform the code into something like this:
Code: Select all
if( (myDouble > 100000.0) && (myDouble < 100000000.0) ) {
    m = myDouble + 1.0;
    n = m - myDouble;
    o = n - 1;
    i = o * 123.0;
    x = myArray[i];
} else {
    m = myDouble + 1.0;
    n = m - myDouble;
    o = n - 1;
}
This makes it easy for the compiler to see that the code is safe without having to track the relationships between variables. However, this trick does not work for more complicated cases (e.g. when o is calculated by a completely different function and stored as a global variable 3 days before this code is called).
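For example (hypothetical code, purely to illustrate the "calculated by a completely different function" case), when the compiler is checking use_it() below it has no idea what the range of o is:
Code: Select all
/* Hypothetical illustration: "o" is computed by a completely different
   function (maybe days earlier) and stored in a global variable */
double o;

void prepare(double myDouble) {
    double m = myDouble + 1.0;
    double n = m - myDouble;
    o = n - 1.0;              /* zero for sensible values of myDouble... */
}

void use_it(double myDouble, double myArray[100], double *x) {
    if( (myDouble > 100000.0) && (myDouble < 100000000.0) ) {
        double i = o * 123.0; /* ...but while checking this function the
                                 compiler only knows that o is "some double,
                                 possibly NaN", so it can't prove that... */
        *x = myArray[(int)i]; /* ...this index is within the array's bounds */
    }
}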
Now consider something like this:
Code: Select all
scanf ("%u",&user_input);
c = (abs(user_input) & 0x1F) + 1;
b = 1.1 * c;
do {
    b = b * 1234.5678;
    a = b % 1.0;
} while (a != 0);
i = b/99999.0;
x = myArray[i];
Can you guarantee that i will be in the range 0 to 99? You know that c will be from 1 to 32, which isn't very many values. In theory you can execute the loop in an interpreter (or something like one) to determine what happens for each value of c, and monitor the value of b to detect whether it repeats a previously seen value (i.e. to detect whether the code loops forever). In that way you could determine whether it's safe; and if it is safe you can convert the code into this (there's a sketch of that pre-computation pass after the transformed code below):
Code: Select all
scanf ("%u",&user_input);
c = abs(user_input) & 0x1F;
if(lookup_table[c].haltsFlag) {
    for(;;) { }
}
b = lookup_table[c].b;
a = b % 1.0;
c = c + 1;
i = b/99999.0;
x = myArray[i];
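Here's the pre-computation sketch I mentioned above. I've used fmod() for the "%" on doubles, used a fixed iteration budget instead of remembering every previously seen value of b (for reasons the next two paragraphs explain), and I'm assuming haltsFlag is meant to mark the values of c where the loop never terminates:
Code: Select all
#include <stdio.h>
#include <math.h>

#define ITERATION_BUDGET 1000000    /* the "time limit" - give up after this */

static struct { int haltsFlag; double b; } lookup_table[32];

/* For each of the 32 possible values of c, run the loop at "compile time" and
   record the final value of b; haltsFlag is set when the simulation could not
   show that the loop terminates within the budget */
static void build_lookup_table(void) {
    for (int c = 1; c <= 32; c++) {
        double b = 1.1 * c;
        double a = 1.0;
        long iterations = 0;
        do {
            b = b * 1234.5678;
            a = fmod(b, 1.0);                 /* the "b % 1.0" from the example */
            if (++iterations > ITERATION_BUDGET) break;
        } while (a != 0.0);
        lookup_table[c - 1].haltsFlag = (a != 0.0);
        lookup_table[c - 1].b = b;
    }
}

int main(void) {
    build_lookup_table();
    for (int c = 1; c <= 32; c++) {
        double i = lookup_table[c - 1].b / 99999.0;
        printf("c=%2d  halts=%s  b=%g  i=%g  %s\n",
               c,
               lookup_table[c - 1].haltsFlag ? "no " : "yes",
               lookup_table[c - 1].b,
               i,
               (i >= 0.0 && i <= 99.0) ? "(index is safe)" : "(index is NOT in 0..99)");
    }
    return 0;
}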
The first problem here is storing all previously seen values of b. For example (assuming b is a 64-bit double precision float, and once you realise b can never be negative so you don't need the sign bit) you might need 1 bit per possible value, which is 2^63 bits or a total of about 1048576 TiB of virtual address space; and to achieve that (on 64-bit 80x86) you'd need a minimum of 8192 processes each with 128 TiB of space. Note that this is only space and not RAM (you'd hope that nearly all pages are never modified), so it's technically possible on sane OSs. Alternatively you could keep a list of "previously seen values of b" and search it, but that's not likely to be fast and (for the worst case) would consume 64 times as much virtual address space. Faster alternatives (e.g. binary trees) consume even more RAM for the worst case.
The second problem is how long it's going to take. Most people don't want to wait for 3 centuries while their code compiles; so you have to set some sort of time limit and give up if that time limit is reached.
That brings me to unsolvable problems. What does the compiler do if it can't possibly determine if a piece of code is safe or not?
The only thing it can do is produce "false negatives" - if it can't determine whether a piece of code is safe or not, it has to assume the code is not safe. This means that some 100% safe code will fail to compile. For an extremely simple compiler you get a lot of false negatives and a lot of 100% safe code fails to compile; for an extremely complex compiler, more 100% safe code compiles. Note that I do not consider this a deal breaker - it just means that programmers get annoyed at you because they have to modify 100% safe code just so the compiler is able to realise it is safe. What it does mean is that a very complex compiler is required (for the safety checking alone, ignoring optimisation, etc.) if programmers are going to tolerate it. Once you combine the complexity involved in safety checking with the complexity involved in modern optimisation techniques, we're no longer in the realm of ordinary "very complex" - we're pushing beyond anything that the compiler developers who came before us have achieved, into a whole new category of "complex".
Now, what do we know about "very complex"? We know that there's a strong relationship between code complexity and bugs - that's why we wanted a compiler sophisticated enough to detect bugs in the first place. You must assume that a compiler that's complicated enough to be tolerated by programmers, and also complicated enough to do acceptable optimisation, is itself full of bugs; and therefore you must assume that the compiler is unable to guarantee that the code it compiled is safe.
Let me repeat that last part for you:
- You must assume that the compiler is unable to guarantee that the code it compiled is safe.
Cheers,
Brendan