Page 2 of 5

Re: Language Design

Posted: Sun Oct 02, 2016 3:50 am
by Kevin
Schol-R-LEA wrote:This brings to mind something else I forgot to mention: given the specific address used in the example, I am guessing that the correct type declarations probably should be more along the lines of:
[...]
This syntax is just something off of the top of my head, but it should give you a general sense of what you could use.
Something like this, yes. The important part for making things type-safe is that your pointer knows the size of the object it points to (and if you want to do pointer arithmetics like in C, the size of the array it is part of), i.e. not just a pointer to the first u8 of the video memory, but a pointer to an array with a specific size.

Re: Language Design

Posted: Sun Oct 02, 2016 4:11 am
by glauxosdever
Hi,

Kevin wrote:
Schol-R-LEA wrote:This brings to mind something else I forgot to mention: given the specific address used in the example, I am guessing that the correct type declarations probably should be more along the lines of:
[...]
This syntax is just something off of the top of my head, but it should give you a general sense of what you could use.
Something like this, yes. The important part for making things type-safe is that your pointer knows the size of the object it points to (and if you want to do pointer arithmetics like in C, the size of the array it is part of), i.e. not just a pointer to the first u8 of the video memory, but a pointer to an array with a specific size.
Exactly. As shown in an earlier post, I am thinking of something like...

Code: Select all

u8 b[80*25] = 0x000B8000;
...which is an array that starts at a specified address.


Regards,
glauxosdever

Re: Language Design

Posted: Sun Oct 02, 2016 11:12 am
by Rusky
glauxosdever wrote:

Code: Select all

u8 b[80*25] = 0x000B8000;
How is this not a type error? 0xb8000 is not a u8[80*25]. I of course understand what you're trying to do, but if this is the syntax you pick how would you disambiguate it from this?

Code: Select all

u8 b[3] = { 1, 2, 3 };
For that matter, why bother sticking with C's nonsense type syntax at all?

Re: Language Design

Posted: Sun Oct 02, 2016 11:50 am
by glauxosdever
Hi,

Rusky wrote:How is this not a type error?
As you showed yourself, array elements are enclosed in curly brackets. But, yes, I may pick a better syntax for this. Some ideas for this, if interested?
Rusky wrote:For that matter, why bother sticking with C's nonsense type syntax at all?
What do you suggest adopting instead?


Regards,
glauxosdever

Re: Language Design

Posted: Sun Oct 02, 2016 12:11 pm
by simeonz
If the compiler can't ensure the divisor is non-zero, it will error right at compilation time.
First, forget about proving all program invariants automatically. It may be possible sometimes, but not in general. And this does not depend on the goal of the proof, but on the complexity of the program. Proving that a pointer is non-null could be as complex as proving that a number is always a perfect number. Why? Because the pointer may turn out to be non-null when the variable holds a perfect number.

There are static verification tools for doing compile time analysis for C and C++ (as well as other languages.) They can analyze the source code and signal errors through inference using only axioms from the language specification. Sometimes using other platform information as well. Some projects even try to specialize for driver development. In other words, there are tools that can find errors that the type system permitted and manifest during run-time. They may act as pre-processor during compilation and provide additional stream of warnings and errors. If a value held in a variable is inferred to be always positive, you may get warned of a redundant branch (indicating a reasoning flaw). In general, most proofs will be unfeasible for the static analyzer and can only be provided by the programmer. Thus proving correctness is not a mandatory language infrastructure.

The fact that it is hard to prove correctness does not mean that it is hard to engineer a program correctly. The programmer is using formulaic steps and conquers complexity methodically. Think, the difference between authoring a math problem and solving it. The only way to enable the programmer to demonstrate the correctness of all its programs it to enable him to provide hints for your compiler using a metalanguage. For complex algorithms, this will be a boon. But sometimes basic program code will require manual proof feeding and will incur significant loss of time for the development team. We are talking about seriously verbose formalism here, whatever the expression medium might otherwise be (could be extension of the source language itself).

Overall, I agree that C(++) lacks some constraints that would be desirable. But it is sometimes a matter of perspective. Pointers have notoriety for being dangerous low-level concept. Yet, C++'s iterators are pointer arithmetic over arbitrary sequences. And those are high-level constructs that you could create in Java. So, while language design choices impact the programming discipline, the general issue is broader than that.

Re: Language Design

Posted: Sun Oct 02, 2016 12:35 pm
by Schol-R-LEA
glauxosdever wrote:Hi,

Rusky wrote:How is this not a type error?
As you showed yourself, array elements are enclosed in curly brackets. But, yes, I may pick a better syntax for this. Some ideas for this, if interested?
Rusky wrote:For that matter, why bother sticking with C's nonsense type syntax at all?
What do you suggest adopting instead?
TBH, the Pascal/Ada syntax, while slightly more verbose, always made more sense to me:

Pascal

Code: Select all

var
    a: char;
    b: char^;
Modula-2

Code: Select all

(* unlike Pascal, Modula-2 is case-sensitive, 
   and all keywords are in ALLCAPS *)
VAR        
    a: CHAR;
    b: POINTER TO CHAR;

Ada

Code: Select all

declare
    type char_access is access character;   
    -- because Ada does not permit type equivalence, you never declare a
    -- a single variable as 'access foo', because that would make it a
    -- unique type; two variables, each declared 'access foo' but in
    -- separate declarations, would not be considered to be of the 
    -- same type.

    a: character;
    b: char_access;
Then again, I also prefer the Algol/Pascal/Ada assignment operator := , too; it makes it clear that you are setting a value, rather than asserting equality. A left arrow or left-bracket/hyphen combination <- would make even more sense to me, though.

Re: Language Design

Posted: Sun Oct 02, 2016 12:49 pm
by glauxosdever
Hi,

simeonz wrote:Proving that a pointer is non-null could be as complex as proving that a number is always a perfect number.
Shouldn't the programmer always check for null pointers? I would be interested to hear some examples where a programmer knows a pointer is non-null and the compiler does not.
simeonz wrote:Because the pointer may turn out to be non-null when the variable holds a perfect number.
I don't see how that relates.

I see however what you mean in general. You say static analysers can't detect everything. It might be the case. But what is wrong with having a language specifically designed for correctness and security?


Regards,
glauxosdever

Re: Language Design

Posted: Sun Oct 02, 2016 1:04 pm
by glauxosdever
Hi,

Schol-R-LEA wrote:Then again, I also prefer the Algol/Pascal/Ada assignment operator := , too; it makes it clear that you are setting a value, rather than asserting equality. A left arrow or left-bracket/hyphen combination <- would make even more sense to me, though.
This is something I have thought before about, and I am likely to think about again at a later time.

However, I am not sure whether to support C-style assignment shortcuts like += or ^=. If so, then writing +:= or +<- does not look like the best idea. That's why for now I'm writing snippets using = for assignment.

On a related note, = is planned not to return a value, therefore not being really an operator.


Regards,
glauxosdever

Re: Language Design

Posted: Sun Oct 02, 2016 2:03 pm
by Rusky
glauxosdever wrote:Shouldn't the programmer always check for null pointers? I would be interested to hear some examples where a programmer knows a pointer is non-null and the compiler does not.
A better idea might just be to provide a non-null pointer type that enables the compiler to enforce that no null value is ever constructed to begin with, and provide optional values some other way- an Option<T> or T? type, for example, depending on how deeply you want to integrate it into the language.

Re: Language Design

Posted: Sun Oct 02, 2016 2:12 pm
by glauxosdever
Hi,

Rusky wrote:
glauxosdever wrote:Shouldn't the programmer always check for null pointers? I would be interested to hear some examples where a programmer knows a pointer is non-null and the compiler does not.
A better idea might just be to provide a non-null pointer type that enables the compiler to enforce that no null value is ever constructed to begin with, and provide optional values some other way- an Option<T> or T? type, for example, depending on how deeply you want to integrate it into the language.
For a pointer that will never be null, the solution is to exclude the value 0 from the pointer's range. Something like...

Code: Select all

u8* range(0x00000000, 0xFFFFFFFF] ptr = something_non_null;
...can be used to define a pointer that will never be null.


Regards,
glauxosdever

Re: Language Design

Posted: Sun Oct 02, 2016 2:16 pm
by Kevin
glauxosdever wrote:
simeonz wrote:Proving that a pointer is non-null could be as complex as proving that a number is always a perfect number.
Shouldn't the programmer always check for null pointers? I would be interested to hear some examples where a programmer knows a pointer is non-null and the compiler does not.
I think such cases will always exist. In C, the compiler can't tell much about pointers stored in random heap objects because it would have to understand the data flow across the whole program in order to do that. But you can do at least a little better with a different language.

Anyway, the good thing is that the compiler doesn't even need to understand every possible program. It is already useful if you make sure that it errors out when it can't prove that the pointer is non-null. Then the programmer has to explicitly add either a null check or an assertion that will help the compiler to understand the situation.

One thing that you will probably want to have In order to keep the need for manual programmer interventions low, is a way to declare pointer types that can't hold null, so that you can simply declare a function parameter as "not null" and keep all the analysis done by the compiler local to a single function.

Re: Language Design

Posted: Sun Oct 02, 2016 2:26 pm
by Rusky
glauxosdever wrote:For a pointer that will never be null, the solution is to exclude the value 0 from the pointer's range. Something like...

Code: Select all

u8* range(0x00000000, 0xFFFFFFFF] ptr = something_non_null;
...can be used to define a pointer that will never be null.
Seems like a lot of extra work for something that should be the default, and non-portable work at that.

Re: Language Design

Posted: Sun Oct 02, 2016 2:34 pm
by simeonz
Shouldn't the programmer always check for null pointers? I would be interested to hear some examples where a programmer knows a pointer is non-null and the compiler does not.
A simple circular list. You will have to show that, because each node points to another node or itself, there can be no null "next" pointer while iterating. You have to cover the mutator methods of the list. This particular problem can be fixed by using special non-null pointer type as Rusky pointed out. But if you have generic list functionality, the basic methods will not use non-null type. The result will be casted in the circular list code, and your compiler will have to demonstrate correctness.

Another example. A dynamic object is created during initialization. How do you know that the initialization has occurred at least once before the call? The programmer could know.

If we are not talking about simply making a more fine-grained type system, but about actual inference, the programmer cannot know that the program will be validated. That is, if I write a program and the static analysis deems it correct, the program is correct. If the static analysis cannot deliver, it is not correct. This means that the code validity depends on the dynamic choices of the compiler.

The way I see it, there are three options:
  • Extend the core language with a fine-grained type system. You have to consider the utility of user-defined types and compare the trade-offs. Anything that the core types do not validate will force run-time checks.
  • Provide correctness rules. But those will have limited scope, and you will have to fall back to run-time checks as well.
  • Use static analysis and leave the correctness of the program open to interpretation. The programmer will not be able to asses the validity of the program independently of the compiler.
There are other options that I favor. Like making interactive environments with structured source code storage. This may offer better user experience. But this is a different method entirely.

Re: Language Design

Posted: Sun Oct 02, 2016 2:59 pm
by glauxosdever
Hi,

Rusky wrote:
glauxosdever wrote:For a pointer that will never be null, the solution is to exclude the value 0 from the pointer's range. Something like...

Code: Select all

u8* range(0x00000000, 0xFFFFFFFF] ptr = something_non_null;
...can be used to define a pointer that will never be null.
Seems like a lot of extra work for something that should be the default, and non-portable work at that.
I tend to agree. I really should invent a way to allow or disallow only specific values. This way the range would be full, minus zero. The equivalent symbolism in mathematics would be {1, 3, 7, 15} for allowed values and ℝ-{0} or ℕ-{0} for disallowed values.

Next thing to substitute the 0xFFFFFFFF above could be defining a PTR_MAX built-in constant which depends on the target architecture. This way, the above example would be rewritten as...

Code: Select all

u8* range(0x00000000, PTR_MAX] ptr = something_non_null;
...to have it portable between architectures.

As for the lot of extra work for something that should be the default (in this case the pointer that is never null), it would be nice to hear how to specify something that is not the default (in this case the pointer that can be null).


Regards,
glauxosdever

Re: Language Design

Posted: Sun Oct 02, 2016 3:09 pm
by glauxosdever
Hi,

simeonz wrote:
Shouldn't the programmer always check for null pointers? I would be interested to hear some examples where a programmer knows a pointer is non-null and the compiler does not.
A simple circular list. You will have to show that, because each node points to another node or itself, there can be no null "next" pointer while iterating.
The programmer should exclude 0 from the pointer's range then. The way to do it portably is yet to be specified unfortunately.
simeonz wrote:Another example. A dynamic object is created during initialization. How do you know that the initialization has occurred at least once before the call? The programmer could know.
Everything that is created by the code is being initialised. Therefore...

Code: Select all

u8 a;
...is an error, or causes a to be implicitly initialised to 0. I'm undecided.
simeonz wrote:If we are not talking about simply making a more fine-grained type system, but about actual inference, the programmer cannot know that the program will be validated. That is, if I write a program and the static analysis deems it correct, the program is correct. If the static analysis cannot deliver, it is not correct. This means that the code validity depends on the dynamic choices of the compiler.

The way I see it, there are three options:
  • Extend the core language with a fine-grained type system. You have to consider the utility of user-defined types and compare the trade-offs. Anything that the core types do not validate will force run-time checks.
  • Provide correctness rules. But those will have limited scope, and you will have to fall back to run-time checks as well.
  • Use static analysis and leave the correctness of the program open to interpretation. The programmer will not be able to asses the validity of the program independently of the compiler.
There are other options that I favor. Like making interactive environments with structured source code storage. This may offer better user experience. But this is a different method entirely.
I will look into that.


Regards,
glauxosdever