Unions, program proofs and the Halting Problem

User avatar
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Rusky wrote:
Brendan wrote:I need to support assembly language or it'd be useless; and there won't be any standard library (or any standard library lock implementation). I'd also want to ensure that it's possible to use structure/union members in block-free code.
Rust supports inline assembly, and the lock part of its standard library should be trivially portable to or reimplemented in a freestanding environment. There's no reason you couldn't use structures/unions in block-free code either. A lock is only required if the variable is mutable and shared between threads (actually this is enforced by the type signatures of the library functions for threads).
Rust supports inline assembly [and therefore there's no guarantee that the "type" was set correctly when any member of the union was set], and the lock part of its standard library should be trivially portable to or reimplemented in a freestanding environment [which is entirely irrelevant for me given that there are no libraries at all and therefore no difference between hosted and freestanding]. There's no reason you couldn't use structures/unions in block-free code either [except that you'd have to atomically write both the union member and the "type" at the same time, which is simply not possible in most cases]. A lock is only required if the variable is mutable and shared between threads (actually this is enforced by the type signatures of the library functions for threads [which means it's not enforced at all because anyone can write their own alternatives to the library functions instead]).
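Brendan's atomicity objection can be made concrete. A lock-free update of a tag plus a union member only works when both fit into a single atomic word; the sketch below (the names store_tagged/load_tagged are illustrative, not from the thread) packs an 8-bit tag and a 32-bit payload into one AtomicU64 so the pair always changes together. With members wider than the largest atomic type, this packing is impossible, which is exactly his point.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Pack the tag into the high bits and the payload into the low bits, so
// that one atomic store updates both at once.
fn store_tagged(cell: &AtomicU64, tag: u8, value: u32) {
    let word = ((tag as u64) << 32) | value as u64;
    cell.store(word, Ordering::SeqCst);
}

// Read the tag and payload back out of the single word.
fn load_tagged(cell: &AtomicU64) -> (u8, u32) {
    let word = cell.load(Ordering::SeqCst);
    ((word >> 32) as u8, word as u32)
}
```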
Rusky wrote:
Brendan wrote:Given the choice between failing to co-locate structure members when it is possible, and making programmers learn and use an extra union thing plus an explicit tag; then the former is the "least worst" alternative.

For co-locating structure members; the halting problem is a broken analogy. It assumes that I must have a correct decision (no "false positives" and no "false negatives") and this is a very wrong assumption. I do need "no false positives", because a false positive leads to co-locating structure members that shouldn't be co-located (a compiler that automatically inserts bugs into perfect source code). However; I do not need "no false negatives", because a false negative only means a missed opportunity for optimisation. If I only co-locate structure members when it's trivial to prove that it's safe, then that's good enough (but obviously "better" would be better).

The only real question here is; how much better than "only when trivial" is possible?
The main uses of unions are generally not the trivial cases, and I suspect they are not the provable cases either. The main uses of unions always involve tags anyway (or they're bitcasting, which you can do in other ways). Thus, trying to auto-detect potentially overlapping members in structs is probably not worth it.
So you're suggesting that, given that I will not bother with unions at all, I should also not bother trying to co-locate structure members?
Rusky wrote:The alternative, enforcing safe tagged unions, is not really anything new (although I would ask, why is giving programmers a new tool a bad thing per se?). Not only that, it enables several useful patterns that are otherwise either unenforceable or overly verbose. In Rust, it looks like this:

Code: Select all

enum Shape {
    Circle { center: Point, radius: f64 },
    Rectangle { top_left: Point, bottom_right: Point }
}
It is accessed like this:

Code: Select all

match shape {
    Circle { radius: radius, .. } => f64::consts::PI * square(radius),
    Rectangle { top_left: top_left, bottom_right: bottom_right } => {
        (bottom_right.x - top_left.x) * (top_left.y - bottom_right.y)
    }
}
As far as a programmer should care, that's functionally identical to a structure with a "type" field and a "switch(something->type) {". Why do you think programmers using a high level language should need to do explicit micro-optimisations by hand? A programmer shouldn't have to think about all of the potential use cases of their data structure before being forced to decide whether to use a structure or union.

What happens in 3 years' time when the programmer who decided to use your unions realises that they need to add a "type = both_circle_and_rectangle" to the enum? Do they have to find all of the existing code that does anything with the old union and rewrite all of it so it works for structures instead before they can start thinking about adding any new code for the "both_circle_and_rectangle" case?
Rusky wrote:One example of a new feature (that's been mentioned before in this thread) is null safety using Option types. In Rust, pointers cannot be null. Thus, if you need to store a nullable pointer, you wrap the usual pointer type in a union:
If it allows assembly then pointers can be null. Of course null is only one of a very large number of invalid values that a pointer could contain; so caring about null and not caring about all of the other invalid values is relatively short-sighted.
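For context, the Option pattern Rusky refers to looks like this in post-1.0 Rust (the function first_byte is an illustrative name, not code from the thread). None reuses the null bit pattern, so the wrapper costs nothing at runtime while still forcing safe code to handle the "no value" case:

```rust
use std::mem::size_of;

// Option<&T> compiles to an ordinary pointer: the compiler uses the null
// representation for None, so there is no extra tag word.
fn first_byte(data: &[u8]) -> Option<&u8> {
    data.first()
}
```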


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom
Contact:

Re: The Mill: a new low-power, high-performance CPU design

Post by Owen »

Brendan wrote:Hi,
Rusky wrote:
Brendan wrote:I need to support assembly language or it'd be useless; and there won't be any standard library (or any standard library lock implementation). I'd also want to ensure that it's possible to use structure/union members in block-free code.
Rust supports inline assembly, and the lock part of its standard library should be trivially portable to or reimplemented in a freestanding environment. There's no reason you couldn't use structures/unions in block-free code either. A lock is only required if the variable is mutable and shared between threads (actually this is enforced by the type signatures of the library functions for threads).
Rust supports inline assembly [and therefore there's no guarantee that the "type" was set correctly when any member of the union was set], and the lock part of its standard library should be trivially portable to or reimplemented in a freestanding environment [which is entirely irrelevant for me given that there are no libraries at all and therefore no difference between hosted and freestanding]. There's no reason you couldn't use structures/unions in block-free code either [except that you'd have to atomically write both the union member and the "type" at the same time, which is simply not possible in most cases]. A lock is only required if the variable is mutable and shared between threads (actually this is enforced by the type signatures of the library functions for threads [which means it's not enforced at all because anyone can write their own alternatives to the library functions instead]).
Rust supports inline assembly in unsafe blocks. You know, the bits which get access to raw pointers and all. The bits which should immediately jump out at you because they say unsafe right next to them!

The bits which let you implement useful, safe-to-use libraries. The bits which are included in the language so you can build new useful components the standard library didn't anticipate (or because you are doing freestanding work and don't have access to the standard library).

Rust has "unsafe" because there are always going to be edge cases where it is tricky or impossible for the compiler to check. Instead, we have a bunch of humans check, and then as long as those humans were correct the rest of our code is safe. Or, in other words, it allows us to contain our dangerous bits of code in self-contained, easy to find units.

Yes, you can break things from the unsafe blocks. "Doctor, it hurts when I do this" "So don't do it?"
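Owen's point, as a minimal sketch (get_first is a hypothetical function, not from the thread): the unsafe operation lives inside a safe function whose check establishes the precondition, so callers never need unsafe themselves.

```rust
// A safe API over an unsafe operation: the bounds check above the unsafe
// block is exactly the human-verified reasoning Owen describes.
fn get_first(slice: &[u8]) -> Option<u8> {
    if slice.is_empty() {
        None
    } else {
        // SAFETY: the slice is non-empty, so index 0 is in bounds.
        Some(unsafe { *slice.get_unchecked(0) })
    }
}
```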
Brendan wrote:
Rusky wrote:
Brendan wrote:Given the choice between failing to co-locate structure members when it is possible, and making programmers learn and use an extra union thing plus an explicit tag; then the former is the "least worst" alternative.

For co-locating structure members; the halting problem is a broken analogy. It assumes that I must have a correct decision (no "false positives" and no "false negatives") and this is a very wrong assumption. I do need "no false positives", because a false positive leads to co-locating structure members that shouldn't be co-located (a compiler that automatically inserts bugs into perfect source code). However; I do not need "no false negatives", because a false negative only means a missed opportunity for optimisation. If I only co-locate structure members when it's trivial to prove that it's safe, then that's good enough (but obviously "better" would be better).

The only real question here is; how much better than "only when trivial" is possible?
The main uses of unions are generally not the trivial cases, and I suspect they are not the provable cases either. The main uses of unions always involve tags anyway (or they're bitcasting, which you can do in other ways). Thus, trying to auto-detect potentially overlapping members in structs is probably not worth it.
So you're suggesting that, given that I will not bother with unions at all, I should also not bother trying to co-locate structure members?
Rusky wrote:The alternative, enforcing safe tagged unions, is not really anything new (although I would ask, why is giving programmers a new tool a bad thing per se?). Not only that, it enables several useful patterns that are otherwise either unenforceable or overly verbose. In Rust, it looks like this:

Code: Select all

enum Shape {
    Circle { center: Point, radius: f64 },
    Rectangle { top_left: Point, bottom_right: Point }
}
It is accessed like this:

Code: Select all

match shape {
    Circle { radius: radius, .. } => f64::consts::PI * square(radius),
    Rectangle { top_left: top_left, bottom_right: bottom_right } => {
        (bottom_right.x - top_left.x) * (top_left.y - bottom_right.y)
    }
}
As far as a programmer should care, that's functionally identical to a structure with a "type" field and a "switch(something->type) {". Why do you think programmers using a high level language should need to do explicit micro-optimisations by hand? A programmer shouldn't have to think about all of the potential use cases of their data structure before being forced to decide whether to use a structure or union.

What happens in 3 years' time when the programmer who decided to use your unions realises that they need to add a "type = both_circle_and_rectangle" to the enum? Do they have to find all of the existing code that does anything with the old union and rewrite all of it so it works for structures instead before they can start thinking about adding any new code for the "both_circle_and_rectangle" case?
You need to rewrite it all anyway, because you just added a new value of type. Or was there no "type" member, and your code was doing random things to an uninitialized "circle" or "rectangle" member?
Brendan wrote:
Rusky wrote:One example of a new feature (that's been mentioned before in this thread) is null safety using Option types. In Rust, pointers cannot be null. Thus, if you need to store a nullable pointer, you wrap the usual pointer type in a union:
If it allows assembly then pointers can be null. Of course null is only one of a very large number of invalid values that a pointer could contain; so caring about null and not caring about all of the other invalid values is relatively short-sighted.

Cheers,

Brendan
In Rust, unless unsafe blocks are used, all pointers are valid. Of course, every program contains some unsafe code that some human must check (hopefully only the unsafe code in the standard library; Servo, Mozilla's experimental HTML rendering engine, manages this, for example), but as long as those humans were right, the compiler's assertions hold.
User avatar
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Owen wrote:Rust supports inline assembly in unsafe blocks. You know, the bits which get access to raw pointers and all. The bits which should immediately jump out at you because they say unsafe right next to them!

The bits which let you implement useful, safe-to-use libraries. The bits which are included in the language so you can build new useful components the standard library didn't anticipate (or because you are doing freestanding work and don't have access to the standard library)
So, I'll just write myself a small function in assembly that accepts a pair of 64-bit integers and uses the first integer as an address to write the second integer, and then mark it as "unsafe". Now; either:
  • code that calls this function also has to be marked as unsafe (and therefore most code in most projects ends up correctly marked as "unsafe" due to "internally unsafe" library functions; and everyone learns to just slap "unsafe" on everything and ignore it); or
  • code that calls a function that's marked as unsafe can pretend that it's still safe even though it's not, and therefore most code ends up being unsafe but not marked as unsafe (creating a false sense of safety that's worse than nothing)
Owen wrote:Rust has "unsafe" because there are always going to be edge cases where it is tricky or impossible for the compiler to check. Instead, we have a bunch of humans check, and then as long as those humans were correct the rest of our code is safe. Or, in other words, it allows us to contain our dangerous bits of code in self-contained, easy to find units.
Excellent. I'll have to remember that Rust developers provide this service the next time I write thousands of lines of assembly and want someone to check it all. :)
Owen wrote:
Brendan wrote:As far as a programmer should care, that's functionally identical to a structure with a "type" field and a "switch(something->type) {". Why do you think programmers using a high level language should need to do explicit micro-optimisations by hand? A programmer shouldn't have to think about all of the potential use cases of their data structure before being forced to decide whether to use a structure or union.

What happens in 3 years' time when the programmer who decided to use your unions realises that they need to add a "type = both_circle_and_rectangle" to the enum? Do they have to find all of the existing code that does anything with the old union and rewrite all of it so it works for structures instead before they can start thinking about adding any new code for the "both_circle_and_rectangle" case?
You need to rewrite it all anyway, because you just added a new value of type. Or was there no "type" member, and your code was doing random things to an uninitialized "circle" or "rectangle" member?
I mostly use "default" to handle missing cases if they actually occur. This means I can add a new value for the "type" member without breaking anything, and then worry about implementing the new code after.


Cheers,

Brendan
User avatar
Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: The Mill: a new low-power, high-performance CPU design

Post by Rusky »

Brendan wrote:So, I'll just write myself a small function in assembly that accepts a pair of 64-bit integers and uses the first integer as an address to write the second integer, and then mark it as "unsafe". Now; either:
  • code that calls this function also has to be marked as unsafe (and therefore most code in most projects ends up correctly marked as "unsafe" due to "internally unsafe" library functions; and everyone learns to just slap "unsafe" on everything and ignore it); or
  • code that calls a function that's marked as unsafe can pretend that it's still safe even though it's not, and therefore most code ends up being unsafe but not marked as unsafe (creating a false sense of safety that's worse than nothing)
Owen wrote:Rust has "unsafe" because there are always going to be edge cases where it is tricky or impossible for the compiler to check. Instead, we have a bunch of humans check, and then as long as those humans were correct the rest of our code is safe. Or, in other words, it allows us to contain our dangerous bits of code in self-contained, easy to find units.
Excellent. I'll have to remember that Rust developers provide this service the next time I write thousands of lines of assembly and want someone to check it all. :)
In Rust, certain operations are considered unsafe: raw pointers (i.e. non-owned and non-borrowed; it's nothing to do with runtime representation), inline assembly, etc. Functions can also be marked unsafe, in which case they can use unsafe operations and call other unsafe functions internally. If that were it, you would be right.

Instead, there are also unsafe blocks. These allow unsafe operations inside, but are a message to the compiler and the programmer that "I've made sure what's in here is safe, even though the compiler doesn't know it." APIs that wrap unsafe operations use safe functions with unsafe blocks inside, so that if you do run into a bug like a dangling pointer or a memory leak or whatever, you know the problem is in an unsafe block.
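The distinction can be sketched like this (write_at and store are hypothetical names, not from any library): an unsafe fn pushes the proof obligation onto every caller, while a safe function containing an unsafe block discharges the obligation once, internally.

```rust
// Callers of this function must themselves write `unsafe` and justify it.
unsafe fn write_at(addr: *mut u64, value: u64) {
    unsafe { *addr = value; }
}

// This wrapper is safe to call: the obligation is discharged here, once.
fn store(slot: &mut u64, value: u64) {
    // SAFETY: a &mut u64 is always a valid, aligned, exclusive pointer.
    unsafe { write_at(slot as *mut u64, value) }
}
```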

This way, you write the majority of your code with the safe features of the language and the safe APIs from libraries (remember the language features are verified as safe by the humans who designed the language, just like the libraries are verified by the humans who wrote them).
Brendan wrote:I mostly use "default" to handle missing cases if they actually occur. This means I can add a new value for the "type" member without breaking anything, and then worry about implementing the new code after.
Rust has a default case too...

Code: Select all

match shape {
    Circle { radius: radius, .. } => f64::consts::PI * square(radius),
    _ => /* default case here */
}
This still provides the benefits of option types for null safety, because you have to acknowledge the possibility of some other value, even if you don't care what it is. It also enforces safe access to the union more than just a switch on shape->type, because each arm of the match only scopes in the members for that variant of the union.
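For readers trying the thread's examples today: they use pre-1.0 syntax. A compilable modern-Rust version of the Shape match (Point's layout is assumed here) looks like:

```rust
struct Point { x: f64, y: f64 }

enum Shape {
    Circle { center: Point, radius: f64 },
    Rectangle { top_left: Point, bottom_right: Point },
}

// Each arm only scopes in the fields of its own variant, which is the
// enforcement Rusky describes.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius, .. } => std::f64::consts::PI * radius * radius,
        Shape::Rectangle { top_left, bottom_right } => {
            (bottom_right.x - top_left.x) * (top_left.y - bottom_right.y)
        }
    }
}
```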
User avatar
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Rusky wrote:In Rust, certain operations are considered unsafe: raw pointers (i.e. non-owned and non-borrowed; it's nothing to do with runtime representation), inline assembly, etc. Functions can also be marked unsafe, in which case they can use unsafe operations and call other unsafe functions internally. If that were it, you would be right.
So, something like this (that doesn't use pointers, inline assembly, etc) is "safe":

Code: Select all

char myArray[123];

char foo(char x, char y) {
     clock_t myTime;

     myTime = clock();
     x = x/y;             // UNSAFE (potential division by zero)
     myArray[y] = x;      // UNSAFE (potential array index out of bounds)
     x = sqrt(y);         // UNSAFE (potential imaginary numbers)
     y = y & myTime;      // UNSAFE (failed to check for error returned by "clock()")
     return y*2;          // UNSAFE (potential overflow)
}
Please note that over half of the problems above will result in dodgy values silently propagating throughout the code; and are far worse than (e.g.) using a null pointer and getting an instant "SIGSEGV".
Rusky wrote:
Brendan wrote:I mostly use "default" to handle missing cases if they actually occur. This means I can add a new value for the "type" member without breaking anything, and then worry about implementing the new code after.
Rust has a default case too...

Code: Select all

match shape {
    Circle { radius: radius, .. } => f64::consts::PI * square(radius),
    _ => /* default case here */
}
This still provides the benefits of option types for null safety, because you have to acknowledge the possibility of some other value, even if you don't care what it is. It also enforces safe access to the union more than just a switch on shape->type, because each arm of the match only scopes in the members for that variant of the union.
I think you may have missed the point.

Code: Select all

enum Shape {
  Circle { center: Point, radius: f64 },
  Rectangle { top_left: Point, bottom_right: Point },
  Both { void }
}

match shape {
    Circle { radius: radius, .. } => f64::consts::PI * square(radius),
    Rectangle { top_left: top_left, bottom_right: bottom_right } => {
        (bottom_right.x - top_left.x) * (top_left.y - bottom_right.y)
    },
    Both { radius: radius, top_left: top_left, bottom_right: bottom_right } => {
        f64::consts::PI * square(radius)
            + (bottom_right.x - top_left.x) * (top_left.y - bottom_right.y)
    },
    _ => panic("Missing case detected!")
}

Cheers,

Brendan
User avatar
Rusky
Member
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: The Mill: a new low-power, high-performance CPU design

Post by Rusky »

Brendan wrote:So, something like this (that doesn't use pointers, inline assembly, etc) is "safe":

Code: Select all

char myArray[123];

char foo(char x, char y) {
     clock_t myTime;

     myTime = clock();
     x = x/y;             // UNSAFE (potential division by zero)
     myArray[y] = x;      // UNSAFE (potential array index out of bounds)
     x = sqrt(y);         // UNSAFE (potential imaginary numbers)
     y = y & myTime;      // UNSAFE (failed to check for error returned by "clock()")
     return y*2;          // UNSAFE (potential overflow)
}
Please note that over half of the problems above will result in dodgy values silently propagating throughout the code; and are far worse than (e.g.) using a null pointer and getting an instant "SIGSEGV".
Rust array indexing immediately fails on out of bounds. An even nicer implementation would use dependent types to check array access at compile time.
Functions that can return errors (e.g. clock()) return Option<T> instead of just T, forcing the programmer to handle the possibility of failure.
Division by zero, square roots, and overflow are the fault of the number specification. Rust does have versions of those operations that immediately fail on errors (or allow the programmer to handle them). However, I will admit this is a weaker area in Rust's particular implementation of this style of safety, if not in this style of safety in general (a safer way: sqrt could return a complex type, division could return an Option type, and multiplication could return a wider type).
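Rusky's points correspond to concrete standard-library operations. As a sketch, each of the silent behaviours in Brendan's C snippet becomes an explicit Option (or a panic) in Rust:

```rust
fn checked_ops() {
    // Out-of-bounds access: slice::get returns None instead of reading
    // past the end (plain indexing would panic, not corrupt memory).
    let arr = [10u8, 20, 30];
    assert_eq!(arr.get(7), None);

    // Overflow: checked_mul returns None instead of silently wrapping.
    let x: i8 = 100;
    assert_eq!(x.checked_mul(2), None);

    // Division by zero: checked_div returns None instead of trapping.
    assert_eq!(6i8.checked_div(0), None);
}
```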
Brendan wrote:I think you may have missed the point.
Nah, just didn't feel like typing out all of your examples. You clearly knew what I meant (although Both would have to specify that it's got a Circle and a Rect in it, not just void).
User avatar
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Rusky wrote:
Brendan wrote:So, something like this (that doesn't use pointers, inline assembly, etc) is "safe":

Code: Select all

char myArray[123];

char foo(char x, char y) {
     clock_t myTime;

     myTime = clock();
     x = x/y;             // UNSAFE (potential division by zero)
     myArray[y] = x;      // UNSAFE (potential array index out of bounds)
     x = sqrt(y);         // UNSAFE (potential imaginary numbers)
     y = y & myTime;      // UNSAFE (failed to check for error returned by "clock()")
     return y*2;          // UNSAFE (potential overflow)
}
Please note that over half of the problems above will result in dodgy values silently propagating throughout the code; and are far worse than (e.g.) using a null pointer and getting an instant "SIGSEGV".
Rust array indexing immediately fails on out of bounds. An even nicer implementation would use dependent types to check array access at compile time.
Functions that can return errors (e.g. clock()) return Option<T> instead of just T, forcing the programmer to handle the possibility of failure.
Division by zero, square roots, and overflow are the fault of the number specification. Rust does have versions of those operations that immediately fail on errors (or allow the programmer to handle them). However, I will admit this is a weaker area in Rust's particular implementation of this style of safety, if not in this style of safety in general (a safer way: sqrt could return a complex type, division could return an Option type, and multiplication could return a wider type).
What I'm saying here is that pointers are like a magician's shiny trinket - they draw attention away from the stuff that's really unsafe.

Division by zero, square roots and overflow are the fault of poorly specified data types. Because programmers can't specify exact ranges for variables in languages like C it becomes impossible for compilers to check if problems are possible or not. For example, I'd do "sqrt(value as f53range 0 to f53e11.max)" to say the parameter can't be negative, so you get a compile time error (not run-time error) when someone tries this function with an out of range value.
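Brendan's range types belong to his own hypothetical language; the closest analogue in an existing language is moving the check to a type boundary. A rough Rust sketch (NonNegative and safe_sqrt are made-up names), checked at construction time rather than compile time as his design would:

```rust
#[derive(Clone, Copy)]
struct NonNegative(f64);

impl NonNegative {
    // The only way to obtain a NonNegative is through this check.
    fn new(v: f64) -> Option<NonNegative> {
        if v >= 0.0 { Some(NonNegative(v)) } else { None }
    }
}

// sqrt can no longer be handed a negative input, so it never produces
// NaN for that reason.
fn safe_sqrt(v: NonNegative) -> f64 {
    v.0.sqrt()
}
```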

Basically; all of the potential problems in my example would cause compile time errors. To get it to compile, you'd have to change it to something like:

Code: Select all

myArray[123] as s8

functiondef foo(s8 x, y as range 1 to myArray.items-1) (output as range y.min*2 to y.max*2)  {
     clock_t myTime
     temp as auto        ;Actual type determined by return type of "sqrt()"

     myTime = clock()
     x = x\y             ;Provably safe
     myArray[y] = x      ;Provably safe
     temp = sqrt(y)      ;Provably safe
     if(temp <= x.max) {
         x = temp        ;Provably safe
     }
     y = y & myTime      ;Provably safe
     output = y*2        ;Provably safe
}
Note: the '\' operator is integer division. The original code used normal division ('/') which would've caused a "potential precision loss" compile time error.
Rusky wrote:
Brendan wrote:I think you may have missed the point.
Nah, just didn't feel like typing out all of your examples. You clearly knew what I meant (although Both would have to specify that it's got a Circle and a Rect in it, not just void).
You can't have an additional Circle and an additional Rectangle for Both without breaking things everywhere. For example, consider:

Code: Select all

//  if(type == Rectangle) {
    if( (type == Rectangle) || (type == Both) ) {
        area = calcRectangleArea( shape.Rectangle )
    }
Of course this would have been clearer if there were actual structures in your "union of structures". ;)


Cheers,

Brendan
User avatar
Rusky
Member
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: The Mill: a new low-power, high-performance CPU design

Post by Rusky »

Brendan wrote:What I'm saying here is that pointers are like a magician's shiny trinket - they draw attention away from the stuff that's really unsafe.

Division by zero, square roots and overflow are the fault of poorly specified data types. Because programmers can't specify exact ranges for variables in languages like C it becomes impossible for compilers to check if problems are possible or not. For example, I'd do "sqrt(value as f53range 0 to f53e11.max)" to say the parameter can't be negative, so you get a compile time error (not run-time error) when someone tries this function with an out of range value.
According to this paper and this paper, buffer overflows (and the related null/dangling pointer dereference) are the most common vulnerabilities, and are even more common in critical security bugs, so I'm not sure pointers are just a red herring.

I will agree that tracking ranges is at least as important, though. I would love to see Rust add dependent types. If you're interested in existing methods for implementing them, this design is pretty good for a general-purpose language (as opposed to other dependently-typed languages that have undecidable type systems and other such nonsense).
Brendan wrote:You can't have an additional Circle and an additional Rectangle for Both without breaking things everywhere. For example, consider:

Code: Select all

//  if(type == Rectangle) {
    if( (type == Rectangle) || (type == Both) ) {
        area = calcRectangleArea( shape.Rectangle )
    }
Of course this would have been clearer if there were actual structures in your "union of structures". ;)
If you want Both to use the structures for Circle and Rectangle directly, then it's no longer a union because they can't overlap. Your example could become:

Code: Select all

struct Rect { ... } struct Circle { ... }
enum Shape { Rectangle(Rect), Circle(Circle), Both(Rect, Circle) }
area = match shape {
  Rectangle(rect) => calcRectangleArea(rect),
  Both(rect, _) => calcRectangleArea(rect),
  _ => fail!("missing case")
}
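In compilable modern Rust, this sketch would look roughly like the following (the field layouts of Rect and Circle are assumed, since the thread elides them):

```rust
struct Rect { width: f64, height: f64 }
struct Circle { radius: f64 }

enum Shape {
    Rectangle(Rect),
    Circle(Circle),
    Both(Rect, Circle),
}

fn calc_rectangle_area(r: &Rect) -> f64 {
    r.width * r.height
}

// Both carries its own Rect, so the rectangular part is matched out
// rather than read through an overlapping union member.
fn rectangular_area(shape: &Shape) -> Option<f64> {
    match shape {
        Shape::Rectangle(rect) | Shape::Both(rect, _) => Some(calc_rectangle_area(rect)),
        Shape::Circle(_) => None,
    }
}
```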
User avatar
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Rusky wrote:
Brendan wrote:What I'm saying here is that pointers are like a magician's shiny trinket - they draw attention away from the stuff that's really unsafe.

Division by zero, square roots and overflow are the fault of poorly specified data types. Because programmers can't specify exact ranges for variables in languages like C it becomes impossible for compilers to check if problems are possible or not. For example, I'd do "sqrt(value as f53range 0 to f53e11.max)" to say the parameter can't be negative, so you get a compile time error (not run-time error) when someone tries this function with an out of range value.
According to this paper and this paper, buffer overflows (and the related null/dangling pointer dereference) are the most common vulnerabilities, and are even more common in critical security bugs, so I'm not sure pointers are just a red herring.
Buffer overflows; and not dangling, undefined, or null pointers?

Let's look at a simple buffer overflow:

Code: Select all

char myString[123];

void foo(void) {
    int pos = 0;
    int c;

    while( ( (c = getc(stdin)) != EOF) && (c != '\n') ) {
        myString[pos] = c;          // UNSAFE (potential array index out of bounds
                                    //         + potential overflow: 'c' doesn't fit in a signed 8-bit integer)
        pos++;                      // UNSAFE (potential overflow)
    }
}
If a compiler tracks the ranges of values that are possible in each variable, those unsafe things end up being compile time errors.

Let's use a pointer:

Code: Select all

#include <stdio.h>

char myString[123];

void bar(char *string);

void foo(void) {
    bar(myString);
}

void bar(char *string) {
    int c;

    while( ( (c = getc(stdin)) != EOF) && (c != '\n') ) {
        *string = c;                // UNSAFE (potential overflow: 'c' doesn't fit in a signed 8-bit integer)
        string++;                   // UNSAFE (potential overflow)
    }
}
The compiler doesn't know the range of the pointer and has to assume the pointer can point anywhere; and therefore it can't detect the problem. How do you solve this? Integers have a range; pointers are just integers; pointers can have a range too.
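In today's C, the closest approximation is to make the range travel alongside the pointer. A sketch (the struct and function names are mine, illustrating the idea rather than Brendan's actual syntax) with the range check done at run time instead of compile time:

```c
#include <assert.h>
#include <stddef.h>

/* A "bounded pointer" emulated in C: the pointer is carried together
 * with the number of elements it may still legally access. */
struct bounded_ptr {
    char  *ptr;
    size_t remaining;
};

/* Write one byte through the bounded pointer and advance it.
 * Returns 0 on success, -1 if the write would leave the range -
 * the run-time version of the check Brendan wants at compile time. */
int bp_put(struct bounded_ptr *bp, char c) {
    if (bp->remaining == 0) {
        return -1;
    }
    *bp->ptr = c;
    bp->ptr++;
    bp->remaining--;
    return 0;
}
```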

I actually have 2 types of pointers: unbounded pointers (like pointers in C), and bounded pointers. A bounded pointer has a range. Here's the code above written with a bounded pointer:

Code: Select all

myString1[123] as s8;
myString2[999] as s8;

functiondef foo(void) (void) {
    tempUnboundedPtr as @s8           ;Note: "tempUnboundedPtr" is an unbounded pointer to a signed 8-bit integer

    bar(myString1)                    ;UNSAFE (bad type conversion, array of 123 items converted to pointer to 256 items)

    tempUnboundedPtr = #myString2[5]
    bar(tempUnboundedPtr)             ;UNSAFE (bad type conversion, unbounded pointer converted to bounded pointer)
    bar(#myString2[5])

    tempUnboundedPtr = malloc(999)
    bar(tempUnboundedPtr)             ;UNSAFE (bad type conversion, unbounded pointer converted to bounded pointer)
}

functiondef bar(string as [256]s8) (void) {  ;Note: "string" is a pointer to a group of 256 signed 8-bit integers
    pos as range 0 to string.items
    c as s8
    EOF as bool

    for(;;) {
        (EOF, c) = getc(stdin)
        if(EOF) {
            return
        }
        [pos]string = c
        pos++                         ;UNSAFE (potential overflow)
    }
}
Now let's think about "malloc()". It sucks for several reasons - it has to return an unbounded pointer with no type, it has poor cache locality, it's hard to detect problems (e.g. "free(malloc(123)+88)"), if you have an infinite loop that allocates memory you can't set a reasonable limit (you have to wait until it consumes the entire heap), and it's less fun if you want to figure out what all your RAM is being used for ("Oh my, the profiler says I'm using 99 KiB for strings!"). Let's fix all those problems:

Code: Select all

myStringPool[999][256] as s8          ;OS uses "allocation on demand" (doesn't cost 250 KiB of RAM unless all used)
    
lastStringNumber as range 0 to myStringPool.items - 1 = 0

enum StringPoolEntryState = {
    free = 0
    used
}

myStringPoolEntryStates[myStringPool.items] as StringPoolEntryState


functiondef allocString(void) (status as bool = true, string as [256]s8) {
    auto initialEntry;

    initialEntry = lastStringNumber
    while(myStringPoolEntryStates[lastStringNumber] != StringPoolEntryState.free) {
        lastStringNumber = (lastStringNumber + 1) % myStringPool.items
        if(lastStringNumber == initialEntry) {
            status = false
            return
        }
    }
    myStringPoolEntryStates[lastStringNumber] = StringPoolEntryState.used
    string = #myStringPool[lastStringNumber]
}

functiondef foo(void) (void) {
    status as bool
    string as [256]s8

    (status, string) = allocString()
    if(status) {
        bar(string)
    }
}
That's probably not the best string allocator (I'd use a stack but I was too lazy to write an "initStringPool()" for it) but you get the idea. The important thing is that all of it can be proven safe by the compiler.
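For comparison, here is roughly the same next-fit pool allocator in plain C (the sizes and names are mine). It works the same way; the difference is that nothing in C stops a caller from writing past the 256-byte buffer it gets back, which is exactly what the bounded `[256]s8` return type in Brendan's version guarantees:

```c
#include <assert.h>
#include <stddef.h>

#define POOL_ITEMS 999
#define STRING_LEN 256

static char myStringPool[POOL_ITEMS][STRING_LEN];
static unsigned char poolEntryUsed[POOL_ITEMS];  /* 0 = free, 1 = used */
static size_t lastStringNumber = 0;

/* Next-fit scan for a free slot; returns a 256-byte buffer,
 * or NULL when the whole pool is in use. */
char *allocString(void) {
    size_t initialEntry = lastStringNumber;
    while (poolEntryUsed[lastStringNumber]) {
        lastStringNumber = (lastStringNumber + 1) % POOL_ITEMS;
        if (lastStringNumber == initialEntry) {
            return NULL;   /* pool exhausted */
        }
    }
    poolEntryUsed[lastStringNumber] = 1;
    return myStringPool[lastStringNumber];
}

/* Return a buffer obtained from allocString() to the pool. */
void freeString(char *s) {
    size_t i = (size_t)(s - &myStringPool[0][0]) / STRING_LEN;
    poolEntryUsed[i] = 0;
}
```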

There is still one problem - the caller could ignore the returned status and end up using a null bounded pointer (and even if they don't, error handling ends up on the critical path). Let's fix that using "alternative exits":

Code: Select all

functiondef allocString(exit noMemory) (string as [256]s8) {
    auto initialEntry;

    initialEntry = lastStringNumber
    while(myStringPoolEntryStates[lastStringNumber] != StringPoolEntryState.free) {
        lastStringNumber = (lastStringNumber + 1) % myStringPool.items
        if(lastStringNumber == initialEntry) {
            return noMemory                         ;Return to the alternative exit
        }
    }
    myStringPoolEntryStates[lastStringNumber] = StringPoolEntryState.used
    string = #myStringPool[lastStringNumber]
}

functiondef foo(void) (void) {
    string as [256]s8

    string = allocString(handleNoMemory)
    bar(string)
    return

handleNoMemory:
    ; Do any cleanup, etc
}
Rusky wrote:
Brendan wrote:You can't have have an additional Circle and an addition Rectangle for Both without breaking things everywhere. For example, consider:

Code: Select all

//  if(type == Rectangle) {
    if( (type == Rectangle) || (type == Both) ) {
        area = calcRectangleArea( shape.Rectangle )
    }
Of course this would have been clearer if there were actual structures in your "union of structures". ;)
If you want Both to use the structures for Circle and Rectangle directly, then it's no longer a union because they can't overlap. Your example could either be:

Code: Select all

struct Rect { ... } struct Circle { ... }
enum Shape { Rectangle(Rect), Circle(Circle), Both(Rect, Circle) }
area = match shape {
  Rectangle(rect) => calcRectangleArea(rect),
  Both(rect, _) => calcRectangleArea(rect),
  _ => fail!("missing case")
}


This is a continuation of the same problem - the offset of the Rect structure within the union is different for the enumeration's Rectangle and Both, so you're forced to have an ugly switch/case thing inside "calcRectangleArea()". If you used a shape structure instead of a shape enum, then you could've added "Both" to the structure without changing or adding any of the code in the "calcRectangleArea()" function at all.

Let's add a "square":

Code: Select all

typedef struct { double radius; } CIRCLE;
typedef struct { double width, height; } RECTANGLE;
typedef struct { double width; } SQUARE;

typedef struct {
    unsigned int hasCircle    : 1;
    unsigned int hasRectangle : 1;
    unsigned int hasSquare    : 1;
    CIRCLE circle;
    RECTANGLE rectangle;
    SQUARE square;
} SHAPE;

double getArea(SHAPE shape) {
    double area = 0;

    if(shape.hasCircle) {
        area += getCircleArea(shape.circle);
    }
    if(shape.hasRectangle) {
        area += getRectangleArea(shape.rectangle);
    }
    if(shape.hasSquare) {
        area += getSquareArea(shape.square);
    }
    return area;
}
If the compiler can determine that circle, rectangle and square are never used at the same time; then it can co-locate the structure members and the structure can be just as good as a union. If the programmer makes a minor change (e.g. so that rectangle and circle might be used at the same time) then with no additional changes whatsoever everything continues to work perfectly.
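The result of that co-location is, in effect, a compiler-generated union. A sketch of the layout the compiler could pick once it has proven that circle and rectangle are never live at the same time (the type names follow the SHAPE example above; a real compiler would do this invisibly, without changing the source):

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double radius; } CIRCLE;
typedef struct { double width, height; } RECTANGLE;

/* Layout after co-location: circle and rectangle share storage,
 * so the struct is no bigger than its largest member plus the flags. */
typedef struct {
    unsigned int hasCircle    : 1;
    unsigned int hasRectangle : 1;
    union {
        CIRCLE circle;
        RECTANGLE rectangle;
    } u;
} SHAPE_COLOCATED;
```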

More specifically:
  • Step 1: Scan through the program and find out which structure members are never read (note: taking the address of a structure member is counted as a "read")
  • Step 2: Remove any writes to structure members that are never read
  • Step 3: Remove structure members that are never read (to reduce the amount of work needed for the next steps)
  • Step 4: For each variable that is a potential "key condition" (initially, every variable is a potential "key condition"), determine the state of that variable at each structure member access (e.g. remember the ranges of values of each variable at the time of the access, for all variables that have a known range of values); and remove from the list any variables that can't serve as "key conditions"
  • Step 5: Find structure members with mutually exclusive "key conditions", and co-locate them
Please note that for trivial cases step 4 is trivial (assuming the compiler is already tracking the values in variables for the purpose of making sure basic arithmetic, etc is done safely); and for non-trivial cases (e.g. too much state, halting problem, etc) step 4 won't be possible and I'll end up with "false negatives" and missed optimisations.

For example:

Code: Select all

void setShapeToCircle(SHAPE *shape, double radius) {
    shape->hasCircle = 1;
    shape->hasRectangle = 0;
    shape->hasSquare = 0;
    shape->circle.radius = radius;
}

double getCircleArea(SHAPE *shape) {
    if(shape->hasCircle) {
        return (shape->circle.radius * shape->circle.radius) * PI;
    }
    return 0;
}
From this we can see that the member "shape.circle" has the key conditions "shape->hasCircle != 0; shape->hasRectangle == 0; shape->hasSquare == 0". In a similar way we might be able to determine that the member "shape.rectangle" has the key conditions "shape->hasCircle == 0; shape->hasRectangle != 0; shape->hasSquare == 0". In that case, we know the key conditions are mutually exclusive and that these structure members can be co-located.

If someone adds this:

Code: Select all

void addCircle(SHAPE *shape, double radius) {
    shape->hasCircle = 1;
    shape->circle.radius = radius;
}
Then the previous key conditions "shape->hasRectangle == 0; shape->hasSquare == 0" get filtered out, leaving us with the key condition "shape->hasCircle != 0". In that case, there are no mutually exclusive key conditions for "shape.circle" and "shape.rectangle", and we know that these structure members cannot be co-located.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Unions, program proofs and the Halting Problem

Post by Combuster »

That sounds like ignoring what such languages have to offer in the first place. Either you expect multiple shapes to be overlayed in the average case, in which case you'd rather encode it as

Code: Select all

Shape = Collection (Option Rectangle) (Option Circle) (Option ...)
Or you're wasting memory and the typical case is just one shape with occasional composition:

Code: Select all

Shape = Rectangle
      | Circle
      | (...)
      | Composite [Shape]
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
Combuster wrote:That sounds like ignoring what such languages have to offer in the first place.
I'm not ignoring what languages have to offer. I've considered untagged unions and found that it's impossible to prove they've been used safely. I've also considered tagged unions and found they make the language harder to learn, make software harder to maintain/extend, and are still only partially safe (e.g. not safe at all when people are allowed to use the whole language, including inline assembly).
Combuster wrote:Either you expect multiple shapes to be overlayed in the average case, in which case you'd rather encode it as

Code: Select all

Shape = Collection (Option Rectangle) (Option Circle) (Option ...)
Or you're wasting memory and the typical case is just one shape with occasional composition:

Code: Select all

Shape = Rectangle
      | Circle
      | (...)
      | Composite [Shape]
Either the compiler's optimiser is bad and programmers need to care if their data is a structure or union; or the compiler's optimiser is good and programmers don't need to care.

Today, if I had to use a compiler that couldn't (e.g.) replace a multiplication with a shift, or remove dead code, or do common sub-expression elimination, or inline functions; I'd laugh at how crappy the compiler is. Some time in the future, maybe people will laugh at crappy compilers that can't do structure member co-location.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: Unions, program proofs and the Halting Problem

Post by Combuster »

Brendan wrote:
I've also considered tagged unions and found they make the language harder to learn, make software harder to maintain/extend
Mind you, I've only experienced the exact opposite.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
Combuster wrote:
I've also considered tagged unions and found they make the language harder to learn, make software harder to maintain/extend
Mind you, I've only experienced the exact opposite.
You've probably spent almost 10 years having your head filled with far more idiotic nonsense than unions; and I seriously doubt that you are still able to see things from the perspective of a beginner learning their first language.

However; despite your curse, I would've hoped you'd be intelligent enough to understand that learning something is always harder than not needing to learn something; and having to take something into account is always harder than not needing to take something into account.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
embryo

Re: Unions, program proofs and the Halting Problem

Post by embryo »

Brendan wrote:Either the compiler's optimiser is bad and programmers need to care if their data is a structure or union; or the compiler's optimiser is good and programmers don't need to care
It's not about a good or bad optimiser - or more correctly, not only about that. There needs to be information about execution-time constraints. If that information exists, then a compiler can have a good optimiser; but if there's no information, any compiler sucks.

And by the way - Java has a means to provide a compiler with such information: annotations. C/C++ have no such means. One more plus for Java :)
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
embryo wrote:
Brendan wrote:Either the compiler's optimiser is bad and programmers need to care if their data is a structure or union; or the compiler's optimiser is good and programmers don't need to care
It's not about a good or bad optimiser - or more correctly, not only about that. There needs to be information about execution-time constraints. If that information exists, then a compiler can have a good optimiser; but if there's no information, any compiler sucks.
Worst: No optimisation
Average: Programmer has to assist optimiser
Best: Programmer doesn't have to assist the optimiser
embryo wrote:And by the way - Java has a means to provide a compiler with such information: annotations. C/C++ have no such means. One more plus for Java :)
C/C++ have "#pragma", which mostly serves the same purpose as Java's annotations.

Of course you're right - if I try to get it right and fail, then I can try to hide my failure behind pragmas/annotations later on. ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Post Reply