Unions, program proofs and the Halting Problem

Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Unions, program proofs and the Halting Problem

Post by Rusky »

A continuation of the tangent that Brendan started in this post - Owen

Not to derail the thread, but I think that might even be impossible in theory. Imagine a halting-problem-like formulation: whatever algorithm the compiler uses to determine which member of the union is in use can be implemented in the program itself and run on itself, and the program can then violate that result by using a member of the union the algorithm thought it wouldn't.

What you'd need to do is track which member is currently set, which potentially changes whenever the union is written to. Because of the above problem, this state would in some cases need to be reified in the actual program, and you end up with a tagged union/algebraic data type. The Rust language actually has that built in, as an extension to enums.

Which I guess is back on topic: sometimes you need dynamic behavior, and in the case of executing machine instructions, the best place for that is in the CPU.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: The Mill: a new low-power, high-performance CPU design

Post by Owen »

Proof that what Brendan is seeking is equivalent to the halting problem:

Code: Select all

union x { int i; float f; };

void func(union x *ptr)
{
    if(getc(stdin) & 1) {
        ptr->i = 1;
    } else {
        ptr->f = 1.f;
    }
}
What state does this leave *ptr in?

The only practical solution to the problem is tagged unions. Fortunately, this doesn't necessarily imply any additional overhead, because you need to carry around type information anyway. Most other uses are (A) "generic parameters", where the member used is decided by some opaque agreement between the two sides, and (B) type punning, which is probably better supported as a "bitwise_cast" (and which C already offers in a defined manner via memcpy).
Rusky

Re: The Mill: a new low-power, high-performance CPU design

Post by Rusky »

The halting problem has nothing to do with I/O: it applies even if you know all inputs to the program.

Assume you have a function bool halts(program*) that takes a program definition. What does this code do?

Code: Select all

if (halts(&this_program)) { while (1); } else { exit(0); }
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Owen wrote:Proof that what Brendan is seeking is equivalent to the halting problem:

Code: Select all

union x { int i; float f; };

void func(union x *ptr)
{
    if(getc(stdin) & 1) {
        ptr->i = 1;
    } else {
        ptr->f = 1.f;
    }
}
What state does this leave *ptr in?
Each member has 2 states: "possibly not set last" and "definitely set last". After this function is called both members would be in the "possibly not set last" state; so if either member is read from after calling this function (but before a member is set again) you've proven that the union was not used safely. Whether the program halts or not is irrelevant.

For example:

Code: Select all

void halting(void) {
    union x myUnion;
    int foo;

    while( getc(stdin) != 'q') {  // Halting problem

        func(&myUnion);
        foo = myUnion.i;          // Definitely unsafe use of union

        myUnion.i = 3;
        foo = myUnion.i;          // Definitely safe use of union
    }
}
Rusky wrote:The halting problem has nothing to do with I/O: it applies even if you know all inputs to the program.

Assume you have a function bool halts(program*) that takes a program definition. What does this code do?

Code: Select all

if (halts(&this_program)) { while (1); } else { exit(0); }
If no I/O is involved, you can determine whether it halts by executing it and (at each step during execution) detecting whether the current state matches any previous state. If it exits, it definitely halts; and if it reaches a previous state, it definitely does not halt (execution is deterministic, so it will repeat forever). The total number of possible states has a fixed upper bound; so you never need to execute it forever to determine whether it halts or not.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Owen

Re: The Mill: a new low-power, high-performance CPU design

Post by Owen »

Brendan wrote:Hi,
Owen wrote:Proof that what Brendan is seeking is equivalent to the halting problem:

Code: Select all

union x { int i; float f; };

void func(union x *ptr)
{
    if(getc(stdin) & 1) {
        ptr->i = 1;
    } else {
        ptr->f = 1.f;
    }
}
What state does this leave *ptr in?
Each member has 2 states: "possibly not set last" and "definitely set last". After this function is called both members would be in the "possibly not set last" state; so if either member is read from after calling this function (but before a member is set again) you've proven that the union was not used safely. Whether the program halts or not is irrelevant.
Hmm. I think that /for the conservative/ case you may be right. However, I also think that the conservative case is liable to be too conservative to be useful in practice.

I think "bare unions" are a bad thing in general code and suggest tagged unions instead (see Rust's enumerations for this; note that Rust's Option<T> is how it implements safe nullable pointers, and the compiler has special logic which optimises a "pointer plus one other empty member" enumeration into a plain nullable pointer). You then make the user handle all the possibilities at the point of consumption.
Brendan wrote:
Rusky wrote:The halting problem has nothing to do with I/O: it applies even if you know all inputs to the program.

Assume you have a function bool halts(program*) that takes a program definition. What does this code do?

Code: Select all

if (halts(&this_program)) { while (1); } else { exit(0); }
If no I/O is involved, you can determine whether it halts by executing it and (at each step during execution) detecting whether the current state matches any previous state. If it exits, it definitely halts; and if it reaches a previous state, it definitely does not halt (execution is deterministic, so it will repeat forever). The total number of possible states has a fixed upper bound; so you never need to execute it forever to determine whether it halts or not.
Actually, you can't. The halting problem is devised as a function "h(x, i) takes function x and input i, and computes whether x halts when given input i", and is demonstrably undecidable. Note that for any function x(i) you can reformulate it as x'() = x(i), so the presence or absence of input is immaterial.

Consider the following program:

Code: Select all

int main()
{
    void *p = NULL;
    while(true) {
        void **q = malloc(sizeof(void*));
        *q = p;
        p = q;
    }
}
I think you'll agree it is immediately obvious (to human observers) that said program will never terminate on a machine with infinite memory (and also that it will eventually fail with SIGSEGV, SIGBUS or similar on a machine with finite memory). However, the state space is also large enough that it is infeasible to compute whether it will terminate (by such an algorithm as you propose) on a machine with finite memory. Indeed, a function which contains a single 64-bit number has more states than can be represented by all the RAM in the world (assuming a bitfield representation of "seen"/"unseen" states). Stick four of those numbers together and it becomes incomputable even if you were to transform the entire solar system into a gigantic, maximally efficient quantum computer.
Brendan

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Owen wrote:
Brendan wrote:Each member has 2 states: "possibly not set last" and "definitely set last". After this function is called both members would be in the "possibly not set last" state; so if either member is read from after calling this function (but before a member is set again) you've proven that the union was not used safely. Whether the program halts or not is irrelevant.
Hmm. I think that /for the conservative/ case you may be right. However, I also think that the conservative case is liable to be too conservative to be useful in practice.

I think "bare unions" are a bad thing in general code and suggest tagged unions instead (see Rust's enumerations for this; note that Rust's Option<T> is how it implements safe nullable pointers, and the compiler has special logic which optimises a "pointer plus one other empty member" enumeration into a plain nullable pointer). You then make the user handle all the possibilities at the point of consumption.
I'm not convinced that (given the presence of pointers) tagged unions would solve anything. For a simple example:

Code: Select all

enum myEnum { IS_INT, IS_FLOAT };
struct x { enum myEnum tag; union { int myInt; float myFloat; }; };

struct x myUnion;
int *foo = &myUnion.myInt;

void func(struct x *ptr) {
    if(getc(stdin) & 1) {
        *foo = 1;                 /* writes the int member; the tag is not updated */
    } else {
        ptr->tag = IS_FLOAT;
        ptr->myFloat = 1.0;
    }
    if(ptr->tag == IS_FLOAT) {
        printf("%f\n", ptr->myFloat);
    }
}
Also note that what I really want to do is optimise structures where possible (so that programmers don't need to use unions of any kind as a workaround for the compiler's inability to optimise structures).
Owen wrote:
Brendan wrote:
Rusky wrote:The halting problem has nothing to do with I/O: it applies even if you know all inputs to the program.
If no I/O is involved, you can determine whether it halts by executing it and (at each step during execution) detecting whether the current state matches any previous state. If it exits, it definitely halts; and if it reaches a previous state, it definitely does not halt (execution is deterministic, so it will repeat forever). The total number of possible states has a fixed upper bound; so you never need to execute it forever to determine whether it halts or not.
Actually, you can't. The halting problem is devised as a function "h(x, i) takes function x and input i, and computes whether x halts when given input i", and is demonstrably undecidable. Note that for any function x(i) you can reformulate it as x'() = x(i), so the presence or absence of input is immaterial.

Consider the following program:

Code: Select all

int main()
{
    void *p = NULL;
    while(true) {
        void **q = malloc(sizeof(void*));
        *q = p;
        p = q;
    }
}
I think you'll agree it is immediately obvious (to human observers) that said program will never terminate on a machine with infinite memory (and also that it will eventually fail with SIGSEGV, SIGBUS or similar on a machine with finite memory). However, the state space is also large enough that it is infeasible to compute whether it will terminate (by such an algorithm as you propose) on a machine with finite memory.
So you're saying that in theory (where you can have infinite memory) it's impossible to solve the halting problem, but in practice (where you can't have infinite memory) it's possible to solve the halting problem?
Owen wrote:Indeed, a function which contains a single 64-bit number has more states than can be represented by all the RAM in the world (assuming a bitfield representation of "seen"/"unseen" states). Stick four of those numbers together and it becomes incomputable even if you were to transform the entire solar system into a gigantic, maximally efficient quantum computer.
Um, what? A single 64-bit number has a large number of states, but that number of states can be stored in 64 bits. A set of four of these 64-bit numbers adds up to a "massive" total of 256 bits of state. If you remember the state after each step, then that's a total of "bits of state * number of steps" bits.

Let's think about "steps". What is the largest possible step that's needed? Each step could start and end on control transfers (e.g. a function call, function return or branch would end one step and start the next step) and any temporary variables (variables that are created and destroyed within the same step) needn't be stored as part of the state.

You'll also find that function calls increase the amount of state you need to store after each step, and function returns decrease the amount of state that you need to store after each step. More importantly; any state from all steps between a function call and that function's return can be discarded/freed when that function returns. In addition; when you store a new state you can store it as the differences between the new state and a previous state (you don't need to store all bits of state after each step).


Cheers,

Brendan
Owen

Re: The Mill: a new low-power, high-performance CPU design

Post by Owen »

Brendan wrote:Hi,
Owen wrote:
Brendan wrote:Each member has 2 states: "possibly not set last" and "definitely set last". After this function is called both members would be in the "possibly not set last" state; so if either member is read from after calling this function (but before a member is set again) you've proven that the union was not used safely. Whether the program halts or not is irrelevant.
Hmm. I think that /for the conservative/ case you may be right. However, I also think that the conservative case is liable to be too conservative to be useful in practice.

I think "bare unions" are a bad thing in general code and suggest tagged unions instead (see Rust's enumerations for this; note that Rust's Option<T> is how it implements safe nullable pointers, and the compiler has special logic which optimises a "pointer plus one other empty member" enumeration into a plain nullable pointer). You then make the user handle all the possibilities at the point of consumption.
I'm not convinced that (given the presence of pointers) tagged unions would solve anything. For a simple example:

Code: Select all

enum myEnum { IS_INT, IS_FLOAT };
struct x { enum myEnum tag; union { int myInt; float myFloat; }; };

struct x myUnion;
int *foo = &myUnion.myInt;

void func(struct x *ptr) {
    if(getc(stdin) & 1) {
        *foo = 1;                 /* writes the int member; the tag is not updated */
    } else {
        ptr->tag = IS_FLOAT;
        ptr->myFloat = 1.0;
    }
    if(ptr->tag == IS_FLOAT) {
        printf("%f\n", ptr->myFloat);
    }
}
Also note that what I really want to do is optimise structures where possible (so that programmers don't need to use unions of any kind as a workaround for the compiler's inability to optimise structures).
So before you can borrow u->i in Rust, the Enum must be of type Int. Once you've done the borrow, then you can't fiddle with the type of the union until the borrow concludes. If the compiler can't prove this, that is an error (i.e. your code contains unsafe behavior)
Brendan wrote:
Owen wrote:
Brendan wrote:
If no I/O is involved, you can determine whether it halts by executing it and (at each step during execution) detecting whether the current state matches any previous state. If it exits, it definitely halts; and if it reaches a previous state, it definitely does not halt (execution is deterministic, so it will repeat forever). The total number of possible states has a fixed upper bound; so you never need to execute it forever to determine whether it halts or not.
Actually, you can't. The halting problem is devised as a function "h(x, i) takes function x and input i, and computes whether x halts when given input i", and is demonstrably undecidable. Note that for any function x(i) you can reformulate it as x'() = x(i), so the presence or absence of input is immaterial.

Consider the following program:

Code: Select all

int main()
{
    void *p = NULL;
    while(true) {
        void **q = malloc(sizeof(void*));
        *q = p;
        p = q;
    }
}
I think you'll agree it is immediately obvious (to human observers) that said program will never terminate on a machine with infinite memory (and also that it will eventually fail with SIGSEGV, SIGBUS or similar on a machine with finite memory). However, the state space is also large enough that it is infeasible to compute whether it will terminate (by such an algorithm as you propose) on a machine with finite memory.
So you're saying that in theory (where you can have infinite memory) it's impossible to solve the halting problem, but in practice (where you can't have infinite memory) it's possible to solve the halting problem?
So your proposed algorithm would detect that said program would eventually halt on a finite memory machine? Or would it fail when the prover ran out of memory?

(N.B. consider a nontrivial program and trying to prove whether or not that will halt. Especially consider the case where the program will run to completion successfully in the RAM your machine has, yet will cause your prover to exhaust all of the machine's memory)
Brendan wrote:
Owen wrote:Indeed, a function which contains a single 64-bit number has more states than can be represented by all the RAM in the world (assuming a bitfield representation of "seen"/"unseen" states). Stick four of those numbers together and it becomes incomputable even if you were to transform the entire solar system into a gigantic, maximally efficient quantum computer.
Um, what? A single 64-bit number has a large number of states, but that number of states can be stored in 64 bits. A set of four of these 64-bit numbers adds up to a "massive" total of 256 bits of state. If you remember the state after each step, then that's a total of "bits of state * number of steps" bits.

Let's think about "steps". What is the largest possible step that's needed? Each step could start and end on control transfers (e.g. a function call, function return or branch would end one step and start the next step) and any temporary variables (variables that are created and destroyed within the same step) needn't be stored as part of the state.

You'll also find that function calls increase the amount of state you need to store after each step, and function returns decrease the amount of state that you need to store after each step. More importantly; any state from all steps between a function call and that function's return can be discarded/freed when that function returns. In addition; when you store a new state you can store it as the differences between the new state and a previous state (you don't need to store all bits of state after each step).
Your algorithm explicitly records the states which have occurred; when one repeats, the program does not halt (it is deterministic).

The most compact representation of a set of states which have occurred is a bitfield with one bit per possible state (so 2^64 bits in this trivial case).

Let us pit your algorithm (any algorithm!) up against the following:

Code: Select all

int main()
{
    trivium_ctx ctx;
    memset(&ctx, 0, sizeof ctx);

    trivium_init(&ctx);
    while(trivium_next64(&ctx));

    return 0;
}
Where trivium_next64 is a function which extracts the next 64 bits of output from the Trivium stream cipher (trivial, but so far secure; not that it matters here).

Now: Trivium will produce 2^64 output bits before repetition. More concretely, it is guaranteed to take at minimum 2^64 output bits, or 2^58 iterations of the above loop, before a state repeats.

Now, neither of us knows how many iterations it will take before (by lucky chance) it spits out an all-zero 64-bit result. But, assuming I haven't accidentally picked a duff starting condition and the outputs behave uniformly, the expected wait is on the order of 2^64 iterations.

Can you manage that (when storing each state requires 288 bits)?
embryo

Re: Unions, program proofs and the Halting Problem

Post by embryo »

Owen wrote:

Code: Select all

int main()
{
    void *p = NULL;
    while(true) {
        void **q = malloc(sizeof(void*));
        *q = p;
        p = q;
    }
}
You are trying to have everything without giving anything. Your code would look better in this form:

Code: Select all

int main()
{
    while(true)
        malloc(sizeof(void*));
}
Or even in this form:

Code: Select all

int main()
{
    while(true);
}
But what do we need such code for?

It is wrong to expect the compiler to always perform efficiently when you provide it with senseless code.
Last edited by Owen on Fri Mar 07, 2014 7:06 am, edited 1 time in total.
Reason: Change title
Rusky

Re: Unions, program proofs and the Halting Problem

Post by Rusky »

Not only is the halting problem independent of I/O, it's also independent of the number of possible states. It's not impossible to check whether a program halts because there are too many states to check; it's impossible because a general halting checker would be a paradox.

To explain my previous example more in depth:

Assume you have a Turing-complete language.
Assume you have written some function to determine if a program written in this language halts.
Now, write a program that calls that function on the program itself, and branches on the result.
If the function says the program will halt, go into an infinite loop.
If the function says the program will not halt, exit.
Thus, the function is incorrect.

It is logically impossible for a function to exist that will correctly determine, in all cases, whether a program written in a Turing-complete language will halt. It simply cannot exist, because its existence would be a paradox.
embryo

Re: Unions, program proofs and the Halting Problem

Post by embryo »

Rusky wrote:It is logically impossible for a function to exist that will correctly determine, in all cases, whether a program written in a Turing-complete language will halt. It simply cannot exist, because its existence would be a paradox.
Ok, so there are halting functions. But do we need them? We need working, useful functions. And we can write such functions, and we can optimize them. So why should we look at the halting problem? The halting problem and useful functions are different things; there is no connection between them. And a good compiler is a working function, not a halting one. That's why the halting problem shouldn't be a show-stopper in a discussion about compilers.
Rusky

Re: Unions, program proofs and the Halting Problem

Post by Rusky »

You don't know what you're talking about.

Halting functions are just functions that finish on their own, as opposed to running forever. The halting problem says you can't determine in the general case if a function will halt. Even on its own, this would be nice for a compiler to be able to check (but obviously not essential, as we get along fine without it).

It becomes more relevant when you notice that the halting problem is equivalent to many other problems that would be even nicer for a compiler to be able to solve, like Brendan wanting to statically determine if unions are used safely. This means there is a limit to what the compiler can know about programs (even/especially useful ones)- and that limit can be surpassed only at runtime, like with tagged unions or false alias resolution in the CPU.
Brendan

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Owen wrote:
Brendan wrote:I'm not convinced that (given the presence of pointers) tagged unions would solve anything. For a simple example:
Brendan wrote:Also note that what I really want to do is optimise structures where possible (so that programmers don't need to use unions of any kind as a workaround for the compiler's inability to optimise structures).
So before you can borrow u->i in Rust, the Enum must be of type Int. Once you've done the borrow, then you can't fiddle with the type of the union until the borrow concludes. If the compiler can't prove this, that is an error (i.e. your code contains unsafe behavior)
So, one thread sets the enum to "int" while another thread sets the enum to "float", then both threads...
Owen wrote:Let us pit your algorithm (any algorithm!) up against the following:

Code: Select all

int main()
{
    trivium_ctx ctx;
    memset(&ctx, 0, sizeof ctx);

    trivium_init(&ctx);
    while(trivium_next64(&ctx));

    return 0;
}
Where trivium_next64 is a function which extracts the next 64-bits of output from the Trivium trivial (but so far secure, not that it matters here!) stream cipher.

Now: Trivium will produce 2^64 output bits before repetition. More concretely, it is guaranteed to take at minimum 2^64 output bits, or 2^58 iterations of the above loop, before a state repeats.

Now, neither of us knows how many iterations it will take before (by lucky chance) it spits out an all-zero 64-bit result. But, assuming I haven't accidentally picked a duff starting condition and the outputs behave uniformly, the expected wait is on the order of 2^64 iterations.

Can you manage that (when storing each state requires 288 bits)?
You're saying that there's a finite/guaranteed maximum amount of memory (and also time?) that it will take before you can determine if this code definitely does halt or definitely does not halt; and therefore you're agreeing with me (whether it halts or not isn't an undecidable problem, it just takes sufficient resources). You're also going one step further and calculating the average amount of memory it will take.

Of course in the context of the original discussion (optimising structures) it still doesn't matter if the program halts or not. You only need to prove that for each place that a member is read, the member that's read is the member that was set last, and if (for any reason) you can't do that then you can't co-locate members.


Cheers,

Brendan
Rusky

Re: The Mill: a new low-power, high-performance CPU design

Post by Rusky »

Brendan wrote:So, one thread sets the enum to "int" while another thread sets the enum to "float", then both threads...
Rust statically enforces the absence of race conditions. You cannot share mutable values without a lock or marking the use unsafe (which is used internally in the standard library lock implementation).
Brendan wrote:Of course in the context of the original discussion (optimising structures) it still doesn't matter if the program halts or not. You only need to prove that for each place that a member is read, the member that's read is the member that was set last, and if (for any reason) you can't do that then you can't co-locate members.
The halting problem was an analogy. Determining which member was read last, without a run-time tag, is at least related, if not equivalent. Because of the possibility of converging code paths writing different members of a union (in which case you would use a tag even in C), if you want your optimizer to be at all useful it will in many cases reduce to checking whether you use the tag correctly. Doing that would be much easier if you just built the tag into the language, rather than trying to identify what the tag is, how it's encoded, etc.
Brendan

Re: The Mill: a new low-power, high-performance CPU design

Post by Brendan »

Hi,
Rusky wrote:
Brendan wrote:So, one thread sets the enum to "int" while another thread sets the enum to "float", then both threads...
Rust statically enforces the absence of race conditions. You cannot share mutable values without a lock or marking the use unsafe (which is used internally in the standard library lock implementation).
I need to support assembly language or it'd be useless; and there won't be any standard library (or any standard library lock implementation). I'd also want to ensure that it's possible to use structure/union members in block-free code.
Rusky wrote:
Brendan wrote:Of course in the context of the original discussion (optimising structures) it still doesn't matter if the program halts or not. You only need to prove that for each place that a member is read, the member that's read is the member that was set last, and if (for any reason) you can't do that then you can't co-locate members.
The halting problem was an analogy. Determining which member was read last, without a run-time tag, is at least related, if not equivalent. Because of the possibility of converging code paths writing different members of a union (in which case you would use a tag even in C), if you want your optimizer to be at all useful it will in many cases reduce to checking whether you use the tag correctly. Doing that would be much easier if you just built the tag into the language, rather than trying to identify what the tag is, how it's encoded, etc.
Given the choice between failing to co-locate structure members when it would have been possible, and making programmers learn and use an extra union construct plus an explicit tag, the former is the "least worst" alternative.

For co-locating structure members, the halting problem is a broken analogy. It assumes that I must have a fully correct decision (no "false positives" and no "false negatives"), and this is a very wrong assumption. I do need "no false positives", because a false positive leads to co-locating structure members that shouldn't be co-located (a compiler that automatically inserts bugs into perfect source code). However, I do not need "no false negatives", because a false negative only means a missed opportunity for optimisation. If I only co-locate structure members when it's trivial to prove that it's safe, then that's good enough (but obviously "better" would be better).

The only real question here is; how much better than "only when trivial" is possible?


Cheers,

Brendan
Rusky

Re: The Mill: a new low-power, high-performance CPU design

Post by Rusky »

Brendan wrote:I need to support assembly language or it'd be useless; and there won't be any standard library (or any standard library lock implementation). I'd also want to ensure that its possible to use structure/union members in block-free code.
Rust supports inline assembly, and the lock part of its standard library should be trivially portable to or reimplemented in a freestanding environment. There's no reason you couldn't use structures/unions in block-free code either. A lock is only required if the variable is mutable and shared between threads (actually this is enforced by the type signatures of the library functions for threads).
Brendan wrote:Given the choice between failing to co-locate structure members when it would have been possible, and making programmers learn and use an extra union construct plus an explicit tag, the former is the "least worst" alternative.

For co-locating structure members, the halting problem is a broken analogy. It assumes that I must have a fully correct decision (no "false positives" and no "false negatives"), and this is a very wrong assumption. I do need "no false positives", because a false positive leads to co-locating structure members that shouldn't be co-located (a compiler that automatically inserts bugs into perfect source code). However, I do not need "no false negatives", because a false negative only means a missed opportunity for optimisation. If I only co-locate structure members when it's trivial to prove that it's safe, then that's good enough (but obviously "better" would be better).

The only real question here is; how much better than "only when trivial" is possible?
The main reasons for using unions are generally not the trivial case, and I suspect they are not the provable cases either. The main uses of unions always involve a tag anyway (or they're bitcasting, which you can do in other ways). Thus, trying to auto-detect potentially overlapping members in structs is probably not worth it.

The alternative, enforcing safe tagged unions, is not really anything new (although I would ask, why is giving programmers a new tool a bad thing per se?). Not only that, it enables several useful patterns that are otherwise either unenforceable or overly verbose. In Rust, it looks like this:

Code: Select all

enum Shape {
    Circle { center: Point, radius: f64 },
    Rectangle { top_left: Point, bottom_right: Point }
}
It is accessed like this:

Code: Select all

match shape {
    Shape::Circle { radius, .. } => f64::consts::PI * radius * radius,
    Shape::Rectangle { top_left, bottom_right } => {
        (bottom_right.x - top_left.x) * (top_left.y - bottom_right.y)
    }
}
One example of a new feature (that's been mentioned before in this thread) is null safety using Option types. In Rust, pointers cannot be null. Thus, if you need to store a nullable pointer, you wrap the usual pointer type in a union:

Code: Select all

enum Option<T> {
    None,
    Some(T)
}

let msg = Some(String::from("hello")); // or let msg = None

match msg {
    Some(m) => println!("{}", m),
    None => ()
}
This way, code that has an Option<T> is forced to do a null check by the compiler (of course you could wrap that in an "unwrap()" function that just fails on None if you want).