OSDev.org

Posted: **Tue Mar 11, 2014 9:53 pm**

Brendan wrote:So in general, Rust's Options don't avoid branches in the caller but sometimes branches can be optimised out, while for my alternative exits the caller didn't have a branch to optimise out.

So in general, your language doesn't catch errors when you do something that could be caught in the type system but sometimes unused structure members can be optimized out, while with tagged unions the language just does what you mean.

Brendan wrote:I'm not dismissing pointers - for that case the flexibility/power of pointers outweigh the "harder to learn" disadvantage; and I can't think of any alternative that isn't worse (for low level programming).

I am dismissing unions - for that case there are no advantages to outweigh the "harder to learn" disadvantage. It's just pure hassle for no sane reason (other than making a compiler's optimiser easier to write).

I know you're not dismissing pointers. They were an example. You don't dismiss pointers because they are more useful than they are hard. I'm claiming that unions are also more useful than they are hard. Tagged unions are not about optimization, they are about intent and correctness. Whether you acknowledge it or not, there are useful types that are represented correctly by unions (and polymorphism) and incorrectly by structs. Semantically speaking.

For example, a superset of everything you would represent with a C enum.

Posted: **Tue Mar 11, 2014 11:00 pm**

Hi,

thepowersgang wrote:(this will possibly get me slightly flamed)

Yes, it's a direct consequence of the scientific method (which is a good thing).

Basically, a scientist proposes a theory (I propose the "unions aren't worthwhile" theory), then other scientists try to prove the theory is invalid (people try to show that unions are worthwhile), and the original scientist has to defend their theory by showing that the other scientist's proofs were flawed.

thepowersgang wrote:I prefer the idea of having tagged unions avaliable. Sure they're complex to learn, but the huge benefit is you can annotate that these two fields should NEVER be used at the same time, and the compiler will either compile-time check that each access is valid, or when it can't, emit an assertion to prevent undefined behavior (I guess it's kinda like C++ inheritance and dynamic casting, but with a simpler represenation)

With tagged unions, you communicate to the compiler your intention, and it makes sure that you don't accedentally go against that intention.

Is there a reason why programmers should care if two fields are used at the same time or not? From my point of view, it's a "tail wagging the dog" thing.

Programmers should only need to care about describing the actions that should be performed to achieve a desired result (and the programmer's "description of actions" determines if fields are used at the same time or not). A programmer should not describe if fields should be used at the same time or not (and the programmer's "description of fields" should not determine the actions that should be performed).

In other words, "ERROR: You can't access fields 'foo' and 'bar' in 'myThing' at the same time (because 'myThing' is a union)" should actually be "ERROR: You can't use a union for 'myThing' (because fields 'foo' and 'bar' are used at the same time)".

Of course the second error message makes it much more obvious that the language designers should apologise for failing to relieve the programmer of the responsibility of having to decide in the first place.

Also note that this is not a managed language. There will not be any automatically inserted run-time checks (e.g. assertions); and therefore it's impossible for the compiler to ensure that two fields are never be used at the same time.

Cheers,

Brendan

Posted: **Tue Mar 11, 2014 11:27 pm**

Hi,

Rusky wrote:
Brendan wrote:So in general, Rust's Options don't avoid branches in the caller but sometimes branches can be optimised out, while for my alternative exits the caller didn't have a branch to optimise out.
So in general, your language doesn't catch errors when you do something that could be caught in the type system but sometimes unused structure members can be optimized out, while with tagged unions the language just does what you mean.

My language doesn't catch "you should've used a structure but you used a union by mistake" errors because there are no unions and no way for programmers to cause this error in the first place. As an added bonus (for the error handling example) it avoids run-time branches, run-time checks and the extra space wasted by an Option's valid/invalid flag.

If you're trying to say this makes it several orders of magnitude more awesome than Rust, then I agree.

Rusky wrote:I'm claiming that unions are also more useful than they are hard. Tagged unions are not about optimization, they are about intent and correctness. Whether you acknowledge it or not, there are useful types that are represented correctly by unions (and polymorphism) and incorrectly by structs. Semantically speaking.

For example, a superset of everything you would represent with a C enum.

I'm arguing against untagged unions, and you're arguing against untagged unions.

Tagged unions are a separate issue. They simply don't work (they attempt to guarantee that unsafe usage will be detected, and then fail to detect unsafe usage).

Note: I'm not necessarily saying that tagged unions aren't worthwhile in theory; only that they fail to detect unsafe usage at compile time. If I could find a way to ensure that unsafe usage is always detected at compile time (including cases involving "pointer to union member" and inline assembly) then I'd be interested in having tagged unions in addition to the structure member co-location optimisation.

Cheers,

Brendan

Posted: **Wed Mar 12, 2014 1:02 am**

Hi,

Brendan wrote:Note: I'm not necessarily saying that tagged unions aren't worthwhile in theory; only that they fail to detect unsafe usage at compile time. If I could find a way to ensure that unsafe usage is always detected at compile time (including cases involving "pointer to union member" and inline assembly) then I'd be interested in having tagged unions in addition to the structure member co-location optimisation.

As an alternative, programmers could just do something like this:

Code: Select all

enum tagTypes = {
    typeNone = 0
    typeFoo
    typeBar
}

structdef myTaggedUnion = {
    type as tagTypes
    foo as u32              ;Trivial structure member co-location!
    bar as bool             ;Trivial structure member co-location!
}

functiondef setFoo(union as @myTaggedUnion, value as u32) (void) {
    union->type = typeFoo
    union->foo = value
}

functiondef getFoo(union as @myTaggedUnion, badType as exit) (value as u32) {
    if(union->type != typeFoo) {
        exit badType
    }
    value = union->foo
}

functiondef setBar(union as @myTaggedUnion, value as bool) (void) {
    union->type = typeBar
    union->foo = value
}

functiondef getBar(union as @myTaggedUnion, badType as exit) (value as bool) {
    if(union->type != typeBar) {
        exit badType
    }
    value = union->foo
}

functiondef clear(union as @myTaggedUnion) (void) {
    union->type = typeNone
}

functiondef isClear(union as @myTaggedUnion) (result as bool) {
    result = (union->type == typeNone)
}

This isn't safe because nothing prevents other code from accessing the "myTaggedUnion" structure member's fields directly; which is the same problem I'd have with "built-in" tagged unions (except that the compiler is not making any safety guarantees in this case).

It'd also be simple for a programmer to add "both foo and bar" if they need to later - they'd only need to change 2 existing lines of code before adding their "getBoth()" and "setBoth()" functions.

Cheers,

Brendan

Posted: **Wed Mar 12, 2014 8:31 am**

Brendan wrote:I'm arguing against untagged unions, and you're arguing against untagged unions.

Yes.

Brendan wrote:Tagged unions are a separate issue. They simply don't work (they attempt to guarantee that unsafe usage will be detected, and then fail to detect unsafe usage).

In Rust, they guarantee that tagged unions are used safely to the same degree that you can guarantee a ranged integer is used safely, using a combination of pointer lifetimes and scoping. Inline assembly is unsafe anyway, so if it uses a union incorrectly then you know exactly where the problem is. In other cases, it forces all enum member access through a match (basically a glorified switch on the tag):

Code: Select all

enum TaggedUnion {
  None,
  Foo(u32),
  Bar(bool)
}

// these are rather redundant as using their contents directly is simpler/more flexible, doesn't require mutable variables, and doesn't use Option

fn setFoo(union : &mut TaggedUnion, value : u32) { *union = Foo(value); }
fn getFoo(union : &TaggedUnion) -> Option<u32> {
    match *union {
        Foo(value) => value,
        _ => None
    }
}

fn setBar(union : &mut TaggedUnion, value : bool) { *union = Bar(value); }
fn getBar(union : &TaggedUnion) -> Option<bool> {
    match *union {
        Bar(value) => value,
        _ => None
    }
}

fn clear(union : &mut TaggedUnion) { *union = None; }
fn isClear(union : &TaggedUnion) -> bool {
    match *union {
        None => true,
        _ => false
    }
}

// 'x is a lifetime; globals can only hold pointers with lifetime 'static, meaning the whole program's lifetime
static mut p : &'static u32;

fn do_something(x : &mut u32) {
    p = *x; // ERROR: x does not live long enough
    *x = 36; // this is fine
}

struct Struct {
    p : &u32
}

// specify that s and x both have the same lifetime, 'r, whatever that is when this gets called
fn do_something_else<'r>(s : &'r mut Struct, x : &'r u32) {
    s.p = x; // good: x will not go out of scope before s
}

fn main() {
    // with other enums in scope that use None, like Option, the type could not be inferred, so let mut t : TaggedUnion = None;
    let mut t = None;

    setFoo(t, 42); // or just t = Foo(42);
    println!("foo = {}", getFoo(f).unwrap()); // just fail!() on None; unwrap will be inlined and optimized out

    t.Bar = false; // ERROR: can't access enum members this way, must use match

    let p;
    let mut s1 : Struct; // type specified for clarity
    match t {
        Foo(ref foo) => {
            // ERROR: foo does not live long enough; reference must be valid for the body of main...
            // ...but borrowed value is only valid for the match
            p = &foo;

            // this is fine because the reference only lives for the body of do_something
            do_something(&foo);

            let mut s2 : Struct;
            do_something_else(&s1, &foo); // ERROR: foo does not live long enough
            do_something_else(&s2, &foo); // this is fine
        }
        _ => {}
    }

    let p2 = &t; // this is fine
}

As for "foo and bar", that doesn't seem to come up much in my experience, but Combuster did show some ways to deal with it using tagged unions.

Posted: **Thu Mar 13, 2014 12:02 am**

Hi,

Rusky wrote:
Brendan wrote:Tagged unions are a separate issue. They simply don't work (they attempt to guarantee that unsafe usage will be detected, and then fail to detect unsafe usage).
In Rust, they guarantee that tagged unions are used safely to the same degree that you can guarantee a ranged integer is used safely, using a combination of pointer lifetimes and scoping. Inline assembly is unsafe anyway, so if it uses a union incorrectly then you know exactly where the problem is. In other cases, it forces all enum member access through a match (basically a glorified switch on the tag):

I don't want to force all enum member accesses through a glorified switch. For example, imagine if you've got some TaggedUnion structures floating around your code (where a switch is needed), plus a linked list of TaggedUnion structures where you know "TaggedUnion.Foo" is used and don't need to check, plus a linked list of TaggedUnion structures where you know "TaggedUnion.Bar" is used and don't need to check.

Rusky wrote:As for "foo and bar", that doesn't seem to come up much in my experience, but Combuster did show some ways to deal with it using tagged unions.

I've just been talking to the project manager. Apparently the QA team noticed some stability issues (on a 2-core test machine with 3 users) and traced it back to race conditions in "myTaggedUnion". The project manager was really worried because the production machine is expected to be a 32 core machine handling 500 or more users, and "myTaggedUnion" is going to get pounded from all directions. Basically; it has to be thread safe and it currently isn't. I also asked the project manager if "both foo and bar" will be needed or not, and they won't know until other parts of the project are done.

Here's my thread safe version. Acquiring and releasing a mutex isn't too fast, so I added some optimised versions for 80x86 (atomic reads/writes to avoid the mutex). For the optimised version the compiler should reduce the "myTaggedUnion" down to a single 64-bit integer (removing all other fields), while for the HLL version (for other CPUs) the compiler would remove the "asmdata" field.

Code: Select all

enum tagTypes = {
    typeNone = 0
    typeFoo
    typeBar
}

structdef myTaggedUnion = {
    mutex lock
    type as tagTypes
    foo as u32              ;Trivial structure member co-location!
    bar as bool             ;Trivial structure member co-location!
    asmdata as u64          ;Only used by assembly
}

multitarget setFoo(union as @myTaggedUnion, value as u32) (void) {

    functiondef setFoo(union as @myTaggedUnion, value as u32) (void) {
        acquireMutex(@(union->lock))
        union->type = typeFoo
        union->foo = value
        releaseMutex(@(union->lock))
    }

    asmfunction x86_64 setFoo(union as @myTaggedUnion in rdi, value as u32 in rbx) (void) {
        mov rax,typeFoo << 32
        or rax,rbx
        mov [rdi+offset union.asmdata],rax
    }

    asmfunction x86_32 setFoo(union as @myTaggedUnion in edi, value as u32 in ebx) (void) {
        mov ecx,typeFoo
        mov eax,[edi+offset union.asmdata]
        mov edx,[edi+offset union.asmdata+4]
    retry:
        lock cmpxchg8b [edi+offset union.asmdata]     ;"cmpxchg8b" used as an atomic 64-bit write
        jne retry
    }

}


multitarget getFoo(union as @myTaggedUnion, badType as exit) (value as u32) {

    functiondef getFoo(union as @myTaggedUnion, badType as exit) (value as u32) {
        acquireMutex(@(union->lock))
        if(union->type != typeFoo) {
            releaseMutex(@(union->lock))
            exit badType
        }
        value = union->foo
        releaseMutex(@(union->lock))
    }

    asmfunction x86_64 getFoo(union as @myTaggedUnion in rsi, badType as exit in ebp) (value as u32 in rax) {
        mov rax,[rdi+offset union.asmdata]
        shld rbx,rax,32
        cmp ebx,typeFoo
        je OK
        exit ebp
    OK:
        movzx rax,eax
    }

    asmfunction x86_32 getFoo(union as @myTaggedUnion in esi, badType as exit in ebp) (value as u32 in eax) {
        clr eax
        clr edx
        clr ebx
        clr ecx
        lock cmpxchg8b [edi+offset union.asmdata]     ;"cmpxchg8b" used as an atomic 64-bit read
        cmp edx,typeFoo
        je OK
        exit ebp
    OK:
        movzx rax,eax
    }

}

It'd still be easy to have a "both foo and bar" (even with thread safe code and the optimised assembly versions). Also note that no code in any caller had to be changed at all to do any of this.

Cheers,

Brendan

Posted: **Thu Mar 13, 2014 7:19 am**

Brendan wrote:
Code: Select all
mov eax,[edi+offset union.asmdata]

Sorry, but to my opinion the indirect addressing in the code looks a bit unnatural. The offset variable is undefined and then I suppose it is a language keyword, but wouldn't it be more natural to write:

Code: Select all

mov eax,[edi+union.asmdata]

If we remove the extra keyword, then the result looks more like assembly and still is quite clear.

Posted: **Thu Mar 13, 2014 7:48 am**

Brendan wrote:I don't want to force all enum member accesses through a glorified switch. For example, imagine if you've got some TaggedUnion structures floating around your code (where a switch is needed), plus a linked list of TaggedUnion structures where you know "TaggedUnion.Foo" is used and don't need to check, plus a linked list of TaggedUnion structures where you know "TaggedUnion.Bar" is used and don't need to check.

Safety, or no glorified switch? Your choice (in fact in Rust it really is your choice- use unsafe). Your example is flawed here because you've incorrectly typed your linked list- if one list contains all Foo and one all Bar than saying they're lists of TaggedUnion is incorrect. Using TaggedUnion as the type means "this can be Foo or Bar," just like using a pointer in C means "this can be the address of some object or NULL." The difference is that in Rust, you can have (literally and by analogy) pointers that are not "or NULL" and when something can be other values than you expect, the compiler catches that error for you.

And if your type could include both, it's not a good use case for a tagged union. It's a collection of optional values. Tagged unions are a hammer. Your example is not a nail.

Posted: **Thu Mar 13, 2014 9:18 am**

Brendan wrote:
Rusky wrote:As for "foo and bar", that doesn't seem to come up much in my experience, but Combuster did show some ways to deal with it using tagged unions.
I've just been talking to the project manager. Apparently the QA team noticed some stability issues (on a 2-core test machine with 3 users) and traced it back to race conditions in "myTaggedUnion". The project manager was really worried because the production machine is expected to be a 32 core machine handling 500 or more users, and "myTaggedUnion" is going to get pounded from all directions. Basically; it has to be thread safe and it currently isn't. I also asked the project manager if "both foo and bar" will be needed or not, and they won't know until other parts of the project are done.

Why is there a race condition? Good languages don't let you write race conditions (Rust doesn't let you write race conditions)

(OK, unsafe blocks do, but complaining about unsafe blocks is morally equivalent to complaining that you can trash memory when interacting with Java through the JNI: Completely missing the point)

Posted: **Thu Mar 13, 2014 7:23 pm**

Hi,

embryo wrote:
Brendan wrote:
Code: Select all
mov eax,[edi+offset union.asmdata]
Sorry, but to my opinion the indirect addressing in the code looks a bit unnatural. The offset variable is undefined and then I suppose it is a language keyword, but wouldn't it be more natural to write:
Code: Select all
mov eax,[edi+union.asmdata]
If we remove the extra keyword, then the result looks more like assembly and still is quite clear.

In this case EDI contains the address of the structure, "offset" is a keyword (not a variable) which means "get the offset of a member within a structure", and "union.asmdata" says which member of which structure to get the offset of.

It is a confusing issue though. For example, in HLL "myStructure.myMember" is the value in the member, but in NASM "myStructure.myMember" would be the offset of the member and not its value. This is what I want to avoid (the same thing means different things in different places). For HLL (e.g. if EDI was a pointer) you'd do "edi->myMember" or "(*edi).member", but that's strange in assembly (where square brackets are used for the equivalent of pointer dereferencing) and would mean the assembler has to know the types of all registers (which would be complicated and likely to fail in non-trivial cases) just to figure out what type of structure it is.

To confuse things more; some assemblers (e.g. MASM) also have an offset keyword; but it's used for stupid things (finding the address of a variable, or more correctly, the offset of a variable in a segment). To find an address of a variable I'd want the same syntax as HLL, which would be "&myVariable" in C or "#myVariable" in my language.

I do have a concept called "data properties" though. For example, if you've got a variable "foo" then "foo.max" gets the maximum value the variable can hold, "foo.size" gets the amount of bytes the variable consumes, etc. What I'm considering doing is adding an "offset" property, so that people can do "myStructure.myMember.offset" to get the offset of myMember in the structure. In that case the original instruction would become "mov eax,[edi+union.asmdata.offset]" or (using the structure type's name directly instead of the function's input parameter name) "mov eax,[edi+myTaggedUnion.asmdata.offset]".

Cheers,

Brendan

Posted: **Thu Mar 13, 2014 7:57 pm**

Hi,

Rusky wrote:
Brendan wrote:I don't want to force all enum member accesses through a glorified switch. For example, imagine if you've got some TaggedUnion structures floating around your code (where a switch is needed), plus a linked list of TaggedUnion structures where you know "TaggedUnion.Foo" is used and don't need to check, plus a linked list of TaggedUnion structures where you know "TaggedUnion.Bar" is used and don't need to check.
Safety, or no glorified switch? Your choice (in fact in Rust it really is your choice- use unsafe). Your example is flawed here because you've incorrectly typed your linked list- if one list contains all Foo and one all Bar than saying they're lists of TaggedUnion is incorrect.

Let's add a "processLinkedList(firstEntryInList as @myTaggedUnion)" function that's used to processes both lists.

Also note that what I said is what I meant - a "linked list of TaggedUnion structures". This is quite different to a "linked list of pointers to TaggedUnion structures". For example:

Code: Select all

structdef myTaggedUnion = {
    next as @myTaggedUnion  ;Pointer to next structure on linked list
    type as tagTypes
    foo as u32              ;Trivial structure member co-location!
    bar as bool             ;Trivial structure member co-location!
}

Cheers,

Brendan

Posted: **Thu Mar 13, 2014 8:30 pm**

Whether it's a list of unions or pointers to unions is irrelevant. It's still the wrong type of the whole list is guaranteed to be Foos.

Posted: **Thu Mar 13, 2014 8:58 pm**

Hi,

Owen wrote:
Brendan wrote:
Rusky wrote:As for "foo and bar", that doesn't seem to come up much in my experience, but Combuster did show some ways to deal with it using tagged unions.
I've just been talking to the project manager. Apparently the QA team noticed some stability issues (on a 2-core test machine with 3 users) and traced it back to race conditions in "myTaggedUnion". The project manager was really worried because the production machine is expected to be a 32 core machine handling 500 or more users, and "myTaggedUnion" is going to get pounded from all directions. Basically; it has to be thread safe and it currently isn't. I also asked the project manager if "both foo and bar" will be needed or not, and they won't know until other parts of the project are done.
Why is there a race condition?

One thread on one CPU checks the tag; then some other thread on a different CPU changes the tag and the data; then the first thread (that has already checked the tag) reads the data.

Owen wrote:Good languages don't let you write race conditions (Rust doesn't let you write race conditions)

If race conditions aren't possible; then lock-free and block-free algorithms aren't possible either. In that case it's not a good language, it's an unusable language.

Cheers,

Brendan

Posted: **Thu Mar 13, 2014 11:42 pm**

Brendan wrote:If race conditions aren't possible; then lock-free and block-free algorithms aren't possible either. In that case it's not a good language, it's an unusable language.

They're only impossible without unsafe blocks. The standard library currently has message passing, shared immutable refcounted data, and mutex-protected shared mutable data, based on unsafe blocks in functions that expose interfaces that, because of the type system, are impossible to create races with. Just wrap your lock-free implementation in unsafe blocks behind a similar interface.

Thus, you won't get that bug report about race conditions, because just flinging some shared state in there will error, and you'll have to either put a mutex on it (possibly from the standard library, this part of which can easily be freestanding) which is what you did in response to a bug report rather than a compiler error, or use/implement a lock-free system. Either way the language forces you to acknowledge the possibility of a race condition, just like it forces you to acknowledge the possibility of a null pointer, or the possibility of other union types.

Posted: **Fri Mar 14, 2014 1:12 am**

Hi,

Rusky wrote:
Brendan wrote:If race conditions aren't possible; then lock-free and block-free algorithms aren't possible either. In that case it's not a good language, it's an unusable language.
They're only impossible without unsafe blocks. The standard library currently has message passing, shared immutable refcounted data, and mutex-protected shared mutable data, based on unsafe blocks in functions that expose interfaces that, because of the type system, are impossible to create races with. Just wrap your lock-free implementation in unsafe blocks behind a similar interface.

So what you're saying is that the reason you didn't provide an example to show that it's possible (while still making it look the same to other code) is that it's not possible to do this in Rust without breaking all the existing code that uses it.

Cheers,

Brendan

OSDev.org

Unions, program proofs and the Halting Problem

Re: The Mill: a new low-power, high-performance CPU design

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem

Re: Unions, program proofs and the Halting Problem