Page 4 of 10
Re: The Mill: a new low-power, high-performance CPU design
Posted: Tue Mar 11, 2014 9:53 pm
by Rusky
Brendan wrote:So in general, Rust's Options don't avoid branches in the caller but sometimes branches can be optimised out, while for my alternative exits the caller didn't have a branch to optimise out.
So in general, your language doesn't catch errors when you do something that could be caught in the type system but sometimes unused structure members can be optimized out, while with tagged unions the language just does what you mean.
Brendan wrote:I'm not dismissing pointers - for that case the flexibility/power of pointers outweigh the "harder to learn" disadvantage; and I can't think of any alternative that isn't worse (for low level programming).
I am dismissing unions - for that case there are no advantages to outweigh the "harder to learn" disadvantage. It's just pure hassle for no sane reason (other than making a compiler's optimiser easier to write).
I know you're not dismissing pointers. They were an example. You don't dismiss pointers because they are more useful than they are hard. I'm claiming that unions are also more useful than they are hard. Tagged unions are not about optimization, they are about intent and correctness. Whether you acknowledge it or not, there
are useful types that are represented correctly by unions (and polymorphism) and incorrectly by structs. Semantically speaking.
For example, a superset of everything you would represent with a C enum.
Re: Unions, program proofs and the Halting Problem
Posted: Tue Mar 11, 2014 11:00 pm
by Brendan
Hi,
thepowersgang wrote:(this will possibly get me slightly flamed)
Yes, it's a direct consequence of the scientific method (which is a good thing).
Basically, a scientist proposes a theory (I propose the "unions aren't worthwhile" theory), then other scientists try to prove the theory is invalid (people try to show that unions are worthwhile), and the original scientist has to defend their theory by showing that the other scientist's proofs were flawed.
thepowersgang wrote:I prefer the idea of having tagged unions avaliable. Sure they're complex to learn, but the huge benefit is you can annotate that these two fields should NEVER be used at the same time, and the compiler will either compile-time check that each access is valid, or when it can't, emit an assertion to prevent undefined behavior (I guess it's kinda like C++ inheritance and dynamic casting, but with a simpler represenation)
With tagged unions, you communicate to the compiler your intention, and it makes sure that you don't accedentally go against that intention.
Is there a reason why programmers should care if two fields are used at the same time or not? From my point of view, it's a "tail wagging the dog" thing.
Programmers should only need to care about describing the actions that should be performed to achieve a desired result (and the programmer's "description of actions" determines if fields are used at the same time or not). A programmer should not describe if fields should be used at the same time or not (and the programmer's "description of fields" should not determine the actions that should be performed).
In other words, "
ERROR: You can't access fields 'foo' and 'bar' in 'myThing' at the same time (because 'myThing' is a union)" should actually be "
ERROR: You can't use a union for 'myThing' (because fields 'foo' and 'bar' are used at the same time)".
Of course the second error message makes it much more obvious that the language designers should apologise for failing to relieve the programmer of the responsibility of having to decide in the first place.
Also note that this is not a managed language. There will not be any automatically inserted run-time checks (e.g. assertions); and therefore it's impossible for the compiler to ensure that two fields are never be used at the same time.
Cheers,
Brendan
Re: Unions, program proofs and the Halting Problem
Posted: Tue Mar 11, 2014 11:27 pm
by Brendan
Hi,
Rusky wrote:Brendan wrote:So in general, Rust's Options don't avoid branches in the caller but sometimes branches can be optimised out, while for my alternative exits the caller didn't have a branch to optimise out.
So in general, your language doesn't catch errors when you do something that could be caught in the type system but sometimes unused structure members can be optimized out, while with tagged unions the language just does what you mean.
My language doesn't catch "you should've used a structure but you used a union by mistake" errors because there are no unions and no way for programmers to cause this error in the first place. As an added bonus (for the error handling example) it avoids run-time branches, run-time checks and the extra space wasted by an Option's valid/invalid flag.
If you're trying to say this makes it several orders of magnitude more awesome than Rust, then I agree.
Rusky wrote:I'm claiming that unions are also more useful than they are hard. Tagged unions are not about optimization, they are about intent and correctness. Whether you acknowledge it or not, there are useful types that are represented correctly by unions (and polymorphism) and incorrectly by structs. Semantically speaking.
For example, a superset of everything you would represent with a C enum.
I'm arguing against untagged unions, and you're arguing against untagged unions.
Tagged unions are a separate issue. They simply don't work (they attempt to guarantee that unsafe usage will be detected, and then fail to detect unsafe usage).
Note: I'm not necessarily saying that tagged unions aren't worthwhile in theory; only that they fail to detect unsafe usage at compile time. If I could find a way to ensure that unsafe usage is always detected at compile time (including cases involving "pointer to union member" and inline assembly) then I'd be interested in having tagged unions in addition to the structure member co-location optimisation.
Cheers,
Brendan
Re: Unions, program proofs and the Halting Problem
Posted: Wed Mar 12, 2014 1:02 am
by Brendan
Hi,
Brendan wrote:Note: I'm not necessarily saying that tagged unions aren't worthwhile in theory; only that they fail to detect unsafe usage at compile time. If I could find a way to ensure that unsafe usage is always detected at compile time (including cases involving "pointer to union member" and inline assembly) then I'd be interested in having tagged unions in addition to the structure member co-location optimisation.
As an alternative, programmers could just do something like this:
Code: Select all
enum tagTypes = {
typeNone = 0
typeFoo
typeBar
}
structdef myTaggedUnion = {
type as tagTypes
foo as u32 ;Trivial structure member co-location!
bar as bool ;Trivial structure member co-location!
}
functiondef setFoo(union as @myTaggedUnion, value as u32) (void) {
union->type = typeFoo
union->foo = value
}
functiondef getFoo(union as @myTaggedUnion, badType as exit) (value as u32) {
if(union->type != typeFoo) {
exit badType
}
value = union->foo
}
functiondef setBar(union as @myTaggedUnion, value as bool) (void) {
union->type = typeBar
union->foo = value
}
functiondef getBar(union as @myTaggedUnion, badType as exit) (value as bool) {
if(union->type != typeBar) {
exit badType
}
value = union->foo
}
functiondef clear(union as @myTaggedUnion) (void) {
union->type = typeNone
}
functiondef isClear(union as @myTaggedUnion) (result as bool) {
result = (union->type == typeNone)
}
This isn't safe because nothing prevents other code from accessing the "myTaggedUnion" structure member's fields directly; which is the same problem I'd have with "built-in" tagged unions (except that the compiler is not making any safety guarantees in this case).
It'd also be simple for a programmer to add "both foo and bar" if they need to later - they'd only need to change 2 existing lines of code before adding their "getBoth()" and "setBoth()" functions.
Cheers,
Brendan
Re: Unions, program proofs and the Halting Problem
Posted: Wed Mar 12, 2014 8:31 am
by Rusky
Brendan wrote:I'm arguing against untagged unions, and you're arguing against untagged unions.
Yes.
Brendan wrote:Tagged unions are a separate issue. They simply don't work (they attempt to guarantee that unsafe usage will be detected, and then fail to detect unsafe usage).
In Rust, they guarantee that tagged unions are used safely to the same degree that you can guarantee a ranged integer is used safely, using a combination of pointer lifetimes and scoping. Inline assembly is unsafe anyway, so if it uses a union incorrectly then you know exactly where the problem is. In other cases, it forces all enum member access through a match (basically a glorified switch on the tag):
Code: Select all
enum TaggedUnion {
None,
Foo(u32),
Bar(bool)
}
// these are rather redundant as using their contents directly is simpler/more flexible, doesn't require mutable variables, and doesn't use Option
fn setFoo(union : &mut TaggedUnion, value : u32) { *union = Foo(value); }
fn getFoo(union : &TaggedUnion) -> Option<u32> {
match *union {
Foo(value) => value,
_ => None
}
}
fn setBar(union : &mut TaggedUnion, value : bool) { *union = Bar(value); }
fn getBar(union : &TaggedUnion) -> Option<bool> {
match *union {
Bar(value) => value,
_ => None
}
}
fn clear(union : &mut TaggedUnion) { *union = None; }
fn isClear(union : &TaggedUnion) -> bool {
match *union {
None => true,
_ => false
}
}
// 'x is a lifetime; globals can only hold pointers with lifetime 'static, meaning the whole program's lifetime
static mut p : &'static u32;
fn do_something(x : &mut u32) {
p = *x; // ERROR: x does not live long enough
*x = 36; // this is fine
}
struct Struct {
p : &u32
}
// specify that s and x both have the same lifetime, 'r, whatever that is when this gets called
fn do_something_else<'r>(s : &'r mut Struct, x : &'r u32) {
s.p = x; // good: x will not go out of scope before s
}
fn main() {
// with other enums in scope that use None, like Option, the type could not be inferred, so let mut t : TaggedUnion = None;
let mut t = None;
setFoo(t, 42); // or just t = Foo(42);
println!("foo = {}", getFoo(f).unwrap()); // just fail!() on None; unwrap will be inlined and optimized out
t.Bar = false; // ERROR: can't access enum members this way, must use match
let p;
let mut s1 : Struct; // type specified for clarity
match t {
Foo(ref foo) => {
// ERROR: foo does not live long enough; reference must be valid for the body of main...
// ...but borrowed value is only valid for the match
p = &foo;
// this is fine because the reference only lives for the body of do_something
do_something(&foo);
let mut s2 : Struct;
do_something_else(&s1, &foo); // ERROR: foo does not live long enough
do_something_else(&s2, &foo); // this is fine
}
_ => {}
}
let p2 = &t; // this is fine
}
As for "foo and bar", that doesn't seem to come up much in my experience, but Combuster did show some ways to deal with it using tagged unions.
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 12:02 am
by Brendan
Hi,
Rusky wrote:Brendan wrote:Tagged unions are a separate issue. They simply don't work (they attempt to guarantee that unsafe usage will be detected, and then fail to detect unsafe usage).
In Rust, they guarantee that tagged unions are used safely to the same degree that you can guarantee a ranged integer is used safely, using a combination of pointer lifetimes and scoping. Inline assembly is unsafe anyway, so if it uses a union incorrectly then you know exactly where the problem is. In other cases, it forces all enum member access through a match (basically a glorified switch on the tag):
I don't want to force all enum member accesses through a glorified switch. For example, imagine if you've got some TaggedUnion structures floating around your code (where a switch is needed), plus a linked list of TaggedUnion structures where you know "TaggedUnion.Foo" is used and don't need to check, plus a linked list of TaggedUnion structures where you know "TaggedUnion.Bar" is used and don't need to check.
Rusky wrote:As for "foo and bar", that doesn't seem to come up much in my experience, but Combuster did show some ways to deal with it using tagged unions.
I've just been talking to the project manager. Apparently the QA team noticed some stability issues (on a 2-core test machine with 3 users) and traced it back to race conditions in "myTaggedUnion". The project manager was really worried because the production machine is expected to be a 32 core machine handling 500 or more users, and "myTaggedUnion" is going to get pounded from all directions. Basically; it has to be thread safe and it currently isn't. I also asked the project manager if "both foo and bar" will be needed or not, and they won't know until other parts of the project are done.
Here's my thread safe version. Acquiring and releasing a mutex isn't too fast, so I added some optimised versions for 80x86 (atomic reads/writes to avoid the mutex). For the optimised version the compiler should reduce the "myTaggedUnion" down to a single 64-bit integer (removing all other fields), while for the HLL version (for other CPUs) the compiler would remove the "asmdata" field.
Code: Select all
enum tagTypes = {
typeNone = 0
typeFoo
typeBar
}
structdef myTaggedUnion = {
mutex lock
type as tagTypes
foo as u32 ;Trivial structure member co-location!
bar as bool ;Trivial structure member co-location!
asmdata as u64 ;Only used by assembly
}
multitarget setFoo(union as @myTaggedUnion, value as u32) (void) {
functiondef setFoo(union as @myTaggedUnion, value as u32) (void) {
acquireMutex(@(union->lock))
union->type = typeFoo
union->foo = value
releaseMutex(@(union->lock))
}
asmfunction x86_64 setFoo(union as @myTaggedUnion in rdi, value as u32 in rbx) (void) {
mov rax,typeFoo << 32
or rax,rbx
mov [rdi+offset union.asmdata],rax
}
asmfunction x86_32 setFoo(union as @myTaggedUnion in edi, value as u32 in ebx) (void) {
mov ecx,typeFoo
mov eax,[edi+offset union.asmdata]
mov edx,[edi+offset union.asmdata+4]
retry:
lock cmpxchg8b [edi+offset union.asmdata] ;"cmpxchg8b" used as an atomic 64-bit write
jne retry
}
}
multitarget getFoo(union as @myTaggedUnion, badType as exit) (value as u32) {
functiondef getFoo(union as @myTaggedUnion, badType as exit) (value as u32) {
acquireMutex(@(union->lock))
if(union->type != typeFoo) {
releaseMutex(@(union->lock))
exit badType
}
value = union->foo
releaseMutex(@(union->lock))
}
asmfunction x86_64 getFoo(union as @myTaggedUnion in rsi, badType as exit in ebp) (value as u32 in rax) {
mov rax,[rdi+offset union.asmdata]
shld rbx,rax,32
cmp ebx,typeFoo
je OK
exit ebp
OK:
movzx rax,eax
}
asmfunction x86_32 getFoo(union as @myTaggedUnion in esi, badType as exit in ebp) (value as u32 in eax) {
clr eax
clr edx
clr ebx
clr ecx
lock cmpxchg8b [edi+offset union.asmdata] ;"cmpxchg8b" used as an atomic 64-bit read
cmp edx,typeFoo
je OK
exit ebp
OK:
movzx rax,eax
}
}
It'd still be easy to have a "both foo and bar" (even with thread safe code and the optimised assembly versions). Also note that no code in any caller had to be changed at all to do any of this.
Cheers,
Brendan
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 7:19 am
by embryo
Brendan wrote:Code: Select all
mov eax,[edi+offset union.asmdata]
Sorry, but to my opinion the indirect addressing in the code looks a bit unnatural. The
offset variable is undefined and then I suppose it is a language keyword, but wouldn't it be more natural to write:
If we remove the extra keyword, then the result looks more like assembly and still is quite clear.
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 7:48 am
by Rusky
Brendan wrote:I don't want to force all enum member accesses through a glorified switch. For example, imagine if you've got some TaggedUnion structures floating around your code (where a switch is needed), plus a linked list of TaggedUnion structures where you know "TaggedUnion.Foo" is used and don't need to check, plus a linked list of TaggedUnion structures where you know "TaggedUnion.Bar" is used and don't need to check.
Safety, or no glorified switch? Your choice (in fact in Rust it really is your choice- use unsafe). Your example is flawed here because you've incorrectly typed your linked list- if one list contains all Foo and one all Bar than saying they're lists of TaggedUnion is incorrect. Using TaggedUnion as the type means "this can be Foo or Bar," just like using a pointer in C means "this can be the address of some object or NULL." The difference is that in Rust, you can have (literally and by analogy) pointers that are not "or NULL" and when something
can be other values than you expect, the compiler catches that error for you.
And if your type could include both, it's not a good use case for a tagged union. It's a collection of optional values. Tagged unions are a hammer. Your example is not a nail.
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 9:18 am
by Owen
Brendan wrote:Rusky wrote:As for "foo and bar", that doesn't seem to come up much in my experience, but Combuster did show some ways to deal with it using tagged unions.
I've just been talking to the project manager. Apparently the QA team noticed some stability issues (on a 2-core test machine with 3 users) and traced it back to race conditions in "myTaggedUnion". The project manager was really worried because the production machine is expected to be a 32 core machine handling 500 or more users, and "myTaggedUnion" is going to get pounded from all directions. Basically; it has to be thread safe and it currently isn't. I also asked the project manager if "both foo and bar" will be needed or not, and they won't know until other parts of the project are done.
Why is there a race condition? Good languages don't let you write race conditions (Rust doesn't let you write race conditions)
(OK, unsafe blocks do, but complaining about unsafe blocks is morally equivalent to complaining that you can trash memory when interacting with Java through the JNI: Completely missing the point)
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 7:23 pm
by Brendan
Hi,
embryo wrote:Brendan wrote:Code: Select all
mov eax,[edi+offset union.asmdata]
Sorry, but to my opinion the indirect addressing in the code looks a bit unnatural. The
offset variable is undefined and then I suppose it is a language keyword, but wouldn't it be more natural to write:
If we remove the extra keyword, then the result looks more like assembly and still is quite clear.
In this case EDI contains the address of the structure, "offset" is a keyword (not a variable) which means "get the offset of a member within a structure", and "union.asmdata" says which member of which structure to get the offset of.
It is a confusing issue though. For example, in HLL "myStructure.myMember" is the value in the member, but in NASM "myStructure.myMember" would be the offset of the member and not its value. This is what I want to avoid (the same thing means different things in different places). For HLL (e.g. if EDI was a pointer) you'd do "edi->myMember" or "(*edi).member", but that's strange in assembly (where square brackets are used for the equivalent of pointer dereferencing) and would mean the assembler has to know the types of all registers (which would be complicated and likely to fail in non-trivial cases) just to figure out what type of structure it is.
To confuse things more; some assemblers (e.g. MASM) also have an offset keyword; but it's used for stupid things (finding the address of a variable, or more correctly, the offset of a variable in a segment). To find an address of a variable I'd want the same syntax as HLL, which would be "&myVariable" in C or "#myVariable" in my language.
I do have a concept called "data properties" though. For example, if you've got a variable "foo" then "foo.max" gets the maximum value the variable can hold, "foo.size" gets the amount of bytes the variable consumes, etc. What I'm considering doing is adding an "offset" property, so that people can do "myStructure.myMember.offset" to get the offset of myMember in the structure. In that case the original instruction would become "mov eax,[edi+union.asmdata.offset]" or (using the structure type's name directly instead of the function's input parameter name) "mov eax,[edi+myTaggedUnion.asmdata.offset]".
Cheers,
Brendan
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 7:57 pm
by Brendan
Hi,
Rusky wrote:Brendan wrote:I don't want to force all enum member accesses through a glorified switch. For example, imagine if you've got some TaggedUnion structures floating around your code (where a switch is needed), plus a linked list of TaggedUnion structures where you know "TaggedUnion.Foo" is used and don't need to check, plus a linked list of TaggedUnion structures where you know "TaggedUnion.Bar" is used and don't need to check.
Safety, or no glorified switch? Your choice (in fact in Rust it really is your choice- use unsafe). Your example is flawed here because you've incorrectly typed your linked list- if one list contains all Foo and one all Bar than saying they're lists of TaggedUnion is incorrect.
Let's add a "processLinkedList(firstEntryInList as @myTaggedUnion)" function that's used to processes both lists.
Also note that what I said is what I meant - a "linked list of TaggedUnion structures". This is quite different to a "linked list of pointers to TaggedUnion structures". For example:
Code: Select all
structdef myTaggedUnion = {
next as @myTaggedUnion ;Pointer to next structure on linked list
type as tagTypes
foo as u32 ;Trivial structure member co-location!
bar as bool ;Trivial structure member co-location!
}
Cheers,
Brendan
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 8:30 pm
by Rusky
Whether it's a list of unions or pointers to unions is irrelevant. It's still the wrong type of the whole list is guaranteed to be Foos.
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 8:58 pm
by Brendan
Hi,
Owen wrote:Brendan wrote:Rusky wrote:As for "foo and bar", that doesn't seem to come up much in my experience, but Combuster did show some ways to deal with it using tagged unions.
I've just been talking to the project manager. Apparently the QA team noticed some stability issues (on a 2-core test machine with 3 users) and traced it back to race conditions in "myTaggedUnion". The project manager was really worried because the production machine is expected to be a 32 core machine handling 500 or more users, and "myTaggedUnion" is going to get pounded from all directions. Basically; it has to be thread safe and it currently isn't. I also asked the project manager if "both foo and bar" will be needed or not, and they won't know until other parts of the project are done.
Why is there a race condition?
One thread on one CPU checks the tag; then some other thread on a different CPU changes the tag and the data; then the first thread (that has already checked the tag) reads the data.
Owen wrote:Good languages don't let you write race conditions (Rust doesn't let you write race conditions)
If race conditions aren't possible; then lock-free and block-free algorithms aren't possible either. In that case it's not a good language, it's an unusable language.
Cheers,
Brendan
Re: Unions, program proofs and the Halting Problem
Posted: Thu Mar 13, 2014 11:42 pm
by Rusky
Brendan wrote:If race conditions aren't possible; then lock-free and block-free algorithms aren't possible either. In that case it's not a good language, it's an unusable language.
They're only impossible without unsafe blocks. The standard library currently has message passing, shared immutable refcounted data, and mutex-protected shared mutable data, based on unsafe blocks in functions that expose interfaces that, because of the type system, are impossible to create races with. Just wrap your lock-free implementation in unsafe blocks behind a similar interface.
Thus, you won't get that bug report about race conditions, because just flinging some shared state in there will error, and you'll have to either put a mutex on it (possibly from the standard library, this part of which can easily be freestanding) which is what you did in response to a bug report rather than a compiler error, or use/implement a lock-free system. Either way the language forces you to acknowledge the possibility of a race condition, just like it forces you to acknowledge the possibility of a null pointer, or the possibility of other union types.
Re: Unions, program proofs and the Halting Problem
Posted: Fri Mar 14, 2014 1:12 am
by Brendan
Hi,
Rusky wrote:Brendan wrote:If race conditions aren't possible; then lock-free and block-free algorithms aren't possible either. In that case it's not a good language, it's an unusable language.
They're only impossible without unsafe blocks. The standard library currently has message passing, shared immutable refcounted data, and mutex-protected shared mutable data, based on unsafe blocks in functions that expose interfaces that, because of the type system, are impossible to create races with. Just wrap your lock-free implementation in unsafe blocks behind a similar interface.
So what you're saying is that the reason you didn't provide an example to show that it's possible (while still making it look the same to other code) is that it's not possible to do this in Rust without breaking all the existing code that uses it.
Cheers,
Brendan