Re: Unions, program proofs and the Halting Problem
Posted: Mon Mar 17, 2014 12:03 am
Hi,
Rusky wrote:
> I think you have a problem of "anything I don't understand is puke."

Nobody should need to understand any of the things on my "puke list". I do superficially understand all of them, but I don't understand all the subtle little details of any of them.
For example, I don't understand how the compiler knows that you haven't used a borrowed pointer when you should've used an owned pointer (or used owned when you should've used borrowed). If it can't determine that you've used the right pointer, then it just converts one problem into a different problem without reducing the total number of problems. If it does know which pointer you should've used, then it could've auto-detected the owned/borrowed part instead of placing the burden on the programmer. Regardless of how you look at it, it's useless.
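For readers following along: the syntax has changed since 2014. In present-day Rust the `~u32` owned pointer is written `Box<u32>`, and `&u32` is still a borrowed pointer. Below is a minimal sketch of how the choice shows up in function signatures (`peek` and `consume` are hypothetical names). The signature states which kind of pointer the function needs and the compiler checks every call against it. It does not infer the choice for you.

```rust
// Borrowed: the caller keeps ownership; the function only reads through the reference.
fn peek(value: &u32) -> u32 {
    *value + 1
}

// Owned: the Box is moved into the function and freed when `value` goes out of scope.
fn consume(value: Box<u32>) -> u32 {
    *value * 2
}

fn main() {
    let owned: Box<u32> = Box::new(20);
    let a = peek(&owned); // `&Box<u32>` auto-derefs to `&u32`; `owned` stays usable
    let b = consume(owned); // ownership moves here; using `owned` after this line won't compile
    println!("peek = {}, consume = {}", a, b);
}
```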
I also don't understand how it works for shared global data. For example, let's imagine you've got a "main" function that detects whether the CPU supports AVX and sets a global function pointer, then starts 10 threads, and then the main/initialisation thread terminates. After that, those 10 threads use the function pointed to by the function pointer. Who owns this function pointer and who has borrowed it? If the main/initialisation thread owned it, what happens when the main/initialisation thread terminates (the lifetime of the owned pointer expires while 10 threads are still borrowing it)?
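This scenario can be written in present-day Rust (not the 2014 language) with a `static` that is initialised once through `std::sync::OnceLock`. A `static` lives for the whole program, so no thread owns it and the workers only read it. The function names are hypothetical, and the "AVX" version is just a placeholder with the same body as the generic one. One difference from the scenario is that returning from `main` ends a Rust process, so here the main thread waits for the workers instead of exiting.

```rust
use std::sync::OnceLock;
use std::thread;

// Signature of the routine every worker thread calls (hypothetical).
type SumFn = fn(&[f32]) -> f32;

// Lives for the whole program, so no thread owns it; threads only read it.
static SUM_IMPL: OnceLock<SumFn> = OnceLock::new();

fn sum_generic(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

// Placeholder for an AVX version; a real one would use `std::arch` intrinsics.
fn sum_avx(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx() -> bool {
    is_x86_feature_detected!("avx")
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx() -> bool {
    false
}

// Picks the implementation on the first call; later calls return the same pointer.
fn init() -> SumFn {
    *SUM_IMPL.get_or_init(|| {
        let chosen: SumFn = if cpu_has_avx() { sum_avx } else { sum_generic };
        chosen
    })
}

fn main() {
    init(); // the initialisation thread chooses the implementation once
    let workers: Vec<_> = (0..10)
        .map(|id| {
            thread::spawn(move || {
                let data: Vec<f32> = (0..=id).map(|i| i as f32).collect();
                let sum = *SUM_IMPL.get().expect("init() runs before any worker starts");
                sum(&data)
            })
        })
        .collect();
    for w in workers {
        println!("{}", w.join().unwrap());
    }
}
```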
For another example, what happens if you've got one global shared pointer to an "8:8:8" pixel buffer (a pointer to u32 data that was allocated on the heap) and another global shared pointer to a "5:6:5" pixel buffer (a pointer to u16 data that was allocated on the heap); and you've got 3 threads. One thread converts "8-bit green" from one buffer into "5-bit green" and stores it in the other buffer while doing Floyd–Steinberg dithering for green; and the other 2 threads do the same for red and blue at the same time (note: this is the only way I've found to use multiple CPUs to speed up Floyd–Steinberg dithering). Now let's also say that there are other threads drawing data in the "8:8:8" pixel buffer, except there are actually 2 of these "8:8:8" pixel buffers and you do something like page flipping (e.g. swap the global pointers to the "8:8:8" pixel buffers before starting the conversion/dithering so that other threads can be drawing in one buffer while your 3 threads are processing the other). Which of all these pointers are owned and which are borrowed? How does Rust know that the "blue thread" is only using the pointer/s to access the bits corresponding to blue? How does Rust know that no thread accesses anything beyond the end of a buffer?
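Here is a sketch of the per-channel conversion in present-day Rust. It assumes `std::thread::scope` (stable since Rust 1.63) and leaves out the page flipping and the drawing threads. This is only one possible approach. The source buffer is shared read-only, and the destination is a buffer of `AtomicU16` into which each thread ORs its own channel's bits, so the shared writes compile without `unsafe`. Nothing in the types stops the blue thread from ORing into the red bits, so that remains the programmer's responsibility. Out-of-range indexing is caught by runtime bounds checks rather than proven impossible at compile time. The channel constants and function names are hypothetical.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::thread;

// (bit position in the 8:8:8 u32 pixel, bits in the 5:6:5 pixel, bit position in the u16)
type Channel = (u32, u32, u32);
const RED: Channel = (16, 5, 11);
const GREEN: Channel = (8, 6, 5);
const BLUE: Channel = (0, 5, 0);

// Floyd–Steinberg error diffusion for ONE channel. A channel's error only depends
// on that channel, which is why the three channels can run in parallel.
fn dither_channel(src: &[u32], dst: &[AtomicU16], width: usize, height: usize, ch: Channel) {
    let (in_shift, bits, out_shift) = ch;
    let max_q = (1i32 << bits) - 1;
    // Private working copy of this channel (0..=255) that accumulates diffused error.
    let mut work: Vec<i32> = src.iter().map(|&p| ((p >> in_shift) & 0xFF) as i32).collect();
    for y in 0..height {
        for x in 0..width {
            let i = y * width + x;
            let old = work[i].clamp(0, 255);
            let q = (old * max_q + 127) / 255; // nearest level in 0..=max_q
            let err = old - q * 255 / max_q; // quantisation error on the 0..=255 scale
            // Each thread sets only its own channel's bits in the shared u16 pixel.
            dst[i].fetch_or((q as u16) << out_shift, Ordering::Relaxed);
            if x + 1 < width {
                work[i + 1] += err * 7 / 16;
            }
            if y + 1 < height {
                if x > 0 {
                    work[i + width - 1] += err * 3 / 16;
                }
                work[i + width] += err * 5 / 16;
                if x + 1 < width {
                    work[i + width + 1] += err / 16;
                }
            }
        }
    }
}

fn convert_888_to_565(src: &[u32], width: usize, height: usize) -> Vec<u16> {
    assert_eq!(src.len(), width * height, "buffer size must match dimensions");
    let dst: Vec<AtomicU16> = (0..src.len()).map(|_| AtomicU16::new(0)).collect();
    // Scoped threads may borrow `src` and `dst` because they are joined before the scope ends.
    thread::scope(|s| {
        for ch in [RED, GREEN, BLUE] {
            let dst = &dst;
            s.spawn(move || dither_channel(src, dst, width, height, ch));
        }
    });
    dst.into_iter().map(AtomicU16::into_inner).collect()
}

fn main() {
    let (width, height) = (4, 2);
    let src = vec![0x0080_4020u32; width * height];
    println!("{:x?}", convert_888_to_565(&src, width, height));
}
```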
Rusky wrote:
> The only thing in that list that's absolutely necessary is marking pointer ownership.

Given that we've been wandering all over the place (untagged unions vs. structure member co-location, tagged unions/enums, thread safety, pointer safety, etc) I wrote a "puke list" for the entire language rather than focusing on pointers alone.
Rusky wrote:
> Marking pointers as owned and borrowed (managed is gone and has been replaced with standard library types Rc<T> and Gc<T>) is the core feature of Rust pointer safety. Much like your language marks variables with ranges, Rust marks which place in the code is responsible for managing memory. It doesn't do anything behind your back, it doesn't generate any extra operations at runtime, it just enables better static analysis.

I'm not sure I want to get into memory management, but I've come to the conclusion that only having a single memory allocator is stupid (bad for cache coherency, bad for type checking, bad for profiling memory usage, etc). Also note that my OS has almost always had "process space" and "thread space" (with different allocators for different spaces) too. Once you start looking at having lots of special purpose allocators everywhere, marking all those places sounds like a nightmare.
Rusky wrote:
> You just have two symbols for pointer types in most code- ~u32 is an owned pointer and &u32 is a borrowed pointer. Owned pointers are freed when they go out of scope (so you can't leak memory) and track moving ownership (so you can't do multiple frees or use them from multiple threads, etc.), and borrowed pointers are verified not to outlive the owner (so you can't use after free or keep a reference when transferring ownership).

To improve performance I often deliberately "leak" memory - e.g. rather than freeing all the little pieces one at a time before a thread or process terminates, I'll just terminate the thread or process and let the kernel free the underlying pages. Of course (for thread termination) this only works when the OS has "thread space" (a large part of the virtual address space that is thread specific).
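For comparison, present-day Rust does allow deliberate leaks. `Box::leak` gives up ownership and hands back a reference that is valid for the rest of the program, and the allocation is never freed (`std::mem::forget` is the other standard escape hatch). This is a sketch with hypothetical `Node`, `build_list` and `sum_list`. Standard Rust threads share the process heap and have no private "thread space", so this memory is only reclaimed when the process exits, not when the thread does.

```rust
use std::thread;

// A hypothetical singly linked list whose nodes are never freed individually.
struct Node {
    value: u32,
    next: Option<&'static Node>,
}

fn build_list(len: u32) -> Option<&'static Node> {
    let mut head: Option<&'static Node> = None;
    for value in 0..len {
        // Box::leak gives up ownership: Rust never frees this allocation,
        // and the returned reference stays valid for the rest of the program.
        let node: &'static Node = Box::leak(Box::new(Node { value, next: head }));
        head = Some(node);
    }
    head
}

fn sum_list(mut cur: Option<&'static Node>) -> u64 {
    let mut total = 0;
    while let Some(node) = cur {
        total += u64::from(node.value);
        cur = node.next;
    }
    total
}

fn main() {
    let worker = thread::spawn(|| sum_list(build_list(1000)));
    println!("sum = {}", worker.join().unwrap());
}
```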
Rusky wrote:
> Marking safety extends this so you (or a library) can define new uses of pointers and enforce that they are used as intended. There is a third type- *u32 is a raw pointer and is used to implement new abstractions (like shared atomic types) and to interface with C. Safe is the default, so you only have to mark things unsafe in the implementation of new abstractions, like mutexes, reference counters, etc. This is a good thing, as the compiler makes sure you know the places you could screw up, and encourages you to put them behind interfaces rather than scattering them everywhere.

So the average programmer would get tired of bothering with owned/borrowed, and would just use raw pointers for everything? Cool!
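The "unsafe inside, safe interface outside" pattern Rusky describes looks roughly like this in present-day Rust. It mirrors the example the Rust Book uses to explain the standard library's `split_at_mut`, and `split_in_two` is a hypothetical re-implementation. Raw pointers appear only inside the `unsafe` block. Callers get two non-overlapping borrowed slices that the compiler then checks as usual.

```rust
use std::slice;

// Safe signature; the raw-pointer work is confined to the `unsafe` block.
fn split_in_two(values: &mut [u32], mid: usize) -> (&mut [u32], &mut [u32]) {
    let len = values.len();
    assert!(mid <= len, "mid is out of bounds");
    let ptr = values.as_mut_ptr();
    // SAFETY: [0, mid) and [mid, len) both lie inside `values` and do not overlap,
    // so handing out two mutable slices cannot create aliasing `&mut` references.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut pixels = [1, 2, 3, 4, 5, 6];
    let (left, right) = split_in_two(&mut pixels, 2);
    left[0] = 10;
    right[0] = 30;
    println!("{:?}", pixels);
}
```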
Rusky wrote:
> Traits and implementations are, in general, not related to pointers. They are used to specify interfaces and that specific types implement them. Some of these are built-in and automatically determined by the compiler. For example, there's a Copy trait (it could have been renamed, not sure) that means a type is copyable without extra care like dealing with ownership moves.

Once you get rid of owned/borrowed everything can be copied safely without traits. Apart from that; if you can do something with a type then there's a function to do something with a type, and you only need to check if the function exists (unless some fool decides to have generic functions and you're screwed).
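Here is a small illustration of Rusky's point about `Copy` in present-day Rust, where the trait is still called `Copy`. A type with no owning pointers can derive it. A type holding a `Box` cannot, and duplicating it takes an explicit `.clone()`. The type and function names are hypothetical.

```rust
// Plain data: deriving Copy lets values be duplicated by a simple bit copy.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rgb565(u16);

// Owns a heap allocation, so it cannot be Copy; duplicating it needs an explicit clone.
#[derive(Clone, Debug, PartialEq)]
struct OwnedBuffer(Box<[u16]>);

// Takes the pixel by value and returns its 5-bit red field.
fn red_bits(p: Rgb565) -> u16 {
    p.0 >> 11
}

fn main() {
    let p = Rgb565(0xF800);
    let r = red_bits(p); // `p` is copied into the call...
    println!("{:?} has red = {}", p, r); // ...so it is still usable here

    let a = OwnedBuffer(vec![1, 2, 3].into_boxed_slice());
    let b = a.clone(); // a plain `let b = a;` would move `a` and make it unusable
    println!("{:?} {:?}", a, b);
}
```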
Rusky wrote:
> Vector types are just arrays, without C's pointer decay or other such nonsense. They are bounds-checked at runtime to maintain pointer safety, but I agree it would be more useful to do this with dependent types. They haven't done this because other language features are higher priority- the team is not opposed to it and it could be added in the future. Vectors do currently have some built-in behavior like resizing for heap-allocated vectors that is in the process of moving to the standard library.

I'll be doing bounds-checking (for arrays and "bounded pointers") at compile-time; and prefer to let people create their own higher level stuff (e.g. resizing) on top of that.
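For reference, this is roughly what runtime bounds checking looks like in present-day Rust. It is shown next to a fixed-size array whose length is part of its type, which is closer in spirit to compile-time checking but is not the dependent types Rusky mentions. `pixel_at` and `sum_row` are hypothetical helpers.

```rust
// `get` is the non-panicking form of indexing: out of range gives None
// instead of reading past the end of the buffer.
fn pixel_at(buffer: &[u16], width: usize, x: usize, y: usize) -> Option<u16> {
    if x >= width {
        return None; // otherwise x == width would silently land on the next row
    }
    let index = y.checked_mul(width)?.checked_add(x)?;
    buffer.get(index).copied()
}

// The length is part of the type: `[u16; 4]` can only ever hold 4 elements.
fn sum_row(row: &[u16; 4]) -> u32 {
    row.iter().map(|&p| u32::from(p)).sum()
}

fn main() {
    let buffer = vec![1u16, 2, 3, 4, 5, 6, 7, 8]; // 4 x 2 pixels
    println!("{:?}", pixel_at(&buffer, 4, 1, 1)); // inside the buffer
    println!("{:?}", pixel_at(&buffer, 4, 0, 2)); // past the end: None, no out-of-bounds read
    println!("{}", sum_row(&[1, 2, 3, 4]));
}
```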
Rusky wrote:
> Attributes have nothing to do with pointer safety. They're used for what is essentially the equivalent of command line arguments that are stored in the source file. Linking, warnings, etc. They're also used for conditional compilation, which I don't get your objection to since your language has it in a very similar form.

I don't have a preprocessor at all (no macros and no conditional compiling). Instead of conditional compiling I rely on code optimisers and dead code elimination (e.g. rather than doing "#define USE_SOMETHING" and "#ifdef USE_SOMETHING" I'll do "const bool use_something = true" and "if(use_something) {"). Instead of macros I just have functions. Java does something similar.
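Both styles exist side by side in present-day Rust, which may make the comparison concrete. The first is a `const bool` tested with an ordinary `if`, as in Brendan's approach, where both branches must still type-check and an optimising build will typically drop the dead one. The second is a `#[cfg(...)]` attribute, where the disabled item is not compiled at all. `USE_UNROLLED_LOOP`, `checksum` and `build_kind` are hypothetical.

```rust
// Brendan's style: an ordinary constant; both branches are type-checked,
// and the optimiser can remove the branch that never runs.
const USE_UNROLLED_LOOP: bool = true;

fn checksum(data: &[u8]) -> u32 {
    if USE_UNROLLED_LOOP {
        // Hypothetical "fast" path: add 4 bytes per iteration.
        let chunks = data.chunks_exact(4);
        let tail = chunks.remainder();
        let mut total = 0u32;
        for c in chunks {
            total = total.wrapping_add(c.iter().map(|&b| u32::from(b)).sum::<u32>());
        }
        tail.iter().fold(total, |acc, &b| acc.wrapping_add(u32::from(b)))
    } else {
        data.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
    }
}

// Attribute style: the item whose condition is false is never compiled.
#[cfg(debug_assertions)]
fn build_kind() -> &'static str {
    "debug"
}

#[cfg(not(debug_assertions))]
fn build_kind() -> &'static str {
    "release"
}

fn main() {
    println!("{} build, checksum = {}", build_kind(), checksum(b"hello, world"));
    // `cfg!` turns the same condition into a plain bool for an ordinary `if`.
    if cfg!(debug_assertions) {
        println!("debug assertions are on");
    }
}
```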
I do have "multi-target", but it is only useful for a specific use case; and for that specific use case it's less hassle for the programmer and much more powerful than conditional code can be. For example, one of my ideas (for unit testing) is to compile all versions of a multi-target function that can be executed, and automatically check whether they're all equivalent (e.g. so that it's easy to determine if your optimised assembly version/s of a function behave the same as the original higher level code version). Also note that my "multi-target" is designed to allow (e.g.) many different versions of a function for 32-bit 80x86, where different versions use different CPU features (e.g. with/without MMX, with/without SSE, etc) and the compiler automatically selects the best version that the target CPU supports (based on the target CPU's supported features).
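The equivalence-testing idea can be approximated by hand in present-day Rust. The sketch below lists two versions of a hypothetical `sum` function and checks them against the reference on many deterministic inputs. In Brendan's design the compiler would gather the versions and run the comparison automatically, whereas here both the list of versions and the inputs are written out manually.

```rust
// Reference ("higher level") version.
fn sum_reference(xs: &[u32]) -> u64 {
    xs.iter().map(|&x| u64::from(x)).sum()
}

// Hypothetical "optimised" version using four independent accumulators.
fn sum_unrolled(xs: &[u32]) -> u64 {
    let mut acc = [0u64; 4];
    let chunks = xs.chunks_exact(4);
    let rest = chunks.remainder();
    for c in chunks {
        for (a, &x) in acc.iter_mut().zip(c) {
            *a += u64::from(x);
        }
    }
    acc.iter().sum::<u64>() + rest.iter().map(|&x| u64::from(x)).sum::<u64>()
}

// Returns the names of every version that disagrees with the reference on `input`.
fn mismatches(input: &[u32]) -> Vec<&'static str> {
    let versions: [(&'static str, fn(&[u32]) -> u64); 2] =
        [("reference", sum_reference), ("unrolled", sum_unrolled)];
    let expected = sum_reference(input);
    let mut bad = Vec::new();
    for (name, f) in versions {
        if f(input) != expected {
            bad.push(name);
        }
    }
    bad
}

fn main() {
    // Deterministic pseudo-random inputs of every length from 0 to 40.
    let mut seed = 12345u32;
    for len in 0..=40 {
        let input: Vec<u32> = (0..len)
            .map(|_| {
                seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
                seed
            })
            .collect();
        let bad = mismatches(&input);
        assert!(bad.is_empty(), "length {}: {:?} disagree with the reference", len, bad);
    }
    println!("all versions agree");
}
```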
Rusky wrote:
> Macros have nothing to do with pointer safety in Rust. They are, however, better than C macros (which are why, I assume, you call them puke)- they are based on the AST rather than text, so there's no problems with e.g. extra parentheses around arguments, and they are clearly marked- all macro names end with !. So assert!(), fail!(), println!(), are basically functions on actual language constructs that run at compile time. They also let printf be type checked at compile time.

In my opinion, once you fix all the problems with "C like" macros (e.g. including the ability to do type checking on macro parameters), you end up with functions.
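To make the parenthesisation point concrete, here is the classic C pitfall next to its `macro_rules!` counterpart in present-day Rust. The macro name `square!` is hypothetical. As with the C version, `$x` is evaluated twice, so an argument with side effects would still run twice.

```rust
// A C macro such as `#define SQUARE(x) x * x` expands SQUARE(1 + 2) to 1 + 2 * 1 + 2 == 5.
// `$x:expr` captures a whole parsed expression, so the same expansion here yields 9.
macro_rules! square {
    ($x:expr) => {
        $x * $x
    };
}

fn main() {
    let n = square!(1 + 2);
    println!("{}", n);
    // Format strings are checked at compile time: `println!("{}")` with no argument
    // is rejected by the compiler instead of misbehaving at runtime.
    println!("{} is {} bits wide", "u32", u32::BITS);
}
```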
Rusky wrote:
> Lambdas and closures have nothing to do with pointer safety. They are made safe by it, but nothing else. They don't hide anything behind your back either- their environment, the compiler-checked version of the "void *user" parameter for callbacks in C can be specified to live on the stack or heap just like in C.

Just use function pointers.
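Rusky's point about the closure environment being a compiler-checked version of C's `void *user` parameter can be seen side by side in present-day Rust. `visit_with_user` mimics the C callback style with a plain function pointer, as Brendan suggests, while `visit` takes a closure. All names are hypothetical.

```rust
// C style: a plain function pointer plus an explicit "user data" argument.
fn visit_with_user<T>(items: &[u32], user: &mut T, callback: fn(&mut T, u32)) {
    for &item in items {
        callback(user, item);
    }
}

// Closure style: the compiler builds and type-checks the environment.
fn visit<F: FnMut(u32)>(items: &[u32], mut callback: F) {
    for &item in items {
        callback(item);
    }
}

fn add_to(total: &mut u32, item: u32) {
    *total += item;
}

fn main() {
    let items = [1, 2, 3, 4];

    let mut total_a: u32 = 0;
    visit_with_user(&items, &mut total_a, add_to);

    let mut total_b: u32 = 0;
    visit(&items, |item| total_b += item); // environment: a `&mut total_b` on the stack

    // Boxing a `move` closure puts its environment (a copy of `offset`) on the heap.
    let offset = total_b;
    let boxed: Box<dyn Fn(u32) -> u32> = Box::new(move |x| x + offset);
    println!("{} {} {}", total_a, total_b, boxed(1));
}
```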
Rusky wrote:
> Generic functions have nothing to do with pointer safety. Instead, they have to do with not forcing the programmer to write braindead copy-pasted versions of functions on things like lists, trees, etc. without losing the ability of the compiler to check things.

I want to make it easier for programmers to write fast code than it is for them to write slow code. I need to encourage programmers to write the best code for their specific case and force them to actually think about what their code is asking the CPU to do; and I need to prevent them from just using "works for everything but ideal for nothing" trash.
For example, let's say someone needs a list sorted in descending order. The best way is to create the list in descending order in the first place. The worst way is to create the list in random order because it's easier, then use a generic "sort()" to sort it in ascending order because it's easier, then use a generic "reverse()" to convert it into descending order because it's easier.
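The approaches can be compared in present-day Rust. The middle one, a generic sort given a descending key, is a common compromise and not what Brendan recommends. The last one only applies when the data is produced in a known order. Function names are hypothetical.

```rust
use std::cmp::Reverse;

// The "worst way" from the post: sort ascending, then reverse.
fn sort_then_reverse(mut v: Vec<u32>) -> Vec<u32> {
    v.sort();
    v.reverse();
    v
}

// A generic sort told the order it actually needs: one sort, no reverse pass.
fn sort_descending(mut v: Vec<u32>) -> Vec<u32> {
    v.sort_by_key(|&x| Reverse(x));
    v
}

// The "best way": when the producer knows the order, emit it directly.
fn countdown(n: u32) -> Vec<u32> {
    (1..=n).rev().collect()
}

fn main() {
    let data = vec![3, 1, 4, 1, 5, 9, 2, 6];
    println!("{:?}", sort_then_reverse(data.clone()));
    println!("{:?}", sort_descending(data));
    println!("{:?}", countdown(5));
}
```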
Cheers,
Brendan