Unions, program proofs and the Halting Problem

embryo

Re: Unions, program proofs and the Halting Problem

Post by embryo »

Brendan wrote:In this case EDI contains the address of the structure, "offset" is a keyword (not a variable) which means "get the offset of a member within a structure", and "union.asmdata" says which member of which structure to get the offset of.
It means you want to access the value of "union.asmdata", but for some reason are unable to use the "union.asmdata" construct from C directly. Then maybe it would be better if the code looked like this:

Code:

void fieldAccessFunction(GPR32 objectAddressReg, GPR32 fieldOffsetReg, GPR32 resultReg)
{
  mov resultReg, [objectAddressReg + fieldOffsetReg]
}
?
Brendan wrote:but that's strange in assembly (where square brackets are used for the equivalent of pointer dereferencing) and would mean the assembler has to know the types of all registers (which would be complicated and likely to fail in non-trivial cases) just to figure out what type of structure it is.
As I understand it, your language is able to mix C-like code with assembler-like code. Why not use the same type-determination algorithm for the assembly part that is already used for the C part of a program?
Brendan wrote:I do have a concept called "data properties" though. For example, if you've got a variable "foo" then "foo.max" gets the maximum value the variable can hold, "foo.size" gets the amount of bytes the variable consumes, etc.
It's a nice idea. But it is important to describe the area where the idea is applicable. In my opinion, that area is just the connection between high-level code and the hardware level. And such a connection, if exposed to the high-level code, compromises the "high" part of the level definition. So such properties should be exposed only in the connection level, and the whole connection level should be rarely used by a developer, so that the high-level view stays the most important one. In jEmbryoS I just wrap low-level structure details within high-level classes even when assembly is used, which lets me concentrate on the high-level code.
Brendan wrote:What I'm considering doing is adding an "offset" property, so that people can do "myStructure.myMember.offset" to get the offset of myMember in the structure. In that case the original instruction would become "mov eax,[edi+union.asmdata.offset]" or (using the structure type's name directly instead of the function's input parameter name) "mov eax,[edi+myTaggedUnion.asmdata.offset]"
But even if it is decided to use such an access method, why not use the compiler's ability to determine a variable's type? Then it would be possible to write a function like this:

Code:

mov(eax, myTaggedUnion.asmdata);
// mov definition
void mov(GPR32 resultReg, int sourceValueAddress)
{
...
}
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
embryo wrote:
Brendan wrote:In this case EDI contains the address of the structure, "offset" is a keyword (not a variable) which means "get the offset of a member within a structure", and "union.asmdata" says which member of which structure to get the offset of.
It means you want to access the value of "union.asmdata", but for some reason are unable to use the "union.asmdata" construct from C directly. Then maybe it would be better if the code looked like this:

Code:

void fieldAccessFunction(GPR32 objectAddressReg, GPR32 fieldOffsetReg, GPR32 resultReg)
{
  mov resultReg, [objectAddressReg + fieldOffsetReg]
}
?
I want assembly to be consistent with higher level language (if/where possible); and that doesn't match the way functions are defined in any high level language.
embryo wrote:
Brendan wrote:but that's strange in assembly (where square brackets are used for the equivalent of pointer dereferencing) and would mean the assembler has to know the types of all registers (which would be complicated and likely to fail in non-trivial cases) just to figure out what type of structure it is.
As I understand it, your language is able to mix C-like code with assembler-like code. Why not use the same type-determination algorithm for the assembly part that is already used for the C part of a program?
For the higher level language the compiler can (and must) keep track of what is in each register. For assembly language the assembler can't/shouldn't keep track of what is in each register (it's virtually impossible to get right). For example:

Code:

myGlobal as u32 = 1234

asmfunction myFunction(first as u32 in edx, second as u32) (void) {
    mov eax,first      ;An error (assembler doesn't know which register "first" is in)
    mov eax,edx        ;Correct way to get the value of "first"
    mov eax,second     ;Correct way to get the value of "second" ("second" is on the stack and not in a register)
    mov eax,myGlobal   ;Correct way to get the value of "myGlobal" ("myGlobal" is not in a register)
}
embryo wrote:
Brendan wrote:I do have a concept called "data properties" though. For example, if you've got a variable "foo" then "foo.max" gets the maximum value the variable can hold, "foo.size" gets the amount of bytes the variable consumes, etc.
It's a nice idea. But it is important to describe the area where the idea is applicable. In my opinion, that area is just the connection between high-level code and the hardware level. And such a connection, if exposed to the high-level code, compromises the "high" part of the level definition. So such properties should be exposed only in the connection level, and the whole connection level should be rarely used by a developer, so that the high-level view stays the most important one. In jEmbryoS I just wrap low-level structure details within high-level classes even when assembly is used, which lets me concentrate on the high-level code.
Most of these "data properties" are needed (and supported) in high level language anyway. For example, in C you'd use things like "INT_MAX" (which is a macro) and "sizeof(int)" (which is a unary operator trying to look like a function) to determine the properties of different pieces of data.
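For comparison, a minimal sketch of the same kind of "data properties" in Rust (picked only because it comes up elsewhere in this thread; it is not the language being designed here). The helper names max_of_u32 and size_of_u32 are made up for the illustration.

```rust
use std::mem::size_of;

/// Largest value a `u32` can hold (compare C's `UINT_MAX` macro).
fn max_of_u32() -> u32 {
    u32::MAX
}

/// Number of bytes a `u32` occupies (compare C's `sizeof` operator).
fn size_of_u32() -> usize {
    size_of::<u32>()
}

fn main() {
    println!("u32 max  = {}", max_of_u32());
    println!("u32 size = {} bytes", size_of_u32());
    println!("i32 max  = {}", i32::MAX);
}
```

Both values are compile-time constants, so nothing here costs anything at run time.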
embryo wrote:
Brendan wrote:What I'm considering doing is adding an "offset" property, so that people can do "myStructure.myMember.offset" to get the offset of myMember in the structure. In that case the original instruction would become "mov eax,[edi+union.asmdata.offset]" or (using the structure type's name directly instead of the function's input parameter name) "mov eax,[edi+myTaggedUnion.asmdata.offset]"
But even if it is decided to use such an access method, why not use the compiler's ability to determine a variable's type? Then it would be possible to write a function like this:

Code:

mov(eax, myTaggedUnion.asmdata);
// mov definition
void mov(GPR32 resultReg, int sourceValueAddress)
{
...
}
Creating a function to execute an instruction in assembly would be like extending the "Integer" class in Java just to create a "return first + second" method to add 2 integers. ;)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Rusky
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Unions, program proofs and the Halting Problem

Post by Rusky »

Brendan wrote:So what you're saying is that the reason you didn't provide an example to show that it's possible (while still making it look the same to other code) is that it's not possible to do this in Rust without breaking all the existing code that uses it.
How on earth does that follow from what I said? There will probably be no existing code using it because you can't have race conditions. When moving from single-threaded to multi-threaded, however, there's no reason you couldn't use accessor functions, just like in your example, to keep usage the same. The difference would be that the language would force you to do it correctly (when using built-in locks/etc), or force you to do it according to your specification (when implementing your own synchronization primitives) which is hopefully correct. Your language has no such guarantees.
embryo

Re: Unions, program proofs and the Halting Problem

Post by embryo »

Brendan wrote:I want assembly to be consistent with higher level language (if/where possible);
And your way of being consistent is with high-level language constructs. Then here is the question: how tightly should the assembly language be integrated with the high level? Where is the limit?
Brendan wrote:For assembly language the assembler can't/shouldn't keep track of what is in each register (it's virtually impossible to get right). For example:

Code:

myGlobal as u32 = 1234

asmfunction myFunction(first as u32 in edx, second as u32) (void) {
    mov eax,first      ;An error (assembler doesn't know which register "first" is in)
    mov eax,edx        ;Correct way to get the value of "first"
    mov eax,second     ;Correct way to get the value of "second" ("second" is on the stack and not in a register)
    mov eax,myGlobal   ;Correct way to get the value of "myGlobal" ("myGlobal" is not in a register)
}
It's not about tracking register contents; it's about an interface between the high and low levels. Writing "mov eax,second" is OK, but why not write "mov eax,someStructure.second"? It's about the 'offset' keyword.
Brendan wrote:Creating a function to execute an instruction in assembly would be like extending the "Integer" class in Java just to create a "return first + second" method to add 2 integers. ;)
An assembly instruction, if it is called from the high level, is usually not as primitive as a summation of two numbers. It can be an atomic update of some bit flags in a field, for example. It's just additional functionality that is not implemented in the high level.
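A minimal sketch of that kind of wrapper, written in Rust purely for illustration: a small high-level function around one atomic read-modify-write on a flags field. The flag names and the set_flags/clear_flags helpers are hypothetical and do not come from either language discussed here.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical flag bits, invented for this sketch.
const FLAG_READY: u32 = 1 << 0;
const FLAG_DIRTY: u32 = 1 << 1;

/// Atomically sets `bits` in `field` and returns the previous value.
fn set_flags(field: &AtomicU32, bits: u32) -> u32 {
    field.fetch_or(bits, Ordering::SeqCst)
}

/// Atomically clears `bits` in `field` and returns the previous value.
fn clear_flags(field: &AtomicU32, bits: u32) -> u32 {
    field.fetch_and(!bits, Ordering::SeqCst)
}

fn main() {
    let flags = AtomicU32::new(0);
    set_flags(&flags, FLAG_READY | FLAG_DIRTY);
    clear_flags(&flags, FLAG_DIRTY);
    println!("flags = {:#b}", flags.load(Ordering::SeqCst));
}
```

The caller sees an ordinary function; whether the body ends up as one locked instruction or a compare-and-swap loop is left to the compiler and the target.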

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
Rusky wrote:
Brendan wrote:So what you're saying is that the reason you didn't provide an example to show that it's possible (while still making it look the same to other code) is that it's not possible to do this in Rust without breaking all the existing code that uses it.
How on earth does that follow from what I said? There will probably be no existing code using it because you can't have race conditions. When moving from single-threaded to multi-threaded, however, there's no reason you couldn't use accessor functions, just like in your example, to keep usage the same. The difference would be that the language would force you to do it correctly (when using built-in locks/etc), or force you to do it according to your specification (when implementing your own synchronization primitives) which is hopefully correct. Your language has no such guarantees.
I obviously don't know as much about Rust as someone that cares. I assumed that (like most languages designed to be usable for low-level work) it couldn't know if the code being compiled was single-threaded, or used something external (e.g. kernel API) for threading, or implemented its own threading (e.g. an OS kernel), or if it did something more strange (like AP CPU startup code).

Of course I was wrong. Yesterday I found out it can know if there's concurrency or not because it rams a half-baked "user-space m:n threading" thing down everyone's throat, and that it can't even be used as free-standing in the first place.


Cheers,

Brendan

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
embryo wrote:
Brendan wrote:I want assembly to be consistent with higher level language (if/where possible);
And your way of being consistent is with high level language constructs. Then here is the question - how tightly the assembly language should be integrated with the high level? Where is the limit?
For my languages, the only major differences between assembly and higher level code are that assembly uses instructions with operands instead of statements and expressions, and the compiler won't do some of the safety checks it does for higher level code.
embryo wrote:
Brendan wrote:For assembly language the assembler can't/shouldn't keep track of what is in each register (it's virtually impossible to get right). For example:

Code:

myGlobal as u32 = 1234

asmfunction myFunction(first as u32 in edx, second as u32) (void) {
    mov eax,first      ;An error (assembler doesn't know which register "first" is in)
    mov eax,edx        ;Correct way to get the value of "first"
    mov eax,second     ;Correct way to get the value of "second" ("second" is on the stack and not in a register)
    mov eax,myGlobal   ;Correct way to get the value of "myGlobal" ("myGlobal" is not in a register)
}
It's not about tracking register contents; it's about an interface between the high and low levels. Writing "mov eax,second" is OK, but why not write "mov eax,someStructure.second"? It's about the 'offset' keyword.
For "asmfunction foo(someStructure as myStructureType) (void)" you didn't tell the assembler to put "someStructure" in a register, so it puts it on the stack, so "mov eax,someStructure.second" is fine. Note: to be technically correct, "someStructure.second" is an expression, but that expression can be converted into a form that the CPU's "mov" instruction accepts (either "mov eax,[address]" or "mov eax,[esp+stack_position]"); and this conversion is not doing anything that wouldn't be obvious to an assembly language programmer.

For "asmfunction foo(someStructure as myStructureType in edx) (void)" you told the assembler to put "someStructure" in the EDX register, so "mov eax,someStructure.second" is not fine (because the assembler doesn't track the contents of registers).

For "asmfunction foo(someStructurePointer as @myStructureType) (void)" you didn't tell the assembler to put "someStructurePointer" in a register, so it puts it on the stack. However, in this case "mov eax,someStructurePointer->second" is not fine, because this expression cannot be converted into a form that the CPU's "mov" instruction accepts. The expression would have to be written as 2 instructions (e.g. "mov esi,someStructurePointer" to get the value/address stored in the pointer, followed by either "mov eax,[esi+myStructureType.second.offset]" or "mov eax,[esi+someStructurePointer.second.offset]" to get the value of the structure's member).

For "asmfunction foo(someStructurePointer as @myStructureType in edx) (void)" you told the assembler to put "someStructurePointer" in the EDX register, so "mov eax,someStructurePointer->second" is not fine (because the assembler doesn't track the contents of registers).
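The two-step access through a pointer (fetch the address, then load from address plus member offset) can be sketched in a compiled language too. This is an illustration in Rust, not the language described above; MyStructureType, offset_of_second and load_second are hypothetical names, and the offset is computed from addresses so the sketch works on older toolchains (newer ones also offer std::mem::offset_of!).

```rust
use std::ptr::addr_of;

// Hypothetical structure standing in for "myStructureType".
#[repr(C)]
struct MyStructureType {
    first: u32,
    second: u32,
}

/// Byte offset of `second` inside the structure, computed from addresses.
fn offset_of_second() -> usize {
    let s = MyStructureType { first: 0, second: 0 };
    let base = addr_of!(s) as usize;
    let field = addr_of!(s.second) as usize;
    field - base
}

/// Step 1: take the address held in the pointer.
/// Step 2: load a u32 from that address plus the member offset.
///
/// Safety: `p` must point to a live, properly aligned `MyStructureType`.
unsafe fn load_second(p: *const MyStructureType) -> u32 {
    let base = p as *const u8;
    unsafe { (base.add(offset_of_second()) as *const u32).read() }
}

fn main() {
    let s = MyStructureType { first: 11, second: 22 };
    println!("first  = {}", s.first);
    println!("offset = {}", offset_of_second());
    println!("second = {}", unsafe { load_second(&s) });
}
```

As in the assembly version, nothing tracks what the raw pointer really points at; the caller has to get that right.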


Cheers,

Brendan
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Unions, program proofs and the Halting Problem

Post by Owen »

Brendan wrote:Hi,
Rusky wrote:
Brendan wrote:So what you're saying is that the reason you didn't provide an example to show that it's possible (while still making it look the same to other code) is that it's not possible to do this in Rust without breaking all the existing code that uses it.
How on earth does that follow from what I said? There will probably be no existing code using it because you can't have race conditions. When moving from single-threaded to multi-threaded, however, there's no reason you couldn't use accessor functions, just like in your example, to keep usage the same. The difference would be that the language would force you to do it correctly (when using built-in locks/etc), or force you to do it according to your specification (when implementing your own synchronization primitives) which is hopefully correct. Your language has no such guarantees.
I obviously don't know as much about Rust as someone that cares. I assumed that (like most languages designed to be usable for low-level work) it couldn't know if the code being compiled was single-threaded, or used something external (e.g. kernel API) for threading, or implemented its own threading (e.g. an OS kernel), or if it did something more strange (like AP CPU startup code).

Of course I was wrong. Yesterday I found out it can know if there's concurrency or not because it rams a half-baked "user-space m:n threading" thing down everyone's throat, and that it can't even be used as free-standing in the first place.
Critical research failure. There are Rust kernels already. It rams no form of threading down your throat (though there are two threading options in the standard library).

Rust doesn't care if your code is single or multi threaded. For most cases, you should just pass objects between threads (Rust statically guarantees that the old thread isn't holding onto a reference to the object). If you need shared state, then you use something like the RW<T> type, which uses unsafe blocks to implement a mutex-protected reference-counted pointer. The implementation of this uses Rust's lifetimes to ensure that you don't hold onto a reference to the contained object for longer than you hold onto the lock.
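A minimal sketch of the shared-state case, assuming present-day Rust's Arc<Mutex<T>> as a stand-in for the RW<T> type mentioned above (that substitution is an assumption; the standard library has changed since this was written). The function name shared_count is made up.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter `per_thread` times.
fn shared_count(threads: usize, per_thread: u64) -> u64 {
    // Reference-counted pointer to mutex-protected data.
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                let mut guard = counter.lock().unwrap(); // lock taken here
                *guard += 1;
                // `guard` is dropped at the end of this iteration, releasing the lock.
                // A reference derived from `guard` cannot be kept past that point;
                // the borrow checker rejects such code at compile time.
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", shared_count(4, 1000));
}
```

The only way to reach the integer is through the guard returned by lock(), which is what ties the reference's lifetime to holding the lock.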

It is all simple. It all works. Of course, you'll complain that you can't automatically validate that the unsafe code is correct (true), however likewise I will point out that I doubt you exhaustively validate your compiler is correct either.

I'll take pointers which protect me from lots of errors most of the time over unsafe pointers which protect me from none any day of the week.

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
Owen wrote:
Brendan wrote:I obviously don't know as much about Rust as someone that cares. I assumed that (like most languages designed to be usable for low-level work) it couldn't know if the code being compiled was single-threaded, or used something external (e.g. kernel API) for threading, or implemented its own threading (e.g. an OS kernel), or if it did something more strange (like AP CPU startup code).

Of course I was wrong. Yesterday I found out it can know if there's concurrency or not because it rams a half-baked "user-space m:n threading" thing down everyone's throat, and that it can't even be used as free-standing in the first place.
Critical research failure. There are Rust kernels already. It rams no form of threading down your throat (though there are two threading options in the standard library).
Yes. I got the "half-baked user-space m:n threading thing" from the section about Rust's coherency model in Rust's own manual. I also got the "can't be used as free-standing" from doing research - specifically, by googling for "rust free standing" and clicking on the top search result. It is not my fault that attempting to do research leads to failure, because (as far as I can tell) the developers can't figure out what Rust is and repeatedly change their mind and 90% of the information on the internet about Rust is obsolete (and the remaining 10% will probably be obsolete next week anyway).

Of course it doesn't help that the only reason I'm doing the research is because people (Rusky) keep bringing up Rust over and over and over and over and ..... I do not care about Rust, I will never care about Rust, and every minute I spend attempting to decode the (lack of) information for yet another "flavour of the month vapour-ware language" is just another minute wasted.

There are "Rust kernels", but only if a kernel is a thing that boots and makes the screen turn red. As far as I can tell, there are more kernels written in Java, and the Java kernels have more of the features you'd expect from a kernel (in the same way that 0.1% is more than none). Just because these things exist does not mean the tools used were suited to kernel development in any way.
Owen wrote:Rust doesn't care if your code is single or multi threaded.
Then the Rust manual (which clearly states that "Rust has a memory model centered around concurrently-executing tasks") must be wrong.
Owen wrote:For most cases, you should just pass objects between threads (Rust statically guarantees that the old thread isn't holding onto a reference to the object). If you need shared state, then you use something like the RW<T> type, which uses unsafe blocks to implement a mutex-protected reference-counted pointer. The implementation of this uses Rust's lifetimes to ensure that you don't hold onto a reference to the contained object for longer than you hold onto the lock.
For a simple example; let's say I want to implement an IRQ handler that does "tick++" and almost nothing else (an EOI and an IRET might be nice). Now, let's try to decipher the techno-babble...

If I need shared state (I do - the "tick" variable would be global data read by many CPUs and modified by an IRQ handler running on any CPU), then you use something like a <vomit> type, which uses "hidden behind your back" unsafe blocks to implement a deadlock (IRQ handler waiting for the code it interrupted to release a mutex) that protects a <slimy turd>. The implementation of this uses Rust's lifetimes (because lifetimes are extremely important for global data that lives "forever") to ensure that you don't hold onto a <puss> to the <bucket of puss> for longer than you hold onto the deadlock.

Of course I'd just do "lock inc dword [tick]".
Owen wrote:I'll take pointers which protect me from lots of errors most of the time over unsafe pointers which protect me from none any day of the week.
The beautiful thing about C is that it's quite low level. It's easy to look at code written in C and imagine the assembly you expect and "know" what effect the code has on the CPU, memory, etc. Without this relatively strong connection between the source code and the underlying hardware, it's far too easy for programmers to ignore what their code actually does (e.g. which cache lines are touched, how expensive a function is, how much RAM is used where and why, etc). It's not something I want to destroy - it's one of the most necessary parts of the language. I want to force programmers to see their code in terms of micro-ops, loads and stores (not in terms of abstract black boxes).


Cheers,

Brendan

Re: Unions, program proofs and the Halting Problem

Post by Rusky »

It is a goal of the Rust creators that Rust code be able to run freestanding (a goal which is currently met). The language is still in development- why do you expect the first search result to automatically override the claims of people who actually use the language? Why does a "memory model centered around concurrently-executing tasks" mean the compiler has to know if your code is single or multi-threaded and not just that the memory model takes into account the possibility of concurrent code? Why do you bother to research and argue about Rust if you don't care about it?
Brendan wrote:For a simple example; let's say I want to implement an IRQ handler that does "tick++" and almost nothing else (an EOI and an IRET might be nice). Now, let's try to decipher the techno-babble...
If you wanted to portray Rust as poorly-designed and bad for systems programming, you could use an Rc<i32>... or you could just use an atomic type like a sane person.
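A minimal sketch of the atomic-type approach, assuming present-day Rust's std::sync::atomic::AtomicU32 (the exact type available when this was posted may have differed). TICK, on_tick and simulate are hypothetical names; ordinary threads stand in for CPUs, and the EOI and IRET are not modelled.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Global tick counter. There is no lock, so a handler never waits on the code it interrupted.
static TICK: AtomicU32 = AtomicU32::new(0);

/// What the body of the timer handler would do.
fn on_tick() {
    TICK.fetch_add(1, Ordering::Relaxed);
}

/// Calls `on_tick` from several threads and returns how far the counter advanced.
fn simulate(threads: usize, ticks_each: u32) -> u32 {
    let before = TICK.load(Ordering::Relaxed);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..ticks_each {
                    on_tick();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    TICK.load(Ordering::Relaxed).wrapping_sub(before)
}

fn main() {
    println!("ticks counted = {}", simulate(4, 10_000));
}
```

On x86 a relaxed fetch_add whose result is ignored typically compiles down to a single lock-prefixed add or inc, which is close to the hand-written "lock inc dword [tick]".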
Brendan wrote:The beautiful thing about C is that it's quite low level. It's easy to look at code written in C and imagine the assembly you expect and "know" what effect the code has on the CPU, memory, etc. Without this relatively strong connection between the source code and the underlying hardware, it's far too easy for programmers to ignore what their code actually does (e.g. which cache lines are touched, how expensive a function is, how much RAM is used where and why, etc). It's not something I want to destroy - it's one of the most necessary parts of the language. I want to force programmers to see their code in terms of micro-ops, loads and stores (not in terms of abstract black boxes).
How does this have anything to do with (let alone preclude) static analysis of pointers?

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
Rusky wrote:It is a goal of the Rust creators that Rust code be able to run freestanding (a goal which is currently met). The language is still in development- why do you expect the first search result to automatically override the claims of people who actually use the language?
The first search result was written by someone that actually used the language. It automatically overrode nothing, as there was no other information about whether or not Rust can be used in a free standing environment that I was aware of at the time.
Rusky wrote:Why does a "memory model centered around concurrently-executing tasks" mean the compiler has to know if your code is single or multi-threaded and not just that the memory model takes into account the possibility of concurrent code?
Because to take into account the possibility of concurrent code (e.g. thread safety) you have to know which pieces of the code are or aren't used by multiple threads. Otherwise you end up with the hassle of making everything thread safe when it's unnecessary (e.g. when there are no threads).
Rusky wrote:Why do you bother to research and argue about Rust if you don't care about it?
Sadly, I've been discussing ideas with people that use "example of an implementation of an idea" instead of discussing an idea directly. Rather than saying "a language that implements foo can get around problem bar by doing x and y and z" these people just say "waffle waffle Rust". This means that I have to trawl through whatever information is available about the implementation of the idea just to figure out what these people are trying to say.

Worse, these people only ever seem to mention the advantages of "example of an implementation of an idea", and never seem to mention any of the disadvantages. I can't trust them as a source of unbiased information.
Rusky wrote:
Brendan wrote:The beautiful thing about C is that it's quite low level. It's easy to look at code written in C and imagine the assembly you expect and "know" what effect the code has on the CPU, memory, etc. Without this relatively strong connection between the source code and the underlying hardware, it's far too easy for programmers to ignore what their code actually does (e.g. which cache lines are touched, how expensive a function is, how much RAM is used where and why, etc). It's not something I want to destroy - it's one of the most necessary parts of the language. I want to force programmers to see their code in terms of micro-ops, loads and stores (not in terms of abstract black boxes).
How does this have anything to do with (let alone preclude) static analysis of pointers?
I'll take unsafe pointers over "obfuscated cluster-bork of puke needed to attempt to make pointers safe" any day of the week.


Cheers,

Brendan

Re: Unions, program proofs and the Halting Problem

Post by Rusky »

So, your first search result is a year-old, closed issue in the Rust tracker about progress toward making Rust freestanding. You also missed the fact that multiple users brought up Rust in a thread about OS development, on a site about OS development. Rust markets itself as a systems language. There's also been a class on OS dev taught using Rust at the University of Virginia. And as you found when prodded, people have built the equivalent of the barebones tutorials here in Rust.

The "obfuscated cluster-bork of puke needed to attempt to make pointers safe" doesn't have any more "abstract black boxes" than C:

Rust pointers are just as transparent as C pointers, with some added compile-time restrictions. They have the concept of "ownership," where only one variable can own the pointed-to memory at a time. Other variables (usually arguments) can "borrow" a reference as well, and the compiler will verify that these references do not outlive the owner, which frees the memory at the end of its lifetime- when it goes out of scope, or gets something else assigned to it, etc.
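A minimal sketch of ownership and borrowing in present-day Rust syntax (the syntax at the time of this thread differed in places). The function names sum and consume are made up for the example.

```rust
/// Borrows the buffer; the caller keeps ownership.
fn sum(values: &[u32]) -> u32 {
    values.iter().sum()
}

/// Takes ownership; the buffer is freed when `values` goes out of scope here.
fn consume(values: Vec<u32>) -> usize {
    values.len()
}

fn main() {
    let data = vec![1, 2, 3]; // `data` owns the heap allocation
    let total = sum(&data);   // temporary borrow, ends when `sum` returns
    let len = consume(data);  // ownership moves into `consume`
    // println!("{:?}", data); // would not compile: `data` was moved
    println!("total = {}, len = {}", total, len);
}
```

All of the checking happens at compile time; the generated code passes a pointer and a length just as the C equivalent would.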

Rust makes code thread safe without adding any overhead to unthreaded code. When you spawn a new thread or send it a message, it either takes ownership of pointers passed to it, or copies values passed to it. The sender can no longer reference "moved" data. When you need shared data instead of message passing, the compiler makes sure you still don't have race conditions using the types of the transferred data.
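A minimal sketch of the "moved to another thread" case, assuming present-day std::thread and std::sync::mpsc (the task and channel API of the time was different). The name sum_on_worker is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Sends an owned buffer to a worker thread and returns the sum it computes.
fn sum_on_worker(values: Vec<u64>) -> u64 {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        let received: Vec<u64> = rx.recv().unwrap(); // the worker now owns the buffer
        received.iter().sum::<u64>()
    });
    tx.send(values).unwrap(); // ownership of `values` moves into the channel
    // values.len();          // would not compile: `values` was moved
    worker.join().unwrap()
}

fn main() {
    println!("sum = {}", sum_on_worker(vec![1, 2, 3, 4]));
}
```

Sending the Vec hands over its pointer, length and capacity; the elements themselves are not copied, and the sender can no longer touch them.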

For example, an atomic type would be a wrapper that tells the compiler "it's okay to copy this reference, because it will only be accessed atomically." A refcounted pointer can be copied around because it's a wrapper that tells the compiler "it's okay to copy me, I'll make sure the memory/resource gets freed." How do these wrappers tell the compiler this information? They are outwardly marked like value types ("it's okay to copy me") but internally wrap pointers in unsafe blocks. All this information is at compile time; the actual code looks just like its C counterpart, and only the types are different. No black boxes, just extra type information like your language's ranges.
embryo

Re: Unions, program proofs and the Halting Problem

Post by embryo »

Brendan wrote:the only major differences between assembly and higher level code are that assembly uses instructions with operands instead of statements and expressions, and the compiler won't do some of the safety checks it does for higher level code.
But what about mixing instructions with expressions? "mov" is an instruction, but one of its operands can be an expression. The offset keyword is an example of such an expression. So the limit on how far expressions can penetrate into the assembly becomes important.
Brendan wrote:For "asmfunction foo(someStructure as myStructureType in edx) (void)" you told the assembler to put "someStructure" in the EDX register, so "mov eax,someStructure.second" is not fine (because the assembler doesn't track the contents of registers).
So we have the issue of a connection from a high-level structure to its hardware placement (in memory or in registers). There should be some simple rules for applying such a connection. If you use the "in edx" clause, then the rule requires the compiler to place the someStructure address in edx; but that is the high-level part of the new language, and the clause constrains the compiler's ability to optimize high-level code. Maybe it is better not to constrain the compiler and just remove "in edx"?
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
Rusky wrote:The "obfuscated cluster-bork of puke needed to attempt to make pointers safe" doesn't have any more "abstract black boxes" than C:

Rust pointers are just as transparent as C pointers, with some added compile-time restrictions. They have the concept of "ownership," where only one variable can own the pointed-to memory at a time. Other variables (usually arguments) can "borrow" a reference as well, and the compiler will verify that these references do not outlive the owner, which frees the memory at the end of its lifetime: when it goes out of scope, or gets something else assigned to it, etc.
How much of Rust's "safety" would work if all of the following "pieces of puke" were removed from the language:
  • Macros
  • Being forced to explicitly mark various things as safe or unsafe
  • Being forced to explicitly mark pointers as owned or borrowed (or managed?)
  • Their "traits"
  • Their "implementations"
  • Their "attributes" (including lint check attributes)
  • Conditional compilation
  • Lambda expressions
  • Generic functions
  • Vector types
  • Recursive types
  • Closure types
  • Their built in inter-task communication, scheduler, and everything that depends on their run-time

Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Brendan

Re: Unions, program proofs and the Halting Problem

Post by Brendan »

Hi,
embryo wrote:
Brendan wrote:the only major differences between assembly and higher level code is that assembly uses instructions with operands instead of statements and expressions, and the compiler won't do some of the safety checks it does for higher level code.
But what about mixing instructions with expressions? The mov is an instruction, but one of its operands can be an expression. The offset keyword is an example of such an expression. So the limit on how far expressions can penetrate into the assembly becomes important.
I think this limit is reasonably clear (or at least, clear to assembly language programmers): it must be possible for the expression to be converted at compile time into a form that the CPU can accept for the corresponding instruction's operand.
embryo wrote:
Brendan wrote:For "asmfunction foo(someStructure as myStructureType in edx) (void)" you told the assembler to put "someStructure" in the EDX register, so "mov eax,someStructure.second" is not fine (because the assembler doesn't track the contents of registers).
So we have the issue of connecting a high level structure to its hardware placement (in memory or in registers). There should be some simple rules for applying such a connection. If you use the "in edx" clause then the rule requires the compiler to place the someStructure address in edx, but that is the high level part of the new language, and the clause constrains the compiler's ability to optimize high level code. Maybe it is better not to constrain the compiler and just remove "in edx"?
Without the "in edx", parameters get passed on the stack. In general, passing parameters on the stack is a bad idea - it tends to cost extra instructions to store and retrieve anything. For some expressions the assembler can still convert the expression into a usable operand when things are passed on the stack; but for other expressions passing things on the stack makes that conversion impossible.

For a simple example, consider pointer dereferencing. If the pointer is on the stack then an expression like "*myPointer" would require 2 instructions and it's not possible to do "mov eax,*foo"; but if the pointer is in a register then it is very possible to convert the expression into a form that the CPU can accept (e.g. "mov eax,[edx]").


Cheers,

Brendan
Rusky
Member
Member
Posts: 792
Joined: Wed Jan 06, 2010 7:07 pm

Re: Unions, program proofs and the Halting Problem

Post by Rusky »

I think you have a problem of "anything I don't understand is puke."

The only thing in that list that's absolutely necessary is marking pointer ownership. Marking pointers as owned and borrowed (managed is gone and has been replaced with standard library types Rc<T> and Gc<T>) is the core feature of Rust pointer safety. Much like your language marks variables with ranges, Rust marks which place in the code is responsible for managing memory. It doesn't do anything behind your back, it doesn't generate any extra operations at runtime, it just enables better static analysis.

You just have two symbols for pointer types in most code: ~u32 is an owned pointer and &u32 is a borrowed pointer. Owned pointers are freed when they go out of scope (so you can't leak memory) and track moving ownership (so you can't do multiple frees or use them from multiple threads, etc.), and borrowed pointers are verified not to outlive the owner (so you can't use after free or keep a reference when transferring ownership).
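A small sketch of the owned/borrowed split, with one caveat: the ~u32 spelling was later replaced by Box<u32>, so the code below uses the current syntax. The function names (read, consume, demo_ownership) are made up for illustration.

```rust
/// Borrows the value; the caller keeps ownership.
fn read(borrowed: &u32) -> u32 {
    *borrowed
}

/// Takes ownership; the heap allocation is freed when `owned`
/// goes out of scope at the end of this function.
fn consume(owned: Box<u32>) -> u32 {
    *owned
}

/// Hypothetical demo: borrow first, then move ownership away.
fn demo_ownership() -> (u32, u32) {
    let owner: Box<u32> = Box::new(41);
    let seen = read(&owner); // the borrow ends when `read` returns
    let moved = consume(owner); // ownership moves into `consume`
    // read(&owner); // would be rejected at compile time: use of moved value
    (seen, moved)
}
```

None of this adds runtime bookkeeping; the commented-out line is refused by the compiler, not caught by a check in the generated code.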

Marking safety extends this so you (or a library) can define new uses of pointers and enforce that they are used as intended. There is a third type: *u32 is a raw pointer and is used to implement new abstractions (like shared atomic types) and to interface with C. Safe is the default, so you only have to mark things unsafe in the implementation of new abstractions, like mutexes, reference counters, etc. This is a good thing, as the compiler makes sure you know the places you could screw up, and encourages you to put them behind interfaces rather than scattering them everywhere.
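As a rough illustration of "unsafe inside, safe outside", here is a sketch of a made-up RawCell type that manages a raw pointer internally but exposes only safe methods. In current Rust the raw pointer types are spelled *const u32 and *mut u32 rather than *u32; everything else here (the type name, the method names) is an assumption for the example.

```rust
/// Hypothetical abstraction: owns one heap-allocated u32 via a raw pointer.
struct RawCell {
    ptr: *mut u32,
}

impl RawCell {
    fn new(value: u32) -> RawCell {
        // Hand the allocation over to a raw pointer we manage ourselves.
        RawCell { ptr: Box::into_raw(Box::new(value)) }
    }

    fn get(&self) -> u32 {
        // SAFETY: `ptr` came from Box::into_raw in `new` and is only
        // freed in `drop`, so it is valid for the lifetime of `self`.
        unsafe { *self.ptr }
    }

    fn set(&mut self, value: u32) {
        // SAFETY: same invariant as `get`; `&mut self` rules out aliasing.
        unsafe { *self.ptr = value; }
    }
}

impl Drop for RawCell {
    fn drop(&mut self) {
        // SAFETY: rebuilds the Box exactly once so the memory is released.
        unsafe { drop(Box::from_raw(self.ptr)); }
    }
}
```

Callers never write unsafe themselves; the three unsafe blocks are the only places a reviewer has to audit.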

Traits and implementations are, in general, not related to pointers. They are used to specify interfaces and that specific types implement them. Some of these are built-in and automatically determined by the compiler. For example, there's a Copy trait (it could have been renamed, not sure) that means a type is copyable without extra care like dealing with ownership moves.
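A minimal sketch of a trait, an implementation, and the Copy marker. Area, Rect, twice and demo_copy are made-up names, and note that in current Rust a struct opts into Copy with a derive rather than having it inferred automatically.

```rust
/// Hypothetical interface.
trait Area {
    fn area(&self) -> u32;
}

/// Plain data, so it can safely be marked as copyable.
#[derive(Clone, Copy)]
struct Rect {
    w: u32,
    h: u32,
}

/// States that Rect implements the Area interface.
impl Area for Rect {
    fn area(&self) -> u32 {
        self.w * self.h
    }
}

/// Takes its argument by value.
fn twice(r: Rect) -> u32 {
    r.area() * 2
}

fn demo_copy() -> (u32, u32) {
    let r = Rect { w: 2, h: 3 };
    let doubled = twice(r); // `r` is copied here, not moved
    (doubled, r.area()) // so it is still usable afterwards
}
```

Without the Copy marker, the second use of r would be rejected as a use after move.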

Vector types are just arrays, without C's pointer decay or other such nonsense. They are bounds-checked at runtime to maintain pointer safety, but I agree it would be more useful to do this with dependent types. They haven't done this because other language features are higher priority; the team is not opposed to it and it could be added in the future. Vectors do currently have some built-in behavior, like resizing for heap-allocated vectors, that is in the process of moving to the standard library.

Attributes have nothing to do with pointer safety. They're used for what is essentially the equivalent of command line arguments that are stored in the source file. Linking, warnings, etc. They're also used for conditional compilation, which I don't get your objection to since your language has it in a very similar form.

Macros have nothing to do with pointer safety in Rust. They are, however, better than C macros (which are why, I assume, you call them puke): they are based on the AST rather than text, so there are no problems with e.g. extra parentheses around arguments, and they are clearly marked, since all macro names end with !. So assert!(), fail!(), println!() are basically functions on actual language constructs that run at compile time. They also let printf be type checked at compile time. ;)
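A quick sketch of the parentheses point, in current Rust syntax (where fail!() has since become panic!()). The macro name square and the function describe are made up for the example.

```rust
/// Hypothetical macro: `$x` is matched as a whole expression node,
/// not pasted in as text.
macro_rules! square {
    ($x:expr) => {
        $x * $x
    };
}

/// format! checks its arguments against the format string at compile
/// time; a missing or extra argument is a compile error.
fn describe(count: u32) -> String {
    format!("{} items", count)
}

fn demo_macro() -> i32 {
    // The textual C macro `#define SQUARE(x) x * x` would expand
    // SQUARE(1 + 2) to 1 + 2 * 1 + 2; here the argument stays one unit.
    square!(1 + 2)
}
```

So square!(1 + 2) squares the sum, with no defensive parentheses needed in the macro body.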

Lambdas and closures have nothing to do with pointer safety. They are made safe by it, but nothing else. They don't hide anything behind your back either: their environment, the compiler-checked version of the "void *user" parameter for callbacks in C, can be specified to live on the stack or heap just like in C.
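Here is a small sketch of the stack-versus-heap choice for a closure's environment, in current Rust syntax. The names apply, make_adder and demo_closures are made up for illustration.

```rust
/// Accepts any callable; the closure is passed by value, with no
/// heap allocation involved.
fn apply<F: Fn(u32) -> u32>(f: F, x: u32) -> u32 {
    f(x)
}

/// Returns a closure whose environment (`n`) is moved into a heap box,
/// so it can outlive this function's stack frame.
fn make_adder(n: u32) -> Box<dyn Fn(u32) -> u32> {
    Box::new(move |x| x + n)
}

fn demo_closures() -> (u32, u32) {
    let offset: u32 = 10;
    // Captures `offset` by reference; the environment lives in this frame.
    let on_stack = |x: u32| x + offset;
    // Environment explicitly placed on the heap by `make_adder`.
    let on_heap = make_adder(5);
    (apply(on_stack, 1), on_heap(1))
}
```

The captured variables play the role of the "void *user" argument, except that the compiler checks their types and lifetimes.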

Generic functions have nothing to do with pointer safety. Instead, they have to do with not forcing the programmer to write braindead copy-pasted versions of functions on things like lists, trees, etc. without losing the ability of the compiler to check things.
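For instance, a single generic function can cover every element type that supports comparison, and the compiler still checks each use. This is only a sketch; the name largest is made up.

```rust
/// Returns the largest element of a slice, or None if it is empty.
/// Works for any type that can be compared and cheaply copied.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter();
    let mut best = *iter.next()?; // empty slice: return None early
    for &item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}
```

The same source serves integers and floats, while calling it on a type without an ordering is rejected at compile time.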

Recursive types have nothing to do with pointer safety. They're in C as well- "struct list { struct list *next; int value; };"
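A rough Rust counterpart of that C struct, in current syntax, might look like the sketch below. The nullable "next" pointer becomes an Option around an owned Box; the helper sum is made up for illustration.

```rust
/// Singly linked list node: the recursion goes through an owned pointer,
/// and Option stands in for C's NULL.
struct List {
    value: i32,
    next: Option<Box<List>>,
}

/// Walks the list through borrowed references and adds up the values.
fn sum(list: &List) -> i32 {
    let mut total = 0;
    let mut node = Some(list);
    while let Some(n) = node {
        total += n.value;
        node = n.next.as_deref(); // borrow the next node, if any
    }
    total
}
```

Dropping the head node frees the whole chain, since each node owns its successor.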

The runtime has nothing to do with pointer safety. It relies on pointer safety to verify its features, but it's completely removable, as evidenced by existing freestanding Rust code.