OSDev.org

Posted: **Thu Dec 07, 2017 8:03 am**

So, this is just something I noticed, but my ownership model actually parallels true object-oriented programming and CSP (communicating sequential processes).

Objects (functions) can only share data with other objects if the data is immutable. The exception is that data can be sent (transferred) instead of shared. Once data has been sent, it can no longer be mutated by the sender, unless the receiver sends it back.

Traditional pass-by-value semantics entails creating a copy of data and then sending the copy. The use of "const" implies that data is immutable to both the callee and the caller for the lifetime of the callee. Problems arise when the callee shares the const data with another object that outlives the callee; because the data becomes mutable to the caller again and the object assumes that data is immutable when it's actually now volatile.

One possible solution to this problem is to use a reference counter, and not allow the object to become mutable to the caller again until there are no more references to the object.

Posted: **Thu Dec 07, 2017 11:06 am**

Wajideus wrote:So, this is just something I noticed, but my ownership model actually parallels true object-oriented programming and CSP (communicating sequential processes).

Objects (functions) can only share data with other objects if the data is immutable. The exception is that data can be sent (transferred) instead of shared. Once data has been sent, it can no longer be mutated by the sender, unless the receiver sends it back.

Traditional pass-by-value semantics entails creating a copy of data and then sending the copy. The use of "const" implies that data is immutable to both the callee and the caller for the lifetime of the callee. Problems arise when the callee shares the const data with another object that outlives the callee; because the data becomes mutable to the caller again and the object assumes that data is immutable when it's actually now volatile.

One possible solution to this problem is to use a reference counter, and not allow the object to become mutable to the caller again until there are no more references to the object.

If you think about it, this is a very similar problem to two other common issues in language design and semantics: the 'upward local reference problem', where a reference to a local variable is returned by the function or inserted into a mutable argument; and the 'upward funarg problem' for nested functions, in which returning a reference to a nested function creates a closure, capturing the outer function's state, which then has to become persistent.

So what you are actually talking about is capturing a part of the callee's state - but in this case, it isn't the variable, but the condition of the variable. However, since here you are trying to avoid capturing the state accidentally, it simplifies things a bit in terms of runtime handling, though possibly at the cost of some added compile-time analysis.

Now, as a not-very-brief digression, I should mention that this gets a bit involved when talking nested functions, hence why it is called the 'upward funarg problem' - the entire nested state needs to be captured to avoid dangling references in the enclosed function, which could involve capturing several layers of function environments (especially in functional languages and the Lisp family languages, where things like blocks and loops are often syntactics sugar over anonymous functions).

This presents problems in terms of both run-time and compile-time support, and in particular, naive approaches to run-time support either require that all function environments be in heap rather than on the stack, which presents serious problems WRT memory management, or else involve added a lot of behind-the-scenes 'spooky action at a distance' which the programmer wouldn't be able to see. Similarly, the compile-time handling can be pretty complicated, and can also complicate or hamper some optimization rules the compiler designer might want to apply.

This is why C simply forbids nested functions - supporting closures would be a serious break from the rest of the language's semantics, and the compile-time techniques needed to handle it efficiently would have added far more complexity to the compiler than the language designers wanted (especially in 1969, when some methods weren't available to them yet). Some compilers (including GCC) support them, but it is very much a non-standard extension and usually the compiler requires you to explicitly set an option to allow them. Most languages whose syntax and semantics are partially derived from C, such as C++, Java, C#, JavaScript, etc., follow suit for the general case, but many have added (either from the outset or later on) special cases for things like lambda functions, and/or allow nesting but have rules regarding returning function references.

End of digression.

Fortunately for you, this doesn't apply to this specific problem, which is closer to the problem of dangling references to local variables. Still, the compile-time solutions to all three of them are somewhat similar (for languages which do permit upward funargs, and thus support closures), which is why I mentioned it.

One model in particular might help here: copy-on-return semantics, which is similar to copy-on-write semantics but involves making a copy of all or part of the local state from the local environment to the upward one, or giving an error which requires the programmer to make such a copy in order to except the conflict. You probably don't want to have the compiler this automatically in a systems language (more on this below), but you could make it an idiom of the language, possibly enforced by compiler error checking.

The solution that seems most promising to me would be to add a compile-time check to see if references to const arguments might be returned or added to object which has a persistence lifespan beyond that of the function call.

Since (AFAIK) as a practical matter you cannot determine this absolutely through static analysis, you probably want to be conservative and take any case where you can't show it won't be (which should be easier to do in static analysis) as a case where it will occur.

You as the language and compiler designer have a few different options about what to do when it does find such a conflict:

Silently insert an operation that casts away const-ness in the upward reference, possibly by copying the data. This is a simple approach from the programmer's point of view, but it involves spooky-action-at-a-distance - the programmer doesn't know what the compiler and the run-time code are doing. It also adds some compiler complexity and overhead, and may add run-time overhead as well. It is a decent approach for application programming, where that is the kind of unnecessary detail which the language might want to suppress, but in a systems language it is unacceptable.
Make it a error, and require the programmer to explicitly change the code to eliminate the conflict, probably by explicitly casting away const-ness but possibly in some other way. it is a more flexible approach than the previous one, giving the programmer more control over how it is handled, avoids spooky-action-at-a-distance, and doesn't require as much compiler cleverness. This parallels Brendan's favored solution for detecting and resolving other kinds of potential run-time conflicts such as those for array bounds and type ranges, and is a solid model, though it does require additional programmer effort.
Take the previous method as a base case, but give additional syntax and semantics for abstracting it away for some common cases, which would somewhat reduce the cognitive load for the client-programmers while still allowing flexibility and transparency, but at the cost of additional (possible significant) compiler complexity. This can make sense for bounds checking in a type definitions (and is pretty much my own planned solution for those issues), but I am not sure how you would do this in this instance, as it is sort of a case-by-case issue. Still, it is something to think about I suppose - maybe you could add parameter modifiers that tell the compiler how to resolve it, and then give an error message if a conflict exists that doesn't have an applicable modifier so the programmer can either add one or address it in some other way.
Warn about it, but don't require the programmer to handle it. This is a very unsafe approach, but unfortunately is the kind of ostrich attitude seen all too often in existing languages. I would recommend staying away from this sort of wishful thinking.

Note that for strict-FP languages, where data mutability isn't allowed anyway, this wouldn't really come up. It also wouldn't apply non-strict Lisps, as well, as all function arguments follow pass-by-value semantics as a rule regardless of the implementation (which would be up to the compiler to resolve) - if you need to mutate an argument in Lisp, you need to use a macro (or a fexpr, in dialects that support them - though like with macros, it is always possible to add them by writing a pass-through preprocessor, then writing wrapper functions for the read, eval, and compile forms which apply that preprocessor, and finally writing your own REPL that calls your preprocessed reader instead of the standard one).

Posted: **Fri Dec 08, 2017 8:52 am**

One model in particular might help here: copy-on-return semantics, which is similar to copy-on-write semantics but involves making a copy of all or part of the local state from the local environment to the upward one, or giving an error which requires the programmer to make such a copy in order to except the conflict

The point of ownership semantics is to avoid having to copy by communicating data instead of allowing random access.

Imagine you have a book and a pen; and you can only talk with me by writing. There are 2 scenarios that can unfold.

1. You write your message in the book, and then give it to me. I can show the book to whoever I want; even put it on public display, however you're the only one with a pen so no one else can write to it. Also, you can't write to it either until the book is solely in your possession again. In this case, the book represents immutable data.

2. You write your message in the book, and then give me both the book and the pen. We've now reversed roles. I can write in the book and give you both the book and the pen back, or just the book. In this case, the book represents mutable data that is shared by communication.

This technique is more robust than const-ness because it doesn't allow the caller to change the value of an object that's passed by reference, which in turn prevents dangling pointers caused by the caller destroying an object that's referenced elsewhere because the callee promoted it's scope.

Posted: **Fri Dec 08, 2017 9:46 pm**

Hmmn, OK. I don't think I have fully understood what you had been saying before about that; I still not sure if I do now, and I will have to review your earlier posts again. This sounds interesting.

Posted: **Sat Dec 09, 2017 2:54 pm**

I meant to get back to this earlier, sorry.

Wajideus wrote:It's definitely better in terms of flexibility and simplicity. It's one of the things I love about languages like Lisp and Self (Smalltalk).

I assume that by that parenthetical that you are explaining that Self is a descendant of Smalltalk, not that it is a version of Smalltalk. The latter would be like saying that C# and Java are the same language (actually, while Self uses the Smalltalk message syntax, semantically they are probably more different than Java and C# are). Just sayin'.

(As a related aside, the tendency of both Lispers and non-Lispers to speak of the Lisp family as just 'Lisp', is... problematic. The differences between the languages in the family - Common Lisp, Scheme, Racket, Clojure, older versions of Dylan, nuLisp - are as great as those between C, C++, C#, Java, JavaScript, PHP, Perl, and the other Algol-family languages which use C's bracket syntax as their structural basis. However, there's a long history of seeing Lisp as a continuum rather than a single language, even among Lisp programmers at least up until the late-1980s, and the one thing they all have in common - using sexprs to represent both built-in forms and functions in a mostly-uniform syntax - means that you can mostly-imitate any other Lisp just with a few functions, macros and read-macros, or even by slapping together a rump meta-circular interpreter for the other language that you can call, so the lines can easily get blurred. steps off of soapbox)

Wajideus wrote:The downside though is that you pay for the overhead of runtime compilation and dynamic binding. This specific language is intended for the same niche as C/C++, so that's a tradeoff I'm not willing to make.

Yeah... well, that trade-off isn't a sharply divided as most people get the impression it is, and often depends less on the languages and more on the implementations. I've mentioned the Stalin∇ before, and while the language it is for (Vlad) isn't quite the same as any standard Scheme, it is definitely a Lisp; I see no reason why a similar super-optimizing finishing compiler couldn't be applied to any other type of language, though maybe not to any specific existing language. The key to this is in separating semantic model from implementation model in a way that let's you use different implementations so long as it results in the same the expected results.

More to the point, just because the language defaults to doing something one way, doesn't mean it can't have pragmas, modifier keywords, and specializing syntax that allow the programmer to tell (or at least request or suggest to) the compiler to use some other semantics. A simple example (if no long out of date) is the register keyword in C - yeah, it is pretty much a no-op today, but that's more to due with improvements in register painting algorithms since C was designed than a flaw in the idea of hints. The same holds as a runtime operation in the System.gc() method in Java (or the similar System.GC() in .Net languages) - the garbage collector is free to ignore the request, if it concludes that there's no need to collect garbage right now, but in most instances, it will take the hint and collect it.

An analogy - not a strong one, but still - could be made between this and using an instance of an abstract class. The parent class defines the expectation - that the variable accepts a message or method invocation with a given signature, in accordance with LSP, and that the method at least generally follow the intended semantics (e.g., a 'move' method causes the object to move, somehow) - but leaves the actual implementation to the child class. The child class's designer is still free to implement it as they choose, and more t my point here, can use overloading to have a method whose semantics are related, but signature is more specialized than that defined by the parent class, indicating that it should use a specific form of the idea rather than the generic one.

Here, the same relation holds between the language and the implementation - the language doesn't, and perhaps shouldn't, define how the compiler implements something, but probably should have modifiers, pragmas, etc. that say, "do it this way, please" with varying degrees of forcefulness.

Wajideus wrote:
Schol-R-LEA wrote:Maybe this is a Lisp-ish view, as non-toy Lisp compiler implementations are full of this sort of thing, and some are even used by (non-toy) interpreters. The highly dynamic approach inherent to Lisp makes them necessary for anything like acceptable performance. Even things as basic to the language as lists are often not what they seem to be - nearly ever serious Lisp interpreter since the days if the MIT AI Lab (circa 1965) has used 'CDR-coding' to snap the pointers for lists when the are created, turning the linked lists into tagged arrays of elements (which could be addresses to a datum or another list, or any tagged datum that would fit into the same space as an address, or in some cases more exotic things such as tagged indexes into blocks of homogeneous data elements which could be treated as a unit by the garbage collector) as a way of reducing both space and time overhead.
It sounds more or less like it's just an optimization for linked lists that happens to work because the lists in Lisp are value types (immutable).

While it is definitely an optimization, I should point out that lists aren't immutable in most Lisp family languages. The forms set-car! and set-cdr! (or replaca and replacd in Common Lisp and many older dialects) allow you to change the car (the first element of the list) and cdr (the remaining sub-list following the car), respectively. The actual semantics vary from language to language, though.

Wajideus wrote:
Schol-R-LEA wrote:In my view, it would make sense to have virtual (or at least potentially virtual) be the default, and have some modifier indicate when something has to be resolved early. Similarly, a variable in a 'parent class', 'interface', 'template', or 'typeclass' could be resolved as the actual type/class if the compiler can determine early on that it can only be an object of a certain sub-class, even if the programmer can't be certain.
Yeah, as I mentioned before, it's certainly debatable. Being virtual never causes problems, but being non-virtual makes a system more rigid and hard to change. That being said, I also think it's important to base the design of the language on what we do empirically rather than a one-size-fits-all approach. It would get old rather quickly if I had to constantly sprinkle keywords all over the place to improve performance because the compiler is making assumptions that are rarely the case.

OK, that is a fair point. The matter of defaulting things in a meaningful way is pretty much undecidable, because 'reasonable' is going to be different for different circumstances. Mind you, this is one of the reasons I am thinking in terms of have separate out-of-band modifiers, and I've been giving thought to ways to abstract them out as a matter of code as well - a kind of quasi-modal system where the default can be reset per module or even per-library, without it spilling over into other modules (well, not too much, at least - leaky abstractions and all that).

It's sort of like something I say sometimes about Linux versus Windows versus MacOS - they all suck, but with Linux, the existence of different distros, alternate file systems and system initialization models, redundant and competing GUIs and applications, etc. give you some measure of ability to choose the way your particular system sucks to let you keep the suckiness to a manageable level for yourself. Of course, most people would find any amount of Linux to be too much suck, and unless you are a programmer yourself, the consistency of Windows or MacOS is likely to count more than the flexibility of Linux (which is why I tend to see them as complementary rather than rivals - though they all still suck).

Posted: **Wed Dec 27, 2017 6:48 am**

Took a break from language design for a while, but I've got another update.

I've more or less figured out how I want error handling to work. It's sort of a hybrid of using error codes and exceptions. Below is an example of a new type of assert/else statement used to generate an error (ignore the `$` here, it's just a dummy syntax for a parametric type):

OSDev.org

Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language

Re: Opinions On A New Programming Language