
Re: C and undefined behavior

Posted: Mon Jun 07, 2021 5:40 am
by Solar
vvaltchev wrote:IMHO, formally, the language now is defined solely by the ISO standard, but that doesn't prove that the language didn't fundamentally change since its inception. What we call "C" today is no longer what "C" was meant to be by its original creators; that's ultimately my point here. I'm not absolutely sure about that theory, just... I'm starting to believe in it.
You are absolutely correct. What "C" is meant to be is defined, and changed over time, by the committee. That's the way it is. If you think they did something wrong or could do something better, take it up with them. But don't get philosophical about some intangible "character" of the language, because "the language" is exactly what ISO / IEC 9899 says it is. No more, no less.
vvaltchev wrote:Some clues about that misinterpretation theory? Well, check the Dennis Richie's comments on his essay about "noalias" which, fortunately, was never included in the language, at least not in that form:
Because it was poorly worded. "restrict" basically does today what "noalias" intended to do, and among other things, allows the compiler to keep certain things in registers without having to check back with memory.
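As an aside, here is a minimal sketch (my own illustration, not something from Ritchie's letter) of the kind of optimization "restrict" enables:

Code: Select all

void scale(double *restrict dst, const double *restrict src, int n)
{
    /* Because dst and src are declared restrict, the compiler may assume
       they never alias, so *src can be kept in a register across the
       stores to dst[] instead of being re-loaded on every iteration. */
    for (int i = 0; i < n; i++)
        dst[i] = *src * i;
}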

Looking at how Ritchie, in that same letter, condemned "volatile" and "const" shows that being the one who came up with something isn't always the best authority on how to carry it forward. (Something I've been saying about Linus Torvalds for a very long time.)
So, he wanted the ISO document to codify the existing practices and preserve the spirit of C while other people obviously pushed in another direction. After the ISO C89 standard was released (which, in my view, was a big compromise between all the parties), people belonging to the other school of thought continued pushing harder and harder. In the end, one small battle after another, they won completely. Maybe for good, maybe for bad. I don't wanna judge.
The point being, once that committee was set up, C no longer "belonged" to Ritchie, just as C++ no longer "belongs" to Stroustrup. They certainly know a great deal about their brainchildren, but once you get dozens of implementations pulling at the language each their own way, you need a standard to avoid the language fracturing, and such a standard by necessity becomes a compromise.

Committees make mistakes. gets() should never have happened, and std::auto_ptr was a mess (but the best that could be done given the language at the time).
Just, I care to remark that modern C is not what the C language was meant to be...
Again, philosophy. Modern C is the best that people could come up with. If you disagree, get involved in the committee. Just be aware that there are many other interests involved, and if you are calling for more complete static code analysis to handle UB gracefully, you will have to convince every single compiler company that investing that kind of resources will be worth it.

Legacy code -- all code, actually -- is known to contain bugs. There are various tools out there to track them down. Compiler warnings, static code analysis, runtime checkers. Just don't point to the compiler for being "the bad boy" when UB breaks your app.

Re: C and undefined behavior

Posted: Mon Jun 07, 2021 5:44 am
by vvaltchev
Solar wrote:I don't agree. That example program in that bug report is stupid, and it was stupid right from Ritchie's first iteration of C.
I don't believe you can prove it. Read my last post.
Solar wrote:assert( X + Y > X ) is always true in correct C. The compiler may rely on that, and as the statement is a non-op, it may be optimized away. The correct way to check is well-known and established (assert( INT_MAX - Y >= X )). Again, what we see here is somebody who has not understood UB.
I know very well how UB works, and I'm obsessed with it, actually. You're missing the whole point here. This is mostly a philosophical discussion about what "C" was meant to be in the eyes of its creators rather than talking about specific UB examples. Also, you're starting to be aggressive. You can disagree like the other people do without attacking and implying I don't know what I'm talking about.
Solar wrote:UB is a condition a compiler has no obligation to detect or handle.
Yes, true. Agreed.
Solar wrote:There is no "compiler's point of view" on UB.
Nope. I can write a double my_sqrt(double x) function and claim that x cannot be < 0. The compiler knows nothing about it. There is no UB from the compiler point of view. But, if the caller does my_sqrt(-1), I cannot promise anything. The same applies to strcpy(), if it were a regular function.
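To make this concrete, here is a minimal hypothetical sketch of what I mean (the bisection loop is just filler):

Code: Select all

/* Contract of THIS function, written only in a comment and therefore
   invisible to the compiler: x must be >= 0. Calling my_sqrt(-1) breaks
   the contract and returns a meaningless value, but no language-level UB
   is involved anywhere in this code. */
double my_sqrt(double x)                 /* precondition: x >= 0 */
{
    double lo = 0.0, hi = (x > 1.0) ? x : 1.0;

    for (int i = 0; i < 64; i++) {       /* plain bisection */
        double mid = 0.5 * (lo + hi);
        if (mid * mid <= x)
            lo = mid;
        else
            hi = mid;
    }
    return lo;                           /* garbage (0.0) if x < 0 */
}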
Solar wrote:
vvaltchev wrote:But, in the general case of my_strcpy(), you don't need to involve UB at all. Just implement your function with your assumptions and add comments.
No I won't. Those are not "my" assumptions, they are preconditions for using strcpy() correctly.
Again, they are in the case of strcpy(), but you're completely missing the point of the conversation. You're assuming I've no idea what I'm talking about. Stop doing this and start trying to understand my point. Don't assume that everything controversial I say is because I don't know what I'm talking about. I've been writing C code since 1998.

Solar wrote:I am perfectly aware of that. I picked strcpy() as a simple and everyday example. I still uphold that the blogger (as well as vvaltchev and yourself) is barking up the wrong tree.
strcpy() is a terrible example because:
1. it is a library function with a contract
2. it is not a regular library function, but a well-known standard library function which the compiler knows about.

Good examples do not involve any CONTRACT defined by the interface of the function, but the contract between our code and the C compiler. People can define any kind of function that does not support the full range of its input. Again:

Code: Select all

void foo(int x); // x must be in [1,5].
What if x is not in that range? That's certainly NOT UB, from the compiler point of view. That's why I don't call it UB. The function might produce a wrong result or assert or crash the program in any way, but that's still not UB because the compiler doesn't know about it.

Re: C and undefined behavior

Posted: Mon Jun 07, 2021 6:08 am
by vvaltchev
Solar wrote:
So, he wanted the ISO document to codify the existing practices and preserve the spirit of C while other people obviously pushed in another direction. After the ISO C89 standard was released (which, in my view, was a big compromise between all the parties), people belonging to the other school of thought continued pushing harder and harder. In the end, one small battle after another, they won completely. Maybe for good, maybe for bad. I don't wanna judge.
The point being, once that committee was set up, C no longer "belonged" to Ritchie, just as C++ no longer "belongs" to Stroustrup. They certainly know a great deal about their brainchildren, but once you get dozens of implementations pulling at the language each their own way, you need a standard to avoid the language fracturing, and such a standard by necessity becomes a compromise.

Committees make mistakes. gets() should never have happened, and std::auto_ptr was a mess (but the best that could be done given the language at the time).
I totally understand all of that. There's no point in further stating things that are obviously true. I wanted a philosophical / historical discussion about the original "spirit of C" and how it changed.
Solar wrote:
Just, I care to remark that modern C is not what the C language was meant to be...
Again, philosophy. Modern C is the best that people could come up with. If you disagree, get involved in the committee. Just be aware that there are many other interests involved, and if you are calling for more complete static code analysis to handle UB gracefully, you will have to convince every single compiler company that investing that kind of resources will be worth it.
Yes, philosophy. It's all I'm talking about. Yes, I totally realize the whole picture, the trade-offs etc. Again, I've been in the industry for a while. I'm just stating that things are changing and that the story according to which most of the greatest programmers didn't understand C is very likely false. They understood it well, for what it was at the time. Slowly it changed and evolved into something else. There are many good reasons for C to evolve this way, but, sadly, there's a price to pay as well: broken backwards compatibility.
Solar wrote:Legacy code -- all code, actually -- is known to contain bugs. There are various tools out there to track them down. Compiler warnings, static code analysis, runtime checkers.
Yes, there are. But not enough, as already stated. I'd argue for having more compiler options like -fwrapv to disable certain assumptions or more warnings when certain assumptions are made.
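For instance, here is a hypothetical snippet whose meaning changes with such an option: by default the compiler may assume the addition below never wraps and fold the comparison to 0, while with -fwrapv (GCC/Clang) signed overflow is defined as two's-complement wraparound and the function returns 1 for INT_MAX.

Code: Select all

/* Returns 1 only if x + 1 wraps around. Without -fwrapv the overflow is
   UB, so the compiler may treat "x + 1 < x" as always false; with -fwrapv
   the wraparound is well-defined and wraps(INT_MAX) == 1. */
int wraps(int x)
{
    return x + 1 < x;
}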
Solar wrote:Just don't point to the compiler for being "the bad boy" when UB breaks your app.
Actually, the whole discussion didn't originate from any UB breaking anything. It's about NOT re-writing the history of C claiming that C has always been the language we know and use today. C changed a lot, not so much in its syntax, but in the way compilers are allowed to transform our code.

If I'm irritated about one thing, that would be compilers breaking pre-existing (not always so much "legacy") code because "the standard technically allows that", without taking serious care to give the right compile-time tools to handle the problem. The approach "your code has always been broken" is bad, even if the language lawyers can prove that the compiler is right. Being right on paper is not enough in the real world. I hope that you can understand this practical point.

Another example: when working on big projects, if I introduce a change that correctly uses an interface, but that causes tests to break because the bug is in the other component, I HAVE TO roll back my changes, even if I have 100% proof that the bug is elsewhere. I have to first file a bug and make the other team (or myself) fix the bug, and only then submit my new code. Keeping a thing broken just because a party is right "on paper" does not work in real life.

Re: C and undefined behavior

Posted: Mon Jun 07, 2021 7:07 am
by Solar
vvaltchev wrote:This is mostly a philosophical discussion about what "C" was meant to be in the eyes of its creators rather than talking about specific UB examples.
And I am trying to tell you that Dennis Ritchie might have been the creator of the first iteration of C, but after that it became an international standard, which is not beholden to follow whatever Ritchie originally created C as / for.
vvaltchev wrote:
Solar wrote:I don't agree. That example program in that bug report is stupid, and it was stupid right from Ritchie's first iteration of C.
I don't believe you can prove it.
Easy. CPUs that trap on signed overflow exist. That code is non-portable.
Also, you're starting to be aggressive. You can disagree like the other people do without attacking and implying I don't know what I'm talking about.
I was referring to the person filing that "bug" and then getting quite inappropriately vocal about it. (I also admit to being a bit miffed about your stance that virtually everybody actually working on C compilers and library implementations "does not truly understand" C because only Ritchie and yourself do, but I wouldn't have called that a "mimimi".)
vvaltchev wrote:
Solar wrote:There is no "compiler's point of view" on UB.
Nope. I can write a double my_sqrt(double x) function and claim that x cannot be < 0. The compiler knows nothing about it. There is no UB from the compiler point of view.
Again, there is no "compiler point of view" in this. The compiler does not "see" UB and then does evil things. The compiler may (and indeed, must) work under the assumption that there is no UB in the code. That is the very contract between the language and the programmer.

The standard strives to define a language that can be used on the widest variety of platforms, with an absolute minimum of assumptions on how that platform operates. Signaling overflow, trap representations, non-IEEE floating point, non-two's-complement, non-8-bit-bytes.

The mechanism by which this works is leaving behavior that cannot be portably defined either implementation-defined (implementation has to specify), unspecified (implementation does not have to specify but has to support somehow), or undefined (no requirements on the implementation).

Again: An implementation is perfectly at liberty to define its behavior in certain (detectable) cases of UB. It might generate a compile time warning. This is what compilers actually do. The alternative would be to assert / trap / abort or whatever. But since, as you stated, lots of existing code exploits observed (but undefined) behavior, such an assert / trap / abort would probably not be considered graceful. Plus, the checks involved would mean the compiler would generate slower executables.
vvaltchev wrote:...you're completely missing the point of the conversation. You're assuming I've no idea what I'm talking about. Stop doing this and start trying to understand my point. Don't assume that everything controversial I say is because I don't know what I'm talking about. I've been writing C code since 1998.
Perhaps you're not making your point very well then? Perhaps realize that I know what I am talking about as well, and not only because I started writing C two or five years earlier than you.
vvaltchev wrote:

Code: Select all

void foo(int x); // x must be in [1,5].
What if x is not in that range? That's certainly NOT UB, from the compiler point of view. That's why I don't call it UB. The function might produce a wrong result or assert or crash the program in any way, but that's still not UB because the compiler doesn't know about it.
I don't quite get where this idea of "the compiler doesn't know about it" comes from. If your implementation leaves the behavior of foo() with inputs outside the [1,5] range undefined, your implementation is allowed to assume that inputs are in the [1,5] range, and may trigger UB if it isn't. It is up to the caller of foo() to ensure arguments are in range.
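A sketch of what that looks like in practice, using a GCC/Clang-specific builtin and the same hypothetical [1,5] contract as above:

Code: Select all

void foo(int x)
{
    /* Tell the optimizer that out-of-range inputs cannot happen. From this
       point on, a caller passing x outside [1,5] really does invoke
       undefined behavior, and the compiler may exploit the assumption,
       e.g. by indexing a 5-entry table without any bounds check. */
    if (x < 1 || x > 5)
        __builtin_unreachable();

    /* ... */
}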
vvaltchev wrote:Slowly it changed and evolved into something else. There are many good reasons for C to evolve this way, but, sadly, there's a price to pay as well: broken backwards compatibility.
I dispute that C changed in this regard in any meaningful way ever since it became a standard. Ever since ISO 9899:1989, UB has been what it was, and exactly for that purpose that it was introduced.

With very, very few exceptions, code that was correct back then still is correct now, and will run correctly. What you mean with "backwards compatibility" is the Sim City kind... keeping known bugs working.
vvaltchev wrote:I'd argue for having more compiler options like -fwrapv to disable certain assumptions or more warnings when certain assumptions are made.
More warnings, yes, always. But realize that "disabling certain assumptions" means creating even more compiler-specific dialect (which is then bound to break at some later point, e.g. when you have to switch to a different compiler). In case of that GCC "bug" you linked, to make really lazy programming "work". I disagree with that, because it just means having to deal with more lazy programming in the field.
vvaltchev wrote:It's about NOT re-writing the history of C claiming that C has always been the language we know and use today. C changed a lot, not so much in its syntax, but in the way compilers are allowed to transform our code.
Yes. It changed from "Kernighan & Ritchie's C" to an international standard, which had to find terminology for the things K&R left out or took for granted. Pushing that change into the 2000's and into the compiler developer's field is ridiculous IMHO.
vvaltchev wrote:If I'm irritated about one thing that would be compilers breaking pre-existing (not always so much "legacy") code because "the standard technically allows that" without taking serious care to give the right compile-time tools to handle the problem.
But it's not the compiler breaking something! Your code was broken, you just happened to be lucky until now. That has never changed! That first compiler Ritchie cooked up had hardly any warnings at all, because the memory constraints wouldn't allow it...
vvaltchev wrote:The approach "your code has always been broken" is bad, even if the language lawyers can prove that the compiler is right. Being right on paper is not enough in the real world. I hope that you can understand this practical point.
I completely understand your POV. But then C is simply not the language you are looking for; you should look at some of the VM'ed languages that replace "whatever this happens to run on" with a well-defined virtual machine. That way, virtually everything in the language can be well-defined. It just doesn't work when what you have in the end is machine code running directly on whatever CPU happens to be in charge.

Re: C and undefined behavior

Posted: Mon Jun 07, 2021 3:23 pm
by Korona
Let me throw in the following data point: Fortran's arrays were "noalias" from the start (around 15 years before C even existed as K&R C). So the idea of letting compilers efficiently exploit UB actually predates C by quite a margin (and was certainly not coined in the 2000s).

(Fortran got POINTERs that can alias much later, namely in 1990. Before, it was always a programmer error if the same memory was accessible through two different variables in the same function and even early compilers exploited this fact.)

Re: C and undefined behavior

Posted: Tue Jun 08, 2021 12:21 pm
by vvaltchev
Solar wrote:And I am trying to tell you that Dennis Ritchie might have been the creator of the first iteration of C, but after that it became an international standard, which is not beholden to follow whatever Ritchie originally created C as / for.
Yes, that's absolutely true. But, at the same time, it's irrelevant for this discussion. I hope that now we can stop pointing out simple facts like that and enter into a more sophisticated discussion about ideas, interpretations, etc. I'm not forcing you into such a discussion; it's just what I'm interested in doing in this topic.
Solar wrote:Easy. CPUs that trap on signed overflow exist. That code is non-portable.
That's the thing: non-portable code has become unacceptable, while in the past it was just "non-portable". Incrementing an integer used to do the most obvious thing, facing whatever consequences an overflow brings on a given system. Just wrap around? Fine. Trap? Fine. Stick to a given max value? Fine. One thing has always been sure: the C language per se couldn't define what will happen in this case. So, assuming any concrete behavior is not portable.

That's why, I believe, UB was introduced in C89, or at least, that's what C's creators intended with UB. That's the whole argument here. Does this interpretation make sense, or did ISO C89 with "undefined behavior" really intend to allow compilers to do anything, including assuming that such a case will never happen, opening the door for ultra-aggressive optimizations? Or did it simply intend that compilers have no obligation to handle cases like this in any particular way, leaving the behavior simply "undefined"? A subtle difference in the wording can have a great impact.

Again, in C99 things are different. The newer (in my view) way of approaching UB got much stronger consensus in the '90s, so I firmly believe that C99's undefined behavior really means what we intend today, in "modern C".
Solar wrote:I was referring to the person filing that "bug" and then getting quite inappropriately vocal about it.
OK, sorry then. I totally agree here. The person was very aggressive and looked like he (maybe) didn't really understand UB, while still having a point; I don't deny that. On the other side, I do completely understand UB. If I were a compiler engineer, my goal would have been exactly that: make the existing code faster by taking advantage of UB and pushing for more of that in the ISO standard. Because it works, side-effects aside. It all depends on your perspective.

Anyway, I didn't mention that bug to talk about that case, but as an indicator of approximately when UB started to considerably affect the behavior of compilers.
Solar wrote: (I also admit to being a bit miffed about your stance that virtually everybody actually working on C compilers and library implementations "does not truly understand" C because only Ritchie and yourself do, but I wouldn't have called that a "mimimi".)
Ouch, wait a second. I never implied anything like that, at all. First of all, I believe that most UNIX programmers did understand C, for what it really was. Actually, almost the opposite problem is true: still today I often find people who don't fully understand what UB is and why it exists. So, not even for a second did I want to imply that only Ritchie and myself understood the "true nature of C". Everybody else understood it too, because that's the most intuitive way of seeing C.

Let me get more detailed on this: C appears to be a fully imperative language offering the paradigm: "do what I say". That's how it has historically been thought of, and how people used it. Even beginners today who don't have a good teacher might get this idea about C. In the '90s we had a transition period which ended (on paper) with C99, when the committee had the crystal-clear idea that the only way to introduce many more sophisticated optimizations is to treat UB the way it is treated today. In the real world of compilers, we've been observing the definitive death of "do what I say" in recent years, with GCC 8.x and newer.

Modern C is absolutely and completely dominated by the opposite paradigm: "do what I mean". Clearly, that has much more potential for optimizations. For many years I've been myself a fan of this new approach, because of the better performance and the "forced" portability. Only recently, after writing a fair amount of kernel code, I started to reconsider this, in particular when compiler engineers pushed this idea a little too far for my taste.
Solar wrote:
vvaltchev wrote:
Solar wrote:There is no "compiler's point of view" on UB.
Nope. I can write a double my_sqrt(double x) function and claim that x cannot be < 0. The compiler knows nothing about it. There is no UB from the compiler point of view.
Again, there is no "compiler point of view" in this. The compiler does not "see" UB and then does evil things. The compiler may (and indeed, must) work under the assumption that there is no UB in the code. That is the very contract between the language and the programmer.

The standard strives to define a language that can be used on the widest variety of platforms, with an absolute minimum of assumptions on how that platform operates. Signaling overflow, trap representations, non-IEEE floating point, non-two's-complement, non-8-bit-bytes.

The mechanism by which this works is leaving behavior that cannot be portably defined either implementation-defined (implementation has to specify), unspecified (implementation does not have to specify but has to support somehow), or undefined (no requirements on the implementation).

Again: An implementation is perfectly at liberty to define its behavior in certain (detectable) cases of UB. It might generate a compile time warning. This is what compilers actually do. The alternative would be to assert / trap / abort or whatever. But since, as you stated, lots of existing code exploits observed (but undefined) behavior, such an assert / trap / abort would probably not be considered graceful. Plus, the checks involved would mean the compiler would generate slower executables.
vvaltchev wrote:...you're completely missing the point of the conversation. You're assuming I've no idea what I'm talking about. Stop doing this and start trying to understand my point. Don't assume that everything controversial I say is because I don't know what I'm talking about. I've been writing C code since 1998.
Perhaps you're not making your point very well then? Perhaps realize that I know what I am talking about as well, and not only because I started writing C two or five years earlier than you.
Yeah, here again you're trying to explain to me what UB is. Missing the point. But, I admit, it's possible that I'm not making my point well. I'm doing my best to re-articulate my thesis. Also, I'm not assuming that you don't know what you're talking about. On the contrary, you look like an experienced developer, more or less like myself, who states true facts but misses the point of my argument. I'd like to go much, much beyond simple and obvious facts about C, UB, the ISO standard, the committee and the industry as it is today. I was interested in analyzing the intentions behind C89's wording of "undefined behavior", what C was before C89 according to the data we have, and how and why it changed. It's all about the philosophy of the language and the interpretation of some facts. Subjective stuff, supported by some evidence.

My (greatly simplified) thesis is: it is possible (not certain!) that C89 was the result of conflicting interests, and the wording appeared to make everybody happy because some parts of it (particularly about UB) could be interpreted in multiple ways. In particular, DMR opposed the "do what I mean" paradigm, but had to make some compromises because it's impossible to formally define a compiled language like C and specify what will happen in a ton of cases, because that depends on the platform. Maybe (or maybe not) already at the time somebody on the committee was considering UB in a different way, but it wasn't the right time yet. Years later, the "do what I mean" believers gradually won the battle and pushed further what C89 started. "Non-portable" C code has definitively become "incorrect" C code, so compilers could take advantage of that, but it took many years to introduce actual optimizations benefiting from that.

So, the question simply was: does it make sense? I've shown my arguments in favor of this theory (true or not). What arguments do you have that oppose it?
Solar wrote:
vvaltchev wrote:

Code: Select all

void foo(int x); // x must be in [1,5].
What if x is not in that range? That's certainly NOT UB, from the compiler point of view. That's why I don't call it UB. The function might produce a wrong result or assert or crash the program in any way, but that's still not UB because the compiler doesn't know about it.
I don't quite get where this idea of "the compiler doesn't know about it" comes from. If your implementation leaves the behavior of foo() with inputs outside the [1,5] range undefined, your implementation is allowed to assume that inputs are in the [1,5] range, and may trigger UB if it isn't. It is up to the caller of foo() to ensure arguments are in range.
vvaltchev wrote:Slowly it changed and evolved into something else. There are many good reasons for C to evolve this way, but, sadly, there's a price to pay as well: broken backwards compatibility.
We're simply not understanding each other here. Let's skip this for later (eventually) and focus on the rest. OK?
Solar wrote:I dispute that C changed in this regard in any meaningful way ever since it became a standard. Ever since ISO 9899:1989, UB has been what it was, and exactly for that purpose that it was introduced.

With very, very few exceptions, code that was correct back then still is correct now, and will run correctly. What you mean with "backwards compatibility" is the Sim City kind... keeping known bugs working.
OK, this is on topic. I explained why I believe that UB was interpreted in a different way. No point in adding more comments about that here. About "code that was correct back then still is correct now, and will run correctly", I'd say instead: code that was correct back then according to the modern interpretation of C89 is still correct today. Code that was correct but non-portable back then is now considered incorrect. The definition of "correct" changed, becoming an alias of "portable".
Solar wrote:
vvaltchev wrote:I'd argue for having more compiler options like -fwrapv to disable certain assumptions or more warnings when certain assumptions are made.
More warnings, yes, always. But realize that "disabling certain assumptions" means creating even more compiler-specific dialect (which is then bound to break at some later point, e.g. when you have to switch to a different compiler). In case of that GCC "bug" you linked, to make really lazy programming "work". I disagree with that, because it just means having to deal with more lazy programming in the field.
Fair point. I prefer pushing for higher quality in a different, non-intrusive way (like enabling almost all the warnings and then building with -Werror), but I understand your point as well.
Solar wrote:
vvaltchev wrote:The approach "your code has always been broken" is bad, even if the language lawyers can prove that the compiler is right. Being right on paper is not enough in the real world. I hope that you can understand this practical point.
I completely understand your POV.
I'm happy that you've understood at least one of my points :-) Maybe it's that I'm not communicating my theories well enough, or you started this conversation biased somehow, I don't know. But... we'll start to at least understand (not necessarily agree with) each other at some point.
Solar wrote:But then C is simply not the language you are looking for; you should look at some of the VM'ed languages that replace "whatever this happens to run on" with a well-defined virtual machine. That way, virtually everything in the language can be well-defined. It just doesn't work when what you have in the end is machine code running directly on whatever CPU happens to be in charge.
No man, I'm not interested in VM languages. I just wanted, in a few cases, to write my non-portable C code that will be used conditionally on a given platform, and take responsibility for that. Portability is not always what we want, at least not at ANY price. I'm willing to pay some price for it, but not to sacrifice everything else for it. Also, if I want to do non-portable stuff that works as I expect on 90% of the architectures and never intend to support the other "exotic" architectures, why do you, compiler, have to oppose me so much? In the end, it's also a business decision which architectures to support. Portability to everything in the world is not very realistic for most software projects. An example we already made? Integer overflow. It has been made UB because 2's complement is not portable. Well, almost all architectures today use it, even if, during the long history of C, we had architectures that used a different representation for integers. So, am I crazy if I rationally decide not to support exotic architectures? Nope, I'm not. Linux builds with -fwrapv; they're not crazy for sure.

Re: C and undefined behavior

Posted: Tue Jun 08, 2021 12:40 pm
by vvaltchev
Korona wrote:Let me throw in the following data point: Fortran's arrays were "noalias" from the start (around 15 years before C even existed as K&R C). So the idea of letting compilers efficiently exploit UB actually predates C by quite a margin (and was certainly not coined in the 2000s).

(Fortran got POINTERs that can alias much later, namely in 1990. Before, it was always a programmer error if the same memory was accessible through two different variables in the same function and even early compilers exploited this fact.)
That's a good point, exactly on topic! Thanks for stating it. So, I have a few questions:

Did Fortran (a language I never learned) have any form of pointers back then (before 1990)? If it didn't, we cannot talk about aliasing rules, since the language didn't have the basic tool for introducing any kind of aliasing. If it did, how did it work? Like in modern C, with UB in case of aliasing? Can you show me some examples? How did programmers check for UB? I'd be super interested in understanding that with real examples.

But anyway, I guess that, because of this historic data point, you're implying that in C89 at least part of the committee intended UB as we know it today? Maybe DMR accepted that because he was "pushed" or led to believe that UB meant something else, much less radical?

I totally agree with Solar's point that after the first ISO document, the C language stopped being "owned" by K&R, but at least in the wording of the first ISO standard, the opinion of the creators on what the language is must have had some serious weight, didn't it? That would be interesting to know.

Re: C and undefined behavior

Posted: Tue Jun 08, 2021 8:42 pm
by Solar
vvaltchev wrote:But anyway, I guess that, because of this historic data point, you're implying that in C89 at least part of the committee intended UB as we know it today?
You still utterly refuse to even acknowledge the basic frame challenge we have been trying to impress on you. Chapter 3.4 of the standard ("Behavior") has not changed one bit through the C90, C99, and C11 iterations of the standard, and neither has its interpretation!

C was not created by Dennis Ritchie tabula rasa. It is based on Thompson's B, which in turn is based on Richards' BCPL, which was primarily geared to make compilers simple. Richards wrote (in 1980!) about his brainchild:
The philosophy of BCPL is not one of the tyrant who thinks he knows best and lays down the law on what is and what is not allowed; rather, BCPL acts more as a servant offering his services to the best of his ability without complaint, even when confronted with apparent nonsense. The programmer is always assumed to know what he is doing and is not hemmed in by petty restrictions.
-- Richards, Martin; Whitby-Strevens, Colin (1980). BCPL: The Language and its Compiler. Cambridge University Press. p. 5. ISBN 978-0521785433.
This is what the whole family -- BCPL, B, C, C++ -- really are about. All that efficiency and ability to work "directly on the hardware", and such diverse hardware, without runtime costs, comes at a price.

And it is not something that an evil committee inflicted upon a poor, pure version of C. It is part and parcel of the whole category of languages: You write buggy code, you die.

Or you get demons flying out of your nose....
:nasal demons: n. Recognized shorthand on the USENET group comp.std.c for any unexpected behavior of a C compiler on encountering an undefined construct. During a discussion on that group in early 1992, a regular remarked "When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose" (the implication is that the compiler may choose any arbitrarily bizarre way to interpret the code without violating the ANSI C standard). Someone else followed up with a reference to "nasal demons", which quickly became established.
-- Jargon File, Version 3.0, 1993
While it is a selling point for a compiler implementation to provide warnings for detectable problems in your source, it should be obvious that an exhaustive check whether a program could trigger UB is equivalent to solving the halting problem. If anything, compilers today are much better at pointing out such problems to you than they were before. I distinctly remember a time before e.g. GCC warned you about type mismatches in printf() format specifiers...

You (and that blogger you cited) are really on a wild goose chase here.

Re: C and undefined behavior

Posted: Wed Jun 09, 2021 4:38 am
by vvaltchev
Solar wrote:You still utterly refuse to even acknowledge the basic frame challenge we have been trying to impress on you. Chapter 3.4 of the standard ("Behavior") has not changed one bit through the C90, C99, and C11 iterations of the standard, and neither has its interpretation!
You didn't read the article, did you? The wording around UB did change in the transition C89 -> C99 and that might be caused by a shift in the interpretation as I explained over and over again.
Solar wrote:
The philosophy of BCPL is not one of the tyrant who thinks he knows best and lays down the law on what is and what is not allowed; rather, BCPL acts more as a servant offering his services to the best of his ability without complaint, even when confronted with apparent nonsense. The programmer is always assumed to know what he is doing and is not hemmed in by petty restrictions.
-- Richards, Martin; Whitby-Strevens, Colin (1980). BCPL: The Language and its Compiler. Cambridge University Press. p. 5. ISBN 978-0521785433.
Interesting quote. In particular, it is interesting how you're interpreting it 41 years later using a modern mindset. I interpret the very same quote as:

BCPL is an imperative "do what I say" language. BCPL won't impose anything on you and will do exactly what you tell it to, assuming that you know what you're doing.

So, if I write something that is non-portable, the compiler will do the most obvious thing for the given architecture, leaving the programmer to deal with the consequences of that, whatever they might be. Obviously, they will be different across different architectures (or "targets", to be more general).
Solar wrote:This is what the whole family -- BCPL, B, C, C++ -- really are about. All that efficiency and ability to work "directly on the hardware", and such diverse hardware, without runtime costs, comes at a price.
Exactly! The problem is which price we choose to pay. We can work "directly on the hardware" with the "do as I say" paradigm and sacrifice portability or adopt the "do as I mean" paradigm, which inevitably pushes us further away from the hardware and makes us focus on the "C abstract machine", but allows more optimizations and better portability. Do you understand that? If a given thing is LEGAL on a given architecture, but you (language) don't allow it because it's not portable, you are NOT allowing people to work "directly on the hardware". If you allow it instead, it's possible to work "directly on the hardware" but the amount of possible optimizations is reduced because the compiler cannot really assume to know what will happen and becomes very conservative. It's a trade-off.

Solar wrote:And it is not something that an evil committee inflicted upon a poor, pure version of C. It is part and parcel of the whole category of languages: You write buggy code, you die.
Never said that the ISO committee was evil. On the contrary, I said to some degree that paradigm shift was for good even if, in some cases, compilers go "too far".
Solar wrote:
:nasal demons: n. Recognized shorthand on the USENET group comp.std.c for any unexpected behavior of a C compiler on encountering an undefined construct. During a discussion on that group in early 1992, a regular remarked "When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose" (the implication is that the compiler may choose any arbitrarily bizarre way to interpret the code without violating the ANSI C standard). Someone else followed up with a reference to "nasal demons", which quickly became established.
-- Jargon File, Version 3.0, 1993
I knew that story already. It doesn't contradict anything I've said. It just shows that in the '90s the school of thought which wanted UB to allow everything (on the compiler side) started to get more and more followers, as I already stated. That culminated with C99, when some small changes were made to the wording of UB. Again, read the article.
Solar wrote:While it is a selling point for a compiler implementation to provide warnings for detectable problems in your source, it should be obvious that an exhaustive check whether a program could trigger UB is equivalent to solving the halting problem.
It depends on what you mean by "check whether a program could trigger UB". What I want is likely not what you meant in that statement. I meant that compilers should issue a special warning (or something else, for UB specifically) when they make SOME dangerous assumptions. They don't need to prove that it will never happen. It's enough to say: "warning: I'm assuming here that X will never overflow". Actually, such warnings already exist, but not for everything that's dangerous.

EDIT: fixing typing errors plus a little rewording to make the text more clear.

Re: C and undefined behavior

Posted: Wed Jun 09, 2021 10:10 am
by Schol-R-LEA
I think that the semantic issue here is with 'undefined behavior'. My admittedly shallow understanding is that UB == non-portable, that's all. The standard can only define things which are portable, because that is what defining a standard is all about - ensuring that code which could be the same across all foreseeable platforms will be. Anything non-portable is outside the standard by the nature of a standard. Hence it is not defined in the standard.

The whole point of the term is to discourage coders from assuming portability, not to discourage non-portability - it is a warning to be careful about what you don't intend, not a barrier to what you actually intend.

And yes, even experienced coders often need a warning or reminder that they might be taking something for granted when they shouldn't be. That doesn't stop them from doing what they want, it just is a check that it actually is what they want to do. Not every compiler has such warnings, true, but those which do still don't prevent you from writing the non-portable code you want.

You have to know what the compiler itself will do with a given piece of code, rather than assuming that two different compilers or platforms will be the same. Which in the end is exactly what everyone here is saying, just in different ways and with different emphasis.

Everyone here seems to be overthinking the matter. You're agreeing more than you're disagreeing.

Re: C and undefined behavior

Posted: Wed Jun 09, 2021 10:55 am
by vvaltchev
Schol-R-LEA wrote:I think that the semantic issue here is with 'undefined behavior'. My admittedly shallow understanding is that UB == non-portable, that's all. The whole point of the term is to discourage coders from assuming portability, not to discourage non-portability - it is a warning to be careful about what you don't intend, not a barrier to what you actually intend.

And yes, even experienced coders often need a warning or reminder that they might be taking something for granted when they shouldn't be. That doesn't stop them from doing what they want, it just is a check that it actually is what they want to do.

You have to know what the compiler itself will do with a given piece of code, rather than assuming that two different compilers or platforms will be the same. Which in the end is exactly what everyone here is saying, just in different ways and with different emphasis.

Everyone here seems to be overthinking the matter. You're agreeing more than you're disagreeing.
Thanks for joining the conversation with such a positive attitude :-) This is just an intellectual - philosophical discussion about non-obvious and non strictly technical topics. There's no need to get it "heated up" when opinions diverge. So, I appreciate the attempt to calm everybody.

Overall, I'd like to agree with all you said... but I'm not convinced that non-portability is not discouraged by the ISO standard or by the compilers. I feel that we're going in the direction of making non-portable code (even when properly checked etc.) as hard as possible to write.
Schol-R-LEA wrote:That doesn't stop them from doing what they want, it just is a check that it actually is what they want to do.
I believe there was a time, maybe 10 years ago, when that was exactly true. Now, it seems to me that we've passed that limit. While C compilers cannot (yet) stop us from doing what we want, they're really doing their best in some cases :D

Check this WONTFIX bug on GCC on which I also commented myself:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93031

And the problem is not even how hard it is to write new non-portable code. I'm fine with all of that, for the sake of better performance optimizations. The problem is with legacy code, even code written by experts with the best intentions, checked with #ifdefs etc. See the Linux kernel example in that bug.

EDIT: changed my mind on a sentence.

Re: C and undefined behavior

Posted: Wed Jun 09, 2021 11:39 am
by Solar
vvaltchev wrote:You didn't read the article, did you? The wording around UB did change in the transition C89 -> C99 and that might be caused by a shift in the interpretation as I explained over and over again.
I admit that what I thought to be a copy of ISO/IEC 9899:1990 on my hard drive turned out to be a mislabeled 201x draft. I never actually used that copy for anything before, so the mix-up went undiscovered.

Yes, that minute wording change happened. However, I violently disagree with the interpretation by Eskil Steenberg that "one word broke C". This is somewhat reinforced by him making a bogus claim about his struct / memcpy example, showing that he hasn't actually read the standard.

My POV is that this wording change was a result of the C99 document becoming much more formal in its wording. My POV is also that it neither intended an actual change nor reflected a shift in interpretation. I have worked with pre-C99 C compilers, e.g. on the Amiga platform, and can tell you that the attitude to UB hadn't changed one iota.

I just opened my copy of The Amiga Guru Book by Ralph Babel, (c) 1989, 1990. It comes with humorous quotes under each of the chapter headers. Again, these refer to library functions, but they display what IMHO is the original, and this-hasn't-changed, mindset of C.
Amiga Guru Book wrote:Chapter 2
Programming Guidelines


[...]

AARRGGHHH! What a bloody nuisance. Throwing temporary RastPorts around on the stack has been a real nice win for me. [...] So now I have to do AllocMem()s with sanity checks with corresponding FreeMem()s? Foo.
-- Leo L. Schwab, comp.sys.amiga.tech, <[email protected]>

In the interim, we provide a utility with the release which pokes the (present) locations of Intuition's internal copies of the overscan settings. [...] Needless to say, behavior such as that exhibited by this utility is forbidden. Put another way: you poke IntuitionBase, you die.
-- Jim Mackraz, Jumpstart 1.4: Intuition_Update

Believe me, not having an MMU doesn't bother me in the least!
-- Matt Dillon, comp.sys.amiga, 8 Apr. 1987
I also distinctly remember the time when people wanted to show off their CPU power with various home-grown "benchmark" programs, and basically ended up having most of their "benchmark" optimized away completely by the compiler. 8)
vvaltchev wrote:We can work "directly on the hardware" with the "do as I say" paradigm and sacrifice portability or adopt the "do as I mean" paradigm, which inevitably pushes us further away from the hardware and makes us focus on the "C abstract machine", but allows more optimizations and better portability.
I don't quite agree with all of that, but that would be trying to make too fine a point.
vvaltchev wrote:Do you understand that?
Please don't insult my intelligence. I don't disagree with you because I don't understand. I disagree with you because I understand what you (and the bloggers you quote) are saying, and I disagree.
vvaltchev wrote:If a given thing is LEGAL on a given architecture, but you (language) don't allow it because it's not portable, you are NOT allowing people to work "directly on the hardware".
Well, define "legal". What specifies that signed integer overflow from that GCC bug report as "legal on the given architecture"? The C language standard doesn't, and I am pretty sure GCC docs don't either. It's something that (perhaps) worked before, and now has stopped working, but was it ever "legal" in the meaning that it should rightly work?

If the GCC docs stated that "GCC will handle signed overflow as two's complement wraparound", that would be different.
vvaltchev wrote:...in some cases, compilers go "too far".
If you feel like that, pick another compiler. But the GCC compiler is not "broken" in any way or form for behaving like it does.
vvaltchev wrote:I meant that compilers should issue a special warning (or something else, for UB specifically) when they make SOME dangerous assumptions. They don't need to prove that it will never happen. It's enough to say: "warning: I'm assuming here that X will never overflow". Actually, such warnings already exist, but not for everything that's dangerous.
"Permissible undefined behavior ranges from ignoring the situation completely..."

(Original C90 wording.)

The assumption is that source doesn't invoke UB. That's not dangerous, that's a basic requirement. Anything beyond that, like a warning, is quality of service.

Re: C and undefined behavior

Posted: Wed Jun 09, 2021 2:22 pm
by vvaltchev
Solar wrote:Yes, that minute wording change happened. However, I violently disagree with the interpretation by Eskil Steenberg that "one word broke C".
Ehehehe it's fine to disagree with somebody, you don't have to be violent about it. You could just firmly disagree with him (or me) :D
Solar wrote:My POV is that this wording change was a result of the C99 document becoming much more formal in its wording. My POV is also that it neither intended an actual change nor reflected a shift in interpretation. I have worked with pre-C99 C compilers, e.g. on the Amiga platform, and can tell you that the attitude to UB hadn't changed one iota.

I just opened my copy of The Amiga Guru Book by Ralph Babel, (c) 1989, 1990. It comes with humorous quotes under each of the chapter headers. Again, these refer to library functions, but they display what IMHO is the original, and this-hasn't-changed, mindset of C.
Amiga Guru Book wrote:Chapter 2
Programming Guidelines


[...]

AARRGGHHH! What a bloody nuisance. Throwing temporary RastPorts around on the stack has been a real nice win for me. [...] So now I have to do AllocMem()s with sanity checks with corresponding FreeMem()s? Foo.
-- Leo L. Schwab, comp.sys.amiga.tech, <[email protected]>

In the interim, we provide a utility with the release which pokes the (present) locations of Intuition's internal copies of the overscan settings. [...] Needless to say, behavior such as that exhibited by this utility is forbidden. Put another way: you poke IntuitionBase, you die.
-- Jim Mackraz, Jumpstart 1.4: Intuition_Update

Believe me, not having an MMU doesn't bother me in the least!
-- Matt Dillon, comp.sys.amiga, 8 Apr. 1987
I also distinctly remember the time when people wanted to show off their CPU power with various home-grown "benchmark" programs, and basically ended up having most of their "benchmark" optimized away completely by the compiler. 8)
Finally, a true disagreement perfectly on topic! I don't care we disagree at all, I'm just happy that now you've understood my POV.
Solar wrote:Please don't insult my intelligence. I don't disagree with you because I don't understand. I disagree with you because I understand what you (and the bloggers you quote) are saying, and I disagree.
Never meant to. Just, I wasn't sure you were following me. It took us quite a discussion to understand each other. But, I'm happy that we're finally there.
Solar wrote:
vvaltchev wrote:If a given thing is LEGAL on a given architecture, but you (language) don't allow it because it's not portable, you are NOT allowing people to work "directly on the hardware".
Well, define "legal". What specifies that signed integer overflow from that GCC bug report as "legal on the given architecture"? The C language standard doesn't, and I am pretty sure GCC docs don't either. It's something that (perhaps) worked before, and now has stopped working, but was it ever "legal" in the meaning that it should rightly work?

If the GCC docs stated that "GCC will handle signed overflow as two's complement wraparound", that would be different.
LEGAL on a given architecture means that the ISA allows that. It has nothing to do with the C language. For example: if I can do unaligned access on x86 in assembly, the C language shouldn't stop me from doing that (backing that behavior up with UB) when I'm compiling C code for x86. Obviously, I don't expect C to support all the instructions any architecture has, just not to impose, at all costs, unnecessary restrictions on the instructions that are translatable from C expressions, unless I ask for it with an option like -fforce-alignment or something like that. Or at least, make alignment requirements the default (because of the ISO doc) but support an option like -fno-strict-alignment or -fallow-non-portable-features. Something like that. In general, because some people always defend compilers as long as they are right on paper, I'd simply claim that I don't like what is written there, on the paper. I won't use inappropriate expressions like "the ISO standard is wrong or right". No, I'd state simply that:

I do NOT like some parts of the ISO C standard, in particular the way UB is defined in C99.

I'm free to express this opinion. Certainly me saying that won't change the standard any time soon, but it will make some people think about it. An ISO document shouldn't be treated like a gospel. At some point, it could be replaced by a different one if enough people are not OK with it. Again, note that I'm not using "better" or "worse" because those are relative concepts. It depends on your goals, POV etc.
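To make the alignment example concrete, here is a minimal sketch of the kind of non-portable code I have in mind (read_u32_cast / read_u32_memcpy are hypothetical names):

Code: Select all

#include <stdint.h>
#include <string.h>

/* Non-portable: the cast may perform a misaligned uint32_t access, which
   ISO C leaves undefined, even though a plain x86 load handles it fine. */
uint32_t read_u32_cast(const unsigned char *p)
{
    return *(const uint32_t *)p;
}

/* Portable: the same intent expressed through memcpy; on x86, compilers
   typically turn this into the very same single load anyway. */
uint32_t read_u32_memcpy(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}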
Solar wrote:
vvaltchev wrote:...in some cases, compilers go "too far".
If you feel like that, pick another compiler. But the GCC compiler is not "broken" in any way or form for behaving like it does.
Can't really. The whole industry appears to be moving in that direction. In certain cases CLANG is behaving better, in others GCC is behaving better. No clear winner. Maybe if at some point some major corporation or some major open source project like the Linux kernel gets tired of GCC/Clang and decides to fork one of the two projects, I might use it. But, that's unlikely to happen. Usually, if a compiler behaves "too crazy" and breaks too many important projects, leaders of those projects force the compiler to "back off".
Solar wrote:The assumption is that source doesn't invoke UB. That's not dangerous, that's a basic requirement. Anything beyond that, like a warning, is quality of service.
I'd answer using Eskil Steenberg's example: being right "on paper" is not enough if people get their fingers cut. Also, in order to avoid being pointed out as wrong because "the almighty paper says so", I'm questioning the "paper" itself, its history, etc. That's why I used Dennis Ritchie's comments in the discussion, etc.

Re: C and undefined behavior

Posted: Thu Jun 10, 2021 4:06 am
by Solar
vvaltchev wrote:
Solar wrote:Well, define "legal". What specifies that signed integer overflow from that GCC bug report as "legal on the given architecture"?
LEGAL on a given architecture means that the ISA allows that. It has nothing to do with the C language.
Err... no. If you want that kind of freedom, you need to do Assembler. I meant "legal for a C programmer to do".

C comes with all kinds of conventions that abstract it from the actual hardware, in order to be a portable language (instead of a really complex set of assembler macros).

Perhaps you aren't actually doing "C", you are doing "DiceC", or "SAS/C", or "GCC C", making use of features that the language itself leaves undefined, but which the compiler in question chose to define. That's fine, many people have done so, but you have to realize that your code is no longer "strictly conforming". A misbehavior in the non-conforming part is between you and the compiler manufacturer, and that has nothing to do with the C language.

But (ab)using some effect that neither the language standard nor the implementation has defined is "being lazy" and / or "being lucky".
vvaltchev wrote:For example: if I can do unaligned access on x86 in assembly, the C language shouldn't stop me from doing that (backing that behavior up with UB) when I'm compiling C code for x86.
Thought experiment. Let's say the next generation of the architecture changes that behavior. Let's say the next x86 to appear adds different opcodes for ADD / MULT that trap on signed integer overflow, to finally get rid of all the pesky little programming mistakes.

From my POV, compiler manufacturers would be perfectly fine to make use of that new ADD/MULT. From your POV, they would not. That is what it boils down to.
vvaltchev wrote:...just not to impose, at all costs, unnecessary restrictions on the instructions that are translatable from C expressions...
You want freedom to be lazy. Compiler manufacturers want freedom to produce more efficient code.
vvaltchev wrote:...unless I ask for it with an option like -fforce-alignment or something like that. Or at least, make alignment requirements the default (because of the ISO doc) but support an option like -fno-strict-alignment or -fallow-non-portable-features. Something like that.
Well, they do the second. That's where you cross from strictly conforming to compiler-specific code.
vvaltchev wrote:In general, because some people always defend compilers as long as they are right on paper, I'd simply claim that I don't like what is written there, on the paper. I won't use inappropriate expressions like "the ISO standard is wrong or right". No, I'd state simply that:

I do NOT like some parts of the ISO C standard, in particular the way UB is defined in C99.
That sounds quite different from your initial statement in this thread, which was mostly about how other people misinterpreted the "true vision" of C.

It also puts on the table what you, and those blog writers, could do: Attend the committee meetings. Most of the compiler manufacturers are right there. You can present your reasoning, and they will consider what you have to say. They might even agree and make changes, but as you have seen in this thread, I personally would not hold my breath for it.
vvaltchev wrote:I'm free to express this opinion. Certainly me saying that won't change the standard any time soon, but it will make some people think about it.
I dislike this "rolling the drum to sway public opinion" approach, especially when the issue in question is decided by a committee vote, not a public vote. It basically serves no purpose, other than perhaps venting your feelings. It can also be quite damaging, even though I don't think something as well-established as C will care either way.

(What's getting my hackles up more is the constant, and quite mal-informed, slandering of C++ from both the "lower" camp (C) and the "higher" camp (Java), mostly borne out of a lack of understanding, but I'd rather not extend the already-wide scope of this thread and let your remarks on C++ just slide.)
vvaltchev wrote:An ISO document shouldn't be treated like a gospel. At some point, it could be replaced by a different one if enough people are not OK with it.
That is my point: Neither you nor me are on the committee. Our opinions quite plainly do not matter in this regard. Even addressing your compiler manufacturer of choice will likely only change that compiler's implementation-defined behavior if you are lucky.

This is not intended to turn your opinions down. This is encouragement to take your opinion up with the committee. They are actually quite open about it. Erin Shepherd filed a paper with the committee for shortcomings of the way threads are defined in C11 she came across while working on PDCLib, and they duly discussed it.
vvaltchev wrote:
Solar wrote:
vvaltchev wrote:...in some cases, compilers go "too far".
If you feel like that, pick another compiler. But the GCC compiler is not "broken" in any way or form for behaving like it does.
Can't really. The whole industry appears to be moving in that direction. In certain cases CLANG is behaving better, in others GCC is behaving better. No clear winner. Maybe if at some point some major corporation or some major open source project like the Linux kernel gets tired of GCC/Clang and decides to fork one of the two projects, I might use it.
Oh come on. This ain't C++ we are talking about. C compilers aren't rocket science; there are many out there.
vvaltchev wrote:But, that's unlikely to happen. Usually, if a compiler behaves "too crazy" and breaks too many important projects, leaders of those projects force the compiler to "back off".
Well, either the compiler isn't conforming then (so it's trash), or the source isn't conforming and it isn't really the compiler to blame but the project leaders...

Wait, did we just come full circle on this? Yes, I think we did. :twisted:
vvaltchev wrote:I'd answer using Eskil Steenberg's example...
I can't express my opinion of that notable's "expertise" any more explicitly than I already did.

Re: C and undefined behavior

Posted: Thu Jun 10, 2021 7:33 am
by Solar
Now that we apparently exchanged our POV's on the philosophical side of things, let us not ignore the technical:

Those guys you quote are using really, really bad examples, because at the end of the day, you will actually get what you want.

Let's take that GCC "bug" as an example.

Code: Select all

assert( x + 100 > x );
Consider the following:
  • The statement "x + 100 > x" has no side effect.
  • Unless you consider platform-specific behavior (i.e., signed integer overflow), the statement "x + 100 > x" must evaluate to "true".
  • By the same token, unless you consider platform-specific behavior, the statement "assert( x + 100 > x )" is a no-op.
  • According to the "as-if" rule, statements that have no observable effect may be optimized away.
Optimization happens before executable code is generated. Making the optimization step of a cross-platform compiler aware of all kinds of platform-specific behavior adds a lot of switches and additional logic.

So it is not as simple as claiming that "of course" the optimizer stage needs to be aware that, on the given target platform (which might be different from the compilation platform!), the above statement might evaluate to something other than "true".

What we are looking at here is not some sophisticated corner-cutting by the compiler! It is quite the opposite, the compiler making the simple assumption.

Right now I unfortunately don't have access to my "old ladies" at home. As soon as I get back, I'll check the above code on some pre-89 compiler, and I am pretty sure the above code will get optimized away...
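For completeness, and as already mentioned earlier in the thread, the check itself can be written without relying on overflow at all; a minimal sketch:

Code: Select all

#include <assert.h>
#include <limits.h>

void check(int x)
{
    /* Same intent as assert(x + 100 > x), but expressed without the
       possibly-overflowing addition, so there is no UB and nothing for
       the optimizer to discard. */
    assert(x <= INT_MAX - 100);
}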