Solar wrote:And I am trying to tell you that Dennis Ritchie might have been the creator of the first iteration of C, but after that it became an international standard, which is not beholden to follow whatever Ritchie originally created C as / for.
Yes, that's absolutely true. But, at the same time, it's irrelevant for this discussion. I hope that now we can stop pointing out simple facts like that and move on to a more sophisticated discussion about ideas, interpretations etc. I'm not forcing you into such a discussion; it's just what I'm interested in doing in this topic.
Solar wrote:Easy. CPUs that trap on signed overflow exist. That code is non-portable.
That's the thing: non-portable code has become unacceptable, while in the past it was just "non-portable". Incrementing an integer used to do the most obvious thing, facing whatever consequences an overflow brings on a given system. Just wrap around? Fine. Trap? Fine. Stick at a given max value? Fine. One thing has always been sure: the C language per se couldn't define what will happen in this case. So, assuming any concrete behavior is not portable.
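A minimal sketch of what I mean (my own toy example, nothing taken from the standard or from any real codebase):

Code: Select all
#include <limits.h>

/* What "i + 1" does when i == INT_MAX is exactly the point above: the
 * language itself cannot promise anything. On one machine it may wrap
 * around to INT_MIN, on another it may trap, on another it may stick
 * at the max value. The standard simply calls it undefined. */
int next(int i)
{
    return i + 1;   /* undefined behavior when i == INT_MAX */
}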
That's why, I believe, UB was introduced in C89, or at least that's what C's creators intended with UB. That's the whole argument here. Does this interpretation make sense, or did ISO C89 with "undefined behavior" really intend to allow compilers to do anything, including assuming that such a case will never happen, opening the door for ultra-aggressive optimizations? Or did it simply intend that compilers have no obligation to handle cases like this in any particular way, leaving the behavior simply "undefined"? A subtle difference in the wording can have a great impact.
Again, in C99 things are different. The newer (in my view) way of approaching UB gained much stronger consensus in the '90s, so I firmly believe that C99's undefined behavior really means what we intend today, in "modern C".
Solar wrote:I was referring to the person filing that "bug" and then getting quite inappropriately vocal about it.
OK, sorry then. I totally agree here. The person was very aggressive and looked like he (maybe) didn't really understand UB, while still having a point, which I don't deny. On the other side, I do completely understand UB. If I were a compiler engineer, my goal would have been exactly that: make the existing code faster by taking advantage of UB and push for more of that in the ISO standard. Because it works, side effects aside. It all depends on your perspective.
Anyway, I didn't mention that bug to talk about that case, but as an indicator of when, approximately, UB started to considerably affect the behavior of compilers.
Solar wrote: (I also admit to being a bit miffed about your stance that virtually everybody actually working on C compilers and library implementations "does not truly understand" C because only Ritchie and yourself do, but I wouldn't have called that a "mimimi".)
Ouch, wait a second. I never implied anything like that, at all. First of all, I believe that most UNIX programmers did understand C, for what it really was. Actually, almost the opposite problem is true: still today I often find people who don't fully understand what UB is and why it exists. So, not even for a second did I want to imply that only Ritchie and myself understood the "true nature of C". Everybody else understood it too, because that's the most intuitive way of seeing C.
Let me get more detailed on this: C appears to be a fully imperative language offering the paradigm "do what I say". That's how it has historically been thought of and how people used it. Even beginners today who don't have a good teacher might get this idea about C. In the '90s we had a transition period which ended (on paper) with C99, when the committee had the crystal-clear idea that the only way to introduce many more sophisticated optimizations was to treat UB the way it is treated today. In the real world of compilers, we've seen the definitive death of "do what I say" in recent years, with GCC 8.x and newer.
Modern C is absolutely and completely dominated by the opposite paradigm: "do what I mean". Clearly, that has much more potential for optimizations. For many years I was myself a fan of this new approach, because of the better performance and the "forced" portability. Only recently, after writing a fair amount of kernel code, did I start to reconsider this, in particular when compiler engineers pushed this idea a little too far for my taste.
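Here's a sketch of the kind of transformation I mean (my own toy example, loosely modeled on the well-known null-check-removal cases that bit kernel code; the names are made up):

Code: Select all
struct dev { int flags; };

int read_flags(struct dev *p)
{
    int flags = p->flags;   /* dereferencing a NULL p here would be UB...     */
    if (p == NULL)          /* ...so the optimizer may assume p != NULL       */
        return -1;          /* and delete this check and the early return.    */
    return flags;
}

Under "do what I say" the check survives; under "do what I mean" the compiler is entitled to treat it as dead code.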
Solar wrote:vvaltchev wrote:Solar wrote:There is no "compiler's point of view" on UB.
Nope. I can write a double my_sqrt(double x) function and claim that x cannot be < 0. The compiler knows nothing about it. There is no UB from the compiler point of view.
Again, there is no "compiler point of view" in this. The compiler does not "see" UB and then do evil things. The compiler may (and indeed, must) work under the assumption that there is no UB in the code. That is the very contract between the language and the programmer.
The standard strives to define a language that can be used on the widest variety of platforms, with an absolute minimum of assumptions on how that platform operates. Signaling overflow, trap representations, non-IEEE floating point, non-two's-complement, non-8-bit-bytes.
The mechanism by which this works is leaving behavior that cannot be portably defined either implementation-defined (implementation has to specify), unspecified (implementation does not have to specify but has to support somehow), or undefined (no requirements on the implementation).
Again: An implementation is perfectly at liberty to define its behavior in certain (detectable) cases of UB. It might generate a compile time warning. This is what compilers actually do. The alternative would be to assert / trap / abort or whatever. But since, as you stated, lots of existing code exploits observed (but undefined) behavior, such an assert / trap / abort would probably not be considered graceful. Plus, the checks involved would mean the compiler would generate slower executables.
vvaltchev wrote:...you're completely missing the point of the conversation. You're assuming I've no idea what I'm talking about. Stop doing this and start trying to understand my point. Don't assume that everything controversial I say is because I don't know what I'm talking about. I've been writing C code since 1998.
Perhaps you're not making your point very well then? Perhaps realize that I know what I am talking about as well, and not only because I started writing C two or five years earlier than you.
Yeah, here again you're trying to explain to me what UB is. Missing the point. But, I admit, it's possible that I'm not making my point well. I'm doing my best to re-articulate my thesis. Also, I'm not assuming that you don't know what you're talking about. On the contrary, you look like an experienced developer, more or less like myself, who states true facts but misses the point of my argument. I'd like to go much, much beyond simple and obvious facts about C, UB, the ISO standard, the committee and the industry as it is today. I was interested in analyzing the intentions behind C89's wording of "undefined behavior", what C was before C89 according to the data we have, and how and why it changed. It's all about the philosophy of the language and the interpretation of some facts. Subjective stuff, supported by some evidence.
My (greatly simplified) thesis is: it is possible (not certain!) that C89 was the result of conflicting interests, and the wording appeared to make everybody happy because some parts of it (particularly about UB) could be interpreted in multiple ways. In particular, DMR opposed the "do what I mean" paradigm, but had to make some compromises, because it's impossible to formally define a compiled language like C and specify what will happen in a ton of cases: that depends on the platform. Maybe (or maybe not) somebody in the committee was already considering UB in a different way at the time, but it wasn't the right moment yet. Years later, the "do what I mean" believers gradually won the battle and pushed further what C89 started. "Non-portable" C code has definitively become "incorrect" C code, so compilers could take advantage of that, but it took many years to introduce actual optimizations benefiting from it.
So, the question simply was: does it make sense? I've shown my arguments in favor of this theory (true or not). What arguments do you have that oppose it?
Solar wrote:vvaltchev wrote:Code: Select all
void foo(int x); // x must be in [1,5]
What if x is not in that range? That's certainly NOT UB, from the compiler point of view. That's why I don't call it UB. The function might produce a wrong result or assert or crash the program in any way, but that's still not UB because the compiler doesn't know about it.
I don't quite get where this idea of "the compiler doesn't know about it" comes from. If your implementation leaves the behavior of foo() with inputs outside the [1,5] range undefined, your implementation is allowed to assume that inputs are in the [1,5] range, and may trigger UB if they aren't. It is up to the caller of foo() to ensure arguments are in range.
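Just to make that concrete with my own sketch (not Solar's words, and GCC/Clang-specific): an implementation could exploit exactly that freedom like this:

Code: Select all
/* If the contract says x is in [1,5], the optimizer can be told so
 * explicitly and may then, for example, emit a 5-entry jump table
 * with no bounds check at all. */
void foo(int x)
{
    if (x < 1 || x > 5)
        __builtin_unreachable();   /* GCC/Clang: "this cannot happen" */

    switch (x) {
    case 1: /* ... */ break;
    case 2: /* ... */ break;
    case 3: /* ... */ break;
    case 4: /* ... */ break;
    case 5: /* ... */ break;
    }
}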
vvaltchev wrote:Slowly it changed and evolved into something else. There are many good reasons for C to evolve this way, but, sadly, there's a price to pay as well: broken backwards compatibility.
We're simply not understanding each other here. Let's skip this for later (eventually) and focus on the rest. OK?
Solar wrote:I dispute that C changed in this regard in any meaningful way ever since it became a standard. Ever since ISO 9899:1989, UB has been what it was, and exactly for that purpose that it was introduced.
With very, very few exceptions, code that was correct back then still is correct now, and will run correctly. What you mean with "backwards compatibility" is the Sim City kind... keeping known bugs working.
OK, this is on topic. I explained why I believe that UB was interpreted in a different way. No point in adding more comments about that here. About "code that was correct back then still is correct now, and will run correctly", instead I'd say: code that was correct back then according to the modern interpretation of C89 is still correct today. Code that was correct but non-portable back then is now considered incorrect. The definition of "correct" changed, becoming an alias of "portable".
Solar wrote:vvaltchev wrote:I'd argue for having more compiler options like -fwrapv to disable certain assumptions or more warnings when certain assumptions are made.
More warnings, yes, always. But realize that "disabling certain assumptions" means creating even more compiler-specific dialect (which is then bound to break at some later point, e.g. when you have to switch to a different compiler). In case of that GCC "bug" you linked, to make really lazy programming "work". I disagree with that, because it just means having to deal with more lazy programming in the field.
Fair point. I prefer pushing for higher quality in a different, non-intrusive way (like enabling almost all the warnings and then building with -Werror), but I understand your point as well.
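For example (a toy snippet of mine, just to show the kind of bug this approach catches; the exact flag set is a matter of taste):

Code: Select all
#include <stddef.h>

/* Built with something like: gcc -Wall -Wextra -Werror
 * The signed/unsigned comparison below (-Wsign-compare) becomes a hard
 * build error instead of a silent surprise when n is very large. */
int count_zeros(const int *a, size_t n)
{
    int count = 0;
    for (int i = 0; i < n; i++)   /* -Wsign-compare fires here */
        if (a[i] == 0)
            count++;
    return count;
}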
Solar wrote:vvaltchev wrote:The approach "your code has always been broken" is bad, even if the language lawyers can prove that the compiler is right. Being right on the paper is not enough in the real world. I hope that you can understand this practical point.
I completely understand your POV.
I'm happy that you've understood at least one of my points. Maybe it's that I'm not communicating my theories well enough, or you started this conversation somehow biased, I don't know. But... we'll start to at least understand (not necessarily agree with) each other at some point.
Solar wrote:But then C is simply not the language you are looking for; you should look at some of the VM'ed languages that replace "whatever this happens to run on" with a well-defined virtual machine. That way, virtually everything in the language can be well-defined. It just doesn't work when what you have in the end is machine code running directly on whatever CPU happens to be in charge.
No man, I'm not interested in VM languages. I just wanted, in a few cases, to write non-portable C code that will be used conditionally on a given platform, and take responsibility for that. Portability is not always what we want, at least not at ANY price. I'm willing to pay some price for it, but not to sacrifice everything else. Also, if I want to do non-portable stuff that works as I expect on 90% of the architectures and never intend to support the other "exotic" ones, why do you, compiler, have to oppose me so much? In the end, it's also a business decision which architectures to support. Portability to everything in the world is not very realistic for most software projects. An example we already made? Integer overflow. It has been made UB because 2's complement is not portable. Well, almost all architectures today use it, even if, during the long history of C, we had architectures that used a different representation for integers. So, am I crazy if I rationally decide not to support exotic architectures? Nope, I'm not. Linux builds with -fwrapv; they're not crazy, for sure.
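As a concrete sketch of what I mean (my own toy example, in the spirit of that choice): code that deliberately relies on wraparound, which -fwrapv turns from UB into defined, predictable behavior:

Code: Select all
/* With -fwrapv, signed overflow wraps around, so this test reliably
 * detects it. Without -fwrapv, a + b overflowing is UB, and a modern
 * compiler may assume it cannot happen and optimize the check away. */
int add_would_overflow(int a, int b)
{
    int sum = a + b;                /* wraps with -fwrapv, UB otherwise */
    return (b > 0 && sum < a) ||    /* wrapped past INT_MAX */
           (b < 0 && sum > a);      /* wrapped past INT_MIN */
}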