C and undefined behavior
Posted: Sun Jun 06, 2021 5:53 am
Recently, I read this blog post about C and undefined behavior:
https://www.yodaiken.com/2021/05/19/und ... ing-error/
In it, the author essentially argues that the way modern compilers treat UB is the result of a misinterpretation of the C89 standard.
What do you think about that?
IMHO, he's probably right. Even if the fancy new compiler optimizations are great, "modern C" probably broke the philosophy behind the C language.
Check out Dennis Ritchie's essay about noalias: https://www.yodaiken.com/2021/03/19/den ... uage-1988/
Really, it's not fair to claim that the millions and millions of lines of C code written in the '90s are just wrong and were all written by incompetent programmers who didn't read the standard. No. Everybody agreed on what C was back then. And then things started to change, step by step. The boundaries of the C89 standard have been pushed ever since. In the C99 standard, slightly different wording around UB gave compilers even more freedom.
And the consequences of that are remarkable. Let me quote the blog's author:
https://www.yodaiken.com/ wrote:
> [...] over time the Standard and the common compilers have made C an unsuitable language for developing a range of applications, from memory allocators, to cryptography applications, to threading libraries and, especially operating systems. We have the absurd situation that C, specifically constructed to write the UNIX kernel, cannot be used to write operating systems. In fact, Linux and other operating systems are written in an unstable dialect of C that is produced by using a number of special flags that turn off compiler transformations based on undefined behavior (with no guarantees about future “optimizations”). The Postgres database also needs some of these flags as does the libsodium encryption library and even the machine learning tensor-flow package.
C used to be the "portable assembler", where any kind of hack or cast was allowed. If you wanted a type-safe language, you simply had to use something else. At the end of the day, I don't want to question the good intentions of the people who pushed those changes forward. Actually, for a language like C++, I think they are mostly for the good. In C++ you're not supposed to do tricky casts or other unsafe low-level tricks. Even if you can, there are 1,000 things you have to consider before doing so. Having to be careful when going there is part of what we consider "idiomatic C++". Rust goes one step further, making low-level stuff impossible to do without super-explicit annotations etc. See the Rust kernel projects out there.
But that shouldn't be C's case. The language was designed for the purpose of writing operating systems. That's exactly where you're supposed to do things like that, from time to time. C was the right tool for that kind of software. It was the portable assembler, really. Now it feels more like a downgraded high-level language. A C++ without most of its features.
Still, when I write C code I'm super-pedantic and careful about UB. The fact that I don't like how modern compilers behave in some cases doesn't mean I deny UB's existence or ignore the ISO documents etc. It's the opposite: I'm obsessed with avoiding UB, because I hate dealing with UB bugs.
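To give a rough idea of the kind of "tricky cast" this is about, here is a minimal sketch of reading the bits of a float. It is an illustration only, not code from the blog post: the name float_bits is made up, and the example assumes a 32-bit IEEE 754 float. The classic pointer cast (shown in the comment) breaks the aliasing rules of ISO C, which is one reason projects like Linux build with GCC's -fno-strict-aliasing. Copying the bytes with memcpy is the variant that stays within the standard.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "example assumes a 32-bit float");

/*
 * The old "portable assembler" way would be:
 *
 *     return *(uint32_t *)&f;
 *
 * That reads a float object through a uint32_t lvalue, which is undefined
 * behavior under the effective-type ("strict aliasing") rules, so an
 * optimizer is allowed to assume it never happens.
 */

/* UB-free variant: copy the object representation byte by byte. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 machine, float_bits(1.0f) gives 0x3F800000. Compilers typically turn the memcpy into a plain register move, so the careful version usually costs nothing.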
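As a small sketch of what being pedantic looks like in practice, take a signed overflow check. The function names below are hypothetical and only there for illustration. The first version relies on wraparound, which ISO C leaves undefined for signed integers, so a compiler may fold it to "always false" unless something like GCC/Clang's -fwrapv is used. The second version checks against the limits before adding and never overflows.

```c
#include <limits.h>
#include <stdbool.h>

/* Fragile: relies on a + 1 wrapping around. Signed overflow is undefined
 * behavior in ISO C, so an optimizer may assume a + 1 < a is never true
 * and remove the check entirely. Shown for contrast only. */
bool next_would_overflow_fragile(int a)
{
    return a + 1 < a;
}

/* UB-free: decide from the operands alone, without performing the
 * addition that might overflow. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    return a < INT_MIN - b;       /* b <= 0, so INT_MIN - b cannot overflow */
}
```

A caller would test add_would_overflow(a, b) first and only compute a + b when it returns false.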
-----------------------------------------------------------------------------------------------------------------
EDIT[1]: This text is rough and was seriously rushed. Please check the whole discussion.
Changes: I added the word "probably" in two places and fixed some typos like missing "lines of code".
EDIT[2]: For future readers ending up here who might not want to read a LONG and unstructured discussion about undefined behavior, I believe it's worth sharing a few articles on the same topic:
- How one word broke C by Eskil Steenberg
https://news.quelsolaar.com/2020/03/16/ ... d-broke-c/
- Undefined Behavior: What Happened to My Code? A research paper by Wang et al (MIT)
https://people.csail.mit.edu/nickolai/p ... -07-12.pdf
- A proposal for BoringCC by D. J. Bernstein, a researcher in cryptography
https://groups.google.com/g/boring-cryp ... GGp2K1DAAJ
- What every compiler writer should know about programmers or “Optimization” based on undefined behaviour hurts performance by M. Anton Ertl (TU Wien)
http://www.complang.tuwien.ac.at/kps201 ... ion_29.pdf
- The Intended Meaning of Undefined Behaviour in C Programs by M. Anton Ertl (TU Wien)
http://www.complang.tuwien.ac.at/papers/ertl17kps.pdf
- Some Were Meant for C: The Endurance of an Unmanageable Language by Stephen Kell (University of Cambridge)
https://www.cs.kent.ac.uk/people/staff/ ... eprint.pdf
- Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior by Xi Wang et al (MIT)
https://srg.doc.ic.ac.uk/440h/papers/stack.pdf
- Optimization-unstable code, a 2013 LWN article
https://lwn.net/Articles/575563/