Korona wrote:What you listed (Linux, GNOME etc.) is not really legacy software that became broken due to the introduction of UB.
So, you're ignoring the 2013 LWN article:
LWN wrote:40% of 8500+ C/C++ packages have "optimization-unstable code"
And the fact that type punning with casts was THE DOCUMENTED way to use Berkeley sockets?
Also, while the Linux kernel never minded relying on GCC extensions, it certainly did not start out in the '90s with -fwrapv and -fno-strict-aliasing. The Linux kernel broke multiple times because of UB, and that caused security bugs as well: https://lwn.net/Articles/342330/. That's why it started to build with -fwrapv, -fno-delete-null-pointer-checks, -fno-strict-aliasing and other options. In my previous reply I was supposed to show that A TON of code broke because of UB, and I believe I've done that.
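To make the null-pointer case concrete, here is a minimal sketch of the pattern that -fno-delete-null-pointer-checks is meant to defuse. The struct and function names are hypothetical and only illustrate the shape of the bug; this is not the actual kernel code.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct device {
    int flags;
};

/* Buggy ordering: the pointer is dereferenced before it is checked.
 * Since dereferencing NULL is UB, the compiler may assume dev != NULL
 * after the first line and remove the check entirely. */
int poll_flags_buggy(struct device *dev)
{
    int flags = dev->flags;
    if (!dev)
        return -1;
    return flags;
}

/* Fixed ordering: check first, dereference afterwards. */
int poll_flags_fixed(struct device *dev)
{
    if (!dev)
        return -1;
    return dev->flags;
}
```

With a valid pointer both versions behave the same; only the second one has defined behavior when it is passed NULL.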
There's also a research paper about that:
https://srg.doc.ic.ac.uk/440h/papers/stack.pdf
It shows the presence of UB in plenty of projects like: Binutils, e2fsprogs, FFmpeg+Libav, FreeType, GRUB, HiStar, Kerberos, libX11, libarchive, libgcrypt, Linux kernel, Mozilla, OpenAFS, plan9port, Postgres, Python, QEMU, Ruby+Rubinius, Sane, uClibc, VLC, Xen, Xpdf.
Is all of that still not enough?
Korona wrote:The moment GCC breaks their unaligned access code, they will add a flag to disable that optimization.
Such an option does not exist at the moment as I've shown in the WONTFIX GCC bug. So, it will be interesting to see what will happen in that case. I've tried opening a conversation about that on LKML, but they didn't seem to care enough.
Korona wrote:That leaves unaligned access as the only somewhat convincing example.
I have no problem with you believing that narrative. I honestly believe that I've shown plenty of evidence that a ton of well-established code, examples and documentation broke because of UB over the years. There's no point in insisting further and repeating the same things. Let's agree to disagree.
Korona wrote:I think the reason that unaligned access is often found in the wild is that compilers traditionally cannot do a lot of optimizations based on aligned access
That's true, but it's not the only reason. Unaligned access is done using very natural expressions in C. What is unnatural is using memcpy() for that. Again, that holds regardless of what the standard technically says is OK to do. Let's forget for a moment what is supposed to be "right" and what is supposed to be "wrong" and just observe what developers did: as long as compilers allowed something, they took advantage of it.
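As a minimal sketch of the two styles I'm comparing (the function names are mine, purely for illustration):

```c
#include <stdint.h>
#include <string.h>

/* The "natural" expression: cast and dereference.
 * This is UB when p is not suitably aligned for uint32_t (and it can
 * also violate strict aliasing). Shown for comparison only. */
static inline uint32_t load_u32_cast(const unsigned char *p)
{
    return *(const uint32_t *)p;
}

/* The standard-conforming form: copy the bytes into a local object.
 * Works for any alignment of p. */
static inline uint32_t load_u32_memcpy(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On targets that allow unaligned loads, optimizing compilers typically turn the memcpy() version into a single load instruction, but that is something we rely on rather than something we write.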
It's the same for type punning with casts: it's pretty natural to do that in C and people have been doing it for decades, before and after ANSI C came out. People stopped doing that in the last decade, after the strict aliasing rule started being enforced. Type punning with unions existed and it was considered the "super-safe" and "super-portable" way of doing type punning, while using casts was the mainstream practice. Today, type punning with unions is barely acceptable in C and it's possible that it will be made completely UB at some point in the future. memcpy() (or __builtin_memcpy) will remain the only safe option, as already happened in C++.
We will have to tell the compiler to COPY some data into a local variable, modify it, and then tell the compiler to COPY it back, relying on the fact that it WON'T do any of that in the emitted code. That means we gave up the ability to tell the machine directly what to do (the old "do what I say" paradigm), but we've gained better portability and more powerful optimizations. I hope you can agree at least on that. It's not all necessarily bad, I NEVER said that. Simply, it's not what it used to be. For most code that's fine, but for some low-level code that needs to do type punning and stuff like that, it means writing more verbose and less expressive code.
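Here is a minimal sketch of that "copy in, modify, copy back" style next to the old cast-based one. It assumes float is a 32-bit IEEE-754 value, and the function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Old style: operate directly on the float's storage through a cast.
 * This violates the strict aliasing rule, so it is UB.
 * Shown for comparison only. */
static inline float negate_cast(float f)
{
    *(uint32_t *)&f ^= 0x80000000u;   /* flip the sign bit in place */
    return f;
}

/* Conforming style: copy the bytes out, modify them, copy them back.
 * Assumes a 32-bit IEEE-754 float. */
static inline float negate_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* COPY into a local variable */
    bits ^= 0x80000000u;              /* modify */
    memcpy(&f, &bits, sizeof f);      /* COPY back */
    return f;
}
```

The second version is what the standard wants, and with optimizations enabled the two memcpy() calls normally disappear from the generated code. It's just more verbose to write.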
Korona wrote:Unaligned access is also one of the types of UB that is trivial to detect by UBSAN (in contrast to data races, among others).
At runtime, many types of UB are trivial to detect. The problem is exploring all the code paths at runtime.
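A small sketch of what I mean (hypothetical function, for illustration only): the misaligned load below happens only when has_prefix is non-zero, so a run built with -fsanitize=undefined stays silent unless some test actually takes that path.

```c
#include <stdint.h>

/* Hypothetical packet reader. The caller is assumed to pass a buffer
 * that is aligned for uint32_t and actually holds uint32_t data. */
uint32_t read_field(const unsigned char *pkt, int has_prefix)
{
    /* Common path: offset 0, the load is aligned.
     * Rare path: offset 1, the load is misaligned, which is UB. */
    const unsigned char *p = pkt + (has_prefix ? 1 : 0);
    return *(const uint32_t *)p;
}
```

If the test suite only ever calls read_field() with has_prefix == 0, the sanitizer never sees the bad load.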