Clang emits code making ESP unaligned. Compiler bug?

kzinti · Post by **kzinti** » Sat Apr 03, 2021 5:21 pm

vvaltchev wrote: Now, between "push offset .irq_resume" and "mov eax, esp" I could do something like: "and esp, ~3", and the stack pointer will be aligned but the "struct regs" will be unreadable because now I've moved 0..3 bytes away from it. So, I not only have to re-align the stack pointer, but also shift the whole struct by 0..3 bytes. That's a nightmare.

Yeah, I wouldn't try to align "struct regs", just leave it unaligned. I suppose that doesn't help you silence the unaligned access warnings.

thewrongchristian wrote: Your example is not optimized. As soon as you enable even minimal optimization (-O), the problem disappears:

Given that one doesn't use XMM registers in kernel space, I am not sure how much of this unaligned stack matters (especially if it is aligned when using anything above O0).

Are you just trying to get a warning-free code base or is there some other concern(s) here?

vvaltchev · Post by **vvaltchev** » Sun Apr 04, 2021 5:37 am

thewrongchristian wrote:Your example is not optimized. As soon as you enable even minimal optimization (-O), the problem disappears:

Yeah, I know that, but I'd like everything to work with -O0 (my debug build) too.

kzinti wrote:Are you just trying to get a warning-free code base or is there some other concern(s) here?

Yes, I am. Briefly, I build and test the project in a variety of configurations (more than 10) using almost all the warnings compilers have, with -Werror, of course. My goal is to make the project as robust and portable as possible, even if today it supports just x86. Also, it has to be absolutely UB-free too. [UB is my biggest enemy. Coming from C++, I learned that in large-enough code bases there's always a UB lurking somewhere, ready to strike. I really hate it when everything works and does exactly what most people expect it to, but in reality that's just an illusion that miserably breaks after upgrading the compiler.]

The longer story. I don't really care that the stack is unaligned in debug builds, on x86 with clang. That's fine, the performance of debug builds is not a problem. The problem is that I cannot turn on -fsanitize=alignment because of that. I don't want to display UBSAN warnings or log them somewhere. My policy is: if there might be a problem, just panic. I'll fix it and restart all the testing. Now, some time ago, I realized thanks to a discussion here with @Octocontrabass that we have no ad-hoc option to make unaligned access non-UB. I believed that -fno-strict-aliasing did that as a side effect, but that was simply false. There is no equivalent of -fwrapv for unaligned access. So, just for the moment, in the kernel, where we disallow any kind of SIMD instructions, it looks safe to do unaligned access, but in the future that won't be the case anymore. In recent years, compiler developers have been taking advantage of undefined behavior, where allowed by the standard, as much as possible. I don't want to end up with weird UB bugs when I switch to the newest compiler a few years from now. Warnings like -Wcast-align helped, but not in all cases. In some cases, we have to check at runtime whether unaligned access is performed by "regular" C code.

Here comes UBSAN: I started introducing it two days ago and it has already helped me fix a few places, not just about unaligned access, but other stuff too, discovering real bugs. For example, in one case I was performing unaligned access needlessly, simply because I forgot to add a .align directive in the assembly where I declared two arrays. In other cases, with minor effort I could align a buffer. In others, the access was misaligned because of a wrong offset. Finally, in rare cases, I really needed to support unaligned access as well. For those cases, given that the only standard-compliant way to read/write memory where the address might be unaligned is to use something like __builtin_memcpy(), that's what I did. I introduced macros like READ_LONG(), WRITE_U32() etc. and I used them in the very few places where the address might not be aligned. Anyway, UBSAN with GCC does the job, but it's sad that with clang I have to conditionally not enable -fsanitize=alignment, because of that weird behavior in leaf functions.
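A minimal sketch of what such memcpy-based accessors might look like. The function names, the exact macro shapes and the use of assert() are assumptions made for illustration (only the names READ_LONG() and WRITE_U32() come from the post, not their definitions), and the sketch is written for a hosted build rather than for a kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit value from an address that may be
 * misaligned. The copy has no alignment requirement on either pointer,
 * so no misaligned uint32_t lvalue is ever formed. */
static inline uint32_t read_u32(const void *p)
{
    uint32_t v;
    assert(p != 0);   /* hosted sanity check; a kernel would use its own macro */
    __builtin_memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: write a 32-bit value to a possibly misaligned address. */
static inline void write_u32(void *p, uint32_t v)
{
    assert(p != 0);
    __builtin_memcpy(p, &v, sizeof v);
}

/* Macro wrappers in the spirit of the READ_LONG()/WRITE_U32() names above. */
#define READ_U32(addr)       read_u32((const void *)(addr))
#define WRITE_U32(addr, val) write_u32((void *)(addr), (uint32_t)(val))
```

With optimizations enabled, GCC and clang typically lower such a fixed-size copy to a single load or store on x86, so the wrappers should cost little in release builds, while -fsanitize=alignment has nothing to report because the access goes through a byte copy.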

And yes, just to use -fsanitize=alignment for debug builds with clang on x86, I could do insane tricks in the kernel to align the stack pointer on interrupts, but it's not worth the effort. Not only that, it would leave that assembly code forked with #ifdefs etc. It would be a mess that I wouldn't be happy to maintain. The compiler should just support an option to prevent that. It's sad that -mstack-alignment has no effect.

nullplan · Post by **nullplan** » Sun Apr 04, 2021 10:17 am

vvaltchev wrote: Yes, I am. Briefly, I build and test the project in a variety of configurations (more than 10) using almost all the warnings compilers have, with -Werror, of course. My goal is to make the project as robust and portable as possible, even if today it supports just x86. Also, it has to be absolutely UB-free too.

I completely get you.

I thought about the problem a while, and, well, the hard problem is that there is no ABI boundary here. Essentially, on entry to interrupt, you cannot assume RSP to be aligned, because there is no contract with the compiler to always make it so. Indeed, you cannot even assume RSP to point to stack. The compiler might have, for some odd reason, decided to make RSP point somewhere else while it does something, but it would restore RSP before returning. It would be utterly ludicrous for the compiler to do such a thing, but it would be possible. And I have no idea how to tell the compiler that it has to stick to certain constraints all the time. Even just "-mgeneral-regs-only" is something of an innovation, and it does preserve the contract that while certain features may be available in the CPU, they are off-limits to the program you are trying to write (even with eager FPU/vector register saving, using FPU or vectors in kernel space requires you to disable preemption, save the content of those registers, do your thing, then restore those registers, which is usually more trouble than whatever speed you could save). Apparently, there's -mstack-alignment, but as we saw, this does not help here.

The only thing we can do, therefore, is to prepare the actual ABI boundary correctly, which in this case is the call to higher-level handlers from the interrupt entry code. So if a misaligned RSP is a possibility, you can only detect that to be the case, then fix it. What I will do, therefore, is to add code to all my entry stubs, after pushing all registers, to detect a misaligned stack. And if the stack is misaligned, to "rep movsb" it to the correct spot (can't be "rep movsq" because I don't know the semantics of that when RSI and RDI overlap, and I don't want to find out). This also solves a different problem: I can make the stack always be 16-byte aligned before the call (which is necessary for ABI conformance). Nothing more is necessary, because the old value of RSP is still part of the register image on the stack, even after copying it all.
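The detect-and-move step can be modelled in C as follows. This is only a sketch under stated assumptions: relocate_frame() and its parameters are hypothetical names, memmove() stands in for the "rep movsb" of a real entry stub (which would be written in assembly), and the bytes just below the frame are assumed to be free stack space:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the fix-up: move a saved register frame down to
 * the nearest `align`-byte boundary. Returns the new frame address, which
 * equals `frame` when the frame was already aligned. */
static void *relocate_frame(void *frame, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */

    uintptr_t src = (uintptr_t)frame;
    uintptr_t dst = src & ~(uintptr_t)(align - 1);      /* round down */

    if (dst != src)
        memmove((void *)dst, frame, size);  /* source and destination overlap */

    return (void *)dst;
}
```

Because the destination always lies below the source, a forward byte-by-byte copy never overwrites a byte it has not read yet, which is why a plain ascending "rep movsb" is enough here; memmove() simply makes the same guarantee explicit in C.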

Of course, it is still possible the compiler is misusing RSP for something completely different in some place. However, this would also cause problems with asynchronous signals if no signal stack is registered. And this would be too difficult to detect for the compiler. So I would consider it unlikely that that would ever happen. Also, way too niche for this to ever be useful.

vvaltchev · Post by **vvaltchev** » Sun Apr 04, 2021 12:10 pm

nullplan wrote:I completely get you.

Thank you for the sympathy.

nullplan wrote:I thought about the problem a while, and, well, the hard problem is that there is no ABI boundary here. Essentially, on entry to interrupt, you cannot assume RSP to be aligned, because there is no contract with the compiler to always make it so.

Well, I understand that point of view. But shouldn't a compiler provide an option for that? I've seen compiler options for such tiny details that I'm surprised there's no working option for this. After all, the compiler is a tool that should make our life easier, isn't it? It feels like quite the opposite, sometimes.

nullplan wrote:Indeed, you cannot even assume RSP to point to stack. The compiler might have, for some odd reason, decided to make RSP point somewhere else while it does something, but it would restore RSP before returning. It would be utterly ludicrous for the compiler to do such a thing, but it would be possible.

I didn't get that far but... yeah, everything is possible at this point. So the radical solution I'm thinking of right now to handle that would be for each interrupt to have its own stack, exactly like in the case of double faults. If ESP can point anywhere, the CPU cannot use it, not even to write a single byte. We'd need a dedicated TSS for that (I'm talking about i686). But the thing is, with this approach, we cannot support nested interrupts of the same type. OK, in practice, nested interrupts of the same type are almost never allowed, but still, it would become impossible because as soon as a 2nd interrupt (of the same type) comes, we won't be able to resume the previous one anymore. Also, it would be a waste to allocate a whole page for every interrupt number, but we couldn't avoid that, otherwise we wouldn't be able to handle nested interrupts of different types.

nullplan wrote: The only thing we can do, therefore, is to prepare the actual ABI boundary correctly, which in this case is the call to higher-level handlers from the interrupt entry code. So if a misaligned RSP is a possibility, you can only detect that to be the case, then fix it. What I will do, therefore, is to add code to all my entry stubs, after pushing all registers, to detect a misaligned stack. And if the stack is misaligned, to "rep movsb" it to the correct spot (can't be "rep movsq" because I don't know the semantics of that when RSI and RDI overlap, and I don't want to find out). This also solves a different problem: I can make the stack always be 16-byte aligned before the call (which is necessary for ABI conformance). Nothing more is necessary, because the old value of RSP is still part of the register image on the stack, even after copying it all.

Yeah, that's doable, but it's tricky and it will have a performance impact, if I keep it in all the cases. Otherwise, if I have to keep it just for clang and/or debug builds, it could be a mess to maintain. No happy end here.

nullplan wrote:Of course, it is still possible the compiler is misusing RSP for something completely different in some place. However, this would also cause problems with asynchronous signals if no signal stack is registered. And this would be too difficult to detect for the compiler. So I would consider it unlikely that that would ever happen. Also, way too niche for this to ever be useful.

Sorry, I didn't quite get this last part. Did you refer to the case mentioned above when RSP doesn't even point to the stack?

nullplan · Post by **nullplan** » Sun Apr 04, 2021 11:16 pm

vvaltchev wrote: Sorry, I didn't quite get this last part. Did you refer to the case mentioned above when RSP doesn't even point to the stack?

Yes. If an asynchronous signal arrives, the kernel uses the current RSP to calculate the position of the signal frame. (Ab)Using RSP for something different is therefore only safe if signals are blocked, or all handled signals have a signal stack (you have to register a signal stack with sigaltstack(), then register the signal handlers with SA_ONSTACK). Both of these are dynamic runtime properties, so in general, telling if a program is going to be in such a state is as hard as the halting problem. Therefore I really don't think they would ever implement something like this in the compiler.
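For reference, a user-space sketch of the second option (a registered signal stack plus SA_ONSTACK). The helper names, the choice of SIGUSR1 and the 64 KiB stack size are assumptions made for this example:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

#define ALT_STACK_SIZE (64 * 1024)   /* arbitrary size chosen for this sketch */

static unsigned char *alt_base;       /* start of the alternate signal stack */
static volatile sig_atomic_t on_alt;  /* set to 1 if the handler ran on it */

static void handler(int sig)
{
    (void)sig;
    /* The address of a local variable shows which stack is in use. */
    unsigned char probe;
    uintptr_t p  = (uintptr_t)&probe;
    uintptr_t lo = (uintptr_t)alt_base;
    on_alt = (p >= lo && p < lo + ALT_STACK_SIZE);
}

/* Register an alternate stack, then a SIGUSR1 handler that runs on it.
 * Returns 0 on success, -1 on failure. */
static int install_on_alt_stack(void)
{
    assert(alt_base == NULL);         /* this sketch installs only once */
    alt_base = malloc(ALT_STACK_SIZE);
    if (alt_base == NULL)
        return -1;

    stack_t ss = { .ss_sp = alt_base, .ss_size = ALT_STACK_SIZE, .ss_flags = 0 };
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa = {0};
    sa.sa_handler = handler;
    sa.sa_flags = SA_ONSTACK;         /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Calling install_on_alt_stack() and then raise(SIGUSR1) should leave on_alt set to 1, meaning the kernel built the signal frame on the registered stack rather than at the interrupted RSP. Both calls happen at run time, which is exactly why a compiler could not know statically whether they had been made.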

Korona · Post by **Korona** » Tue Apr 06, 2021 12:11 pm

I still think that this is a compiler bug (or at least a QoI bug), even if there is no ABI boundary.

vvaltchev · Post by **vvaltchev** » Tue Apr 06, 2021 3:53 pm

Korona wrote:I still think that this is a compiler bug (or at least a QoI bug), even if there is no ABI boundary.

For what it's worth, I agree with you. I hope the clang guys will agree with us, but I'm skeptical. For the moment, I got no answer: https://bugs.llvm.org/show_bug.cgi?id=49828

Just, allow me to add a short rant about all that. Of course, I'm not expecting you nor anybody to necessarily agree with me. I'll be just stating my personal opinion.
We should keep reasonable boundaries about what compilers can do. I oppose the idea of taking every single tiny detail in the ISO C standard and abusing it as much as possible, with an adversarial implementation. Unless the compiler can warn me at compile-time about non-obvious behavior, I'm not happy with it taking the freedom to do "whatever is legally allowed by the standard". Language lawyers could at this point state that I'm against the ISO standard. Well, it's not a binary thing where you either love 100% of it or hate 100% of it. There are aspects of the standard I simply don't agree with. I believe that in many places "undefined behavior" should be replaced with "implementation-defined behavior", which is a totally different story. Implementation-defined behavior does not limit the language and its implementation on new platforms, nor does it force different compilers to behave the same way. It simply requires compilers to explicitly declare what will happen in certain situations. Because the behavior has to be defined (even if not by the standard), compilers will be forced to add options when multiple behaviors make sense and not just take an arbitrary decision because it's undefined behavior.

Anyway, back to our topic: if you believe that clang's behavior in this case is really a problem and if you feel like you wanna help me convince the clang guys to "do something" about it (e.g. adding an option), you're more than welcome to add a comment there. If enough people agree that it's a problem, sometimes things change.

kzinti · Post by **kzinti** » Tue Apr 06, 2021 4:03 pm

I am not sure that it is even possible to get what you want.

There has been a lot of back and forth in this thread around ia32 vs x86_64 and 8 vs 16 bytes alignment.

If I understand correctly, this thread started with the desire to have 8 bytes alignment in 32 bits mode. This is not possible as pushing values on the stack will misalign your stack. Even calling a function will misalign your stack by pushing the return address.

The same problem exists in 64 bits mode: how are you going to enforce 16 bytes alignment? Any call instruction will break that alignment (and if you push something before making the call, your push instruction breaks it).

The reality is kernel development is always going to be tricky. Maybe you don't care about 16 bytes alignment in the kernel (since you are not using XMM registers anyways). If you aren't willing to (or can't) ignore the issue, the best thing you can do is probably just to align the stack yourself on interrupt entry.

vvaltchev · Post by **vvaltchev** » Tue Apr 06, 2021 6:23 pm

kzinti wrote:There has been a lot of back and forth in this thread around ia32 vs x86_64 and 8 vs 16 bytes alignment.

If I understand correctly, this thread started with the desire to have 8 bytes alignment in 32 bits mode. This is not possible as pushing values on the stack will misalign your stack. Even calling a function will misalign your stack by pushing the return address.

Sorry, maybe there has been a misunderstanding because of other people's comments but, for my part, the discussion has never been about having 8 vs 16 bytes alignment. It has been about having the minimum alignment of 4 bytes on ia32 and 8 bytes on x86_64. Pushing values on the stack will always keep the alignment at least at that value (4 or 8). Asking for the minimum alignment is absolutely sane, IMHO. Isn't a 32-bit integer required to have a 4-byte alignment, in order to avoid UB, by the ISO standard? So should the stack[1], still IMHO. The stack has always been aligned. And aligned always meant pointer-size alignment, nothing more. Sure, if you need to use SIMD instructions, you might need to align the stack by yourself (with "you" I mean "the compiler", which might do that in the callee, because it cannot rely on the stack being previously aligned that way, and that's perfectly fine).
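The arithmetic behind that claim is small enough to write down. In this sketch, is_aligned() and after_pushes() are hypothetical helpers that model the stack pointer as a plain integer; they do not touch a real stack:

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if `addr` is a multiple of `align`, which must be a power of two. */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Model of `n` consecutive word-sized pushes: each push subtracts
 * `word` bytes from the stack pointer. */
static uintptr_t after_pushes(uintptr_t sp, unsigned n, uintptr_t word)
{
    return sp - (uintptr_t)n * word;
}
```

Starting from a 4-byte-aligned value such as 0x1000, any number of 4-byte pushes stays 4-byte aligned, although 16-byte alignment is lost after an odd count such as 7 pushes (0x1000 - 28 = 0xFE4). Starting from a misaligned value such as 0x1002, no number of pushes ever restores alignment, which is why an ESP that is already misaligned on interrupt entry propagates into the saved frame.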

kzinti wrote:The same problem exists in 64 bits mode: how are you going to enforce 16 bytes alignment? Any call instruction will break that alignment (and if you push something before making the call, your push instruction breaks it).

Well, in theory the same problem might exist on x86_64, but in practice clang never does that: no matter what I do, the stack is aligned at least at 8 bytes (= pointer-size alignment).

kzinti wrote: The reality is kernel development is always going to be tricky. Maybe you don't care about 16 bytes alignment in the kernel (since you are not using XMM registers anyways). If you aren't willing to (or can't) ignore the issue, the best thing you can do is probably just to align the stack yourself on interrupt entry.

Well, for the moment, I'm ignoring the issue because it affects just one sanitizer, but that doesn't mean I'm happy with such behavior.

[1] In this case, I tried searching the C99 standard for anything about the stack's alignment, and there's absolutely nothing. There's more: the word "stack" does not appear in the whole C99 document, not even once. So, "legally" compilers can do whatever they want, but I can oppose some particular behavior by stating that it's not backed up by the ISO standard. In other words, neither party can "appeal to authority". I can just claim on my own that it's bad behavior. For example, GCC does not do that. [That doesn't make GCC better than clang, because I have examples where GCC is the one that does crazy stuff, while clang is conservative. I'm talking about this specific case.]

kzinti · Post by **kzinti** » Tue Apr 06, 2021 6:28 pm

You are correct, the C (and C++) standards do not talk about the stack at all, and this is intentional. Unaligned accesses on x86, unfortunate as it might be, are perfectly valid.

The standards say that accessing unaligned data is UB. It just happens that UB in this case means that it just works.

Korona · Post by **Korona** » Wed Apr 07, 2021 5:12 am

This is not a C standard issue but a C ABI issue. My reading of the ELF ABI for i386 is that it at least intends to specify a stack alignment: in Table 2.2, the top of the stack is specified as ebp + 4n + 8 and ebp is specified to be esp minus a multiple of 4 (in case a frame pointer is used). It does not contain exact wording for that, but I don't think that the ABI is supposed to allow misaligned stacks. For x86_64, it does mention that leaf functions are allowed not to modify rsp at all, and that will happen with any -O flag, I guess (?). But it does not state that leaf functions are allowed to violate the ABI in any other way.

My guess is that this does not bite existing systems because they (i) usually build with GCC, or (ii) build with optimizations.

EDIT: I expressed this opinion on the Clang bugtracker; let's see if we get a fix.

vvaltchev · Post by **vvaltchev** » Wed Apr 07, 2021 7:19 am

Korona wrote:This is not a C standard issue but a C ABI issue.

You're absolutely right. I derailed the conversation mentioning the ISO standard, which has nothing to do with the ABI.

Korona wrote: My reading of the ELF ABI for i386 is that it at least intends to specify a stack alignment: in Table 2.2, the top of the stack is specified as ebp + 4n + 8 and ebp is specified to be esp minus a multiple of 4 (in case a frame pointer is used). It does not contain exact wording for that, but I don't think that the ABI is supposed to allow misaligned stacks. For x86_64, it does mention that leaf functions are allowed not to modify rsp at all, and that will happen with any -O flag, I guess (?). But it does not state that leaf functions are allowed to violate the ABI in any other way.

Makes sense to me.

Korona wrote:My guess is that this does not bite existing systems because they (i) usually build with GCC, or (ii) build with optimizations.

Yep, they don't, at the same time:
- build with -O0
- use clang
- use -fsanitize=alignment
- build for i686
At least one of those four parameters is different. On Linux, -fsanitize=alignment is enabled ONLY when the architecture does not natively support unaligned access: https://www.kernel.org/doc/html/v4.14/d ... ubsan.html

Quoting them directly:

Linux documentation wrote: Detection of unaligned accesses controlled through the separate option - CONFIG_UBSAN_ALIGNMENT. It’s off by default on architectures that support unaligned accesses (CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y). One could still enable it in config, just note that it will produce a lot of UBSAN reports.

They know that on x86 -fsanitize=alignment will generate a ton of reports and just don't care about that, because the hardware supports unaligned access. I wonder what will happen when compiler engineers find a way to take advantage of that UB case even when SIMD instructions are disallowed and Linux, along with many other popular projects, breaks.

Korona wrote:EDIT: I expressed this opinion on the Clang bugtracker; let's see if we get a fix.

Thank you so much for that!

Ethin · Post by **Ethin** » Fri Apr 16, 2021 9:59 pm

For what it's worth, this doesn't happen in Rust, so I don't think it's a bug in LLVM itself. This Rust code:

#![no_std]
#![no_main]
#![feature(abi_x86_interrupt)]
#![feature(lang_items)]

use core::panic::PanicInfo;

extern "x86-interrupt" fn handle_interrupt() {}

#[inline(always)]
#[no_mangle]
extern "C" fn bar(_: u32) {}

#[no_mangle]
extern "C" fn foo() {
    bar(0);
}

#[lang = "eh_personality"]
#[no_mangle]
pub extern "C" fn rust_eh_personality() {}

#[panic_handler]
fn panic_handler(_: &PanicInfo) -> ! {
    loop {}
}

Generates the expected assembly (at least on i686-unknown-linux-gnu):

	.section	.text.bar,"ax",@progbits
	.globl	bar
	.p2align	4, 0x90
	.type	bar,@function
bar:
	.cfi_startproc
	retl
.Lfunc_end0:
	.size	bar, .Lfunc_end0-bar
	.cfi_endproc

	.section	.text.foo,"ax",@progbits
	.globl	foo
	.p2align	4, 0x90
	.type	foo,@function
foo:
	.cfi_startproc
	jmp	.LBB1_1
.LBB1_1:
	retl
.Lfunc_end1:
	.size	foo, .Lfunc_end1-foo
	.cfi_endproc

I'm not sure how else to test this though -- I can't seem to reproduce it.

vvaltchev · Post by **vvaltchev** » Sat Apr 17, 2021 7:54 am

Ethin wrote: For what it's worth, this doesn't happen in Rust, so I don't think it's a bug in LLVM itself.

Probably it's a bug in the clang front-end, not in LLVM.
