Page 1 of 4

Finding an UB

Posted: Thu Feb 14, 2019 1:50 pm
by bzt
Hi,

Can you C language and undefined behaviour experts please help me and explain what the problem is with this code?

Code: Select all

void lang_init()
{
    char fn[]="/sys/lang/core.\0\0\0\0\0";
    char *s,*e,*a;
    int i=0,l,k;

    memcpy(fn+15, lang, 2);             <---- I would expect the lang string to be copied into fn array after the dot.
    kprintf("%x %x %s\n", fn, fn+15, fn);
When compiled with "-ansi -freestanding -fno-stack-protector -fno-builtins -Wall -Wextra -Wpedantic", I got no warnings or errors of any kind. Compilers used:
- gcc (GCC) 8.2.1 20180831
- clang version 7.0.0 (tags/RELEASE_700/final)

Compiled for x86_64 with gcc -O0 outputs: "FFFFFFFFFFFFFF70 FFFFFFFFFFFFFF7F /sys/lang/core.en" - difference 15 bytes.
Compiled for x86_64 with gcc -O2 outputs: "FFFFFFFFFFFFFF30 FFFFFFFFFFFFFF3F /sys/lang/core.en" - difference 15 bytes.
Compiled for x86_64 with Clang -O0 outputs: "FFFFFFFFFFFFFF70 FFFFFFFFFFFFFF7F /sys/lang/core.en" - difference 15 bytes.
Compiled for x86_64 with Clang -O2 outputs: "FFFFFFFFFFFFFF50 FFFFFFFFFFFFFF5F /sys/lang/core.en" - difference 15 bytes.
Compiled for AArch64 with gcc -O0 outputs: "FFFFFFFFFFFFFF78 FFFFFFFFFFFFFF87 /sys/lang/core.en" - difference 15 bytes.
Compiled for AArch64 with gcc -O2 outputs: "FFFFFFFFFFFFFF80 FFFFFFFFFFFFFF8F /sys/lang/core.en" - difference 15 bytes.
Compiled for AArch64 with Clang -O0 outputs: "FFFFFFFFFFFFFD7E FFFFFFFFFFFFFD8D /sys/lang/core.en" - difference 15 bytes.
Compiled for AArch64 with Clang -O2 outputs: "FFFFFFFFFFFFFF08 FFFFFFFFFFFFFF0F /sys/laen/core." - Ooooooops! difference is only 7!

Now if I change the storage class and place the character array in the data segment instead of the local stack like this:

Code: Select all

char fn[]="/sys/lang/core.\0\0\0\0\0";

void lang_init()
{
    char *s,*e,*a;
    int i=0,l,k;

    memcpy(fn+15, lang, 2);
    kprintf("%x %x %s\n", fn, fn+15, fn);
Then suddenly all 8 combinations generate the same good output, and calculate fn+15 correctly. I've repeated all tests with "&fn[15]", then again with explicit array dimension "char fn[26]=" too, just to be through. All 64 tests ended with the same results.
I've looked up all my C language books, but neither mentions anything even remotely suggesting there could be an undefined behaviour with character arrays, other than indexing out of bounds (which is not the case here).

So my question is, what's causing the Clang AArch64 optimizer to miscompile when the storage class is the local stack? Where is the UB?

Objdumps (for AArch64, Clang and -O2 only, I can provide the x86_64, gcc and -O0 versions too if you need them, but these dumps are long enough already):

AArch64 Clang -O2, local stack (BROKEN)

Code: Select all

ffffffffffe06b68 <lang_init>:
ffffffffffe06b68:       d101c3ff        sub     sp, sp, #0x70
ffffffffffe06b6c:       f0000008        adrp    x8, ffffffffffe09000 <platform_dbgputc+0x4ec>
ffffffffffe06b70:       91096108        add     x8, x8, #0x258
ffffffffffe06b74:       3cc0a100        ldur    q0, [x8, #10]
ffffffffffe06b78:       3dc00101        ldr     q1, [x8]
ffffffffffe06b7c:       a90267fa        stp     x26, x25, [sp, #32]
ffffffffffe06b80:       a9035ff8        stp     x24, x23, [sp, #48]
ffffffffffe06b84:       a90457f6        stp     x22, x21, [sp, #64]
ffffffffffe06b88:       a9054ff4        stp     x20, x19, [sp, #80]
ffffffffffe06b8c:       a9067bfd        stp     x29, x30, [sp, #96]
ffffffffffe06b90:       3c80a3e0        stur    q0, [sp, #10]
ffffffffffe06b94:       3d8003e1        str     q1, [sp]
ffffffffffe06b98:       b0000074        adrp    x20, ffffffffffe13000 <_binary_font_end+0xcd0>
ffffffffffe06b9c:       f9428294        ldr     x20, [x20, #1280]
ffffffffffe06ba0:       910003e8        mov     x8, sp
ffffffffffe06ba4:       b2400d13        orr     x19, x8, #0xf
ffffffffffe06ba8:       321f03e2        orr     w2, wzr, #0x2
ffffffffffe06bac:       aa1303e0        mov     x0, x19
ffffffffffe06bb0:       aa1403e1        mov     x1, x20
ffffffffffe06bb4:       910183fd        add     x29, sp, #0x60
ffffffffffe06bb8:       94000166        bl      ffffffffffe07150 <memcpy>
ffffffffffe06bbc:       d0000000        adrp    x0, ffffffffffe08000 <mbox_call+0xe8>
ffffffffffe06bc0:       913c9000        add     x0, x0, #0xf24
ffffffffffe06bc4:       910003e1        mov     x1, sp
ffffffffffe06bc8:       910003e3        mov     x3, sp
ffffffffffe06bcc:       aa1303e2        mov     x2, x19
ffffffffffe06bd0:       97fff503        bl      ffffffffffe03fdc <kprintf>
AArch64 Clang -O2, data segment (WORKS)

Code: Select all

ffffffffffe06b68 <lang_init>:
ffffffffffe06b68:       a9bb67fa        stp     x26, x25, [sp, #-80]!
ffffffffffe06b6c:       a9015ff8        stp     x24, x23, [sp, #16]
ffffffffffe06b70:       a90257f6        stp     x22, x21, [sp, #32]
ffffffffffe06b74:       a9034ff4        stp     x20, x19, [sp, #48]
ffffffffffe06b78:       a9047bfd        stp     x29, x30, [sp, #64]
ffffffffffe06b7c:       b0000074        adrp    x20, ffffffffffe13000 <_binary_font_end+0xcb0>
ffffffffffe06b80:       f9431694        ldr     x20, [x20, #1576]
ffffffffffe06b84:       b0000075        adrp    x21, ffffffffffe13000 <_binary_font_end+0xcb0>
ffffffffffe06b88:       f94292b5        ldr     x21, [x21, #1312]
ffffffffffe06b8c:       321f03e2        orr     w2, wzr, #0x2
ffffffffffe06b90:       91003e93        add     x19, x20, #0xf
ffffffffffe06b94:       aa1303e0        mov     x0, x19
ffffffffffe06b98:       aa1503e1        mov     x1, x21
ffffffffffe06b9c:       910103fd        add     x29, sp, #0x40
ffffffffffe06ba0:       94000166        bl      ffffffffffe07138 <memcpy>
ffffffffffe06ba4:       d0000000        adrp    x0, ffffffffffe08000 <mbox_call+0x100>
ffffffffffe06ba8:       913c3000        add     x0, x0, #0xf0c
ffffffffffe06bac:       aa1403e1        mov     x1, x20
ffffffffffe06bb0:       aa1303e2        mov     x2, x19
ffffffffffe06bb4:       aa1403e3        mov     x3, x20
ffffffffffe06bb8:       97fff509        bl      ffffffffffe03fdc <kprintf>
Thanks,
bzt

Re: Finding an UB

Posted: Fri Feb 15, 2019 6:07 am
by Octocontrabass
According to the disassembly, the "broken" version is passing SP as the second and fourth arguments to your kprintf function. Thanks to that, we can see that SP has the value 0xFFFFFFFFFFFFFF08 at the time kprintf is called.

The AArch64 calling conventions require the stack to be 16-byte aligned (SP mod 16 == 0). It looks like your stack is not aligned correctly.

Once you fix the stack alignment, the problem should go away.

Re: Finding an UB

Posted: Fri Feb 15, 2019 6:19 am
by Solar
bzt wrote:Hi,

Can you C language and undefined behaviour experts please help me and explain what the problem is with this code?

Code: Select all

void lang_init()
{
    char fn[]="/sys/lang/core.\0\0\0\0\0";
    char *s,*e,*a;
    int i=0,l,k;

    memcpy(fn+15, lang, 2);             <---- I would expect the lang string to be copied into fn array after the dot.
    kprintf("%x %x %s\n", fn, fn+15, fn);
  • Your memcpy() is writing into the memory of a string literal. String literals, even though of non-const type (in C), are not modifiable, and writing to them is undefined behavior. (In C++, their type is actually const char [].) <- Invalid; I missed that it is indeed char fn[], not char *. Sorry.
  • Your kprintf() -- assuming it's functionally identical to a "real" printf() -- is using the wrong conversion specifiers for printing addresses. %x is for unsigned integers, not pointers. Pointers are to be printed using %p (and be cast to void * in the argument list). (Rationale: sizeof( unsigned int ) may be not equal to sizeof( char * ). And sizeof( void * ) may be not equal to sizeof( char * ). Also, binary representation may differ. No, really.)
The kprintf() problem, assuming your kprintf() makes use of appropriate warning machinery, should have generated a warning if you're using -Wformat (which is included in -Wall).

Re: Finding an UB

Posted: Fri Feb 15, 2019 6:32 am
by Octocontrabass
Solar wrote:Your memcpy() is writing into the memory of a string literal. String literals are not modifiable.
If it were a pointer to a string literal, it would not be modifiable. It's an array being initialized by a string literal, which means the array is writable.

Re: Finding an UB

Posted: Fri Feb 15, 2019 6:36 am
by Solar
Ah. :oops: OK, I missed that. It's been some time since I dabbled with C string literals, sorry. (Edited my post accordingly.)

Re: Finding an UB

Posted: Fri Feb 15, 2019 9:10 am
by Octocontrabass
Solar wrote:The kprintf() problem, assuming your kprintf() makes use of appropriate warning machinery, should have generated a warning if you're using -Wformat (which is included in -Wall).
You can add this functionality to your kprintf() using a function attribute. If I'm not mistaken, you would need to put "__attribute__ ((format (printf, 1, 2)))" in the kprintf() declaration.

Re: Finding an UB

Posted: Fri Feb 15, 2019 11:34 am
by bzt
Octocontrabass wrote:The AArch64 calling conventions[/url] require the stack to be 16-byte aligned (SP mod 16 == 0). It looks like your stack is not aligned correctly.
Hmmm, (SP mod 16 == 8 ) on x86, (SP mod 16 == 0 ) on ARM. That SysV ABI is not so obvious after all.
Once you fix the stack alignment, the problem should go away.
It didn't. :-( Moving fn[] into the data segment is the only thing that helps.

Cheers,
bzt

Re: Finding an UB

Posted: Fri Feb 15, 2019 11:42 am
by Octocontrabass
bzt wrote:
Once you fix the stack alignment, the problem should go away.
It didn't. :-(
What output do you get now that the stack is aligned?

Re: Finding an UB

Posted: Fri Feb 15, 2019 2:07 pm
by zaval
Of course, I only looked at it a few minuts, but this line:

Code: Select all

ffffffffffe06ba4:       b2400d13        orr     x19, x8, #0xf
is hilarious. Looks like this is a purely Clang bug (added later: no, it's not, the stack 16 byte alignment assumed, then it should work). It loads sp into x8 and then "thinks" that calculation of fn+15 could be done by or'ing the x8 value (which is fn as well) with 0x0f. Instead of adding it!

btw, this also shows what an "optimization" it provides. the first argument (fn+15) should go into x0, but CLang does calculations (wrong ones) in the x19 and then copies (why?) it into x0. :/

PS. Of course, as Octacontrabass said, if the stack is aligned at 16 (and it should, if it's compliant with the given ABI), then that "arithmetic" should work.

Re: Finding an UB

Posted: Fri Feb 15, 2019 2:38 pm
by Octocontrabass
zaval wrote:It loads sp into x8 and then "thinks" that calculation of fn+15 could be done by or'ing the x8 value (which is fn as well) with 0x0f. Instead of adding it!
Most likely, it saw two possibilities to get the desired result and picked one at random. (Or perhaps it's targeting some strange CPU where ADD is more expensive than ORR.)
zaval wrote:btw, this also shows what an "optimization" it provides. the first argument (fn+15) should go into x0, but CLang does calculations in the x19 and then copies (why?) it into x0. :/
That's so it can reuse the value after the memcpy call without calculating it again. The ABI guarantees x19 won't be modified by called functions.

Re: Finding an UB

Posted: Fri Feb 15, 2019 2:45 pm
by zaval
Octocontrabass wrote: That's so it can reuse the value after the memcpy call without calculating it again. The ABI guarantees x19 won't be modified by called functions.
good point. it reuses the calculated value for the kprintf call.

Re: Finding an UB

Posted: Fri Feb 15, 2019 2:59 pm
by bzt
Thanks for your answers!

Actually I'm starting to think there must be something wrong with my qemu. I got another strange issue, a most peculiar one. Again, only on AArch64 and only with -O2, but this time with gcc too. Now I'm testing my vmm code, in which I'm building up paging tables. I can confirm that the memory map is parsed correctly, and I have a valid list of free memory pages.

Now suddenly, after calling it about two dozen times, I got an "out of RAM" panic from the physical memory allocator. Shouldn't happen during boot, I still have hundred of thousands of free pages. I've dumped the memory, and the reason for the panic is, my free memory list is gone! All filled up with zeros. First I suspected that I may have messed up the paging tables, but no, I'm still at identity paging, and I haven't touched ttbr0_el1 yet (and only read ttbr1_el1). At qemu monitor I've dumped again with "x", full of zeros. Unfortunately "info tbl" and "info mem" not available for aarch64, but I've dumped the physical memory too with "xp", the same, full of zeros. So it's definitely not paging-related. Wtf? Where have my list gone?

Then I started to suspect maybe there's something wrong with the memset (I clear every page before I return it from the physical memory allocator). So I traced every single memset call, but no, it's not an out of bound memset clearing the list. Then I recompiled everything with "-g", and fired up gdb. I've set a watchpoint at the free memory list's address, and run the vm.

And here's the most interesting part: gdb did not trigger the watchpoint when the free memory list disappeared! I have the list when watchpoint stops on legal access, then at next time just zeros. This suggests that it's not a faulty code that accidentally clears that memory area, but something with qemu itself. How could a memory area be gone without any guest code accessing it? Or could it be that gdb-server is buggy and misses a watchpoint? I don't think so.

So updating qemu is the next step, and we'll see.

Cheers,
bzt

Re: Finding an UB

Posted: Fri Feb 15, 2019 3:04 pm
by Combuster
I wonder now if your "aligned stack" is actually the same problem as your other "aligned stack", which wasn't actually aligned and prompted you to blame the compiler?

Re: Finding an UB

Posted: Fri Feb 15, 2019 3:53 pm
by bzt
Combuster wrote:I wonder now if your "aligned stack" is actually the same problem as your other "aligned stack", which wasn't actually aligned and prompted you to blame the compiler?
I've never blamed the compiler. Where did you get that?

And no, all stacks are properly aligned, as I've already said. SP mod 16 == 8 on x86, SP mod 16 == 0 on AArch64. It also doesn't explain why gdb watchpoint isn't triggered.

Just for the records, now I do blame GNU glibc. I try to recompile qemu, and I got:

Code: Select all

util/memfd.c:38:12: error: static declaration of 'memfd_create' follows non-static declaration
 static int memfd_create(const char *name, unsigned int flags)

/usr/include/bits/mman-shared.h:46:5: note: previous declaration of 'memfd_create' was here
int memfd_create (const char *__name, unsigned int __flags) __THROW;
Before the last system update it was working just fine. So good they take care of backward-compatibility when they change the C headers! It's so good we have all of those #ifdefs, features.h and POSIX standards! Now isn't that just wonderful? :-D :-D :-D I'm going to download the latest source, let's hope qemu developers are keeping up with them.

Cheers,
bzt

Re: Finding an UB

Posted: Fri Feb 15, 2019 6:29 pm
by Solar
bzt wrote:
Combuster wrote:I wonder now if your "aligned stack" is actually the same problem as your other "aligned stack", which wasn't actually aligned and prompted you to blame the compiler?
I've never blamed the compiler. Where did you get that?