
Problems with memset() implementation on GCC 10.2.0

Posted: Tue Dec 15, 2020 7:27 pm
by kzinti
I just upgraded my cross compiler to GCC 10.2.0 and my OS crashes early on memset().

I am sure I am doing something wrong, but GCC 10.2.0 compiles it into something unexpected:

Code:

void* memset(void* ptr, int value, size_t num)
{
    for (unsigned char* p = ptr; num; --num)
    {
        *p++ = (unsigned char)value;
    }

    return ptr;
}

Code:

ffffffff80006360 <memset>:
ffffffff80006360:	48 85 d2             	test   %rdx,%rdx
ffffffff80006363:	74 13                	je     ffffffff80006378 <memset+0x18>
ffffffff80006365:	55                   	push   %rbp
ffffffff80006366:	40 0f b6 f6          	movzbl %sil,%esi
ffffffff8000636a:	48 89 e5             	mov    %rsp,%rbp
ffffffff8000636d:	e8 ee ff ff ff       	callq  ffffffff80006360 <memset>
ffffffff80006372:	5d                   	pop    %rbp
ffffffff80006373:	c3                   	retq   
ffffffff80006374:	0f 1f 40 00          	nopl   0x0(%rax)
ffffffff80006378:	48 89 f8             	mov    %rdi,%rax
ffffffff8000637b:	c3                   	retq   
ffffffff8000637c:	0f 1f 40 00          	nopl   0x0(%rax)
What happens is that I call memset() with a non-zero length (in %rdx), so the code above calls memset() recursively at address ffffffff8000636d until I run out of stack space.

Please help if you can. I refuse to believe the problem is with GCC, I must be missing something.

Re: Problems with memset() implementation on GCC 10.2.0

Posted: Tue Dec 15, 2020 7:30 pm
by nexos
It might be better just to use __builtin_memset IMO.

Re: Problems with memset() implementation on GCC 10.2.0

Posted: Tue Dec 15, 2020 7:33 pm
by kzinti
Agreed. I would still like to understand why it is broken though.

Re: Problems with memset() implementation on GCC 10.2.0

Posted: Tue Dec 15, 2020 7:35 pm
by kzinti
Well what do you know, I am not the first to run into this:

https://github.com/micropython/micropython/issues/6053

It looks like GCC detects that the loop is memset and optimizes the loop by calling... memset. Good times.

Re: Problems with memset() implementation on GCC 10.2.0

Posted: Tue Dec 15, 2020 7:53 pm
by kzinti
Adding "-fno-builtin" when compiling the kernel fixes the issue, but that is clearly not what I want.

Re: Problems with memset() implementation on GCC 10.2.0

Posted: Tue Dec 15, 2020 8:16 pm
by Octocontrabass
GCC assumes it can emit calls to memcpy(), memmove(), memset(), and memcmp() at any point - including inside your attempt at implementing one of those four functions. As the optimizer gets smarter, it will get better at creating endless recursion loops.
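
To illustrate the point, here is a minimal sketch of a case where no memcpy() appears in the source, yet GCC is free to emit a call to it: copying a large struct by assignment. The `struct big_state` type and the `copy_state()` helper are made up for this example, and whether a call is actually emitted depends on the target and the optimization level.

```c
#include <stddef.h>

/* Hypothetical type, large enough that the compiler is unlikely to copy it
   with a handful of register moves. */
struct big_state {
    unsigned char bytes[4096];
};

/* No memcpy() in the source, but GCC may lower this assignment to a
   memcpy() call, so the kernel must provide a working memcpy(). */
void copy_state(struct big_state *dst, const struct big_state *src)
{
    *dst = *src;
}
```

Looking at the disassembly of a function like this is a quick way to check what your own compiler does with it.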

Various GCC bug reports suggest the following function attribute:

Code:

__attribute__((optimize("no-tree-loop-distribute-patterns")))
You can also disable this optimization at a global level, although that seems like a poor choice.
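
As a rough sketch of how the attribute might be applied to the memset() from the first post: the `NO_LOOP_PATTERNS` macro name is made up here, and the guard is an assumption so that compilers without the GCC `optimize` attribute (Clang, for example) simply skip it.

```c
#include <stddef.h>

/* The optimize attribute is GCC-specific; NO_LOOP_PATTERNS is a hypothetical
   macro name that expands to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_PATTERNS __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_PATTERNS
#endif

/* With the attribute, GCC should keep the byte loop as a loop instead of
   replacing it with a call to memset(). */
NO_LOOP_PATTERNS
void *memset(void *ptr, int value, size_t num)
{
    unsigned char *p = ptr;

    while (num--) {
        *p++ = (unsigned char)value;
    }

    return ptr;
}
```

The global equivalent is passing -fno-tree-loop-distribute-patterns on the command line, which turns the transformation off for every function in the translation unit.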

You can also implement those four functions in assembly, to be sure GCC can never create an endless recursion loop.

You can also use Clang, which seems to automatically avoid infinite recursion and/or emitting C library calls in freestanding mode.
nexos wrote: It might be better just to use __builtin_memset IMO.
No, __builtin_memset() is only an optimization hint. The optimizer may still translate __builtin_memset() into a memset() call, and then you'll have a link error due to the undefined function.
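
A small sketch of what that looks like in practice, assuming a made-up `zero_fill()` helper: with a length that is only known at run time, the compiler will usually turn the builtin into a plain memset() call, so the symbol still has to exist when you link.

```c
#include <stddef.h>

/* Hypothetical helper. The builtin does not guarantee inline code; with a
   run-time length the compiler typically emits an ordinary call to
   memset(), which must then be defined somewhere. */
void zero_fill(void *buf, size_t len)
{
    __builtin_memset(buf, 0, len);
}
```

In a hosted program this links against the C library's memset(); in a freestanding kernel without its own memset() it would be an undefined reference.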

Re: Problems with memset() implementation on GCC 10.2.0

Posted: Wed Dec 16, 2020 12:56 am
by kzinti
Thanks, I went with the following at the top of my file:

Code:

#pragma GCC optimize "no-tree-loop-distribute-patterns"
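
For anyone who wants to limit the pragma to just the string functions rather than the whole file, a possible variation is to wrap them in push_options/pop_options. This is only a sketch: the memcpy() and memcmp() bodies are simple byte loops written for illustration, and the compiler guard is an assumption to keep non-GCC compilers from warning about the pragmas.

```c
#include <stddef.h>

/* GCC-only pragmas; guarded so other compilers do not warn about them. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("no-tree-loop-distribute-patterns")
#endif

void *memcpy(void *dest, const void *src, size_t num)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (num--) {
        *d++ = *s++;
    }

    return dest;
}

int memcmp(const void *lhs, const void *rhs, size_t num)
{
    const unsigned char *a = lhs;
    const unsigned char *b = rhs;

    for (; num; --num, ++a, ++b) {
        if (*a != *b) {
            return *a < *b ? -1 : 1;
        }
    }

    return 0;
}

/* Code after this point is compiled with the original options again. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```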

Re: Problems with memset() implementation on GCC 10.2.0

Posted: Fri Dec 18, 2020 5:31 pm
by moonchild
You can also implement the string functions in assembly; this also gives you a pretty easy performance boost, at least on x86. Here are a few:

Code:

memcpy:
mov rcx, rdx
mov rax, rdi
rep movs byte ptr [rdi], byte ptr [rsi]
ret

memmove:
cmp rdi, rsi
jbe memcpy
mov rax, rdi
mov rcx, rdx
lea rdi, [rdi + rdx - 1]
lea rsi, [rsi + rdx - 1]
std
rep movs byte ptr [rdi], byte ptr [rsi]
cld
ret

memset:
mov rcx, rdx
mov rdx, rdi
mov al, sil
rep stos byte ptr [rdi]
mov rax, rdx
ret
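
If a separate assembly file is not convenient, the same idea can probably be expressed with GCC-style inline assembly inside a C function. The sketch below is an assumption-laden variant of the memset above: `memset_rep_stosb` is a made-up name chosen so it does not clash with the real memset(), the x86 path relies on the direction flag being clear, and other targets fall back to a volatile byte loop.

```c
#include <stddef.h>

/* Hypothetical name, so it does not clash with the real memset(). */
void *memset_rep_stosb(void *dest, int value, size_t count)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = dest;

    /* rep stosb stores AL to [RDI] RCX times, advancing RDI each time.
       Assumes the direction flag is clear, as the ABI requires. */
    __asm__ volatile("rep stosb"
                     : "+D"(d), "+c"(count)
                     : "a"(value)
                     : "memory");
#else
    /* Fallback for other targets; volatile keeps the compiler from
       recognising the loop as a memset() pattern. */
    volatile unsigned char *p = dest;

    while (count--) {
        *p++ = (unsigned char)value;
    }
#endif
    return dest;
}
```

Because the store happens inside the asm statement, the optimizer has no loop to recognise, so it cannot rewrite the body into a call to memset().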