Hunting UBs game, level 2
Posted: Thu Feb 21, 2019 8:45 am
Dear OSDevers,
I have another interesting optimization issue, and if that's caused by an UB, I failed to see where and what.
The code is extremely simple, it's in my RAM allocator, when I clear the page(s) before I return them to the caller. Just a note, both i and pages are defined as uint64_t in the local stack, and they are 8 bytes aligned.Both vmm_map() and memset() are well tested, and they work fine. Just to be complete, here's the source for the former (both virt_t and phys_t are aliases to uint64_t, I use those to make the source more readable), the latter is written in assembly (hence not influenced by optimizations):
Here are the test results (-ansi -O2 -Wall -Wextra -Wpedantic -fno-delete-null-pointer-checks -fno-stack-protector -mno-red-zone):
- gcc aarch64 mtune=cortex-a53 OK
- clang aarch64 mtune=cortex-a53 OK
- gcc x86_64 mtune=pentium4 OK
- clang x86_64 mtune=pentium4 OK
- gcc x86_64 mtune=core2 BROKEN
- clang x86_64 mtune=core2 OK
If I add these:then what I see on the console with the gcc core2 optimized code, is this:
With debugging, I can confirm that the virtual address of tmpmap1 is correct, and the page is mapped properly there, so the memset does not mess up things on the first iteration for sure.
Clearly i changed more than it should. Now the most interesting part, if I declare i asthen everything works as expected.
So, where is the UB?
Btw, I've used gcc 7.3.0 and clang 7.0.1. Right now I'm compiling the latest gcc 8.2.0 and I will repeat the test with that too.
EDIT: I've just tried (p is another uint64_t local variable)And it runs perfectly without the volatile keyword on i even when optimized for core2.
EDIT2: I take it back. If I remove the kprintf() from the loop, the error cames back, so back to square one. With "volatile i", it works either with or without kprintf and with multiplication or with addition.
Bests,
bzt
I have another interesting optimization issue, and if that's caused by an UB, I failed to see where and what.
The code is extremely simple, it's in my RAM allocator, when I clear the page(s) before I return them to the caller. Just a note, both i and pages are defined as uint64_t in the local stack, and they are 8 bytes aligned.
Code: Select all
for (i=0; i<pages; i++) {
vmm_page(tmpmap1, base+i*__PAGESIZE, PG_CORE_RWNOCACHE);
memset((void*)tmpmap1, 0, __PAGESIZE);
}
Code: Select all
void vmm_page(virt_t virt, phys_t phys, uint16_t access) {
virt_t *ptr;
switch(virt) {
...
case tmpmap1: ptr = tmpmap1ptr; break;
...
}
*ptr = phys | access;
vmm_invalidate(virt); /* this is __asm__ __volatile__ ("invplg (%0)" : : "r"(virt) ); on x86 */
}
- gcc aarch64 mtune=cortex-a53 OK
- clang aarch64 mtune=cortex-a53 OK
- gcc x86_64 mtune=pentium4 OK
- clang x86_64 mtune=pentium4 OK
- gcc x86_64 mtune=core2 BROKEN
- clang x86_64 mtune=core2 OK
If I add these:
Code: Select all
kprintf("i=%d pages=%d", i, pages); /* goes after "for(i=0;...") */
kprintf("vmm_page(%x,%x)\n", virt, phys); /* goes before "switch(virt) {" */
Code: Select all
i=0 pages=1
vmm_page(FFFFFFFF00006000, 3C1000)
i=-429492720 pages=1
vmm_page(FFFFFFFF00006000, FFF000063C1000)
...page fault naturally...
Clearly i changed more than it should. Now the most interesting part, if I declare i as
Code: Select all
volatile uint64_t i;
So, where is the UB?
Btw, I've used gcc 7.3.0 and clang 7.0.1. Right now I'm compiling the latest gcc 8.2.0 and I will repeat the test with that too.
EDIT: I've just tried (p is another uint64_t local variable)
Code: Select all
p = base;
for (i=0; i<pages; i++) {
vmm_page(tmpmap1, p, PG_CORE_RWNOCACHE);
memset((void*)tmpmap1, 0, __PAGESIZE);
p += __PAGESIZE;
}
EDIT2: I take it back. If I remove the kprintf() from the loop, the error cames back, so back to square one. With "volatile i", it works either with or without kprintf and with multiplication or with addition.
Bests,
bzt