I have another interesting optimization issue, and if it's caused by UB, I've failed to see where and what.
The code is extremely simple; it's in my RAM allocator, where I clear the page(s) before I return them to the caller. Just a note: both i and pages are defined as uint64_t on the local stack, and they are 8-byte aligned.
Code: Select all
for (i=0; i<pages; i++) {
    vmm_page(tmpmap1, base+i*__PAGESIZE, PG_CORE_RWNOCACHE);
    memset((void*)tmpmap1, 0, __PAGESIZE);
}
Code: Select all
void vmm_page(virt_t virt, phys_t phys, uint16_t access) {
    virt_t *ptr;
    switch(virt) {
        ...
        case tmpmap1: ptr = tmpmap1ptr; break;
        ...
    }
    *ptr = phys | access;
    vmm_invalidate(virt); /* this is __asm__ __volatile__ ("invlpg (%0)" : : "r"(virt) ); on x86 */
}
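For completeness, vmm_invalidate is nothing fancy; here's a minimal sketch of what it amounts to on x86_64 (the virt_t typedef below is only an assumption so the sketch stands on its own, the real one lives in my headers):
Code: Select all
#include <stdint.h>

typedef uint64_t virt_t;   /* assumed here just so the sketch compiles on its own */

static inline void vmm_invalidate(virt_t virt)
{
    /* flush the TLB entry caching the translation of this virtual address */
    __asm__ __volatile__ ("invlpg (%0)" : : "r"(virt));
}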
Results with different compilers and -mtune targets:
- gcc aarch64 mtune=cortex-a53 OK
- clang aarch64 mtune=cortex-a53 OK
- gcc x86_64 mtune=pentium4 OK
- clang x86_64 mtune=pentium4 OK
- gcc x86_64 mtune=core2 BROKEN
- clang x86_64 mtune=core2 OK
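For anyone who wants to try it, the only interesting part of the command line is the tuning flag. An example invocation (the cross-compiler name, the source file name and the rest of the flags are just for illustration, not my exact build line):
Code: Select all
x86_64-elf-gcc -ffreestanding -O2 -mtune=core2 -c pmm.c -o pmm.o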
If I add these:
Code: Select all
kprintf("i=%d pages=%d", i, pages); /* goes after "for(i=0;...") */
kprintf("vmm_page(%x,%x)\n", virt, phys); /* goes before "switch(virt) {" */
Code: Select all
i=0 pages=1
vmm_page(FFFFFFFF00006000, 3C1000)
i=-429492720 pages=1
vmm_page(FFFFFFFF00006000, FFF000063C1000)
...page fault naturally...
Clearly i changed more than it should have. Now the most interesting part: if I declare i as
Code: Select all
volatile uint64_t i;
then everything works fine (the full loop with the volatile declaration is below, just to be explicit). So, where is the UB?
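This is the exact same loop as before, only the declaration of i differs:
Code: Select all
volatile uint64_t i;   /* the only change compared to the loop above */

for (i=0; i<pages; i++) {
    vmm_page(tmpmap1, base+i*__PAGESIZE, PG_CORE_RWNOCACHE);
    memset((void*)tmpmap1, 0, __PAGESIZE);
}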
Btw, I've used gcc 7.3.0 and clang 7.0.1. Right now I'm compiling the latest gcc 8.2.0 and I will repeat the test with that too.
EDIT: I've just tried the following, and it seemed to work (p is another uint64_t local variable):
Code: Select all
p = base;
for (i=0; i<pages; i++) {
    vmm_page(tmpmap1, p, PG_CORE_RWNOCACHE);
    memset((void*)tmpmap1, 0, __PAGESIZE);
    p += __PAGESIZE;
}
EDIT2: I take it back. If I remove the kprintf() from the loop, the error comes back, so I'm back to square one. With a volatile i, it works both with and without kprintf, and with either multiplication or addition.
Bests,
bzt