Problem with on-demand page mapping on x86_64 (Solved: red zone issue)
Posted: Thu Mar 20, 2025 6:21 pm
So I've been working on a port of my OS from x86 to x86_64 and tasking is basically functional again, but now I'm a bit stuck with my on-demand page mapping.
When loading ELF binaries/shared objects I don't load the actual content but only keep track of the area and where to load it from. Then, when a page fault occurs, I map that page, load the content and let the code continue. This worked fine on x86 previously.
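The bookkeeping described above can be sketched in user space as follows. This is only an illustration under stated assumptions: every name in it (OnDemandRegion, FaultResolution, resolveFault, kPageSize) is hypothetical and not taken from the kernel in question, regions are assumed to start on a page boundary, and the actual mapping and file read are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// All names here are hypothetical; this models the bookkeeping only.
constexpr std::uint64_t kPageSize = 0x1000;

// One lazily loaded area: where it lives and where its bytes come from.
struct OnDemandRegion {
    std::uint64_t start;      // page-aligned virtual start address
    std::uint64_t length;     // size in bytes
    std::uint64_t fileOffset; // offset of the region's first byte in the file
};

// What the fault handler needs to know to satisfy one fault.
struct FaultResolution {
    std::uint64_t page;       // virtual address of the page to map
    std::uint64_t fileOffset; // file offset to load that page from
};

// Round an address down to the start of its 4 KiB page.
constexpr std::uint64_t pageAlignDown(std::uint64_t address) {
    return address & ~(kPageSize - 1);
}

// Returns true and fills `out` if the fault lies in a tracked region.
bool resolveFault(const std::vector<OnDemandRegion>& regions,
                  std::uint64_t faultAddress, FaultResolution& out) {
    for (const OnDemandRegion& region : regions) {
        assert(region.start % kPageSize == 0); // assumption of this sketch
        if (faultAddress >= region.start &&
            faultAddress < region.start + region.length) {
            out.page = pageAlignDown(faultAddress);
            out.fileOffset = region.fileOffset + (out.page - region.start);
            return true;
        }
    }
    return false; // not an on-demand fault; treat it as a real error
}
```

With a made-up region starting at 0xa60000, a fault at 0x0000000000a63abc resolves to page 0x0000000000a63000, which is the same rounding visible in the "ondemand mapped" log line.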
Now I'm getting some very odd behaviour. This innocent-looking example code (first listing below):
produces the following interesting log output (second listing below):
Looking at my stringLength function, it disassembles to this (third listing below):
The page fault occurs exactly at the movzbl instruction at ffffffff80221997.
This is quite strange to me. Could it be that the movzbl instruction is partially completed, possibly with the source pointer already incremented or something like that? Or is it rather due to register corruption at some point?
Yes, I'm doing invlpg and I also tried a memory barrier, but all to no avail...
Code (example code):
auto dependency = (g_elf_dependency*) heapAllocate(sizeof(g_elf_dependency));
const char* libName = (const char*) (object->dynamicStringTable + it->d_un.d_val);
logInfo("%! now trying to read string length of %x for first time...", "debug", libName);
logInfo("%# stringLength first time: %i", stringLength(libName));
logInfo("%# stringLength second time: %i", stringLength(libName));
logHexDump8((void*) libName, 0, 2);
Code (log output):
debug: now trying to read string length of 0x0000000000a63abc for first time... // task starts and wants the length of "libc.so"
pagefault: accessed 0x0000000000a63abc from 0xffffffff80221997 // task faults because page is not there
pagefault: ondemand mapped: 0x0000000000a63000 // page is mapped in exception handler
0x0000000000a63abc: 0x6c 0x69 0x62 0x63 0x2e 0x73 0x6f 0x0 libc.so // exception handler dumps memory after mapping the page
0x0000000000a63ac4: 0x6c 0x69 0x62 0x67 0x63 0x63 0x5f 0x73 libgcc_s
0x0000000000a63acc: 0x2e 0x73 0x6f 0x2e 0x31 0x0 0x47 0x43 .so.1GC
// now IRETQing back into the thread that caused the page fault
stringLength first time: 3 // First string length wrong
stringLength second time: 7 // Second time, it's correct
0x0000000000a63abc: 0x6c 0x69 0x62 0x63 0x2e 0x73 0x6f 0x0 libc.so // dumping the memory again in the faulted task
0x0000000000a63ac4: 0x6c 0x69 0x62 0x67 0x63 0x63 0x5f 0x73 libgcc_s
0x0000000000a63acc: 0x2e 0x73 0x6f 0x2e 0x31 0x0 0x47 0x43 .so.1GC
Code (disassembly of stringLength):
ffffffff80221976 <_Z12stringLengthPKc>:
ffffffff80221976: 55 push %rbp
ffffffff80221977: 48 89 e5 mov %rsp,%rbp
ffffffff8022197a: 48 89 7d e8 mov %rdi,-0x18(%rbp)
ffffffff8022197e: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
ffffffff80221985: eb 04 jmp ffffffff8022198b <_Z12stringLengthPKc+0x15>
ffffffff80221987: 83 45 fc 01 addl $0x1,-0x4(%rbp)
ffffffff8022198b: 48 8b 45 e8 mov -0x18(%rbp),%rax
ffffffff8022198f: 48 8d 50 01 lea 0x1(%rax),%rdx
ffffffff80221993: 48 89 55 e8 mov %rdx,-0x18(%rbp)
ffffffff80221997: 0f b6 00 movzbl (%rax),%eax
ffffffff8022199a: 84 c0 test %al,%al
ffffffff8022199c: 0f 95 c0 setne %al
ffffffff8022199f: 84 c0 test %al,%al
ffffffff802219a1: 75 e4 jne ffffffff80221987 <_Z12stringLengthPKc+0x11>
ffffffff802219a3: 8b 45 fc mov -0x4(%rbp),%eax
ffffffff802219a6: 5d pop %rbp
ffffffff802219a7: c3 ret
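The thread title attributes the bug to the red zone. The System V x86_64 ABI lets a leaf function use the 128 bytes below %rsp without moving the stack pointer, and the listing above does exactly that: it addresses -0x4(%rbp) and -0x18(%rbp) but never subtracts from %rsp. The faulting address ffffffff80221997 is kernel code, so, assuming the handler has no separate IST stack, the CPU pushes the page fault frame onto the same stack and on top of those locals.

The simulation below is a user-space sketch of that effect, not the kernel's actual code. SimStack, deliverPageFault and counterAfterFault are hypothetical names, the selector and flag values are placeholders, stringLength is a reconstruction from the disassembly rather than the author's source, and it does not try to reproduce the exact result of 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reconstruction of the disassembled loop (unoptimised build assumed).
int stringLength(const char* str) {
    int length = 0;      // lives at -0x4(%rbp) in the listing
    while (*str++) {     // pointer lives at -0x18(%rbp); movzbl reads *str
        ++length;
    }
    return length;
}

// A toy stack: `rsp` is an offset into `memory`, growing downward.
struct SimStack {
    std::vector<std::uint8_t> memory;
    std::uint64_t rsp;

    explicit SimStack(std::size_t size) : memory(size, 0), rsp(size) {}

    void store32(std::uint64_t offset, std::uint32_t value) {
        std::memcpy(&memory[offset], &value, sizeof value);
    }
    std::uint32_t load32(std::uint64_t offset) const {
        std::uint32_t value;
        std::memcpy(&value, &memory[offset], sizeof value);
        return value;
    }
    void push64(std::uint64_t value) {
        rsp -= 8;
        std::memcpy(&memory[rsp], &value, sizeof value);
    }
};

// Models exception delivery in 64-bit mode on the current stack:
// align RSP to 16 bytes, then push SS, RSP, RFLAGS, CS, RIP, error code.
void deliverPageFault(SimStack& stack, std::uint64_t rip) {
    const std::uint64_t oldRsp = stack.rsp;
    stack.rsp &= ~std::uint64_t{0xF};
    stack.push64(0x10);   // SS (placeholder selector)
    stack.push64(oldRsp); // RSP at the time of the fault
    stack.push64(0x202);  // RFLAGS (placeholder)
    stack.push64(0x08);   // CS (placeholder selector)
    stack.push64(rip);    // RIP of the faulting instruction
    stack.push64(0x0);    // page fault error code (placeholder)
}

// Stores 3 in the local counter, takes a fault, and returns what is left.
std::uint32_t counterAfterFault(bool useRedZone) {
    SimStack stack(256);
    stack.rsp -= 8;                       // return address from the call
    stack.rsp -= 8;                       // push %rbp
    const std::uint64_t rbp = stack.rsp;  // mov %rsp,%rbp
    if (!useRedZone) {
        stack.rsp -= 0x20;                // what "sub $0x20,%rsp" would do
    }
    stack.store32(rbp - 4, 3);            // counter at -0x4(%rbp)
    deliverPageFault(stack, 0xffffffff80221997ULL);
    return stack.load32(rbp - 4);
}
```

In this model the counter survives only when the locals sit above %rsp. For kernel code the usual way to get that layout is to compile with GCC's -mno-red-zone; the post itself does not say which fix was applied.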