__cxa_atexit and other GCC stuff

Posted: Wed Apr 29, 2020 7:18 am
by 8infy
Hey everyone, I've recently started learning how GCC implements C++, since it seemed like something good (necessary?) to know to make C++ work properly on my OS.

I've read a few very good articles about how it works and I sort of understand it now; however, I'm still a bit confused about which parts of the runtime are provided by GCC and which are provided by the OS or someone else.

AFAIK GCC uses the Itanium C++ ABI, so it expects __cxa_atexit and other functions to be available and calls them. But where do _init, _fini and the other functions come from? Who implements frame_dummy, and who implements __libc_start_main? Is it something GCC has hardcoded as a function name, expecting a definition somewhere? I'd very much appreciate it if someone could explain the dependency structure of GCC to me.

I've used a tutorial from this website to implement crti and crtn for global constructors via _init and _fini, but global destructors were not getting called until I passed -fno-use-cxa-atexit to the compiler.
Why is it calling 0, and what is it even doing? (The only reason it says __cxa_atexit there is that I didn't put that function in a section, so it ended up at the top of the executable. The starting address is 0x20000, so it makes zero sense for it to call NULL.)
Also, I know that it doesn't actually reach that call 0 instruction, because in both cases the kernel successfully returns from the function.
Why did passing -fno-use-cxa-atexit make my global destructors work correctly? (My implementation of __cxa_atexit is just a `ret`, so I know that wouldn't work.)
(FYI: I know that a kernel doesn't need destructors; that's not the point of this question. I'm simply trying to get to the bottom of this.)

On the left of this screenshot is the -fno-use-cxa-atexit variant of my kernel and on the right is the default.
In the cxa-at-exit version, the global class constructor does `call 20000 <__cxa_atexit>`, which makes sense, however in the no-cxa version it doesn't, so how does do_global_dtors know to call the destructor?

I know this is a lot of questions but this feels like some secret ancient knowledge that like 10 people in the world know :(

Thanks.

Re: __cxa_atexit and other GCC stuff

Posted: Wed Apr 29, 2020 1:42 pm
by nullplan
Whenever you wonder where functions are coming from, you can instruct your linker to generate a map file and then simply look up which object files satisfied those symbols (with GNU ld that is the -Map option, e.g. -Wl,-Map=kernel.map when linking through gcc).

Calls to literally 0 look to me like weak references that were still unresolved at link time. In that case you would have to pray that the code never gets there, or else it will crash. But you don't have to pray. See how, at address 0x240f9, it sets EAX to zero and then tests whether it is zero? If it is, it jumps over the invalid call instruction. I bet you anything that 0 would be replaced by an actual address if the symbol had been defined. And frame_dummy has the same code right at the start.
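A minimal C sketch of where such guarded calls come from, assuming GCC or Clang targeting ELF. `optional_hook` is a hypothetical symbol that is declared weak and never defined. An unresolved weak reference is not a link error; the linker just gives it address 0, and the compiler has already emitted a test against zero in front of the call, which is the same shape as the code at 0x240f9. As far as I know, libgcc's crtstuff.c (where frame_dummy comes from) uses weak references for its optional hooks in the same way.

```c
/* Hypothetical hook: declared weak and intentionally never defined.
 * The linker resolves an undefined weak symbol to address 0 instead
 * of reporting an error. */
extern void optional_hook(void) __attribute__((weak));

/* Calls the hook only if some object file actually defined it.
 * Returns 1 if it was called, 0 if it was absent. */
int call_hook_if_present(void)
{
    /* The compiler cannot assume a weak symbol exists, so it keeps this
     * null test. With the symbol unresolved, the disassembly ends up as
     * "load 0, test, jump over call 0", and the call is never reached. */
    if (optional_hook) {
        optional_hook();
        return 1;
    }
    return 0;
}
```

If any linked object file defines `void optional_hook(void)`, the same call site jumps to it instead, with no change to this code.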
8infy wrote:In the cxa-at-exit version, the global class constructor does `call 20000 <__cxa_atexit>`, which makes sense, however in the no-cxa version it doesn't, so how does do_global_dtors know to call the destructor?
Ironically, you have already marked one of them in the files you listed. Look at the code following address 0x240c8. What is it doing there? It is performing some calculations with constants. Once you have looked at enough disassembly you start to notice patterns, and this very clearly looks like pointer subtraction on two symbols that were known at link time but not at assembly time. Following that there is a loop: you have the cmp followed by jae, and if you follow where the jae goes to, directly above that you have a jb back to address e0. This is loop rotation: the compiler performs the loop test once above the loop, to jump over it, and once at the end, to jump back up again. In that loop you have the instruction "call *(%esi, %eax, 4)". That very clearly means that ESI points to an array of function pointers and EAX is an index into that array, and here all of them are called.
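Translated back into C, the code being read here looks roughly like the sketch below (all names are hypothetical). `run_all` is the natural way to write it; `run_all_rotated` spells out by hand the rotated shape described above, with one bounds check before the loop and one at the bottom. In the kernel, `start` and `end` would be linker-defined symbols, which is why the `end - start` subtraction shows up as arithmetic on constants that only the linker fills in.

```c
#include <stddef.h>

typedef void (*func_ptr)(void);

/* Natural version: count = end - start, then call each entry in order. */
void run_all(func_ptr *start, func_ptr *end)
{
    size_t count = (size_t)(end - start);
    for (size_t i = 0; i < count; i++)
        start[i]();                    /* i386: call *(%esi,%eax,4) */
}

/* The same loop after rotation, written out by hand. */
void run_all_rotated(func_ptr *start, func_ptr *end)
{
    size_t count = (size_t)(end - start);
    size_t i = 0;
    if (i >= count)                    /* the cmp + jae that skips the loop */
        return;
    do {
        start[i]();
        i++;
    } while (i < count);               /* the cmp + jb back to the top */
}

/* Tiny hosted demo: a plain array standing in for the linker-built table. */
static int demo_calls;
static void demo_fn(void) { demo_calls++; }
static func_ptr demo_table[] = { demo_fn, demo_fn, demo_fn };
```

An optimizing compiler will usually turn `run_all` into something shaped like `run_all_rotated` on its own; the hand-written version only makes the two checks visible.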

To stop teasing any further: the usual way these functions are found is the fini_array mechanism. The compiler generates whatever code is needed to call the destructors, writes pointers to those stubs into a section called ".fini_array", and the linker assembles these sections into one large array bounded by the symbols "__fini_array_start" and "__fini_array_end". Your __do_global_dtors then only has to iterate over that array and call each entry in turn. Interestingly, it looks like they are called with a forward loop, not a backward one. That is interesting because musl does it backwards, and Rich Felker (musl's author) once said something on the mailing list to the effect of that being required by the ABI.
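A minimal sketch of such a destructor walker for a freestanding kernel, assuming the kernel's linker script defines `__fini_array_start` and `__fini_array_end` around the collected `.fini_array` input sections (GNU ld's default ELF scripts do this; a custom kernel script has to do it itself). It walks backwards, the order musl uses. The function takes the bounds as parameters so it can be tried in an ordinary hosted program without touching the real `.fini_array`, which the C library also runs at exit; `do_global_dtors_range` and the `demo_*` names are hypothetical.

```c
typedef void (*fini_fn)(void);

/* Runs every entry in [start, end) from last to first, i.e. the reverse
 * of the order in which the linker laid the entries out (musl's order). */
void do_global_dtors_range(fini_fn *start, fini_fn *end)
{
    while (end != start) {
        --end;
        (*end)();
    }
}

/* Kernel usage, assuming a linker script fragment such as:
 *
 *   .fini_array : {
 *       __fini_array_start = .;
 *       KEEP(*(SORT_BY_INIT_PRIORITY(.fini_array.*)))
 *       KEEP(*(.fini_array))
 *       __fini_array_end = .;
 *   }
 *
 * extern fini_fn __fini_array_start[], __fini_array_end[];
 * do_global_dtors_range(__fini_array_start, __fini_array_end);
 */

/* Tiny hosted demo: a plain array standing in for .fini_array. */
static char demo_order[4];
static int demo_count;
static void demo_a(void) { demo_order[demo_count++] = 'a'; }
static void demo_b(void) { demo_order[demo_count++] = 'b'; }
static fini_fn demo_fini_array[] = { demo_a, demo_b };
```

Switching to a forward walk, as in the disassembly above, only changes the loop; the array and the linker symbols stay the same.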
8infy wrote:Why did passing -fno-use-cxa-atexit make my global destructors work correctly? (My implementation of __cxa_atexit is just a `ret`, so I know that wouldn't work.)
That is now also explained by the code: with cxa_atexit disabled, GCC had to emit the destructor call as an entry in .fini_array instead of registering it from the constructor. That's why __fini_array_end is 4 bytes greater (one 32-bit function pointer) in the -fno-use-cxa-atexit version than in the default one. And since the loop below that calls every entry in the array, your destructor was called.
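Since the __cxa_atexit in the question is just a `ret`, every destructor GCC registered through it in the default mode (the constructor stub calls `__cxa_atexit(dtor, &object, &__dso_handle)`) was silently dropped. Below is a minimal sketch of a working pair, assuming a fixed-size table is acceptable in the kernel. The semantics follow the Itanium C++ ABI: `__cxa_atexit(func, arg, dso)` returns 0 on success, and `__cxa_finalize(dso)` runs the matching entries in reverse order of registration, or all of them when `dso` is NULL. The functions carry hypothetical `kernel_` names so the sketch can be tried in a hosted program without clashing with libc; in the kernel they would be exported as `__cxa_atexit` and `__cxa_finalize`, something (e.g. the shutdown path) would have to call the latter, and `__dso_handle` must exist (it normally comes from crtbegin.o).

```c
#include <stddef.h>

#define MAX_ATEXIT_ENTRIES 64   /* arbitrary fixed capacity (assumption) */

struct atexit_entry {
    void (*func)(void *);  /* destructor (or destructor thunk) */
    void *arg;             /* usually the object's address */
    void *dso;             /* &__dso_handle of the registering module */
};

static struct atexit_entry atexit_table[MAX_ATEXIT_ENTRIES];
static size_t atexit_count;

/* Would be exported as __cxa_atexit. Returns 0 on success,
 * nonzero if the table is full. */
int kernel_cxa_atexit(void (*func)(void *), void *arg, void *dso)
{
    if (atexit_count >= MAX_ATEXIT_ENTRIES)
        return -1;
    atexit_table[atexit_count].func = func;
    atexit_table[atexit_count].arg = arg;
    atexit_table[atexit_count].dso = dso;
    atexit_count++;
    return 0;
}

/* Would be exported as __cxa_finalize. Runs entries newest-first;
 * dso == NULL means "all of them". Each entry is cleared before it is
 * called, so it can never run twice. */
void kernel_cxa_finalize(void *dso)
{
    for (size_t i = atexit_count; i-- > 0; ) {
        struct atexit_entry *e = &atexit_table[i];
        if (e->func == NULL || (dso != NULL && e->dso != dso))
            continue;
        void (*func)(void *) = e->func;
        e->func = NULL;
        func(e->arg);
    }
}

/* Tiny demo handler: appends its argument (a char) to a log. */
static char demo_log[8];
static size_t demo_len;
static void demo_dtor(void *arg) { demo_log[demo_len++] = *(char *)arg; }
```

A fixed table keeps the sketch allocation-free, which suits early kernel code; a real implementation might grow it once a heap is available.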

Re: __cxa_atexit and other GCC stuff

Posted: Wed Apr 29, 2020 1:56 pm
by 8infy
Thanks for the detailed explanation!