Possibly -flto and then use the gold linker. There's also the option of -fwhole-program, though you'd have to compile differently. Lastly, there's always the option of switching to a Clang + LLVM based build system and using its intermediate files to perform whole-program optimization (a bit more complicated, but doable).
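For what it's worth, here is a rough sketch of what the -flto route could look like. The file names (counter.c, main_logic.c) and the identifiers are made up for illustration, and the two "files" are merged into one so the snippet compiles on its own. Whether LTO actually changes the relocation you end up with depends on the rest of your build, so treat this as something to experiment with rather than a guaranteed fix.

```c
/*
 * Sketch only. In a real build these would be two separate files
 * (hypothetical names), compiled and linked roughly like this:
 *
 *   gcc -O2 -flto -c counter.c main_logic.c
 *   gcc -O2 -flto -fuse-ld=gold counter.o main_logic.o <other objects>
 *
 * With -flto the object files carry GCC's intermediate representation,
 * so code generation for the cross-file reference is deferred to link time.
 */

/* ---- would live in counter.c ---- */
int tick_count = 0;        /* global defined in one translation unit */

/* ---- would live in main_logic.c ---- */
extern int tick_count;     /* referenced from another translation unit */

/* Adds n to the shared counter and returns the new value. */
int bump_ticks(int n)
{
    tick_count += n;
    return tick_count;
}
```

The point is only that the compiler gets to see both the definition and the use at the same time; nothing in the source itself has to change.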
That seems like a bit of a hack... I'm surprised there isn't any way to get GCC to do this properly. All it needs to do is emit an R_X86_64_PC32 relocation for an external symbol. I might go bug the GCC people directly and see if I can't get an answer.
For now, I can deal with not being able to directly reference global variables in other translation units (doing so is so-so design-wise anyway, and I can deal with a restricted environment for this component), but it would be nice to fix this eventually.
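One thing that may be worth trying for the relocation mentioned above, assuming the code is being built with -fPIC and it is the GOT indirection that gets in the way: declare the external variable with hidden visibility. The names below (shared_state, read_shared_state, reader.c) are invented for the example, and the definition is included in the same file only so the snippet links on its own.

```c
/*
 * Assumption: compiled with something like
 *
 *   gcc -O2 -fPIC -c reader.c
 *
 * and the relocations inspected with `readelf -r reader.o` or
 * `objdump -dr reader.o`.
 */

/*
 * Hidden visibility tells GCC the symbol will be resolved inside the same
 * linked object, so it can address the variable relative to the program
 * counter instead of loading its address from the GOT.
 */
extern int shared_state __attribute__((visibility("hidden")));

/* Reads the variable that is (conceptually) defined in another file. */
int read_shared_state(void)
{
    return shared_state;
}

/*
 * Definition. Normally this would sit in a different translation unit;
 * remove it to see the relocation against an undefined external symbol.
 */
int shared_state = 7;
```

On x86-64 I would expect the access in read_shared_state to be PC-relative with this declaration, but check the readelf output on your own toolchain before relying on it.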