I am developing an ahead-of-time compiler for the CLR and have run into a problem with generics (essentially the CLR version of C++ templates). I can compile them (methods and vtables/typeinfos), adjusting them for the relevant generic parameter, but I am unsure where to emit them in the output file (I currently use ELF32/64 as output file formats).
Take the following example: a generic type MyType<T> is defined in assembly A. It has a single method foo(). MyType is instantiated in A as MyType<int> and in assembly B also as MyType<int>. Thus the object files produced for A and B both contain the method MyType<int>.foo(). Unfortunately, when I link them together I get linker errors because the same label is defined in both object files. I could get around this by prefixing each label with the object file it is instantiated in (e.g. A_MyType<int>.foo() ), but then I still have a problem with typeinfo structures, as my type comparison algorithm (e.g. for casting or interface methods) relies on a direct comparison of the addresses of typeinfo objects (e.g. typeof(MyType<int>) called from A would not equal typeof(MyType<int>) called from B).
I understand that Microsoft's compiler suite handles this for C++ templates with COMDAT sections: each template method/typeinfo instantiation is emitted in a special COMDAT section, and identical ones are then merged at link time. I further understand from a quick web search that this mechanism has been extended to ELF in gcc/llvm (there are various patch proposals etc.), but I cannot find anything documenting the actual standards used here, if any. In particular, it is not mentioned in the ELF docs (elf32/elf64/i386 processor supplement/amd64 processor supplement), but it is mentioned in an HP-UX supplement, which uses the special section type SHT_HP_COMDAT.
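To make the link-time behaviour described above concrete, here is a minimal Python sketch of the "keep one copy per key" rule. It is only a toy model under stated assumptions, not how any real linker is implemented: the names Comdat and fold, the keys and the sizes are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comdat:
    key: str     # hypothetical identity, e.g. the mangled instantiation name
    origin: str  # object file that contributed this copy
    size: int    # size of the section contents in bytes (made-up values)


def fold(objects):
    """Keep the first copy of each key and assign it a single address."""
    addresses = {}   # key -> address in the (toy) output image
    kept = []        # copies that survive folding, in layout order
    next_addr = 0
    for obj in objects:
        for c in obj:
            if c.key in addresses:
                continue  # a copy with this key was already kept: discard
            addresses[c.key] = next_addr
            kept.append(c)
            next_addr += c.size
    return addresses, kept


# Both assemblies instantiate MyType<int>, so both emit the same two items.
a_o = [Comdat("MyType<int>.foo()", "A", 12),
       Comdat("typeinfo(MyType<int>)", "A", 16)]
b_o = [Comdat("MyType<int>.foo()", "B", 12),
       Comdat("typeinfo(MyType<int>)", "B", 16)]

addresses, kept = fold([a_o, b_o])
```

With A listed first, both copies from B are dropped, so a typeof(MyType<int>) lookup from either assembly resolves to the same address, which is what an address-based type comparison needs.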
Does anyone have any further information on the implementation of COMDAT sections in ELF? Is the HP-UX way the de facto standard?
Many thanks in advance.
Regards,
John.
COMDAT sections (or equivalent for ELF...)
Re: COMDAT sections (or equivalent for ELF...)
GCC creates a separate .text.name section for each instantiated function and defines a weak symbol within it, as in the example below.
[06:30] icee@earth ~ $ objdump -x 2.o
2.o: file format elf32-i386
2.o
architecture: i386, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .group 00000008 00000000 00000000 00000034 2**2
CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
1 .text 00000014 00000000 00000000 0000003c 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
2 .data 00000000 00000000 00000000 00000050 2**2
CONTENTS, ALLOC, LOAD, DATA
3 .bss 00000000 00000000 00000000 00000050 2**2
ALLOC
4 .text._Z3fooIiEiT_ 0000000c 00000000 00000000 00000050 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
5 .comment 00000026 00000000 00000000 0000005c 2**0
CONTENTS, READONLY
6 .note.GNU-stack 00000000 00000000 00000000 00000082 2**0
CONTENTS, READONLY
7 .eh_frame 00000054 00000000 00000000 00000084 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
SYMBOL TABLE:
00000000 l df *ABS* 00000000 2.cpp
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l d .text._Z3fooIiEiT_ 00000000 .text._Z3fooIiEiT_
00000000 l d .note.GNU-stack 00000000 .note.GNU-stack
00000000 l d .eh_frame 00000000 .eh_frame
00000000 l d .comment 00000000 .comment
00000000 l d .group 00000000 .group
00000000 g F .text 00000014 _Z5func1v
00000000 *UND* 00000000 __gxx_personality_v0
00000000 w F .text._Z3fooIiEiT_ 0000000c _Z3fooIiEiT_
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000e R_386_PC32 _Z3fooIiEiT_
RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
00000012 R_386_32 __gxx_personality_v0
00000024 R_386_PC32 .text
00000040 R_386_PC32 .text._Z3fooIiEiT_
Re: COMDAT sections (or equivalent for ELF...)
Thanks Icee, that got me on the right track.
It seems the .group sections are also important. In particular, the .text.mangled_name sections set the SHF_GROUP flag (0x200) and are referenced by the .group sections. Each .group section (section type SHT_GROUP) has a 'tag' (the ELF specification calls it the group signature: a symbol identified by the sh_info field of the .group section header, in the symbol table named by sh_link), a flags field (distinct from the section flags: it is the first 32-bit word of the section's data) and then a list of section indices, which follow the flags field in the section data. If the group flags include GRP_COMDAT (0x1), then each group (as identified by its tag) is included only once in the output file. Typically, the GNU tools assign a single function or type_info structure to each group and set the group's tag to the mangled name of that function/type. This leads to a large number of sections in each object file, but these are all combined in the resulting executable.
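The data layout described above (a 32-bit flags word followed by 32-bit section indices) can be decoded with a few lines of Python. This is a minimal sketch rather than a full ELF reader: parse_group is a hypothetical helper, and the payload is synthetic, with the member index 5 chosen arbitrarily. Its 8-byte size does match the .group section in the objdump listing (one flags word plus one member).

```python
import struct

SHF_GROUP = 0x200   # section flag: the section is a member of a group
GRP_COMDAT = 0x1    # group flag: keep only one group per tag/signature


def parse_group(data, little_endian=True):
    """Split .group section contents into (flags, member section indices)."""
    if len(data) < 4 or len(data) % 4:
        raise ValueError("group data must be a non-empty array of 32-bit words")
    # All entries are 32-bit words, in ELF32 and ELF64 alike.
    fmt = ("<" if little_endian else ">") + "%dI" % (len(data) // 4)
    flags, *members = struct.unpack(fmt, data)
    return flags, members


# Synthetic payload: GRP_COMDAT flags word, then one (assumed) section index.
payload = struct.pack("<II", GRP_COMDAT, 5)

flags, members = parse_group(payload)
is_comdat = bool(flags & GRP_COMDAT)
```

A compiler back end emitting such a section would do the reverse: pack the flags word and the indices of the member sections, and set SHF_GROUP on each member.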
Regards,
John.