COMDAT sections (or equivalent for ELF...)

jnc100
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK


Post by jnc100

I am developing an ahead-of-time compiler for the CLR and have run into a problem with generics (roughly the CLR equivalent of C++ templates). I can compile them (methods and vtables/typeinfos), specialising each one for its generic arguments; my problem is where to emit them in the output file (I currently emit ELF32/ELF64 object files).

Take the following example: a generic type MyType<T> is defined in assembly A and has a single method foo(). MyType is instantiated as MyType<int> both in A and in assembly B, so the object files produced for A and B both contain the method MyType<int>.foo(). Unfortunately, when I come to link them together I get duplicate-symbol errors, because the same label is defined in both object files. I could get around this by prefixing each label with the name of the object file it is instantiated in (e.g. A_MyType<int>.foo()), but that still breaks typeinfo structures: my type comparison (e.g. for casts or interface method lookup) compares typeinfo objects directly by address, so typeof(MyType<int>) evaluated in A would not equal typeof(MyType<int>) evaluated in B.

As I understand it, Microsoft's C++ toolchain solves this for templates with COMDAT sections: each template method/typeinfo instantiation is emitted in its own COMDAT section, and at link time the linker keeps one copy of each and discards the duplicates. A quick web search suggests that this mechanism has been carried over to ELF in GCC/LLVM (there are various patch proposals, etc.), but I cannot find any documentation of the actual standard used, if there is one. In particular, it is not mentioned in the ELF documents I have consulted (ELF32, ELF64, and the i386 and AMD64 processor supplements), but it is mentioned in an HP-UX supplement, which uses a special section type, SHT_HP_COMDAT.

Does anyone have any further information on how COMDAT sections are implemented in ELF? Is the HP-UX approach the de facto standard?

Many thanks in advance.

Regards,
John.
Icee
Member
Posts: 100
Joined: Wed Jan 08, 2014 8:41 am
Location: Moscow, Russia

Re: COMDAT sections (or equivalent for ELF...)

Post by Icee

GCC creates a separate .text.<mangled name> section for each instantiated function and defines a weak symbol in it, as in the example below (an object file that instantiates int foo<int>(int), mangled as _Z3fooIiEiT_).

```
[06:30] icee@earth ~ $ objdump -x 2.o

2.o:     file format elf32-i386
2.o
architecture: i386, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .group        00000008  00000000  00000000  00000034  2**2
                  CONTENTS, READONLY, EXCLUDE, GROUP, LINK_ONCE_DISCARD
  1 .text         00000014  00000000  00000000  0000003c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 .data         00000000  00000000  00000000  00000050  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000000  00000000  00000000  00000050  2**2
                  ALLOC
  4 .text._Z3fooIiEiT_ 0000000c  00000000  00000000  00000050  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  5 .comment      00000026  00000000  00000000  0000005c  2**0
                  CONTENTS, READONLY
  6 .note.GNU-stack 00000000  00000000  00000000  00000082  2**0
                  CONTENTS, READONLY
  7 .eh_frame     00000054  00000000  00000000  00000084  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
SYMBOL TABLE:
00000000 l    df *ABS*  00000000 2.cpp
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .text._Z3fooIiEiT_     00000000 .text._Z3fooIiEiT_
00000000 l    d  .note.GNU-stack        00000000 .note.GNU-stack
00000000 l    d  .eh_frame      00000000 .eh_frame
00000000 l    d  .comment       00000000 .comment
00000000 l    d  .group 00000000 .group
00000000 g     F .text  00000014 _Z5func1v
00000000         *UND*  00000000 __gxx_personality_v0
00000000  w    F .text._Z3fooIiEiT_     0000000c _Z3fooIiEiT_


RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
0000000e R_386_PC32        _Z3fooIiEiT_


RELOCATION RECORDS FOR [.eh_frame]:
OFFSET   TYPE              VALUE
00000012 R_386_32          __gxx_personality_v0
00000024 R_386_PC32        .text
00000040 R_386_PC32        .text._Z3fooIiEiT_
```
jnc100
Member
Posts: 775
Joined: Mon Apr 09, 2007 12:10 pm
Location: London, UK

Re: COMDAT sections (or equivalent for ELF...)

Post by jnc100

Thanks Icee, that got me on the right track.

It seems the .group sections are also important. In particular, each .text.<mangled name> section has the SHF_GROUP flag (0x200) set and is listed as a member of a .group section (section type SHT_GROUP). A .group section has a 'tag' (the group signature: the name of the symbol selected by the section header's sh_info field, in the symbol table named by sh_link), a separate flags field (distinct from the section flags: it is the first 32-bit word of the section's data), and then the list of member section indices, which follows the flags word in the section data. If the group flags contain GRP_COMDAT (0x1), each group (as identified by its tag) is included only once in the output file.

Typically, the GNU tools put a single function or type_info structure in each group and use that function's or type's mangled name as the tag. This leads to a large number of sections in each object file, but they are all combined in the resulting executable.

Regards,
John.