LD writes different LMA/VMA

Posted: Sat Dec 16, 2017 10:16 am
by henje
While working on my build system I ran into a problem with ld. I use the following simple linker script, which only specifies a virtual address, but when I inspect the result with readelf, the virtual and load addresses for .rodata differ.

Code: Select all

OUTPUT_FORMAT("elf32-i386")

ENTRY(_start)

SECTIONS
{
	. = 0x100000;

	.text : {
		*(multiboot)
		*(.text)
	}
	.data ALIGN(4096) : {
		*(.data)
	}
	.rodata ALIGN(4096) : {
		*(.rodata)
	}
	.bss ALIGN(4096) : {
		*(.bss)
	}
}

Code: Select all

Elf file type is EXEC (Executable file)
Entry point 0x100090
There are 4 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x001000 0x00100000 0x00100000 0x0009e 0x0009e R E 0x1000
  LOAD           0x00109e 0x0010009e 0x00102ebe 0x0000d 0x0000d R   0x1000
  LOAD           0x002000 0x00101000 0x00103e20 0x00000 0x02000 RW  0x1000
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10

The code I am using is just a minimal "Hello World!" kernel. When linking with gold or lld, the virtual and load addresses of .rodata are the same. I am not sure whether my linker script is at fault, or the ld invocation, or whether ld is simply misbehaving. I would assume it is my script.

I invoke ld like this:

Code: Select all

ld -T linker.ld *.o -o kernel -melf_i386

(I do not know why ld is the only linker to ignore OUTPUT_FORMAT.)

Thanks for any help.

Re: LD writes different LMA/VMA

Posted: Sat Dec 16, 2017 12:00 pm
by MichaelPetch
It could be because you are not using a cross compiler. Maybe it is related to position-independent code. I'd recommend building an i686 cross compiler and using that to see if the result changes. It can depend on the distro you are using (different default compiler options, etc.). What command-line parameters do you use to compile the source code?

Does that layout cause problems for your code? If you posted your project to GitHub (or similar) and told us which distro/OS you build on, we might be able to say.

Re: LD writes different LMA/VMA

Posted: Sat Dec 16, 2017 2:20 pm
by henje
I uploaded the relevant parts of my project to https://github.com/Henje/LD-issue-minimal-example. I use clang as a cross-compiler, but I do not see how that is relevant to linking. Moreover, when using gold and lld, LMA and VMA are equal and my code works.

The ld I am using is the standard ld on my Ubuntu 17.10. It says about itself:

Code: Select all

GNU ld (GNU Binutils for Ubuntu) 2.29.1
  Supported emulations:
   elf_x86_64
   elf32_x86_64
   elf_i386
   elf_iamcu
   i386linux
   elf_l1om
   elf_k1om
   i386pep
   i386pe

Re: LD writes different LMA/VMA

Posted: Sat Dec 16, 2017 2:24 pm
by MichaelPetch
The relevance of the compiler to linking is that the compiler can emit information into the object files (alignment requirements, for example) that alters how the linker places things in memory. You'll also get differing results if your compiler defaults to position-independent code, whereas other distros and cross compilers default to non-position-independent code. (Ubuntu made this type of change around 16.04.) Using a host compiler can make a difference in the output you see. Using a cross compiler gives you more consistent results for your builds in general.

Edit: It appears you are using clang (and not gcc). clang will cross compile. Problem is your original question didn't say what tools you were using and I assumed gcc incorrectly.

Re: LD writes different LMA/VMA

Posted: Sat Dec 16, 2017 6:09 pm
by henje
It is not that I cannot use the other linkers; I am just curious why there is a difference in the first place. I see your point about position-independent code, but the ld manual does not even mention the term. From the linker's perspective only sections and symbols are of interest. From a compiler's view, PIC just disallows absolute jumps and the like. By link time that code has already been generated. Then again, I am no expert on PIC, so I may well be wrong.

I tried linking with --nmagic, but the output did not change much. If it helps I attached the output of objdump -x.

Code: Select all

kernel4:     file format elf32-i386
kernel4
architecture: i386, flags 0x00000012:
EXEC_P, HAS_SYMS
start address 0x00100090

Program Header:
    LOAD off    0x000000a0 vaddr 0x00100000 paddr 0x00100000 align 2**4
         filesz 0x0000009e memsz 0x0000009e flags r-x
    LOAD off    0x0000013e vaddr 0x0010009e paddr 0x00102ebe align 2**0
         filesz 0x0000000d memsz 0x00002f62 flags rw-
   STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
         filesz 0x00000000 memsz 0x00000000 flags rwx

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000009e  00100000  00100000  000000a0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata.str1.1 0000000d  0010009e  00102ebe  0000013e  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .bss          00002000  00101000  00103e20  0000014b  2**0
                  ALLOC
  3 .comment      0000002d  00000000  00000000  0000014b  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
00100000 l    d  .text	00000000 .text
0010009e l    d  .rodata.str1.1	00000000 .rodata.str1.1
00101000 l    d  .bss	00000000 .bss
00000000 l    d  .comment	00000000 .comment
00000000 l    df *ABS*	00000000 start.o
0010009a l       .text	00000000 _stop
00103000 l       .bss	00000000 kernel_stack
00000000 l    df *ABS*	00000000 main.cpp
00000000 l    df *ABS*	00000000 
001000a0 l     O .rodata.str1.1	00000000 _GLOBAL_OFFSET_TABLE_
00100090 g       .text	00000000 _start
00100070 g     F .text	0000001f init
00100010 g     F .text	0000005a _Z5printPKc

What boggles my mind is the load address which the linker calculates. I can see no relation to any code.

Re: LD writes different LMA/VMA

Posted: Sat Dec 16, 2017 10:47 pm
by MichaelPetch
It appears that clang emits .rodata sections whose names carry a suffix (.rodata.str1.1 in your objdump output), so the linker script should use *(.rodata*) instead. With GNU ld you should consider aligning the Load Memory Address (the ALIGN to the right of the colon in a section definition) to 4K if you want the LMA and VMA to match up. If you set only the VMA (the value to the left of the colon in the section definition), the LMA remains untouched. If you set both the LMA and the VMA in a section definition, they are set separately. In your case you want to modify your linker.ld to look like:

Code: Select all

OUTPUT_FORMAT("elf32-i386")

ENTRY(_start)
SECTIONS
{
        . = 0x100000;
        .text : ALIGN(4096) {
                *(multiboot)
                *(.text)
        }
        .data : ALIGN(4096) {
                *(.data)
        }
        .rodata : ALIGN (4096) {
                *(.rodata*)
        }
        .bss : ALIGN (4096) {
                *(.bss)
        }
}

The linkers may create the program headers differently, so to see the individual sections with ld you may want to use objdump -x kernel to view the full headers. The output is more readable than readelf's, IMHO. Also modify your link command to add -nostartfiles and -nostdlib, since we have neither C runtime initialization nor standard library support. The command could look like this:

Code: Select all

ld -Tlinker.ld -nostartfiles -nostdlib *.o -o kernel -melf_i386

Be aware that if you are going to use C++ you will need to extend the linker script to deal with static constructors and destructors. Your assembly code would have to loop through that data and call all the static constructors. If you ever put class objects at global scope, for example, these constructors have to be called for the objects to be initialized. Normally the startup files do that, but since we are in a freestanding environment it is up to us to do it ourselves. I believe there is a forum post or OSDev wiki page discussing this.
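
A minimal sketch of the linker-script side of that, assuming the compiler places its constructor pointers in .init_array input sections (some targets and flags use .ctors instead, so check the object files with objdump -h first). The symbol names __init_array_start and __init_array_end follow a common convention and are an assumption here, not something taken from the project in this thread:

```ld
SECTIONS
{
	. = 0x100000;

	.text : ALIGN(4096) {
		*(multiboot)
		*(.text)
	}

	/* Assumption: constructors are emitted as function pointers in
	   .init_array input sections. KEEP prevents --gc-sections from
	   discarding them, because nothing references them by name. */
	.init_array : ALIGN(4096) {
		__init_array_start = .;	/* hypothetical symbol name */
		KEEP(*(SORT_BY_INIT_PRIORITY(.init_array.*)))
		KEEP(*(.init_array))
		__init_array_end = .;	/* hypothetical symbol name */
	}

	/* .data, .rodata and .bss follow as in the script above. */
}
```

The startup code can then walk the pointers from __init_array_start up to __init_array_end and call each one before entering the kernel proper.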

Re: LD writes different LMA/VMA

Posted: Tue Dec 19, 2017 9:46 am
by henje
Thanks for your help. Setting the LMA instead of the VMA gave the right result with every linker I tested. As for the explanation, I am not so sure, because the LD manual states "LMA is set so the difference between the VMA and LMA is the same as the difference between the VMA and LMA of the last section" (from here). There are other cases, but they boil down to "LMA is set to its VMA". This behaviour is especially odd because I tested a different LD (2.23.1), which had no problem.
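
For reference, as far as I read the manual, that heuristic only applies when a section has neither AT nor AT>. A minimal sketch of a script that takes the guesswork out by pinning each load address to the section's own virtual address with AT(ADDR(...)) could look like this. It is untested against the ld 2.29.1 build discussed here, so treat it as an illustration rather than a verified fix:

```ld
OUTPUT_FORMAT("elf32-i386")

ENTRY(_start)

SECTIONS
{
	. = 0x100000;

	/* ADDR(x) yields the virtual address (VMA) of output section x;
	   AT(...) sets that section's load address (LMA) explicitly. */
	.text : AT(ADDR(.text)) {
		*(multiboot)
		*(.text)
	}
	.data ALIGN(4096) : AT(ADDR(.data)) {
		*(.data)
	}
	.rodata ALIGN(4096) : AT(ADDR(.rodata)) {
		*(.rodata*)
	}
	.bss ALIGN(4096) : AT(ADDR(.bss)) {
		*(.bss)
	}
}
```

With an explicit AT on every section, the VMA and LMA columns of objdump -h kernel should show the same value for each section.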

Also, good catch with the .rodata wildcard pattern. My original one kind of worked, but was not what I intended.

As for the C++ part, thanks for the heads up, but in my real project I got that handled. This was just a test for a different build system and had therefore no ctor, dtor stuff.