Set Location Counter to a new value in ld linker script

Post by haolee »

Hi,

I'm confused about the location counter, especially when setting it to a new value in the output section.

I wrote a simple program and a linker script to make my question clearer. The code is as follows:

Code: Select all

-----test.s---------
        .section .text
        .globl _start
_start:
        movq $1, %rax           # syscall number 1 is exit in the 32-bit int $0x80 ABI
        movq $0, %rbx           # exit status 0
        int $0x80

------test.lds------
SECTIONS
{
        . = 0x10;
        label_1 = .;
        custom_section : {
                . = 0x20;
                label_2 = . ;
                label_3 = ABSOLUTE(.) ;
                *(.text) ;
        }
}
After linking, I used the nm command to print the symbol addresses:

Code: Select all

0000000000000010 T label_1
0000000000000030 T label_2
0000000000000030 A label_3
0000000000000030 T _start
I can't understand why label_2 is 0x30. As the LD documentation says:
if '.' is used inside a section description, it refers to the byte offset from the start of that section, not an absolute address.
In the custom_section, '.' is set to 0x20 which is a relative offset, so I think label_2 should also be 0x20. The value of label_3 is reasonable because it's an absolute address.

Could someone please explain why label_2 is 0x30? Thanks!

Post by simeonz »

HaoLee wrote:In the custom_section, '.' is set to 0x20 which is a relative offset, so I think label_2 should also be 0x20.
In ET_EXEC (executable ELFs) and ET_DYN (dynamic shared objects, DSOs), the link-time symbol value does not reflect whether the address is absolute or relative. Both kinds of value are virtual addresses with the base address at 0, and this is what "nm" outputs. At load time, symbols at absolute addresses (with pseudo-section SHN_ABS) are left in place, whereas the rest of the symbols (irrespective of their section number) are relocated by the base address of the ELF in the virtual address space of the process. For ET_REL (intermediate object ELFs), the value field of symbols with absolute addresses is specified relative to the base address (which is to say, it is the same as the address in a 0-based memory layout), whereas the remaining symbols have a value equal to their offset in their corresponding section. Thus section-relative symbols are relocated by the link editor when merging sections, whereas absolute symbols are not.

Here is the relevant snippet in the elf docs:
Symbol table entries for different object file types have slightly different interpretations for the st_value member.
• In relocatable files, st_value holds alignment constraints for a symbol whose section index is SHN_COMMON.
• In relocatable files, st_value holds a section offset for a defined symbol. That is, st_value is an offset from the beginning of the section that st_shndx identifies.
• In executable and shared object files, st_value holds a virtual address. To make these files' symbols more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation) for which the section number is irrelevant.
The quote you made from the ld manual is correct, albeit a bit misleading. The '.' is indeed an offset inside sections, but so is pretty much everything else. The value of '.' is section relative, number assignments to '.' are section relative, and assignments to symbols are also section relative. In effect, that rule has little practical consequence unless you are dealing with numeric literals. The relevant documentation for this aspect is here.

Post by haolee »

simeonz wrote:In ET_EXEC (executable elfs) and ET_DYN (dynamic shared objects - DSOs), the link time symbol value does not reflect whether the address is absolute or relative. Both values are the virtual address with the base address at 0, and this is what "nm" outputs.
If I understand you correctly, label_2 is indeed 0x20 during the linking process. But the nm command always gives me the virtual address of label_2, which is 0x30, so I can't get the link-time value of label_2.

Post by iansjack »

If it is relative to the start of the section, and the section starts at 0x10, then surely its final value is 0x30 (as you observe)?

Post by haolee »

iansjack wrote:If it is relative to the start of the section, and the section starts at 0x10, then surely its final value is 0x30 (as you observe)?
Yes, its final value is 0x30. But the final value is a virtual address, and nm can only give us the virtual address value, not the relative offset from the start of custom_section. I think that during the linking process, label_2 is 0x20, which is a relative offset from the start of custom_section. After the object file is linked, this relative offset is lost permanently and nm can only give us a value relative to virtual address 0.
Last edited by haolee on Fri Apr 06, 2018 2:01 am, edited 1 time in total.

Post by iansjack »

HaoLee wrote:Yes, its final value is 0x30. But the final value is a virtual address and nm can only give us the virtual address value, not the relative offset from the start of custom_section.
Which seems to be exactly what you say happens?

Post by haolee »

iansjack wrote:
HaoLee wrote:Yes, its final value is 0x30. But the final value is a virtual address and nm can only give us the virtual address value, not the relative offset from the start of custom_section.
Which seems to be exactly what you say happens?
I wrongly thought the nm command should print 0x20 for label_2 because it's an offset relative to the start of the custom_section. Now I know nm can only print the virtual address 0x30 and it can't print the relative offset 0x20.

Post by simeonz »

Nm and objdump apparently output the calculated value, whereas readelf outputs the raw value.

But either way, different rules apply to the intermediate elfs and the final elfs. That is important to emphasize again. In the intermediate files, the values are offsets from the sections and in the executables, they are 0-based. The link editor is responsible for recomputing them. The values change, but the metadata that marks them as section relative or absolute is not changed. The interpretation of the value "type" is automatically different in executable elfs.

Regarding the script rules, I want to show you a modified example and the readelf results, separately for relocatable and non-relocatable files. I think you may get a better picture this way. The rules appear to be rather complicated and actually surprised me. One exceptional corner case was particularly awkward (assignment of previously defined absolute symbol to a section symbol).

Edit: Had to fix truncation of the symbol names in the readelf output.

My linker script:

Code: Select all

SECTIONS
{
        A_outside_current = .;
        . = 0x111000;
        B_outside_current = .;
        C_outside_direct = 0x111000;
        D_outside_current_doubled = . * 2;
        . = 0x111100;
        custom_section : {
                . = 0x11;
                E_inside_current = .;
                F_inside_direct = 0x11;
                G_inside_current_doubled = . * 2;
                H_inside_absolute_current = ABSOLUTE(.);
                I_inside_absolute_current_doubled = ABSOLUTE(.) * 2;
                J_inside_absolute_E = ABSOLUTE(E_inside_current);
                K_inside_absolute_H = ABSOLUTE(H_inside_absolute_current);
                L_inside_H = H_inside_absolute_current;
                *(.text);
        }
}
Linking a relocatable file:

Code: Select all

ld -r test.o -T test.ld -o test
Corresponding readelf output:

Code: Select all

ffffffffffeeef00        1       A_outside_current
ffffffffffffff00        1       B_outside_current
0000000000111000        ABS     C_outside_direct
0000000000222000        ABS     D_outside_current_doubled
0000000000000011        1       E_inside_current
0000000000000011        1       F_inside_direct
0000000000000022        1       G_inside_current_doubled
0000000000111111        ABS     H_inside_absolute_current
0000000000222222        ABS     I_inside_absolute_current_doubled
0000000000111111        ABS     J_inside_absolute_E
0000000000111111        ABS     K_inside_absolute_H
0000000000111111        1       L_inside_H
Linking to a non-relocatable file:

Code: Select all

ld test.o -T test.ld -o test
Corresponding readelf output:

Code: Select all

0000000000000000        1       A_outside_current
0000000000111000        1       B_outside_current
0000000000111000        ABS     C_outside_direct
0000000000222000        ABS     D_outside_current_doubled
0000000000111111        1       E_inside_current
0000000000111111        1       F_inside_direct
0000000000111122        1       G_inside_current_doubled
0000000000111111        ABS     H_inside_absolute_current
0000000000222222        ABS     I_inside_absolute_current_doubled
0000000000111111        ABS     J_inside_absolute_E
0000000000111111        ABS     K_inside_absolute_H
0000000000222211        1       L_inside_H
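To see how the two listings relate, here is a minimal Python sketch. It assumes that custom_section (section index 1) starts at 0x111100, as the script sets it, and that symbol values are 64 bits wide. The helper name `to_executable_value` is made up for illustration and is not part of binutils; the rows are copied from the relocatable listing above.

```python
# Sketch: derive the executable-file symbol values from the relocatable ones.
# Assumption: custom_section (index 1) starts at 0x111100, as set in the script.
MASK64 = (1 << 64) - 1
SECTION_BASE = {1: 0x111100}

def to_executable_value(value, shndx):
    """Return the 0-based virtual address for a symbol of the relocatable file."""
    if shndx == "ABS":
        return value  # absolute symbols are not moved
    # Section-relative symbol: add the section start. Negative offsets are
    # stored in two's complement, so the sum wraps around at 64 bits.
    return (value + SECTION_BASE[shndx]) & MASK64

relocatable = [
    (0xFFFFFFFFFFEEEF00, 1, "A_outside_current"),
    (0xFFFFFFFFFFFFFF00, 1, "B_outside_current"),
    (0x0000000000111000, "ABS", "C_outside_direct"),
    (0x0000000000222000, "ABS", "D_outside_current_doubled"),
    (0x0000000000000011, 1, "E_inside_current"),
    (0x0000000000000011, 1, "F_inside_direct"),
    (0x0000000000000022, 1, "G_inside_current_doubled"),
    (0x0000000000111111, "ABS", "H_inside_absolute_current"),
    (0x0000000000222222, "ABS", "I_inside_absolute_current_doubled"),
    (0x0000000000111111, "ABS", "J_inside_absolute_E"),
    (0x0000000000111111, "ABS", "K_inside_absolute_H"),
    (0x0000000000111111, 1, "L_inside_H"),
]

for value, shndx, name in relocatable:
    print(f"{to_executable_value(value, shndx):016x}        {shndx}       {name}")
```

Under those assumptions the printed values match the non-relocatable listing, including the awkward L_inside_H case (0x111111 + 0x111100 = 0x222211).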

Post by simeonz »

Hm. Forgot to use "--wide" for the readelf output. It is fixed now.

Post by haolee »

simeonz wrote: Linking a relocatable file:

Code: Select all

ld -r test.o -T test.ld -o test
Corresponding readelf output:

Code: Select all

ffffffffffeeef00        1       A_outside_current
ffffffffffffff00        1       B_outside_current
0000000000111000        ABS     C_outside_direct
0000000000222000        ABS     D_outside_current_doubled
0000000000000011        1       E_inside_current
0000000000000011        1       F_inside_direct
0000000000000022        1       G_inside_current_doubled
0000000000111111        ABS     H_inside_absolute_current
0000000000222222        ABS     I_inside_absolute_current_doubled
0000000000111111        ABS     J_inside_absolute_E
0000000000111111        ABS     K_inside_absolute_H
0000000000111111        1       L_inside_H
Many thanks for your examples! It's indeed complicated. Understanding these things requires a lot of knowledge about ELF, and the first example with the option "-r" is beyond my understanding, to be honest. It seems `A_outside_current` is affected by the second assignment statement `. = 0x111100;`. If I change the value in the second assignment statement, the value of `A_outside_current` changes correspondingly.

Post by zaval »

So the problem is DOT assignment outside of an output section in relocatable files.
After some playing with this messiest thing ever devised, I figured this out. Look at the comments.
ENTRY( CompareGuid )

SECTIONS
{
        wtf0 = .; /* here DOT = 0 - b16e00b0 */

        . = 0xb16b00b0;

        wtf1 = .; /* here DOT = b16b00b0 - b16e00b0 */

        . = 0xb16e00b0;

        wtf2 = .; /* here DOT = b16e00b0 - b16e00b0 */
        custom_section : {
                . = 0x20;
                label_2 = . ;
                label_3 = ABSOLUTE(.) ;
                *(.text) ;
        }
}
These are the values readelf reports for the relocatable file:
wtf0 4e91ff50
wtf1 fffd0000
wtf2 00000000

In conclusion, in relocatable files, DOT is assigned the difference between the CURRENT value of DOT (as it appears in the non-relocatable case) and the FINAL value, which comes from the LAST assignment.
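The three values can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes 32-bit symbol values (the width of the readelf output above) and that the section start is the last value assigned to DOT before custom_section. `dot_in_relocatable` is a hypothetical helper name.

```python
# Sketch: DOT outside a section, expressed relative to the next section's start.
# Assumptions: 32-bit values; custom_section starts at the last '.' assignment.
MASK32 = 0xFFFFFFFF
SECTION_START = 0xB16E00B0

def dot_in_relocatable(dot):
    """Offset of 'dot' from the section start, wrapped to 32 bits."""
    return (dot - SECTION_START) & MASK32

for name, dot in [("wtf0", 0x0), ("wtf1", 0xB16B00B0), ("wtf2", 0xB16E00B0)]:
    print(name, f"{dot_in_relocatable(dot):08x}")
# prints:
# wtf0 4e91ff50
# wtf1 fffd0000
# wtf2 00000000
```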


Note that they claim DOT is just a byte offset from the current "object", namely either from the SECTIONS statement or, if it is inside an output section, from that section. In the former case, it's not described how this relates to the memory layout.
ELF doesn't have a notion of Base Address. But they more or less state that SECTIONS is at 0, so DOT assignments inside SECTIONS but outside of an output section are absolute addresses. And this is the case in non-relocatable files. For relocatable files, see the paragraph above.

Post by simeonz »

In the end, binutils have work to do on their documentation.
zaval wrote:as a conclusion, in relocatable files, DOT is assigned a difference between the CURRENT value of DOT (as it appeared in the non-relocatable case) and the FINAL value, that goes from the LAST assignment.
This applies to all files. But due to the ELF specification caveats, non-relocatable files contain only absolute address-like symbol values, despite the fact that some symbols are still tagged as section relative. In any case, as you note, HaoLee observes a negative offset from the beginning of the first section. The negative offset changes correspondingly if the section is moved around.
zaval wrote:Note, they claim, that DOT is just a byte offset from the current "object". Namely, - either from SECTIONS statement, or, if it is inside an output section, - from that section. In the former case, it's not described how this is related to the memory layout.
The documentation statement is misleading and the question HaoLee raised was justified. Apparently, the sentence only applies to assignments.

I am going to cite the documentation and provide my interpretation, omitting the SANE_EXPR behavior:
". always refers to a location in an output section"
From which we can conclude that the location counter is a relative address. There is just no other possible interpretation. And it is consistent with the experiments I've done.
". actually refers to the byte offset from the start of the current containing object. Normally this is the SECTIONS statement, whose start address is 0, hence . can be used as an absolute address. If . is used inside a section description however, it refers to the byte offset from the start of that section, not an absolute address."
When assigning to the location counter, inside section definitions, numbers are treated as offsets, and outside, as absolute addresses.
"all assignments or other statements belong to the previous output section, except for the special case of an assignment to ."
"the linker assumes that an assignment to . is setting the start address of a following output section and thus should be grouped with that section"
This tells us that assignments to the location counter mark the start of a scope in which its value is relative to the next section.
"Expressions appearing outside an output section definition treat all numbers as absolute addresses. Expressions appearing inside an output section definition treat absolute symbols as numbers."
The above sentences are important in regard to the way in which the rules apply to numbers and absolute addresses. In effect, absolute addresses are not treated as such when appearing inside a section definition. Similarly, numbers are not treated as numbers when appearing outside section definitions.
"Unary operations on an absolute address or number, and binary operations on two absolute addresses or two numbers, or between one absolute address and a number, apply the operator to the value(s)."
"Unary operations on a relative address, and binary operations on two relative addresses in the same section or between one relative address and a number, apply the operator to the offset part of the address(es)."
"Other binary operations, that is, between two relative addresses not in the same section, or between a relative address and an absolute address, first convert any non-absolute term to an absolute address before applying the operator."
"An operation involving only numbers results in a number."
"The result of comparisons, ‘&&’ and ‘||’ is also a number."
"The result of other binary arithmetic and logical operations on two relative addresses in the same section or two absolute addresses (after above conversions) is also a number when ... inside an output section definition but an absolute address otherwise."
"The result of other operations on relative addresses or one relative address and a number, is a relative address in the same section as the relative operand(s)."
"The result of other operations on absolute addresses (after above conversions) is an absolute address."
I should add that the location counter at the start of a script is relative to the first section, and in the end of the script is relative to the last section, assuming that orphan sections are discarded.

Here is a commented example that extends the one from my previous post. The comments are based on the rules I cited from the documentation:

Code: Select all

SECTIONS
{
        /* Relative address by itself results in relative address.
           '.' at this point holds a negative offset to the start of custom_section,
           which is yet to be determined by a future assignment to '.'
           The absolute address would be 0, although if this is not the last
           linker invocation, future invocations move this symbol along with the section. */
        A_outside_current = .;
        . = 0x111000;
        /* Relative address by itself results in relative address. Still negative offset. */
        B_outside_current = .;
        /* Numbers treated as absolute addresses outside of sections.
           Absolute address by itself results in absolute address.
           This changes if LD_FEATURE ("SANE_EXPR") is requested at the start of the script. */
        C_outside_direct = 0x111000;
        /* Numbers treated as absolute addresses outside of sections.
           Arithmetic between absolute address and relative address, converts the
           relative address to absolute one and produces absolute address as well. */
        D_outside_current_doubled = . * 2;
        . = 0x111100;
        custom_section : {
                . = 0x11;
                /* Relative address by itself results in relative address. 0 offset. */
                E_inside_current = .;
                F_inside_direct = 0x11;
                /* Numbers treated as numbers inside of sections.
                   Arithmetic between number and relative address uses the offset of the
                   relative address and produces relative address. */
                G_inside_current_doubled = . * 2;
                /* The following are fairly clear, since they request conversion of the
                   address type explicitly. */
                H_inside_absolute_current = ABSOLUTE(.);
                I_inside_absolute_current_doubled = ABSOLUTE(.) * 2;
                J_inside_absolute_E = ABSOLUTE(E_inside_current);
                K_inside_absolute_H = ABSOLUTE(H_inside_absolute_current);
                /* Absolute addresses are treated as numbers within sections.
                   Numbers produce section relative symbols in assignments. */
                L_inside_H = H_inside_absolute_current;
                *(.text);
                . = 0x1000;
        }
        /* Arithmetic between relative addresses in the same section
           produces absolute address, but does not convert the operands.
           That is, it operates on the offset parts of the addresses. */
        M_outside_current_squared = . * .;
        /* Relative address in the same section produces relative address. */
        N_outside_E = E_inside_current;
        /* Absolute address produces absolute address as expected. */
        O_outside_H = H_inside_absolute_current;
        . = 0x1111100;
        /* Relative address to the following section. 0 offset. */
        P_in_another_section = .;
        another_section : {
                /* This is a surprise, although it is by the rules.
                   Mixing relative addresses from different sections converts them to their
                   absolute address value first, but produces a relative address in the end. */
                Q_mixed_F_P = F_inside_direct * P_in_another_section;
                LONG(0)
        }
        . = 0x1111111;
        /* We are still in the previous section right now.
           Although if orphan sections are not discarded,
           we would be in one of those orphan sections instead. */
        R_in_same_section = .;
        /* Relative address in another section by itself is still
           a relative address in that section, no matter where we use it. */
        S_in_first_section = E_inside_current;
        /DISCARD/ : { *(*); }
}
The results for relocatable output.

Code: Select all

ffffffffffeeef00        1       A_outside_current
ffffffffffffff00        1       B_outside_current
0000000000111000        ABS     C_outside_direct
0000000000222000        ABS     D_outside_current_doubled
0000000000000011        1       E_inside_current
0000000000000011        1       F_inside_direct
0000000000000022        1       G_inside_current_doubled
0000000000111111        ABS     H_inside_absolute_current
0000000000222222        ABS     I_inside_absolute_current_doubled
0000000000111111        ABS     J_inside_absolute_E
0000000000111111        ABS     K_inside_absolute_H
0000000000111111        1       L_inside_H
0000000001000000        ABS     M_outside_current_squared
0000000000000011        1       N_outside_E
0000000000111111        ABS     O_outside_H
0000000000000000        2       P_in_another_section
0000123455432100        2       Q_mixed_F_P
0000000000000011        2       R_in_same_section
0000000000000011        1       S_in_first_section
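The two least obvious rows, M_outside_current_squared and Q_mixed_F_P, can be reproduced with a short Python sketch of the arithmetic the quoted rules imply. It assumes the section starts set in the script (0x111100 and 0x1111100) and that '.' sits at offset 0x1000 from the start of custom_section when M is evaluated. The variable names are illustrative only.

```python
# Sketch of the arithmetic behind two rows of the listing above.
CUSTOM_BASE = 0x111100     # start of custom_section (from the script)
ANOTHER_BASE = 0x1111100   # start of another_section (from the script)

# M: both operands are relative to the same section, so the operator is
# applied to the offsets only ('.' is at offset 0x1000 after ". = 0x1000;").
m = 0x1000 * 0x1000

# Q: the operands are relative to different sections, so each one is
# converted to its absolute address before the multiplication.
f_abs = CUSTOM_BASE + 0x11   # F_inside_direct, offset 0x11
p_abs = ANOTHER_BASE + 0x0   # P_in_another_section, offset 0
q = f_abs * p_abs

print(f"{m:016x}")   # 0000000001000000
print(f"{q:016x}")   # 0000123455432100
```

The product for Q is then stored as an offset within section 2, which is why readelf shows the full 0x123455432100 next to section index 2 rather than ABS.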

Post by haolee »

simeonz wrote:
zaval wrote:as a conclusion, in relocatable files, DOT is assigned a difference between the CURRENT value of DOT (as it appeared in the non-relocatable case) and the FINAL value, that goes from the LAST assignment.
Hi, thanks for this long explanation. I have read it carefully. But I have a new idea: I think we may all be deceived by readelf. What it gives us is also the final value; we still don't know the actual value during the linking process. Only the ld linker knows the intermediate value. So we need a way to dump these values while the linker is linking the object file, and that is what we want. After reading "3.4.5 Other Linker Script Commands" in the ld docs, I found that the ASSERT command can be used to verify the intermediate value during the linking process. This is an example:

Code: Select all

SECTIONS
{
        A_outside_current = .;
        ASSERT(A_outside_current == 0, "not equal")
        . = 0x111000;
        B_outside_current = .;
        ASSERT(B_outside_current == 0x111000, "not equal")
        C_outside_direct = 0x111000;
        ASSERT(C_outside_direct == 0x111000, "not equal")
        D_outside_current_doubled = . * 2;
        ASSERT(D_outside_current_doubled == 0x222000, "not equal")
        . = 0x111100;
        ASSERT(. == 0x111100, "not equal")
        custom_section : {
                . = 0x11;
                ASSERT(. == 0x11, "not equal");
                E_inside_current = .;
                ASSERT(E_inside_current == 0x11, "not equal");
                F_inside_direct = 0x11;
                G_inside_current_doubled = . * 2;
                H_inside_absolute_current = ABSOLUTE(.);
                ASSERT(H_inside_absolute_current  == 0x111111, "not equal");
                I_inside_absolute_current_doubled = ABSOLUTE(.) * 2;
                J_inside_absolute_E = ABSOLUTE(E_inside_current);
                K_inside_absolute_H = ABSOLUTE(H_inside_absolute_current);
                L_inside_H = H_inside_absolute_current;
                ASSERT(L_inside_H == 0x111111, "not equal");
                *(.text);
        }
}
After running

Code: Select all

ld -r test.o -o test -T test.lds
all assertions pass successfully. This is interesting, but I need some time to come up with a reasonable explanation.

Post by simeonz »

haolee wrote:Hi, thanks for this long explanation. I have read it carefully. But I have a new idea: I think we may all be deceived by readelf. What it gives us is also the final value; we still don't know the actual value during the linking process. Only the ld linker knows the intermediate value. So we need a way to dump these values while the linker is linking the object file, and that is what we want. After reading "3.4.5 Other Linker Script Commands" in the ld docs, I found that the ASSERT command can be used to verify the intermediate value during the linking process.
The ASSERT commands perform implicit conversions. They do not compare the stored symbol value directly; they compare the value as converted in the context of a comparison expression between a relative or absolute address and a number, where the number is treated as an absolute address outside of section definitions and as a number proper inside them. Knowing this and the conversion rules explains the assertion results.
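A rough Python model of that reading may help. It is a sketch of the conversion rules as interpreted in this thread, not a description of how ld is implemented: outside a section definition a section-relative symbol is converted to its absolute address before the comparison, while inside one its offset is compared with a plain number. The `Sym` type and the `equals` helper are hypothetical, and the raw values are the ones readelf showed for the relocatable output.

```python
from dataclasses import dataclass
from typing import Optional

MASK64 = (1 << 64) - 1
CUSTOM_BASE = 0x111100  # start of custom_section in the script

@dataclass
class Sym:
    value: int            # raw stored value: an offset, or an absolute address
    base: Optional[int]   # section start, or None for an absolute symbol

def equals(sym, number, inside_section):
    """Model 'sym == number' as an ASSERT would evaluate it (assumed rules)."""
    if sym.base is None:
        return sym.value == number   # absolute symbol: compared as it is
    if inside_section:
        return sym.value == number   # offset compared with a plain number
    # Outside a section the number counts as an absolute address,
    # so the relative symbol is converted to absolute first.
    return ((sym.value + sym.base) & MASK64) == number

checks = [
    ("A_outside_current", Sym(0xFFFFFFFFFFEEEF00, CUSTOM_BASE), 0x0, False),
    ("B_outside_current", Sym(0xFFFFFFFFFFFFFF00, CUSTOM_BASE), 0x111000, False),
    ("C_outside_direct", Sym(0x111000, None), 0x111000, False),
    ("E_inside_current", Sym(0x11, CUSTOM_BASE), 0x11, True),
    ("H_inside_absolute_current", Sym(0x111111, None), 0x111111, True),
    ("L_inside_H", Sym(0x111111, CUSTOM_BASE), 0x111111, True),
]

for name, sym, number, inside in checks:
    print(name, "passes" if equals(sym, number, inside) else "fails")
```

With this model every assertion from the script passes, even though the raw values readelf reports for A_outside_current and B_outside_current look nothing like 0 and 0x111000.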

Using LD_FEATURE ("SANE_EXPR") makes numbers stay numbers everywhere, which eliminates the assertion conversions in this case. The option changes the values of some symbol assignments as well, be it in a desirable way or not. As far as I can see, it cannot be enabled and disabled locally (say, around the assertions only) but you can try it just to see how it makes the results more consistent with the output of readelf.

Post by haolee »

simeonz wrote: Using LD_FEATURE ("SANE_EXPR") makes numbers stay numbers everywhere, which eliminates the assertion conversions in this case. The option changes the values of some symbol assignments as well, be it in a desirable way or not. As far as I can see, it cannot be enabled and disabled locally (say, around the assertions only) but you can try it just to see how it makes the results more consistent with the output of readelf.
Many thanks. I must admit that this topic is very difficult. I may need to learn some ELF basics to understand the linking process better. I think these posts are enough for me to digest for a long time. Again, many thanks for your help!