Simple question about linker script and section

nicko · Post by **nicko** » Wed Mar 05, 2008 8:57 am

Hi everybody,

I have a simple question about linker scripts and the way sections should be used (yes, I know, yet another linker question... but I didn't find an answer to my question on this forum, so...).

From what I have read, .data contains initialized variables, whereas .bss contains uninitialized globals, meaning these sections are marked as R/W.

So my question is: during execution, where do the global variables reside in memory? In the defined .data/.bss sections? If so, is that not inside the binary? How should an ELF binary be mapped? Must we map the .bss/.data sections R/W and the rest read-only?

As you can see, I am confused.

Thanks for replying.

--
nicko

JamesM · Post by **JamesM** » Wed Mar 05, 2008 9:29 am

Hi.

In an ELF binary there are many sections, three of which are of particular note: .text, .data (with gcc, also .rodata) and .bss.
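As a rough illustration of which C definitions typically end up in which of these sections, here is a minimal sketch assuming GCC-style defaults. The variable and function names are made up for the example, and the exact placement can vary with compiler version and flags; tools such as `nm` or `objdump -h` can be used to check an actual object file.

```c
#include <stddef.h>

/* Typical placement with GCC-style defaults. All names are hypothetical
 * and exist only for this illustration. */
int boot_count = 3;             /* initialised, non-zero: .data            */
const char banner[] = "hello";  /* read-only data: .rodata                 */
int tick_count;                 /* uninitialised: .bss (older GCC defaults
                                   may emit it as a COMMON symbol)         */
static int error_count = 0;     /* explicitly zero: usually .bss as well   */

/* The machine code of functions is placed in .text. */
int sum_counters(void)
{
    /* tick_count and error_count are guaranteed to start at zero. */
    return boot_count + tick_count + error_count;
}

size_t banner_length(void)
{
    return sizeof banner - 1;   /* exclude the terminating NUL */
}
```

On a hosted system, `sum_counters()` relies on the loader (or C runtime) having zeroed `tick_count` and `error_count` before the program starts.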

.text and .data are similar in that they exist in the binary, and are copied verbatim into the target address space. The binary contains headers which tell the loader where the sections are to be loaded. They are normally mapped read-write. (.text may be exec-only, but that kills self modifying code like Java).

.bss is slightly different in that it occupies no space in the binary. You're right when you say that it is for "uninitialised global variables". A header in the ELF file tells the loader where the .bss section should be placed and how long it should be. The loader then memsets virtual memory from that address for that length to zero (0x00). With any other value, programs relying on values being initialised to zero won't work (technically, values which are explicitly initialised to zero in source code get put into the .bss as well!). This is mapped read-write.
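The "copy what is in the file, zero the rest" behaviour described above can be sketched roughly as follows. This is a simplified illustration, not a real ELF loader: `struct segment` and `load_segment` are hypothetical names, and the fields only mirror the idea behind the ELF program header fields `p_offset`, `p_filesz` and `p_memsz`, where a memory size larger than the file size accounts for the .bss.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified description of one loadable region.
 * A real loader would read this from the ELF program headers. */
struct segment {
    size_t file_offset;  /* where the stored bytes start in the file image */
    size_t file_size;    /* bytes actually stored in the file (.text/.data) */
    size_t mem_size;     /* bytes occupied in memory (file_size + .bss)     */
};

/* Copy the stored bytes to dest, then zero the remainder (the .bss part).
 * Returns 0 on success, -1 if the description is inconsistent. */
int load_segment(unsigned char *dest, const unsigned char *image,
                 size_t image_len, const struct segment *seg)
{
    if (seg->file_size > seg->mem_size)
        return -1;
    if (seg->file_offset > image_len ||
        seg->file_size > image_len - seg->file_offset)
        return -1;

    memcpy(dest, image + seg->file_offset, seg->file_size);
    memset(dest + seg->file_size, 0, seg->mem_size - seg->file_size);
    return 0;
}
```

The caller is assumed to have reserved at least `mem_size` bytes at `dest`; a real loader would also apply page permissions and alignment, which are left out here.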

Hope this clears some things up.

cheers,

James

nicko · Post by **nicko** » Wed Mar 05, 2008 10:52 am

First, thanks for replying.

JamesM wrote: .text and .data are similar in that they exist in the binary, and are copied verbatim into the target address space. The binary contains headers which tell the loader where the sections are to be loaded. They are normally mapped read-write. (.text may be exec-only, but that kills self modifying code like Java).

OK, this is why I was confused: I was sure that .text and .rodata were always mapped read-only, whereas other sections were mapped R/W. But then: how can a single binary be mapped in multiple address spaces (for example, several instances of the same process) in a "safe way" if all processes have write access to the .data section?

JamesM wrote: A header in the ELF file tells the loader where the .bss section should be placed, and how long it should be. The loader then memset's virtual memory from that address for that length to zero (0x00).

So, for a kernel linker script, is it fine to place the .bss section inside the .data section, like this (if we make no assumptions about the initial value of global variables in the code):

.data :
{
        *(.data) *(.bss) *(COMMON)
}

or MUST the linker script separate the two sections, as follows:

.data :
{
        *(.data) *(COMMON)
}
.bss :
{
        _bss_begin = .;
        *(.bss)
        _bss_end = .;
}

and then zero it in the startup code with memset(&_bss_begin, 0, (&_bss_end - &_bss_begin))?

JamesM wrote: Hope this clears some things up.

Of course, thanks for your detailed explanations.

--
nicko

JamesM · Post by **JamesM** » Wed Mar 05, 2008 11:25 am

Hi,

nicko wrote:
JamesM wrote: .text and .data are similar in that they exist in the binary, and are copied verbatim into the target address space. The binary contains headers which tell the loader where the sections are to be loaded. They are normally mapped read-write. (.text may be exec-only, but that kills self modifying code like Java).
OK, this is why I was confused: I was sure that .text and .rodata were always mapped read-only, whereas other sections were mapped R/W. But then: how can a single binary be mapped in multiple address spaces (for example, several instances of the same process) in a "safe way" if all processes have write access to the .data section?

The short answer is: it can't. Several instances of the same process do not share any data whatsoever. Each has its own copy of each section. Each is mapped read-write, so a process can change its copy of any data, in its own address space. Reading through your text, it's possible that you're confusing mapping read-write with mapping write-back, where any data that is modified in memory is written back to the underlying binary file (like with a shared UNIX mmap). The latter is NOT the case. The binary is mapped into virtual memory: the program can change anything it wants, but those changes are never propagated back to the original binary file.
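The difference between "writable" and "written back" can be observed on a POSIX system with a private mapping. The sketch below is only an illustration of that idea, not what a program loader literally does; the function name and the temporary file template are made up for the example. It maps a small file with `MAP_PRIVATE`, modifies the mapping, and checks that the file contents did not change.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the file is unchanged after writing to a private mapping,
 * 0 if it changed, -1 on any error. Hypothetical helper for illustration. */
int private_mapping_leaves_file_unchanged(void)
{
    char path[] = "/tmp/private_map_XXXXXX";  /* arbitrary template */
    const char original[] = "data";
    char reread[sizeof original] = {0};
    char *map;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file stays alive until fd is closed */

    if (write(fd, original, sizeof original) == (ssize_t)sizeof original) {
        /* MAP_PRIVATE: writable, but changes stay in this process. */
        map = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
        if (map != MAP_FAILED) {
            map[0] = 'D';  /* modifies only the in-memory copy */

            if (pread(fd, reread, sizeof original, 0) ==
                (ssize_t)sizeof original)
                result = (memcmp(reread, original, sizeof original) == 0);

            munmap(map, sizeof original);
        }
    }

    close(fd);
    return result;
}
```

With `MAP_SHARED` instead of `MAP_PRIVATE`, the modification would be carried through to the file, which is the "write-back" behaviour that does not apply to a loaded binary.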

JamesM wrote: A header in the ELF file tells the loader where the .bss section should be placed, and how long it should be. The loader then memset's virtual memory from that address for that length to zero (0x00).
So, for a kernel linker script, is it fine to place the .bss section inside the .data section, like this (if we make no assumptions about the initial value of global variables in the code):
.data :
{
        *(.data) *(.bss) *(COMMON)
}
or MUST the linker script separate the two sections, as follows:
.data :
{
        *(.data) *(COMMON)
}
.bss :
{
        _bss_begin = .;
        *(.bss)
        _bss_end = .;
}
and then zero it in the startup code with memset(&_bss_begin, 0, (&_bss_end - &_bss_begin))?

OK, so all this really depends on *what* is loading your kernel. If by 'kernel' you mean a 'second stage bootloader' or some such low-level file, then one must not make any assumptions about the initial value of global data, and either method should suffice. However, if you mean a 'proper' kernel, then any bootloader worth its salt should zero the .bss for you.

That means it needs to be able to find the .bss, so the first method won't work (it reads like this: for every object file I am linking together, the .data and .bss sections of those object files should be bundled together and labelled '.data'). The problem with that is that you lose the line between initialised and uninitialised data (remember that the binary data associated with .data exists in the ELF file, whereas no such data exists for .bss, as it is assumed to be zero-set).

The second script (with a separate .bss) would work; however, the memset would be redundant: it is the file loader's job to zero the .bss, not the loaded file's.
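For the case where nothing zeroes the .bss before entry (the 'second stage bootloader' situation mentioned above), the startup code could clear it itself. The following is a minimal sketch, assuming the `_bss_begin` and `_bss_end` symbols from the second linker script; `zero_bss` is a hypothetical helper name. Declaring the symbols as `char` arrays makes the pointer difference a byte count, which matters for the length passed to memset in the original question.

```c
/* Zero the bytes in [begin, end). A plain loop is used instead of
 * memset(), on the assumption that a C library may not be available
 * this early in a freestanding environment. */
void zero_bss(char *begin, char *end)
{
    for (char *p = begin; p < end; ++p)
        *p = 0;
}

/* In the kernel's startup path, the linker-script symbols would be
 * declared as arrays so that their names evaluate to the addresses
 * assigned in the script:
 *
 *     extern char _bss_begin[], _bss_end[];
 *     zero_bss(_bss_begin, _bss_end);
 */
```

If the loader already clears the .bss, calling this is harmless but unnecessary.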

Hope this helps,

James

nicko · Post by **nicko** » Wed Mar 05, 2008 11:48 am

Hi,

JamesM wrote: Reading through your text, it's possible that you're confusing mapping read-write with mapping write-back, where any data that is modified in memory is written back to the underlying binary file (like with a shared UNIX mmap). The latter is NOT the case.

Yes, that is it! Thanks for the clarification.

For .bss, I will indeed use the second method.

Thanks for everything.

--
nicko