Broken Newlib Program

Posted: Sat Apr 21, 2018 5:59 pm
by tay10r
For the last few months, I've been trying to fix the newlib port for BareMetal OS.
It has been very frustrating.

I was hoping to get some troubleshooting suggestions.

Currently, the newlib test program is loaded into the address 0xffff800000000000.
This may be significant or it may not be. I had to add the option '-mcmodel=large' to GCC for the program sources (including newlib.)
I've tested the same program running at 16 MiB (0x1000000), and the same problem occurs.

When I stepped through the program in GDB, I found that it crashes when the reentrant structure is dereferenced.
The reentrant structure is declared as a static global variable, and has the address 0xffff80000001b340. It is referenced by the
function __getreent(), and by the variable _impure_ptr.

I looked at the disassembly where the structure gets dereferenced, and the address that is returned appears to be a wild pointer.
I found out that the wild pointer is actually the first eight bytes stored at address 0x00, which tells me there might be a null pointer somewhere.

There are other programs in C and assembly that don't use newlib, and they load and execute just fine.
I noticed that the only difference between the newlib program and the programs that work is that the newlib program has a .got section, and the other ones don't.
But I thought that statically linked executables that are not position independent don't need to be modified before they are loaded (meaning no relocations.)
I verified that there are no relocations with this command:

Code: Select all

readelf --relocs test.app
I wrote a program that writes to and reads from the same point in memory, 0xffff80000001b340, and it works fine.
I did that just in case that the memory was not properly mapped there, but it looks like it is mapped just fine.

I'm really just looking for troubleshooting tips here; I don't want to ask anyone to debug the project.

But in case you want to look at the code, here's the link to the newlib port.

https://github.com/ReturnInfinity/BareM ... e-addr-fix

Note that it's in the large-addr-fix branch (although I no longer think the large address is the problem.)

Thanks in advance for anyone who has read this far!

Edit: I forgot to mention that programs currently use the same stack as the kernel.

Re: Broken Newlib Program

Posted: Sat Apr 21, 2018 10:00 pm
by simeonz
Check out the "--use-dynamic" or "-D" option. The elf format has separate information for the linker and the loader - linker symbols and relocations and loader symbols and relocations. The linker relocations and symbols are sometimes kept for debugging purposes and such, but have no functional meaning for the OS loader after the linking. Check out "readelf -r -D" and "readelf -s -D".

Re: Broken Newlib Program

Posted: Sun Apr 22, 2018 8:07 am
by tay10r
simeonz wrote:Check out the "--use-dynamic" or "-D" option. The elf format has separate information for the linker and the loader - linker symbols and relocations and loader symbols and relocations. The linker relocations and symbols are sometimes kept for debugging purposes and such, but have no functional meaning for the OS loader after the linking. Check out "readelf -r -D" and "readelf -s -D".
Thanks for that info. That's a handy option for the readelf utility!

Re: Broken Newlib Program

Posted: Sun Apr 22, 2018 10:58 am
by simeonz
tay10r wrote:Thanks for that info. That's a handy option for the readelf utility!
Is your test.app missing dynamic relocations as well?

Re: Broken Newlib Program

Posted: Sun Apr 22, 2018 11:11 am
by tay10r
simeonz wrote:
tay10r wrote:Thanks for that info. That's a handy option for the readelf utility!
Is your test.app missing dynamic relocations as well?
No, it does not have dynamic relocations. No interpreter either, and no shared libraries indicated by "ldd".

Re: Broken Newlib Program

Posted: Sun Apr 22, 2018 11:16 am
by tay10r
The program used to be loaded as a flat binary. I have since added ELF support, but the exact same memory reference still crashes the system.

Re: Broken Newlib Program

Posted: Mon Apr 23, 2018 11:36 am
by simeonz
I have a hard time imagining what the problem might be. I will tell you what I would do. I am not sure if any of that would be useful, but these are the things that I would try first. If you provide the elf (not the binary) I will have a look into it.

First, use the unstripped libraries for a while. You can use "nm -l" to find the source file in which _impure_ptr was defined, under the current newlib configuration. (Say impure.c.) Add -save-temps in the option list of gcc for newlib in order to acquire the preprocessed and assembly files. Check out the declaration of _impure_ptr in the preprocessed file, to be sure that no funny attributes are specified there. Inspect the intermediate file (e.g. impure.o) with readelf and determine if it contains some notable relocations. Finally, using readelf -s and hexdump or similar utility, you can dump the data in the location of _impure_ptr in the final executable.

Re: Broken Newlib Program

Posted: Mon Apr 23, 2018 2:24 pm
by tay10r
simeonz wrote:I have a hard time imagining what the problem might be. I will tell you what I would do. I am not sure if any of that would be useful, but these are the things that I would try first. If you provide the elf (not the binary) I will have a look into it.

First, use the unstripped libraries for a while. You can use "nm -l" to find the source file in which _impure_ptr was defined, under the current newlib configuration. (Say impure.c.) Add -save-temps in the option list of gcc for newlib in order to acquire the preprocessed and assembly files. Check out the declaration of _impure_ptr in the preprocessed file, to be sure that no funny attributes are specified there. Inspect the intermediate file (e.g. impure.o) with readelf and determine if it contains some notable relocations. Finally, using readelf -s and hexdump or similar utility, you can dump the data in the location of _impure_ptr in the final executable.
I appreciate those tips, I'll do that later this evening!

I've attached two ELF executables. One is called "hello-c.txt" and works properly; the other is called "test.txt" and contains the broken newlib port.
I had to convert them to text files to upload them.
You could probably have figured this out, but you can convert them back using

Code: Select all

xxd -r test.txt >test
and

Code: Select all

xxd -r hello-c.txt >hello-c

Re: Broken Newlib Program

Posted: Mon Apr 23, 2018 4:55 pm
by simeonz
Are you sure that either of those files is related to the newlib port?

They seem to be based on the sources here, and contain exactly the functions - write, output, _start and main. There are minor differences, but mostly - one of them outputs one string and the other two. I mean, newlib doesn't seem detectable in the code...

Re: Broken Newlib Program

Posted: Mon Apr 23, 2018 5:15 pm
by tay10r
simeonz wrote:Are you sure that either of those files is related to the newlib port?

They seem to be based on the sources here, and contain exactly the functions - write, output, _start and main. There are minor differences, but mostly - one of them outputs one string and the other two. I mean, newlib doesn't seem detectable in the code...
I sent you the wrong "test" program. Sorry about that.

This forum has a max file size of 64 KiB, and the compressed text version of the newlib app is 512KB.

Here's a download for it: https://file.io/PwmfLL

Here's a GPG copy: https://file.io/4QvHrO

Re: Broken Newlib Program

Posted: Mon Apr 23, 2018 7:05 pm
by simeonz
On my end, I cannot find a problem with gdb:

Code: Select all

(gdb) disass __getreent
Dump of assembler code for function __getreent:
   0xffff800000001ea9 <+0>:     lea    -0x7(%rip),%rax        # 0xffff800000001ea9 <__getreent>
   0xffff800000001eb0 <+7>:     movabs $0x19477,%r11
   0xffff800000001eba <+17>:    add    %r11,%rax
   0xffff800000001ebd <+20>:    movabs $0xffffffffffffffc8,%rdx
   0xffff800000001ec7 <+30>:    mov    (%rax,%rdx,1),%rax
   0xffff800000001ecb <+34>:    mov    (%rax),%rax
   0xffff800000001ece <+37>:    retq
End of assembler dump.
(gdb) x/a 0xffff800000001ea9 + 0x19477 + 0xffffffffffffffc8
0xffff80000001b2e8:     0xffff80000001ba88 <_impure_ptr>
(gdb) x/a 0xffff80000001ba88
0xffff80000001ba88 <_impure_ptr>:       0xffff80000001b340 <impure_data>
As you can see, the function correctly references _impure_ptr, and the pointer in _impure_ptr is correctly initialized with the address of impure_data. objdump -d/-s confirms this. At least __getreent seems fine.

In my opinion, the reason why you have a .got section (and _impure_ptr is accessed using .got) is that newlib was compiled as relocatable with "-r". This does make sense for static libraries only if they will be used by shared objects, but I have no other explanation. The gcc architecture is such, that when instructed, the compiler generates load-time relocatable .got references in the code, for which the linker eventually creates loader relocations. But even if the linker creates non-relocatable executable, such as in your case, the code remains .got dependent and the .got indirection for data cannot be discarded. There may be a limited number of cases where the linker can patch the instruction opcode of a data reference to bypass .got, but not always. The indirection for functions can always be avoided if the base address is fixed at link time, because the code that calls a trampoline in .got.plt and the code that calls a function directly are not fundamentally different, and the linker can decide which target to use at the last moment. Your .got.plt is correspondingly empty.

Re: Broken Newlib Program

Posted: Mon Apr 23, 2018 7:31 pm
by tay10r
simeonz wrote: In my opinion, the reason why you have a .got section (and _impure_ptr is accessed using .got) is that newlib was compiled as relocatable with "-r". This does make sense for static libraries only if they will be used by shared objects, but I have no other explanation.
Yes, newlib does compile with that. So that makes sense to me.
simeonz wrote: The gcc architecture is such, that when instructed, the compiler generates load-time relocatable .got references in the code, for which the linker eventually creates loader relocations. But even if the linker creates non-relocatable executable, such as in your case, the code remains .got dependent and the .got indirection for data cannot be discarded. There may be a limited number of cases where the linker can patch the instruction opcode of a data reference to bypass .got, but not always. The indirection for functions can always be avoided if the base address is fixed at link time, because the code that calls a trampoline in .got.plt and the code that calls a function directly are not fundamentally different, and the linker can decide which target to use at the last moment. Your .got.plt is correspondingly empty.
You'll have to bear with me because I don't think I'm as well versed in ELF executables as you are.
I thought that if the program is an executable and not relocatable, that there would be no extra work for the loader to do with the GOT.
Is that a bad assumption? Should I be modifying the GOT in some way at load time?
I looked into the section data, and it appears there are addresses in there.
I saw that the GOT section is part of the region indicated by the program header, so it does get loaded into memory.

Re: Broken Newlib Program

Posted: Wed Apr 25, 2018 7:29 am
by simeonz
tay10r wrote:I thought that if the program is an executable and not relocatable, that there would be no extra work for the loader to do with the GOT.
The loader shouldn't have work to do, assuming that the program is statically linked and the linker is instructed to produce non-relocatable binary. That does not mean that the .got, .got.plt, etc, sections will not be used by the code. They will just be fixed with static information and will require no dynamic relocations.

The details are somewhat involved, but here is the gist of it. When you instruct the compiler to generate relocatable code, it assumes the existence of a got table. Similarly, when instructed to generate non-relocatable code, it assumes direct calls and references. In the end, even if the references made by non-relocatable code end up being to relocatable objects, and similarly, the references made by relocatable code end up being to objects at known addresses, the linker will make sure that the compiler's expectations are always met. In particular, relocatable objects that were originally thought to be at a fixed address will be either copy-relocated for data references or trampoline proxied for function references. Similarly, object addresses that were thought to be "floating", but ended up being fixed, will be filled into got, even though they can be resolved at link time.

Again, the details involve the attributes of symbols, copy relocations, multiple types of got relocations, and various linker optimizations. But the gist of it is - no matter how the intermediate object files and static libraries are compiled, if they are statically linked, there should be no dynamic relocations.

Re: Broken Newlib Program

Posted: Wed Apr 25, 2018 5:36 pm
by tay10r
Well, after months of arduous debugging, I found out what the issue was.

The startup code that was responsible for zeroing the BSS section was also zeroing the data section.
This caused a global variable in newlib, _impure_ptr to be zeroed out.

The linker script had caused the symbols __bss_start and _end to overlap the data section.
Here's an excerpt:

Code: Select all

__bss_start = .;        /* assigned outside the output section */
.bss : {
    *(.bss)
}
_end = .;
Here's what it should have been:

Code: Select all

.bss : {
    __bss_start = .;    /* assigned inside the output section */
    *(.bss)
    _end = .;
}
I guess the placement in the brackets makes a difference.

Re: Broken Newlib Program

Posted: Thu Apr 26, 2018 12:06 am
by simeonz
I wouldn't have picked up on that myself. Not by looking at the script, to be honest. (Is it app.ld from GitHub?) The manual kind of warns about this issue, but sends a perplexing message with the roundabout solution it offers. Your approach seems more stable. In case you're wondering, the relevant documentation snippet I am talking about is in the bottom half of this page. Anyway, sorry for not being very helpful. Glad you figured it out on your own.