Double check what I am about to say, but here are my 5 cents.
First, regarding your specific issue: newlib\libc\misc\init.c defines a function, __libc_init_array, that you probably need to call before main. There are also several implementations of the entry point in crt0.c files in several locations (for different platforms). If nothing else, you can use them as a reference when writing your own. I am just sharing observations; I don't know whether they are relevant.
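To give an idea of what a routine like __libc_init_array boils down to, here is a minimal sketch of walking a table of initializer pointers. It is not newlib's code: the function names and the demo hooks are mine, and the table bounds are passed as parameters so the sketch can run in an ordinary hosted program. In a real startup file the bounds would normally be linker-provided symbols (GNU ld's default scripts define __init_array_start and __init_array_end, for example), which is an assumption you should check against your own linker script.

```c
#include <stddef.h>

/* Type of one entry in .preinit_array, .init_array or .fini_array. */
typedef void (*init_fn)(void);

/* Call every pointer in [start, end) in ascending order (initializers). */
void run_array_forward(init_fn *start, init_fn *end)
{
    size_t count = (size_t)(end - start);
    for (size_t i = 0; i < count; i++)
        start[i]();
}

/* Call every pointer in [start, end) in descending order (finalizers). */
void run_array_reverse(init_fn *start, init_fn *end)
{
    size_t count = (size_t)(end - start);
    for (size_t i = count; i > 0; i--)
        start[i - 1]();
}

/* Hypothetical hooks, present only to make the call order observable. */
int demo_order[4];
int demo_count;
void demo_first(void)  { demo_order[demo_count++] = 1; }
void demo_second(void) { demo_order[demo_count++] = 2; }
init_fn demo_table[] = { demo_first, demo_second };
```

Calling run_array_forward(demo_table, demo_table + 2) runs demo_first and then demo_second; run_array_reverse runs them in the opposite order, which is the order you want for finalizers.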
Generally speaking, gcc generates ELF files according to the System V ABI. This depends on the output format, the compilation and linking options, and the toolchain build options, but it is the default behavior, and even the alternative behaviors emulate it to some extent. There are also some Linux-specific conventions (such as those in the Linux Standard Base) that gcc tries to conform to and even ports to non-conforming or generic platforms.
Ignoring any conformance, the minimum you need to provide is a _start routine that runs the code in the .init section (the _init function) before calling main and registers the code in the .fini section (the _fini function) with atexit. Furthermore, for .init and .fini to work, you must provide their prologues and epilogues in crti.o and crtn.o. gcc orders those files on the linker command line so that they sandwich the rest of the _init and _fini instructions and form complete function bodies (assuming the prologues and epilogues are correct). Your implementation of _start goes in crt1.o, again so that the linker picks it up automatically when the gcc front-end requests it. This is all in the wiki, I believe.
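The order of operations described above can be sketched as follows. This is a hosted model, not a real entry point: a genuine _start is usually a few lines of assembly that set up the stack and then hand over to C, and it ends by passing main's return value to exit. The function name start_sequence and the demo_* stand-ins are hypothetical; the hooks are passed as parameters (instead of calling _init and _fini directly) so that the sketch compiles and runs as a normal program.

```c
#include <stdlib.h>

typedef void (*hook_fn)(void);
typedef int (*main_fn)(int argc, char **argv);

/* Models the body of _start. Returns main's status instead of calling
 * exit() so it can be driven from an ordinary program. */
int start_sequence(hook_fn init, hook_fn fini, main_fn app_main,
                   int argc, char **argv)
{
    init();                       /* 1. run .init (the _init function)        */
    if (atexit(fini) != 0)        /* 2. have .fini (_fini) run at exit        */
        return EXIT_FAILURE;
    return app_main(argc, argv);  /* 3. call main; a real _start would pass
                                        this value to exit()                  */
}

/* Hypothetical stand-ins that record the order in which they are called. */
int demo_trace[8];
int demo_trace_len;
void demo_init(void) { demo_trace[demo_trace_len++] = 'i'; }
void demo_fini(void) { demo_trace[demo_trace_len++] = 'f'; }
int demo_main(int argc, char **argv)
{
    (void)argv;
    demo_trace[demo_trace_len++] = 'm';
    return argc;
}
```

With these stand-ins, start_sequence(demo_init, demo_fini, demo_main, argc, argv) records 'i' and then 'm' before it returns; demo_fini only runs when the process exits, because atexit defers it.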
However, if your compiler is built with --enable-initfini-array (check the output of gcc -v), you also need to iterate over the function pointers in the .init_array and .fini_array sections, right after calling .init and right before calling .fini respectively. If you want to support dynamic loading, you need to use the tags from the PT_DYNAMIC segment (i.e. use DT_PREINIT_ARRAY, DT_INIT_ARRAY, DT_INIT, DT_FINI_ARRAY and DT_FINI inside your loader). If you want to be System V ABI compliant, your loader needs to initialize the pre-loaded shared objects of the executable using those tags in the specified dependency-based order, skip the initialization and termination routines of the executable itself, and then call _start with rdx (on x86-64) pointing to the finalization code.
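For the loader side, here is a minimal sketch of looking up those tags in a dynamic table. It assumes a 64-bit ELF and the Linux <elf.h> header for Elf64_Dyn and the DT_* constants; the helper names are mine. It also relies on the companion DT_INIT_ARRAYSZ / DT_FINI_ARRAYSZ tags, which give the array size in bytes. Locating the PT_DYNAMIC segment and relocating the addresses is left out.

```c
#include <elf.h>     /* Elf64_Dyn and DT_* constants (Linux/glibc header) */
#include <stddef.h>

/* Return the entry carrying the given tag, or NULL if the table
 * (terminated by a DT_NULL entry) does not contain one. */
const Elf64_Dyn *find_dynamic_entry(const Elf64_Dyn *table, Elf64_Sxword tag)
{
    for (const Elf64_Dyn *d = table; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == tag)
            return d;
    }
    return NULL;
}

/* Number of function pointers in an init/fini array, derived from its
 * *_ARRAYSZ entry (a size in bytes). Returns 0 if the tag is absent. */
size_t dynamic_array_count(const Elf64_Dyn *table, Elf64_Sxword size_tag)
{
    const Elf64_Dyn *size = find_dynamic_entry(table, size_tag);
    return size ? (size_t)(size->d_un.d_val / sizeof(Elf64_Addr)) : 0;
}
```

A loader would typically read the array address from the d_un.d_ptr field of the DT_INIT_ARRAY entry, add the object's load bias, and then call dynamic_array_count(table, DT_INIT_ARRAYSZ) pointers starting there.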
If you want to support certain language facilities, such as .ctors and .dtors (deprecated, and possibly never used by your compiler), exception handling, or transactional memory, you need to add an entry to either .init (and .fini) or .init_array (and .fini_array) and perform the necessary setup and cleanup there. You can do that using crtbegin.o and crtend.o (although whether you use atexit or crtend.o depends on the facility).
In other words, there are platform-specific and compiler-specific issues. You need to inspect your ELF output and your compiler's build configuration to decide what you are, or will be, dealing with.
Some interesting references (some duplicate the wiki):
System V Application Binary Interface
Linux Standard Base (LSB), etc.
Linux x86 Program Start Up
Acronyms relevant to Executable and Linkable Format (ELF) (Search for "What do _start and __libc_start_main do")
The ELF format - how programs look from the inside
How Initialization Functions Are Handled
Mini FAQ about the misc libc/gcc crt files