Yesterday I prepared the v0.4 release of PDCLib. My _PDCLIB_malloc() finally worked to my satisfaction, as the test drivers proved. So I removed the _PDCLIB_ wart (which I had added during development to be safe from premature coredumps), and with a last test driver run, prepared to do the packaging.
[tt]SIGSEGV: Error 11 (core dumped).[/tt]
Bugger. So I added a couple of [tt]printf()[/tt] calls to find out what's happening, one of them on the first line of [tt]main()[/tt] and one on the first line of [tt]malloc()[/tt].
[tt]Bus error (core dumped).[/tt]
Uh. Obviously the startup code already calls [tt]malloc()[/tt], in some way that makes it hiccup - before [tt]printf()[/tt] is functional. Now an interesting question arose: How do you debug a program that dies before stdout is available?
After some time of confused hacking around, Candy sparked an idea that, in the end, led to the following code snippet which might be useful elsewhere, too:
Code:
/* global */
char doodle[ 65535 ] = "doodle: ";
size_t doodleptr = 8; /* index of the next free byte, just past "doodle: " */
/* in code */
doodleptr += sprintf( &doodle[ doodleptr ], "message" );
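For reuse elsewhere, the same idea can be wrapped in a small helper that cannot run past the end of the buffer. This is only a sketch: the names [tt]trace_buf[/tt], [tt]trace_len[/tt] and [tt]trace()[/tt] are made up for illustration, and it assumes [tt]vsnprintf()[/tt] itself does not need a working [tt]malloc()[/tt] on your platform.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names; the buffer plays the role of doodle[] above. */
static char trace_buf[ 4096 ] = "trace: ";
static size_t trace_len = 7; /* strlen( "trace: " ) */

/* Append a formatted message to the in-memory buffer.
   Output that does not fit is truncated instead of overflowing. */
static void trace( const char * fmt, ... )
{
    va_list ap;
    int n;
    size_t room = sizeof( trace_buf ) - trace_len; /* always >= 1 */

    va_start( ap, fmt );
    n = vsnprintf( trace_buf + trace_len, room, fmt, ap );
    va_end( ap );

    if ( n > 0 )
    {
        /* vsnprintf() returns the untruncated length; clamp it. */
        trace_len += ( (size_t)n < room ) ? (size_t)n : room - 1;
    }
}
```

A call such as [tt]trace( "malloc(%lu) ", (unsigned long)size )[/tt] then leaves its text in [tt]trace_buf[/tt], which can be read from the core file or from a debugger after the crash.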
Never mind who or what is actually calling [tt]malloc()[/tt] this early - one offender is [tt]atexit()[/tt], but there are others and it doesn't really matter.
What does matter is that the end-of-heap pointer, at program startup, is not page-aligned. Using the system [tt]malloc()[/tt], it is page-aligned once [tt]main()[/tt] is called, but apparently that is courtesy of the system [tt]malloc()[/tt], not by some other means.
Now, my page-allocating code assumed a page-aligned end-of-heap. No sweat, I added some code to take care of this: Take end-of-heap ( using [tt]sbrk(0)[/tt] ), cast to intptr_t, and use the modulus operator to find out by how much end-of-heap should be adjusted to make it page-aligned.
The problem is that the call doing the page-alignment ( [tt]sbrk( unaligned )[/tt], with unaligned being anything between 0x1 and 0xfff for a PAGESIZE of 4096 ) is where my code coredumps.
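For reference, the adjustment arithmetic described above can be sketched as follows. This is not the actual PDCLib code: the helper name [tt]padding_to_page()[/tt] is hypothetical, and it uses [tt]uintptr_t[/tt] rather than [tt]intptr_t[/tt] so that the modulus is never taken of a negative value. Note that the modulus alone gives the distance past the previous page boundary; the distance to the next one is the page size minus that remainder.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes to add to addr so that it
   lands on the next multiple of page_size; 0 if already aligned. */
static uintptr_t padding_to_page( uintptr_t addr, uintptr_t page_size )
{
    uintptr_t offset = addr % page_size; /* bytes past the previous boundary */

    return offset ? page_size - offset : 0;
}
```

Assuming a page size of 4096, the aligning call would then look something like [tt]sbrk( (intptr_t)padding_to_page( (uintptr_t)sbrk( 0 ), 4096 ) )[/tt].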
I cannot find anything in the manpage for [tt]sbrk()[/tt] that says you cannot use small values, or that it cannot be used during early startup. ( The [tt]sbrk( 0 )[/tt] works fine. )
Before I start digging into other libs' source code, does anyone here have an idea what might be going wrong?