
A debug session, darker than night...

Posted: Mon Feb 06, 2006 2:38 am
by Solar
I would post this to "General Programming", but I doubt anyone outside OS development has ever dug this deep. A nice debugging hint and a question to the Unix-savvy included.

Yesterday I prepared the v0.4 release of PDCLib. My _PDCLIB_malloc() finally worked to my satisfaction, as the test drivers proved. So I removed the _PDCLIB_ wart (which I had added during development to be safe from premature coredumps), and with a last test driver run, prepared to do the packaging.

[tt]SIGSEGV: Error 11 (core dumped).[/tt]

Bugger. So I added a couple of printf's to find out what's happening, one of them on the first line of [tt]main()[/tt] and one on the first line of [tt]malloc()[/tt].

[tt]Bus error (core dumped).[/tt]

Uh. Obviously the startup code already calls malloc(), in some way that makes it hiccup - before [tt]printf()[/tt] is functional. Now an interesting question arose: How do you debug a program that dies before stdout is available?

After some time of confused hacking around, Candy sparked an idea that, in the end, led to the following code snippet which might be useful elsewhere, too:

/* global */
char doodle[ 65535 ] = "doodle: ";
size_t doodleptr = 8;

/* in code */
doodleptr += sprintf( &doodle[ doodleptr ], "message" );

Voila, you get your output printed to memory, where it will be included in a coredump and can be retrieved with something like [tt]hexdump -C[/tt]. We're back on the trail of this bugger...
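A bounded variant of the snippet above might look like the following. This is only a sketch: the helper name doodle_log and the DOODLE_SIZE macro are made up for illustration, and it assumes the host's vsnprintf() is usable this early and does not itself call the malloc() under test.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical size macro; 65535 is the value used in the post. */
#define DOODLE_SIZE 65535

/* Static storage, so the text ends up inside the core file. */
char doodle[ DOODLE_SIZE ] = "doodle: ";
size_t doodleptr = 8; /* length of the "doodle: " prefix */

/* Hypothetical helper: append a formatted message to the buffer.
   Output that does not fit is truncated instead of overrunning. */
void doodle_log( const char * format, ... )
{
    size_t room = DOODLE_SIZE - doodleptr;
    va_list args;
    int written;

    if ( room <= 1 )
    {
        return; /* buffer is full, drop the message */
    }

    va_start( args, format );
    written = vsnprintf( &doodle[ doodleptr ], room, format, args );
    va_end( args );

    if ( written < 0 )
    {
        return; /* formatting error, nothing reliable was stored */
    }

    /* vsnprintf() reports the untruncated length; advance only by
       the number of characters that were actually stored. */
    doodleptr += ( (size_t)written < room ) ? (size_t)written : room - 1;
}
```

A call such as doodle_log( "heap-end: %p\n", ptr ) then replaces the bare sprintf() line, and the text can still be fished out of the core file afterwards.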

Never mind who or what is actually calling [tt]malloc()[/tt] this early - one offender is [tt]atexit()[/tt], but there are others and it doesn't really matter.

What does matter is that the end-of-heap pointer, at program startup, is not page-aligned. Using the system [tt]malloc()[/tt], it is page-aligned once [tt]main()[/tt] is called, but apparently that is courtesy of the system [tt]malloc()[/tt], not by some other means.

Now, my page-allocating code assumed a page-aligned end-of-heap. No sweat, I added some code to take care of this: Take end-of-heap ( using [tt]sbrk(0)[/tt] ), cast to intptr_t, and use the modulus operator to find out by how much end-of-heap should be adjusted to make it page-aligned.
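The modulus calculation described above can be sketched as a small function. The function name is hypothetical, PAGESIZE is hard-coded to the 4096 used in this thread, and uintptr_t is used instead of intptr_t so the modulus never sees a negative value; that last choice is an assumption of this sketch, not something taken from the actual PDCLib code.

```c
#include <stdint.h>

#define PAGESIZE 4096u /* assumed page size, as discussed in the thread */

/* Hypothetical helper: how many bytes must be added to heap_end to
   reach the next page boundary. Returns 0 if it is already aligned. */
uintptr_t bytes_to_page_boundary( uintptr_t heap_end )
{
    uintptr_t remainder = heap_end % PAGESIZE;

    return ( remainder == 0 ) ? 0 : PAGESIZE - remainder;
}
```

Fed with the result of sbrk( 0 ) cast to uintptr_t, this yields either 0 or a value between 0x1 and 0xfff, which would then be passed to a second sbrk() call.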

The problem is that the call doing the page-alignment ( [tt]sbrk( unaligned )[/tt], with unaligned being anything between 0x1 and 0xfff for a PAGESIZE of 4096 ) is where my code coredumps.

I cannot find anything in the manpage for [tt]sbrk()[/tt] that says you cannot use small values, or that it cannot be used during early startup. ( The [tt]sbrk( 0 )[/tt] works fine. )

Before I start digging into other lib's source code, has anyone here an idea what might be going wrong?

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 3:20 am
by distantvoices
Just a question: how do you initialize malloc(). Is there a chance that there *are* things happening (say writing some pointers, moving some stuff etc) ere the first malloc() call takes place - and ere it reaches sbrk() or morecore() or mmap()?

to me this smells like a not present vmm entity.

I suppose you make an sbrk() before you do the page alignment stuff.

alas

welcome to debug sessions darker than night! mwaaahahahahahaaa! Be my guest!

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 3:29 am
by Solar
beyond infinity wrote: Just a question: how do you initialize malloc().
Not at all. I have a static struct that holds two memnode_t pointers - one to the first and one to the last of the "free nodes" list - that is initialized to hold NULL. (Verified.)

The first call to malloc() thus finds an empty "free nodes" list, and calls the page allocator to demand one or more pages from the kernel. (Verified.)

The page allocator keeps a static copy of heap-end, which is also initialized to NULL. (Verified.) It thus realizes it is called the first time, calls sbrk(0) to determine current heap-end, does the calculation to find out whether heap-end is page aligned (it is not), determines the offset to the next page boundary, and calls sbrk() with that offset again. (Verified.)

That call to sbrk() coredumps. (Verified.)
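For readers following along, the first-call logic described in this post could be sketched roughly like this. It is not the PDCLib source: all names are hypothetical, and the heap-growing primitive is passed in as a function pointer with sbrk()'s contract, so that a stand-in (fake_sbrk below, working on made-up addresses that are never dereferenced) can exercise the logic in a test driver without touching the real heap.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGESIZE 4096u            /* assumed page size */
#define BRK_FAILED ( (void *)-1 ) /* error return of sbrk() */

/* Same contract as sbrk(): move end-of-heap by `increment` bytes and
   return the previous end-of-heap, or (void *)-1 on failure. */
typedef void * (*grow_fn)( intptr_t increment );

/* 0 means "not initialised yet", like the static NULL in the post. */
static uintptr_t heap_end = 0;

/* Hypothetical page allocator: returns the address of `pages` fresh
   pages, or NULL on failure. Assumes nobody else moves end-of-heap. */
void * alloc_pages( grow_fn grow, size_t pages )
{
    void * start;

    if ( heap_end == 0 )
    {
        /* First call: query end-of-heap, then pad to a page boundary. */
        void * current = grow( 0 );
        uintptr_t misalign;

        if ( current == BRK_FAILED )
        {
            return NULL;
        }

        misalign = (uintptr_t)current % PAGESIZE;

        if ( misalign != 0
             && grow( (intptr_t)( PAGESIZE - misalign ) ) == BRK_FAILED )
        {
            return NULL;
        }

        heap_end = (uintptr_t)current + ( misalign ? PAGESIZE - misalign : 0 );
    }

    start = grow( (intptr_t)( pages * PAGESIZE ) );

    if ( start == BRK_FAILED )
    {
        return NULL;
    }

    heap_end = (uintptr_t)start + pages * PAGESIZE;
    return start;
}

/* Stand-in for sbrk() with an unaligned start, for test drivers only.
   The addresses are simulated and must never be dereferenced. */
static uintptr_t fake_break = 0x0804a123u;

void * fake_sbrk( intptr_t increment )
{
    uintptr_t previous = fake_break;

    fake_break += (uintptr_t)increment;
    return (void *)previous;
}
```

In real use the first argument would be sbrk itself; with the stand-in, the first request pads 0xedd bytes and then returns a page-aligned address.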

I'll post the source and doodle output to my Wiki scratchbook ASAP so you can pick on my pre-release code. ;)

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 4:07 am
by Solar
::) :-\ ::) :-[ ::)

Forget what I wrote. While stripping down the code to the bare necessities prior to posting it here, I think I found what I've been doing wrong...

Stupid stupid Dobbie, got to stick my ears in the oven door again...

[me=Solar]slaps his forehead. Hard. Twice.[/me]

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 4:15 am
by Candy
Solar wrote: Forget what I wrote. While stripping down the code to the bare necessities prior to posting it here, I think I found what I've been doing wrong...
Well, come on, what is it? I'm quite wondering what it is :)...

Most importantly, is it in your code or "their" code?

PS: this is the kind of method I use for debugging my os... plus the bochs debugger... I refuse to add screen output except for in a module where it can print using the proper interfaces. Busy debugging module loader for a while now... :) (that was a joke, didn't have a real day off for osdev in the past 2 years or so).

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 4:20 am
by Solar
Well... I'm ashamed to admit this, but...

ahem...

In the [tt]sprintf()[/tt] statement mentioned above...

...well, just don't forget the '&', ok? :-\

It still bugs out, but that's a different area of the code now.

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 4:38 am
by Candy
Solar wrote: Well... I'm ashamed to admit this, but...

ahem...

In the [tt]sprintf()[/tt] statement mentioned above...

...well, just don't forget the '&', ok? :-\

It still bugs out, but that's a different area of the code now.
hm...

Needn't remind you of that OTHER thread in which you implied to me that I had to dereference a pointer... now it's you not referencing an object... :).

Did you try "-W -Wall -Werror -pedantic -pedantic-errors -std=c99" ? That's my default compiler setting and it catches all the "easy" errors I keep making out of laziness (pointer dereferencing, sloppy casting, not declaring i etc.). When I need to get something done I turn off -pedantic-errors since it commonly bugs about nonimportant stuff it can easily fix for me (variable order in constructor, stuff like that).

Gcc 4+ gives loads more warnings, even on stuff I know to be correct. Still gotta figure out how to code that in accordance with it.

[edit] Second note: in newer gcc's (4.0+, possibly newer 3.3), -W is renamed to -Wextra, due to the ambiguity of -W. [/edit]

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 4:43 am
by Solar
-Wall is on, but it seems I was not paranoid enough. ;)

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 7:58 am
by durand
There is a utility on Solaris, also available on Linux, called pstack. It takes the core dump and prints stack information (the function call stack). You can use this to see exactly where the signal/exception occurred. I use it regularly. Perhaps you can find something similar.

Re:A debug session, darker than night...

Posted: Mon Feb 06, 2006 8:11 am
by Solar
gdb can do the same for you.

My test drivers are per function, i.e. there's never more than one function under test. (Well OK, some of them depend on each other, but usually the "lower" function has been already tested by then.) Which means I usually have a pretty good idea where it broke - just the "why" is sometimes a bit elusive, especially when twiddling with early-startup etc... ;)

Anyway, the problem has been solved already. (Stupid, stupid /me...) I guess I'll even add an optimization or two tonight before releasing v0.4 tonight or tomorrow.

Re:A debug session, darker than night...

Posted: Wed Feb 08, 2006 8:21 am
by srg
Interesting, so your end of heap must be page aligned. Is this the recommended course of action? I was planning on it being any arbitrary address, and then the page allocator would work out whether a page, and how many, should be allocated. Assuming the marker is page aligned would make the program simpler (keeping with the KISS philosophy), would it lead to an easier design with less bugs?

thanks

Re:A debug session, darker than night...

Posted: Wed Feb 08, 2006 9:03 am
by Solar
srg wrote: Interesting, so your end of heap must be page aligned.
No, it doesn't have to. I just thought it would be nice if the hosting kernel knew that memory allocations only ever happen in page-sized blocks.

That the end-of-heap is not page-aligned at program startup was somewhat of a spoiler, of course. As I was eager to get v0.4 released, I didn't make an extra effort to separate that first (non-aligned, non-page-sized) request from the others, but could of course do so in a later release.

As I am fully intending to lean on dlmalloc() for the final release of PDCLib, however, I don't think I will put much effort into the malloc() as it stands right now. I might change my mind after looking into dlmalloc(), but other things are more important right now.
Assuming the marker is page aligned would make the program simpler (keeping with the KISS philosophy), would it lead to an easier design with less bugs?
Not sure what you mean with "program". But if the kernel knew that all memory requests would be multiples of the system page size, it's easy to imagine how it would make kernel-space memory management easier. For one, you don't have to worry about partial page allocations from user-space.

Re:A debug session, darker than night...

Posted: Wed Feb 08, 2006 9:31 am
by srg
Solar wrote:
srg wrote: Interesting, so your end of heap must be page aligned.
No, it doesn't have to. I just thought it would be nice if the hosting kernel knew that memory allocations only ever happen in page-sized blocks.

That the end-of-heap is not page-aligned at program startup was somewhat of a spoiler, of course. As I was eager to get v0.4 released, I didn't make an extra effort to separate that first (non-aligned, non-page-sized) request from the others, but could of course do so in a later release.

As I am fully intending to lean on dlmalloc() for the final release of PDCLib, however, I don't think I will put much effort into the malloc() as it stands right now. I might change my mind after looking into dlmalloc(), but other things are more important right now.
Assuming the marker is page aligned would make the program simpler (keeping with the KISS philosophy), would it lead to an easier design with less bugs?
Not sure what you mean with "program". But if the kernel knew that all memory requests would be multiples of the system page size, it's easy to imagine how it would make kernel-space memory management easier. For one, you don't have to worry about partial page allocations from user-space.
This could be a slight misunderstanding of how dlmalloc() works, but how would you make sure that its requests are multiples of the system page size? This is one reason I'm going to have a look at bget as well: for allocating memory from the kernel, you have to do it yourself and tell it how much extra memory it now has to play with.

Re:A debug session, darker than night...

Posted: Wed Feb 08, 2006 4:13 pm
by Rob
I'm not a (C) lib guru or anything. But I don't see why you couldn't either have the func (dl)malloc(number_of_pages) instead of number of bytes. Or you can round the number of bytes up to the nearest full page. Or you can return an error if it isn't a multiple of the pagesize?

Re:A debug session, darker than night...

Posted: Thu Feb 09, 2006 1:25 am
by Solar
@ srg, Rob:

Huh?

Erm...

OK, short discourse into user-space memory management.

malloc() is defined by the C standard. You ask it for X bytes, you get X bytes, period. You can't make malloc() work for X pages instead and expect C coders to live with that.

But as malloc() is a user-space function itself, it has to rely on the kernel to provide the memory it passes out to an application. sbrk(), mmap(), doesn't really matter. The thing is, such kernel calls are expensive. You don't want to make a system call every time someone asks for 8 more bytes of memory. Thus, the main purpose of malloc() is that of a buffer / cache between the kernel memory management and the user space application.
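To make the "buffer / cache" idea concrete, here is a toy sketch. It is not dlmalloc() and not the PDCLib malloc(): every name is hypothetical, the "kernel" is a static arena standing in for sbrk() / mmap(), and there is no free() or reuse at all. It only shows byte-sized requests being served out of page-sized chunks, so that many small allocations cost a single simulated kernel call.

```c
#include <stddef.h>

#define PAGESIZE    4096u /* assumed page size */
#define ARENA_PAGES 16u   /* size of the simulated kernel memory */

/* Simulated kernel memory; the union only forces a sane alignment. */
static union { char bytes[ ARENA_PAGES * PAGESIZE ]; long double align; } arena;
static size_t arena_used = 0;

unsigned kernel_calls = 0; /* how often the "kernel" was asked */

/* Stand-in for sbrk() / mmap(): hands out whole pages only. */
static void * request_pages( size_t pages )
{
    size_t bytes = pages * PAGESIZE;
    void * result;

    if ( bytes > sizeof( arena.bytes ) - arena_used )
    {
        return NULL; /* simulated out-of-memory */
    }

    result = &arena.bytes[ arena_used ];
    arena_used += bytes;
    ++kernel_calls;
    return result;
}

/* Number of whole pages needed to hold `bytes` bytes. */
size_t pages_for( size_t bytes )
{
    return ( bytes + PAGESIZE - 1 ) / PAGESIZE;
}

static char * pool = NULL;   /* unused rest of the current chunk */
static size_t pool_left = 0;

/* Toy allocator: callers ask for bytes, the "kernel" only sees pages. */
void * toy_malloc( size_t size )
{
    void * result;

    if ( size == 0 || size > sizeof( arena.bytes ) )
    {
        return NULL;
    }

    /* Round to a multiple of 16 so chunks keep the arena's alignment. */
    size = ( size + 15u ) & ~(size_t)15u;

    if ( size > pool_left )
    {
        size_t pages = pages_for( size );
        char * fresh = request_pages( pages );

        if ( fresh == NULL )
        {
            return NULL;
        }

        pool = fresh; /* the rest of the old chunk is simply abandoned */
        pool_left = pages * PAGESIZE;
    }

    result = pool;
    pool += size;
    pool_left -= size;
    return result;
}
```

Two consecutive toy_malloc( 8 ) calls are served from the same page, so kernel_calls stays at 1. A real malloc() additionally needs free(), reuse of released blocks, and far more careful bookkeeping.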

dlmalloc() is Doug Lea's implementation of malloc(). (Which makes it fundamentally different from BGET, which doesn't mimic the malloc() defined by the C standard.) If I adapt dlmalloc() for PDCLib, the resulting function will still be called malloc(), not dlmalloc().

And I don't intend to just take dlmalloc(), rename it, and make it compile. If I adapt it to PDCLib, that means a complete refactoring of the thing. I want to take it apart, understand it, and reassemble it in a way that is consistent with the rest of PDCLib.

And in the process, I am quite sure I can make it so that even dlmalloc() does its memory requests in multiples of page size.