Probably the worst bug to debug

Bender · Post by **Bender** » Tue Dec 23, 2014 9:27 pm

For the past two days I was debugging my code because it caused a very strange segmentation fault: strange because it did not happen while my own code was executing, but inside exit().
Firstly, this rules out some of the most common cases:

1. Accessing an invalid memory region / NULL pointer / already freed region (then I would've got the segfault much earlier)
2. Double free (same as above)
3. Writing to read-only memory (e.g. the code segment; no, same as above)
etc.

Since the program was crashing during exit, this suggested that I might have caused a stack overrun, by writing to a buffer on the stack with a size larger than the buffer actually is. For the crash to show up only at exit, the overrun must have happened very early in the program. Out of ideas, I went to #osdev, where sortie asked me whether calling "exit(0)" explicitly causes a segfault, and it did. That means this isn't a problem with the stack, since explicitly calling exit doesn't depend on the return address stored on the stack; rather, my 'exit' was somehow crashing. I searched Google for possible causes, one of them being memory and heap corruption. I ran a bunch of tools (valgrind, gdb, electric fence, GCC's built-in address sanitizer, etc.) and none of them told me what was actually causing the segfault, bringing me to the point where I considered that my libc had bugs. GDB was interesting though, as the program would work fine under it in most cases.

Then I updated my glibc: same result. I upgraded my compiler: same result. I even went to the point of upgrading my entire distro and, well, uh, same result.

Since a lot of people on #osdev, ##c, and #glibc suggested that I re-run my code under valgrind, I finally decided to pay serious attention to its output. It pointed me to an internal glibc function, _IO_flush_all_lockp (in genops.c), under __run_exit_handlers, which is called by exit. Unfortunately, I couldn't find much information about it, apart from another person who had the same problem, but caused by uninitialized pointers.

It had been 2 days already. I finally decided to take a look at the glibc sources and found my "_IO_flush_all_lockp". Looking at what it was doing, which seemed to be flushing all open streams and then locking them, I realised my fault.

Code:

FILE* fp = fopen("filename.ext", "r");
....
free(fp); /* error: must be fclose(fp) */

Now that looks pretty clean. Debuggers won't suspect a thing, since I'm just freeing memory that someone (in this case libc, presumably) allocated for me. GCC wouldn't warn me, since I could always allocate a "FILE*" pointer myself for some unusual reason. It's really hard to detect if your source files are long, and worse, it was a typo. And poor libc tries to access that memory at exit, since as far as it knows the stream is still open, and BOOM.
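For comparison, here is a minimal sketch of the fopen/fclose pairing the snippet above was meant to have. The helper name count_bytes and its return convention are made up for illustration and are not from the original program.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a file.
   Returns -1 if the file cannot be opened or an I/O error occurs. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long n = 0;
    while (fgetc(fp) != EOF)
        n++;

    int failed = ferror(fp);

    /* fclose() flushes and closes the stream and releases whatever the
       library allocated for it. A FILE* from fopen() must never go to free(). */
    if (fclose(fp) != 0)
        failed = 1;

    return failed ? -1 : n;
}
```

Calling count_bytes("filename.ext") opens, reads and closes the stream within one function, so there is no stale FILE left over for exit() to trip on.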

The most annoying part of this bug was that it happened randomly: small "so-called fixes" in the program would make it look like the bug had disappeared, but after a few runs it would appear again.

/me checks again to see if the bug is actually solved.

no92 · Post by **no92** » Wed Dec 24, 2014 3:32 am

Why is this in the Auto-Delete Forum? It's extremely good and useful information.

iansjack · Post by **iansjack** » Wed Dec 24, 2014 4:22 am

Strange. I always understood that exit() was guaranteed to close all open (stdio) files even if the programmer forgot to.

Bender · Post by **Bender** » Wed Dec 24, 2014 6:48 am

iansjack wrote:Strange. I always understood that exit() was guaranteed to close all open (stdio) files even if the programmer forgot to.

Yes, and that's what it was trying to do, but I had (by mistake) freed the file pointer instead of closing it, and hence it segfaulted while attempting to close it.
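The guarantee iansjack mentions can be seen with a small sketch, assuming a POSIX system (fork and waitpid are not standard C). The function name size_after_child_exit is hypothetical. A child process writes through stdio, never calls fclose(), and leaves via exit(); the parent then checks how much reached the file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the size of the file written by a child that
   never called fclose(), or -1 on any failure. */
long size_after_child_exit(const char *path)
{
    fflush(NULL);               /* keep the parent's pending output out of the child */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        FILE *fp = fopen(path, "w");
        if (fp == NULL)
            _exit(1);
        fputs("buffered", fp);  /* 8 bytes, most likely still in the stdio buffer */
        exit(0);                /* exit() flushes and closes all open streams */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

The data arrives only because exit() walks the streams that are still open. That same walk is what touches a FILE that was handed to free() instead of fclose().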

gravaera · Post by **gravaera** » Wed Dec 24, 2014 1:15 pm

Reading a well-done writeup about somebody else's bug hunting is always fun, because you feel that kinship with your own past experiences.

--Peace out,
gravaera

mathematician · Post by **mathematician** » Thu Dec 25, 2014 3:01 am

Back in the days of MS-DOS I would spend days trying to debug an assembly language program, and usually the culprit would turn out to be a hardware interrupt modifying some variable or other.

Roman · Post by **Roman** » Thu Dec 25, 2014 3:04 am

Isn't GDB able to trace the call stack? If so, it would be easy to find where the problem is.

Bender · Post by **Bender** » Thu Dec 25, 2014 4:30 am

Roman wrote:Isn't GDB able to trace the call stack? If so, it would be easy to find where the problem is.

The backtrace told me that it was "exit()" that was causing the fault, but it's highly unlikely that glibc would have a bug, since a ton of programs use it. And a bug in exit, a function used by every C program out there? Impossible.
The real bug was in "free(fileptr)" -- but doing that is (by language) perfectly legal, although, undefined, since you're supposed to use fclose for file streams.

Combuster · Post by **Combuster** » Thu Dec 25, 2014 5:35 am

Bender wrote:doing that is (by language) perfectly legal, although, undefined

Actually, the language specification states under free:

C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.

So passing something you obtained from fopen() is not a valid parameter, and thus not legal.

That is apart from the fact that undefined behaviour is not to be considered legal in the first place.

KemyLand · Post by **KemyLand** » Tue Dec 30, 2014 2:47 pm

Combuster wrote:
Bender wrote:doing that is (by language) perfectly legal, although, undefined
Actually, the language specification states under free:
C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.
So passing something you obtained from fopen() is not a valid parameter, and thus not legal.

There are some functions that return legally free()able pointers, such as strdup().

onlyonemac · Post by **onlyonemac** » Sat Jan 03, 2015 7:28 am

KemyLand wrote:There are some functions that return legally free()able pointers, such as strdup().

Furthermore, in this case the compiler wouldn't know where the pointer was obtained from.

cyx · Post by **cyx** » Sat Jan 10, 2015 9:40 pm

KemyLand wrote:
Combuster wrote:
Bender wrote:doing that is (by language) perfectly legal, although, undefined
Actually, the language specification states under free:
C standard wrote:ptr - Pointer to a memory block previously allocated with malloc, calloc or realloc.
So passing something you obtained from fopen() is not a valid parameter, and thus not legal.
There are some functions that return legally free()able pointers, such as strdup().

You can make any function return a legally free()able pointer as long as it allocates it with malloc, calloc or realloc.
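As a minimal sketch of that point, here is a strdup()-like helper in standard C. The name dup_string is made up for illustration. Its result can legally go to free() because it comes straight from malloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical strdup()-like helper written in standard C.
   The result comes from malloc(), so the caller releases it with free().
   Returns NULL if the allocation fails. */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

A caller would write char *p = dup_string("osdev"); and later free(p);. A FILE* from fopen() gets no such promise, so fclose() is the only valid way to release it.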

Kevin · Post by **Kevin** » Sun Jan 11, 2015 6:25 am

KemyLand wrote:There are some functions that return legally free()able pointers, such as strdup().

strdup() isn't standard C (yet), and POSIX defines free() so that it's legal to pass a "pointer earlier returned by a function in POSIX.1‐2008 that allocates memory as if by malloc()".

OSDev.org
