OSDev.org

Posted: **Wed Apr 06, 2011 12:03 pm**

OK, so I needed another way to debug this lousy flat-memory model (in my terminal application), which after a while crashes because of a corrupt heap. I think the heap manager actually works, but pointers get corrupted, which leads to random overwrites. A few months ago I tried another debug technique aimed at finding this elusive bug: I allocated every memory chunk page-aligned and never reused pages. I found some problems this way, but the heap corruption problem just disappeared (because every object was page-aligned, and there were no linked lists of free/used memory blocks that could become corrupt).
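For illustration, the page-per-allocation technique described above can be sketched in C on a POSIX host. This is not the RDOS code: `debug_alloc`, `debug_free` and `round_to_pages` are hypothetical names, and `mmap`/`mprotect` stand in for whatever the kernel's own page allocator provides. Freed blocks are made inaccessible rather than unmapped, so their addresses are never handed out again and a stale pointer faults instead of overwriting a live object.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a request up to a whole number of pages (at least one). */
static size_t round_to_pages(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (size == 0)
        size = 1;
    return (size + page - 1) / page * page;
}

/* Hypothetical debug allocator: each block gets its own page-aligned
   mapping, so there are no free/used lists inside the heap to corrupt. */
static void *debug_alloc(size_t size)
{
    void *p = mmap(NULL, round_to_pages(size), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Free" by revoking all access. The address range stays reserved,
   so the pages are never reused and a stale pointer faults on use. */
static int debug_free(void *p, size_t size)
{
    assert((uintptr_t)p % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);
    return mprotect(p, round_to_pages(size), PROT_NONE);
}
```

The obvious cost is address space and memory: every small object consumes at least one page, which is why this only makes sense as a temporary debug build.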

The method is really simple. I added a new include file that looks like this (it's a little shorter than the real thing):

Code:

; DEFINE DO_API_CHECK       ; uncomment to enable API checks 

IFDEF DO_API_CHECK

ApiSaveEax  Macro
    push eax
            Endm

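ApiCheckEax Macro
    local check_ok
    
    push bp                 ; 2-byte push, so the saved EAX sits at [bp+2]
    mov bp,sp
    pushf                   ; preserve flags across the compare
    cmp eax,[bp+2]          ; current EAX vs. value pushed by ApiSaveEax
    je check_ok
;
    int 3                   ; mismatch: break into the debugger

check_ok:
    popf
    pop bp
    pop eax                 ; drop the saved copy, restoring the entry value
            Endm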

ApiSaveEbx  Macro
    push ebx
            Endm

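ApiCheckEbx Macro
    local check_ok
    
    push bp                 ; 2-byte push, so the saved EBX sits at [bp+2]
    mov bp,sp
    pushf                   ; preserve flags across the compare
    cmp ebx,[bp+2]          ; current EBX vs. value pushed by ApiSaveEbx
    je check_ok
;
    int 3                   ; mismatch: break into the debugger

check_ok:
    popf
    pop bp
    pop ebx                 ; drop the saved copy, restoring the entry value
            Endm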

ELSE

ApiSaveEax  Macro
            Endm

ApiCheckEax Macro
            Endm

ApiSaveEbx  Macro
            Endm

ApiCheckEbx Macro
            Endm

ENDIF

A sample of how to use it (the function returns a value in EDI):

Code:


Somefunction Proc
    ApiSaveEax    
    ApiSaveEbx    
    ApiSaveEcx    
    ApiSaveEdx    
    ApiSaveEsi    

; do something
    mov edi,1234h

    ApiCheckEsi
    ApiCheckEdx
    ApiCheckEcx
    ApiCheckEbx
    ApiCheckEax
    retf32
Somefunction Endp

Yesterday I added this API check to roughly a hundred functions and tested. It turns out that the bug that corrupts the heap is probably in the DeleteSocket function, which failed to preserve the lower part of ESI (SI). Because the compiler optimizes aggressively by keeping as much as possible in registers, it is quite likely that some pointer ends up pointing to some new (random) memory block when ESI changes. I already knew that the problem was related to the connection/deletion of sockets, because when I reduced that activity, heap corruption errors became less common.

If I had validated with a segmented memory model I would have caught this problem very quickly, but with paging only there is no simple way to find it.
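To make the failure mode concrete, here is a small C model of the same save-then-compare idea. It is only a sketch: real registers cannot be snapshotted from portable C, so a plain struct stands in for them, and `SavedRegs`, `api_check`, `api_check_or_trap` and `buggy_delete` are hypothetical names, not RDOS code. `buggy_delete` imitates a routine that overwrites only SI, the low 16 bits of ESI.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the registers the API promises to preserve. */
typedef struct {
    uint32_t eax, ebx, ecx, edx, esi;
} SavedRegs;

enum {
    CLOBBER_EAX = 1u << 0,
    CLOBBER_EBX = 1u << 1,
    CLOBBER_ECX = 1u << 2,
    CLOBBER_EDX = 1u << 3,
    CLOBBER_ESI = 1u << 4
};

/* Compare the snapshot taken at entry with the values at exit and
   return a bitmask naming every register that changed. */
static unsigned api_check(const SavedRegs *entry, const SavedRegs *leave)
{
    unsigned bad = 0;
    if (entry->eax != leave->eax) bad |= CLOBBER_EAX;
    if (entry->ebx != leave->ebx) bad |= CLOBBER_EBX;
    if (entry->ecx != leave->ecx) bad |= CLOBBER_ECX;
    if (entry->edx != leave->edx) bad |= CLOBBER_EDX;
    if (entry->esi != leave->esi) bad |= CLOBBER_ESI;
    return bad;
}

/* Debug-build variant: stop at once on a mismatch, like the int 3. */
static void api_check_or_trap(const SavedRegs *entry, const SavedRegs *leave)
{
    assert(api_check(entry, leave) == 0);
}

/* Hypothetical buggy routine: uses SI as scratch and never restores it,
   so the upper half of ESI survives but the lower half does not. */
static void buggy_delete(SavedRegs *r)
{
    r->esi = (r->esi & 0xFFFF0000u) | 0x0042u;
}
```

If ESI held the pointer 00401234h on entry, it holds 00400042h afterwards: still a plausible address in a flat address space, just not the object the compiler thinks it is.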

Posted: **Wed Apr 06, 2011 12:33 pm**

Or you could have run your program under Valgrind's MemCheck (or an equivalent debugging tool)...

An alternative method would be to map the heap read-only (or even no-access) and then track all reads or writes to it. Complex, yes, but entirely workable.

Posted: **Wed Apr 06, 2011 1:13 pm**

Owen wrote: Or you could have run your program under Valgrind's MemCheck (or an equivalent debugging tool)...

How would that be possible? The API checker sits on the OS side of the API in order to guarantee that the register-usage rules defined for the compiler are upheld. When a function trashes a register that the compiler thinks it should retain, some pointer (or value) will change, and then when something is done with the pointer, it will trash some memory content (or, in the best case, page fault). I don't see how MemCheck could detect that. If every memory object were mapped to its own selector, OTOH, a modified pointer would very often fault, and if it didn't, it wouldn't be able to overwrite any other memory object (unless the selector is trashed).

Owen wrote: An alternative method would be to map the heap read-only (or even no-access) and then track all reads or writes to it. Complex, yes, but entirely workable.

Isn't the heap by default writable? How could a readonly heap be useful?

Posted: **Wed Apr 06, 2011 2:28 pm**

rdos wrote:
Owen wrote: Or you could have run your program under Valgrind's MemCheck (or an equivalent debugging tool)...
How would that be possible? The API checker sits on the OS side of the API in order to guarantee that the register-usage rules defined for the compiler are upheld. When a function trashes a register that the compiler thinks it should retain, some pointer (or value) will change, and then when something is done with the pointer, it will trash some memory content (or, in the best case, page fault). I don't see how MemCheck could detect that. If every memory object were mapped to its own selector, OTOH, a modified pointer would very often fault, and if it didn't, it wouldn't be able to overwrite any other memory object (unless the selector is trashed).

Have you ever even used MemCheck? Do you even have a clue how Valgrind works?

Hint: Valgrind is a dynamic binary translator.

rdos wrote:
Owen wrote: An alternative method would be to map the heap read-only (or even no-access) and then track all reads or writes to it. Complex, yes, but entirely workable.
Isn't the heap by default writable? How could a readonly heap be useful?

A readonly heap is a heap which generates a page fault on every write...
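As a rough sketch of what "fault on every write" could look like, the following C models it on a POSIX host. This is an illustration under stated assumptions, not code from either poster: `heap_init`, `heap_lock` and `on_fault` are hypothetical names, a single anonymous page stands in for the heap, and calling `mprotect` from a signal handler is assumed to be acceptable on the host (it is widely done but not guaranteed by POSIX). The handler records the faulting address and unlocks the heap so the write can be retried; the caller re-locks afterwards.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *heap_base;            /* start of the tracked heap   */
static size_t heap_len;                     /* its size in bytes           */
static volatile sig_atomic_t fault_count;   /* writes caught so far        */
static void *volatile last_fault_addr;      /* address of the latest write */

/* Fault handler: log the access, then unlock the heap so the faulting
   instruction succeeds when it is restarted. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    unsigned char *addr = (unsigned char *)info->si_addr;
    (void)sig;
    (void)ctx;
    if (addr < heap_base || addr >= heap_base + heap_len)
        _exit(1);                           /* not our heap: a real crash */
    fault_count++;
    last_fault_addr = addr;
    mprotect(heap_base, heap_len, PROT_READ | PROT_WRITE);
}

/* Map a one-page heap read-only and install the fault handler. */
static int heap_init(void)
{
    struct sigaction sa;

    heap_len = (size_t)sysconf(_SC_PAGESIZE);
    heap_base = mmap(NULL, heap_len, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap_base == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);    /* some hosts raise SIGBUS */
}

/* Re-arm the trap after a legitimate write has completed. */
static int heap_lock(void)
{
    assert(heap_base != NULL);
    return mprotect(heap_base, heap_len, PROT_READ);
}
```

In a kernel the same idea would live in the page-fault handler, which already receives the faulting address and the instruction pointer of the writer; comparing that instruction pointer against the code that is allowed to touch the heap is what turns the log into a bug detector.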

Posted: **Wed Apr 06, 2011 10:48 pm**

Owen wrote: Hint: Valgrind is a dynamic binary translator.

Valgrind is only available for Linux, and as such does not have much relevance for other OSes. If I were to port a binary translator I would choose to port a JVM, not Valgrind.

Posted: **Thu Apr 07, 2011 3:58 pm**

rdos wrote:
Owen wrote: Hint: Valgrind is a dynamic binary translator.
Valgrind is only available for Linux, and as such does not have much relevance for other OSes. If I were to port a binary translator I would choose to port a JVM, not Valgrind.

There are binary translators and binary translators, my friend. The JVM is one, Valgrind is the other.

Posted: **Thu Apr 07, 2011 4:10 pm**

Valgrind also works on other platforms, such as Mac OS X/Darwin; you can find it in MacPorts*

(* Though the port doesn't work for me. A variety of factors could be in play; I personally suspect that the port having built the (unsupported!) 64-bit version is the culprit.)

Posted: **Fri Apr 08, 2011 1:14 am**

I bet that Valgrind is tightly coupled to GCC and its libc, meaning it would be just as easy to write it from scratch when the target is not GCC, not libc, not ELF-type executables and not a "Unix-like" platform. I won't believe it is truly portable until I've seen a Win32 version.

Posted: **Fri Apr 08, 2011 2:22 am**

You're just arguing out of ignorance here.

Speaking of which, why don't you port a console emulator? Several modern ones have an integrated recompiler as well, including several that are known to be portable.

Posted: **Tue Apr 12, 2011 2:28 am**

Combuster wrote: You're just arguing out of ignorance here.

I thought I argued out of laziness. You know, there is only so much time available, and when time is scarce, one needs to work on things that are useful and ignore things that are not. I just cannot see the benefit of porting Valgrind to RDOS. It will (probably) take too much time, and it won't solve enough issues to be time-efficient.

Designing an API-checker