Scoopta wrote: I've ported the virtual allocator to Linux userspace using mmap with the hope that it'd be easier to catch problems but I'm not actually sure how to test it. How do you find memory leaks or problems in a memory allocator? All my experience is around userland C with tools like libasan which I don't believe will work for this.
Problems in the memory allocator itself should be relatively easy to test for. Just allocate and deallocate memory, perhaps with instrumentation turned on in debug builds to verify the allocator's metadata.
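For example, a randomised alloc/free loop that pattern-fills each block will catch overlapping allocations and corrupted metadata. A minimal sketch, assuming your allocator exposes my_alloc()/my_free() (hypothetical names; substitute your own):

[code]
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry points -- substitute your allocator's own names. */
extern void *my_alloc(size_t size);
extern void my_free(void *ptr);

#define SLOTS 1024

static void stress(unsigned iterations)
{
    static void *slot[SLOTS];
    static size_t len[SLOTS];

    for (unsigned i = 0; i < iterations; i++) {
        unsigned n = (unsigned)rand() % SLOTS;
        if (slot[n]) {
            /* Verify the fill pattern survived before freeing: a
             * mismatch means overlapping blocks or corrupt metadata. */
            unsigned char expect = (unsigned char)((uintptr_t)slot[n] & 0xff);
            for (size_t j = 0; j < len[n]; j++)
                assert(((unsigned char *)slot[n])[j] == expect);
            my_free(slot[n]);
            slot[n] = NULL;
        } else {
            len[n] = 1 + (size_t)(rand() % 4096);
            slot[n] = my_alloc(len[n]);
            assert(slot[n] != NULL);
            /* Fill with a pattern derived from the block's address. */
            memset(slot[n], (int)((uintptr_t)slot[n] & 0xff), len[n]);
        }
    }
}
[/code]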
I don't worry about memory leaks myself. I have a mark/sweep garbage collector handle freeing memory, but the same code can also be used to look for memory that is not referenced and should have been freed, but hasn't been.
Basically, when you want to find a leak, you:
- Start from known roots: global variables, thread stacks, and thread CPU register state. For each datum that looks like a pointer into the heap, look up whether there is an allocation covering that pointer.
- For any allocations found from the roots, scan those allocations looking for data that looks like a pointer into the heap, just as you did for the roots.
- This will turn up more allocations to scan. Keep going until there are no allocations left that you haven't already scanned.
- This leaves you with two kinds of allocations: those reachable from the roots (possibly via other allocations), which are still in use, and those that were never reached, which are your leaked allocations. A rough C sketch of this mark phase follows.
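Here is that mark phase in sketch form, with hypothetical helpers (heap_lookup(), roots_start/roots_end) standing in for whatever your allocator actually provides:

[code]
#include <stddef.h>
#include <stdint.h>

/* Per-allocation record; heap_lookup() is a hypothetical helper that
 * maps a candidate pointer to the allocation containing it, or NULL. */
struct alloc {
    void *base;
    size_t size;
    int marked;
    struct alloc *worklist;     /* intrusive link for pending scans */
};

extern struct alloc *heap_lookup(uintptr_t candidate);
extern const void *roots_start, *roots_end;   /* hypothetical root span */

static struct alloc *pending;

static void mark_range(const void *start, const void *end)
{
    /* Conservatively treat every aligned word as a potential pointer. */
    for (const uintptr_t *p = start; (const void *)p < end; p++) {
        struct alloc *a = heap_lookup(*p);
        if (a && !a->marked) {
            a->marked = 1;
            a->worklist = pending;      /* queue it for scanning */
            pending = a;
        }
    }
}

static void mark(void)
{
    /* Step 1: the roots (globals, stacks, saved registers). */
    mark_range(roots_start, roots_end);

    /* Steps 2-3: keep scanning until no unscanned allocations remain.
     * An explicit worklist keeps stack usage bounded. */
    while (pending) {
        struct alloc *a = pending;
        pending = a->worklist;
        mark_range(a->base, (char *)a->base + a->size);
    }
    /* Step 4: anything still unmarked is unreachable. */
}
[/code]

Marking an allocation before scanning it means cycles in the reference graph terminate naturally.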
In a mark-sweep garbage collection, these leaked allocations are the garbage, and my code sweeps those away and reuses them.
You can instead report them as leaks, though.
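Continuing the sketch above, the sweep is one pass over every allocation; heap_first()/heap_next()/heap_release() are again hypothetical iteration helpers:

[code]
#include <stdio.h>

extern struct alloc *heap_first(void);
extern struct alloc *heap_next(struct alloc *a);
extern void heap_release(struct alloc *a);

static void sweep(int report_only)
{
    for (struct alloc *a = heap_first(); a; a = heap_next(a)) {
        if (a->marked) {
            a->marked = 0;              /* reset for the next cycle */
        } else if (report_only) {
            /* Leak-detector mode: report but keep the block. */
            printf("leak: %p (%zu bytes)\n", a->base, a->size);
        } else {
            /* GC mode: the unmarked block is garbage -- recycle it. */
            heap_release(a);
        }
    }
}
[/code]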
See the Boehm GC for details (the first link in the refs below).
My own allocator is linked below (the slab.c link), which operates similarly to the Boehm GC, but is simpler (it's not entirely general purpose; it doesn't handle arbitrarily big allocations).
I also came across a small GC library (the last link below), which is a lot simpler than mine, but has unbounded stack growth as the scanning is recursive (you can't really do that with an 8K kernel stack). Something like this might be worth studying to form the basis of a leak detector.
Just like with GC, you won't want threads running and mutating your graph of references, so you'll want to "stop the world" while doing this leak detection. Perhaps it can be an idle-time task, run when other threads are not running anyway?
Refs:
https://hboehm.info/gc/leak.html
http://thewrongchristian.org.uk:8082/fi ... ibk/slab.c
https://github.com/mkirchner/gc
devc1 wrote:
use your memory map and free every address page per page, try to free it another time without allocating to check if the free function works properly.
Of course, the only reasonable thing for an allocator to do when faced with a double free is some sort of panic. While it is perhaps relatively easy to catch this case and preserve the metadata in a consistent form, such a fault already spells disaster for the wider kernel.
If you've freed a block of memory twice, it's clear that at least two pieces of code think they own the region, and at least one has been using the region after it became invalid.
Such a situation should be fatal for a kernel, with as much information preserved in the panic report as possible (such as a stack trace showing where the second free came from) so that the source of the error can be found.
It is not safe to continue in this circumstance, though (IMO). The kernel is compromised, and these sorts of faults are often sources of exploits.
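One cheap way to catch the second free at the point it happens is a magic tag in each block header. A sketch with hypothetical names (panic(), dump_stack(), and the header layout are all stand-ins for your kernel's own):

[code]
#include <stdint.h>

#define MAGIC_LIVE 0xA110C8EDu   /* written by alloc */
#define MAGIC_FREE 0xDEADBEEFu   /* written by free */

struct block_hdr {
    uint32_t magic;
    uint32_t size;
};

extern void panic(const char *msg);   /* your kernel's panic */
extern void dump_stack(void);         /* hypothetical backtrace helper */

void my_free(void *ptr)
{
    struct block_hdr *hdr = (struct block_hdr *)ptr - 1;

    if (hdr->magic == MAGIC_FREE) {
        /* Second free of the same block: at least two owners, so the
         * heap is already inconsistent -- preserve the evidence and stop. */
        dump_stack();
        panic("double free");
    }
    if (hdr->magic != MAGIC_LIVE)
        panic("free of corrupt or foreign pointer");

    hdr->magic = MAGIC_FREE;
    /* ... return the block to the free list ... */
}
[/code]

The tag is lost once the block is reallocated, so this only catches a double free while the block is still on the free list, but that covers the common case.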
Scoopta wrote: Additionally how do I test my physical allocator or even port it to userspace?
I wouldn't bother. Physical memory allocation is at a different level to virtual address space mapping, and there isn't really an equivalent in user space.
If you were really desperate to test it, though, you could use a fixed-size file to represent your physical memory, and use lots of small mmaps to map individual pages of that file into user virtual addresses. You can then allocate "physical pages" (page-aligned offsets) from your file and map them into your address space using mmap.
Your underlying kernel is unlikely to thank you for the flood of single-page mappings, though most modern kernels use some sort of tree-based virtual address space map these days, rather than the linear sorted list common in older kernels, so they should cope.
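A sketch of that harness, with a trivial bump allocator standing in for the physical allocator under test (file name and layout are illustrative, not prescriptive):

[code]
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SIZE  4096
#define PHYS_PAGES 1024              /* 4 MiB of pretend physical RAM */

/* Stand-in for the physical allocator under test: a trivial bump
 * allocator over page-aligned file offsets. Replace with your own. */
static off_t pmm_alloc(void)
{
    static off_t next;
    off_t frame = next;
    next += PAGE_SIZE;
    return frame;
}

int main(void)
{
    /* The backing file plays the role of physical memory. */
    int fd = open("physmem.img", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || ftruncate(fd, (off_t)PHYS_PAGES * PAGE_SIZE) < 0) {
        perror("backing file");
        return 1;
    }

    /* "Mapping a page" is one small mmap of a single frame. MAP_SHARED
     * makes aliased mappings coherent, just as two virtual pages backed
     * by the same physical page would be. */
    off_t frame = pmm_alloc();
    char *va = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, frame);
    if (va == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    va[0] = 42;                      /* write through the mapping */
    munmap(va, PAGE_SIZE);
    close(fd);
    return 0;
}
[/code]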