Understanding sbrk

Posted: Mon May 25, 2020 4:43 pm
by OSwhatever
sbrk is kind of a relic, but I see many implementations that still rely on this old function call. This is a bit unfortunate because nowadays you don't want your allocators to rely on contiguous memory. Using mmap or similar is better, as mmap just allocates the chunk of memory somewhere in the virtual address space. That allows many different allocators to coexist: one module can be implemented in a completely different language, with a different implementation of the global allocator.

Then I read that some allocators can actually cope with sbrk not returning contiguous memory. sbrk would just be converted to an mmap call that returns the new location. However, when sbrk is called with a negative number, thus reducing the heap space, the sbrk implementation must keep some kind of history so that it can unmap the correct portions of the virtual address space.
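To make the "history" idea concrete, here is a minimal sketch of an sbrk-like function backed by mmap. Everything in it is an assumption for illustration: the name mmap_sbrk, the fixed-size history table, the rule that a negative increment only releases whole regions (newest first), and the choice to return NULL from mmap_sbrk(0) before anything is mapped. It is not taken from any real libc.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

#define MAX_CHUNKS 64   /* hypothetical limit on remembered regions */

struct chunk {
    unsigned char *base;
    size_t len;
};

/* The "history": every region handed out so far, oldest first. */
static struct chunk history[MAX_CHUNKS];
static size_t chunk_count;

/* End of the newest region, standing in for the "current break". */
static void *current_end(void)
{
    if (chunk_count == 0)
        return NULL;
    return history[chunk_count - 1].base + history[chunk_count - 1].len;
}

/* Hypothetical sbrk replacement. A positive increment maps a fresh,
 * possibly non-contiguous region and returns its start. A zero or
 * negative increment returns the previous end; a negative one also
 * unmaps whole regions, newest first, as far as the request covers. */
void *mmap_sbrk(intptr_t incr)
{
    if (incr > 0) {
        void *p;
        if (chunk_count == MAX_CHUNKS)
            return (void *)-1;
        p = mmap(NULL, (size_t)incr, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return (void *)-1;
        history[chunk_count].base = p;
        history[chunk_count].len = (size_t)incr;
        chunk_count++;
        return p;
    }

    void *prev = current_end();
    size_t to_free = (size_t)0 - (size_t)incr;   /* magnitude of incr */
    while (chunk_count > 0 && history[chunk_count - 1].len <= to_free) {
        chunk_count--;
        to_free -= history[chunk_count].len;
        munmap(history[chunk_count].base, history[chunk_count].len);
    }
    return prev;
}
```

Note that a caller which assumes the returned addresses are contiguous would still break with this; the table only solves the "what do I unmap" problem.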

Is this correct that sbrk is actually allowed to return discontinuous memory?

Re: Understanding sbrk

Posted: Mon May 25, 2020 5:50 pm
by alexfru
Single UNIX Specification version 2 doesn't say anything about the memory necessarily being contiguous.

IBM z/OS says:
z/OS wrote: The storage space from which the brk() and sbrk() functions allocate storage is separate from the storage space that is used by the other memory allocation functions (malloc(), calloc(), etc.). Because this storage space must be a contiguous segment of storage, it is allocated from the initial heap segment only and thus is limited to the initial heap size specified for the calling program or the largest contiguous segment of storage available in the initial heap at the time of the first brk() or sbrk() call. Since this is a separate segment of storage, the brk() and sbrk() functions can be used by an application that is using the other memory allocation functions. However, it is possible that the user's region may not be large enough to support extensive usage of both types of memory allocation.

Prior usage of the sbrk() function has been limited to specialized cases where no other memory allocation function performed the same function. Because the sbrk() function may be unable to sufficiently increase the space allocation of the process when the calling application is using other memory functions, the use of other memory allocation functions, such as mmap(), is now preferred because it can be used portably with all other memory allocation functions and with any function that uses other allocation functions. Applications that require the use of brk() and/or sbrk() should refrain from using the other memory allocation functions and should be run with an initial heap size that will satisfy the maximum storage requirements of the program.

The sbrk() function is not supported from a multithreaded environment, it will return in error if it is invoked in this environment.

Note

This function is kept for historical reasons. It was part of the Legacy Feature in Single UNIX Specification, Version 2, but has been withdrawn and is not supported as part of Single UNIX Specification, Version 3. New applications should use malloc() instead of brk() or sbrk().

If it is necessary to continue using this function in an application written for Single UNIX Specification, Version 3, define the feature test macro _UNIX03_WITHDRAWN before including any standard system headers. The macro exposes all interfaces and symbols removed in Single UNIX Specification, Version 3.
Mac OS X reserves 4 MB for sbrk().

So, if you just set aside a relatively small portion of the address space for sbrk(), you should be OK.
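A minimal sketch of that approach, assuming a statically reserved region: the name arena_sbrk and the static array are hypothetical, and the 4 MB size simply mirrors the Mac OS X figure above. Because the region is one block, the memory handed out is contiguous and negative increments are trivial.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE ((size_t)4 * 1024 * 1024)   /* 4 MB set aside for sbrk */

static unsigned char arena[ARENA_SIZE];   /* the reserved region */
static size_t brk_offset;                 /* current break, as an offset */

/* sbrk-like function over the reserved region: returns the previous
 * break on success, (void *)-1 with errno = ENOMEM on failure. */
void *arena_sbrk(intptr_t incr)
{
    size_t old = brk_offset;

    if (incr >= 0) {
        if ((uintptr_t)incr > ARENA_SIZE - old) {
            errno = ENOMEM;               /* would run past the region */
            return (void *)-1;
        }
        brk_offset = old + (size_t)incr;
    } else {
        size_t dec = (size_t)0 - (size_t)incr;   /* magnitude of incr */
        if (dec > old) {
            errno = ENOMEM;               /* would drop below the start */
            return (void *)-1;
        }
        brk_offset = old - dec;
    }
    return arena + old;
}
```

On a system with an MMU the array could just as well be a range of reserved address space that is committed lazily; the bookkeeping stays the same.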

Re: Understanding sbrk

Posted: Tue May 26, 2020 7:59 am
by nullplan
OSwhatever wrote:Is this correct that sbrk is actually allowed to return discontinuous memory?
Since both brk() and sbrk() are absent from the current version of POSIX, I have no idea here. I would think not, however, given that most applications calling sbrk() are going to be legacy applications, and they might just assume that sbrk() returns contiguous memory. BTW: calling sbrk() with a negative number is not guaranteed to do anything, and indeed e.g. on Linux, sbrk() cannot shrink the heap area. On Linux, the system call SYS_brk takes one pointer argument and returns a pointer, namely it tries to increase the program break to the parameter, and it will always return the new program break afterwards. You see failure when the return value equals the program break before the call. So no implementation of sbrk() wrapping SYS_brk can have a shrinking heap.
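As a rough sketch of how an sbrk() wrapper can sit on top of a call with that shape (one address in, the resulting break out), consider the following. The kernel side is simulated: sim_sys_brk, SIM_BASE and SIM_LIMIT are made-up stand-ins, not the real system call, and the acceptance rule inside sim_sys_brk is an assumption. Only the wrapper logic is the point: read the break, ask for the new one, and treat any answer other than the requested address as failure.

```c
#include <errno.h>
#include <stdint.h>

/* Simulated address range, hypothetical numbers. */
#define SIM_BASE  ((uintptr_t)0x10000)
#define SIM_LIMIT ((uintptr_t)0x20000)

static uintptr_t sim_break = SIM_BASE;

/* Stand-in for the kernel: move the break to `want` if the request is
 * acceptable, then return whatever the break is afterwards. */
static uintptr_t sim_sys_brk(uintptr_t want)
{
    if (want >= SIM_BASE && want <= SIM_LIMIT)
        sim_break = want;
    return sim_break;
}

/* sbrk-style wrapper: returns the previous break on success and
 * (void *)-1 with errno = ENOMEM if the break did not move as asked. */
void *sketch_sbrk(intptr_t incr)
{
    uintptr_t old = sim_sys_brk(0);   /* 0 is never accepted: just a query */
    uintptr_t want;

    if (incr == 0)
        return (void *)old;

    want = old + (uintptr_t)incr;
    if (sim_sys_brk(want) != want) {
        errno = ENOMEM;
        return (void *)-1;
    }
    return (void *)old;
}
```

Passing 0 to query the break works because 0 is never an acceptable break, so the call fails and reports the unchanged value.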

Re: Understanding sbrk

Posted: Tue May 26, 2020 9:15 am
by OSwhatever
nullplan wrote:Since both brk() and sbrk() are absent from the current version of POSIX, I have no idea here. I would think not, however, given that most applications calling sbrk() are going to be legacy applications, and they might just assume that sbrk() returns contiguous memory. BTW: calling sbrk() with a negative number is not guaranteed to do anything, and indeed e.g. on Linux, sbrk() cannot shrink the heap area. On Linux, the system call SYS_brk takes one pointer argument and returns a pointer, namely it tries to increase the program break to the parameter, and it will always return the new program break afterwards. You see failure when the return value equals the program break before the call. So no implementation of sbrk() wrapping SYS_brk can have a shrinking heap.
It's Newlib in particular I'm concerned with here. Newlib is a very popular clib among embedded systems and it is hardly legacy. Newlib's malloc relies on sbrk, which makes perfect sense on a system without an MMU. However, it still relies on sbrk on systems with an MMU, including Linux. Kind of strange really, since Newlib isn't that old. I read that Newlib allows non-contiguous memory from sbrk, but I haven't found any conclusive proof. It seems like it's mostly to allow concurrent calls to sbrk rather than non-continuous memory.

Re: Understanding sbrk

Posted: Tue May 26, 2020 11:59 am
by nullplan
Newlib uses dlmalloc. dlmalloc does not use sbrk() directly, but rather through an abstraction called MORECORE. However, it depends on MORECORE having some rather peculiar characteristics, that pretty much can only be fulfilled by something approximating sbrk(). I have worked with Linux on ARM before, and I know that at least there, mmap() likes to return progressively lower addresses. If this happens with MORECORE, then many things will fail. For instance, malloc_extend_top() expects MORECORE to return an address larger than old_end, which is the end of the heap before that function was called. Leaving aside for the moment the issue of the invalid comparison, if MORECORE were just a wrapper around mmap(), addresses would not necessarily be monotonic. MORECORE(0) is expected to return the current break, and with a negative number it is expected to at least try to reduce the break. If sbrk() returns decreasing addresses, dlmalloc will start using addresses never allocated. If it returns discontinuous, but monotonically increasing addresses, then things might work out.
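The three cases above (contiguous, higher with a gap, lower than the old end) can be sketched as a small classifier. This is an illustration only, not code from dlmalloc or newlib: classify_morecore and the enum names are hypothetical. Comparing through uintptr_t is an assumption about how one might sidestep relational comparison of pointers into unrelated regions, which C leaves undefined.

```c
#include <stdint.h>

enum morecore_result {
    MORECORE_FAILED,         /* MORECORE returned (void *)-1 */
    MORECORE_CONTIGUOUS,     /* new memory starts exactly at old_end */
    MORECORE_GAP_ABOVE,      /* higher address, with a hole in between */
    MORECORE_BELOW_OLD_END   /* lower address: an sbrk-style heap cannot use it */
};

/* Classify what a MORECORE-like call returned relative to the end of
 * the heap before the call. */
enum morecore_result classify_morecore(const void *old_end, const void *returned)
{
    uintptr_t end = (uintptr_t)old_end;
    uintptr_t got = (uintptr_t)returned;

    if (returned == (void *)-1)
        return MORECORE_FAILED;
    if (got == end)
        return MORECORE_CONTIGUOUS;
    if (got > end)
        return MORECORE_GAP_ABOVE;
    return MORECORE_BELOW_OLD_END;
}
```

A plain mmap() wrapper on a system that hands out progressively lower addresses would land in the last case almost every time.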
OSwhatever wrote:Newlib is a very popular clib among embedded systems and it is hardly legacy.
Newlib contains tons of code to support compilers that don't know ANSI C yet. You were saying?
OSwhatever wrote:It seems like it's mostly to allow concurrent calls to sbrk rather than non-continuous memory.
That is also my assessment, having read some of the code.

Re: Understanding sbrk

Posted: Sun May 31, 2020 9:52 pm
by linguofreak
I'm not sure that brk() and sbrk() were ever meant to be called by application code in the first place. Malloc (or the allocator for the language runtime being used) called them on the back end, and applications were supposed to use that allocator.

In any case, brk() and sbrk() can be implemented entirely in userspace if malloc() really cares about them being present (which, from a brief perusal of the code, is what newlib seems to do).