
malloc implementation

Posted: Wed Oct 30, 2019 7:41 am
by Thpertic
I’ve looked at Kernighan and Ritchie’s implementation of an allocator. I understood how it works, but it uses sbrk() in the function to ask for more space.
I’m wondering, since that function is deprecated (because the address space is not contiguous anymore), I can just allocate some memory from my physical memory manager and map it, right?

Another question I have is whether there has to be a difference between allocation in kernel space and user space (using kmalloc and malloc vs. just malloc). Can’t the user just use a syscall? Or do you have to differentiate because making a syscall every time (just to allocate more space, say) is wasteful?

Re: malloc implementation

Posted: Wed Oct 30, 2019 9:28 am
by nullplan
Thpertic wrote:I’ve looked at Kernighan and Ritchie’s implementation of an allocator. I understood how it works, but it uses sbrk() in the function to ask for more space.
I’m wondering, since that function is deprecated (because the address space is not contiguous anymore), I can just allocate some memory from my physical memory manager and map it, right?
Don't mix up your allocators. malloc() is there to allocate an arbitrary amount of memory. Very often, that is not actually what you want. You may want to keep like allocations close together (so have one large block of memory to allocate thread descriptors and kernel stacks from, another for device descriptors and whatnot).

But if you do want a malloc() implementation in your kernel, you generally want to allocate your memory contiguously from a certain block. You may even end up emulating sbrk(), by putting aside a block for kmalloc() to use and account for additional memory with a high water mark. Add in a possibility to void large blocks of unused memory, and you're good to go.

Keep in mind that you only need this block to be contiguous in virtual memory. In physical memory it can be all over the place, or not even actually there.
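To make the sbrk()-emulation idea concrete, here is a minimal sketch of a kernel-side "put aside a block and keep a high-water mark" allocator. The names (kheap_end, kheap_brk, map_page()) are hypothetical, not from this thread; map_page() is assumed to back one 4 KiB virtual page with a frame from the physical memory manager.

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

extern int map_page(uintptr_t vaddr);       /* assumed: maps one page, returns 0 on success */

static uintptr_t kheap_end = 0xD0000000;    /* high-water mark: first unmapped byte of the kernel heap */
static uintptr_t kheap_brk = 0xD0000000;    /* current allocation pointer within the heap region */

/* ksbrk: grow the kernel heap by 'increment' bytes, mapping pages only on demand. */
static void *ksbrk(size_t increment)
{
    uintptr_t old = kheap_brk;
    uintptr_t new_brk = kheap_brk + increment;

    while (kheap_end < new_brk) {           /* map only what the water mark doesn't cover yet */
        if (map_page(kheap_end) != 0)
            return NULL;                    /* out of physical memory */
        kheap_end += PAGE_SIZE;
    }
    kheap_brk = new_brk;
    return (void *)old;
}

A kmalloc() can then sit on top of ksbrk() exactly the way the K&R malloc() sits on sbrk(), since the heap region is contiguous in virtual memory even though the frames backing it need not be.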
Thpertic wrote:Another question I have is whether there has to be a difference between allocation in kernel space and user space (using kmalloc and malloc vs. just malloc). Can’t the user just use a syscall? Or do you have to differentiate because making a syscall every time (just to allocate more space, say) is wasteful?
syscalls are necessary for all the things userspace cannot do on its own. However, efficiently dividing memory blocks already allocated from the OS into smaller blocks for the application's use is something userspace can do itself. As such, a syscall for that would be needless.

Also, your kmalloc() should allocate kernel memory, and thus return a kernel-space pointer, whereas userspace will need user-space pointers.

Most OSes only include a possibility to allocate more pages, and leave further division up to userspace. This also allows userspace to switch to other implementations without mandating those implementations for all other processes. Fundamentally, implementing malloc() as a system call is a failure to separate concerns.

Re: malloc implementation

Posted: Thu Oct 31, 2019 1:43 am
by ~
malloc/realloc/free is the very last step in the chain
of implementing basic memory management functions.

Before that, you have to write several functions
and think up the basic paging and physical page operations
you need, which can take you a year if you do it
without any initial help.

Why not start with what others have already developed,
to get a clear picture of things like paging and physical memory management?

For example, look at the list of basic operations I have
added to my kernel for paging:

- Bitmap search for physical pages (see the sketch after this list).
- Grouping allocations in physical and virtual pages aligned to those block sizes.
- Allocating contiguous physical pages as a cluster at once.
- Modifying paging entries.
- Reserving space for new page directory entries.
- Zeroing out new page tables for coherence.
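As a rough illustration of the first item, here is a minimal bitmap search for a free physical frame. The names (frame_bitmap, NFRAMES, alloc_frame) are illustrative only, not from ~'s kernel; one bit per 4 KiB frame, set meaning in use, and frame 0 is assumed to be marked used at init so that a return value of 0 can signal failure.

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define NFRAMES (128u * 1024u)              /* e.g. 512 MiB of RAM in 4 KiB frames */
static uint32_t frame_bitmap[NFRAMES / 32]; /* one bit per frame, set = in use;
                                               frame 0 is marked used at init */

/* alloc_frame: return the physical address of a newly reserved frame, or 0 if none free. */
static uintptr_t alloc_frame(void)
{
    for (size_t i = 0; i < NFRAMES / 32; i++) {
        if (frame_bitmap[i] == 0xFFFFFFFFu) /* all 32 frames in this word are used */
            continue;
        for (unsigned b = 0; b < 32; b++) {
            if (!(frame_bitmap[i] & (1u << b))) {
                frame_bitmap[i] |= 1u << b; /* mark it used */
                return (uintptr_t)(i * 32 + b) * 4096u;
            }
        }
    }
    return 0;                               /* out of physical memory */
}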

Look at my list of functions, and my OS code
in my signature, for the implementation
I've been working on this year:
http://f.osdev.org/viewtopic.php?f=1&t=34526

Re: malloc implementation

Posted: Thu Oct 31, 2019 2:03 am
by Thpertic
~ wrote:malloc/realloc/free is the very last step in the chain
of implementing basic memory management functions.
I already have a physical memory manager that keeps track of the free pages in a run-length encoded stack and a virtual memory manager that maps and zeroes the pages. I don’t know how to link it all together. (Is the starting address arbitrary?)
nullplan wrote:Don't mix up your allocators. malloc() is there to allocate an arbitrary amount of memory. Very often, that is not actually what you want. You may want to keep like allocations close together (so have one large block of memory to allocate thread descriptors and kernel stacks from, another for device descriptors and whatnot).
I’ll have to check this out! Maybe when I start multitasking and device drivers (?).
nullplan wrote:But if you do want a malloc() implementation in your kernel, you generally want to allocate your memory contiguously from a certain block. You may even end up emulating sbrk(), by putting aside a block for kmalloc() to use and account for additional memory with a high water mark. Add in a possibility to void large blocks of unused memory, and you're good to go.
OK, but from what block? Do I decide it, or maybe put it at the end of the kernel?

Re: malloc implementation

Posted: Thu Oct 31, 2019 2:30 am
by ~
To come up with a malloc, you need to find free virtual address blocks that are far enough apart to allow for reasonable realloc expansion.

You have to write a function that searches for such aligned blocks of different sizes, treating them as virtual clusters for efficiency.

I also like to group physical pages in contiguous page clusters aligned to 4 pages, so I know I will never run out of free memory for things like new page directories or emergency system data: there are always at least 4 contiguous pages, or nothing.

- Call malloc (a sketch follows this list).
- It has to find free pages; the minimum actual allocated size is 4096 bytes.
- Once you find the physical page(s), mark them as used.
- Create entries in your paging structures that are aligned to something.
- Return the virtual address.
- You have 3 bits of AVL. Values 1 to 7 (1 to 3 actually for 2 or 3 contiguous blocks) can be used to color blocks from different allocations, so if you allocate all CONTIGUOUS page table entries of this allocation with the same AVL value, you will know immediately where each of your blocks are.
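Here is a rough sketch of those steps as code. The helpers alloc_frame() (reserve one physical frame) and map_page() (write the paging entry), and the bump pointer used to pick the next virtual address, are hypothetical names for illustration, not ~'s actual functions; the AVL coloring step is left out.

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

extern uintptr_t alloc_frame(void);                      /* assumed: reserve one physical frame */
extern void map_page(uintptr_t vaddr, uintptr_t paddr);  /* assumed: create the paging entry */

static uintptr_t next_vaddr = 0x40000000;                /* arbitrary start of the heap region */

/* page_malloc: page-granular allocation; the smallest actual allocation is one page. */
void *page_malloc(size_t nbytes)
{
    size_t npages = (nbytes + PAGE_SIZE - 1) / PAGE_SIZE;
    uintptr_t vstart = next_vaddr;

    for (size_t i = 0; i < npages; i++) {
        uintptr_t frame = alloc_frame();    /* find a free physical page and mark it used */
        if (frame == 0)
            return NULL;                    /* out of physical memory */
        map_page(next_vaddr, frame);        /* create the paging entry */
        next_vaddr += PAGE_SIZE;
    }
    return (void *)vstart;                  /* return the virtual address */
}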

Re: malloc implementation

Posted: Thu Oct 31, 2019 3:58 am
by linguofreak
Thpertic wrote:
~ wrote:malloc/realloc/free is the very last step in the chain
of implementing basic memory management functions.
I already have a physical memory manager that keeps track of the free pages in a run-length encoded stack and a virtual memory manager that maps and zeroes the pages. I don’t know how to link it all together. (Is the starting address arbitrary?)
Basically, your virtual memory manager gets a request for so many pages to be mapped (maybe from user space, maybe from kernel space), possibly along with an address the pages should be mapped at, if possible.

The virtual memory manager then determines whether the requested virtual address (if one was requested) is available, and checks whether the requester indicated that it absolutely needs that address (if you provide such a flag, which POSIX mmap does). If the requester absolutely needs that exact address and it is not available, the virtual memory manager returns an error. If the address isn't available and the requester did not insist on it, or if no address was requested, the virtual memory manager decides what virtual address to use.

Once it knows what virtual address it's using, the virtual memory manager asks the physical memory manager for the requested number of pages; the physical memory manager removes that many pages from the stack and returns a linked list (or whatever) containing them. The virtual memory manager then writes page table entries for the virtual pages it is allocating, pointing each virtual page at one of the physical pages it received from the physical allocator. Once this is done, it returns a pointer to the mapped region to the caller.
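A compact sketch of that flow, with hypothetical names standing in for the managers Thpertic already has (vaddr_is_free(), pick_free_vaddr(), pmm_alloc_page(), pte_set() are all illustrative, and error unwinding is omitted):

Code: Select all

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096u
#define VMM_FIXED 1   /* caller needs exactly the address it asked for */

extern int       vaddr_is_free(uintptr_t vaddr, size_t npages);  /* assumed */
extern uintptr_t pick_free_vaddr(size_t npages);                 /* assumed */
extern uintptr_t pmm_alloc_page(void);                           /* assumed: pop one frame from the PMM */
extern void      pte_set(uintptr_t vaddr, uintptr_t paddr);      /* assumed: write one page table entry */

void *vmm_map(uintptr_t hint, size_t npages, int flags)
{
    uintptr_t vaddr = hint;

    /* 1. Decide which virtual address to use. */
    if (hint == 0 || !vaddr_is_free(hint, npages)) {
        if (hint != 0 && (flags & VMM_FIXED))
            return NULL;                    /* exact address required but not available */
        vaddr = pick_free_vaddr(npages);
        if (vaddr == 0)
            return NULL;                    /* no suitable virtual region */
    }

    /* 2. Ask the physical manager for frames and point the page table entries at them. */
    for (size_t i = 0; i < npages; i++) {
        uintptr_t frame = pmm_alloc_page();
        if (frame == 0)
            return NULL;                    /* a real kernel would unwind the partial mapping here */
        pte_set(vaddr + i * PAGE_SIZE, frame);
    }
    return (void *)vaddr;
}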

Once you have that working, your kernel allocator(s) and userspace allocator(s) have the infrastructure they need to get more memory once they run out of whatever heap space they're given initially.

As far as userspace allocators go, in the Unix world there's generally a single system-wide libc that almost all applications use (it's still a userspace library, not part of the kernel), with very few exceptions, and it provides malloc(), free(), etc. In the Windows world, it's more commonly the compiler vendor that provides the C library (or, at least, this was the case historically).

Re: malloc implementation

Posted: Thu Oct 31, 2019 10:05 am
by ~
I have functions for specific addresses, virtual and physical ones.

I also have the most common malloc-like functions that just return any suitable virtual address from the first free pages it finds.

Re: malloc implementation

Posted: Fri Nov 01, 2019 8:14 am
by Thpertic
So I need a function to ask for an address (virtual one). I like the idea of the flag to tell if it’s needed that one or it’s ok whatever else. But from what virtual address should I start? 0x0 and going up to 3GB?

Same thing for the malloc, what address would be right?

Re: malloc implementation

Posted: Fri Nov 01, 2019 8:21 am
by ~
Windows seems to have EXEs start around 4 MB in the source code (ORG directives), but if you try base address directives of 3, 5, 6, etc. MB, there seems to be a limit.

Re: malloc implementation

Posted: Fri Nov 01, 2019 1:51 pm
by Thpertic
OK, 4 MB to load a process at, but where should I place the kernel heap? Immediately after the end of the kernel’s virtual address range?

Re: malloc implementation

Posted: Sat Nov 02, 2019 1:34 am
by Octocontrabass
You probably shouldn't take advice from ~ on this or any other subject.

The kernel heap can go wherever you want - it's your kernel, so you pick a decent area in the virtual address space.

Programs must be loaded where the executable headers say they can be loaded. Programs can be position-independent, which means you can put them anywhere in the virtual address space (within reason; there's usually an alignment requirement). If you want, you can require all programs to be position-independent.

When handing out pages to a program that needs memory but doesn't care what address you use, I'd recommend avoiding 0. Some programming languages use 0 to mean an invalid address, so "interesting" things will happen if 0 is actually a valid address. You might choose to just avoid the first megabyte or so unless it's specifically requested. You might choose to prevent allocations at 0 entirely, to prevent programs from making system calls with valid pointers to 0 (there were some interesting Linux kernel bugs related to this).

Re: malloc implementation

Posted: Sun Nov 03, 2019 10:56 am
by Schol-R-LEA
I missed this thread earlier, but looking over it now, I have a few things to add.

First, it may help to separate the general concept of a system call from the specific mechanisms such as syscall, sysenter, soft interrupts, call gates, etc. The key element of a system call is pretty much in the name: something which calls on a system (i.e., kernel space) service. A key aspect of this is that, since the service runs in kernel mode (e.g., kernel protected memory, kernel privilege level, etc.), you generally want to limit the operations of the system call to as close to the minimum which is necessary to run in kernel space as possible, to limit the window of vulnerability - while the kernel ought to be the safest part of the system, bugs do occur, so the less time you spend running with kernel privileges, the better.

Also, switching from user mode to kernel mode is often a costly operation, depending on how the call mechanism is implemented, so as a rule, the fewer calls to the kernel your user applications make, the better. This isn't a hard and fast rule, but few kernel designs avoid this problem. This complements the previous comment, but also opposes it, as it means that you sometimes want to do as much as possible in the kernel before returning to the user application, which makes the whole thing something of a balancing act.

As an aside, note that the standard functions corresponding to POSIX system calls are themselves userspace wrappers for the calls themselves, rather than bare system calls. This is for a number of reasons, most notably to abstract the system call mechanism across different platforms and implementations - the system call mechanisms for the PDP-11 were wholly different from those of the x86, MIPS, ARM, SPARC, RISC-V, etc., which in turn generally don't resemble each other, either. Even systems such as Windows do this with the system libraries, which makes changing the details of the system call mechanism in the future (and portability to new platforms) easier on the OS developers. Note also that some - such as malloc() - are a good deal more than just wrappers, even when they are a direct analogue to the system call.
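To illustrate the wrapper idea, here is a minimal sketch of what such a userspace wrapper might look like. Everything in it is an assumption for illustration: a hypothetical i386 kernel whose "allocate pages" call is number 42, reached via int 0x80 with the argument in EBX and the result in EAX. The point is that only this one function knows the mechanism, so switching to sysenter/syscall later only means changing the wrapper.

Code: Select all

/* Hypothetical libc-internal wrapper; the syscall number and register ABI are
   made up for illustration and are not from any real kernel. */
static inline void *sys_alloc_pages(unsigned long npages)
{
    void *ret;
    __asm__ volatile ("int $0x80"
                      : "=a" (ret)                  /* result comes back in eax */
                      : "a" (42UL), "b" (npages)    /* syscall number, first argument */
                      : "memory");
    return ret;
}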

Also, if you are using per-process virtual memory spaces (which most modern OSes do, unless they are explicitly Single Address Space systems - SASOSes have become somewhat more common now that 64-bit address spaces make running out of virtual memory unlikely in the near future, but for x86 are still the exception) and a higher-half kernel (pretty much universal today, with some ISAs such as MIPS actually making it more or less required), then what the user application sees is going to be a contiguous virtual memory space in lower memory regardless of the physical layout. While memory locality significantly impacts performance (as nullplan mentioned), for most user programs there is no need to know whether a given block of physical memory is contiguous or not - the user process usually only sees the virtual memory layout, and that is up to the application (and the compiler and linker), not the kernel.

It is crucially important not to confuse the user-space memory allocation functions with those in the kernel, too. The kernel memory allocator - that is to say, the one which allocates memory for the kernel itself - has very different requirements and priorities from either the user-space allocator, or the userspace memory allocator which the system calls such as sbrk() operate on. The kernel allocator must come first, and must have priority over any other subordinate allocator, though in practice they are mostly disjoint. Broadly speaking, the kernel allocator works on kernel memory with no real overlap to userspace, the system call works on user-space memory at a coarse-grained level, the user functions manage the user-space memory at a finer-grained level. While the top-level kernel allocator still has to decide the virtual-to-physical mapping of all of the pages, and decides which pages can be used for mapping a given piece of user addressing to, its main concern is with its own memory, period. The kmalloc() within the kernel should be separate both from the VM->PM mapping, and the user-space mapping.

All of this is to say that, regarding the user-space version of malloc(), you generally want to have it do as much of the memory management for a user-space process as possible in userspace itself. The K&R implementation is very bare-bones, even for its time, and is meant only as a minimal example of how to make a userspace allocator; in practice, malloc() implementations do rather more local management, especially in implementations which aren't older than most currently practicing programmers.

Note that even the K&R implementation (I'm looking at the version posted here, but AFAICT it is the one from the 2nd edition and presumably the same one you are looking at) is more than just a wrapper for sbrk(). Their malloc() batches the sbrk() calls as much as possible, allocating a significant chunk of memory (the minimum being NALLOC * sizeof(Align), which, assuming a long is 4 bytes, gives us 4KiB - which does happen to match a standard x86 page in p-mode, but more on this later) every time it calls out to the kernel for more, regardless of how much was immediately required. The fact that they actually put the system call in a separate function, morecore(), says a lot - it shows that they wanted to put as much distance between the system call mechanism and the local allocator as they could, for a number of reasons.

Code: Select all

typedef long Align; /* for alignment to long boundary */

union header { /* block header */
   struct {
       union header *ptr; /* next block if on free list */
       unsigned size; /* size of this block */
   } s;
   Align x; /* force alignment of blocks */
};

typedef union header Header;

static Header base; /* empty list to get started */
static Header *freep = NULL; /* start of free list */

static Header *morecore(unsigned); /* forward declaration */

/* malloc: general-purpose storage allocator */
void *malloc(unsigned nbytes)
{
   Header *p, *prevp;
   unsigned nunits;
   nunits = (nbytes+sizeof(Header)-1)/sizeof(Header) + 1;
   if ((prevp = freep) == NULL) { /* no free list yet */
      base.s.ptr = freep = prevp = &base;
      base.s.size = 0;
   }
   for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
      if (p->s.size >= nunits) { /* big enough */
        if (p->s.size == nunits) /* exactly */
           prevp->s.ptr = p->s.ptr;
        else { /* allocate tail end */
           p->s.size -= nunits;
           p += p->s.size;
            p->s.size = nunits;
         }
        freep = prevp;
        return (void *)(p+1);
      }
      if (p == freep) /* wrapped around free list */
         if ((p = morecore(nunits)) == NULL)
             return NULL; /* none left */
      }
}

#define NALLOC 1024 /* minimum #units to request */

/* morecore: ask system for more memory */
static Header *morecore(unsigned nu)
{

  char *cp, *sbrk(int);
  Header *up;

  if (nu < NALLOC)
    nu = NALLOC;

  cp = sbrk(nu * sizeof(Header));

  if (cp == (char *) -1) /* no space at all */
    return NULL;

  up = (Header *) cp;
  up->s.size = nu;
  free((void *)(up+1));

  return freep;
}

/* free: put block ap in free list */
void free(void *ap)
{
  Header *bp, *p;
  bp = (Header *)ap - 1; /* point to block header */
  for (p = freep; !(bp > p && bp < p->s.ptr); p = p->s.ptr)
    if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
      break; /* freed block at start or end of arena */
  if (bp + bp->s.size == p->s.ptr) {
    bp->s.size += p->s.ptr->s.size;
    bp->s.ptr = p->s.ptr->s.ptr;
  } else
      bp->s.ptr = p->s.ptr;

  if (p + p->s.size == bp) {
    p->s.size += bp->s.size;
    p->s.ptr = bp->s.ptr;
  } else
    p->s.ptr = bp;
  freep = p;
}
Note that, contrary to Tilde's nonsensical blathering, the memory is tracked using a pointer to the start of the allocated memory, and a linked list for previously allocated and then freed blocks, rather than a single array. A list is not the only representation possible - indeed, a balanced tree or hash table would probably be better, though given the minimalism typical of K&R they probably decided not to bother - but a simple array really wouldn't cut it.

The specific design of the allocator is fairly simple and clever, though it does assume that sbrk() always returns sequential blocks, which is something which Nullplan mentioned earlier.

However, you may find it helpful to track the local memory units separately from the allocated blocks they reside in, especially if the allocation block size is significantly larger than what was expected in this design. Part of the locality issue Nullplan mentioned is being able to allocate groups of related but disjoint variables (say, a group of list nodes) in a single block of memory to make freeing that node back to the OS faster at some future point. This doesn't come up often, and malloc() itself is a bit too simplistic to facilitate this, but it is something which a more sophisticated userspace allocator may need to allow.

As a final note, I should mention that the K&R code is meant to be a generic skeleton, and there are some parts of this that don't necessarily fit well with specific platforms such as the x86. For example, the size for x86 pages in 32-bit protected mode can be either 4KiB or 4MiB, depending on the page table flags. Since the size of the allocation blocks is defined as NALLOC * sizeof(Align), this only works for the smaller page size, and even then, only if long is fixed at 32 bits. However, the C standard doesn't fix long at all - it can vary from system to system or even compiler to compiler, being defined as being "equal or greater than the defined size of int", with a required minimum of 32 bits but no upper bound. If the defined size of long doesn't match the size of a system alignment (as may be the case for a 64-bit processor if a long is defined as 32 bits, for example), or the combination of the alignment size and the defined value of NALLOC doesn't match the page size, this could cause problems. Thus, even if you're using the K&R code as a base for your userspace allocator, you'll need to tailor it a bit to fit the system - which may mean allowing some things which are hardcoded in the example code to be dynamically redefined in your actual library function.
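One way to do that tailoring, sketched under a couple of assumptions (C11's max_align_t and POSIX sysconf() are available, and the block header is the same shape as in the K&R code above, only with this Align in place of the long):

Code: Select all

#include <stddef.h>   /* max_align_t (C11) */
#include <unistd.h>   /* sysconf (POSIX) */

typedef max_align_t Align;     /* strictest alignment the implementation knows about */

union header {                 /* same block header as in the K&R code above */
    struct {
        union header *ptr;
        unsigned size;
    } s;
    Align x;                   /* force alignment of blocks */
};
typedef union header Header;

/* Instead of hardcoding NALLOC = 1024 longs, request whole pages from the OS. */
static unsigned nalloc_units(void)
{
    long page = sysconf(_SC_PAGESIZE);          /* e.g. 4096 on x86 */
    if (page < (long)sizeof(Header))
        page = sizeof(Header);
    return (unsigned)(page / sizeof(Header));   /* free-list units per page */
}

morecore() would then use nalloc_units() as its minimum request instead of the fixed NALLOC.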

Re: malloc implementation

Posted: Tue Nov 05, 2019 5:26 pm
by Thpertic
First of all, thank you for the answer!
Schol-R-LEA wrote:A key aspect of this is that, since the service runs in kernel mode (e.g., kernel protected memory, kernel privilege level, etc.), you generally want to limit the operations of the system call to as close to the minimum which is necessary to run in kernel space as possible, to limit the window of vulnerability - while the kernel ought to be the safest part of the system, bugs do occur, so the less time you spend running with kernel privileges, the better.

Also, switching from user mode to kernel mode is often a costly operation
Yeah sure, that was exactly what I thought.
Schol-R-LEA wrote:Also, if you are using per-process virtual memory spaces (...) and a higher-half kernel (...), then what the user application sees is going to be a contiguous virtual memory space in lower memory regardless of the physical layout. While memory locality significantly impacts performance (as nullplan mentioned), for most user programs there is no need to know whether a given block of physical memory is contiguous or not - the user process usually only sees the virtual memory layout, and that is up to the application (and the compiler and linker), not the kernel.
Yes, I have that memory format and the kernel is located in the higher half; so the process could see from 0x1000 to 0xC0000000 for itself, with the rest for the kernel, right?
Schol-R-LEA wrote:It is crucially important not to confuse the user-space memory allocation functions with those in the kernel, too. The kernel memory allocator - that is to say, the one which allocates memory for the kernel itself - has very different requirements and priorities from either the user-space allocator, or the userspace memory allocator which the system calls such as sbrk() operate on. The kernel allocator must come first, and must have priority over any other subordinate allocator, though in practice they are mostly disjoint. Broadly speaking, the kernel allocator works on kernel memory with no real overlap to userspace, the system call works on user-space memory at a coarse-grained level, the user functions manage the user-space memory at a finer-grained level. While the top-level kernel allocator still has to decide the virtual-to-physical mapping of all of the pages, and decides which pages can be used for mapping a given piece of user addressing to, its main concern is with its own memory, period. The kmalloc() within the kernel should be separate both from the VM->PM mapping, and the user-space mapping.

All of this is to say that, regarding the user-space version of malloc(), you generally want to have it do as much of the memory management for a user-space process as possible in userspace itself.
Ehm ok... the kernel must have the priority, and kmalloc() and malloc() have to be different, as the first has to work for the kernel but be prepared to map user space (didn’t really get this one) too, while the second just has to map user space. Could you please give an example?
Schol-R-LEA wrote:The K&R implementation is very bare-bones, even for its time, and is meant only as a minimal example of how to make a userspace allocator; in practice, malloc() implementations do rather more local management, especially in implementations which aren't older than most currently practicing programmers.

Note that even in the K&R implementation (I'm looking at the version posted here, but AFAICT it is the one from the 2nd edition and presumably the same one you are looking at) is more than just a wrapper for sbrk(). Their malloc() batches the sbrk() calls as much as possible, allocating a significant chunk of memory (the minimum being NALLOC * sizeof(Align), which assuming a long is 4 bytes, gives us 4KiB - which does happen to match a standard x86 page in p-mode, but more on this later) every time it called to the kernel for more regardless of how much was immediately required. The fact that they actually put the system call in a separate function, morecore(), says a lot - it shows that they wanted to put as much distance from the system call mechanism and the local allocator as they could, for a number of reasons.
Yes, I am looking at this version and I found it really interesting mostly because it’s pretty straightforward and easy to understand. I couldn’t really find any other malloc() implementation that didn’t use sbrk() and the skeleton of this example so I stuck with it.
I thought it only needed an allocator that, starting from an address (for example at the end of the kernel), keeps mapping there, as a heap should do.
Schol-R-LEA wrote:However, you may find it helpful to track the local memory units separately from the allocated blocks they reside in, especially if the allocation block size is significantly larger than what was expected in this design. Part of the locality issue Nullplan mentioned is being able to allocate groups of related but disjoint variables (say, a group of list nodes) in a single block of memory to make freeing that node back to the OS faster at some future point. This doesn't come up often, and malloc() itself is a bit too simplistic to facilitate this, but it is something which a more sophisticated userspace allocator may need to allow.
I didn’t understand this too much so I think I leave it for a later implementation...
Schol-R-LEA wrote:As a final note, I should mention that the K&R code is meant to be a generic skeleton, and there are some parts of this that don't necessarily fit well with specific platforms such as the x86. For example, the size for x86 pages in 32-bit protected mode can be either 4KiB or 4MiB, depending on the page table flags. Since the size of the allocation blocks is defined as NALLOC * sizeof(Align), this only works for the smaller page size, and even then, only if long is fixed at 32 bits. However, the C standard doesn't fix long at all - it can vary from system to system or even compiler to compiler, being defined as being "equal or greater than the defined size of int", with a required minimum of 32 bits but no upper bound. If the defined size of long doesn't match the size of a system alignment (as may be the case for a 64-bit processor if a long is defined as 32 bits, for example), or the combination of the alignment size and the defined value of NALLOC doesn't match the page size, this could cause problems. Thus, even if you're using the K&R code as a base for your userspace allocator, you'll need to tailor it a bit to fit the system - which may mean allowing some things which are hardcoded in the example code to be dynamically redefined in your actual library function.
I’ll try to adjust that implementation to fit my OS.
I’m thinking of starting the kernel heap from the end of the kernel (obtained from the linker) (kmalloc() implementation) and in the future start a process from 1MB (malloc() implementation).

Re: malloc implementation

Posted: Wed Nov 06, 2019 11:07 am
by ~
https://danluu.com/malloc-tutorial/

The tutorial says that you can limit fragmentation by coalescing contiguous free blocks (reuse them if you can; allocate more if you can't). But if reserved blocks are interleaved with free blocks that are smaller than what you need, those free blocks will remain unused.

Wouldn't it be better to allocate as much memory in a single malloc as the program will need while it runs, and hand out addresses from this block to the rest of the dynamic variables, to avoid fragmentation?
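What that describes is essentially an arena (bump) allocator. A minimal sketch, with illustrative names and the assumption that the program's total dynamic memory can be bounded up front and individual pieces are never freed:

Code: Select all

#include <stddef.h>
#include <stdlib.h>

struct arena {
    unsigned char *base;   /* start of the single large block */
    size_t         used;   /* bytes handed out so far */
    size_t         size;   /* total size of the block */
};

static int arena_init(struct arena *a, size_t size)
{
    a->base = malloc(size);               /* one big request up front */
    a->used = 0;
    a->size = size;
    return a->base != NULL;
}

static void *arena_alloc(struct arena *a, size_t n)
{
    /* round up so the next object stays aligned (assumes a power-of-two alignment) */
    n = (n + sizeof(max_align_t) - 1) & ~(sizeof(max_align_t) - 1);
    if (n > a->size - a->used)
        return NULL;                      /* arena exhausted; no piecemeal fallback */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

The catch is the one you already noticed: you have to know (or overestimate) the program's total need in advance, and nothing can be returned piecemeal - the whole arena is freed at once.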

Re: malloc implementation

Posted: Wed Nov 06, 2019 11:59 am
by Schol-R-LEA
TL;DR: You don't need to worry about malloc() in your OS dev project, because malloc() isn't part of the OS in the first place. You won't need it until you start writing applications, and by the time you do, you'll probably have started porting GNU LibC to your OS and you'll just use the one that comes with that for most purposes. There's no reason whatsoever for you to write your own general-purpose malloc() unless you're re-implementing the whole C library, at which point you have bigger worries than just that.

Also, ignore everything ~ says.

Now on to the overly-long version.
Thpertic wrote:
Schol-R-LEA wrote:It is crucially important not to confuse the user-space memory allocation functions with those in the kernel, too. The kernel memory allocator - that is to say, the one which allocates memory for the kernel itself - has very different requirements and priorities from either the user-space allocator, or the userspace memory allocator which the system calls such as sbrk() operate on. The kernel allocator must come first, and must have priority over any other subordinate allocator, though in practice they are mostly disjoint. Broadly speaking, the kernel allocator works on kernel memory with no real overlap to userspace, the system call works on user-space memory at a coarse-grained level, the user functions manage the user-space memory at a finer-grained level. While the top-level kernel allocator still has to decide the virtual-to-physical mapping of all of the pages, and decides which pages can be used for mapping a given piece of user addressing to, its main concern is with its own memory, period. The kmalloc() within the kernel should be separate both from the VM->PM mapping, and the user-space mapping.

All of this is to say that, regarding the user-space version of malloc(), you generally want to have it do as much of the memory management for a user-space process as possible in userspace itself.
Ehm ok... the kernel must have the priority, and kmalloc() and malloc() have to be different, as the first has to work for the kernel but be prepared to map user space (didn’t really get this one) too, while the second just has to map user space. Could you please give an example?
OK, I wasn't very clear, I think. I should have said 'system memory manager' rather than 'kernel allocator', as it is much more than just kmalloc() (or whichever function or functions which allocate memory for the kernel's own use in your design - even for a POSIX implementation, you aren't obligated to use the Unix design or naming conventions for the internal parts of the kernel). The system memory manager isn't necessarily a single unit so much as a system-wide allocation policy and the collection of functions which implement it, covering all system-wide memory management including the vm->pm mapping and where to page in and page out.

The thing is, the user-space functions malloc() and free() aren't part of that at all - malloc() and related allocation functions such as calloc() make requests to the kernel for memory (sbrk() in most conventional POSIX implementations, or at least older ones which are more readily available), but they are purely library functions, not kernel operations. They are not part of the OS at all, and you only need them when implementing the C standard library's memory management - they are orthogonal to the OS itself, and the memory management which they do is only inside the user process, for that user process itself.

Technically speaking, your system doesn't need [m|c|r]alloc()/free() at all, if you only have user applications written in languages other than C (where the libraries are not themselves implemented in C either), or you only allow static memory allocation in user processes. This scenario is pretty absurd for a POSIX system, admittedly, but the point is that they aren't OS functions at all. You don't need to even think about them until you are porting the C library for your userspace applications.

Regarding sbrk(), note that there is the system call referred to as SBRK in the formal Unix/POSIX documentation, and the sbrk() library function, which is a thin wrapper around the system call. Most of the time we are talking about both at once, since the functions (or in some implementations, macros) are basically the same as the system calls regardless of the actual system call mechanism, but it is worth bearing in the back of your mind that what you are seeing with sbrk() isn't the system call as such.

Now, the BRK and SBRK system calls only matter if you are strictly implementing POSIX at the OS level - sbrk() is a purely Unix thing and you have no obligation to implement it at all unless you are building a Unix. If you are writing a new OS which isn't a POSIX implementation on the Application Binary Interface (ABI) level, but you intend to provide a POSIX compatibility layer, your own kernel interface can work however you want, so long as you provide a set of wrappers (whether as alternative system calls, or more realistically, as library functions) which behave the same as the POSIX Application Programmer Interface (API) - which might include a sbrk() library function, or not, since (as you noted yourself) sbrk() has been deprecated on most major current Unices (Linux, FreeBSD, MacOS) for some time now, with mmap() being the modern equivalent since the late 1980s.

Note that strictly speaking, the purpose of mmap() isn't memory allocation, but demand paging - it maps a disk file into virtual memory - but because allocation is a part of this, IIUC it has become the basis for memory allocation strategies on several POSIX systems.

Why? Because BRK and SBRK are too simplistic, and - among other things - provide no means for returning memory to the OS during the lifetime of the process (technically, BRK sort of does, but it's never used that way AFAICT). With the newer design, the companion function munmap() allows a process to return memory to the system. mmap() also allows the process to request certain memory properties for the allocated memory (e.g., read-only, shared with other processes, etc.), and there are also three additional system calls, mprotect(), msync(), and madvise(), for more fine-grained manipulation of memory which was already mapped.

There are some aspects of mmap() which are problematic for a simple implementation, as well, regarding whether successive pages are guaranteed to be contiguous or not, but it sounds as if they can be worked around.
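For a concrete picture of what that looks like, here is a sketch of the K&R morecore() rewritten over mmap() instead of sbrk(). It assumes the Header, NALLOC, freep and free() definitions from the code quoted earlier in the thread; MAP_ANONYMOUS is not in the oldest POSIX versions but is available on every current mainstream Unix.

Code: Select all

#include <sys/mman.h>

/* morecore: ask the system for more memory via an anonymous mapping */
static Header *morecore(unsigned nu)
{
    if (nu < NALLOC)
        nu = NALLOC;

    size_t bytes = nu * sizeof(Header);
    void *cp = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (cp == MAP_FAILED)                 /* no space at all */
        return NULL;

    Header *up = (Header *)cp;
    up->s.size = nu;
    free((void *)(up + 1));               /* hand the new block to the free list */
    return freep;
}

Unlike sbrk(), successive mmap() calls are not adjacent, but the K&R free list copes with that: free() only coalesces blocks that happen to border each other, so non-contiguous chunks simply stay as separate entries.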

(Comments and corrections welcome; POSIX hasn't exactly been a major concern for me, since my implementation language and OS design plans do not resemble C and Unix in the slightest.)
Thpertic wrote:
Schol-R-LEA wrote:The K&R implementation is very bare-bones, even for its time, and is meant only as a minimal example of how to make a userspace allocator; [...]
Yes, I am looking at this version and I found it really interesting mostly because it’s pretty straightforward and easy to understand. I couldn’t really find any other malloc() implementation that didn’t use sbrk() and the skeleton of this example so I stuck with it.
I thought it only needed an allocator that, starting from an address (for example at the end of the kernel), keeps mapping there, as a heap should do.
OK, that's an understandable problem, as most tutorials do still use sbrk() for the sake of simplicity (and also because most of them are out of date, and were based on information which was already out of date to begin with).

While we normally don't suggest this (since production-grade code tends to be a mess, or at least hard to follow, as it does a lot of things which are confusing but pragmatic), if you really want to write a modern malloc(), you'll want to look at more robust implementations such as the one which comes with glibc, or even at some of the high-performance versions such as DLmalloc() (a somewhat dated overview of which can be found here, with additional material for it on Doug Lea's home page), JEmalloc() (which is the FreeBSD version), or TCMalloc(). Don't expect these to be readily understandable, however, especially the last three, as they are Heavy Wizardry and do some things which are hard to follow without spending a few hours studying them. Also, the high-performance ones are more specialized, and may not be ideal for all uses.

And of course, any given C program can have a handrolled memory allocator - or even a conservative garbage collector, such as the Boehm Collector - rather than using the standard functions, and implementations of other languages likewise, but that's going a bit off-topic.

Some additional theory can be found here, to clarify the whole behavior of memory in a running program.

So why do all of these implementations of malloc() which you see in tutorials use sbrk()? Because the writers were lazy, and didn't want to clutter up the tutorial with the more complex interface provided by MMAP and MPROTECT. To repeat what the Wiki says over and over again: There are no true tutorials for OS development, and the ones which are out there claiming to be are critically flawed. They can be a guide to OS development, and help you understand what needs to be done, but in a very real sense, once you have your development and testing platforms set up and can start designing your OS, you are on your own. We can give you advice, and the wiki can provide information, but the OS is yours.