OSDev.org

Posted: **Sat May 22, 2021 9:45 pm**

So, I believe that I have succeeded in making just about the worst initializer for the physical frame allocator (and by extension, one of the worst allocators ever). I figure if anyone wants to die a little inside, I'll go ahead and show what I did.

Basically, I used a bitmap that mapped all 4k blocks in the entire 4GB address space. Any unusable memory was considered to be allocated, and any gaps were considered to be allocated. This whole bitmap was more 131KB in size.

This is the initializer code:

Code: Select all

int init_memory_map(multiboot_info_t *mbd, uint32_t page_size) {

	/* We only actually use the mmap_t here, and so we're going
	 * to go ahead and define it within the function
	 */
	struct mmap_t {
		uint32_t size;
		uint64_t base;
		uint64_t len;
		uint32_t type;
	} __attribute((packed));

	/* Make sure that page size is a power of 2 */
	if ((page_size == 0) || ((page_size & (page_size - 1)) != 0)) {
		printf("Page size must be a power of 2!\n");
		return 0;
	}

	unsigned char check_flag = (1 << 0) & mbd->flags;
	
	// Check flags to make sure memory mapping is sane
	if (!check_flag) {
		printf("Flag is not set at bit 0!\n");
		return 0;
	}

	check_flag = (1 << 6) & mbd->flags;

	if (!check_flag) {
		printf("Flag is not set at bit 6!\n");
		return 0;
	}

	// Get the address and length to set up
	struct mmap_t *mmap_addr = (struct mmap_t *) mbd->mmap_addr;
	uint32_t arr_size = (uint64_t) mbd->mmap_length / sizeof(struct mmap_t);

	/* Make a copy. Since it isn't very big for an x86 system, we'll have
	 * it the stack. That way it'll get destroyed when we finish initializing
	 * the physical memory and we also don't need to worry about clobbering it
	 * when we do our bitmap
	 */
	struct mmap_t entries[arr_size];

	memcpy(entries, mmap_addr, mbd->mmap_length);

	/* The exponent of the page_size as a power of 2 */
	uint32_t page_power = 0;
	uint32_t page_size_copy = page_size;

	for (page_power; page_size_copy; page_power++) {
		/* This ensures that page_power does not end
		 * up one greater than it should be
		 */
		if ((page_size_copy >> 1) == 0) {
			break;
		}
		page_size_copy >>= 1;
	}

	/* Get the base + length to determine total "usable"
	 * ram
	 */
	uint64_t last_address = entries[arr_size-1].base +
	       			entries[arr_size-1].len;

	// Get the number of pages
	uint32_t blocks = last_address / (1 << page_power); 

	// Get the number of bytes for the bitmap, each bit is 1 page
	bitmap_size = blocks / 8;
	uint64_t end = 0;
	uint64_t top_check = 0;
	uint64_t bottom_check = 0;
	uint64_t sb_addr = (uint64_t) &stack_bottom;
	uint64_t st_addr = (uint64_t) &stack_top;
	uint64_t ke_addr = (uint64_t) &kernel_end;


	/* Find available memory after the end of the kernel
	 *
	 * After that, align to the next nearest page boundary (unless the
	 * base is already aligned to a page boundary), and then use that
	 * to store the bitmap.
	 */	
	for (uint32_t i = 0; i < arr_size; ++i) {
		end = entries[i].base + entries[i].len;

		/* In order to check the space, we are going to check
		 * to see if end-bitmap_size is aligned on a page boundary.
		 * If it is not, then we are going to round down
		 * to the nearest page boundary. If it is, we are going
		 * to use it as is. That way, we can ensure that we
		 * are actually using pages to allocate memory for the
		 * bitmap
		 */
		top_check = (((end-bitmap_size) % page_size) != 0) ?
			((end-bitmap_size) & ~(page_size-1)) :
			(end-bitmap_size);

		/* Additionally, we are going to check to see if the aligned base
		 * can fit into memory. This time we are rounding up
		 */
		bottom_check = ((entries[i].base % page_size) != 0) ?
			((entries[i].base & ~(page_size-1)) + page_size) :
			(entries[i].base);
		bottom_check += bitmap_size;

		if (entries[i].type != MULTIBOOT_MEMORY_AVAILABLE) {
			continue;
		}

		/* We found some space between the stack and the kernel */
		if (st_addr < (1 << 20) && 
			top_check > st_addr && 
			end < (1 << 20)) {

		       phys_mem_bitmap = top_check;
		       break;
		}

		/* We found memory below the stack */
		else if (sb_addr < (1 << 20) && 
			bottom_check < sb_addr) {
			phys_mem_bitmap = bottom_check - bitmap_size;
			break;
		}

		/* The stack is above the kernel, but we did manage to find
		 * some memory that is below the kernel and the stack
		 *
		 * In this case, I don't think that it actually matters if
		 * we use the top range or bottom range for the bitmap
		 */
		else if (sb_addr > (1 << 20) &&
				bottom_check < sb_addr &&
				bottom_check < (1 << 20)) {
			phys_mem_bitmap = bottom_check - bitmap_size;
			break;
		}

		/* The bottom of the stack is at a higher address than the
		 * kernel, and we found some memory between the kernel and the
		 * stack bottom
		 */
		if (sb_addr > (1 << 20) &&
			bottom_check - bitmap_size > ke_addr &&
			bottom_check < sb_addr) {
			phys_mem_bitmap = bottom_check - bitmap_size;
			break;
		}
		
		/* The bottom of the stack is at a higher address than the kernel
		 * and we found some memory after the top of the stack.
		 */
		else if (sb_addr > (1 << 20) && 
			st_addr < top_check) {
			phys_mem_bitmap = top_check;
			break;
		}

		/* We went through all of the memory, but didn't find
		 * anything we liked. Time to give the kernel a panic.
		 */
		if (i == (arr_size - 1)) {
			printf("Failed to find memory for the bitmap!\n");
			return 0;
		}
	}

	// Set all bits to 1
	memset((uint8_t *) phys_mem_bitmap, 0xFF, bitmap_size);

 	uint32_t ndx = 0;
	uint64_t base = 0;
	uint64_t limit = 0;
	uint64_t address = 0;

	/* TODO: Find a faster way to do this */	
	for (uint32_t i = 0; i < blocks; ++i) {
		/* Get the address and determine if it
		 * is inside of a mmap_t structure
		 */
		address = i*(1 << page_power);
		base = entries[ndx].base;
		limit = entries[ndx].len;

		/* This means that there is a gap between
		 * two different mmap_t entries.
		 */ 
		if (address < base) {
			// The bit has already been set
			printf_serial("Bit has already been set... continuing\n");
			continue;
		}

		/* This means that the address is within the physical memory bitmap */
		if (address >= phys_mem_bitmap && 
			address <= phys_mem_bitmap + bitmap_size) {
			printf_serial("The bit is within the bitmap... continuing\n");
			continue;
		}

		// Clear the bit if the memory is available
		if (entries[ndx].type == MULTIBOOT_MEMORY_AVAILABLE) {
			((uint32_t *) phys_mem_bitmap)[i/8] &= ~(1 << (i%8));
		}

		// Ignore if the memory is not specifically available

		/* Determine if the next address is within the 
		 * mmap struct
		 */
		if ((address+(1 << page_power)) > (base+limit)) {
			++ndx;
		}
	}
	
	return 1;
}

Basically, it tries to find available memory for the bitmap, avoiding the kernel and stack regions, and then sets all of the bits in the range to 0xFF. After that, it goes through the memory map and then figures out which bits are in the available memory regions, not part of the bitmap, and then sets those bits to zero. Ideally, it should also be checking for stack and kernel regions, though. The reason I had the phys_mem_bitmap as a uint64_t is because I figured that eventually I'll want to port this kernel to the x86_64. I figured that if it is running on an x86, the addresses will obviously never exceed the 4GB range, and so I'm good to truncate half of it when I treat it as a pointer. Ideally, I'll have those settled in the arch directory (i386, 86-64), though.

It literally took like two or three minutes just for it to go through the whole bitmap.

Now that I've made something so unspeakable, it's time to redo it. I figured do it once poorly just to get a feel for it, and then do it better a second time. I'm not sure exactly what data structure I'm going to use next time around. I could save space by having three or four structs that each contain a bitmap instead. Each one would then cover a region of usable memory (since there are large chunks that just plain aren't usable). I considered doing the buddy system, but based on everything I've read about it, it's extremely circular between using it as an allocator and needing an allocator for it.

Posted: **Sat May 22, 2021 11:16 pm**

Hi,

We can suggest possibly looking into a freestack for the frame allocator: it takes about 4 or 8 bytes of storage only, constant time execution, easily supports coloring, and easy to implement (its literally just storing the top of the stack in a global and implementing a set of free lists in the frames themselves.) However... it can be tricky to implement with paging - it requires careful interaction between the frame allocator and page allocator. Please only consider this as a possible suggestion only -- there are lots of ways to go about frame management and a bitmap would suffice. I am biased here given this is what we use.

Each one would then cover a region of usable memory

This is the basic idea behind a zone allocator: you can allocate from selected regions (specifically for legacy DMA mapping) or upper regions as well as give select properties to the regions or even use different types of allocators depending on the region being allocated from.

Posted: **Sun May 23, 2021 8:58 am**

As basic allocator, I can recommend a memblock allocator.

Basic idea is this: You have a list of the available memory regions, and a list of the reserved memory regions.

Code: Select all

struct memblock {
  phys_addr_t start;
  size_t len;
};

struct memblock memavail[32];
struct memblock memreserved[64];
size_t navail;
size_t nreserved;

The array sizes are arbitrary, but ought to be sufficient for a start. If you run out of memory in these, you may need to look into extending the scheme.

The intervals are kept in order and non-overlapping. So, the two basic operations are adding and removing memory ranges. With the preceding properties, these are not so hard:

Code: Select all

static void add_range_to_memblock(struct memblock *arr, size_t *n, size_t limit, phys_addr_t start, size_t len) {
  size_t i;
  for (i = 0; i < *n; i++)
    if (arr[i].start > start)
      break;

  /* can we just add this to the preceding block? */
  if (i && arr[i-1].start + arr[i-1].len >= start) {
    size_t overlap = arr[i-1].start + arr[i-1].len - start;
    arr[i-1].len += len - overlap;
/* is it possible to unite this with the next block now? */
    if (i < *n && arr[i-1].start + arr[i-1].len == arr[i].start) {
      arr[i-1].len += arr[i].len;
      if (i + 1 < *n)
        memmove(arr + i, arr + i + 1, (*n - i - 1) * sizeof (struct memblock));
      --*n;
    }
  }
  /* can we just add this to the following block? */
  else if (i < *n && start + len >= arr[i].start) {
    size_t overlap = start + len - arr[i].start;
    arr[i].start = start;
    arr[i].len += len - overlap;
  }
  /* need to add a new block */
  else if (*n < limit) {
    if (i < *n)
      memmove(arr + i + 1, arr + i, (*n - i - 1) * sizeof (struct memblock));
    arr[i].start = start;
    arr[i].len = len;
    ++*n;
  }
  else
    die("Out of memory adding block at %x", start);
}

static void remove_range_from_memblock(struct memblock *arr, size_t *n, size_t limit, phys_addr_t start, size_t len)
{
  size_t i;
  for (i = 0; i < *n; i++)
    if (arr[i].start >= start)
      break;
  /* Both ends match? Remove entire block */
  if (i < *n && arr[i].start== start && arr[i].len == len) {
    if (i + 1 < *n)
      memmove(arr + i, arr + i + 1, (*n - i - 1) * sizeof (struct memblock));
    --*n;
  }
  /* start matches? Adjust start and len */
  else if (i < *n && arr[i].start == start) {
    arr[i].start += len;
    arr[i].len -= len;
  }
  /* end matches? Adjust len */
  else if (i < *n && arr[i].start + arr[i].len == start + len) {
    arr[i].len -= len;
  }
  /* central match? Adjust list for hole in the middle */
  else if (i < *n && start + len < arr[i].start + arr[i].len) {
    if (*n == limit) die("Out of memory adding hole at %X", start);
    if (i + 1 < *n)
      memmove(arr + i + 1, arr + i, (*n - i - 1) * sizeof (struct memblock));
    size_t offset = start - arr[i].start;
    arr[i+1].start = arr[i].start + offset + len;
    arr[i+1].len = arr[i].len - len - offset;
    arr[i].len = offset;
  }
}

If memory keeps getting exhausted, it is pretty simple to extend the scheme with a linked list. Just remember that you can only allocate a new list node after the kernel has been successfully exempted from allocation, though that should be pretty simple.

In any case, in initialization you now add all the memory given by the memory map as usable RAM to the available memblock, and all the memory given as reserved, as well as the location of the kernel, your modules (if any), and the stack to the "reserved" memblock. Afterwards the allocator is ready.

Allocation then means iterating over the available memory on one hand and the holes between the reserved memory on the other hand, in order to find a hole that fulfills the request. Pretty simple:

Code: Select all

static phys_addr_t *alloc(size_t sz, size_t algn)
{
    size_t avidx, residx = 0;
    for (avidx = 0; avidx < navail; avidx++)
    {
        phys_addr_t start;
        size_t len;
        while (memreserved[residx].start + memreserved[residx].len < memavail[avidx].start && residx < nreserved)
            residx++;
        while (residx < nreserved && memreserved[residx].start < memavail[avidx].start + memavail[avidx].len)
        {
            start = memavail[avidx].start;
            if (memreserved[residx].start + memreserved[residx].len > start)
                start = memreserved[residx].start + memreserved[residx].len;
            len = memavail[avidx].start + memavail[avidx].len - start;
            if (residx < nreserved && memreserved[residx+1].start - start < len)
                len = memreserved[residx+1].start - start;
            if (((start + algn - 1) & -algn) + sz <= start + len)
            {
                start = (start + algn - 1) & -algn;
                add_reserved_range(start, sz);
                dbg_printf("Allocating address %8x\n", start);
                return start;
            }
        }
    }
    return 0;
}

There you go, there's your basic physical address allocator. It returns whatever address is still free, and you can do whatever you want with it. It does not scale very well, so you should avoid making many small requests. Indeed, a kmalloc() implementation or anything of the sort ought to be a few layers beyond this one. Linux uses an allocator not unlike this one for bootstrapping, until the main allocator can be used. Maybe that is a strategy you want to look into. But in any case, this allocator can be initialized in a time linear to the number of memory areas, not their size.

Posted: **Sun May 23, 2021 1:37 pm**

I think what I will do is the memory zone idea that was mentioned. Basically, a struct that has the start of the memory region, the length of the memory region, and then a bitmap that splits that region into pages (I may also add the type of memory). For each available memory region that is provided by the multiboot memory map, I will create one of these zones. This isn't necessarily the best way to do a memory map (both of you have mentioned very good ways), but once I have something down and a good physical memory allocator, I can eventually create a kernel malloc and then play around with different physical memory managers internally.

Posted: **Sat May 29, 2021 1:34 pm**

You said that you assume the entire address space... as in consider all of it to be RAM?

Well, most machines have at least 4GB of RAM, but that is besides the point. You have MMIO and whatnot, so assuming the entire address space is RAM is a bit unsafe.

The page frame allocator should only be covering RAM.

Now, onto the next point.

Bitmaps are perfectly fine with a few optimizations. If you inline the get/set functions, and make them branchless, that will definitely be far faster than per-call branched conditional functions, and this is important considering the get/set functions will be called very very often.

I personally use a bitmap for my page allocator, and I am content with it.

Another optimization is to create an index. I store an index into the bitmap that points to the first free page. Considering that most free pages are in huge contiguous chunks, this is a very fast improvement. When you call to allocate, you check the value at that index. If it's unset, keep on and allocate that page. If it isn't, call your function to reindex and search the bitmap for a free page.

Using an index makes it so instead of always searching the bitmap for a free page when you allocate, you only, in rare occasions, have to actually search the bitmap.

The importance of a fast page frame allocator is high considering that your page frame allocator will be in constant use throughout the runtime of your kernel.

Posted: **Tue Jun 01, 2021 9:06 pm**

rizxt wrote:You said that you assume the entire address space... as in consider all of it to be RAM?

Well, most machines have at least 4GB of RAM, but that is besides the point. You have MMIO and whatnot, so assuming the entire address space is RAM is a bit unsafe.

The page frame allocator should only be covering RAM.

Now, onto the next point.

Bitmaps are perfectly fine with a few optimizations. If you inline the get/set functions, and make them branchless, that will definitely be far faster than per-call branched conditional functions, and this is important considering the get/set functions will be called very very often.

I personally use a bitmap for my page allocator, and I am content with it.

Another optimization is to create an index. I store an index into the bitmap that points to the first free page. Considering that most free pages are in huge contiguous chunks, this is a very fast improvement. When you call to allocate, you check the value at that index. If it's unset, keep on and allocate that page. If it isn't, call your function to reindex and search the bitmap for a free page.

Using an index makes it so instead of always searching the bitmap for a free page when you allocate, you only, in rare occasions, have to actually search the bitmap.

The importance of a fast page frame allocator is high considering that your page frame allocator will be in constant use throughout the runtime of your kernel.

I knew right away that handling the full 4 GB of RAM (right now, this is only x86) as usable RAM was a bad idea. Although, it was more that my intention was to do it poorly the first time, and then do it better having acquired a little more experience. At this point, I rely on the memory map provided by the multiboot spec in order to create data structures for a new memory map that focus exclusively on the available memory (granted, the multiboot spec makes a point to see that it isn't always right, but until I make my own bootloader, I'm going to rely on that).

The thing that makes the page allocator so difficult is that it seems that (at least with x86), it actually matters where you allocating your memory. Apparently the Linux x86 kernel uses the first 16 MiB for ISA drivers as much as possible, and focuses on normal memory first and then high memory. I don't know yet if priority in the memory location matters in other architectures such as x86_64 or ARM. I've been trying to find a way to make the physical memory manager as architecture independent as I can, so that I only have to reimplement certain parts of it for each architecture. However, looking at the wiki page for ARM, it is starting to sound like a physical memory manager might need to be tailored to each architecture.

What do you do about prioritizing certain memory regions in allocation (such as 16 MiB for ISA drivers specifically)?

Posted: **Fri Jun 04, 2021 12:30 am**

Hi,

We started documenting how we do memory management if interested here. Its a bit complicated & being worked on, but the summary is this: we have Page Frame Number (PFN) allocation only for Standard Memory. Low Memory has a separate region-based allocator via a VAD tree. So when we think of "physical memory allocator" what we refer to is the PFN allocator which is only for Standard Memory. Low Memory isn't allocated but rather are reserved regions of the physical address space for use by certain legacy drivers and mapped directly into the current virtual address space.

In summary though, I think its best to use a separate scheme for allocating Low Memory to satisfy alignment and contiguousy issues.

Posted: **Sat Jun 05, 2021 8:51 pm**

neon wrote:Hi,

We started documenting how we do memory management if interested here. Its a bit complicated & being worked on, but the summary is this: we have Page Frame Number (PFN) allocation only for Standard Memory. Low Memory has a separate region-based allocator via a VAD tree. So when we think of "physical memory allocator" what we refer to is the PFN allocator which is only for Standard Memory. Low Memory isn't allocated but rather are reserved regions of the physical address space for use by certain legacy drivers and mapped directly into the current virtual address space.

In summary though, I think its best to use a separate scheme for allocating Low Memory to satisfy alignment and contiguousy issues.

Having a distinction between standard memory and low memory is probably a lot simpler. It's only about 16 MiB, and so there probably isn't going to be much that'll be missed if you specially reserve it for ISA drivers and stuff. I'll definitely take a look at the link.

Posted: **Sat Jun 05, 2021 10:38 pm**

My physical memory allocator (that is used in the main kernel, not the one I posted from the bootloader shim) tries to return the highest possible address. My reasoning is that this way, devices that can handle 64-bit addresses will get memory beyond the 4GB mark by preference, leaving the lower memory to the devices that cannot handle those. But it doesn't make it impossible to give lower memory to a high-mem capable allocation. I know of no case in which it is bad to return a small address to something capable of handling large addresses.

As for ISA DMA, my OS is not currently able to handle it, although it would only require a small change to the allocator. But I have no ISA drivers in my OS, since it is a 64-bit OS, and therefore exceedingly unlikely to be used with ISA hardware. In any case, I wouldn't reserve the entire 16 MB for something used so rarely.

OSDev.org

Worst way to initialize physical frame allocator

Worst way to initialize physical frame allocator

Re: Worst way to initialize physical frame allocator

Re: Worst way to initialize physical frame allocator

Re: Worst way to initialize physical frame allocator

Re: Worst way to initialize physical frame allocator

Re: Worst way to initialize physical frame allocator

Re: Worst way to initialize physical frame allocator

Re: Worst way to initialize physical frame allocator

Re: Worst way to initialize physical frame allocator