The general workflow to enable memory management

sunnysideup · Post by **sunnysideup** » Sat Feb 08, 2020 12:03 pm

Well, I'm writing a hobby OS (from scratch). I have written a bootloader which can load my kernel into memory. I have developed some printing routines, and built a dedicated cross compiler and linker.

I was advised to implement memory management next (on the osdev IRC channel). I was advised to do this before even setting up interrupts and handling the keyboard.
I'm using this as a resource: http://www.brokenthorn.com/Resources/OSDev17.html

The author sets up a physical memory manager and then moves on to virtual memory...

Is this the way it's traditionally done? (The resource was written in ~2008, it's 2020 now.)

So what is the general workflow? Any good resources?

Octocontrabass · Post by **Octocontrabass** » Sun Feb 09, 2020 5:34 am

Yep, that's the typical design, and it's been that way for a lot longer than 12 years. Everything in your OS uses memory, after all.

The tutorial you're following looks pretty good, but keep in mind most tutorials were written while the author learns OS development, so there can be mistakes or bad advice in them. For example, we typically recommend using GRUB instead of writing your own bootloader since writing a good bootloader takes a lot of work and will distract you from writing the rest of your OS. (But if you think writing a bootloader is fun, then by all means, write a bootloader.)

For other resources, we have a category on the wiki for memory management. Most of those pages have links to even more information.

sunnysideup · Post by **sunnysideup** » Sun Feb 09, 2020 5:46 am

Well, I wrote my own bootloader too... Nothing fancy, switches into pm and loads the kernel at 1M physical memory... I'm just looking to incorporate memory management.

After that's done, I'll work on interrupt handling and keyboard functionality. Does that seem reasonable?

bzt · Post by **bzt** » Sun Feb 09, 2020 8:36 am

Hi,

sunnysideup wrote:Well, I wrote my own bootloader too ... Nothing fancy, switches into pm and loads the kernel at 1M physical memory... I'm just looking to incorporate memory management.

Good for you! I prefer my own loader as well, but it is not for everybody.

sunnysideup wrote:After that's done, I'll work on interrupt handling and keyboard functionality. Does that seem reasonable?

There's no good answer to that question. It depends on your design, and on your design alone. Read What order should I make things? for further details.

Cheers,
bzt

sunnysideup · Post by **sunnysideup** » Tue Feb 11, 2020 10:49 am

I have written a simple page frame allocator / physical memory manager... I would love to get input and suggestions...

Ps: Is it acceptable to post long code snippets on the forum?

Code: Select all


// Standard includes:
#include<stdint.h> 

// Constant definitions:
#define PMMAP 0x1000
#define KERNEL_P 0x100000
#define BLOCK_SIZE 4096
#define SECTOR_SIZE 512
#define SECTORS_PER_BLOCK 8
#define BLOCK_SIZE_B 12
#define SECTOR_SIZE_B 9
#define SECTORS_PER_BLOCK_B 3

// Interface and responsibilities:
void  pmmngr_init(uint32_t mapentrycount);
uint32_t* pmmngr_allocate_block();
uint8_t pmmngr_free_block(uint32_t* address);

// External routines/variables:
extern uint32_t __begin[], __end[];  //Why arrays? This is a pretty cool concept to distinguish pointers

// Routines internal to object:
static void pmmngr_toggle_range(uint32_t start,uint32_t end);
static inline void pmmngr_toggle_block(uint32_t block_number);
static inline uint32_t block_number(uint32_t address);
static uint8_t get_lowest_bit(uint32_t hexinp);
static uint8_t extract_bit(uint32_t hexinp,uint8_t bitnumber); 

// Data structures and user defined data types:

uint32_t _mapentrycount;
uint32_t _physical_memory_table[0x8000];

typedef struct mmap_entry
{
    uint32_t  startLo;
    uint32_t  startHi;
    uint32_t  sizeLo;
    uint32_t  sizeHi;
    uint32_t  type;
    uint32_t  acpi_3_0;
} mmap_entry_t;


// Function implementations:
void  pmmngr_init(uint32_t mapentrycount)   //mapentrycount = number of memory map entries stored at PMMAP - I'm not taking the high parts into consideration because 32 bit
{ 
	_mapentrycount = mapentrycount;
	mmap_entry_t* map_ptr= (mmap_entry_t*)PMMAP;

	for (uint32_t i=0;i<0x8000;i++)
	       	_physical_memory_table[i] = 0xffffffff; //Make everything 1 -- Everything is occupied initially
	for( uint32_t i=0;i<mapentrycount;i++)
	{
		if((map_ptr -> type == 1)&&(map_ptr -> startLo >= KERNEL_P))
			pmmngr_toggle_range(map_ptr->startLo, map_ptr->startLo + map_ptr ->sizeLo);
		map_ptr ++;
	}				
	//Now we mark the space used by the kernel and this memory manager as occupied :)
	uint32_t kernel_start = (uint32_t)__begin;
	uint32_t kernel_end = (uint32_t)__end;
	pmmngr_toggle_range (kernel_start,kernel_end);
}

uint32_t* pmmngr_allocate_block()
{
	uint32_t* address;
	for( uint32_t i=0;i<0x8000;i++)
		if (_physical_memory_table[i] < 0xffffffff)
		{
			uint8_t bit = get_lowest_bit(_physical_memory_table[i]);  //bit lies from 0 to 31
				if(bit == 0xff) return 0;
			address = (uint32_t*)((i << 17) + (bit << 12));
			pmmngr_toggle_block(block_number((uint32_t)address));
			return address;
		}
	return 0;
}
uint8_t pmmngr_free_block(uint32_t* address)
{
	if((uint32_t)address % BLOCK_SIZE != 0) return 0;

	mmap_entry_t* map_ptr= (mmap_entry_t*)PMMAP;
	for( uint32_t i=0;i<_mapentrycount;i++)
	{
		if(((map_ptr->startLo) <= (uint32_t)address)&& (((uint32_t)address-map_ptr->startLo) < map_ptr->sizeLo))
		{
			if (map_ptr -> type !=1) return 0;
			else break;
		}
		map_ptr ++;
	}

	uint32_t block = block_number((uint32_t)address);
	uint32_t word = block >> 5;
	uint8_t offset = block % 32;
	if(!extract_bit(_physical_memory_table[word],offset)) return 0;  //bit is 0: the block is not allocated, nothing to free
	pmmngr_toggle_block(block);
	return 1;
}
// Helper function implementations:
static uint8_t get_lowest_bit(uint32_t hexinp)
{
	for(int i=0;i<32;i++)
	{
		if ((hexinp%2) == 0)
			return i;
		hexinp >>= 1;
	}
	return 0xff;
}
static uint8_t extract_bit(uint32_t hexinp,uint8_t bitnumber)  //bitnumber < 32
{
	for(int i=0;i< bitnumber;i++)
	{
		hexinp >>= 1;
	}
	return hexinp%2;
}
static inline uint32_t block_number(uint32_t address)
{
	return address >> BLOCK_SIZE_B;
}

static inline void pmmngr_toggle_block(uint32_t block_number)   //This must not be exposed to the programmer!!!
{
	uint8_t bit = block_number % 8; 
	uint8_t* byte = (uint8_t*)(block_number >> 3);
	byte += (uint32_t)_physical_memory_table;
	*byte ^= (1<<bit);
}
/** start and end are addresses... This just toggles the occupied/ not occupied status of the range**/
static void pmmngr_toggle_range(uint32_t start,uint32_t end)    //Optimize this out of this later
{
	if (start % BLOCK_SIZE != 0){start -= (start%BLOCK_SIZE);}   //round start down to a block boundary
	if (end % BLOCK_SIZE != 0){end -= (end%BLOCK_SIZE); end += BLOCK_SIZE;}   //round end up to a block boundary
	while((end - start) >= BLOCK_SIZE)
	{
		
		if((end - start) >= 32* BLOCK_SIZE)
		{
			uint32_t* byte = (uint32_t*)(block_number(start) >> 3);
			byte = (uint32_t*)((uint8_t*)byte +(uint32_t)_physical_memory_table);
			*byte ^= 0xffffffff;
			start += (BLOCK_SIZE<<5);
			
		}
		else
		{
			pmmngr_toggle_block(block_number(start));
			start += BLOCK_SIZE;
		}
	}
}

Well, it's definitely not the most efficient code... It seems to work for all the basic cases... I would love to hear suggestions.

nullplan · Post by **nullplan** » Tue Feb 11, 2020 12:13 pm

Don't mind if I do. First of all some simplifications and corrections. This isn't optimization, it just makes the code easier to read. There is your extract_bit() function, which could just be written as:

Code: Select all

static uint8_t extract_bit(uint32_t hexinp,uint8_t bitnumber)  //bitnumber < 32
{
   return (hexinp >> bitnumber) & 1;
}

Then there's pmmngr_toggle_block(). You know, that is probably undefined behavior what you're doing there. But you can make it beautiful again:

Code: Select all

static inline void pmmngr_toggle_block(uint32_t block_number)   //This must not be exposed to the programmer!!!
{
   _physical_memory_table[block_number >> 5] ^= 1ul << (block_number & 31);
}

Remember, you are writing in C, there is no need to fall back into assembler. Not for this stuff, anyway.

In pmmngr_init() you are not taking the high parts into account because you cannot use them at this time. That's fair, but then you should reject those maps rather than accepting their low parts. Also, you are rejecting all maps that start below 1MB entirely. So on a hypothetical computer that only gives you a single map, going from 0 to top-of-memory, you would reject all maps and then not have any memory available. Maybe clip start and end of these ranges.
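One way to read the "clip start and end" suggestion, as a minimal sketch rather than the code from this thread: treat each map entry as a 64-bit range, clip it to the window the allocator can actually manage (assumed here to be 1 MiB up to 4 GiB), and shrink it to whole 4 KiB blocks. The names clip_region, LOW_LIMIT and HIGH_LIMIT are hypothetical.

```c
#include <stdint.h>

#define BLOCK_SIZE 4096ull
#define LOW_LIMIT  0x100000ull      /* assumption: nothing below 1 MiB is handed out */
#define HIGH_LIMIT 0x100000000ull   /* assumption: a 32-bit kernel cannot use RAM at or above 4 GiB */

/* Clip [base, base + length) to [LOW_LIMIT, HIGH_LIMIT) and shrink it to
 * whole blocks. Returns 1 and fills *start / *end (end is exclusive) if at
 * least one full block survives, otherwise returns 0. */
static int clip_region(uint64_t base, uint64_t length,
                       uint64_t *start, uint64_t *end)
{
    uint64_t s = base;
    uint64_t e = base + length;

    if (e < s)                 /* base + length wrapped around */
        e = UINT64_MAX;
    if (s < LOW_LIMIT)
        s = LOW_LIMIT;
    if (e > HIGH_LIMIT)
        e = HIGH_LIMIT;
    if (s >= e)                /* entry lies entirely outside the window */
        return 0;

    /* Round the start up and the end down so only complete blocks remain. */
    s = (s + BLOCK_SIZE - 1) & ~(BLOCK_SIZE - 1);
    e &= ~(BLOCK_SIZE - 1);

    if (s >= e)
        return 0;
    *start = s;
    *end = e;
    return 1;
}
```

With this, a single entry covering 0 to 512 MiB is kept as 1 MiB to 512 MiB instead of being rejected, and an entry that lies wholly above 4 GiB is dropped instead of being accepted by its low 32 bits.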

I also have no idea what business pmmngr_free_block() has iterating over the memory map. You don't expect anyone to free ranges not belonging to them, do you?

That's all I have for now.

sunnysideup · Post by **sunnysideup** » Tue Feb 11, 2020 12:34 pm

Very cool advice...

Well, for this:

nullplan wrote:Then there's pmmngr_toggle_block(). You know, that is probably undefined behavior what you're doing there. But you can make it beautiful again:

My idea is to have a routine that can toggle whether a 4K block is allocated/not. I could have done a set_block and free_block... I was a bit lazy I guess? (Pun not intended)

nullplan wrote:In pmmngr_init() you are not taking the high parts into account because you cannot use them at this time. That's fair, but then you should reject those maps rather than accepting their low parts. Also, you are rejecting all maps that start below 1MB entirely. So on a hypothetical computer that only gives you a single map, going from 0 to top-of-memory, you would reject all maps and then not have any memory available. Maybe clip start and end of these ranges.

Makes sense... thought about it during implementation too... I'll change it

nullplan wrote:I also have no idea what business pmmngr_free_block() has iterating over the memory map. You don't expect anyone to free ranges not belonging to them, do you?

Well, the idea was that I didn't want something like pmmngr_free_block( 0x9fc00) to happen... (Assume 0x9fc00 is a reserved memory block - the EBDA perhaps)... Am I missing something here? Maybe after one enables paging, it would be impossible for such a situation to even occur?
I'm starting to get an idea... I'd love more inputs!

nullplan · Post by **nullplan** » Tue Feb 11, 2020 9:53 pm

sunnysideup wrote:My idea is to have a routine that can toggle whether a 4K block is allocated/not.

I know, and that wasn't the issue. The problem is your address calculation in there: you first cast the block number to uint8_t*, then add the base of the array cast to an integer. This is undefined behavior, since pointer arithmetic is only defined within arrays. Moreover, it is way overcomplicated for what you wanted to do. And further down you even had a better version of array access.

That's why I made the comment about writing in C, not assembly. Your array is, in the first place, a collection of 32-bit values and should be treated as such. Essentially, memory is not just one giant char array.
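To illustrate treating the table as an array of 32-bit words, here is a sketch in which the single toggle routine is split into explicit set, clear and test helpers, so that a repeated call cannot silently flip a block back. The names block_set, block_clear, block_test and the local table are hypothetical and not taken from the posted code.

```c
#include <stdint.h>

#define TABLE_WORDS 0x8000u          /* 0x8000 words * 32 bits = one bit per 4 KiB block of 4 GiB */

static uint32_t table[TABLE_WORDS];  /* bit set = block occupied */

/* Mark a block as occupied. Calling it twice leaves the bit set. */
static inline void block_set(uint32_t block)
{
    table[block >> 5] |= UINT32_C(1) << (block & 31);
}

/* Mark a block as free. Calling it twice leaves the bit clear. */
static inline void block_clear(uint32_t block)
{
    table[block >> 5] &= ~(UINT32_C(1) << (block & 31));
}

/* Return 1 if the block is occupied, 0 if it is free. */
static inline int block_test(uint32_t block)
{
    return (table[block >> 5] >> (block & 31)) & 1;
}
```

The word index is block >> 5 and the bit index is block & 31, which is ordinary array indexing and needs no pointer casts.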

sunnysideup wrote:Well, the idea was that I didn't want something like pmmngr_free_block( 0x9fc00) to happen...

I thought that might be the case, but why would that ever happen? That could only happen if someone calls pmmngr_free_block() with an address not previously allocated. That's like calling free() with an address not previously received from malloc(). Correct programs don't do this. Might as well panic in that case, since it shows a programming mistake.

sunnysideup · Post by **sunnysideup** » Wed Feb 12, 2020 1:28 am

Ah... I'm getting the idea about the C vs assembly argument...

For the other issue:

nullplan wrote:I thought that might be the case, but why would that ever happen? That could only happen if someone calls pmmngr_free_block() with an address not previously allocated. That's like calling free() with an address not previously received from malloc(). Correct programs don't do this. Might as well panic in that case, since it shows a programming mistake.

Well, I wanted it to be a kind of protection from bad programs (that will ultimately be written by me)... What would you suggest? Keep a (another?) data structure of blocks that have been allocated using pmmngr_allocate() and cross check with that?

linguofreak · Post by **linguofreak** » Wed Feb 12, 2020 2:08 pm

nullplan wrote:Also, you are rejecting all maps that start below 1MB entirely. So on a hypothetical computer that only gives you a single map, going from 0 to top-of-memory, you would reject all maps and then not have any memory available. Maybe clip start and end of these ranges.

But no system that's returning an E820 memory map is going to have such an entry, because you're going to have at least one non-RAM region between 640k and 1M.

nullplan · Post by **nullplan** » Wed Feb 12, 2020 2:33 pm

sunnysideup wrote:Well, I wanted it to be a kind of protection from bad programs (that will ultimately be written by me)... What would you suggest? Keep a (another?) data structure of blocks that have been allocated using pmmngr_allocate() and cross check with that?

Well, I would mark that check as optional (something like #ifdef PMM_DEBUG around it), and in case it triggers, I would actually call the panic() function (to be written by you, of course), so that a wrong free() does not merely return an easily ignored error code, but rather, it halts the entire system. That should put enough pressure of suffering on you that you should feel motivated to solve the problem, rather than ignore it. (Yes, sometimes programming is about psychology).
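A sketch of what such an optional check might look like. Everything here is an assumption for illustration: PMM_DEBUG is the macro suggested above, while panic(), pmm_mark_used() and pmm_free() are hypothetical names, and this hosted panic() simply aborts where a kernel would print a message and halt.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PMM_DEBUG 1                  /* remove this line to compile the checks out */
#define BLOCK_SHIFT 12

static uint32_t bitmap[0x8000];      /* bit set = block occupied */

/* Hypothetical panic(): a real kernel would print the message and halt. */
static void panic(const char *msg)
{
    fprintf(stderr, "PANIC: %s\n", msg);
    abort();
}

/* Stand-in for the allocator: marks the block holding address as occupied. */
static void pmm_mark_used(uint32_t address)
{
    uint32_t block = address >> BLOCK_SHIFT;
    bitmap[block >> 5] |= UINT32_C(1) << (block & 31);
}

static void pmm_free(uint32_t address)
{
    uint32_t block = address >> BLOCK_SHIFT;
    uint32_t mask = UINT32_C(1) << (block & 31);

#ifdef PMM_DEBUG
    /* An unaligned address, or a block that is not marked as occupied,
     * means the caller has a bug, so stop instead of returning a code. */
    if ((address & ((UINT32_C(1) << BLOCK_SHIFT) - 1)) != 0)
        panic("pmm_free: unaligned address");
    if ((bitmap[block >> 5] & mask) == 0)
        panic("pmm_free: block was not allocated");
#endif
    bitmap[block >> 5] &= ~mask;
}
```

Since pmm_free() returns nothing, there is no error code for a caller to ignore: a bad free either halts a debug build or goes unchecked in a release build.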

linguofreak wrote:But no system that's returning an E820 memory map is going to have such an entry, because you're going to have at least one non-RAM region between 640k and 1M.

That's why I said hypothetical computer. Although it isn't so hypothetical: At work I deal with a system that has 512MB of RAM as a single block starting from address 0. Not a PC, obviously. Try to minimize assumptions, that keeps your code portable. The PMM code should not really depend on where the information on memory blocks is coming from. There is no reason for such a dependency.

Of course, there's nothing wrong with making a non-portable but widely held assumption, such as that a byte is 8 bits (very few systems do anything else these days), or that negative integers use two's complement representation. Byte order? You might be surprised there.

Octocontrabass · Post by **Octocontrabass** » Thu Feb 13, 2020 6:24 am

linguofreak wrote:But no system that's returning an E820 memory map is going to have such an entry, because you're going to have at least one non-RAM region between 640k and 1M.

Correct. Modern PCs don't return an E820 memory map (since they have only UEFI and no legacy BIOS). Future chipsets will probably allow that area to be ordinary RAM, too, although I'm not sure if any current chipsets allow it.

sunnysideup · Post by **sunnysideup** » Thu Feb 13, 2020 10:30 am

Alright... I'm working on virtual memory and such... I would like some advice on integrating assembly and C... Since I've been working on this for quite a bit of time, I know how to integrate asm and C the purest way... Like this:

Code: Select all

;File: mem.asm

global get_pdbr
get_pdbr:
	mov eax,cr3
	ret

Code: Select all

//File: C.asm

uint32_t* get_pdbr();

uint32_t *page_directory = get_pdbr();

This is a nice way to do it....
But how can I do it in gcc (something like inline assembly)?
Any recommendations/suggestions?

nullplan · Post by **nullplan** » Thu Feb 13, 2020 11:33 am

There are only two ways to do that: Either write your assembly functions as complete functions in assembly and call them from C code as normal, or use inline assembler. Unfortunately, GCC's inline assembler is Greater Magic, and getting it right is really difficult, so I tend to go for option A. It does mean my code will not be as optimized as it could be, but I'm not aiming for HPC yet. And honestly, most of those assembly functions will be a few instructions long, the function call overhead can probably be discounted until time comes to really try to save that last cycle.

To do what you want to do in inline assembly, you could do this:

Code: Select all

static inline uint32_t get_pdbr(void) {
  uint32_t r;
  asm ("movl %%cr3, %0" : "=r"(r));
  return r;
}

And that would work. This instruction has no side effects, so no volatile is needed, nor is a memory clobber. But already the opposite function (moving something into CR3) is way worse. Do you add a memory clobber? You'd have to, or else GCC doesn't know that a write into CR3 changes the results of future reads from CR3. Do you add a volatile? No, you don't have to as that snippet won't have an output part, so it is volatile by default. How do you transport into GCC the knowledge that re-reading CR3 is only necessary when it was changed in the middle, but rewriting it might be necessary for the side effect of getting a TLB flush? No clue. You might use a global dummy variable.
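For comparison, here is a sketch of both directions with the clobber discussed above. It is deliberately conservative and is only one possible answer: the read is marked volatile so GCC will not merge or drop reads, and the write carries a "memory" clobber. KERNEL_BUILD is a hypothetical macro, and the fallback branch is a hosted stand-in, because touching CR3 outside ring 0 faults.

```c
#include <stdint.h>

/* KERNEL_BUILD is a hypothetical macro: define it only when compiling
 * ring-0 x86 code, since access to CR3 faults in user mode. */
#if defined(KERNEL_BUILD) && (defined(__i386__) || defined(__x86_64__))

static inline uintptr_t read_cr3(void)
{
    uintptr_t value;
    /* volatile is a conservative choice here: it stops the compiler from
     * merging or dropping reads that it considers redundant. */
    __asm__ volatile ("mov %%cr3, %0" : "=r"(value));
    return value;
}

static inline void write_cr3(uintptr_t value)
{
    /* No output operands, so the statement is implicitly volatile. The
     * "memory" clobber keeps the compiler from moving memory accesses
     * (for example page table stores) across the reload. */
    __asm__ volatile ("mov %0, %%cr3" : : "r"(value) : "memory");
}

#else

/* Hosted stand-in so callers can be compiled and exercised outside a kernel. */
static uintptr_t fake_cr3;

static inline uintptr_t read_cr3(void) { return fake_cr3; }
static inline void write_cr3(uintptr_t value) { fake_cr3 = value; }

#endif
```

Rewriting CR3 with its current value, as in write_cr3(read_cr3()), is then never elided by the compiler, which matters when the rewrite is done only for the TLB flush.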

All these things I just don't have to think about, since I just write global functions, and GCC has no choice but to call them when I tell it to, since GCC doesn't have enough information to elide "superfluous" calls.

Side benefit: If you don't use inline assembly, you can write your code in whatever assembler you like, rather than in GAS.

sunnysideup · Post by **sunnysideup** » Fri Feb 14, 2020 8:15 am

Cool!
Moving back to memory management: in the course of this, I'm facing a bit of a dilemma...

Description of virtual memory map:

* Identity map of first 4 MB
* Virtual address 0xC0000000 mapped to physical address 0x100000 (Also a 4 MB map)

My page directory is at physical address 0x9c000.
My page table 1 is at physical address 0x9d000.
My page table 2 is at physical address 0x9e000.

(I need only two page tables here... These correspond to the identity map and higher memory map respectively)
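The layout above can be written down as a sketch, assuming classic 32-bit x86 paging with 4 KiB pages and no PAE, where bits 31..22 of a virtual address select the page directory entry and bits 21..12 select the page table entry. The arrays stand in for the frames at 0x9c000, 0x9d000 and 0x9e000; all identifiers and flag macros are hypothetical, and the code only fills in and walks the tables, it does not load CR3.

```c
#include <stdint.h>

#define PAGE_PRESENT 0x1u
#define PAGE_WRITE   0x2u
#define ENTRIES      1024u

#define PT1_PHYS 0x9d000u   /* table for the identity map of the first 4 MiB */
#define PT2_PHYS 0x9e000u   /* table for 0xC0000000 -> 0x100000 */

static uint32_t page_directory[ENTRIES];   /* would live at physical 0x9c000 */
static uint32_t page_table1[ENTRIES];
static uint32_t page_table2[ENTRIES];

static inline uint32_t pd_index(uint32_t vaddr) { return vaddr >> 22; }
static inline uint32_t pt_index(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }

static void build_tables(void)
{
    for (uint32_t i = 0; i < ENTRIES; i++) {
        /* Table 1: virtual page i -> physical page i. */
        page_table1[i] = (i << 12) | PAGE_PRESENT | PAGE_WRITE;
        /* Table 2: virtual 0xC0000000 + i*4K -> physical 0x100000 + i*4K. */
        page_table2[i] = (0x100000u + (i << 12)) | PAGE_PRESENT | PAGE_WRITE;
    }
    /* Directory entries hold the physical addresses of the tables. */
    page_directory[pd_index(0x00000000u)] = PT1_PHYS | PAGE_PRESENT | PAGE_WRITE;
    page_directory[pd_index(0xC0000000u)] = PT2_PHYS | PAGE_PRESENT | PAGE_WRITE;
}

/* Software walk of the tables. Returns 1 and stores the physical address
 * in *paddr if vaddr is mapped, otherwise returns 0. */
static int translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t pde = page_directory[pd_index(vaddr)];
    const uint32_t *table;

    if (!(pde & PAGE_PRESENT))
        return 0;
    /* In this sketch a table's physical address is resolved by hand. */
    if ((pde & ~0xFFFu) == PT1_PHYS)
        table = page_table1;
    else if ((pde & ~0xFFFu) == PT2_PHYS)
        table = page_table2;
    else
        return 0;

    uint32_t pte = table[pt_index(vaddr)];
    if (!(pte & PAGE_PRESENT))
        return 0;
    *paddr = (pte & ~0xFFFu) | (vaddr & 0xFFFu);
    return 1;
}
```

Here 0xC0000000 lands in directory entry 768 and the identity map in entry 0. The directory entries store physical addresses, so the tables at 0x9d000 and 0x9e000 are reachable only through the identity map in table 1.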

Bless the identity mapping.... I can safely access my page directory and page tables as if paging wasn't even enabled. This makes it really easy for me to modify page tables,etc.

Now comes the issue: I may remove this identity mapping... If so, I can already imagine problems creeping up..
E.g. I have physical addresses that I want to access... But I can only access virtual ones. In order to map the virtual address to the required physical address, I need to access the page directory. But I have only the physical address of the page directory... I realize that I'm back where I started.

So, I'm guessing there's a need for some permanent mapping (or some sort of identity mapping for tables and the directory) so that I can forget about all this and get on with my life.

But if I map something permanently, I feel that I'm reducing the flexibility of the program (kernel) in some sort of way.

What's the way one deals with this issue?

What happens when you **lose** the virtual address of the page directory? You can always get the physical address from cr3, but you have no idea where it's mapped, how to access it, and whatnot. In this case, I don't think one can even change the page directory location using cr3 because you'd be loading a physical address into it, but all that you can view are virtual addresses... It seems like a really scary situation here.

Am I missing something?
