Thpertic wrote: Thank you.
However this implementation of the stack is going to slow down the popping of one page, right? There is no problem with this because of the other advantages but, as I understood it, I should pop the pair (address + nPages), decrement it, re-push it, and return the address?
Well, that's one way to do it. But it's much faster to manipulate the entry on the top in place, and only pop it when its nPages reaches zero. Then you'll have no slowdown. That's what my allocator does. Note that although in theory it could scan for a suitable entry for nPages, in practice (because it's always called to allocate one page only) it always picks the top entry right away.
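To make that concrete, here's a minimal sketch of the idea (names and the fixed-size array are mine, not taken from any particular allocator): each stack entry describes a run of free pages, allocation takes the first page of the top run in place, and the entry is only popped when its page count drops to zero.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define MAX_ENTRIES 1024

/* One RLE entry: a run of contiguous free pages. */
typedef struct { uint64_t addr; uint64_t npages; } run_t;

static run_t rle_stack[MAX_ENTRIES];
static size_t top = 0;              /* number of entries on the stack */

/* Push a run of free pages (used at init and on free). */
void push_run(uint64_t addr, uint64_t npages) {
    rle_stack[top].addr = addr;
    rle_stack[top].npages = npages;
    top++;
}

/* Allocate one page in O(1): shrink the top run in place,
 * and only pop the entry when the run is exhausted. */
uint64_t alloc_page(void) {
    if (top == 0) return 0;         /* out of memory */
    run_t *t = &rle_stack[top - 1];
    uint64_t page = t->addr;        /* hand out the run's first page */
    t->addr += PAGE_SIZE;
    if (--t->npages == 0)
        top--;                      /* run exhausted: pop it */
    return page;
}
```

Note that no entry is re-pushed: the decrement happens directly on the top entry, which is why there's no slowdown compared to a plain stack.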
Thpertic wrote: Another thing that I'm not sure of is whether I should consider the maximum stack length as reserved, or whether I should pop the addresses off it when they are used.
Not sure if I understand your question. With a simple stack, you calculate the maximum size by dividing the size of RAM by the page size. With an RLE stack, you estimate the number of fragments, and you can dynamically resize your stack if needed. In the worst-case scenario, when memory is totally fragmented (that is, every second page is free and every other is allocated), the RLE stack requires exactly the same amount of memory as the simple stack allocator does with all memory free. But I would like to point out that reaching the worst case is so extreme that you'll never face it in a million years. It's safe to say you'll need no more than 1/4th of the simple stack's starting memory requirement.
Octacone wrote: All of this can be solved by simply switching to a bitmap based allocator.
I would advise against it. Here are the reasons:
Asymptotic behaviour
Bitmap allocation is extremely expensive. You have to scan for a set bit, which (even with hardware-backed ffs()) is a very slow O(n) algorithm, where n is the size of RAM. An application can't continue until the memory is allocated, so this is a bottleneck.
Freeing is cheap, true, O(1).
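To illustrate the asymmetry, here's a minimal bitmap allocator sketch (a hypothetical layout of my own, one bit per page with 1 meaning free): allocation must walk the map word by word in the worst case, while freeing just sets one bit back.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical bitmap: one bit per 4K page, 1 = free.
 * 1024 64-bit words cover 65536 pages (256M of RAM). */
#define BITMAP_WORDS 1024
static uint64_t bitmap[BITMAP_WORDS];

/* O(n) allocation: linear scan for a free bit, even though ffs()
 * on each word is hardware-backed. Returns page index, or -1. */
long bitmap_alloc(void) {
    for (size_t i = 0; i < BITMAP_WORDS; i++) {
        if (bitmap[i]) {
            int bit = __builtin_ffsll((long long)bitmap[i]) - 1;
            bitmap[i] &= ~(1ULL << bit);    /* mark as allocated */
            return (long)(i * 64 + bit);
        }
    }
    return -1;
}

/* O(1) freeing: set the page's bit back. */
void bitmap_free(long page) {
    bitmap[page / 64] |= 1ULL << (page % 64);
}
```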
With stack, allocation is very fast, O(1). You just pop an entry.
Freeing is cheap too, O(1), a simple push.
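For comparison, the whole simple stack allocator fits in a few lines (again a sketch with made-up names, assuming 4K pages and 32-bit entries, which is where the 4M figure below comes from):

```c
#include <stdint.h>
#include <stddef.h>

/* Simple page stack for 4G of RAM: 1M 32-bit entries = 4M of
 * bookkeeping when all memory is free. */
#define MAX_PAGES (1024 * 1024)
static uint32_t free_pages[MAX_PAGES];
static size_t sp = 0;               /* number of free pages on the stack */

/* O(1) free: a simple push. */
void page_free(uint32_t addr) { free_pages[sp++] = addr; }

/* O(1) alloc: pop the top entry (0 means out of memory here). */
uint32_t page_alloc(void) { return sp ? free_pages[--sp] : 0; }
```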
With the RLE stack, allocation is still fast, O(1): you just reach for the top entry.
Freeing is a bit more expensive, O(f), where f is the number of fragments in free memory (f is smaller than n by orders of magnitude; in the worst case f = n / page size / 2). This is because you have to scan for an entry to merge the freed page into. If you keep the list sorted using bsearch() and insert in place, then this can be O(log2(f)). Also, you can do this "in the background": the application doesn't have to wait for freeing to finish as it does with allocation.
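A naive O(f) version of that freeing path might look like this sketch (my own simplified take: an unsorted array, one merge direction checked at a time, and no joining of two runs that become adjacent; the sorted bsearch() variant would cut the lookup to O(log2 f)):

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define MAX_RUNS 1024

typedef struct { uint64_t addr; uint64_t npages; } run_t;

static run_t runs[MAX_RUNS];
static size_t nruns = 0;

/* Free one page: scan the f runs for a neighbour to merge into.
 * O(f); only a new entry is created when no neighbour exists. */
void free_page(uint64_t addr) {
    for (size_t i = 0; i < nruns; i++) {
        /* page sits right after this run: extend it upward */
        if (runs[i].addr + runs[i].npages * PAGE_SIZE == addr) {
            runs[i].npages++;
            return;
        }
        /* page sits right before this run: extend it downward */
        if (addr + PAGE_SIZE == runs[i].addr) {
            runs[i].addr = addr;
            runs[i].npages++;
            return;
        }
    }
    runs[nruns].addr = addr;        /* no neighbour: new fragment */
    runs[nruns].npages = 1;
    nruns++;
}
```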
Now for the memory requirements
Bitmap needs constant 128K to describe 4G RAM.
Stack allocator needs 4M on start when all memory is free, which is the maximum.
RLE stack needs only 8 bytes on start when memory is not fragmented, and 4M in worst case scenario (when fragmentation is at maximum).
If we assume you allocate the same amount of memory for the RLE stack as for the bitmap (128K, at 8 bytes per entry), then you can have 16384 fragments, which is not that much, but should be enough in most cases.
Things get a lot worse with 64 bit. Assume 16T of RAM, which is quite a lot. Stack-based allocation does not need 64-bit entries, because the least significant 12 bits of a page address are always zero, and if you don't store them, a 32-bit entry can describe 32+12 bits of RAM, which is 16T.
Bitmap allocation there needs 32M.
Stack allocator needs 16G on start when all memory is free, which is the maximum.
RLE stack needs only 8 bytes on start when memory is not fragmented, and 16G in worst case scenario.
As you can see, with 32M you can have 4194304 (4 million) fragments, which is quite a lot and hard to imagine exceeding. This means the average memory consumption is best for the RLE stack, and even the worst-case scenario requires no more than 1/1024th of the RAM.
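The 32-bit-entries-for-16T trick above is just a pair of shifts; a sketch (helper names are mine):

```c
#include <stdint.h>

/* Pages are 4K-aligned, so the low 12 bits of a page address are
 * always zero. Dropping them lets a 32-bit entry encode a 44-bit
 * (32+12) physical address, i.e. up to 16T of RAM. */
static inline uint32_t pack_page(uint64_t addr)  { return (uint32_t)(addr >> 12); }
static inline uint64_t unpack_page(uint32_t ent) { return (uint64_t)ent << 12; }
```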
Hope this answers your question.
Korona wrote: With all stack-based allocators, it's worth keeping in mind that they cannot (efficiently) support multi-page allocations that are necessary for some I/O devices.
True, but you need that so rarely that the loss in efficiency does not really count. You allocate special I/O memory only once at boot, and maybe when a user attaches a new device if you support hot-plug. That's nothing compared to how many times applications call sbrk().
Cheers,
bzt