Ethin wrote: Doing all the generation in the macro call itself is probably slower than generating the string by hand, but it does the job. If I add an allocate_page_range() call (my VMM's function for allocating virtual pages) or an allocate_phys_range() call (my VMM's function for allocating physical pages of RAM for MMIO access), the fault handler keeps getting called repeatedly with lots of addresses and never seems to stop faulting.
Random guess: maybe both of your allocators need to touch unmapped memory in order to allocate the memory for the page you need in the page fault handler; that's why you get an infinite fault recursion. Debugging this with QEMU + GDB shouldn't be hard. Not sure about the Rust case, though.
For debugging purposes, you could raise an in_allocator flag at the beginning of your VMM allocator (or the physical one, depending on how you want to allocate memory in the PF handler) and ASSERT(!in_allocator) in the PF handler.
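A minimal sketch of that idea in C (vmm_alloc_page, do_the_real_allocation and panic are placeholder names, not real APIs; use whatever your kernel already has):

Code:

#include <stdbool.h>

/* Stand-in for your kernel's assert macro. */
#define ASSERT(x) do { if (!(x)) panic("ASSERT failed: " #x); } while (0)

void panic(const char *msg);             /* your kernel's panic */
void *do_the_real_allocation(void);      /* placeholder */

/* Set while the allocator runs; checked by the PF handler to catch
 * the case where the allocator itself triggers a page fault. */
static volatile bool in_allocator;

void *vmm_alloc_page(void)
{
   ASSERT(!in_allocator);        /* catches recursive entry too */
   in_allocator = true;

   void *vaddr = do_the_real_allocation();

   in_allocator = false;
   return vaddr;
}

void page_fault_handler(void *fault_addr)
{
   /* If we faulted while the allocator was running, the allocator
    * touched unmapped memory: without this check, we'd recurse
    * forever instead of failing loudly right here. */
   ASSERT(!in_allocator);

   /* ... handle the fault, possibly calling vmm_alloc_page() ... */
}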
Another theory: your VMM allocator might need to actually map the page it got from the physical allocator into the page directory. When doing that, sometimes you need to allocate a page for a new page table, and if that allocation goes through the same allocator, that's a problem (more recursion).
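Roughly, the hazard looks like this on 32-bit x86-style paging (hypothetical names, just to illustrate):

Code:

#include <stdint.h>

#define PAGE_PRESENT 0x1u
#define PAGE_RW      0x2u

uintptr_t alloc_phys_page(void);  /* returns the paddr of a free 4 KB frame */

/* Map one 4 KB page: vaddr -> paddr. */
void map_page(uint32_t *pdir, void *vaddr, uintptr_t paddr)
{
   uint32_t pd_index = ((uintptr_t)vaddr >> 22) & 0x3ffu;

   if (!(pdir[pd_index] & PAGE_PRESENT)) {

      /* No page table covers this 4 MB range yet, so one must be
       * allocated. If this went through the VMM allocator, which
       * itself calls map_page(), we'd recurse. Going straight to
       * the physical allocator avoids that. */
      uintptr_t pt_paddr = alloc_phys_page();
      pdir[pd_index] = (uint32_t)pt_paddr | PAGE_PRESENT | PAGE_RW;
   }

   /* ... write the PTE for vaddr into the page table (note: the PDE
    * stores the table's paddr; you still need a vaddr to write to
    * the table, which is trivial only with a linear mapping) ... */
   (void)paddr;
}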
If I remember correctly, I had several kinds of small problems when I had both a physical pageframe allocator and a VMM allocator in Tilck. At some point, I made it all work properly, but the performance wasn't as good as I wanted because the VMM allocator had to call the physical allocator from time to time. Even if that happened only for some bigger chunks (e.g. 1 MB), it was still too much overhead for my taste (note: my performance criteria are very tight). Also, each page directory needed a PADDR pointer, because I wasn't using a linear mapping: each page in the kernel's VMM could be anywhere in physical memory. That approach is flexible, but it has an overhead in many places, both in terms of performance and in terms of code complexity.
In the end, I switched to a simplified version of the Linux model: pure linear mapping. At boot, I map all the usable regions of physical memory in the range [0, 896 MB) to +3 GB in the kernel's VMM (0xC0000000). When possible, I use 4 MB pages to save memory on page tables. With that, the conversion between a paddr and a vaddr is linear, with a simple constant. The allocator reserves virtual memory in "heaps" contained in usable regions that have a 1:1 mapping to physical memory. The extra 128 MB in the VMM are used for special purposes, but there's nothing like Linux's "HighMem": in other words, on 32-bit machines my kernel won't support more than 896 MB of RAM. But that's not a problem for me: either the OS will run on small 32-bit embedded systems where 896 MB is more than we could ever need, or it will run (in the future) on 64-bit machines, where the linear mapping has no practical limitations.
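With that model, the paddr/vaddr conversion is just an add/subtract with the base constant. A minimal sketch (KERNEL_BASE_VA is just my name here for the 0xC0000000 constant from above):

Code:

#include <stdint.h>

#define KERNEL_BASE_VA 0xC0000000u   /* +3 GB */

/* Valid only for paddrs below 896 MB: those are mapped 1:1 at boot. */
static inline void *pa_to_va(uintptr_t paddr)
{
   return (void *)(paddr + KERNEL_BASE_VA);
}

static inline uintptr_t va_to_pa(void *vaddr)
{
   return (uintptr_t)vaddr - KERNEL_BASE_VA;
}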
Windows' model: to my knowledge, Windows uses a hybrid model: linear mapping between regions in the VMM and physical memory, but different regions in the VMM can be mapped anywhere in physical memory relative to each other. So, given a vaddr, we need to know its region before being able to determine where it is mapped; given a paddr, we first need to determine whether we have a region containing that paddr range, and that region's initial address, in order to get a usable vaddr. That's the most complex model IMHO, but it certainly works as well.
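In code, the vaddr -> paddr direction of such a model would look something like this (everything below is hypothetical, just to illustrate the per-region lookup; I don't know Windows' actual internals):

Code:

#include <stddef.h>
#include <stdint.h>

struct va_region {
   uintptr_t va_start;   /* region start in the VMM */
   uintptr_t pa_start;   /* where the region begins in physical memory */
   size_t    size;
};

static struct va_region regions[16];
static size_t region_count;

/* Per-region linear translation: find the region containing vaddr,
 * then apply that region's own constant offset. Returns (uintptr_t)-1
 * when vaddr falls outside every region. */
uintptr_t region_va_to_pa(uintptr_t vaddr)
{
   for (size_t i = 0; i < region_count; i++) {

      const struct va_region *r = &regions[i];

      if (vaddr >= r->va_start && vaddr < r->va_start + r->size)
         return r->pa_start + (vaddr - r->va_start);
   }

   return (uintptr_t)-1;   /* not mapped in any region */
}

The paddr -> vaddr direction is the mirror image: scan the regions for one whose physical range contains the paddr, then add the offset to its va_start.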