
Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 8:57 am
by Korona
The advantage of a linear map is not that it makes MMIO easier but (i) that it makes page table management vastly simpler, especially in the presence of SMP (where you need to perform shootdown after unmapping), and (ii) that it allows much faster access to arbitrary pages than temporary mappings (which are a mess on x86, since unlike on aarch64, you cannot have per-CPU higher halves without copying large amounts of memory around).
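To illustrate point (ii): with a linear map, going from a physical address to a usable kernel pointer is a single addition. A minimal sketch, where the base constant and function name are purely illustrative:

Code: Select all

// Illustrative base of a higher-half linear (direct) map.
const LINEAR_MAP_BASE: usize = 0xFFFF_8000_0000_0000;

// One addition: no page-table walk, no temporary mapping to set up,
// and nothing to shoot down afterwards.
fn phys_to_virt(paddr: usize) -> *mut u8 {
    (LINEAR_MAP_BASE + paddr) as *mut u8
}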

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 9:09 am
by nullplan
Ethin wrote:

Code: Select all

if (ec & 1) > 0 {
    "protection violation"
} else if !(ec & 1) > 0 {
    "page not present"
} else if (ec & 1 << 2) > 0 {
    "UM priv violation"
} else if !(ec & 1 << 2) > 0 {
    "KM priv violation"
} else if ec & 1 << 3 > 0 {
    "PTT read failure"
} else if ec & 1 << 4 > 0 {
    "Instruction fetch"
} else {
    "unknown cause"
},
Erm... unless Rust works completely differently from any other language I have ever looked at, each bit can only be zero or one. So shouldn't every possible error code end up in case 1 or 2?

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 10:02 am
by iansjack
You realize that there are bit shifts involved?

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 10:05 am
by nullplan
iansjack wrote:You realize that there are bit shifts involved?
Not in the second case. Also, cases 3 and 4 both test the same bit again.

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 11:14 am
by iansjack
On second thoughts, you're right. The "!" tests are not only unnecessary but short-circuit the later tests.

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 12:36 pm
by Ethin
That code is a bit old -- I haven't touched it in a while. I'll most likely rewrite it to get rid of those useless tests (the compiler will probably optimize them away as dead branches, but still).
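For what it's worth, here is one shape such a rewrite could take -- only a sketch, with a hypothetical function name and message strings, assuming the standard x86 #PF error-code bits (0 = present, 1 = write, 2 = user, 3 = reserved-bit violation, 4 = instruction fetch):

Code: Select all

fn decode_pf_error(ec: u64) -> String {
    // Each bit is an independent flag, so decode them separately
    // instead of chaining mutually exclusive else-ifs.
    let cause = if ec & 1 != 0 { "protection violation" } else { "page not present" };
    let mode = if ec & (1 << 2) != 0 { "user mode" } else { "kernel mode" };
    let access = if ec & (1 << 4) != 0 {
        "instruction fetch"
    } else if ec & (1 << 1) != 0 {
        "write"
    } else {
        "read"
    };
    format!("{cause} on {access} in {mode}")
}
Since the bits are independent, decoding them separately also reports combinations (e.g. a user-mode write to a not-present page) instead of stopping at the first match.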

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 3:41 pm
by rdos
Korona wrote:The advantage of a linear map is not that it makes MMIO easier but (i) that it makes page table management vastly simpler, especially in the presence of SMP (where you need to perform shootdown after unmapping), and (ii) that it allows much faster access to arbitrary pages than temporary mappings (which are a mess on x86, since unlike on aarch64, you cannot have per-CPU higher halves without copying large amounts of memory around).
I mostly see it as a huge security risk to have all memory accessible. It's bad enough with flat memory models & packed data, but adding a complete mapping of physical memory on top of that creates even more problems, particularly if this mapping is heavily used. Basically, it disables the protection that paging is supposed to provide: restricting access to memory you should not mess with.

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 3:47 pm
by rdos
thewrongchristian wrote:
rdos wrote:Linear mapping of kernel memory doesn't seem like a good idea to me. It wouldn't be of any use in my non-flat kernel, since the physical-to-linear conversion doesn't give the logical address in the driver anyway. I implemented a specific driver that makes it easy to allocate same-sized objects and to do fast conversions between linear (or rather logical) and physical addresses (and the reverse). Only some types of MMIO devices benefit from this anyway, and they typically allocate many same-sized objects, so there is no need to map the entire kernel using linear mapping.
That sounds exactly like what I'm planning. For DMA memory, I'm going to provide a wrapper exposing an allocator from which I can get aligned sub-page chunks, probably using a buddy system to keep the chunks sized and aligned at powers of two, while also providing easy and quick translation between virtual and physical addresses.

An example use would be in my USB drivers: the HCI driver would allocate a small number of pages DMA-accessible to the HCI hardware, wrap them with my wrapper, then use the wrapper to allocate the queue heads and transfer descriptors and manipulate them using virtual pointers. So the driver can work with virtual address pointers for the most part, and translate to/from physical addresses when interfacing with the HCI hardware or with physical address fields in the QH/TD.

It might also allow me to remove the virtual pointers I currently cache in the QH and TD, since I'll have an easy path back from a QH/TD physical address to a virtual address I can reference in C. I'll then only have to allocate the memory the HCI hardware requires for the QH/TD, rather than over-allocating to cache HCI driver data.
Exactly. USB 1 & 2 controllers in particular are well-suited for this. It's also usable for AHCI. Maybe the best thing is that you can remove all allocations in a single step when the USB device is detached, just by deleting the memory object that handles the same-size allocations & physical <-> linear translations.
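A minimal sketch of the kind of wrapper discussed above (all names hypothetical): because the pool is physically contiguous, both translations are plain offset arithmetic, and the bounds check doubles as validation of physical links read back from the hardware:

Code: Select all

/// Hypothetical wrapper around a physically contiguous, DMA-visible pool.
struct DmaPool {
    virt_base: usize, // kernel virtual address of the first pool page
    phys_base: usize, // physical address of the same pages
    size: usize,      // pool size in bytes
}

impl DmaPool {
    // Virtual -> physical: offset arithmetic within the pool.
    fn virt_to_phys(&self, vaddr: usize) -> Option<usize> {
        if vaddr >= self.virt_base && vaddr < self.virt_base + self.size {
            Some(self.phys_base + (vaddr - self.virt_base))
        } else {
            None
        }
    }

    // Physical -> virtual: the bounds check also validates physical
    // links (e.g. QH/TD pointers) read back from the hardware.
    fn phys_to_virt(&self, paddr: usize) -> Option<usize> {
        if paddr >= self.phys_base && paddr < self.phys_base + self.size {
            Some(self.virt_base + (paddr - self.phys_base))
        } else {
            None
        }
    }
}
Dropping the pool object then frees every chunk at once, matching the detach-in-one-step behaviour described above.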

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 4:02 pm
by rdos
vvaltchev wrote:
rdos wrote:It wouldn't be of any use in my non-flat kernel, since the physical-to-linear conversion doesn't give the logical address in the driver anyway.
Sorry, I didn't quite get that. Can you better explain what you meant?
I have a 32-bit segmented kernel. Most of the code will not rely on linear addresses, so a physical-to-linear translation is of limited use. My dedicated API instead provides an offset into an object rather than a linear address, which is safer: the pointer can only access data allocated for the object, and not random data, in case of a corrupt physical address.
vvaltchev wrote: With MMIO, typically you know the physical address you want to read/write and need a vaddr to use because paging is ON.
If you just want to map a device's physical address so it becomes accessible, you allocate a linear address and map it to the physical device address. I would also map it to a selector, and then I can safely access the device through the selector. Another case is that an MMIO device will need buffers from user mode; those cannot use linear mapping, so you will have to read them out through the page tables. Then there is the case where an MMIO device modifies a schedule: if you want to follow those chains, you must map the physical addresses into linear address space, use a linear mapping, or some other method. Since I allocate the schedule chain with a specific API that keeps the linear & physical base addresses in a header, I can validate a schedule link by comparing it to the physical base (and size), and then get the offset by subtracting the physical base.
vvaltchev wrote:
So, with linear mapping you just read/write at paddr + 3 GB. But I understand that on 32-bit platforms that doesn't always work, because HW is often mapped at the end of physical memory. To deal with that, during driver initialization I map the physical pages I need into the special 128 MB area [3 GB + 896 MB, 4 GB).
I don't find it acceptable to only support 4GB of physical memory, and actually, you cannot map more than a few hundred MB before you run into problems with linear address space. In 32-bit mode on modern computers, it is linear address space that is limited, not physical memory. I can use all physical memory by enabling PAE paging.

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 4:12 pm
by rdos
nexos wrote:Basically, to do a lazy MM, you would have a secondary structure (either a linked list or an AVL tree, your pick).
A system copy of the page table will do. You first allocate the page directories in the system copy, and then copy them to the current address space when needed. Using this method you cannot easily free page directories again, but that is typically not needed, and you avoid mapping unused page directory entries.

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 5:07 pm
by vvaltchev
rdos wrote:I have a 32-bit segmented kernel. Most of the code will not rely on linear addresses, so a physical-to-linear translation is of limited use. My dedicated API instead provides an offset into an object rather than a linear address, which is safer: the pointer can only access data allocated for the object, and not random data, in case of a corrupt physical address.
rdos wrote:If you just want to map a device's physical address so it becomes accessible, you allocate a linear address and map it to the physical device address. I would also map it to a selector, and then I can safely access the device through the selector.
Ah OK, I get it. You have a very different design and rely on segmentation. I don't use segmentation at all.
rdos wrote:Another case is that an MMIO device will need buffers from user mode; those cannot use linear mapping, so you will have to read them out through the page tables.
That's because you cannot have everything mapped in virtual memory, right? That's the typical problem when trying to use 4 GB of RAM (or more) on modern machines with a 32-bit OS.
rdos wrote:I don't find it acceptable to only support 4GB of physical memory, and actually, you cannot map more than a few hundred MB before you run into problems with linear address space. In 32-bit mode on modern computers, it is linear address space that is limited, not physical memory. I can use all physical memory by enabling PAE paging.
I support only 896 MB of usable memory, actually :-)
Of course, Tilck's i686 support is for educational purposes only. Also, almost all machines with > 1 GB of RAM are actually x86_64 machines, so I see supporting x86_64 as a simple solution to this problem: it allows keeping the linear mapping instead of supporting something like Linux's HighMem. Your case seems even more complex than that, because you not only have to handle much more memory than you can map, but you use segmentation as well.

Have you considered switching to x86_64 ?

Re: which memory allocation system to use

Posted: Thu Jul 22, 2021 6:25 pm
by nexos
rdos wrote:
nexos wrote:Basically, to do a lazy MM, you would have a secondary structure (either a linked list or an AVL tree, your pick).
A system copy of the page table will do. You first allocate the page directories in the system copy, and then copy them to the current address space when needed. Using this method you cannot easily free page directories again, but that is typically not needed, and you avoid mapping unused page directory entries.
Well, technically speaking, yes, but an AVL tree would be faster.
The only issue I see with using an AVL tree is the need to search it on every page fault. Hence, I am planning a solution that shouldn't require any extra memory and allows page faults to be handled in O(1).

Re: which memory allocation system to use

Posted: Fri Jul 23, 2021 2:26 am
by rdos
nexos wrote:
rdos wrote:
nexos wrote:Basically, to do a lazy MM, you would have a secondary structure (either a linked list or an AVL tree, your pick).
A system copy of the page table will do. You first allocate the page directories in the system copy, and then copy them to the current address space when needed. Using this method you cannot easily free page directories again, but that is typically not needed, and you avoid mapping unused page directory entries.
Well, technically speaking, yes, but an AVL tree would be faster.
The only issue I see with using an AVL tree is the need to search it on every page fault. Hence, I am planning a solution that shouldn't require any extra memory and allows page faults to be handled in O(1).
You cannot get more speed than direct referencing. You alias both the current page directory and the system page directory, which gives you direct access to both without any "searching". Ideally, a missing page directory entry is handled by two page faults: first you fault when trying to read the page-table mapping through the current page-table mapping, and the second fault is the one diagnosed as being caused by a missing page directory entry. The handler then inspects the system mapping (allocating and zeroing a 4k page if it is not present) and copies the entry to the current mapping.

The overhead is only 4k if your kernel has up to 1G of linear address space.
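A sketch of that fault path under those assumptions (all names hypothetical; 32-bit non-PAE layout with 1024 PDEs covering 4 MiB each):

Code: Select all

const PDE_PRESENT: u32 = 1 << 0;
const PDE_WRITABLE: u32 = 1 << 1;

static mut SYSTEM_PD: [u32; 1024] = [0; 1024];  // master (system) directory
static mut CURRENT_PD: [u32; 1024] = [0; 1024]; // alias of the current one

fn alloc_zeroed_page_phys() -> u32 {
    unimplemented!("physical allocator stub")
}

// Called from the #PF handler when the fault is in kernel space.
// Returns true if the faulting access should simply be retried.
unsafe fn sync_kernel_pde(fault_addr: usize) -> bool {
    let i = fault_addr >> 22; // index of the PDE covering the fault
    if SYSTEM_PD[i] & PDE_PRESENT == 0 {
        // First touch of this region anywhere: create the page table
        // in the system copy so all address spaces can share it.
        SYSTEM_PD[i] = alloc_zeroed_page_phys() | PDE_PRESENT | PDE_WRITABLE;
    }
    if CURRENT_PD[i] != SYSTEM_PD[i] {
        // Missing only in this address space: copy the entry and retry.
        CURRENT_PD[i] = SYSTEM_PD[i];
        return true;
    }
    false // not a missing kernel PDE; treat as a real fault
}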

Re: which memory allocation system to use

Posted: Fri Jul 23, 2021 5:15 am
by nexos
rdos wrote:You cannot get more speed than direct referencing. You alias both the current page directory and the system page directory, which gives you direct access to both without any "searching". Ideally, a missing page directory entry is handled by two page faults: first you fault when trying to read the page-table mapping through the current page-table mapping, and the second fault is the one diagnosed as being caused by a missing page directory entry. The handler then inspects the system mapping (allocating and zeroing a 4k page if it is not present) and copies the entry to the current mapping.

The overhead is only 4k if your kernel has up to 1G of linear address space.
I plan on always mapping page tables to PDEs rather than doing it on demand; that way, most faults can be handled in one #PF. I also plan on using an AVL tree of Address Range Descriptors. Each PTE that is currently allocated but not filled will be a pointer to this structure. Of course, the problem here is that bit 0 cannot be set, but because these things are always going to be in the kernel address area anyway, I can just set bit 0 when processing a #PF. Also, I plan on implementing some form of anticipatory paging, plus I will zero freed pages in the background. This way page-fault overhead will be minimized, while still reaping the benefits of demand paging.
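The pointer-in-a-not-present-PTE trick could look something like this (a sketch; the descriptor type and helper names are hypothetical). Bit 0 stays clear as long as the descriptor is at least 2-byte aligned:

Code: Select all

// Descriptor for an allocated-but-unbacked address range; alignment
// guarantees the low bit of its address (the Present bit) is zero.
#[repr(align(4))]
struct AddressRangeDescriptor {
    // ... backing info: zero-fill, file mapping, swap location, ...
}

fn encode_pte(desc: *const AddressRangeDescriptor) -> u32 {
    let bits = desc as u32;
    debug_assert_eq!(bits & 1, 0, "Present bit must stay clear");
    bits // stored as a not-present PTE
}

fn decode_pte(pte: u32) -> *const AddressRangeDescriptor {
    pte as *const AddressRangeDescriptor
}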

Re: which memory allocation system to use

Posted: Fri Jul 23, 2021 7:05 am
by rdos
nexos wrote:
rdos wrote:You cannot get more speed than direct referencing. You alias both the current page directory and the system page directory, which gives you direct access to both without any "searching". Ideally, a missing page directory entry is handled by two page faults: first you fault when trying to read the page-table mapping through the current page-table mapping, and the second fault is the one diagnosed as being caused by a missing page directory entry. The handler then inspects the system mapping (allocating and zeroing a 4k page if it is not present) and copies the entry to the current mapping.

The overhead is only 4k if your kernel has up to 1G of linear address space.
I plan on always mapping page tables to PDEs rather than doing it on demand; that way, most faults can be handled in one #PF. I also plan on using an AVL tree of Address Range Descriptors. Each PTE that is currently allocated but not filled will be a pointer to this structure. Of course, the problem here is that bit 0 cannot be set, but because these things are always going to be in the kernel address area anyway, I can just set bit 0 when processing a #PF. Also, I plan on implementing some form of anticipatory paging, plus I will zero freed pages in the background. This way page-fault overhead will be minimized, while still reaping the benefits of demand paging.
In 32-bit mode, there are only 1024 PDEs, and 256 of those cover the 1G kernel space. Allocating these on demand has no speed implications. However, if you pre-allocate them, you will waste 256 x 4k = 1M of physical memory (2M for PAE, where a page table only maps 2M). Compare that with 4k for the system mapping. It actually doesn't matter whether you set bit 0 or not: if you are not allocating them dynamically, you will need to reserve physical memory for them that cannot be used for other things.

I set unallocated PTEs to zero, and when they are allocated I set bit 1 to indicate this. When the PF handler sees a PTE with bit 0 clear and bit 1 set, it will allocate physical memory for it, set up the PTE, and re-execute. So, if I allocate 10 pages in the kernel, they will be set up with PTEs containing "2" and will consume no physical memory. When a page fault occurs, physical memory is mapped automatically. The page fault handler will raise a page fault exception if bit 1 is not set. I also have a more specialized function where I can link in a callback. This is not really demand paging, but rather lazy allocation of physical memory in kernel space.

For page sharing, I use bit 11 of the PTE. If it is set, the physical address should not be freed when the linear area is freed. This provides a low-resource VMM for kernel space.
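Putting the two posts above together, the not-present PTE handling might look like this (a sketch with hypothetical names; bit 1 marks lazy allocation, bit 11 marks shared physical pages):

Code: Select all

const PTE_PRESENT: u32 = 1 << 0;
const PTE_LAZY: u32 = 1 << 1;   // meaningful only while not present
const PTE_SHARED: u32 = 1 << 11; // don't free the phys page with the range

fn alloc_page_phys() -> u32 {
    unimplemented!("physical allocator stub")
}

// #PF path for a kernel PTE: a value of "2" (lazy, no frame yet) gets
// physical memory on first touch; a value of "0" is a genuine fault.
fn handle_lazy_pte(pte: &mut u32) -> Result<(), &'static str> {
    if *pte & PTE_PRESENT != 0 {
        return Err("present page: protection fault, not lazy allocation");
    }
    if *pte & PTE_LAZY == 0 {
        return Err("page was never allocated: genuine page fault");
    }
    *pte = alloc_page_phys() | PTE_PRESENT; // back it and re-execute
    Ok(())
}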