Virtual Memory Goals

Post by ~ »

To make things more bearable, it would probably be better to make allocations semi-static, meaning that different subsystems would use different areas of the logical/physical address space, much like partitions on a hard disk with special files and areas. For example, memory-mapped devices would get their own semi-static region and allocation format, and the same would apply to regular applications, shared stuff, etc...

In this way we would get to define a lot of details and structure of pages and the whole memory management system beforehand. We would then make it flexible by allowing the smallest and the biggest possible allocations in a single operation.
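As a rough illustration of the "partition" idea, here is a minimal sketch of a fixed region table with a lookup. The region names and address ranges are made up for the example (they are not from any real OS), and `region_of` is a hypothetical helper:

```python
# Hypothetical semi-static layout of a 32-bit address space.
# Each entry is (name, start, end) with end exclusive; the numbers are
# illustrative assumptions only.
REGIONS = [
    ("applications",          0x00000000, 0x80000000),
    ("shared",                0x80000000, 0xC0000000),
    ("memory-mapped devices", 0xC0000000, 0xE0000000),
    ("kernel",                0xE0000000, 0x100000000),
]


def region_of(addr):
    """Return the name of the fixed region that contains addr."""
    for name, start, end in REGIONS:
        if start <= addr < end:
            return name
    raise ValueError(f"address {addr:#x} is outside the address space")
```

Because the boundaries are decided beforehand, each subsystem's allocator only has to manage its own range, which is the flexibility-within-structure trade-off described above.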


I also found this in a book; seems very interesting:
If you dynamically allocate your device structure at runtime, use the VMM service _HeapAllocate, which is very similar to malloc. However, if your device structure includes a large buffer (4Kb or larger), you'll want to include only a pointer to the buffer in the device structure itself, and then allocate the large buffer separately using _PageAllocate. The rule is to use _HeapAllocate for small allocations and _PageAllocate for large allocations, where small and large are relative to 4Kb.
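The rule in that passage can be sketched as a small decision function. This is only an illustration of the 4 KB threshold; `choose_allocator` is a hypothetical name and the code does not call the real VMM services:

```python
PAGE_SIZE = 4096  # the 4 KB threshold from the quoted rule


def choose_allocator(size_bytes):
    """Pick an allocation strategy by size, per the quoted rule of thumb.

    Returns ("heap", bytes) for small requests and ("page", page_count)
    for requests of one page or more. Hypothetical helper for illustration.
    """
    if size_bytes <= 0:
        raise ValueError("size must be positive")
    if size_bytes < PAGE_SIZE:
        return ("heap", size_bytes)
    # Round up to whole pages for the page-granular allocator.
    pages = -(-size_bytes // PAGE_SIZE)
    return ("page", pages)
```

So a 200-byte device structure would go to the heap, while a 10,000-byte buffer would be rounded up to three whole pages and allocated separately.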

Post by Brendan »

Hi,
LtG wrote: Out of curiosity, where'd you get the number 6?
alexfru wrote: Something like a movsd instruction crossing 3 page boundaries (1 code and 2 data).
Yes.

There are multiple cases where the CPU accesses "EIP plus 2 other addresses" (MOVSD, CMPSD, PUSH, POP); and with misaligned accesses (e.g. accessing 2 bytes at 0x0FFFFFFF) each of the 3 simultaneous accesses can be split across 2 pages (and page tables, and ..); leading to a worst case of "6 pages plus 6 page tables plus page directory = 52 KiB" (for "plain 32-bit paging"), and a worst case of "6 pages plus 6 page tables plus 6 page directories plus 6 PDPTs plus PML4 = 100 KiB" (for long mode).
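The arithmetic behind those two figures can be written out as a short sketch. The function names and parameters are made up for the example; it assumes 4 KiB pages, 3 simultaneous accesses each split across 2 pages, and that every touched page sits under its own paging structure at each level below the root:

```python
PAGE_KIB = 4  # size of a page and of each paging structure, in KiB


def worst_case_pages(levels_below_root, accesses=3, split=2):
    """Pages that must be present at once in the pathological case.

    levels_below_root: paging-structure levels under the single root
    (1 for plain 32-bit paging: page tables; 3 for long mode: page
    tables, page directories, PDPTs).
    """
    touched = accesses * split              # 6 code/data pages
    structures = touched * levels_below_root
    return touched + structures + 1         # +1 for the root (PD or PML4)


def worst_case_kib(levels_below_root):
    return worst_case_pages(levels_below_root) * PAGE_KIB
```

With `levels_below_root=1` this gives 13 pages (52 KiB), and with `levels_below_root=3` it gives 25 pages (100 KiB), matching the numbers in the post.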

Note that there are extremely rare cases involving even more simultaneous accesses (e.g. the "Debug Store Mechanism" can add an additional access to anything, hardware virtualisation with nested paging structures, etc). For this reason; if you're buying a new computer I'd recommend getting at least 256 KiB of RAM to make sure you have enough. 8)


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by alexfru »

Brendan wrote: There are multiple cases where the CPU accesses "EIP plus 2 other addresses" (MOVSD, CMPSD, PUSH, POP); and with misaligned accesses (e.g. accessing 2 bytes at 0x0FFFFFFF) each of the 3 simultaneous accesses can be split across 2 pages (and page tables, and ..); leading to a worst case of "6 pages plus 6 page tables plus page directory = 52 KiB" (for "plain 32-bit paging"), and a worst case of "6 pages plus 6 page tables plus 6 page directories plus 6 PDPTs plus PML4 = 100 KiB" (for long mode).
Page tables and directories are only involved when the addresses aren't in the TLB yet. I guess the CPU would have to bring the addresses into the TLB *before* executing the instruction (at least the addresses for the code pages containing the instruction), and it shouldn't evict those TLB entries (the ones that are valid for the instruction) from the TLB if, say, just one page out of the 6 code/data pages isn't accessible, which means you probably don't need to have 13 physical pages mapped at the same time on an 80386. IOW, here's what can happen:
  • page fault for the physical address of the first byte of the instruction (fills one TLB entry if handled)
  • page fault for the instruction fetch
  • page fault for the physical address of the remaining bytes of the instruction (fills one more TLB entry if handled)
  • page fault for the instruction fetch
  • page faults for the physical address of the data accessed by the instruction (several more TLB entries filled if handled)
  • page faults for the data accesses by the instruction
Once filled, the TLB entries should not get invalidated and then filled again when we return from the page fault handler to the instruction (unless the TLB is too small for all the activities performed by the page fault handler). Right? Wrong?

Post by LtG »

alexfru wrote: Page tables and directories are only involved when the addresses aren't in the TLB yet. I guess the CPU would have to bring the addresses into the TLB *before* executing the instruction (at least the addresses for the code pages containing the instruction), and it shouldn't evict those TLB entries (the ones that are valid for the instruction) from the TLB if, say, just one page out of the 6 code/data pages isn't accessible, which means you probably don't need to have 13 physical pages mapped at the same time on an 80386. IOW, here's what can happen:
  • page fault for the physical address of the first byte of the instruction (fills one TLB entry if handled)
  • page fault for the instruction fetch
  • page fault for the physical address of the remaining bytes of the instruction (fills one more TLB entry if handled)
  • page fault for the instruction fetch
  • page faults for the physical address of the data accessed by the instruction (several more TLB entries filled if handled)
  • page faults for the data accesses by the instruction
Once filled, the TLB entries should not get invalidated and then filled again when we return from the page fault handler to the instruction (unless the TLB is too small for all the activities performed by the page fault handler). Right? Wrong?
I think your point is right, but I don't think it helps much, if at all. You still need those TLB entries to point to physical memory; if you're going to repurpose the backing memory (RAM) to load in whatever is needed by the code (which caused the #PF to begin with), then you're going to need to fill that RAM with something else.

Essentially you'd need to make the cache think that two different TLB (virtual address) entries are pointing to same backing memory (same 4KiB RAM page), yet the cache would have different contents for each virtual address.

Or did you mean when the page dir has a not-present PDE you'd create one on the fly (incl. the page table), containing only that one PTE entry, allowing the #PF to be resolved and a single TLB entry to be "fed" to the TLB. And then always re-purposing the 4KiB RAM page for the same use? I think it might be possible to get that to work...

Or did you mean something else?
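To make the "you still need the backing RAM" point concrete, here is a toy model, not a description of real hardware. It assumes the faulting instruction restarts from the beginning after every #PF, that the pager evicts in FIFO order, and it ignores the TLB and the paging structures entirely. `faults_until_complete` is a hypothetical name:

```python
from collections import deque


def faults_until_complete(pages_needed, frames, max_faults=1000):
    """Count page faults until one instruction can finish, or None.

    pages_needed: pages the instruction touches in a single execution.
    frames: physical frames available for those pages.
    Returns None if the instruction still cannot complete after
    max_faults faults (it keeps evicting a page it needs).
    """
    if frames < 1:
        return None
    resident = deque()  # FIFO order: oldest page on the left
    faults = 0
    while faults <= max_faults:
        # Restart the instruction: find the first page that is missing.
        missing = next((p for p in pages_needed if p not in resident), None)
        if missing is None:
            return faults  # every page was resident, instruction retires
        if len(resident) == frames:
            resident.popleft()  # evict the oldest page to make room
        resident.append(missing)
        faults += 1
    return None
```

In this model an instruction touching 6 pages finishes after 6 faults when 6 frames are available, but never finishes with 5, because each fault evicts a page that the restarted instruction needs again.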

Brendan wrote: There are multiple cases where the CPU accesses "EIP plus 2 other addresses" (MOVSD, CMPSD, PUSH, POP); and with misaligned accesses (e.g. accessing 2 bytes at 0x0FFFFFFF) each of the 3 simultaneous accesses can be split across 2 pages (and page tables, and ..); leading to a worst case of "6 pages plus 6 page tables plus page directory = 52 KiB" (for "plain 32-bit paging"), and a worst case of "6 pages plus 6 page tables plus 6 page directories plus 6 PDPTs plus PML4 = 100 KiB" (for long mode).
Might get slightly off topic, but won't you need at least the following for worst/pathological case:
  • IDT
  • #PF handler
  • Stack (#PF will push to it)
  • Current code page
  • Source data page
  • Destination data page
  • Page table for each page (assuming absolutely worst case where everything is spread so far apart that they are in separate page tables)
  • Page directory
For most of those you need to account for two pages for unaligned/page-boundary access. I'm not sure if there are some CPU internals you could take advantage of to reduce this list slightly, such as relying on the cache still having old entries, but even if that works it would be a horrible hack.

The OS dev could try to minimize the #PF handler and thus merge it with the IDT in the same page.

So I count:
  • IDT + #PF handler; 1 page (combined, which means at most ~4KiB for #PF handler)
  • Stack (#PF will push to it); 2 pages (page boundary)
  • Current code page; 2 pages (page boundary)
  • Source data page; 2 pages (page boundary)
  • Destination data page; 2 pages (page boundary)
  • Page table for each page; 9 pages (one for each of the above pages)
  • Page directory; 1 page
Giving a total of 19 pages = 76 KiB of RAM (still under Brendan's 256 KiB).
I'm interested if anyone can come up with anything that must be added to that list or can be removed, keeping in mind the absolute worst/pathological case.
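The tally above can be checked with a few lines. This is just the list from the post turned into a sum, assuming 4 KiB pages and one page table per mapped page; `tally` and the dictionary keys are names made up for the example:

```python
PAGE_KIB = 4

# Pages that must be mapped, taken from the list in the post.
MAPPED = {
    "IDT + #PF handler": 1,
    "stack": 2,
    "current code": 2,
    "source data": 2,
    "destination data": 2,
}


def tally(mapped):
    """Return (total_pages, total_kib) for the pathological case."""
    data_pages = sum(mapped.values())  # 9 mapped pages
    page_tables = data_pages           # worst case: one table per page
    page_directory = 1
    total = data_pages + page_tables + page_directory
    return total, total * PAGE_KIB
```

`tally(MAPPED)` comes to 19 pages, and 19 pages at 4 KiB each is 76 KiB.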

edit. Forgot to list GDT, though that could possibly be on the same page as IDT and #PF handler..
Last edited by LtG on Fri Aug 19, 2016 8:59 am, edited 1 time in total.

Post by Brendan »

Hi,
LtG wrote: Might get slightly off topic, but won't you need at least the following for worst/pathological case:
None of those things are part of a normal process (they're part of the kernel). If you're not looking only at "minimum for normal process" then you're going to need a driver that's able to access swap space (and everything it depends on while accessing swap space).


Cheers,

Brendan