
Paging strategy for x86

Posted: Tue Dec 01, 2020 9:24 am
by kotovalexarian
Hello. What is the best paging strategy for x86 (32-bit)? I'm going to implement something like this:
  • Kernel has its own page directory with identity mapping.
  • Each process has its own page directory.
  • When a process is created, its sections are placed in random, non-consecutive places in physical memory, but in virtual memory they are mapped to the addresses specified in the ELF file, so no/minimal relocation is required.
  • Switching between the kernel and a process switches the current page directory.
Is this ok? Can you point me to some documentation or articles explaining how to switch page directories during task switching?

Re: Paging strategy for x86

Posted: Tue Dec 01, 2020 10:56 am
by Korona
Identity mapping the kernel doesn't make sense. At least, you want some kernel entry/exit stubs to be at the same location in all address spaces.

Re: Paging strategy for x86

Posted: Tue Dec 01, 2020 11:54 am
by nullplan
kotovalexarian wrote:What is the best paging strategy for x86 (32-bit)?
Define "good" in this case. One quick strategy to get off the ground is:
  • Kernel is mapped to the higher half (at 2GB or 3GB, depending on how much you want to give userspace), using a linear mapping. (I used to hate this, but in a 32-bit address space there is just not enough room to have both a large linear mapping and a proper kernel mapping.)
  • Every process gets its own paging structure, but the kernel half is the same everywhere (different highest level structure, but the higher half references the same second level structures everywhere).
  • Processes are mapped with demand paging. That means only the virtual addresses are reserved; the page tables initially all say "not present". When page faults inevitably happen, the process is put into non-interruptible sleep while the requested page (maybe even more than that) is loaded from disk and mapped into the address space. Processes have no fixed location in physical address space: they are loaded wherever there is space when the page faults happen, and are evicted whenever space is running low.
  • Process context switch means switching CR3.
  • Kernel context switch means no additional work (beyond whatever interrupt mechanism you use). The kernel is always mapped, and always to the same address
There, that is basically your bog-standard UNIX clone. Considering this model is used in every mainstream OS to my knowledge, I think it is probably pretty successful. Probably a "good" scheme.
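To illustrate the second bullet, here is a minimal sketch of setting up a per-process page directory whose upper part shares the kernel's page tables. It assumes classic 32-bit paging without PAE (1024 four-byte entries per directory, 4 MiB of address space per entry) and a 3GB/1GB split. `KERNEL_BASE`, `page_directory_t` and `pd_init_for_process` are made-up names for this example, not taken from any particular kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PDE_COUNT   1024u               /* entries per directory, 32-bit non-PAE paging */
#define KERNEL_BASE 0xC0000000u         /* assumption: 3GB user / 1GB kernel split */
#define KERNEL_PDE  (KERNEL_BASE >> 22) /* first kernel directory slot (768) */

/* Hypothetical type: one page directory, 4 KiB in total. */
typedef struct {
    uint32_t entry[PDE_COUNT];
} page_directory_t;

/* Each directory entry covers 4 MiB; each page table entry covers 4 KiB. */
static inline uint32_t pd_index(uint32_t vaddr) { return vaddr >> 22; }
static inline uint32_t pt_index(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }

/*
 * Prepare a directory for a new process: the user half starts out empty
 * (every entry "not present"), and the kernel half copies the kernel's
 * entries, so both directories reference the same second-level tables.
 */
void pd_init_for_process(page_directory_t *pd, const page_directory_t *kernel_pd)
{
    assert(pd != NULL && kernel_pd != NULL);
    memset(pd->entry, 0, KERNEL_PDE * sizeof pd->entry[0]);
    memcpy(&pd->entry[KERNEL_PDE], &kernel_pd->entry[KERNEL_PDE],
           (PDE_COUNT - KERNEL_PDE) * sizeof pd->entry[0]);
}
```

Because the copied entries point at the same page tables, a change inside one of those tables shows up in every address space. Only a brand new kernel directory entry would have to be propagated to all processes. A process context switch then comes down to loading the physical address of the next process's directory into CR3.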
kotovalexarian wrote:When a process is created, its sections are placed in random, non-consecutive places in physical memory, but in virtual memory they are mapped to the addresses specified in the ELF file, so no/minimal relocation is required.
If the ELF type is ET_EXEC, no relocation is possible. If the ELF type is ET_DYN, relocation is required, but you can map the thing anywhere with a random offset. I will probably invite the wrath of bzt if I mention the dynamic linker should be a userspace executable named in the PT_INTERP ELF program header. He thinks putting such a complicated thing into the kernel is a good idea. Reasonable people might disagree.
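The ET_EXEC/ET_DYN distinction could look roughly like this in a loader. This is only a sketch: the `e_type` values 2 and 3 come from the ELF specification, but `elf_pick_load_bias`, `DYN_BASE` and `DYN_RANDOM_MASK` are invented for the example, and the caller is assumed to supply the random number.

```c
#include <assert.h>
#include <stdint.h>

/* e_type values from the ELF specification. */
#define ELF_ET_EXEC 2u
#define ELF_ET_DYN  3u

#define ELF_PAGE_SIZE   0x1000u
#define DYN_BASE        0x40000000u /* assumption: lowest base for ET_DYN images */
#define DYN_RANDOM_MASK 0xFFFFu     /* assumption: 65536 page slots (256 MiB window) */

/*
 * Pick the load bias, i.e. the offset added to every p_vaddr of the image.
 * ET_EXEC must be mapped exactly where its program headers say (bias 0);
 * ET_DYN can go to any page-aligned base, so a random one is chosen here.
 * Returns 0 on success, -1 if the file type cannot be loaded as a program.
 */
int elf_pick_load_bias(uint16_t e_type, uint32_t random, uint32_t *bias_out)
{
    assert(bias_out != 0);
    switch (e_type) {
    case ELF_ET_EXEC:
        *bias_out = 0;
        return 0;
    case ELF_ET_DYN:
        *bias_out = DYN_BASE + (random & DYN_RANDOM_MASK) * ELF_PAGE_SIZE;
        return 0;
    default:
        return -1;
    }
}
```

With a non-zero bias, the relocations of the ET_DYN image still have to be applied afterwards, either by the kernel or by the userspace dynamic linker mentioned above.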

Re: Paging strategy for x86

Posted: Tue Dec 01, 2020 12:01 pm
by thewrongchristian
kotovalexarian wrote:Hello. What is the best paging strategy for x86 (32-bit)? I'm going to implement something like this:
  • Kernel has its own page directory with identity mapping.
  • Each process has its own page directory.
  • When a process is created, its sections are placed in random, non-consecutive places in physical memory, but in virtual memory they are mapped to the addresses specified in the ELF file, so no/minimal relocation is required.
  • Switching between the kernel and a process switches the current page directory.
Is this ok? Can you refer to some documentation or articles explaining how to switch page directories during task switching?
Do you mean the kernel will have its own, completely isolated, 4GB address space?

This is a reasonable strategy, especially from a security point of view, and provides the user process an entire 4GB address space to work with as well, but there are pitfalls:
  • Without PCID (tagged TLB entries; on x86 this can only be enabled in long mode, so it is not available to a 32-bit kernel), switching address spaces will flush all non-global TLB entries.
  • Access to user pointers is not simple anymore. You'll have to map user to physical addresses in your kernel to access user data. May or may not be a problem, as you can add temporary maps to such data, but it adds complexity.
  • Sysenter/syscall don't provide a facility to switch address spaces, so you'll need at least a shared system call entry stub in each address space that can do that switch for you. The same goes for interrupt-based system calls: using a task gate would require a hardware task switch to switch address spaces, and hardware task switching on x86 is slooow, so you'd still be better off with a shared stub area to do the switching for you.
  • Interrupts would be required to run in the kernel address space, so all your interrupts would have to be task gates (or stubs in a shared stub area that switch the address space). You don't really want to be trashing unrelated TLB entries as a result of hardware interrupts.
  • If a split user/kernel address space is a space burden for your userland, you might be better off using long mode instead and getting a 64-bit address space.
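To give an idea of the extra work behind the "access to user pointers" point, here is a rough sketch of translating a user virtual address to a physical address by walking the process's page tables in software. It assumes 32-bit non-PAE paging with 4 KiB pages only (no 4 MiB pages), and it assumes the kernel has some window through which it can read physical memory, represented here by the hypothetical `phys_window` pointer. The function names are made up as well.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PG_PRESENT 0x001u
#define PG_USER    0x004u
#define PG_FRAME   0xFFFFF000u

/* Read one 32-bit entry at physical address `phys` through the window. */
static uint32_t read_phys32(const uint8_t *phys_window, uint32_t phys)
{
    uint32_t value;
    memcpy(&value, phys_window + phys, sizeof value);
    return value;
}

/*
 * Walk the two-level page tables rooted at `cr3` (the process's page
 * directory, as a physical address) and translate `vaddr`.
 * Returns 0 and stores the physical address on success, or -1 if the
 * page is not present or not accessible from user mode.
 */
int user_virt_to_phys(const uint8_t *phys_window, uint32_t cr3,
                      uint32_t vaddr, uint32_t *phys_out)
{
    assert(phys_window != NULL && phys_out != NULL);

    /* Level 1: page directory entry, selected by the top 10 bits. */
    uint32_t pde = read_phys32(phys_window,
                               (cr3 & PG_FRAME) + (vaddr >> 22) * 4u);
    if ((pde & (PG_PRESENT | PG_USER)) != (PG_PRESENT | PG_USER))
        return -1;

    /* Level 2: page table entry, selected by the middle 10 bits. */
    uint32_t pte = read_phys32(phys_window,
                               (pde & PG_FRAME) + ((vaddr >> 12) & 0x3FFu) * 4u);
    if ((pte & (PG_PRESENT | PG_USER)) != (PG_PRESENT | PG_USER))
        return -1;

    /* Frame address from the entry, byte offset from the low 12 bits. */
    *phys_out = (pte & PG_FRAME) | (vaddr & 0xFFFu);
    return 0;
}
```

A buffer that crosses a page boundary needs one such translation per page, and the resulting frames still have to be mapped (for example temporarily) before the kernel can touch the data. With a shared higher-half kernel, none of this is needed because user memory is already mapped while the kernel runs.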

Re: Paging strategy for x86

Posted: Tue Dec 01, 2020 1:06 pm
by kotovalexarian
I've noticed your comments on kernel mapping. So, should I map the whole kernel similarly in every page directory?

UPD: Yes, https://wiki.osdev.org/Higher_Half_Kernel says exactly this.

Re: Paging strategy for x86

Posted: Tue Dec 01, 2020 4:55 pm
by foliagecanine
In theory, when any program is loaded, the kernel page directory is cloned to create a new address space.
The kernel then loads this NEW page directory into CR3. Then it loads the program, maps whatever pages it needs, and executes it.

Then, when a process decides to fork, that process's page directory is cloned (the addresses in the tables must be changed, of course). The new process is assigned the cloned page directory and another PID.

The kernel is mapped identically in EVERY page directory, so that it can be accessed at all times. The kernel's page table entries are marked supervisor-only (ring 0) to prevent user processes from reading or changing kernel memory.

The physical memory layout of the information does not matter much, as long as you can allocate/map new pages. The only time you really need to worry about correct physical mapping is when you have something like MMIO.
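As a small sketch of what those two points mean at the page table entry level: the flag bits below are the architectural ones for 32-bit x86 page table entries, while the helper names (`make_pte`, `kernel_pte`, `user_pte`, `mmio_pte`) are invented for this example. Disabling caching for MMIO via PCD/PWT is one common choice, not the only one.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  (1u << 0)
#define PTE_WRITABLE (1u << 1)
#define PTE_USER     (1u << 2) /* clear = supervisor only */
#define PTE_PWT      (1u << 3) /* write-through */
#define PTE_PCD      (1u << 4) /* cache disable */
#define PTE_FRAME    0xFFFFF000u

/* Combine a page-aligned physical address with flag bits. */
uint32_t make_pte(uint32_t phys, uint32_t flags)
{
    assert((phys & ~PTE_FRAME) == 0);  /* must be 4 KiB aligned */
    assert((flags & PTE_FRAME) == 0);  /* flags live in the low 12 bits */
    return phys | flags | PTE_PRESENT;
}

/* Kernel page: any free frame will do, but user mode must not touch it. */
uint32_t kernel_pte(uint32_t phys)
{
    return make_pte(phys, PTE_WRITABLE);
}

/* User page: any free frame will do, accessible from ring 3. */
uint32_t user_pte(uint32_t phys, int writable)
{
    return make_pte(phys, PTE_USER | (writable ? PTE_WRITABLE : 0u));
}

/* MMIO page: the physical address is dictated by the device, and the
 * mapping is made uncached so reads and writes reach the hardware. */
uint32_t mmio_pte(uint32_t phys)
{
    return make_pte(phys, PTE_WRITABLE | PTE_PCD | PTE_PWT);
}
```

For ordinary kernel and user pages the `phys` argument can be whatever frame the allocator hands out. Only for `mmio_pte` does it have to be a specific address.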

At least, this is what I understand. If someone sees something wrong with what I've said, please correct it.