Clarification on some aspects of paging.

Warrior · Post by **Warrior** » Thu Nov 10, 2005 8:58 pm

Hello,
I have just finished reading the entire section on paging in the Intel manuals
(Many times T_T) and would like some verification that this is correct.

For my paging mechanism on initiation I am going to do something like:

Map all the user memory as RW, U, P
Map some global memory for my libraries etc (Does the "Global" flag do this or should I just make it U, RW, P)
Map the kernel memory as R,S,P
Load the PDBR into CR3
Enable paging

Now a few questions arise before implementation: Bits 12-31 in the PDEs and PTEs show
the address field. Now, is this virtual or physical? I don't get where the phys2virt
conversion goes on or how to map physical addresses to virtual ones.

Also which "permissions" take priority over what? eg(If a PDE is set as RW, U can a PTE in
the PDE be set as R, S?)

When in ring0 does the paging mechanism even do permission checks? Or can the kernel
access anything regardless of what flags are set?

Now can someone please explain to me when a VMM might become useful?
The only thing I can consider it for is finding a physical address for a page and
adding it to the PDE then Invalidating the PTE with whatever flags. Maybe as a sub
function of a "MapPage(phys,virt,flags)" kind of thing.

That's all I can think of for now. But hey, at least I'm clearer on paging than I was
a few months ago.

Thanks,
Nelson

Candy · Post by **Candy** » Fri Nov 11, 2005 1:49 am

Nelson wrote: Map all the user memory as RW, U, P

You can map the code as R only, which helps prevent corrupting code etc. Also, you can mark the data NX (which is an AMD64 / EM64T feature).

Map some global memory for my libraries etc (Does the "Global" flag do this or should I just make it U, RW, P)

Map it in every address space at the same location and map it like normal memory, but with the global flag. The global flag tells the CPU to keep these TLB entries when CR3 is reloaded (it only takes effect once CR4.PGE is enabled), so they survive address-space switches.

Map the kernel memory as R,S,P

That's impractical. Map the code rsp, map the data rwsp.

Now a few questions arise before implementation: Bits 12-31 in the PDEs and PTEs show
the address field. Now, is this virtual or physical?

This is physical. You'd need a page table to translate them...

I don't get where the phys2virt
conversion goes on or how to map physical addresses to virtual ones.

Say, you have virtual address 0x12345678 in non-PAE mode. Then, 0x123 >> 2 = 0x048 is the index into the PDT (CR3, physical address). It looks up which page is referenced there (physical address) and looks in it for item 0x345 & 0x3FF = 0x345. That item references a physical page. In that physical page it references the address 0x678, which is the byte you wanted.
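For illustration, the split described above can be written as three small helpers. This is only a sketch for non-PAE 32-bit paging; the function names are made up here and are not from any manual.

```c
#include <stdint.h>

/* Non-PAE 32-bit paging: 10-bit directory index, 10-bit table index,
   12-bit byte offset. Helper names are illustrative only. */
static inline uint32_t pd_index(uint32_t virt)    { return virt >> 22; }
static inline uint32_t pt_index(uint32_t virt)    { return (virt >> 12) & 0x3FFu; }
static inline uint32_t page_offset(uint32_t virt) { return virt & 0xFFFu; }
```

For 0x12345678 these give the directory index 0x048, the table index 0x345 and the offset 0x678 mentioned above.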

You have to self-map the page tables and directories, that is, make them point to themselves. The easiest way is at the top of memory: place the PDT at 0xFFFFF000. It then doubles as a PT for itself, as well as a page for itself. Just by creating that one entry, you can use it to add to the page tables.

Also which "permissions" take priority over what? eg(If a PDE is set as RW, U can a PTE in
the PDE be set as R, S?)

Restrictions take priority over permissions. U / R + S / RW = S / R.
It doesn't matter at which level these permissions are set, just that they are somewhere in the tree.
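One way to picture "restrictions take priority" is as a bitwise AND of the two levels. The sketch below assumes the usual 32-bit entry layout (P in bit 0, R/W in bit 1, U/S in bit 2); the macro and function names are invented for this example.

```c
#include <stdint.h>

#define PG_PRESENT 0x1u  /* bit 0: page or page table is present */
#define PG_RW      0x2u  /* bit 1: 1 = writable, 0 = read-only */
#define PG_USER    0x4u  /* bit 2: 1 = user, 0 = supervisor */

/* A right is effective only if both the PDE and the PTE grant it,
   so the more restrictive setting wins at either level. */
static inline uint32_t effective_flags(uint32_t pde, uint32_t pte)
{
    return pde & pte & (PG_RW | PG_USER);
}
```

With a PDE of U / R and a PTE of S / RW the result has neither bit set, i.e. S / R.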

When in ring0 does the paging mechanism even do permission checks? Or can the kernel
access anything regardless of what flags are set?

The kernel can not access readonly paged memory. With a given bit it can do this for userlevel memory. There are more, iirc, but too many to list. See the one document you've been reading already.

Now can someone please explain to me when a VMM might become useful?
The only thing I can consider it for is finding a physical address for a page and
adding it to the PDE then Invalidating the PTE with whatever flags. Maybe as a sub
function of a "MapPage(phys,virt,flags)" kind of thing.

Keeping track of free pages, mapping and unmapping pages, keeping track of pages shared between processes (COW, copy on write), allowing pages to be swapped out...

Good luck on getting that in one function (and if you succeed, it won't be good).
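As a rough idea of the first item on that list, keeping track of free pages, here is a minimal sketch of a fixed-size stack of free physical frames. The names, the capacity and the use of 0 as the "nothing free" value are all assumptions for the example; a real VMM would size this from the memory map and add locking.

```c
#include <stdint.h>

#define MAX_FREE_FRAMES 1024u   /* arbitrary capacity for the sketch */

static uint32_t free_frames[MAX_FREE_FRAMES];
static uint32_t free_count;

/* Return a physical frame to the pool. Returns 0 on success, -1 if full. */
static int add_free_page(uint32_t phys)
{
    if (free_count == MAX_FREE_FRAMES)
        return -1;
    free_frames[free_count++] = phys & 0xFFFFF000u;  /* keep it page aligned */
    return 0;
}

/* Take the most recently freed frame, or 0 when nothing is free. */
static uint32_t get_free_page(void)
{
    if (free_count == 0)
        return 0;
    return free_frames[--free_count];
}
```

Sharing, copy-on-write and swapping each need their own bookkeeping on top of something like this, which is why it does not fit in one function.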

AR · Post by AR » Fri Nov 11, 2005 6:00 am

Candy wrote:
I don't get where the phys2virt
conversion goes on or how to map physical addresses to virtual ones.
Say, you have virtual address 0x12345678 in non-PAE mode. Then, 0x123 >> 2 = 0x048 is the index into the PDT (CR3, physical address). It looks up which page is referenced there (physical address) and looks in it for item 0x345 & 0x3FF = 0x345. That item references a physical page. In that physical page it references the address 0x678, which is the byte you wanted.

And in simpler terms: VirtualAddress = (PDE * 1024 + PTE) * 4096
So PageDirectory[1]->PageTable[1] = (1 * 1024 + 1) * 4096 = 4,198,400 bytes
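The same formula as a tiny C helper, in case it helps to see it next to the index split. Here the two arguments are the indices into the page directory and page table, not the entries themselves; the function name is made up.

```c
#include <stdint.h>

/* Base virtual address covered by page-table entry pt_idx
   of the page table referenced by directory entry pd_idx. */
static inline uint32_t virt_from_indices(uint32_t pd_idx, uint32_t pt_idx)
{
    return (pd_idx * 1024u + pt_idx) * 4096u;
}
```

Indices 0x48 and 0x345 give 0x12345000, the page containing Candy's example address 0x12345678.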

When in ring0 does the paging mechanism even do permission checks? Or can the kernel
access anything regardless of what flags are set?
The kernel can not access readonly paged memory. With a given bit it can do this for userlevel memory. There are more, iirc, but too many to list. See the one document you've been reading already.

IIRC, you've got that backwards, Ring 0 code ignores the user/supervisor and readonly/writable unless you enable the 'Kernel level paging protection' feature which you need to find with CPUID and enable by setting a bit in one of the control registers, present is checked but I don't know about NX.

Warrior · Post by **Warrior** » Fri Nov 11, 2005 6:31 am

Candy wrote: Say, you have virtual address 0x12345678 in non-PAE mode.
Then, 0x123 >> 2 = 0x048 is the index into the PDT (CR3, physical address).
It looks up which page is referenced there (physical address) and looks in it for item 0x345 & 0x3FF = 0x345.
That item references a physical page.
In that physical page it references the address 0x678, which is the byte you wanted.

You have to self-map the page tables and directories, that is, make them point to themselves.
The easiest way is at the top of memory: place the PDT at 0xFFFFF000.
It then doubles as a PT for itself, as well as a page for itself.
Just by creating that one entry, you can use it to add to the page tables.

So then would it be more practical, implementation-wise, to map entire ranges of memory instead of each
page individually? For example, have something like "MapPageRange(Start, End, Phys, Virt, Flags)" that splits each
address into the fields needed, adding entries not necessarily in the order created but according to which address it is.
Hmm, that's very easy to understand!
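A range mapper along those lines can be a thin loop over a per-page mapper. The sketch below is an assumption about how that might look: it takes a page count rather than Start/End, and the per-page function is passed in as a pointer so the loop can be tried out without real page tables. `record_mapping` is only a stand-in that remembers its last call.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical per-page mapper; returns 0 on success. */
typedef int (*map_page_fn)(uint32_t phys, uint32_t virt, uint32_t flags);

/* Map `count` consecutive pages starting at phys -> virt.
   Returns the number of pages mapped before the first failure. */
static uint32_t map_page_range(map_page_fn map_page, uint32_t phys,
                               uint32_t virt, uint32_t count, uint32_t flags)
{
    uint32_t i;
    for (i = 0; i < count; i++) {
        if (map_page(phys + i * PAGE_SIZE, virt + i * PAGE_SIZE, flags) != 0)
            break;
    }
    return i;
}

/* Stand-in mapper for experimenting: records the last request only. */
static uint32_t last_phys, last_virt;
static int record_mapping(uint32_t phys, uint32_t virt, uint32_t flags)
{
    (void)flags;
    last_phys = phys;
    last_virt = virt;
    return 0;
}
```

Each page still gets its own PTE; the range call just saves the caller from repeating the address arithmetic.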

Thanks.

Candy wrote: Keeping track of free pages, mapping and unmapping pages, keeping track of pages shared between processes
(COW, copy on write), allowing pages to be swapped out...

Good luck on getting that in one function (and if you succeed, it won't be good ).

Ah, I didn't plan to put that all in one function. Maybe I was thinking out loud, but anyway, as said above,
I may have a more practical implementation.

AR wrote: IIRC, you've got that backwards, Ring 0 code ignores the user/supervisor
and readonly/writable unless you enable the 'Kernel level paging protection'
feature which you need to find with CPUID and enable by setting a bit in one of the control registers,
present is checked but I don't know about NX.

Would this be practical to enable if detected as a supported feature? I think I can see this protecting
the kernel from the user and the user from the kernel as well which might not be such a bad thing

Thanks for your help Candy and AR!

AR · Post by AR » Fri Nov 11, 2005 6:43 am

IIRC, you've got that backwards, Ring 0 code ignores the user/supervisor and readonly/writable unless you enable the 'Kernel level paging protection' feature which you need to find with CPUID and enable by setting a bit in one of the control registers, present is checked but I don't know about NX.
Would this be practical to enable if detected as a supported feature? I think I can see this protecting the kernel from the user and the user from the kernel as well which might not be such a bad thing

Ring 3 code does use the protection flags (ie. user code cannot ever access a page that is set to "supervisor", nor can it ever write to a page marked "readonly" but in Ring 0 [the kernel and the kernel only] do those have no bearing at all). The protection feature I mentioned only applies to force the CPU to use Readonly/Writable in kernel mode (Again, I don't know about NX though).

JAAman · Post by **JAAman** » Fri Nov 11, 2005 9:42 am

NX/XD/PE/Feature_With_A_Lot_Of_Names, is checked when the CPU loads a page into the instruction TLB: if NX is enabled, and set, it will refuse to load the page into the TLB (all other checks (except P) are made after it enters the TLB -- P & NX/XD/PE are the only permissions that cause the page to fail to load into the TLB), causing a #PF (AMD#2, pg173 (rev. 3.07))

Candy · Post by **Candy** » Fri Nov 11, 2005 11:27 am

JAAman wrote: NX/XD/PE/Feature_With_A_Lot_Of_Names, is checked when the CPU loads a page into the instruction TLB: if NX is enabled, and set, it will refuse to load the page into the TLB (all other checks (except P) are made after it enters the TLB -- P & NX/XD/PE are the only permissions that cause the page to fail to load into the TLB), causing a #PF (AMD#2, pg173 (rev. 3.07))

that would be more like :

if (type == code && NX) || (!p) then
  cause_pf();
else
  ....
end if;

yes, the code is kind of pascal-ish. I've been learning vhdl so all my pseudocode has turned pascal-ish as well.

The NX bit doesn't prevent the tlb from loading data pages. If AMD says otherwise, it's a pretty pointless feature if also applied to the data tlb.

On the R/W + U/S + rw_in_s bit, if all is usually writable for the kernel and given that bit the userlevel bit is marked readonly, what's the entire point of marking something R/S ? It'd be a readonly thing that nothing would ever enforce. My guess is thus that that is an error, so that R/S for kernel-level would always be enforced.

Didn't test it though.

JAAman · Post by **JAAman** » Fri Nov 11, 2005 2:06 pm

no, the CPU has 2 different types of TLBs:
-instruction TLBs- and
-data TLBs-

edit: Intel vol. 3 pg 10-1 & 2

if you want to read/write data to a page that you're currently executing, the page table entry must be reread into the data TLB to use it
and a data page table entry must be reloaded into an instruction TLB to be executed

so its more like

if (((destinationTLB == codeTLB) && NX) || (!P)) then
        #PF
else
        destinationTLB=LoadPageTable(address)
endif

all other permissions are calculated on the stored values already in the appropriate TLB

however I have heard that some CPUs (not sure which one) DO store (!P) entries in TLBs (and thus require INVLPG on a (!P) page!)

AR · Post by AR » Fri Nov 11, 2005 8:27 pm

Candy wrote: On the R/W + U/S + rw_in_s bit, if all is usually writable for the kernel and given that bit the userlevel bit is marked readonly, what's the entire point of marking something R/S ? It'd be a readonly thing that nothing would ever enforce. My guess is thus that that is an error, so that R/S for kernel-level would always be enforced.

I didn't quite follow that, but the point of clearing Writable and User (R/S) is precisely none; both flags are ignored in Ring 0. You need to enable the Write Protect flag I was referring to; only when that is on does the writable flag matter in Ring 0.

Write Protect (bit 16 of CR0). Inhibits supervisor-level procedures from writing into user-level read-only pages when set; allows supervisor-level procedures to write into user-level read-only pages when clear. This flag facilitates implementation of the copy-on-write method of creating a new process (forking) used by operating systems such as UNIX*.

(IA-32 Intel Architecture Software Developer's Manual Volume 3: System Programming Guide, pg 2-13)

The page-level protection mechanism recognizes two page types:
Read-only access (R/W flag is 0).
Read/write access (R/W flag is 1).

When the processor is in supervisor mode and the WP flag in register CR0 is clear (its state following reset initialization), all pages are both readable and writable (write-protection is ignored). When the processor is in user mode, it can write only to user-mode pages that are read/write accessible. User-mode pages which are read/write or read-only are readable; supervisor-mode pages are neither readable nor writable from user mode. A page-fault exception is generated on any attempt to violate the protection rules.

(IA-32 Intel Architecture Software Developer's Manual Volume 3: System Programming Guide, pg 4-31)
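The rules quoted above can be condensed into a small decision function. This is a simplified model, not the CPU's actual logic: it takes the already combined PDE/PTE flags, ignores NX and any later protection features, and assumes that with WP set a supervisor write to any read-only page faults. All names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_PRESENT 0x1u  /* bit 0 */
#define PG_RW      0x2u  /* bit 1: 1 = read/write, 0 = read-only */
#define PG_USER    0x4u  /* bit 2: 1 = user, 0 = supervisor */

/* Returns true if the access would go through, false if it would #PF. */
static bool access_allowed(uint32_t flags, bool user_mode,
                           bool is_write, bool cr0_wp)
{
    if (!(flags & PG_PRESENT))
        return false;                      /* not present always faults */
    if (user_mode) {
        if (!(flags & PG_USER))
            return false;                  /* supervisor page from ring 3 */
        return !is_write || (flags & PG_RW) != 0;
    }
    if (is_write && cr0_wp)
        return (flags & PG_RW) != 0;       /* WP makes R/W count in ring 0 */
    return true;                           /* WP clear: ring 0 may read and write */
}
```

So a kernel write to a read-only page goes through with WP clear and faults with WP set, which is the behaviour copy-on-write relies on.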

Warrior · Post by **Warrior** » Fri Nov 11, 2005 9:29 pm

Thanks for all the help guys. Now comes the time of implementation. I better get my reading glasses on and a nice tall one since this is looking to be a long night.

Warrior · Post by **Warrior** » Sat Nov 12, 2005 1:10 am

Okay I get how the virtual addresses are split up to find indexes in the PDE and PTEs, now when mapping do I put the full (physical)address wanted to be mapped in both the PDE and the PTE?

Candy · Post by **Candy** » Sat Nov 12, 2005 1:42 am

Nelson wrote: Okay I get how the virtual addresses are split up to find indexes in the PDE and PTEs, now when mapping do I put the full (physical)address wanted to be mapped in both the PDE and the PTE?

You put the physical page addresses in the PT, you put the PT physical addresses in the PD and you put the PD physical address in CR3. Then, enable paging.
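To see that chain of physical addresses in action without touching real hardware, here is a small software model. It is purely a sketch: `sim_ram` stands in for four frames of physical memory, the frame addresses are arbitrary, and `translate` mimics the two-level walk for present entries only (no permission checks, and it trusts that entries point inside the simulated RAM).

```c
#include <stdint.h>

#define PG_PRESENT 0x1u
#define FRAME_MASK 0xFFFFF000u
#define NO_MAPPING 0xFFFFFFFFu            /* sentinel chosen for this sketch */
#define SIM_WORDS  (4u * 1024u)           /* four 4 KiB frames: 0x0000..0x3FFF */

static uint32_t sim_ram[SIM_WORDS];

/* View a simulated physical frame as an array of 1024 entries. */
static uint32_t *sim_frame(uint32_t phys)
{
    return &sim_ram[(phys & FRAME_MASK) / 4u];
}

/* Walk PD -> PT -> page, the way the MMU does with CR3 as the root. */
static uint32_t translate(uint32_t cr3, uint32_t virt)
{
    uint32_t pde = sim_frame(cr3)[virt >> 22];
    if (!(pde & PG_PRESENT))
        return NO_MAPPING;
    uint32_t pte = sim_frame(pde)[(virt >> 12) & 0x3FFu];
    if (!(pte & PG_PRESENT))
        return NO_MAPPING;
    return (pte & FRAME_MASK) | (virt & 0xFFFu);
}

/* PD in frame 0x1000, PT in frame 0x2000, data page in frame 0x3000. */
static void build_demo_tables(void)
{
    sim_frame(0x1000u)[0x48u]  = 0x2000u | PG_PRESENT;  /* PDE holds PT physical address */
    sim_frame(0x2000u)[0x345u] = 0x3000u | PG_PRESENT;  /* PTE holds page physical address */
}
```

After `build_demo_tables()`, `translate(0x1000, 0x12345678)` lands in the frame at 0x3000 with offset 0x678, while an unmapped address yields `NO_MAPPING`.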

Oh, and do self-map your paging structures. You'll want access to them too (and this is where the fun starts). If you point entry 0x3FF of the PD at the PD itself, the PD shows up at 0xFFFFF000, and mapping a page then just means mapping the PT (if necessary) and then mapping the page itself.
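With the self-map in slot 0x3FF, the virtual address of any PDE or PTE can be computed directly. The helpers below are a sketch under that assumption (non-PAE, 4-byte entries); the macro and function names are made up.

```c
#include <stdint.h>

#define SELFMAP_SLOT 0x3FFu                               /* PD entry that points at the PD */
#define PT_WINDOW    (SELFMAP_SLOT << 22)                 /* 0xFFC00000: all page tables */
#define PD_VIRT      (PT_WINDOW | (SELFMAP_SLOT << 12))   /* 0xFFFFF000: the PD itself */

/* Virtual address of the PDE that covers `virt`. */
static inline uint32_t pde_virt(uint32_t virt)
{
    return PD_VIRT + (virt >> 22) * 4u;
}

/* Virtual address of the PTE that maps `virt`
   (only valid to touch if the PDE is present). */
static inline uint32_t pte_virt(uint32_t virt)
{
    return PT_WINDOW + (virt >> 12) * 4u;
}
```

A side effect worth noticing: the "PTE" for a page table's own window address is the matching PDE, which is what lets a single mapping routine install page tables as well as pages.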

Warrior · Post by **Warrior** » Sat Nov 12, 2005 9:51 am

From Intel Manual Volume 3: System Programming
Section 3.6.4

(Page-table entries for 4-KByte pages.) Specifies the physical address of the
first byte of a 4-KByte page. The bits in this field are interpreted as the 20
most-significant bits of the physical address, which forces pages to be aligned
on 4-KByte boundaries.

(Page-directory entries for 4-KByte page tables.) Specifies the physical
address of the first byte of a page table. The bits in this field are interpreted as
the 20 most-significant bits of the physical address, which forces page tables to
be aligned on 4-KByte boundaries.

Let me summarize what I know:

Linear Address translation splits up the address and finds the index to PDEs and PTEs. Okay.
PHYSICAL addresses of the Page table go into the PDE and PHYSICAL addresses of the Page itself go into the PTE

What I am confused about is say you have

int MapMemory(unsigned long phys, unsigned long virt, unsigned long flags);

How would I get the Physical address to "point" to the virtual addresses upon translation?

Thanks,
Nelson

Candy · Post by **Candy** » Sat Nov 12, 2005 10:49 am

Nelson wrote: Linear Address translation splits up the address and finds the index to PDEs and PTEs. Okay.
PHYSICAL addresses of the Page table go into the PDE and PHYSICAL addresses of the Page itself go into the PTE

What I am confused about is say you have
int MapMemory(unsigned long phys, unsigned long virt, unsigned long flags);
How would I get the Physical address to "point" to the virtual addresses upon translation?

Thanks,
Nelson

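int MapMemory(unsigned long phys, unsigned long virt, unsigned long flags) {
  // With the page directory self-mapped in its last entry (0x3FF), the
  // directory shows up at 0xFFFFF000 and all page tables at 0xFFC00000.
  unsigned int *pd = (unsigned int *)0xFFFFF000;
  unsigned int *pt = (unsigned int *)0xFFC00000;
  unsigned long pd_offs = virt >> 22;  // index into the page directory
  unsigned long pt_offs = virt >> 12;  // index into the 4 MB window of page tables

  // if there's no page table present
  // (the parentheses matter: == binds tighter than &)
  if ((pd[pd_offs] & 1) == 0) {
    // map one; through the self-map this ends up writing pd[pd_offs].
    // The new table should be zeroed before it is used.
    MapMemory(GetFreePage(), 0xFFC00000 + (pd_offs << 12), some_flags);
  }
  // if there is a page present
  if ((pt[pt_offs] & 1) == 1) {
    // unmap it and remap this one, or something else.
    AddFreePage(UnMapMemory(virt));
  }

  pt[pt_offs] = (phys & 0xFFFFF000) | flags;
  // a changed mapping also needs its TLB entry invalidated (invlpg)
  return 0;
}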

Check out atlantisos' memory management, it does this pretty much exactly this way. (in src/kernel/core/kmm I think)

Warrior · Post by **Warrior** » Sat Nov 12, 2005 11:54 am

Okay that clears it up.

Maybe once I get at least basic paging working, I can implement things like self-mapping the Page tables and Page Directories.

Is there a difference between a page being global and simply mapping it into every process's address space? I know you set a bit in a control register and put a flag in the PTE (I think, I'll have to check my notes) but other than that I fail to see any difference.

Time to get coding. Thanks Candy.

OSDev.org
