Confused about Paging...

pcmattman · Post by **pcmattman** » Sun Jun 24, 2007 7:44 pm

I'm really confused about this whole paging concept. All that I understand, after hours of reading, is that it allows me to have multiple address spaces within the physical memory. I've looked at memory allocators, tried to understand them and failed.

How exactly am I meant to allocate pages (which I think is a good start), and then map the page directories and tables to the right places? And what order do I do things in when I have to create a new process in a new address space?

Kevin McGuire · Post by **Kevin McGuire** » Sun Jun 24, 2007 8:30 pm

What do you already know about paging?

pcmattman · Post by **pcmattman** » Sun Jun 24, 2007 8:35 pm

Not a lot.

I know that pages are 4 KB blocks of memory. I know that a page directory is full of page tables which are full of mapping information. I don't know much else.

frank · Post by **frank** » Sun Jun 24, 2007 8:43 pm

Have you looked at this tutorial?
http://www.osdever.net/tutorials/pdf/memory1.pdf

deadmutex · Post by **deadmutex** » Sun Jun 24, 2007 10:55 pm

pcmattman wrote: I know that pages are 4 KB blocks of memory. I know that a page directory is full of page tables which are full of mapping information. I don't know much else.

Be careful here. A page directory contains page directory entries (PDEs). Each PDE contains information about a page table. Likewise, a page table contains page table entries (PTEs) that contain information about pages. It's important to note that a page table is not the same thing as a PDE. When I first started with paging, it took me a couple of months to realize this, which caused lots of confusion.

pcmattman · Post by **pcmattman** » Mon Jun 25, 2007 1:36 am

@frank: thanks heaps for that link, it was pretty much just what I was looking for. I don't know how I missed it earlier

@deadmutex: thanks for the tip

Bughunter · Post by **Bughunter** » Mon Jun 25, 2007 8:04 am

Let's see how a processor performs the translation of an address in this situation:

"mov [0x010B71A8], al" (in Intel syntax)

Let's translate the linear address 0x010B71A8

0000 0001    0000 1011    0111 0001    1010 1000
   0    1       0    B       7    1       A    8

From left to right, we have bits 31 through 0.

Bits 22 through 31, which are 0000 0001 00 (decimal 4), select the entry (read: page table) from the PD (Page Directory). Entries are numbered from 0, so this is entry 4, the fifth PT (Page Table) in the PD. These top 10 bits are just an index into the PD to select a PT.

Bits 12 through 21, which are 00 1011 0111 (0x0B7, decimal 183), are an index into the PT to select the page we're looking up.

And then the remaining bits (bits 0 through 11), which are 0001 1010 1000 (0x1A8), are the offset into the page. The physical address is the page's base address plus this offset.
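The split above can be sketched in C. This is only an illustration for plain 32-bit paging with 4 KB pages (no PAE, no PSE); the function names are made up for the example and are not from any real kernel.

```c
#include <stdint.h>

/* Bits 31..22: index into the page directory (0..1023). */
static inline uint32_t pd_index(uint32_t linear)
{
    return (linear >> 22) & 0x3FFu;
}

/* Bits 21..12: index into the selected page table (0..1023). */
static inline uint32_t pt_index(uint32_t linear)
{
    return (linear >> 12) & 0x3FFu;
}

/* Bits 11..0: byte offset inside the 4 KB page (0..4095). */
static inline uint32_t page_offset(uint32_t linear)
{
    return linear & 0xFFFu;
}
```

For 0x010B71A8 these give a PD index of 4, a PT index of 0xB7 and an offset of 0x1A8, matching the bit groups worked out by hand.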

PAE is left out of the example to simplify the explanation of Linear Address Translation.

Let's clarify all this with an image from the Intel Manual (Architecture Software Developer's Manual Volume 3A: System Programming Guide, Part 1), see the attachment.

(Anyone, correct me if I'm wrong)

hailstorm · Post by **hailstorm** » Mon Jul 02, 2007 1:14 pm

Bughunter is right. Before you read the whole 'story' below, I suggest you get the Intel manuals at hand. The images in these manuals are self-explanatory.
By the way, I hope you have a full understanding of pointers!

A page directory or page table is just a big array of pointers to pages.
A page directory is of course a page itself, and so are page tables. That is why these tables need to lie on a page boundary; there are some other reasons too, but that is not important right now.
But keep in mind that the entries of page directories and page tables hold physical addresses, not virtual ones. A page directory entry must contain the valid physical address of a page table whenever the processor uses it for a lookup. The same goes for page tables: when the processor uses a page table entry, that entry must contain the valid physical address of a page.
The structure of paging can be confusing indeed, so applying a bit of logic comes in handy. Think it through when you can!

A page without any extension enabled is 4K in size. 2^12 = 4096. That means that 12 bits are reserved for the offset within the page. These 12 bits are the lower part of a virtual address. That leaves us with 20 bits.
The page table index is specified by bits 21..12, exactly 10 bits. 2^10 = 1024, the same as the number of page table entries. The other bits, 31..22, are used as the page directory index; the page directory also contains 1024 entries. That is why a virtual address is split in three:
[31..22][21..12][11..0] => [PDE INDEX][PTE INDEX][OFFSET IN PAGE]
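To make the two-level lookup concrete, here is a rough user-space simulation of the walk the processor does in hardware. Everything in it is an assumption for illustration: `sim_mem` stands in for physical memory (frame n sits at physical address n * 4096), and `frame_ptr` and `translate` are hypothetical helpers, not part of any real kernel or of the Intel manuals.

```c
#include <stdint.h>

#define ENTRIES     1024u
#define PG_PRESENT  0x1u          /* bit 0 of a PDE or PTE */
#define FRAME_MASK  0xFFFFF000u   /* upper 20 bits: physical frame address */
#define SIM_FRAMES  4u

/* Simulated physical memory holding the paging structures. */
static uint32_t sim_mem[SIM_FRAMES][ENTRIES];

/* Hypothetical helper: turn a physical address into a usable pointer.
 * The modulo only keeps the simulation inside its small array. */
static uint32_t *frame_ptr(uint32_t phys)
{
    return sim_mem[(phys >> 12) % SIM_FRAMES];
}

/* Walk page directory -> page table -> page.
 * Returns 0 and stores the physical address, or -1 where the
 * processor would raise a page fault (entry not present). */
int translate(uint32_t pd_phys, uint32_t vaddr, uint32_t *paddr)
{
    const uint32_t *pd = frame_ptr(pd_phys);
    uint32_t pde = pd[vaddr >> 22];                 /* bits 31..22 */
    if (!(pde & PG_PRESENT))
        return -1;

    const uint32_t *pt = frame_ptr(pde & FRAME_MASK);
    uint32_t pte = pt[(vaddr >> 12) & 0x3FFu];      /* bits 21..12 */
    if (!(pte & PG_PRESENT))
        return -1;

    *paddr = (pte & FRAME_MASK) | (vaddr & 0xFFFu); /* bits 11..0 */
    return 0;
}
```

Note that both the PDE and the PTE are masked with `FRAME_MASK` before use: the low 12 bits of an entry carry flags, not address bits, which works only because tables and pages are 4 KB aligned.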

So with paging enabled, the processor uses the page directory and page table structures as one big index, like an index in a book for example. Make sure all pages needed are present by setting the present bit, make sure the processor can find the page directory (through CR3) and its page tables, and you are ready to go. The theory of paging is pretty easy; the real complications are found in memory management itself.

The reason to use paging should be obvious. Back in the old days, when computers didn't have much RAM, paging was introduced as a memory management solution. Programs grew bigger and so did the data they used. It just didn't fit in memory anymore.

Paging comes with one big advantage: just let the process think it has all the memory in the world. In the background, the operating system can swap pages in and out while the program runs in memory. When a page is needed but isn't available in memory, a page fault is generated. The operating system can then fetch the page that is needed or allocate more memory, depending on the reason for the page fault.
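To find out the reason for a page fault, a handler typically looks at the error code the processor pushes (and at CR2, which holds the faulting linear address). A minimal sketch of decoding the three low bits of that error code follows; the struct and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of the low three bits of the x86 page fault error code. */
struct pf_info {
    bool present; /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;   /* bit 1: 1 = fault on a write, 0 = fault on a read */
    bool user;    /* bit 2: 1 = fault happened in user mode */
};

struct pf_info decode_pf_error(uint32_t err)
{
    struct pf_info info = {
        .present = (err & 0x1u) != 0,
        .write   = (err & 0x2u) != 0,
        .user    = (err & 0x4u) != 0,
    };
    return info;
}

/* A not-present fault is the case where the OS may bring the page in
 * (from swap, or by allocating a fresh frame) and retry the access. */
bool pf_is_missing_page(uint32_t err)
{
    return !decode_pf_error(err).present;
}
```

An error code of 0x6, for example, means a user-mode write to a page that is not present, which is the typical "fetch or allocate the page" case described above.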

Bughunter · Post by **Bughunter** » Mon Jul 02, 2007 1:44 pm

hailstorm wrote: So with paging enabled, the processor uses the pagedir and pagetable structures as one big index, like an index in a book for example.

Only if you leave the PSE (Page Size Extensions) bit of CR4 cleared. If you enable the PSE bit and set the PS (page size) bit in a PD (Page Directory) entry, that entry maps a 4-MB page directly. When you use 4-MB pages for the whole address space, you will have 1024 entries in your PD which point to pages (and give information about the flags of course) instead of page tables.
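For comparison with the 4 KB case, here is a small sketch of how a linear address splits when a PDE maps a 4-MB page: 10 bits of PD index and a 22-bit offset, with no page table in between. On x86 the PSE flag is bit 4 of CR4 and the PS flag is bit 7 of the PDE; the function name is hypothetical, and the sketch ignores PSE-36 (which reuses some PDE bits for higher physical address bits).

```c
#include <stdint.h>

#define CR4_PSE      (1u << 4)   /* enables 4-MB pages when set in CR4 */
#define PDE_PRESENT  (1u << 0)
#define PDE_PS       (1u << 7)   /* this PDE maps a 4-MB page directly */

/* Translate through a PDE that maps a 4-MB page.
 * Returns 0 and stores the physical address, or -1 if the PDE is
 * not present or is an ordinary PDE pointing at a page table. */
int translate_4mb(uint32_t pde, uint32_t vaddr, uint32_t *paddr)
{
    const uint32_t need = PDE_PRESENT | PDE_PS;
    if ((pde & need) != need)
        return -1;

    /* Bits 31..22 of the PDE are the page base, bits 21..0 of the
     * linear address are the offset inside the 4-MB page. */
    *paddr = (pde & 0xFFC00000u) | (vaddr & 0x003FFFFFu);
    return 0;
}
```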

hailstorm wrote: The reason to use paging should be obvious. Back in the old days for example, when computers didn't have much RAM, paging as a memory management solution was introduced. Programs grew bigger and so did the data that programs used. It just didn't fit anymore.

The main reason for paging to an OSdever should be to give each process (read: user application) a separate virtual memory addressing space, thus protecting each process' virtual memory address space from being corrupted by another process.

hailstorm · Post by **hailstorm** » Tue Jul 03, 2007 12:12 am

Of course you're right, but for simplicity I left page extension details out of my story, because the workings of the extensions are different from the basic model. By the way, if you have an 80386 at hand, these extensions are not available.

The reason for implementing an MMU that supports paging lies, historically, in the fact that memory was scarce. But since PDEs and PTEs contain some management bits, like the read/write bit and the supervisor/user bit, you can indeed protect each process's virtual address space.
One note: I strongly agree with you that an OS developer should implement page protection...
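As a rough illustration of those management bits, the sketch below builds an entry from a frame address plus flags and checks whether user-mode code may write through a given PDE/PTE pair. The bit positions (present = bit 0, read/write = bit 1, user/supervisor = bit 2) are the standard x86 ones; the macro and function names are made up for this example.

```c
#include <stdint.h>

#define PG_PRESENT   0x001u  /* bit 0: entry is valid */
#define PG_WRITABLE  0x002u  /* bit 1: writes allowed */
#define PG_USER      0x004u  /* bit 2: user-mode access allowed */

/* Combine a 4 KB-aligned physical address with flag bits. */
static inline uint32_t make_entry(uint32_t frame_phys, uint32_t flags)
{
    return (frame_phys & 0xFFFFF000u) | (flags & 0xFFFu);
}

/* User-mode writes need present, writable and user set at BOTH
 * levels, so AND the two entries before testing the bits. */
static inline int user_can_write(uint32_t pde, uint32_t pte)
{
    const uint32_t need = PG_PRESENT | PG_WRITABLE | PG_USER;
    return ((pde & pte) & need) == need;
}
```

Leaving `PG_USER` clear on kernel mappings is what keeps user processes out of kernel memory, and leaving `PG_WRITABLE` clear gives read-only pages.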

Bughunter · Post by **Bughunter** » Tue Jul 03, 2007 3:04 am

hailstorm wrote: Of course you're right, but for simplicity, I left page extension details out of my story, because the workings of the extensions are different from the basic model. B.t.w., when you have a 80386 at your hand, these extensions are not available.

Yeah, I kind of thought you did that on purpose, but I thought it would be worth noting for completeness (e.g. maybe it helps someone searching the forum about paging).

pcmattman · Post by **pcmattman** » Thu Jul 05, 2007 7:58 pm

bughunter wrote:The main reason for paging to an OSdever should be to give each process (read: user application) a separate virtual memory addressing space, thus protecting each process' virtual memory address space from being corrupted by another process.

Another reason: you don't have to do messy relocation work on flat binaries (not so sure about ELF... I'm writing a relocation module for ELF anyway).

Bughunter · Post by **Bughunter** » Fri Jul 06, 2007 3:31 am

pcmattman wrote:
bughunter wrote:The main reason for paging to an OSdever should be to give each process (read: user application) a separate virtual memory addressing space, thus protecting each process' virtual memory address space from being corrupted by another process.
Another reason: you don't have to do messy relocation work on flat binaries (not so sure about ELF... I'm writing a relocation module for ELF anyway).

Why do you think it wouldn't be possible with ELF?

(I don't say it is or isn't, I don't know, just asking you)

pcmattman · Post by **pcmattman** » Fri Jul 06, 2007 3:45 am

With ELF you have the program header (and section headers, and more) to contend with; you have to do relocation anyway. With a flat binary, relocation is much harder, because you'd have to search for opcodes and then handle them appropriately. Giving each process an address space starting at 0 helps.

Bughunter · Post by **Bughunter** » Fri Jul 06, 2007 4:16 am

Oh yes indeed, now I get it