Hi,
Multiple explanations are always good, as long as they aren't contradictory - so here goes.
'Mapping' in the mathematical sense is the creation of a relation between members of two sets. For two sets (A, B), the mapping f(A) 'maps' a value from the set A to a value in the set B. In mathematical terms:
For two values a ∈ A, b ∈ B, f(a) = b.
The function 'f' can in this sense also be termed a 'relation', and can be written in the more friendly manner:
f : a → b
Here we can see more easily that f 'maps' a to b.
-------------------------------
OK, mathematical interlude over (I hope that answered your first question, by the way!), time to apply it.
Forget about 'page tables' for the moment. Just think about the function 'f' above. Above, it mapped something from set A to set B. Now, think about it as mapping addresses from the "virtual" set to the "physical" set.
The "physical" set represents the actual addresses that get put on the computer's address bus, so it includes all RAM and memory mapped devices. This is the address space you are used to if you have used asm without paging.
The "virtual" set of addresses is made up. The idea is that a program (the kernel and all other application programs - basically *everything* once paging is enabled) doesn't need to care about how physical memory is laid out. Without this, programs would have to work around holes of unusable address space and memory-mapped devices, and every program would be able to trample the data and code of other programs and the kernel!
So code running with paging enabled sees a "virtual" address space. This is (unless segmentation is also used) a clean, 32-bit address space extending from 0 to 4GB (on 32-bit x86).
The kernel is then free to decide how the addresses it writes to and reads from (virtual addresses) map to physical addresses. It uses a function to do this.
Remember from above that a function can be defined as a series of relations:
f : a → b, c → d, e → g
So 'f' can be defined simply by a table of all possible inputs and their associated output (if any). In this case, this is called a page table. All we need is a way of defining the function 'f' and to define what the inputs and outputs of it are.
Defining what each individual address maps to would require a lot of memory, so instead all systems divide the addressable range into fixed-size chunks called "pages", which on 32-bit x86 (without PSE) are 4KB (4096 bytes) in size. It is these pages that are the inputs and outputs to the mapping function. It takes as an input a virtual page number, and gives as output a physical page number. Please note that a virtual address may have no associated physical address. This is completely valid behaviour, and it is expected that the processor catches this and causes a fault.
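If it helps to see that as code, here is a minimal sketch of the idea so far. It models the mapping function as a plain lookup table keyed by virtual page number, which is not how the hardware stores it. The names `page_map`, `PageFault` and `translate`, and the mappings themselves, are made up for the example.

```python
PAGE_SIZE = 4096  # 4KB pages, as on 32-bit x86 without PSE


class PageFault(Exception):
    """Raised when a virtual page has no physical page behind it."""


# Hypothetical mapping: virtual page number -> physical page number.
page_map = {
    0x00000: 0x00100,
    0x00001: 0x00101,
    0x12345: 0x00042,
}


def translate(virtual_address):
    """Translate a virtual address to a physical address using page_map."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_map:
        # No relation defined for this page: the "fault" case.
        raise PageFault(hex(virtual_address))
    # Only the page number is mapped; the offset inside the page is unchanged.
    return page_map[page] * PAGE_SIZE + offset
```

Only whole pages are mapped here, so the low 12 bits of the address pass through untouched.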
Even though the addressable range (0-4GB) is split up into 4KB pages, that still leaves 4GB/4KB=0x100000 virtual → physical relations to store. Each relation needs 32 bits of storage (20 bits for the physical page number, plus 12 bits of flags), so that's 4MB to store all the relations.
That's a lot, and it is entirely possible that large expanses of the address space are unmapped - so we're basically wasting a lot of memory. Remember that in an operating system each process can have its own address space so it can't mess up others' data, so that's 4MB per process! Because of this, Intel decided on a two-level page table system.
That image is copyright James Molloy, i.e. me, so I can use it without fear of reprisal
At the top is the "page directory". This is one page large (4KB), and contains pointers to "page tables". Each pointer requires 32 bits, so the directory can point to up to 1024 page tables.
The "page table" is again 4KB large and contains the relations mentioned above, 32-bits for each, so 1024 "page table entries" per page table. In total, that's 1024*1024=0x100000 page table entries available, which is correct for covering 4GB of address space.
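To put rough numbers on the saving, here is a small sketch of the arithmetic. It assumes that page tables covering unmapped regions are simply never allocated; `two_level_bytes` is a name I've invented for the example.

```python
ADDRESS_SPACE = 1 << 32                      # 4GB of virtual address space
PAGE_SIZE = 4096                             # 4KB pages
ENTRY_SIZE = 4                               # each entry is 32 bits
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE  # 1024 entries fit in one page

total_pages = ADDRESS_SPACE // PAGE_SIZE     # 0x100000 relations
flat_table_bytes = total_pages * ENTRY_SIZE  # 4MB for one flat table


def two_level_bytes(page_tables_in_use):
    """Memory for one page directory plus the page tables actually allocated."""
    return PAGE_SIZE + page_tables_in_use * PAGE_SIZE
```

For a process that only needs three page tables, `two_level_bytes(3)` comes to 16KB rather than 4MB. A fully populated address space costs slightly more than the flat table (4MB plus the 4KB directory), so the gain comes entirely from the unmapped regions.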
Let's say you want to find the page table entry for the virtual address "0x12345000", so page number "0x12345" (remember that addresses given to the page mapping function must be aligned on 4KB boundaries, so the last three digits of a hex address must be zero).
You'd first calculate which page table the address lies in. Each table holds 1024 entries, so 0x12345 / 1024 = 72.8173828, which rounds down to 72. Thus, our entry is somewhere in page table number 72 (counting from zero).
Once we have the page table, to find the entry number inside it we can use a simple modulo: 0x12345 % 1024 = 837.
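The two calculations above can be sketched like this. The function names are my own, and the second version just shows that the division and modulo are the same thing as shifting and masking, since both 4096 and 1024 are powers of two.

```python
PAGE_SIZE = 4096
ENTRIES_PER_TABLE = 1024


def split_address(virtual_address):
    """Return (page table number, entry number) using division and modulo."""
    page_number = virtual_address // PAGE_SIZE
    table_number = page_number // ENTRIES_PER_TABLE
    entry_number = page_number % ENTRIES_PER_TABLE
    return table_number, entry_number


def split_address_bits(virtual_address):
    """Same result using shifts: top 10 bits, then the next 10 bits."""
    return virtual_address >> 22, (virtual_address >> 12) & 0x3FF
```

Both `split_address(0x12345000)` and `split_address_bits(0x12345000)` give `(72, 837)`, matching the numbers worked out by hand.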
Once you have that page table entry, you can change which (if any) physical address the virtual address "0x12345000" maps to, along with some access privilege flags. See the Intel manuals or the wiki for more information about that.
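As a rough sketch of what such an entry looks like, the top 20 bits hold the physical page and the low 12 bits hold the flags. The three flag positions below (present, writable, user) are the usual ones for 32-bit x86 as I understand them, but do check them against the Intel manuals. The function names are invented for the example.

```python
PRESENT = 1 << 0    # bit 0: the page is mapped
WRITABLE = 1 << 1   # bit 1: writes are allowed
USER = 1 << 2       # bit 2: accessible from user mode

FLAGS_MASK = 0x00000FFF   # low 12 bits
FRAME_MASK = 0xFFFFF000   # high 20 bits


def make_entry(physical_address, flags):
    """Pack a 4KB-aligned physical address and flags into one 32-bit entry."""
    if physical_address & FLAGS_MASK:
        raise ValueError("physical address must be 4KB aligned")
    return (physical_address & FRAME_MASK) | (flags & FLAGS_MASK)


def entry_frame(entry):
    """Physical address of the page this entry points at."""
    return entry & FRAME_MASK


def entry_flags(entry):
    """The 12 flag bits of the entry."""
    return entry & FLAGS_MASK
```

For example, `make_entry(0x42000, PRESENT | WRITABLE)` gives `0x42003`. The alignment requirement is what frees up the low 12 bits for flags in the first place.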
I cover this in more detail, and possibly with less maths, here, if that interests you.
I hope this answers your question; if not, please reply and let me know.
Cheers,
James