hexcoder wrote:1. Currently I'm parsing the multiboot memory map and allocating pages for all the reserved memory. This could clearly be quite expensive if there are large chunks of reserved memory. What's a good way of handling this? I was thinking a function which would take a physical memory address and a size, and allocate pages to facilitate that, while returning the virtual address. All the other reserved memory would remain unallocated until that function gets called.
You can reduce the memory consumed by page tables by using large pages. These are 2MB each, so on x64 a single table (a page directory with 512 entries) is enough to map 1GB. TLB efficiency also improves significantly. Note that this alleviates the issue of reserved regions rather than working around it. You can still exclude the large reserved regions from the mapping. If you do, it is useful to place the page frame control structures in linear correspondence with the virtual addresses of the mapped physical memory, not with the physical addresses themselves. Large pages do not translate easily to user space tables, but they are very convenient for the kernel space mapping of physical memory.
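To make the saving concrete, here is a small sketch of the arithmetic. The names (leaf_tables_needed and the macros) are made up for illustration and are not taken from any kernel. It counts only the lowest-level tables, meaning page tables for 4KB pages and page directories for 2MB pages.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512u              /* x86-64: 512 eight-byte entries per 4 KiB table */
#define PAGE_4K (UINT64_C(1) << 12)
#define PAGE_2M (UINT64_C(1) << 21)

/*
 * Number of lowest-level tables needed to map `bytes` of memory using
 * pages of `page_size`. One such table covers ENTRIES_PER_TABLE pages.
 * Hypothetical helper, for illustration only.
 */
static uint64_t leaf_tables_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size == PAGE_4K || page_size == PAGE_2M);
    uint64_t span = ENTRIES_PER_TABLE * page_size;  /* bytes covered by one table */
    return (bytes + span - 1) / span;               /* round up */
}
```

For 1GB this gives 512 tables with 4KB pages (2MB of table memory, plus the directory above them) against a single 4KB table with 2MB pages.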
hexcoder wrote:2. If a page table has 1024 entries, each representing a 4096-byte chunk, that means that a maximum of 1024 * 4096 = 4194304 bytes can be addressed using one table alone. If process use a page table each, then does that mean that only a maximum of 4mb can be addressed at one time? (I would assume that the other pages are written out to disk as needed after triggering a page fault, but 4mb does seem like quite a small limit, does it not?)
I am not sure about this, but I believe mainstream OSes page the page tables themselves in and out, like any other piece of process memory. I think only the top-level directory has to stay present. In other words, the amount of virtual memory a process can address and the physical memory used for its page tables do not necessarily correspond. There is a catch in doing this, since paging out some data may first require paging in several page tables.
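As for the arithmetic in the question: on 32-bit x86 without PAE, a process does not get a single page table. It gets one page directory whose 1024 entries each point to a page table, so the addressable range is 1024 * 1024 * 4096 bytes = 4GB. The sketch below shows how a virtual address splits into the two indices and the page offset. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES    1024u   /* entries per page directory and per page table */
#define PAGE_BYTES 4096u   /* bytes per 4 KiB page */

struct va_parts {
    uint32_t dir_index;    /* bits 31..22: selects the page table */
    uint32_t table_index;  /* bits 21..12: selects the page within that table */
    uint32_t offset;       /* bits 11..0:  byte within the page */
};

/* Split a 32-bit virtual address the way two-level x86 paging does. */
static struct va_parts split_va(uint32_t va)
{
    struct va_parts p;
    p.dir_index   = va >> 22;
    p.table_index = (va >> 12) & 0x3FFu;
    p.offset      = va & 0xFFFu;
    return p;
}

/* Inverse of split_va, checking that every field is in range. */
static uint32_t join_va(struct va_parts p)
{
    assert(p.dir_index < ENTRIES && p.table_index < ENTRIES && p.offset < PAGE_BYTES);
    return (p.dir_index << 22) | (p.table_index << 12) | p.offset;
}
```

Address 0x00400000 (exactly 4MB) is the first address that falls into the second page table, which is where the 4MB figure in the question comes from.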
hexcoder wrote:4. Is there any good way to convert a physical address to a virtual one? I can see that the Linux kernel has one (I can't find out how it's implemented), although I'm not really sure why it would be needed. The only way I can see to do it would be to scan every page table entry for the specified virtual address, but that would be quite inefficient.
To put things in perspective, there are two types of kernels in this regard: those that linearly map the physical memory on boot, and those that only map particular physical memory locations for control structures (CPU and kernel) and leave the data (such as file cache) to be mapped on demand. As you may have guessed, Linux is of the former type, and Windows is of the latter. The primary influence is in how file caching works, although other portions of the OS architecture are impacted by this choice as well.
So, Linux has a facility to convert physical memory addresses to the corresponding kernel space addresses (i.e. phys_to_virt), which, thanks to the aforementioned linear mapping, costs only a simple arithmetic calculation. This serves to operate on global kernel structures, caches, etc. Threads don't need to manipulate page tables when they want to access a physical memory location, be it in another process or in a global kernel structure. Instead they just call the address conversion function (which is inlined, by the way) and work with the resulting pointer.
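A minimal sketch of such a conversion, assuming the kernel maps all physical memory at one fixed virtual base. PHYS_MAP_BASE, PHYS_MAP_SIZE and the function names are hypothetical values chosen for illustration, not Linux's actual definitions, and addresses are kept as integers so the arithmetic can be tried in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed start of the kernel window onto physical memory (hypothetical). */
#define PHYS_MAP_BASE UINT64_C(0xFFFF800000000000)
/* Assumed size of that window: 1 TiB (hypothetical). */
#define PHYS_MAP_SIZE (UINT64_C(1) << 40)

/* Physical address -> kernel virtual address inside the linear window. */
static inline uint64_t phys_to_kvirt(uint64_t pa)
{
    assert(pa < PHYS_MAP_SIZE);
    return PHYS_MAP_BASE + pa;
}

/* Kernel virtual address inside the linear window -> physical address. */
static inline uint64_t kvirt_to_phys(uint64_t va)
{
    assert(va >= PHYS_MAP_BASE && va - PHYS_MAP_BASE < PHYS_MAP_SIZE);
    return va - PHYS_MAP_BASE;
}
```

Note that this only covers addresses inside the linear window. It says nothing about where a user process has mapped the same frame, which is the reverse lookup problem discussed next.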
Paging out also requires conversion between physical and virtual addresses. The process PTEs have to be marked invalid. There are multiple cases here.
- The page has no corresponding PTE, which means that it is not mapped in any address space (aside from the kernel window to physical memory). This really means that the page is file backed and is left to cache the data in memory for future processes. But it also means that, provided it is not dirty, the page can be repurposed immediately without any page table manipulation.
- A single process maps this page. This requires one back reference from the page control structure to the process PTE in order to find it and invalidate it, but see the third case below.
- The page is shared between multiple processes, in which case the PTEs inside several tables must be invalidated. This can be done with a list or vector of the relevant PTEs for each page, but because of the added memory traffic, and thus TLB traffic, during page-out, that method can be undesirable. Instead, the page control structure has a means of reaching a memory mapping interval tree, which holds a descriptor for each mapped region within each process. For a single mapping (the second case), this is slower than simply referencing the relevant PTE, but still works in constant time (more or less).
The reverse lookup for paging out is provided by the rmap_walk function in Linux, in case you want to look it up.
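The three cases can be sketched roughly as follows. This is not how Linux does it: for brevity the shared case uses the plain per-page list of PTEs mentioned above rather than an interval tree, and all type and function names are made up. A real kernel would also have to flush the affected TLB entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT UINT64_C(1)   /* bit 0 of an x86 PTE */

typedef uint64_t pte_t;

enum map_kind { MAP_NONE, MAP_SINGLE, MAP_SHARED };

/* One entry per PTE that maps a shared page (simple list, for illustration). */
struct rmap_node {
    pte_t *pte;
    struct rmap_node *next;
};

/* Hypothetical page frame control structure with its back references. */
struct page_frame {
    enum map_kind kind;
    pte_t *single;              /* used when kind == MAP_SINGLE */
    struct rmap_node *shared;   /* used when kind == MAP_SHARED */
};

/* Mark every PTE that maps this frame as not present; return how many were cleared. */
static size_t unmap_all(struct page_frame *pf)
{
    size_t cleared = 0;

    switch (pf->kind) {
    case MAP_NONE:              /* only cached, no page table work needed */
        break;
    case MAP_SINGLE:            /* one direct back reference */
        assert(pf->single != NULL);
        *pf->single &= ~PTE_PRESENT;
        cleared = 1;
        break;
    case MAP_SHARED:            /* walk all mappings of the page */
        for (struct rmap_node *n = pf->shared; n != NULL; n = n->next) {
            *n->pte &= ~PTE_PRESENT;
            cleared++;
        }
        break;
    }

    pf->kind = MAP_NONE;
    pf->single = NULL;
    pf->shared = NULL;
    return cleared;
}
```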