Questions about Paging (x86)

hexcoder · Post by **hexcoder** » Thu Oct 20, 2016 5:33 pm

Hi,

I've been getting paging working on my OS, and I've mostly been following James Molloy's tutorials on them. I'm extending his code but there are a few things I'm not sure how to handle.

1. Currently I'm parsing the multiboot memory map and allocating pages for all the reserved memory. This could clearly be quite expensive if there are large chunks of reserved memory. What's a good way of handling this? I was thinking a function which would take a physical memory address and a size, and allocate pages to facilitate that, while returning the virtual address. All the other reserved memory would remain unallocated until that function gets called.

2. If a page table has 1024 entries, each representing a 4096-byte chunk, that means that a maximum of 1024 * 4096 = 4194304 bytes can be addressed using one table alone. If process use a page table each, then does that mean that only a maximum of 4mb can be addressed at one time? (I would assume that the other pages are written out to disk as needed after triggering a page fault, but 4mb does seem like quite a small limit, does it not?)

3. The wiki states:

Many prefer to map the last PDE to itself. The page directory will look like a page table to the system.

I'm not sure quite what it means 'itself'. Does it mean that the last 4 megabytes of the total 4GB address space are identity-mapped? I don't quite see how this would provide any benefits.

4. Is there any good way to convert a physical address to a virtual one? I can see that the Linux kernel has one (I can't find out how it's implemented), although I'm not really sure why it would be needed. The only way I can see to do it would be to scan every page table entry for the specified virtual address, but that would be quite inefficient.

Mikumiku747 · Post by **Mikumiku747** » Thu Oct 20, 2016 5:55 pm

hexcoder wrote:3. The wiki states:
Many prefer to map the last PDE to itself. The page directory will look like a page table to the system.
I'm not sure quite what it means 'itself'. Does it mean that the last 4 megabytes of the total 4GB address space are identity-mapped? I don't quite see how this would provide any benefits.

When they say "itself", they refer to the physical address of the Page directory. This has the effect that the last 4K is mapped to the Page directory, and the last 4MB is mapped to all of the page tables in the system. It's a little confusing at first, but once you get the hang of it, it's not too hard.

For example, say you have a page directory at 0x00103000 Physical. You set the last entry of the page directory to 0x00103003, just as you would for any other page table (In this example case, write 0x00103000 to 0x00103FFC). Now, if you try to access any of the last 4K of virtual memory, it will access the page directory itself. To see how this works, imagine yourself as the CPU walking the page table. You need to translate 0xFFFFFF004 (the last page directory, last page table),, so the first thing you do is look at the page directory. We look in 0x00103FFC Physical, and see that the last page table for this address is located at 0x00103000, so we now we look at that as the page table. The last entry in the page table is at 0x00103FFC, so we look at that, and see that it contains the address 0x00103000, so our final result is that we translate 0xFFFFF004 to 0x00103004.. Try translating some other addresses by hand and see if you can figure out how it does those as well.

As you'll notice, the neat thing about this is that it works regardless of where the page directory is stored. As long as you make the last entry of the page directory the address of the page directory itself, you can edit any part of the paging structure without having to know where the parts are actually located. This is why, in the paging wiki article, they seem to use magic constant addresses like 0xFFFFF000 and 0xFFC00000.

Again, it's hard to explain and hard to understand the first time you encounter it, but if you work through it step by step, you should get the hang of it and understand how useful recursive page mapping can be.

- Mikumiku747

Octocontrabass · Post by **Octocontrabass** » Fri Oct 21, 2016 6:41 am

hexcoder wrote:I've been getting paging working on my OS, and I've mostly been following James Molloy's tutorials on them.

Have you checked the list of known bugs yet?

hexcoder wrote:1. Currently I'm parsing the multiboot memory map and allocating pages for all the reserved memory. This could clearly be quite expensive if there are large chunks of reserved memory. What's a good way of handling this?

Expensive how? If it's speed you're worried about, remind yourself that initializing the PMM only happens once. You can always optimize it later, when you come up with a better PMM than the one in the tutorial.

hexcoder wrote:2. If a page table has 1024 entries, each representing a 4096-byte chunk, that means that a maximum of 1024 * 4096 = 4194304 bytes can be addressed using one table alone. If process use a page table each, then does that mean that only a maximum of 4mb can be addressed at one time? (I would assume that the other pages are written out to disk as needed after triggering a page fault, but 4mb does seem like quite a small limit, does it not?)

Normally, each process has its own page directory, and you add page tables to that page directory as needed.

hexcoder wrote:3. Does it mean that the last 4 megabytes of the total 4GB address space are identity-mapped? I don't quite see how this would provide any benefits.

It means the last 4 MB of the 4 GB address space is the page tables. It's a clever trick that relies on the fact that page directories and page tables have pretty much the same internal format, so you can use the page directory as if it were a page table, which makes each page table into a page that you can directly access.

hexcoder wrote:4. Is there any good way to convert a physical address to a virtual one?

Physical addresses can be mapped to multiple virtual addresses, so it's not really possible to convert a physical address to a virtual address. Mapping a physical address to a virtual address is required for memory-mapped I/O, so you'll need some way to do that.

sebihepp · Post by **sebihepp** » Fri Oct 21, 2016 6:42 am

hexcoder wrote:2. If a page table has 1024 entries, each representing a 4096-byte chunk, that means that a maximum of 1024 * 4096 = 4194304 bytes can be addressed using one table alone. If process use a page table each, then does that mean that only a maximum of 4mb can be addressed at one time? (I would assume that the other pages are written out to disk as needed after triggering a page fault, but 4mb does seem like quite a small limit, does it not?)

That's why I use an entire PageDirectory plus mostly independent PageTables - one per process for example.

hexcoder wrote:4. Is there any good way to convert a physical address to a virtual one? I can see that the Linux kernel has one (I can't find out how it's implemented), although I'm not really sure why it would be needed. The only way I can see to do it would be to scan every page table entry for the specified virtual address, but that would be quite inefficient.

Well, that's the only way I currently know. The Problem is, that many virtual adresses can point to the same physical adress. I implemented a function for this in my kernels paging-class. But I think I use it only in one case while initializing everything. You can speed up the search, if you ignore all those page tables, that aren't mapped in the page directory.

hexcoder · Post by **hexcoder** » Fri Oct 21, 2016 7:47 am

Octocontrabass wrote:
hexcoder wrote:1. Currently I'm parsing the multiboot memory map and allocating pages for all the reserved memory. This could clearly be quite expensive if there are large chunks of reserved memory. What's a good way of handling this?
Expensive how? If it's speed you're worried about, remind yourself that initializing the PMM only happens once. You can always optimize it later, when you come up with a better PMM than the one in the tutorial.

I wasn't really talking about speed, it just seems a little wasteful of memory. If you have one gigabyte of reserved space then you would need about 1 megabyte of page table information to cover that.

Octocontrabass wrote:
hexcoder wrote:2. If a page table has 1024 entries, each representing a 4096-byte chunk, that means that a maximum of 1024 * 4096 = 4194304 bytes can be addressed using one table alone. If process use a page table each, then does that mean that only a maximum of 4mb can be addressed at one time? (I would assume that the other pages are written out to disk as needed after triggering a page fault, but 4mb does seem like quite a small limit, does it not?)
Normally, each process has its own page directory, and you add page tables to that page directory as needed.

Ah, that was my misunderstanding then.

Thank you everyone for your replies, this is much clearer now.

Octocontrabass · Post by **Octocontrabass** » Fri Oct 21, 2016 7:55 am

hexcoder wrote:I wasn't really talking about speed, it just seems a little wasteful of memory. If you have one gigabyte of reserved space then you would need about 1 megabyte of page table information to cover that.

Why are you allocating it? You only need to mark it as "not free" in your physical memory manager so it won't be used by later allocations, the same way you handle empty portions of the memory map.

hexcoder · Post by **hexcoder** » Fri Oct 21, 2016 11:19 am

Octocontrabass wrote:
hexcoder wrote:I wasn't really talking about speed, it just seems a little wasteful of memory. If you have one gigabyte of reserved space then you would need about 1 megabyte of page table information to cover that.
Why are you allocating it? You only need to mark it as "not free" in your physical memory manager so it won't be used by later allocations, the same way you handle empty portions of the memory map.

Ah, I see. That sounds like a good idea. But, in that case, you still need to do something to get the correct pages actually allocated. For example in my ACPI code I need to chase some pointers into reserved memory, but how would I ensure that the correct pages are in place before accessing those areas at run-time? I suppose the easiest solution would be to identity map a large chunk around that area, but since you can't easily know ahead of time the exact size of the tables it would be a little tricky.

simeonz · Post by **simeonz** » Fri Oct 21, 2016 12:32 pm

hexcoder wrote:1. Currently I'm parsing the multiboot memory map and allocating pages for all the reserved memory. This could clearly be quite expensive if there are large chunks of reserved memory. What's a good way of handling this? I was thinking a function which would take a physical memory address and a size, and allocate pages to facilitate that, while returning the virtual address. All the other reserved memory would remain unallocated until that function gets called.

You can reduce the memory consumption for tables by using large pages. Those are 2MB in size and thus enable you to map 1GB by using only one page table in x64. The TLB efficiency improves critically. Note that this does not work around the issue of reserved regions, but alleviates it. You can still exclude the large reserved regions from the mapping. Then, it will be useful to have the page frame control structures placed in linear correspondence to the virtual addresses of the physical memory, not to the physical addresses themselves. Also, this does not translate to user space tables easily. It is very convenient for the kernel space physical memory mapping.

hexcoder wrote:2. If a page table has 1024 entries, each representing a 4096-byte chunk, that means that a maximum of 1024 * 4096 = 4194304 bytes can be addressed using one table alone. If process use a page table each, then does that mean that only a maximum of 4mb can be addressed at one time? (I would assume that the other pages are written out to disk as needed after triggering a page fault, but 4mb does seem like quite a small limit, does it not?)

I am not sure about this, but I believe mainstream OSes page-in/out the page tables themselves, like any other piece of process memory. I think only the top level directory has to be present. In other words, the limit on the addressable virtual memory of a process, and the physical memory used for its page tables do not necessarily correspond. There is a paradox in doing this, since paging out some data may require paging in several page tables.

hexcoder wrote:4. Is there any good way to convert a physical address to a virtual one? I can see that the Linux kernel has one (I can't find out how it's implemented), although I'm not really sure why it would be needed. The only way I can see to do it would be to scan every page table entry for the specified virtual address, but that would be quite inefficient.

To put things in perspective, there are two types of kernels in this regard -those that linearly map the physical memory on boot, and those that only map particular physical memory locations for control structures (CPU and kernel) and leave the data (such as file cache) to be mapped on demand. As you may have guessed, Linux is of the former type, and Windows is of the latter. The primary influence is in how file caching works, although portions of the OS architecture are impacted by this choice.

So, Linux has facility to convert physical memory addresses to corresponding kernel space addresses (i.e. phys_to_virt), which due to the nature of the aforementioned linear mapping costs only a simple arithmetic calculation. This serves to operate on global kernel structures, caches, etc.. The threads don't need to manipulate page tables when they want to access any physical memory location, be it in any other process or a global kernel structure - instead they only call the address conversion function (which is inlined by the way) and work with the resulting pointer.

Paging out also requires conversion between physical and virtual addresses. The process PTEs have to be marked invalid. There are multiple cases here.

The page has no corresponding pte, which means that it is not mapped in any address space (aside from the kernel window to physical memory). This really means that the page is file backed and is left to cache the data in memory for future processes. But it also means that provided it is not dirty, the page can be re-purposed for different usage immediately without any page table manipulations.
A single process maps this page. This requires one back reference from the page control structure to the process PTE to find it and invalidate it, but see point 3.
The page is shared between multiple processes, in which case the PTEs inside several tables must be invalidated. This can be done with a list or vector of the relevant PTEs for each page, but due to the added memory traffic, thus TLB traffic during page-out, the method can be undesirable. Instead, the page control structure has means to access a memory mapping interval tree, holding descriptor for each mapped region within each process. For the case of a single mapping (case 2), this is slower than simply referencing the relevant PTE, but still works in constant time (more or less).

The reverse lookup for paging out is provided by the rmap_walk function in Linux, in case you want to look it up.

Octocontrabass · Post by **Octocontrabass** » Fri Oct 21, 2016 4:42 pm

hexcoder wrote:For example in my ACPI code I need to chase some pointers into reserved memory, but how would I ensure that the correct pages are in place before accessing those areas at run-time? I suppose the easiest solution would be to identity map a large chunk around that area, but since you can't easily know ahead of time the exact size of the tables it would be a little tricky.

You can't know the size of all the tables ahead of time, but you can map the minimum necessary to read the "length" field in the table, and map the rest of the table after you know how big it is.

hexcoder · Post by **hexcoder** » Fri Oct 21, 2016 6:56 pm

Octocontrabass wrote:
hexcoder wrote:For example in my ACPI code I need to chase some pointers into reserved memory, but how would I ensure that the correct pages are in place before accessing those areas at run-time? I suppose the easiest solution would be to identity map a large chunk around that area, but since you can't easily know ahead of time the exact size of the tables it would be a little tricky.
You can't know the size of all the tables ahead of time, but you can map the minimum necessary to read the "length" field in the table, and map the rest of the table after you know how big it is.

You are right, and that is what I probably should have done. For now I am just gathering all the ACPI values before enabling paging so I don't need to worry about that. I suppose I will need some functionality like that if I end up writing other drivers that need continuous access to non-standard physical memory locations.

Octocontrabass · Post by **Octocontrabass** » Sat Oct 22, 2016 12:36 am

hexcoder wrote:I suppose I will need some functionality like that if I end up writing other drivers that need continuous access to non-standard physical memory locations.

Most hardware in a modern PC uses dynamically-assigned portions of the physical address space, so you'll need it for most drivers anyway.

OSDev.org

Questions about Paging (x86)

Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)

Re: Questions about Paging (x86)