Actual context switch.

stonedzealot · Post by **stonedzealot** » Mon Sep 20, 2004 3:27 pm

I'm ready to jump to a DPL 3 "user" process and I've got its whole context set up (i.e. I've created a pseudo page directory and associated tables, and bound the pages) but I can't figure out just when or how you actually change CR3... As far as I can tell, there is no place to tuck away an integer so that it is loaded automatically upon transitioning to DPL 3... so where exactly is CR3 changed?

Dreamsmith · Post by **Dreamsmith** » Mon Sep 20, 2004 8:20 pm

A change in the DPL should not trigger a change in the PDBR (CR3). The PDBR should change during only one of these two events, depending on whether you're using hardware or software task switching: [1] When you ljmp to a new task state segment (TSS) descriptor, in which case the new PDBR value is loaded from the TSS, or [2] when you explicitly load it as part of your task switching routine. I use something like this:

Code:

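/* Load a new page directory base into CR3. INLINE and DWord are the
   poster's own definitions; .u is presumably DWord's unsigned 32-bit view. */
INLINE void setPDBR(DWord pdbr)
{
   /* The "memory" clobber keeps the compiler from moving loads and stores
      across the address space change. */
   __asm__ __volatile__("mov %0,%%cr3" : : "r"(pdbr.u) : "memory");
}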
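To make the two options concrete, here is a minimal sketch rather than anyone's actual kernel code. The first struct shows the leading fields of the 32-bit TSS, where the CPU finds CR3 during a hardware task switch. The rest is a software switch that reloads CR3 only when the address space really changes. The names `tss32_head`, `task`, `sim_cr3`, `set_pdbr` and `switch_address_space` are made up for illustration, and CR3 is simulated with a plain variable so the code runs outside a kernel. It also assumes the PWT/PCD flag bits of CR3 are left at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of the 32-bit TSS. On a hardware task switch the CPU
   loads CR3 from the cr3 slot, which sits at byte offset 0x1C. */
struct tss32_head {
    uint32_t prev_task_link;
    uint32_t esp0, ss0;
    uint32_t esp1, ss1;
    uint32_t esp2, ss2;
    uint32_t cr3;
    uint32_t eip, eflags;
};

/* Hypothetical per-task record used for software switching. */
struct task {
    uint32_t pdbr;              /* physical address of the page directory */
};

static uint32_t sim_cr3;        /* stand-in for the real CR3 register */
static unsigned cr3_loads;      /* counts simulated CR3 writes */

static void set_pdbr(uint32_t pdbr)
{
    assert((pdbr & 0xFFFu) == 0);   /* assumption: 4 KiB aligned, no flag bits */
    sim_cr3 = pdbr;                 /* in a kernel: mov %0, %%cr3 */
    cr3_loads++;
}

/* Software switch: skip the reload (and the TLB flush it causes)
   when the next task shares the current address space. */
static void switch_address_space(const struct task *next)
{
    if (next->pdbr != sim_cr3)
        set_pdbr(next->pdbr);
}
```

Either way, the reload belongs to the task switch, not to the privilege change: dropping to DPL 3 with iret leaves CR3 alone.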

stonedzealot · Post by **stonedzealot** » Mon Sep 20, 2004 8:35 pm

But if I change the PDBR while in execution of kernel code and that code isn't included in the new context, won't it page fault? Or does the kernel need to be included in every context (which would seem to make the whole idea much less useful, though not useless)?

Dreamsmith · Post by **Dreamsmith** » Mon Sep 20, 2004 8:49 pm

stonedzealot wrote: But if I change the PDBR while in execution of kernel code and that code isn't included in the new context, won't it page fault?

Yes, assuming you're using method [2] (software task switching). If you use method [1] (hardware task switching), this isn't the case as a task switch becomes atomic.

stonedzealot wrote: Or does the kernel need to be included in every context (which would seem to make the whole idea much less useful, though not useless)?

Most operating systems have the kernel "parasitically" ride on top of every process, so no matter which task is running, the kernel code is mapped in. In what way does this make the idea any less useful?
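As a rough illustration of the kernel riding along in every process, the sketch below builds a new non-PAE page directory whose user part is empty and whose kernel part is copied from an existing directory. `KERNEL_BASE`, `PDE_INDEX` and `init_process_directory` are hypothetical names, and the 0xC0000000 split is only an assumed example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PDE_COUNT   1024u                      /* entries in a non-PAE page directory */
#define KERNEL_BASE 0xC0000000u                /* assumed start of kernel space */
#define PDE_INDEX(va) ((uint32_t)(va) >> 22)   /* each PDE covers 4 MiB */

typedef uint32_t pde_t;

/* Build a directory for a new process: user entries start out empty,
   kernel entries point at the same page tables as every other process. */
static void init_process_directory(pde_t *new_pd, const pde_t *kernel_pd)
{
    uint32_t first = PDE_INDEX(KERNEL_BASE);   /* 768 for 0xC0000000 */
    assert(first < PDE_COUNT);
    memset(new_pd, 0, first * sizeof(pde_t));
    memcpy(new_pd + first, kernel_pd + first,
           (PDE_COUNT - first) * sizeof(pde_t));
}
```

Because the copied entries point at shared page tables, the kernel code that reloads CR3 is mapped at the same address before and after the switch, so it does not fault.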

stonedzealot · Post by **stonedzealot** » Mon Sep 20, 2004 9:25 pm

I dunno, it's just that I was hoping to be able to make a complete severance from the kernel except via the interrupt handlers. While that would indeed be magical, it doesn't really affect the usefulness at all, so never mind. :-X

Anyway, thanks a lot Dreamsmith. I'm sure this'll get me off the ground in no time.

HOS · Post by **HOS** » Tue Sep 21, 2004 12:43 pm

Brings up a question I've had for a little while...

If I make a copy of the page directory for a new task and I want the kernel's memory mapped into this new process, the appropriate page directory entries are already there (for addresses >= 0xC000_0000). These entries point to page tables which can be changed and invalidated, so that the kernel could allocate new memory and the new region would still be mapped appropriately in all the processes' address spaces. However, if a new page directory entry had to be created, wouldn't this entry have to be manually copied to each page directory in use by every process? Or is there any better way to do this?

thanks

Dreamsmith · Post by **Dreamsmith** » Tue Sep 21, 2004 9:06 pm

HOS wrote:... however, if a new page directory entry had to be created, wouldn't this entry have to be manually copied to each page directory in use by every process? or is there any better way to do this?

Yes, it would have to be copied into each page directory, and AFAIK, the only better way to do it is to make sure you never need to create new kernel page directory entries. Allocate them up front, so that they're already there, even if they point to completely empty page tables.

This works for me, but then my kernel is at 0xFC000000. For people with piggy kernels reserving everything from 0xC0000000 on up, that may be too many empty page tables to have lying around. In that case, copying when you need to may be the lesser evil. How do you people with 1GB+ of potential kernel space handle this?
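For a sense of scale, the cost of allocating every kernel page table up front can be worked out directly. This assumes the original (non-PAE) paging layout, where each page directory entry covers 4 MiB and each page table takes one 4 KiB page. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_COUNT 1024u     /* entries in a non-PAE page directory */
#define PAGE_SIZE 4096u     /* one page table occupies one page */

/* Page tables needed to back every PDE from kernel_base up to 4 GiB. */
static uint32_t kernel_page_tables(uint32_t kernel_base)
{
    assert((kernel_base & 0x3FFFFFu) == 0);   /* assumption: 4 MiB aligned */
    return PDE_COUNT - (kernel_base >> 22);
}

/* Memory spent on those page tables if all are allocated up front. */
static uint32_t prealloc_bytes(uint32_t kernel_base)
{
    return kernel_page_tables(kernel_base) * PAGE_SIZE;
}
```

With the kernel at 0xFC000000 that is 16 page tables, or 64 KiB. With the kernel at 0xC0000000 it is 256 page tables, or 1 MiB.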

Brendan · Post by **Brendan** » Tue Sep 21, 2004 9:16 pm

Hi,

HOS wrote: If I make a copy of the page directory for a new task and I want the kernel's memory mapped into this new process, the appropriate page directory entries are already there (for addresses >= 0xC000_0000). These entries point to page tables which can be changed and invalidated, so that the kernel could allocate new memory and the new region would still be mapped appropriately in all the processes' address spaces. However, if a new page directory entry had to be created, wouldn't this entry have to be manually copied to each page directory in use by every process? Or is there any better way to do this?

For the original paging system you do have to manually update every page directory when you change page directory entries in the kernel's part of the address space/s. To make this easier I normally map every page directory into all address spaces (as if they are pages), so that I end up with a big array of page directory entries.
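The manual update might look roughly like the sketch below. It is not Brendan's code: instead of mapping the directories as pages into one big array, it uses a hypothetical `pd_registry` of pointers, which is enough to show the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDE_COUNT 1024u
typedef uint32_t pde_t;

/* Hypothetical list of every page directory currently in use. */
struct pd_registry {
    pde_t **dirs;
    size_t  count;
};

/* Write one kernel-space PDE into every address space. A real kernel
   would also need locking and TLB invalidation around this loop. */
static void set_kernel_pde(struct pd_registry *reg, uint32_t index, pde_t value)
{
    assert(index < PDE_COUNT);
    for (size_t i = 0; i < reg->count; i++)
        reg->dirs[i][index] = value;
}
```

The loop is linear in the number of address spaces, which is why it pays to create kernel page directory entries rarely.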

If you use PAE then there's 4 page directories, and you can use a whole/single page directory for kernel space. This "kernel page directory" can be placed into every page directory pointer table, which makes changing page directory entries in the kernel's part of the address space/s much easier.
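A small user-space model of that PAE arrangement, under the assumption that the kernel owns the top 1 GiB (the fourth page directory). Pointers stand in for the physical addresses and flag bits a real page directory pointer table entry would hold, and all names here are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define PAE_PD_ENTRIES 512u   /* 512 eight-byte entries per PAE page directory */
#define PDPT_ENTRIES   4u     /* four 1 GiB slots per address space */

typedef uint64_t pae_pde_t;

struct pae_pd { pae_pde_t e[PAE_PD_ENTRIES]; };

/* Simulated page directory pointer table for one address space. */
struct pae_space { struct pae_pd *pdpt[PDPT_ENTRIES]; };

static struct pae_pd kernel_pd;   /* the single shared kernel page directory */

/* Point the top 1 GiB (0xC0000000 and up) at the shared directory. */
static void attach_kernel(struct pae_space *as)
{
    as->pdpt[PDPT_ENTRIES - 1] = &kernel_pd;
}

/* Walk the first two levels the way the MMU splits the address. */
static pae_pde_t lookup_pde(const struct pae_space *as, uint32_t va)
{
    const struct pae_pd *pd = as->pdpt[va >> 30];   /* bits 31:30 pick the directory */
    assert(pd != 0);
    return pd->e[(va >> 21) & 0x1FFu];              /* bits 29:21 pick the PDE */
}
```

A single write to `kernel_pd` is then visible through every address space that attached it, with no per-process copying.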

When my OS is using PAE it actually has a "kernel page directory" for each NUMA domain, so that I do have to manually update each kernel page directory when I change a page directory entry in kernel space (if it's common to all NUMA domains).

My kernel's linear memory management supports with/without PAE, with/without NUMA, with/without multi-CPU, page colouring, re-entrancy locking, (partially lazy) TLB invalidation on other CPUs, and it splits the address space into 3 separate sections (kernel space, process space and thread space) where the boundary between process space and thread space is determined by the process (and may be different for each process). Also there's support for memory mapped files, swap space, optional allocation on demand (where any page can be marked for allocation on demand and/or AOD can be enabled for the entire process), DMA buffers (which are treated differently to avoid accidents), device memory mappings (e.g. BIOS, video display memory, etc) and every page of RAM is tested for faults before anything uses it. It is taking me a while to complete ::)...

Cheers,

Brendan

stonedzealot · Post by **stonedzealot** » Wed Sep 22, 2004 9:26 am

Since I'm guessing that the act of changing CR3 doesn't take up a significant amount of CPU time (despite the fact it makes far-reaching changes), wouldn't it be easier to just map one or two pages of the kernel (i.e. your interrupt handler) into every process' memory and just make sure you're in that page before you switch to the process' CR3?

Then when you interrupt X to come back to the kernel, you just make sure that you switch back to the kernel's CR3 before you execute any code outside of that page....

Then you wouldn't have to bother with changing any of the process' tables or anything, as long as they only interrupted to those pages... is that right or is there some flaw in that logic?

Brendan · Post by **Brendan** » Wed Sep 22, 2004 10:22 am

Hi,

stonedzealot wrote: Since I'm guessing that the act of changing CR3 doesn't take up a significant amount of CPU time (despite the fact it makes far-reaching changes), wouldn't it be easier to just map one or two pages of the kernel (i.e. your interrupt handler) into every process' memory and just make sure you're in that page before you switch to the process' CR3?

Then when you interrupt X to come back to the kernel, you just make sure that you switch back to the kernel's CR3 before you execute any code outside of that page....

Then you wouldn't have to bother with changing any of the process' tables or anything, as long as they only interrupted to those pages... is that right or is there some flaw in that logic?

There is no flaw in this logic, but you'd have to change CR3 twice every time the kernel is entered from (and returns to) user level code. This could include IRQ's, exceptions, the kernel's API, etc.

Changing CR3 blows away the CPU's TLB, so that the CPU has to load page directory and page table data from (very slow) memory instead of from the (very fast) TLB until the TLB has been refilled. This means that changing CR3 is quick in itself but significantly degrades the performance of any code that follows it.

Using the "global page bit" the effect of changing CR3 can be improved, but only for pages that are the same in all address spaces (so it won't help).
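For reference, in the usual layout where the kernel is mapped into every address space, the global bit is just one more flag in the kernel's page table entries. The sketch below shows the bit positions: bit 8 of a PTE, enabled by bit 7 (PGE) of CR4. The helper name and the `pge_supported` parameter are hypothetical; a real kernel would check CPUID and set CR4.PGE first.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT 0x001u
#define PTE_WRITE   0x002u
#define PTE_GLOBAL  0x100u   /* bit 8: TLB entry survives a CR3 reload */
#define CR4_PGE     0x080u   /* bit 7 of CR4: enables global pages */

/* Build a kernel PTE, marking it global only when the CPU supports PGE. */
static uint32_t make_kernel_pte(uint32_t phys, int pge_supported)
{
    assert((phys & 0xFFFu) == 0);   /* frame address must be page aligned */
    uint32_t pte = phys | PTE_PRESENT | PTE_WRITE;
    if (pge_supported)
        pte |= PTE_GLOBAL;
    return pte;
}
```

In the one-or-two-shared-pages scheme only those few pages could carry the flag, so nearly every other translation would still be thrown away on each kernel entry and exit.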

Also it would mean that the kernel code couldn't access the process's memory without some trickery (overhead), as most of the kernel and the process memory would be in different address spaces.

It would be possible to identity map the kernel low in physical/linear memory, so that instead of changing CR3 when the kernel is entered and then restoring CR3 when the kernel returns to the process you could disable paging and turn it back on again. In this case the kernel code won't need/use the CPU's TLB cache and would run at full speed. You'd still have degraded performance after the kernel returns to the process though.

While updating the page directory entries in many address spaces does involve overhead, it can be minimized. For the kernel's memory I never remove page tables, so that if the amount of memory the kernel uses is fluctuating it would have additional overhead the first time the memory is allocated, but none after that.

If your OS has a separate address space for each thread (like mine, but unlike most) then there's little choice - you have to update each thread's page directory when process space is modified.

Note: None of the above applies if PAE is used (for PAE there's different problems).

Cheers,

Brendan

stonedzealot · Post by **stonedzealot** » Wed Sep 22, 2004 10:54 am

Ah yes, the TLB and degraded performance. Interesting. I'll have to remember that. I guess, for now, I'll just have to keep the entire kernel in there. Oh well.

Thanks a lot, Brendan.

OSDev.org
