Question on Page Tables

Cody

Question on Page Tables

Post by Cody »

Hi, all

Sorry if my question seems too naive. ;D I am just ramping up on OS dev after years of BIOS development.

I came across the following paragraphs while reading IA32 Manual Vol3 Chap 3:

"Memory management software has the option of using one page directory for all programs and tasks, one page directory for each task, or some combination of the two."

I have also heard others say that sharing a page directory can remove the overhead of hardware task switching and TLB flushing. But I don't see how an OS adopting this policy could ever protect one process's memory space from being trespassed on by another process, since the latter could see the former's pages too.

Can anyone help? Thanks in advance. ;)

Best regards,
Cody
Pype.Clicker

Re:Question on Page Tables

Post by Pype.Clicker »

Well, you're simply free to "manually edit" the current page directory rather than switching to a new directory. That makes little sense when there are plenty of entries to be rewritten, but it _can_ make sense if many of your running programs share most of their code (through libraries) and data (shared stuff).

In that case, there will be no security penalty, since after a switch all you get is an address space with stuff that is either yours (just brought in) or stuff you cannot modify (copy-on-write data, read-only code, kernel stuff, etc).
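
A minimal sketch of the "edit the current directory instead of switching" idea, written as a plain C simulation so it can run in user space. The names `task_private` and `switch_by_editing` are made up for illustration, the layout assumes 32-bit non-PAE paging with 1024 directory entries, and a real kernel would also have to flush the stale TLB entries for every range it rewrites.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDE_COUNT   1024u   /* entries in a 32-bit, non-PAE page directory */
#define PDE_PRESENT 0x001u
#define PDE_WRITE   0x002u
#define PDE_USER    0x004u

/* Hypothetical per-task record: only the directory slots that differ
 * from task to task. Everything else in the directory is shared. */
struct task_private {
    size_t   count;
    uint16_t slot[4];   /* which directory entries are private */
    uint32_t pde[4];    /* the values this task wants in those entries */
};

/* "Switch" by rewriting the private slots of the one shared directory.
 * Returns the number of entries written. On real hardware the old
 * translations for those ranges would also need to be invalidated. */
static size_t switch_by_editing(uint32_t dir[PDE_COUNT],
                                const struct task_private *prev,
                                const struct task_private *next)
{
    size_t touched = 0;

    /* Unmap what belonged to the outgoing task... */
    for (size_t i = 0; i < prev->count; i++) {
        assert(prev->slot[i] < PDE_COUNT);
        dir[prev->slot[i]] = 0;
        touched++;
    }
    /* ...then map what belongs to the incoming one. */
    for (size_t i = 0; i < next->count; i++) {
        assert(next->slot[i] < PDE_COUNT);
        dir[next->slot[i]] = next->pde[i];
        touched++;
    }
    return touched;
}
```

The cost grows with the number of private entries, which is why this only pays off when most of the address space really is shared.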

btw, it's amusing to see intel writing "some combination of the two" when the two are actually incompatible. I guess what they mean is that "you may have groups of programs having the same directory, but different page tables, which is distinct from the directory of another group of programs".
JAAman

Re:Question on Page Tables

Post by JAAman »

thats what the global bit is for...

most of the time, there are significant portions of the address space that are shared -- the global bit (introduced on the P6, i.e. the Pentium Pro, iirc) allows the TLB entries for these to survive when CR3 is reloaded with a different page directory
Pype.Clicker wrote: btw, it's amusing to see intel writing "some combination of the two" when the two are actually incompatible. I guess what they mean is that "you may have groups of programs having the same directory, but different page tables, which is distinct from the directory of another group of programs".
your description is somewhat confusing, but this is exactly what most OSs do -- the 'groups' which share directories are usually called 'threads' -- if your OS keeps threads in the same directory (most do -- but not all), but isolates processes in separate directories, then you are using "some combination of the two"
Cody wrote: But I don't see how an OS adopting this policy could ever protect one process's memory space from being trespassed on by another process, since the latter could see the former's pages too.
Pype.Clicker's answer is the general way, but there is another (it isn't used much -- it gets very complicated): use segmentation to isolate the processes within a single page directory (most people use 'flat mode' and ignore segmentation)
0Scoder

Re:Question on Page Tables

Post by 0Scoder »

Cody wrote: I have also heard others say that sharing a page directory can remove the overhead of hardware task switching and TLB flushing. But I don't see how an OS adopting this policy could ever protect one process's memory space from being trespassed on by another process, since the latter could see the former's pages too.
There is another way, which I think is used by Microsoft Research's 'Singularity' operating system. Software-isolated processes, I think they are called -- the idea is that all the code is not actually run, but is interpreted by the OS, allowing the OS to define custom protection exactly as it wants (as well as bypassing hardware switching overheads).

You can see the website for more details:
http://research.microsoft.com/os/singularity/
Colonel Kernel

Re:Question on Page Tables

Post by Colonel Kernel »

0Scoder wrote: the idea is that all the code is not actually run, but is interpreted by the OS
Nope. There is no interpreting going on whatsoever -- that would be terribly slow. Instead, safe languages are used and the output of the compiler is an intermediate language that can be verified by the OS before (and after, in later phases of the project) that IL is translated into native machine code. This verification occurs at installation time once for each application.
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
Cody

Re:Question on Page Tables

Post by Cody »

JAAman wrote: most of the time, there are significant portions of the address space that are shared -- the global bit (introduced on the P6, i.e. the Pentium Pro, iirc) allows the TLB entries for these to survive when CR3 is reloaded with a different page directory
Yes, that will make it stay in the TLB. But what puzzles me most is what happens when I load into CR3 a page directory that has changed the mapping of a page marked as global. Intel's manuals list two ways to invalidate global entries: 1) clear the PGE flag and invalidate the TLB; 2) execute INVLPG. But they don't mention whether the page will be automatically invalidated in the case I propose. Should the OS task-switching code do it explicitly or not?
JAAman wrote: your description is somewhat confusing, but this is exactly what most OSs do -- the 'groups' which share directories are usually called 'threads' -- if your OS keeps threads in the same directory (most do -- but not all), but isolates processes in separate directories, then you are using "some combination of the two"
Yes, for threads that makes sense. But the wording in the manual seems to suggest the directory is kept across task switches. I am somewhat confused about the concepts of "task", "thread" and "process". It seems to me that "task" is roughly equivalent to "process". On the other hand, "thread" implies switching execution paths, and in the processor realm "task switching" (either hardware or software) is the only way to do that, so "threads" can be treated as "tasks" in a broad sense. Am I wrong?

So, in summary, different processes (forked or copy-on-write ones excluded) have different page directory mappings, yet they all contain some entries that are common to all, for shared resources (code or data, such as interrupt handlers and kernel data). Is that right?

Thanks for your reply!

Best regards,
Cody
JAAman

Re:Question on Page Tables

Post by JAAman »

Cody wrote: But what puzzles me most is what happens when I load into CR3 a page directory that has changed the mapping of a page marked as global.
when the page is marked as global in the TLB (clearing the bit in the tables will not affect it if it is currently in the TLB), it will not be updated on a CR3 write; you must invlpg the page to change it (invlpg itself is older than the global bit -- it dates from the 486 -- but this is exactly the job it is for)

when the CPU needs to access memory, it first looks in the TLB; if there is no entry, it walks the page tables and loads the translation into the TLB -- when you reload CR3, all TLB entries are marked as empty, so they will be reloaded as needed -- unless an entry is marked global -- then it will retain its previous value, and the CPU won't even notice the changed table (all invlpg does is mark the TLB entry for that page as invalid)
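
To make the behaviour described above concrete, here is a toy user-space model of a TLB in C. It is only a sketch: the structure and function names (`tlb`, `tlb_fill`, `tlb_write_cr3`, `tlb_invlpg`) are invented for illustration, there is no eviction, and it assumes CR4.PGE is set so that global entries are honoured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8

struct tlb_entry {
    bool     valid;
    bool     global;   /* copied from the G bit of the PTE when cached */
    uint32_t vpn;      /* virtual page number */
    uint32_t pfn;      /* physical frame number */
};

struct tlb { struct tlb_entry slot[TLB_SLOTS]; };

/* Cache a translation, as the hardware would after a page-table walk.
 * Returns false if the toy TLB is full (no eviction is modelled). */
static bool tlb_fill(struct tlb *t, uint32_t vpn, uint32_t pfn, bool global)
{
    assert(t != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        if (!t->slot[i].valid) {
            t->slot[i] = (struct tlb_entry){ true, global, vpn, pfn };
            return true;
        }
    }
    return false;
}

/* Look up a page; on a hit, stores the cached frame and returns true. */
static bool tlb_lookup(const struct tlb *t, uint32_t vpn, uint32_t *pfn)
{
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        if (t->slot[i].valid && t->slot[i].vpn == vpn) {
            *pfn = t->slot[i].pfn;
            return true;
        }
    }
    return false;
}

/* Model of a CR3 write with CR4.PGE set: global entries survive. */
static void tlb_write_cr3(struct tlb *t)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (!t->slot[i].global)
            t->slot[i].valid = false;
}

/* Model of INVLPG: drops the entry for one page, global or not. */
static void tlb_invlpg(struct tlb *t, uint32_t vpn)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (t->slot[i].valid && t->slot[i].vpn == vpn)
            t->slot[i].valid = false;
}
```

In this model, a global entry filled before `tlb_write_cr3()` still hits afterwards with its old frame number, which is the stale-mapping situation asked about; only `tlb_invlpg()` on that page makes the next lookup miss.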


Cody wrote: Yes, for threads that makes sense. But the wording in the manual seems to suggest the directory is kept across task switches. I am somewhat confused about the concepts of "task", "thread" and "process". It seems to me that "task" is roughly equivalent to "process". On the other hand, "thread" implies switching execution paths, and in the processor realm "task switching" (either hardware or software) is the only way to do that, so "threads" can be treated as "tasks" in a broad sense. Am I wrong?
this confusion is normal -- everyone has a different definition of process and thread

some OSs treat both threads and processes exactly the same -- without any difference, but most treat threads as processes which share an address space -- they are handled by the task switch, but they are handled differently (no CR3 load if the switch is to another thread in the same process), so yes, there is a task switch, and the address space stays the same through it, and other times (when switching to a different process) it does change, making this a true hybrid solution
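
The "no CR3 load if the switch is to another thread in the same process" rule can be sketched roughly as follows. This is a user-space simulation under stated assumptions: `struct cpu` merely stands in for the real register, and all names (`address_space`, `thread`, `switch_to`) are hypothetical rather than taken from any particular kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One address space per process; its page directory's physical address. */
struct address_space { uint32_t page_dir_phys; };

struct thread {
    int                   id;
    struct address_space *as;   /* threads of one process share this pointer */
};

/* Stand-in for the CPU state we care about in this sketch. */
struct cpu {
    uint32_t       cr3;         /* simulated CR3 */
    unsigned       cr3_loads;   /* how many times we "reloaded" it */
    struct thread *current;
};

/* Switch to `next`, reloading CR3 only when the address space changes. */
static void switch_to(struct cpu *cpu, struct thread *next)
{
    assert(next != NULL && next->as != NULL);

    if (cpu->current == NULL || cpu->current->as != next->as) {
        cpu->cr3 = next->as->page_dir_phys;   /* would be `mov cr3, ...` */
        cpu->cr3_loads++;
    }
    /* Saving/restoring registers and the kernel stack would happen here. */
    cpu->current = next;
}
```

Switching between two threads that point at the same `address_space` leaves the simulated CR3 (and so the TLB) alone; switching to a thread of another process reloads it.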
Cody wrote: So, in summary, different processes (forked or copy-on-write ones excluded) have different page directory mappings, yet they all contain some entries that are common to all, for shared resources (code or data, such as interrupt handlers and kernel data). Is that right?
yes, i think that is correct (ignoring, for now, threads -- which some treat the same as processes anyway)
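
One common way to get "separate directories with some entries in common" is to copy the kernel's directory entries into every new process directory. The sketch below assumes 32-bit non-PAE paging and a kernel mapped from 0xC0000000 upward (directory entry 768 onward); that split, and the name `init_process_directory`, are assumptions for illustration, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PDE_COUNT        1024u
#define KERNEL_FIRST_PDE 768u   /* assumption: kernel lives at 0xC0000000 and up */

/* Build a fresh process directory: empty user half, shared kernel half.
 * Because the kernel entries point at the same page tables in every
 * directory, kernel mappings stay identical across all processes. */
static void init_process_directory(uint32_t new_dir[PDE_COUNT],
                                   const uint32_t kernel_dir[PDE_COUNT])
{
    assert(new_dir != kernel_dir);

    /* User part: nothing mapped yet. */
    memset(new_dir, 0, KERNEL_FIRST_PDE * sizeof new_dir[0]);

    /* Kernel part: same entries as every other process. */
    memcpy(&new_dir[KERNEL_FIRST_PDE], &kernel_dir[KERNEL_FIRST_PDE],
           (PDE_COUNT - KERNEL_FIRST_PDE) * sizeof new_dir[0]);
}
```

These shared kernel entries are also the natural candidates for the global bit, since they mean the same thing in every address space.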

i hope ive been able to help you
Cody

Re:Question on Page Tables

Post by Cody »

Hi, JAAman
JAAman wrote: i hope ive been able to help you
Yes, you have helped a lot. It's really good to discuss with you. ;) I have begun reading the Linux kernel's source code to figure out how it uses various features of Intel's processors.
JAAman wrote: some OSs treat both threads and processes exactly the same -- without any difference, but most treat threads as processes which share an address space -- they are handled by the task switch, but they are handled differently (no CR3 load if the switch is to another thread in the same process), so yes, there is a task switch, and the address space stays the same through it, and other times (when switching to a different process) it does change, making this a true hybrid solution
I checked "sched.c" in the Linux source and its key function "context_switch()". After a rough reading, I believe Linux, just as you mentioned, treats threads and processes in exactly the same way. And every task switch (Linux uses what is termed a 'soft switch') includes refreshing CR3 and updating the TSS's ESP0.

I remember that when I programmed on Sun's SPARC II there seemed to be no notion of "thread"; everything there was process based. But later, when I switched to Red Hat, they had introduced a library such as "thread.o", so if you wanted to use functions such as "CreateThread()" you had to link with this extension explicitly.

So it appears to me that Linux treats the two without bias. There are differences between the two, but you just won't notice them when looking at the kernel's switching mechanism.

Thanks again for your detailed explanation. It's really great to be able to get help from guys like you!!!! ;D

Best regards,
Cody