On understanding the x86 paging GLOBAL flag

Posted: Sun Nov 24, 2024 6:25 am
by venos
I'm at the point in my OS where I need to start thinking about allocating userspace address spaces. This comes with a few interesting design decisions that need to be made.

To actually build the page directory, I think I can abuse recursive paging: put the process's top-level directory into an entry of the kernel's top-level directory at some known offset, then map the same offset in the process's top-level directory to itself. That's kind of a recursive page table, but one that's accessible from the kernel address space. So far so good; I see no issues with this beyond finding the time to build it.
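For illustration, the address arithmetic behind such a recursive slot might look like the sketch below. It assumes 32-bit non-PAE paging (10/10/12 split, 4-byte entries); `RECURSIVE_SLOT` and the helper names are hypothetical, not taken from any existing kernel. The same arithmetic should apply when the slot in the current directory points at another process's directory, as long as that directory also maps the same slot to itself.

```c
#include <stdint.h>

/* Hypothetical choice of top-level slot used for the recursive entry. */
#define RECURSIVE_SLOT 1023u

/* Base of the 4 MiB window through which every page table is visible. */
static inline uint32_t pt_window_base(void) {
    return RECURSIVE_SLOT << 22;
}

/* Virtual address at which the page directory itself becomes visible. */
static inline uint32_t pd_virt(void) {
    return pt_window_base() | (RECURSIVE_SLOT << 12);
}

/* Virtual address of the page-table entry that maps vaddr. */
static inline uint32_t pte_virt(uint32_t vaddr) {
    return pt_window_base() + (vaddr >> 12) * 4u;
}

/* Virtual address of the page-directory entry that covers vaddr. */
static inline uint32_t pde_virt(uint32_t vaddr) {
    return pd_virt() + (vaddr >> 22) * 4u;
}
```

With slot 1023, the page tables occupy the top 4 MiB of the address space and the directory occupies the last 4 KiB page. A different slot only changes the constants.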

Separately, the kernel also needs to be mapped to all process address spaces, to allow for syscalls and interrupts. Intel provides the GLOBAL flag, which looks like it's for exactly this purpose, but the docs are very unclear on how it works in practice. AIUI, if I've read correctly, setting GLOBAL in the kernel address space on the relevant pages will mean that, on switching to a userspace address space, the GLOBAL address space will still be mapped, without having necessarily populated the relevant entries in the userspace address spaces. However, I'm not certain that I've correctly understood that. It's also unclear whether I need to set GLOBAL for every entry recursively, OR just the bottom level, OR just the top level. Every entry recursively probably won't hurt, but good to clarify.

Does anyone else make use of GLOBAL pages for kernel mapping, who can shed some light on how to make it work in practice as opposed to just on paper? I know it can work for what I'm trying to do; it's just a question of implementation details that the Intel manuals are a bit sparse on.

Relatedly, I'm almost executing userspace programs, which is a very exciting milestone :D

Re: On understanding the x86 paging GLOBAL flag

Posted: Sun Nov 24, 2024 11:42 am
by nullplan
Global pages do not get evicted from the TLB when reloading CR3. If you are employing a normal higher-half scheme, this will improve performance of task switches, because the kernel is mapped into both the old and new address spaces at the same place, and the TLBs can just remain.
venos wrote: Sun Nov 24, 2024 6:25 am However, I'm not certain that I've correctly understood that. It's also unclear whether I need to set GLOBAL for every entry recursively, OR just the bottom level, OR just the top level.
Without looking at the documentation, I'd just set it on every level, because I have no reason not to. With the normal address-space split, I know from the highest level which pages are going to be global.
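For reference, as far as I read the Intel SDM, the G flag (bit 8) is only honored in entries that actually map a page (a PTE, or a PDE/PDPTE with PS set). It is ignored in entries that point to another table, and it only takes effect once CR4.PGE is set. Since the flag only affects what stays cached in the TLB, the kernel entries still have to be present in every address space. A minimal sketch of that, assuming 64-bit entries (PAE or long mode); the helper names and the idea of copying the kernel's top-level slots are illustrative assumptions, not a description of any particular kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;   /* assumption: 64-bit paging-structure entries */
static_assert(sizeof(pte_t) == 8, "entries are expected to be 8 bytes");

#define PTE_PRESENT   (1ull << 0)
#define PTE_WRITABLE  (1ull << 1)
#define PTE_GLOBAL    (1ull << 8)              /* G flag, honored in leaf entries */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ull

/* Hypothetical helper: leaf entry for a 4 KiB kernel page, marked global. */
static inline pte_t make_kernel_leaf(uint64_t phys, bool writable) {
    pte_t e = (phys & PTE_ADDR_MASK) | PTE_PRESENT | PTE_GLOBAL;
    if (writable)
        e |= PTE_WRITABLE;
    return e;
}

/* Hypothetical helper: entry pointing at a next-level table.
 * G is left clear here because it is ignored in non-leaf entries. */
static inline pte_t make_table_entry(uint64_t table_phys) {
    return (table_phys & PTE_ADDR_MASK) | PTE_PRESENT | PTE_WRITABLE;
}

/* Populate the kernel part of a new top-level table by copying the
 * kernel's own top-level entries for slots [first_kernel_slot, slots). */
static inline void share_kernel_half(pte_t *dst_top, const pte_t *kernel_top,
                                     unsigned first_kernel_slot, unsigned slots) {
    for (unsigned i = first_kernel_slot; i < slots; i++)
        dst_top[i] = kernel_top[i];
}
```

Setting G on every level should be harmless, since the non-leaf copies are simply ignored, but the leaf entries are the ones that matter.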

Re: On understanding the x86 paging GLOBAL flag

Posted: Sun Nov 24, 2024 12:24 pm
by rdos
nullplan wrote: Sun Nov 24, 2024 11:42 am Global pages do not get evicted from the TLB when reloading CR3. If you are employing a normal higher-half scheme, this will improve performance of task switches, because the kernel is mapped into both the old and new address spaces at the same place, and the TLBs can just remain.
I remember trying this out, but giving it up. Now that you describe the function, I think I understand why it failed. My logic for TLB flushing is based on keeping a small list of pages that need to be invalidated per core, but when the page count grows above some maximum, I use a reload of CR3 instead, as this would be faster. Of course, on 386, invalidate page is not even supported, which means CR3 reload must always be used there.
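Roughly, the batching described above could be sketched like this. It is not the actual implementation: `MAX_PENDING`, the struct and the function names are all made up for illustration, and the threshold would need to be found by measurement. The sketch only decides what to do, so the caller can plug in INVLPG or whatever full-flush primitive fits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16   /* hypothetical threshold */

/* Per-core list of pages waiting to be invalidated. */
struct tlb_batch {
    uintptr_t pages[MAX_PENDING];
    size_t    count;
    bool      overflow;   /* list was full: individual pages no longer tracked */
};

enum tlb_action { TLB_NOTHING, TLB_INVLPG_EACH, TLB_FULL_FLUSH };

/* Queue one page; once the list is full, remember only that it overflowed. */
static inline void tlb_batch_add(struct tlb_batch *b, uintptr_t vaddr) {
    if (b->overflow)
        return;
    if (b->count == MAX_PENDING) {
        b->overflow = true;
        return;
    }
    b->pages[b->count++] = vaddr;
}

/* Decide how the pending invalidations should be carried out. */
static inline enum tlb_action tlb_batch_action(const struct tlb_batch *b) {
    if (b->overflow)
        return TLB_FULL_FLUSH;
    return b->count ? TLB_INVLPG_EACH : TLB_NOTHING;
}

/* Clear the batch after the chosen action has been performed. */
static inline void tlb_batch_reset(struct tlb_batch *b) {
    b->count = 0;
    b->overflow = false;
}
```

The fixed-size array is what keeps the per-core state bounded; the overflow flag trades precision for a full flush once the list is exhausted.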

So, how are you supposed to handle invalidation of hundreds of pages, some which might be in kernel space, when the global flag is used for kernel space?

Re: On understanding the x86 paging GLOBAL flag

Posted: Sun Nov 24, 2024 12:36 pm
by nullplan
rdos wrote: Sun Nov 24, 2024 12:24 pm So, how are you supposed to handle invalidation of hundreds of pages, some which might be in kernel space, when the global flag is used for kernel space?
Why would you need to invalidate hundreds of kernel pages? That's an event that should be rare. Just invlpg all the pages that need it, is my guess.
rdos wrote: Sun Nov 24, 2024 12:24 pm Of course, on 386, invalidate page is not even supported, which means CR3 reload must always be used there.
But the 386 doesn't even have global pages.

Re: On understanding the x86 paging GLOBAL flag

Posted: Sun Nov 24, 2024 2:11 pm
by rdos
nullplan wrote: Sun Nov 24, 2024 12:36 pm
rdos wrote: Sun Nov 24, 2024 12:24 pm So, how are you supposed to handle invalidation of hundreds of pages, some which might be in kernel space, when the global flag is used for kernel space?
Why would you need to invalidate hundreds of kernel pages? That's an event that should be rare. Just invlpg all the pages that need it, is my guess.
I can see situations where this happens, like when a large heap object in the kernel is freed. It could also be that a user process is terminated, along with freeing a small number of kernel pages. The point is also that this creates a "TLB shootdown" that must be handled by all cores in the system. IOW, it's not enough to do these invalidations on the current core; they must be done by all cores in the system. You cannot have an unlimited list of page invalidations per core unless you allocate it dynamically, which is quite inefficient.

Also, rare cases do sometimes happen, and you cannot have those bringing down your system in uncontrolled ways.

Re: On understanding the x86 paging GLOBAL flag

Posted: Sun Nov 24, 2024 3:17 pm
by Octocontrabass
rdos wrote: Sun Nov 24, 2024 12:24 pm So, how are you supposed to handle invalidation of hundreds of pages, some which might be in kernel space, when the global flag is used for kernel space?
If you really need to flush the entire TLB including global pages, you can toggle CR4.PGE.
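A minimal sketch of what that toggle might look like, assuming GCC-style inline assembly and that CR4.PGE is already set when the function is called. `KERNEL_BUILD` is a hypothetical macro for a ring-0 build; the `#else` branch is only a stand-in so the logic can be tried outside a kernel.

```c
#include <stdint.h>

#define CR4_PGE (1ul << 7)   /* CR4 bit 7: page global enable */

#if defined(KERNEL_BUILD)    /* hypothetical macro: freestanding ring-0 build */
static inline unsigned long read_cr4(void) {
    unsigned long v;
    __asm__ volatile("mov %%cr4, %0" : "=r"(v));
    return v;
}
static inline void write_cr4(unsigned long v) {
    __asm__ volatile("mov %0, %%cr4" : : "r"(v) : "memory");
}
#else
/* Hosted stand-in: a fake register plus a write counter. */
static unsigned long fake_cr4 = CR4_PGE;
static unsigned cr4_writes;
static inline unsigned long read_cr4(void) { return fake_cr4; }
static inline void write_cr4(unsigned long v) { fake_cr4 = v; cr4_writes++; }
#endif

/* Flush the whole TLB, global entries included, by changing CR4.PGE
 * and then restoring the original CR4 value. */
static inline void flush_tlb_including_global(void) {
    unsigned long cr4 = read_cr4();
    write_cr4(cr4 & ~CR4_PGE);   /* changing PGE invalidates all TLB entries */
    write_cr4(cr4);              /* put PGE (and everything else) back */
}
```

A real kernel would probably want interrupts disabled across the two writes so that nothing else modifies CR4 in between.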

Re: On understanding the x86 paging GLOBAL flag

Posted: Sun Nov 24, 2024 3:28 pm
by rdos
Octocontrabass wrote: Sun Nov 24, 2024 3:17 pm
rdos wrote: Sun Nov 24, 2024 12:24 pm So, how are you supposed to handle invalidation of hundreds of pages, some which might be in kernel space, when the global flag is used for kernel space?
If you really need to flush the entire TLB including global pages, you can toggle CR4.PGE.
Yes. I found this solution here: https://stackoverflow.com/questions/283 ... b-flushing
They also describe the trade-off between entire TLB invalidation and the use of INVLPG.