Page tables and fine-grained locking
Posted: Thu Mar 30, 2006 3:27 pm
by Colonel Kernel
I'm starting to design my virtual memory manager (the mechanisms, not the policies), beginning with what I guess I would call "page table maintenance". I'm interested in people's opinions about concurrency and locking as they relate to such page-table-manipulating code. Do people mostly use fine-grained locking, or just a coarse lock associated with each page directory...? What are the advantages/disadvantages of whatever approach you use?
Re: Page tables and fine-grained locking
Posted: Thu Mar 30, 2006 3:47 pm
by paulbarker
My virtual memory will probably be split into kernel, process and thread areas, with 3-level locking. Always lock kernel tables, only lock process tables if the process may have a thread running on another processor, only lock thread tables if the thread could be running on another processor.
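In rough C, the decision would look something like this (purely illustrative - none of these names or types are real code, and spin_lock() stands in for whatever the kernel provides):

/* Sketch only: the locks and structures below are made up. */
typedef volatile int spinlock_t;
void spin_lock(spinlock_t *l);

struct thread  { spinlock_t vm_lock; int may_run_on_other_cpu; };
struct process { spinlock_t vm_lock; int has_thread_on_other_cpu; };

enum vm_area { VM_KERNEL, VM_PROCESS, VM_THREAD };
spinlock_t kernel_vm_lock;

/* Decide which lock (if any) to take before touching a set of page tables. */
void lock_tables(enum vm_area area, struct process *proc, struct thread *thr)
{
    switch (area) {
    case VM_KERNEL:                     /* always lock kernel tables */
        spin_lock(&kernel_vm_lock);
        break;
    case VM_PROCESS:                    /* only if another CPU might touch this process */
        if (proc->has_thread_on_other_cpu)
            spin_lock(&proc->vm_lock);
        break;
    case VM_THREAD:                     /* only if this thread could be running elsewhere */
        if (thr->may_run_on_other_cpu)
            spin_lock(&thr->vm_lock);
        break;
    }
}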
Re: Page tables and fine-grained locking
Posted: Thu Mar 30, 2006 5:50 pm
by Brendan
Hi,
I split kernel space into many "logical areas" - one for the scheduler's "thread state table", one for each process' "process data area", one for each thread's "thread data area", etc. Basically it's one lock for each kernel data structure.
For example, I'd have one lock for the scheduler's "thread state table" that is used for normal reading/writing and linear memory management (rather than 2 separate locks).
The advantage of this is that there are effectively no locks at all for linear memory management in kernel space (unless you count each data structure's lock, which will be needed anyway).
The disadvantages are that each structure must be page aligned, and you can't swap kernel pages to/from swap space or use memory mapped files in kernel space. This probably makes it inappropriate for a monolithic kernel (but for a micro-kernel it works well).
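To make that concrete, here's a rough sketch (the names are invented, and spin_lock()/spin_unlock()/map_page()/alloc_phys_page() just stand in for whatever the kernel already provides):

/* Sketch only.  Each "logical area" is a page-aligned region of kernel space
 * whose single lock covers normal reads/writes AND mapping/unmapping of the
 * pages behind it - so no separate "kernel address space" lock is needed. */
struct thread_state_table {
    spinlock_t lock;            /* protects the entries and grow/shrink of the area */
    unsigned   mapped_pages;    /* pages currently backing this structure */
    /* ...the actual thread state entries follow, mapped on demand... */
};

/* the structure lives at a fixed, page-aligned linear address in kernel space */
#define TST ((struct thread_state_table *)0xF0000000)

void tst_grow(void)
{
    spin_lock(&TST->lock);
    /* map one more physical page at the end of this structure's area */
    map_page((char *)TST + TST->mapped_pages * 4096, alloc_phys_page());
    TST->mapped_pages++;
    spin_unlock(&TST->lock);
}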
For thread space I don't need any locks as thread space is "unique" and I don't support shared memory, etc (see my comments in the "SMP TLB invalidation IPI" topic).
For process space, I just use one lock per process. If anyone ever complains I'll just tell them to use more processes or that they aren't using thread space effectively.
IMHO the important part of locking (in general) is finding a good balance between lock overhead and lock contention. For example, if there's only one or 2 CPUs then a single "global address space lock" might work well, but if there's 768 CPUs locking would need to be much more fine-grained (due to the increased chance of lock contention).
Of course a "perfect" balance isn't possible without knowing how many CPUs, threads, etc there are beforehand - this isn't possible (but a guess is usually good enough)...
Cheers,
Brendan
Re: Page tables and fine-grained locking
Posted: Thu Mar 30, 2006 11:59 pm
by Colonel Kernel
My OS won't have private per-thread sections of the user address space -- it will follow the Windows model in that regard. I was wondering if it would be worth it to split up the user-space portion of each address space into fixed-size regions for the purposes of locking. If the chances of two or more threads on different CPUs calling MM syscalls to modify the same address space at the same time are really small, then I guess it isn't worth it. I just don't have a sense of how common an occurrence this is for, say, a desktop/workstation/server OS.
Any experiences from "real-world" testing (or a close approximation thereof) that would shed some light on this?
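For what it's worth, the kind of thing I had in mind (the region count and names are arbitrary, and spinlock_t is whatever lock type the kernel uses - just to make the idea concrete):

#include <stdint.h>

/* Illustrative only.  Split the user portion of each address space into
 * fixed-size regions, each with its own lock; an MM syscall takes only the
 * locks for the regions its range touches. */
#define USER_SPACE_SIZE  0xC0000000u                 /* 3 GB user portion (x86, 3/1 split) */
#define REGION_COUNT     64u
#define REGION_SIZE      (USER_SPACE_SIZE / REGION_COUNT)

struct address_space {
    spinlock_t region_lock[REGION_COUNT];            /* one lock per fixed-size region */
    /* page directory pointer, region descriptors, etc. */
};

static spinlock_t *region_lock_for(struct address_space *as, uint32_t vaddr)
{
    return &as->region_lock[vaddr / REGION_SIZE];
}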
Re: Page tables and fine-grained locking
Posted: Fri Mar 31, 2006 2:53 am
by Brendan
Hi,
Colonel Kernel wrote:If the chances of two or more threads on different CPUs calling MM syscalls to modify the same address space at the same time are really small, then I guess it isn't worth it. I just don't have a sense of how common an occurrence this is for, say, a desktop/workstation/server OS.
It's like algebra - for "x = 1 * 5 + 3 / 2" you can figure out what x equals fairly easily, but for "x = a * b + c / d" you can't figure out what x equals without knowing exactly what the other variables are.
The "variables" you'd need to consider include:
- how many CPUs there are
- how many threads the process spawns
- how often these threads use an MM syscall
- how often the MM syscalls would use the same lock
- how fast the kernel's critical sections are
- how much unrelated CPU load there is
For a desktop/server OS none of this can be predicted.
Instead, you consider what the OS is designed for, what type of applications it'll be running, the design of the rest of the OS, the costs involved with guessing wrong, etc, and then take a guess.
For example, if you're planning on running a large multi-threaded database on "many-CPU" computers and your critical sections are complicated (relatively slow), then I'd suggest thousands of little locks per address space. Alternatively, if the OS is designed for between 1 and 8 CPUs, won't be running many multi-threaded processes (or their threads mostly do unrelated work, i.e. on different data) and your critical sections are fast, then I wouldn't suggest using very many locks per address space.
One thing I'd also consider is how many locks you've got in the physical memory manager - there's no point using lots of locks in the linear memory manager if there aren't very many locks in the physical memory manager (you'd just be shifting the lock contention problem to a lower level).
BTW are you using the "available" bits in page directory entries for anything? If you're not, then you could use one of these bits (with "lock bts" and "lock btr" instructions) to have one lock per page table with no real costs - the memory is used anyway. This would give you 1024 locks where each lock covers 4 MB. I'm not sure how it'd work with shared memory though.
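For example, something along these lines (this assumes 32-bit non-PAE paging, where bits 9-11 of each PDE are available to the OS - the bit number and the GCC-style wrappers are just an illustration):

#include <stdint.h>

#define PDE_LOCK_BIT 9   /* one of the "available" PDE bits (9-11) */

/* Spin until the lock bit is acquired; the lock covers the 4 MB mapped by this PDE. */
static inline void pde_lock(volatile uint32_t *pde)
{
    asm volatile("1: lock btsl %1, %0\n\t"   /* atomically test-and-set the bit        */
                 "jnc 2f\n\t"                /* carry clear -> bit was free, we own it  */
                 "pause\n\t"                 /* otherwise spin politely and retry       */
                 "jmp 1b\n"
                 "2:"
                 : "+m"(*pde)
                 : "Ir"(PDE_LOCK_BIT)
                 : "cc", "memory");
}

static inline void pde_unlock(volatile uint32_t *pde)
{
    asm volatile("lock btrl %1, %0"          /* atomic clear, since other bits in the
                                                same PDE may be updated concurrently */
                 : "+m"(*pde)
                 : "Ir"(PDE_LOCK_BIT)
                 : "cc", "memory");
}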
Cheers,
Brendan
Re: Page tables and fine-grained locking
Posted: Fri Mar 31, 2006 11:44 am
by Colonel Kernel
(Aside): If you want to succeed in business, never answer a question with a question.
I did a bit of reading and found out that NT 4.0 had a single lock for the kernel address space and a single lock for each user address space. The next step is to find out whether they've changed that in later versions to improve scalability...