Hi,
Antti wrote: Is the following pseudocode broken:
Code:
/* Critical section (enter and leave) */
/* This data page has 'PCD' and 'PWT' bits set */
.data
.balign 8, 0
lockvariable: .8byte 0
.text
.globl critical_section_enter
.globl critical_section_leave
.balign 16, 0
critical_section_enter:
lock btsq $0, lockvariable(%rip) /* Bit Test and Set (atomically) */
jc critical_section_enter /* Locked already? If yes, try again. */
retq
.balign 16, 0
critical_section_leave:
lock btrq $0, lockvariable(%rip) /* Reset (atomically) */
retq
It's not broken (it will work as intended). However, it's not very efficient either. The "critical_section_enter" code should use a "test, then (atomically test and set)" approach, where the initial test doesn't need/use a lock prefix. It should probably also use the "pause" instruction (for hyper-threading, so that the tight loop doesn't hurt the performance of the other logical CPUs in the core as badly). For the "critical_section_leave" code you should be able to do "movq $0, lockvariable(%rip)" without any lock prefix, as all 80x86 CPUs guarantee that aligned writes like this are atomic (and you should make sure the lock variable is aligned for performance reasons anyway).
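That version might look something like this (a rough sketch reusing the "lockvariable" above; untested):
Code:
.balign 16, 0
critical_section_enter:
1:  cmpq $0, lockvariable(%rip)        /* plain read - no bus locking while waiting */
    je 2f
    pause                              /* let the other logical CPU in the core run */
    jmp 1b
2:  lock btsq $0, lockvariable(%rip)   /* atomic test-and-set, only when it looked free */
    jc 1b                              /* another CPU got there first - keep waiting */
    retq
.balign 16, 0
critical_section_leave:
    movq $0, lockvariable(%rip)        /* aligned store is atomic - no lock prefix */
    retq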
There is also a question of "fairness". Imagine 100 CPUs all fighting for this same lock - some CPUs may be very lucky and repeatedly acquire the lock, and some CPUs may be very unlucky and never acquire it. There are different ways of implementing locks that ensure "fairness" (meaning CPUs acquire the lock in the order they arrive, so no CPU is "unlucky" and waits an excessive amount of time to acquire the lock).
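One common way of getting that ordering is a "ticket lock" - each CPU atomically takes a ticket, then waits until its ticket is the one being served. A minimal sketch (the label names are mine; it assumes the 32-bit counters won't wrap around while CPUs are still waiting):
Code:
.data
.balign 8, 0
ticket_next:    .4byte 0               /* next ticket to hand out */
ticket_serving: .4byte 0               /* ticket currently allowed in */
.text
.balign 16, 0
ticket_lock_enter:
    movl $1, %eax
    lock xaddl %eax, ticket_next(%rip) /* atomically take a ticket (old value in eax) */
1:  cmpl ticket_serving(%rip), %eax
    je 2f
    pause
    jmp 1b
2:  retq
.balign 16, 0
ticket_lock_leave:
    incl ticket_serving(%rip)          /* only the holder writes this - no lock prefix */
    retq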
Now, most of the above (everything except the cheaper lock release) only helps when there's lock contention, and can make the "no lock contention" case worse. This means that it can be a good idea to have different spinlock code for different cases - one version for when lock contention is likely, and one version for when lock contention is very unlikely.
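For example, a version tuned for the "lock contention very unlikely" case might go straight for the atomic operation and keep the spin loop out of the common path (a sketch, untested):
Code:
.balign 16, 0
critical_section_enter_rare:
    lock btsq $0, lockvariable(%rip)   /* optimistic - usually succeeds on the first try */
    jc 1f
    retq                               /* common path: acquired immediately */
1:  pause                              /* rare path: wait politely, then try again */
    cmpq $0, lockvariable(%rip)
    jne 1b
    jmp critical_section_enter_rare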
Your first priority should be avoiding lock contention, so that lock contention is very unlikely. This may be difficult though, as it often depends on how many CPUs there are - e.g. when there are only 4 CPUs, contention (for a specific lock) may be very unlikely, while with 200 CPUs contention (for the same lock) may be very likely.
On top of all of this, you'd want to think about your scheduler. If you acquire a spinlock and then the scheduler decides to do a task switch, you can have many CPUs all spinning, waiting for you to release that lock, and it may take a long time for that to happen because you're busy running other tasks. To prevent that you want to be able to postpone task switches while you're holding a spinlock (but you may not want to disable IRQs and mess up IRQ latency if you can avoid it).
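One way of doing that is a per-CPU "locks held" counter that the scheduler checks before switching; a sketch (assuming GS base points at per-CPU data - the offsets, the "switch pending" flag and the scheduler entry point are all hypothetical):
Code:
.equ PERCPU_LOCK_DEPTH,     0x10       /* hypothetical: spinlocks held by this CPU */
.equ PERCPU_SWITCH_PENDING, 0x18       /* hypothetical: scheduler wanted to switch */
.balign 16, 0
spinlock_enter_no_switch:
    incq %gs:PERCPU_LOCK_DEPTH         /* tell the scheduler to postpone task switches */
1:  cmpq $0, lockvariable(%rip)
    je 2f
    pause
    jmp 1b
2:  lock btsq $0, lockvariable(%rip)
    jc 1b
    retq
.balign 16, 0
spinlock_leave_no_switch:
    movq $0, lockvariable(%rip)        /* release the lock */
    decq %gs:PERCPU_LOCK_DEPTH
    jnz 3f                             /* still holding other spinlocks */
    cmpq $0, %gs:PERCPU_SWITCH_PENDING
    jne do_postponed_switch            /* do the task switch we postponed */
3:  retq
do_postponed_switch:                   /* placeholder - real code would enter the scheduler */
    retq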
In addition, if an IRQ handler tries to acquire a lock that the code it interrupted may already hold, then you end up with a deadlock. To prevent that you might want a special type of lock that disables IRQs. Of course you might not want to disable *all* IRQs - for example, you might want to disable/postpone most IRQs but still be able to handle some types of IPIs (e.g. "TLB shootdown" IPIs) without waiting for the lock to be released and IRQs to be re-enabled.
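A sketch of the "disables IRQs" flavour (the saved-flags variable is mine; note that this simple version keeps IRQs masked while spinning, which is exactly the latency problem described above - a fancier version could re-enable IRQs between attempts):
Code:
.data
.balign 8, 0
irq_lockvariable: .8byte 0
irq_saved_flags:  .8byte 0             /* holder's RFLAGS, restored on release */
.text
.balign 16, 0
irq_spinlock_enter:
    pushfq                             /* remember the caller's IF state */
    cli                                /* mask IRQs so the holder can't be interrupted */
1:  lock btsq $0, irq_lockvariable(%rip)
    jnc 2f
    pause
    jmp 1b
2:  popq %rax
    movq %rax, irq_saved_flags(%rip)   /* safe to store: we hold the lock now */
    retq
.balign 16, 0
irq_spinlock_leave:
    movq irq_saved_flags(%rip), %rax   /* read before releasing the lock */
    movq $0, irq_lockvariable(%rip)    /* release */
    pushq %rax
    popfq                              /* restore the caller's IF (may re-enable IRQs) */
    retq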
This may mean that you end up with another 2 different types of spinlocks: one that postpones task switches and postpones (most? all?) IRQs, used for anything that might be needed by an IRQ handler; plus another that postpones task switches without disabling IRQs, used for anything that is never needed by an IRQ handler.
This might mean 4 combinations - with/without IRQ postponing, and likely/unlikely lock contention.
Antti wrote: But the question is: do I have to adjust MTRRs (I have not touched them yet)? I am not quite sure whether those "PCD and PWT bits" are sufficient here. Is there anything else that I have forgotten to take into account?
You shouldn't need to adjust MTRRs or set the PCD or PWT bits. It will work correctly regardless of whether the lock variable is in "write-back" memory or "uncached" memory or anything else (and you'd want the lock variable in "write-back" memory for performance reasons).
Antti wrote: When it comes to the actual topic, it seems that my "rewrite from scratch" means: just make source code files look nicer.
No. "Rewrite from scratch" means "
most of my old code is unsuitable for my new design and therefore it's a hindrance that will slow progress, and to avoid "tip-toe through the prickle patch" I'm going to cut my losses and save myself a lot of extra work and hassle by discarding all of it (and only using parts of the old code as a reference if/when suitable)".
For example, imagine you've got a single-CPU monolithic kernel that uses segmentation, doesn't use paging and has a round-robin scheduler; and you want to write a "many CPU" micro-kernel that uses paging, doesn't use segmentation and has a cooperative scheduler. It would be foolish to think small incremental steps are a good idea in this case - the design has changed too much.
Of course if the design isn't being changed very much then rewriting from scratch would be a bad idea. The question then becomes: how much are you planning to change the design, and is that enough to justify a "rewrite from scratch"? This is why I suggested writing down what you want the new design to be and comparing it to your existing design.
For example, if your kernel currently has a big kernel lock and you want everything (all the data structures, all the locking, and all the code that uses data structures or locks) to be redesigned to achieve the best performance/scalability, then almost everything will change and starting from scratch is a good idea. However, if your kernel currently has a big kernel lock and you just want to remove that big kernel lock without redesigning much at all, then starting from scratch would be a bad idea.
Cheers,
Brendan