devc1 wrote:
And what about the heap allocator, should I just spinlock before allocating a heap when there are multiple processors involved.
Sounds to me like you're thinking of premature optimisation.
You need to get it working correctly, before you think about getting it working fast. And before you start thinking about what optimisations to apply, you need to measure where you need these optimisations.
After all, if it turns out your heap allocator lock is hardly contended, you might be able to redesign your locking to be coarser, reducing the overhead of the lock. Or it might be that your lock is highly contended, in which case it might be prudent to redesign your locking to be finer. But neither involves changing the lock mechanism: the lock still needs to function correctly as a lock, and it is this mechanism that by necessity makes it slower than a non-atomic instruction sequence.
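To make the "measure first" point concrete, here is a minimal sketch of a single coarse spinlock around the allocator that also counts how often an acquisition had to spin. It uses C11 atomics; every name in it (counted_spinlock, heap_lock, heap_alloc_locked) is made up for illustration, and malloc() merely stands in for whatever your kernel heap routine is. In a real kernel you would probably also need to deal with interrupts while the lock is held, which this ignores.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical spinlock that records how contended it is. */
typedef struct {
    atomic_flag  locked;
    atomic_ulong acquisitions;  /* total successful acquisitions */
    atomic_ulong contended;     /* acquisitions that had to spin first */
} counted_spinlock;

static counted_spinlock heap_lock = { ATOMIC_FLAG_INIT, 0, 0 };

static void spin_lock(counted_spinlock *l)
{
    int spun = 0;

    /* Atomically set the flag; if it was already set, someone else holds it. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        spun = 1;  /* a real kernel would issue a CPU pause/yield hint here */
    }

    atomic_fetch_add_explicit(&l->acquisitions, 1, memory_order_relaxed);
    if (spun) {
        atomic_fetch_add_explicit(&l->contended, 1, memory_order_relaxed);
    }
}

static void spin_unlock(counted_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Assumption: malloc() is a placeholder for the unlocked kernel heap call. */
void *heap_alloc_locked(size_t size)
{
    spin_lock(&heap_lock);
    void *p = malloc(size);
    spin_unlock(&heap_lock);
    return p;
}
```

Comparing contended against acquisitions under a realistic workload gives you a rough idea of whether the lock is worth redesigning at all.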
There are also ways to avoid taking the heap lock at all. You could instead use a per-thread memory arena for quick temporary allocations, using a simple bump-pointer allocator. Such memory might be used within the processing of a single system call, and once that system call is complete, you just discard all the memory from the arena (any data that must persist beyond the system call, however, would still require the regular heap). Being per-thread, this arena would require no locking, and would thus be much faster than your general purpose heap. It has the double whammy of also reducing use of (and thus lock contention on) the main heap, and would probably improve performance significantly more than any optimisation you could do in the main heap locking.
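A rough sketch of such a bump-pointer arena, assuming each thread owns its own arena struct and backing buffer so nothing here needs a lock. The names (arena, arena_init, arena_alloc, arena_reset) are illustrative only, not from any particular kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread arena: one instance per thread, never shared. */
typedef struct {
    unsigned char *base;      /* start of the backing buffer */
    size_t         capacity;  /* size of the backing buffer in bytes */
    size_t         used;      /* bytes handed out so far, including padding */
} arena;

void arena_init(arena *a, void *buffer, size_t capacity)
{
    a->base = buffer;
    a->capacity = capacity;
    a->used = 0;
}

/* Bump allocation: round the current position up to `align` (a power of two),
 * then advance past `size` bytes. Returns NULL when the arena is exhausted. */
void *arena_alloc(arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t cur     = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t    padding = (size_t)(aligned - cur);
    size_t    left    = a->capacity - a->used;

    if (padding > left || size > left - padding) {
        return NULL;  /* caller could fall back to the regular, locked heap */
    }

    a->used += padding + size;
    return (void *)aligned;
}

/* Discard everything at once, e.g. when the system call returns. */
void arena_reset(arena *a)
{
    a->used = 0;
}
```

There is no per-allocation free here: the system call entry path would call arena_reset() (or the exit path would), and anything that has to outlive the call goes to the regular heap instead.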