Re: Letting the kernel handle mutexes / semaphores with IPIs

Posted: Mon Jun 06, 2011 12:38 am
by rdos
Owen wrote:The case to optimize for is the uncontested case. If you're blocking, you've already failed at scaling. Besides, if you can lock and unlock the futex in 1/5th of the time (and this is realistic given privilege-level switch times on x86), then you significantly reduce the window in which blocking can occur (assuming the application holds the lock for the shortest time possible, a hopefully reasonable assumption).
This seems reasonable. However, if I were to implement this, I'd keep the creation/deletion calls, and just put the handle after the counter. I would also not support using this between processes. There is IPC for such things. No need to support synchronization between processes in shared memory.
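
A minimal sketch of the layout described above, with the handle placed right after the counter, might look like this. Everything here is an assumption for illustration: the names section_create, section_enter, section_leave and the sys_* calls are hypothetical, and the sys_* functions are user-space stand-ins that only count how often the "kernel" would be entered. The kernel object is assumed to behave like a semaphore that remembers wake-ups.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical layout: the counter first, the kernel handle right after it. */
struct section {
    atomic_int counter;  /* number of threads holding or waiting for the section */
    int        handle;   /* hypothetical handle returned by the create call */
};

/* Stand-ins for kernel calls. They only count invocations so the
 * fast path can be observed; a real kernel would block and wake threads. */
static int kernel_calls;
static int next_handle = 1;

static int  sys_create(void)      { kernel_calls++; return next_handle++; }
static void sys_delete(int handle) { (void)handle; kernel_calls++; }
static void sys_block(int handle)  { (void)handle; kernel_calls++; }
static void sys_wake(int handle)   { (void)handle; kernel_calls++; }

/* Creation and deletion stay as explicit calls. */
void section_create(struct section *s)
{
    atomic_init(&s->counter, 0);
    s->handle = sys_create();
}

void section_delete(struct section *s)
{
    sys_delete(s->handle);
    s->handle = 0;
}

/* Uncontended enter: one atomic add, no kernel call. */
void section_enter(struct section *s)
{
    assert(s->handle != 0);
    if (atomic_fetch_add(&s->counter, 1) > 0)
        sys_block(s->handle);   /* somebody else is inside: wait in the kernel */
}

/* Uncontended leave: one atomic subtract, no kernel call. */
void section_leave(struct section *s)
{
    assert(s->handle != 0);
    if (atomic_fetch_sub(&s->counter, 1) > 1)
        sys_wake(s->handle);    /* at least one waiter: hand the section over */
}
```

Because the handle sits in the same structure as the counter, the slow path can pass it straight to the kernel, and the structure is only meaningful inside the process that created it.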

Re: Letting the kernel handle mutexes / semaphores with IPIs

Posted: Mon Jun 06, 2011 2:58 am
by rdos
berkus wrote:
rdos wrote:I would also not support using this between processes. There is IPC for such things. No need to support synchronization between processes in shared memory.
Exactly the opposite, I would say.
The problems with such support clearly prevail. Nothing can be done with shared-memory synchronization that cannot be done with IPC (possibly in conjunction with shared memory). The primary problem with global handles is that they might never be deleted, because such handles cannot be purged automatically when processes terminate.

Re: Letting the kernel handle mutexes / semaphores with IPIs

Posted: Mon Jun 06, 2011 4:48 am
by Owen
The Futex structure I showed was a possible optimization. The Linux implementation uses just a uint32_t.

One possible train of thought: If you have shared memory, you don't need to do as much memory copying during IPC.

Re: Letting the kernel handle mutexes / semaphores with IPIs

Posted: Mon Jun 06, 2011 8:24 am
by rdos
Owen wrote:The Futex structure I showed was a possible optimization. The Linux implementation uses just a uint32_t.
OK, then they must have the search problem when blocking. Using a handle seems like a better solution. If an OS must support futexes in shared memory, it could implement global handles and provide a new API to initialize a shared futex.
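
To illustrate the difference, here is a rough sketch of the two lookups a kernel could do when a thread has to block. It is not Linux's code: the table sizes, the hash function and all names (find_by_address, find_by_handle, register_address) are made up for the example. With only an address, the kernel hashes it and walks a collision chain; with a handle, it indexes a table directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKETS     64   /* assumed size of the address hash table */
#define MAX_HANDLES 64   /* assumed size of the handle table */

struct wait_queue {
    uintptr_t          key;      /* address of the user-space word (address-keyed case) */
    int                waiters;  /* number of blocked threads */
    struct wait_queue *next;     /* collision chain within one bucket */
};

static struct wait_queue *buckets[BUCKETS];
static struct wait_queue  handle_table[MAX_HANDLES];

/* Illustrative hash: drop the low alignment bits, then fold into the table. */
static size_t hash_addr(uintptr_t addr)
{
    return (size_t)((addr >> 2) % BUCKETS);
}

/* Make a wait queue findable by the address of the lock word. */
void register_address(struct wait_queue *q, const uint32_t *uaddr)
{
    size_t b = hash_addr((uintptr_t)uaddr);
    q->key = (uintptr_t)uaddr;
    q->next = buckets[b];
    buckets[b] = q;
}

/* Address-keyed lookup: hash, then search the chain for a matching key. */
struct wait_queue *find_by_address(const uint32_t *uaddr)
{
    uintptr_t key = (uintptr_t)uaddr;
    for (struct wait_queue *q = buckets[hash_addr(key)]; q != NULL; q = q->next)
        if (q->key == key)
            return q;
    return NULL;
}

/* Handle-keyed lookup: a bounds check and one array index, no search. */
struct wait_queue *find_by_handle(int handle)
{
    assert(handle >= 0 && handle < MAX_HANDLES);
    return &handle_table[handle];
}
```

The trade-off is that the handle has to be created and deleted explicitly, while the address-keyed variant needs no setup for a lock that never blocks.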
Owen wrote:One possible train of thought: If you have shared memory, you don't need to do as much memory copying during IPC.
Shared memory doesn't scale. If you need to add more machines, shared memory won't do. And adding more cores doesn't scale either as the memory system becomes a bottleneck. A generic IPC that works across machines on a network scales much better than any shared memory attempt.

And I do not copy during local IPC. I allocate page-aligned buffers in order to only transfer the page-tables between sender & receiver.
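
As a user-side illustration only, a message buffer that a kernel could hand over by remapping pages has to start on a page boundary and cover whole pages. The sketch below assumes a POSIX system and C11; struct msg_buffer and its functions are hypothetical names, and the actual page-table transfer happens in the kernel and is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical message buffer: page-aligned start, whole pages in length. */
struct msg_buffer {
    void  *data;
    size_t size;   /* bytes, always a multiple of the page size */
    size_t pages;  /* number of pages a kernel would have to remap */
};

/* Query the page size, falling back to 4096 if the query fails. */
size_t page_size(void)
{
    long p = sysconf(_SC_PAGESIZE);
    return p > 0 ? (size_t)p : 4096;
}

/* Round a byte count up to a whole number of pages. */
size_t round_up_to_pages(size_t bytes, size_t page)
{
    return (bytes + page - 1) / page * page;
}

int is_page_aligned(const void *p)
{
    return (uintptr_t)p % page_size() == 0;
}

/* Allocate a zeroed, page-aligned buffer. Returns 0 on success, -1 on failure. */
int msg_buffer_alloc(struct msg_buffer *b, size_t bytes)
{
    size_t page = page_size();
    assert(bytes > 0);
    b->size  = round_up_to_pages(bytes, page);
    b->pages = b->size / page;
    b->data  = aligned_alloc(page, b->size);  /* size is a multiple of the alignment */
    if (b->data == NULL)
        return -1;
    memset(b->data, 0, b->size);
    return 0;
}

void msg_buffer_free(struct msg_buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->size = b->pages = 0;
}
```

Since no other data shares those pages, moving them from the sender to the receiver does not expose unrelated memory.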

Re: Letting the kernel handle mutexes / semaphores with IPIs

Posted: Mon Jun 06, 2011 8:32 am
by Owen
Agreed, but
  • Let's not throw away the benefits of shared memory on a single machine
  • Shared memory makes a great base on which to build high-performance local IPC (in which the SHM is an implementation detail)
  • Shared memory is a great way to implement file access (i.e. all access is by memory mapping the file; this is implemented by mapping the RAM backing the cache into the process). Files are essentially shared memory anyway, and for most cases you'll get better performance with coarse-grained locking (as occurs with shared memory) than with sending fine-grained file ops over the network

Re: Letting the kernel handle mutexes / semaphores with IPIs

Posted: Mon Jun 06, 2011 8:38 am
by rdos
Owen wrote:Shared memory is a great way to implement file access (i.e. all access is by memory mapping the file; this is implemented by mapping the RAM backing the cache into the process). Files are essentially shared memory anyway, and for most cases you'll get better performance with coarse-grained locking (as occurs with shared memory) than with sending fine-grained file ops over the network
Certainly. File buffers are best implemented using shared (kernel) memory, and their pages can easily be mapped into a process address space as well for faster access. I implemented memory-mapped files a while back.
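
For readers who have not used memory-mapped files, this is roughly what the application side looks like with the POSIX mmap interface. It is a sketch under that assumption, not how any OS discussed here exposes it; map_file_readonly and unmap_file are hypothetical helper names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure or for an empty file.
 * On success, *len_out receives the file length in bytes. */
const void *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED: the process reads the same pages that back the file. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file_readonly. */
void unmap_file(const void *p, size_t len)
{
    munmap((void *)p, len);
}
```

After the call, file contents are read with ordinary memory accesses instead of one read request per operation.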