
Queued spinlock

Posted: Sun Nov 11, 2007 3:22 pm
by kataklinger
Just sharing some code for queued spinlocks that I came up with.
The code is written for MSVC++ (it uses inline assembly).

There are two different implementations. Both implementations use the stack to store the queue, so the size of the spinlock itself is not increased.

The first uses a linked list. The local flag for busy-waiting is located on the stack of the current processor (principle of locality), so no memory alignment is required to prevent cache contention. The downside is that releasing the spinlock can require one busy-waiting loop, but that shouldn’t happen too often.

Code:

struct queued_spinlock_entry
{
	queued_spinlock_entry* next;
	int state;
};

queued_spinlock_entry* head = 0;

void some_process_1()
{
	queued_spinlock_entry proce_qe;

	__asm
	{
		lea eax, proce_qe                        ; eax = &proce_qe
		lea ebx, head                            ; ebx = &head
		mov dword ptr [eax], 0                   ; proce_qe.next = 0
		mov edx, eax                             ; keep &proce_qe in edx (xchg will clobber eax)
		mov [eax]queued_spinlock_entry.state, 1  ; proce_qe.state = 1 (waiting)

		lock xchg [ebx], eax                     ; swap ourselves in as the queue tail; eax = old tail

		test eax, eax
		jz enter_section                         ; queue was empty: lock acquired immediately

			mov [eax], edx                       ; link behind the predecessor: prev->next = &proce_qe

			wait1:
				pause
				cmp [edx]queued_spinlock_entry.state, 1
				je wait1                         ; spin on our own flag until the predecessor clears it

		enter_section:
	}

	// do some work


	__asm
	{
		lea eax, proce_qe                        ; eax = &proce_qe
		lea ebx, head                            ; ebx = &head
		xor ecx, ecx                             ; ecx = 0
		mov edx, eax                             ; keep &proce_qe in edx (cmpxchg clobbers eax on failure)

		lock cmpxchg [ebx], ecx                  ; if head == &proce_qe (no waiters), head = 0
		je exit_section                          ; we were still the tail: lock released

			wait2:
				pause
				cmp [edx], ecx
				je wait2                         ; a successor swapped in but hasn't linked yet; wait for next != 0

			mov eax, [edx]                       ; eax = proce_qe.next (the successor)
			mov [eax]queued_spinlock_entry.state, ecx ; successor->state = 0: hand over the lock

		exit_section:
	}
}
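For readers who would rather not decode the assembly, here is a rough sketch of the same algorithm (it is essentially an MCS-style lock) in portable C++ using C++11 std::atomic, which postdates this thread; the function names acquire/release are mine:

Code:

#include <atomic>

struct queued_spinlock_entry
{
	std::atomic<queued_spinlock_entry*> next;
	std::atomic<int> state;
};

std::atomic<queued_spinlock_entry*> head{nullptr};  // the queue tail, named as in the post

void acquire(queued_spinlock_entry& qe)
{
	qe.next.store(nullptr, std::memory_order_relaxed);
	qe.state.store(1, std::memory_order_relaxed);

	// Swap ourselves in as the new tail; prev is the old tail.
	queued_spinlock_entry* prev = head.exchange(&qe, std::memory_order_acq_rel);
	if (prev)
	{
		prev->next.store(&qe, std::memory_order_release);  // link behind the predecessor
		while (qe.state.load(std::memory_order_acquire) == 1)
			;  // spin on our own flag until the predecessor clears it
	}
}

void release(queued_spinlock_entry& qe)
{
	queued_spinlock_entry* expected = &qe;
	// If we are still the tail, empty the queue and we are done.
	if (!head.compare_exchange_strong(expected, nullptr, std::memory_order_acq_rel))
	{
		// A successor swapped itself in but may not have linked yet:
		// this is the one busy-wait in the release path mentioned above.
		queued_spinlock_entry* succ;
		while ((succ = qe.next.load(std::memory_order_acquire)) == nullptr)
			;
		succ->state.store(0, std::memory_order_release);  // hand over the lock
	}
}
Usage mirrors some_process_1: declare a queued_spinlock_entry on the current stack, call acquire before the critical section, and release after it.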
The second does not use a linked list. The local flag for busy-waiting is located on the stack of the processor that joined the queue before the current processor. It doesn’t require a busy-waiting loop when releasing the spinlock, but it breaks the principle of locality and requires the local flag to be cache-line aligned to prevent cache contention.

Code:

int* position = 0;

void some_process_2()
{
	__declspec(align(64)) int proce_qe;	// cache-line aligned local flag, as noted above

	__asm
	{
		lea eax, proce_qe            ; eax = &proce_qe (our flag)
		lea ebx, position            ; ebx = &position
		mov dword ptr [eax], 1       ; proce_qe = 1 (held until we release)
		mov edx, eax                 ; keep &proce_qe in edx (xchg will clobber eax)

		lock xchg [ebx], eax         ; swap our flag in as the queue tail; eax = predecessor's flag

		test eax, eax
		jz enter_section             ; queue was empty: lock acquired immediately

			wait1:
				pause
				cmp dword ptr [eax], 1
				je wait1             ; spin on the predecessor's flag until it is cleared

		enter_section:
	}

	// do some work

	__asm
	{
		lea eax, proce_qe            ; eax = &proce_qe
		lea ebx, position            ; ebx = &position
		xor ecx, ecx                 ; ecx = 0
		mov edx, eax                 ; keep &proce_qe in edx (cmpxchg clobbers eax on failure)

		lock cmpxchg [ebx], ecx      ; if position == &proce_qe (no waiters), position = 0

		mov [edx], ecx               ; clear our flag: releases the successor, if any
	}
}
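The same variant as a rough C++11 std::atomic sketch (this one is essentially a CLH-style lock: each waiter spins on its predecessor's flag); the cache-line size of 64 bytes is an assumption:

Code:

#include <atomic>

std::atomic<std::atomic<int>*> position{nullptr};  // the queue tail

void some_process_2_sketch()
{
	// Our flag; a *successor* will spin on it, so it is cache-line
	// aligned to avoid sharing a line with other hot stack data.
	alignas(64) std::atomic<int> proce_qe{1};

	// Publish our flag as the new tail; prev is the predecessor's flag.
	std::atomic<int>* prev = position.exchange(&proce_qe, std::memory_order_acq_rel);
	if (prev)
		while (prev->load(std::memory_order_acquire) == 1)
			;  // spin on the predecessor's flag

	// do some work

	// If we are still the tail, empty the queue; either way, clear our
	// own flag so a successor (if any) can proceed.
	std::atomic<int>* expected = &proce_qe;
	position.compare_exchange_strong(expected, nullptr, std::memory_order_acq_rel);
	proce_qe.store(0, std::memory_order_release);
}
Note that spinning on the predecessor's flag rather than one's own is exactly what removes the busy-wait from the release path, at the cost of locality, as described above.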

Posted: Mon Nov 12, 2007 2:48 am
by os64dev
This code isn't written in C++; it's written in asm.

And where do you wish to use this functionality?

Posted: Mon Nov 12, 2007 3:15 am
by kataklinger
os64dev wrote: This code isn't written in C++; it's written in asm.
I didn't say it was written in C++; I said it was written for MSVC++, because of the inline assembly (__asm blocks). :idea:
os64dev wrote: And where do you wish to use this functionality?
To replace standard spinlocks. It should reduce cache contention and provide fairness: CPUs enter the critical section in the order they arrive (FIFO), which is not the case with standard spinlocks.
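For comparison, here is the kind of "standard" spinlock being replaced, as a minimal C++11 std::atomic sketch (the names are mine). All waiters spin on one shared flag, so they fight over a single cache line, and whichever CPU happens to win the exchange enters next; there is no FIFO order:

Code:

#include <atomic>

std::atomic<int> lock_flag{0};

void standard_lock()
{
	// Every waiting CPU hammers the same location: no queue, no fairness.
	while (lock_flag.exchange(1, std::memory_order_acquire) == 1)
		;  // spin
}

void standard_unlock()
{
	lock_flag.store(0, std::memory_order_release);
}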

Posted: Mon Nov 12, 2007 12:31 pm
by intel_breaker
See, in my opinion spinlocks are a standard synchronization technique, just like semaphores and other mutexes. The main idea of spinlocks is simplicity; when you face more complex tasks you should use a different kind of synchronization, not rewrite the existing one. So please don't say this can replace spinlocks. Have you ever thought about why no one else uses this kind of spinlock? The answer is very simple: it doesn't work from a logical standpoint. I mean, when some process has declared that it wants to use the critical section and is waiting for the spinlock to be released, it can be killed, and so on. What then? The spinlock will be released, but the critical section will never be reached by any process again, because your spinlock mechanism waits for a non-existing process. (My answer may be wrong due to a bad interpretation of your code, so I'm waiting for your reply.)

Posted: Mon Nov 12, 2007 3:17 pm
by kataklinger
1. Queued spinlocks are a standard synchronization technique, too.
2. They are simple (OK, not as simple as 'usual' spinlocks).
3. They are used on multiprocessor systems, in kernel mode, to synchronize processors, not threads or processes, so mutexes and semaphores have nothing to do with this.
4. They are used in Windows XP, Server 2003, Vista, and Server 2008, and I think I read somewhere that Linux also implements queued spinlocks.
5. Microsoft suggests that driver developers replace existing spinlocks with queued spinlocks and use them wherever possible (a usage sketch follows below). I don't know if there is a similar suggestion for Linux developers.
6. They work in a very logical way; see below.
7. Because they are not used for inter-thread or inter-process synchronization, your story about a suddenly dying process is irrelevant; see point 3.

So let me say it once again: only multiprocessor (SMP and NUMA) systems can benefit from this kind of spinlock (well, mostly because spinlocks should only be used on such systems :) ). Better performance should come from reduced cache contention (each CPU waits on its own local flag), improved locality of reference (each CPU's local flag lives on the kernel stack that CPU is already using), and fairness (CPUs enter the critical section in the order they arrive, which prevents possible starvation).
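For reference, the WDK interface mentioned in point 5 also keeps the queue entry on the caller's stack; a minimal usage sketch (g_lock and DoWork are made-up names):

Code:

#include <wdm.h>

KSPIN_LOCK g_lock;  // initialized once with KeInitializeSpinLock(&g_lock)

VOID DoWork(VOID)
{
	KLOCK_QUEUE_HANDLE qh;  // the queue entry lives in this stack frame

	KeAcquireInStackQueuedSpinLock(&g_lock, &qh);  // raises IRQL to DISPATCH_LEVEL
	// ... critical section ...
	KeReleaseInStackQueuedSpinLock(&qh);
}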

Posted: Mon Nov 12, 2007 3:40 pm
by intel_breaker
I thought it was used for synchronizing processes or threads; that's why it seemed illogical to me. Just a misunderstanding.

Posted: Thu Jan 17, 2008 9:33 pm
by comrade
Here is a paper about queued spinlocks (Mellor-Crummey and Scott, "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," ACM TOCS, 1991):

http://www.cs.rice.edu/~johnmc/papers/tocs91.pdf