I'm sure there are things I'm doing wrong, but perhaps this might be a good place to start a thread to progressively grow this code, which we can then all leech from (public domain):
We have an array called "active", with each element being the RSP saved when that CPU's timeslice started. Each CPU gets its own slot, since the number of currently-running threads/processes always equals the number of CPUs.
We have a ring (FIFO) called "ready" with floor and ceiling addresses, and head and tail pointers that migrate upward and reset to the floor when they hit the ceiling.
We have a pool of sleeping threads/processes that we keep separately.
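As a starting point, here's how those three structures might look in C. The names, sizes, and the `context_t` type are my own assumptions, not fixed by the design:

```c
#include <stdint.h>

#define NCPUS 4

typedef uint64_t context_t;          /* saved RSP of a suspended context */

/* one slot per CPU: the RSP saved when that CPU's timeslice began */
context_t active[NCPUS];

/* ring (FIFO) of contexts ready to run */
struct ring {
    context_t *floor;    /* first slot of the backing array              */
    context_t *ceiling;  /* one past the last slot                       */
    context_t *head;     /* next slot to pop; wraps to floor at ceiling  */
    context_t *tail;     /* next slot to push; wraps to floor at ceiling */
};

static context_t ready_slots[64];
struct ring ready = {
    .floor   = ready_slots,
    .ceiling = ready_slots + 64,
    .head    = ready_slots,
    .tail    = ready_slots,
};

/* advance a ring pointer, resetting to the floor at the ceiling */
static context_t *ring_next(const struct ring *r, context_t *p)
{
    return (p + 1 == r->ceiling) ? r->floor : p + 1;
}
```

The `ring_next` helper is just the floor/ceiling wrap described above; the sleep pool can be any container since it's never touched on the hot path.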
The idea was that we (the community) could use this code as a base and then employ models other than a round-robin FIFO ring (such as a stack with a reschedule when it goes empty, or whatever).
We save the context:
pusha              ; pseudocode: PUSHA/PUSHF do not exist in 64-bit mode;
pushf              ; push the GPRs individually and use PUSHFQ instead
push FS
push GS
mov RAX, RSP
and RAX, 15        ; how far RSP is from a 16-byte boundary
sub RSP, 512       ; room for the FXSAVE area
sub RSP, RAX       ; align down to 16 bytes, as FXSAVE requires
fxsave [RSP]       ; FXSAVE takes a memory operand
push RAX           ; remember the alignment adjustment
%if across_address_spaces
mov RAX, CR3       ; CR3 cannot be pushed directly
push RAX
%endif
We restore the context:
%if across_address_spaces
mov RAX, CR3       ; CMP cannot compare CR3 against memory directly
cmp [RSP], RAX
je .B2             ; same address space: skip the reload
mov RAX, [RSP]
mov CR3, RAX       ; OPTIMIZE: this is the "slow/easy" way. Another way is to traverse page directories and invalidate manually.
.B2:
add RSP, 8         ; discard the saved CR3
%endif
pop RAX            ; the alignment adjustment
fxrstor [RSP]
add RSP, RAX
add RSP, 512
pop GS
pop FS
popf               ; pseudocode: POPF/POPA do not exist in 64-bit mode;
popa               ; use POPFQ and pop the GPRs individually
ret
We can use the method described at
http://blogs.msdn.com/oldnewthing/archi ... 29915.aspx
to maintain the locks. We end up with two locked instructions, but in effect two critical sections, since the CPUs don't progress until they've made it through the first: one to move ring.head, and one to move ring.tail.
Inside each loop we only set a register to the location we will store to or load from; the store/load itself happens once the loop is exited, at which point the slot is "allocated" to this processor and won't collide with another's.
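A rough sketch of that pattern using C11 atomics rather than raw lock-prefixed instructions: each CPU spins on a single compare-exchange to claim a slot index, and only touches the slot memory after its loop exits. Empty/full checks are omitted, and the ring size and names are my own assumptions:

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 64
static uint64_t slots[RING_SIZE];
static _Atomic uint32_t head = 0;   /* consumers claim indices from here */
static _Atomic uint32_t tail = 0;   /* producers claim indices from here */

/* Push a context: one locked instruction (the compare-exchange) moves
   ring.tail; once the loop exits, slot t is ours and cannot collide. */
void ring_push(uint64_t ctx)
{
    uint32_t t, next;
    do {
        t = atomic_load(&tail);
        next = (t + 1) % RING_SIZE;
    } while (!atomic_compare_exchange_weak(&tail, &t, next));
    slots[t] = ctx;                  /* slot is "allocated" to this CPU */
}

/* Pop a context: the same claim loop, on ring.head. */
uint64_t ring_pop(void)
{
    uint32_t h, next;
    do {
        h = atomic_load(&head);
        next = (h + 1) % RING_SIZE;
    } while (!atomic_compare_exchange_weak(&head, &h, next));
    return slots[h];
}
```

Note the simplification: a real scheduler must also guarantee a claimed slot has actually been written before another CPU pops it (e.g. by never letting head overtake tail).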
==========
The scheduled-stack strategy only locks once, but the problem is that it needs to perform a complete run of the scheduler each time the stack hits zero, at which point any processors that come in are blocked until it's done. This can be reduced significantly by firing the scheduler while there are still as many ready contexts to jump into as there are processors. No code for this yet either, but I'd imagine you could fit a fair few schedulers under this mechanism.
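As a rough sketch of the early-fire idea (the `NCPUS` threshold, the `run_scheduler` hook, and the `refill_pending` flag are all made-up names, and real refill/synchronization logic is elided):

```c
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS 4

static uint64_t stack_slots[64];
static _Atomic int32_t top = 0;          /* index of the next free slot */
static _Atomic int refill_pending = 0;   /* ensures one refill at a time */

/* hypothetical hook: the real scheduler would rebuild the ready stack */
extern void run_scheduler(void);

/* Pop a ready context with a single locked instruction; fire the
   scheduler early, while there are still as many ready contexts as
   processors, instead of waiting for the stack to hit zero. */
uint64_t stack_pop(void)
{
    int32_t t = atomic_fetch_sub(&top, 1) - 1;    /* claim slot t */
    if (t <= NCPUS && !atomic_exchange(&refill_pending, 1))
        run_scheduler();                          /* refill before empty */
    return stack_slots[t];
}
```

The pop itself is one `atomic_fetch_sub` (a single locked instruction), and the refill kicks in once only `NCPUS` contexts remain, so other CPUs can keep draining the stack while the scheduler rebuilds it.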
~~~
I'll be updating this first post as ideas develop from the posts below.
Status: Pseudocode