I'm sure there are things I'm doing wrong, but perhaps this might be a good place to start a thread to progressively grow this code, which we can then all leech from (public domain):
We have an array called "active", with each element being the RSP saved when that CPU's timeslice started. Each CPU gets its own slot, since the number of currently-running threads/processes always equals the number of CPUs.
We have a ring (FIFO) called "ready" with floor and ceiling addresses, and head and tail pointers that migrate upward and reset to the floor when they hit the ceiling.
We have a pool of sleeping threads/processes that we keep separately.
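As a starting point, here's how those three structures might look in C. The names, sizes, and the `context_t` type are my own assumptions, not fixed by the design:

```c
#include <stdint.h>

#define NCPUS 4

typedef uint64_t context_t;          /* saved RSP of a suspended context */

/* one slot per CPU: the RSP saved when that CPU's timeslice began */
context_t active[NCPUS];

/* ring (FIFO) of contexts ready to run */
struct ring {
    context_t *floor;    /* first slot of the backing array              */
    context_t *ceiling;  /* one past the last slot                       */
    context_t *head;     /* next slot to pop; wraps to floor at ceiling  */
    context_t *tail;     /* next slot to push; wraps to floor at ceiling */
};

static context_t ready_slots[64];
struct ring ready = {
    .floor   = ready_slots,
    .ceiling = ready_slots + 64,
    .head    = ready_slots,
    .tail    = ready_slots,
};

/* advance a ring pointer, resetting to the floor at the ceiling */
static context_t *ring_next(const struct ring *r, context_t *p)
{
    return (p + 1 == r->ceiling) ? r->floor : p + 1;
}
```

The `ring_next` helper is just the floor/ceiling wrap described above; the sleep pool can be any container since it's never touched on the hot path.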
The idea was that we (the community) could use this code as a base and then employ models other than a round-robin FIFO ring (such as a stack with a reschedule when it goes empty, or whatever).
We save the context:
pusha              ; pseudocode: PUSHA/PUSHF do not exist in 64-bit mode;
pushf              ; push the GPRs individually and use PUSHFQ instead
push FS
push GS
mov RAX, RSP
and RAX, 15        ; how far RSP is from a 16-byte boundary
sub RSP, 512       ; room for the FXSAVE area
sub RSP, RAX       ; align down to 16 bytes, as FXSAVE requires
fxsave [RSP]       ; FXSAVE takes a memory operand
push RAX           ; remember the alignment adjustment
%if across_address_spaces
mov RAX, CR3       ; CR3 cannot be pushed directly
push RAX
%endif
We restore the context:
%if across_address_spaces
mov RAX, CR3       ; CMP cannot compare CR3 against memory directly
cmp [RSP], RAX
je .B2             ; same address space: skip the reload
mov RAX, [RSP]
mov CR3, RAX       ; OPTIMIZE: this is the "slow/easy" way. Another way is to traverse page directories and invalidate manually.
.B2:
add RSP, 8         ; discard the saved CR3
%endif
pop RAX            ; the alignment adjustment
fxrstor [RSP]
add RSP, RAX
add RSP, 512
pop GS
pop FS
popf               ; pseudocode: POPF/POPA do not exist in 64-bit mode;
popa               ; use POPFQ and pop the GPRs individually
ret
We can use the method described at
http://blogs.msdn.com/oldnewthing/archi ... 29915.aspx
to maintain the locks. We end up with two locked instructions, but in effect two critical sections, since the CPUs don't progress until they've made it through the first: one to move ring.head, and one to move ring.tail.
Inside each loop we only set a register to the location we will store to or load from; the store/load itself happens once the loop is exited, at which point the slot is "allocated" to this processor and won't collide with another's.
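A rough sketch of that pattern using C11 atomics rather than raw lock-prefixed instructions: each CPU spins on a single compare-exchange to claim a slot index, and only touches the slot memory after its loop exits. Empty/full checks are omitted, and the ring size and names are my own assumptions:

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 64
static uint64_t slots[RING_SIZE];
static _Atomic uint32_t head = 0;   /* consumers claim indices from here */
static _Atomic uint32_t tail = 0;   /* producers claim indices from here */

/* Push a context: one locked instruction (the compare-exchange) moves
   ring.tail; once the loop exits, slot t is ours and cannot collide. */
void ring_push(uint64_t ctx)
{
    uint32_t t, next;
    do {
        t = atomic_load(&tail);
        next = (t + 1) % RING_SIZE;
    } while (!atomic_compare_exchange_weak(&tail, &t, next));
    slots[t] = ctx;                  /* slot is "allocated" to this CPU */
}

/* Pop a context: the same claim loop, on ring.head. */
uint64_t ring_pop(void)
{
    uint32_t h, next;
    do {
        h = atomic_load(&head);
        next = (h + 1) % RING_SIZE;
    } while (!atomic_compare_exchange_weak(&head, &h, next));
    return slots[h];
}
```

Note the simplification: a real scheduler must also guarantee a claimed slot has actually been written before another CPU pops it (e.g. by never letting head overtake tail).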
==========
The scheduled-stack strategy only locks once, but the problem is that it needs to perform a complete run of the scheduler each time the stack hits zero, at which point any processors that come in are blocked until it's done. This can be reduced significantly by firing the scheduler while there are still as many ready contexts to jump into as there are processors. No code for this yet either, but I'd imagine you could fit a fair few schedulers under this mechanism.
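As a rough sketch of the early-fire idea (the `NCPUS` threshold, the `run_scheduler` hook, and the `refill_pending` flag are all made-up names, and real refill/synchronization logic is elided):

```c
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS 4

static uint64_t stack_slots[64];
static _Atomic int32_t top = 0;          /* index of the next free slot */
static _Atomic int refill_pending = 0;   /* ensures one refill at a time */

/* hypothetical hook: the real scheduler would rebuild the ready stack */
extern void run_scheduler(void);

/* Pop a ready context with a single locked instruction; fire the
   scheduler early, while there are still as many ready contexts as
   processors, instead of waiting for the stack to hit zero. */
uint64_t stack_pop(void)
{
    int32_t t = atomic_fetch_sub(&top, 1) - 1;    /* claim slot t */
    if (t <= NCPUS && !atomic_exchange(&refill_pending, 1))
        run_scheduler();                          /* refill before empty */
    return stack_slots[t];
}
```

The pop itself is one `atomic_fetch_sub` (a single locked instruction), and the refill kicks in once only `NCPUS` contexts remain, so other CPUs can keep draining the stack while the scheduler rebuilds it.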
~~~
I'll be updating this first post as ideas develop from the posts below.
Status: Pseudocode