Page 1 of 2

Your opinions on my scheduler

Posted: Mon Mar 17, 2008 5:36 pm
by piranha
Heres my current scheduling model:
0 is the highest priority.
On each schedule() call, the scheduler moves to the next task, and subtracts 1 from the 'stat' variable. If stat is 0, it runs that task. Otherwise it moves to the next task and repeats the above.
If stat becomes < 0 (which it would on the next schedule() call after running the task) then the scheduler sets 'stat' equal to 'pri'. Pri is the priority. This way if the task has priority 5, it gets run less than if it had priority 2.
Now, if a task goes to 'sleep', then the task priority becomes negative. This way, the scheduler will subtract one (still negative) and see if it's == to 0. It wont be, so it moves on. When the task comes out of sleep, pri becomes positive, and scheduling resumes.

What do ya think?
-JL

Posted: Mon Mar 17, 2008 5:58 pm
by Paw
This could work, but you had to iterate through the whole task list until you identify one you can execute.

Following question is innocent, because I'm currently far from implementing a process/task scheduler. Well, I actually would implement it based on a binomial or Fibonacci heap data structure, which feature very good running times for their operations. So, may I ask you why you don't take one of these for your scheduler?

Posted: Mon Mar 17, 2008 6:39 pm
by piranha
Well, mainly because I came up with that on my own, and wanted to try it out.

-JL

Posted: Thu Mar 20, 2008 6:54 am
by jal
piranha wrote:Well, mainly because I came up with that on my own, and wanted to try it out.
The problem with your scheme is that if your highest priority is, say, 255 (1 byte for priority) and n processes, in a worst case scenerio you'd have to decrement n*255 priority variables before you could run a task, then you run n tasks, and then you'd have to decrement n*255 variables all over again. Possible, but quite a waste of time (you'd want to schedule as fast as possible).


JAL

Posted: Thu Mar 20, 2008 7:42 am
by senaus
As for sleeping tasks, why not just remove the task from the list altogether (assuming you're using a list), then add it back in when it is woken. There's no point in sleeping tasks ever being touched by the scheduler.

Sean

Posted: Thu Mar 20, 2008 6:05 pm
by Brendan
Hi,

@piranha: I've used this type of scheduling before (I call it "variable frequency"). I doubt I'll use it again.
jal wrote:The problem with your scheme is that if your highest priority is, say, 255 (1 byte for priority) and n processes, in a worst case scenerio you'd have to decrement n*255 priority variables before you could run a task, then you run n tasks, and then you'd have to decrement n*255 variables all over again. Possible, but quite a waste of time (you'd want to schedule as fast as possible).
The method I used was to keep track of the lowest count (and the thread with the lowest count) while scanning the list of tasks the first time. Then (if you don't find a task with "count = 0") you scan the list a second time while doing "count = count - lowest_count" and then run the task with the lowest count. That way you only ever need to scan the list of tasks twice.

@piranha: If you keep blocked tasks in the list you'll need to do something like "if (task_is_not_blocked) { check_count() }". This will severely hurt performance because you'll get a large number pipeline stalls due to branch misprediction. It's much better to remove blocked tasks from the list to avoid this.

The next problem is that for a large number of tasks the data won't fit in the CPUs caches. For example, if the count is in an 16-byte data structure and the OS supports a maximum of 1048576 tasks, then you'd be scanning through 16 MB of RAM and you will get a cache miss every time you try to read a task's count because of the CPU's LRU cache eviction algorithm. Basically, you get cache thrashing.

If it takes 150 cycles for a cache miss and 10 cycles to check a tasks count, then worst case (if the OS supports a maximum of 1048576 tasks) would be 335544320 cycles to find a task to run, or about 335 ms on a 1 GHZ CPU. That's far too long - if each task is given 10 ms of CPU time then you'd want to get the "worst case time to select the next task" well below 1 ms. Even without cache thrashing it can't be done (with my method of finding the next task to run).

With that said, I really do like "variable frequency" scheduling (for some purposes) - compared to "variable time slices" (which is just round robin scheduling where the amount of CPU time a task is given depends on that task's priority) it has higher overhead (more task switches per second) but is also much smoother (tasks spend less time waiting for CPU time).

If I could find a fast method for "variable frequency" (that doesn't have a very crappy "worst case time to select the next task") I would use it for scheduling my "high priority" tasks again.


Cheers,

Brendan

Posted: Thu Mar 20, 2008 6:39 pm
by Brendan
Hi again,

I'm a moron! :)
Brendan wrote:The method I used was to keep track of the lowest count (and the thread with the lowest count) while scanning the list of tasks the first time. Then (if you don't find a task with "count = 0") you scan the list a second time while doing "count = count - lowest_count" and then run the task with the lowest count. That way you only ever need to scan the list of tasks twice.
I spent ages trying to think of a faster method than this when I was implementing my last scheduler, and I couldn't do it. I just spent about 3 minutes thinking about it, and guess what? :D

The way to do it quickly is to have a seperate queue for each priority, so that the scheduler can find the first non-empty queue and select the first task on that queue.

For example:

Code: Select all

task *findTaskToRun(void) {
    task *selectedTask;
    task *lastTask;
    int queueNumber = 0;
    int i;

    // Find the first non-empty queue

    while (queue[queueNumber].head == NULL) {
        queueNumber++;
    }

    // Make sure the queue for "count = 0" is not empty

    if(queueNumber != 0) {
        for(i = 0; i+queueNumber < maxPriority; i++) {
            queue[i].head = queue[i+queueNumber].head
            queue[i].tail = queue[i+queueNumber].tail
        }
    }

    // Remove the first task from the queue for "count = 0"

    selectedTask = queue[0].head;
    queue[0].head = selectedTask->next;
    if(selectedTask->next == NULL) {
        queue[0].tail = NULL;
    }

    // Put the selected task back on the queue for it's priority

    queueNumber = selectedTask->priority;
    lastTask = queue[queueNumber].tail;
    if(lastTask == NULL) {
        queue[queueNumber].head = selectedTask;
    } else {
        lastTask->next = selectedTask;
    }
    queue[queueNumber].tail = selectedTask;

    // Return the selected task

    return selectedTask;
}
This method has a much much better "worst case time to select the next task" - all the data fits in the cache, and it doesn't matter how many tasks are running.


Cheers,

Brendan

Posted: Fri Mar 21, 2008 5:50 am
by Ready4Dis
Brendan wrote:Hi again,

I'm a moron! :)
Brendan wrote:The method I used was to keep track of the lowest count (and the thread with the lowest count) while scanning the list of tasks the first time. Then (if you don't find a task with "count = 0") you scan the list a second time while doing "count = count - lowest_count" and then run the task with the lowest count. That way you only ever need to scan the list of tasks twice.
I spent ages trying to think of a faster method than this when I was implementing my last scheduler, and I couldn't do it. I just spent about 3 minutes thinking about it, and guess what? :D

The way to do it quickly is to have a seperate queue for each priority, so that the scheduler can find the first non-empty queue and select the first task on that queue.

For example:

Code: Select all

task *findTaskToRun(void) {
    task *selectedTask;
    task *lastTask;
    int queueNumber = 0;
    int i;

    // Find the first non-empty queue

    while (queue[queueNumber].head == NULL) {
        queueNumber++;
    }

    // Make sure the queue for "count = 0" is not empty

    if(queueNumber != 0) {
        for(i = 0; i+queueNumber < maxPriority; i++) {
            queue[i].head = queue[i+queueNumber].head
            queue[i].tail = queue[i+queueNumber].tail
        }
    }

    // Remove the first task from the queue for "count = 0"

    selectedTask = queue[0].head;
    queue[0].head = selectedTask->next;
    if(selectedTask->next == NULL) {
        queue[0].tail = NULL;
    }

    // Put the selected task back on the queue for it's priority

    queueNumber = selectedTask->priority;
    lastTask = queue[queueNumber].tail;
    if(lastTask == NULL) {
        queue[queueNumber].head = selectedTask;
    } else {
        lastTask->next = selectedTask;
    }
    queue[queueNumber].tail = selectedTask;

    // Return the selected task

    return selectedTask;
}
This method has a much much better "worst case time to select the next task" - all the data fits in the cache, and it doesn't matter how many tasks are running.


Cheers,

Brendan
My original design was to divide my lists into priority levels, having for example, 3 levels (plus a sleeping list, but that's not called ever, so I won't include it here).

Anyways, my scheme worked like so:
Priority 1,2,3

121312
121312
121312

So, as you can see, a cycle is 6 tasks (then it repeats), so for each cycle we called 3 high priority tasks, 2 medium and 1 idle :). This means, the high priority tasks will get 1/2 the clock cycles, while the medium level tasks get called twice, so it gets 1/3 of the clock cycles, and the idle priority (or low priority however you want to view it) gets called once, and receives 1/6th the time. The good thing about this method, is it's A.) simple, just store round robin lists for each priority, and have a counter saying which list to pull from (that counts up to 6 and repeats). High priority tasks are called every other time, so response time and cpu time is good for those tasks. Normal priority tasks are called a bit less often, but still often enough to get a lot of work done, and idle tasks are never starved because they are always alloted a little bit of time. Now, my downside was, that if there was only one high priority task, it is getting 1/2 the CPU, while a hole bunch of normal priority tasks would be sharing the other 1/3 of the cylces. If there are no tasks of a specific priority, the list needs to be modified (my solution was to store multiple lists for priority scedules, so I would have an array of 6, each one saying which priority to use, then make up an array based on what priorities have info, which can also help with having a single high priority task, because you could make it be called 1/3 the time and normal tasks be called 1/2 the time (since they are many, each task is getting less time than the high priority, even though as a hole they are receiving more time). I could also add priority levels and make changes very easy (just make a new list, and a new array of priorities to include the new list). The method works, but I did not stick with it, I plan on implementing a few versions to benchmark and see what gives me the fastest response, best sharing equality, and least starving. Once I find a method that I like, I will probably stick with it until I have a new idea. My scheduler is abstract from my kernel (Which most should be, since the applications know nothing about the scheduler), so implementing new idea's and trying them out shouldn't be to hard, coming up with a nice set of benchmarks may prove challenging however.

Posted: Wed Mar 26, 2008 10:46 am
by mrvn
Brendan wrote: The way to do it quickly is to have a seperate queue for each priority, so that the scheduler can find the first non-empty queue and select the first task on that queue.
That is the way the original AmigaOS handled things. Within a priority class all tasks where round robin scheduled. If no more tasks wanted to run in a priority class the next lower priority class gets processed. The scheduling is trivial but has one big drawback. Tasks with lower priority can starve. They never get any cpu time.


So back to the original idea. You have priority levels and each task should run inversly proportional to its level often. Level 1 gets run every time, level 2 every second time, level 3 every third time and so on.

The initial idea was to initialize a counter to the priority level, decrement it every time the task may be run and run the task when the counter reaches zero. As pointer out that will incurr lots of needless costs. What you did was to record the number of cycles the task should not run and decrement them. Instead why not record the next time they should run?

time_now = 1234

For a new task:
run_next = time_now

Scheduling:
  • - find a task with minimum run_next
    - time_now = task.run_next
    - add task.priority to task.run_next
    - run task
What did that bring us? Firstly you only add priority when a task is run. run_next never changes otherwise. So now you can use a priority queue (usualy some sort of heap) to keep the tasks roughly sorted by run_next. You only need to be able to access the task with minimum run_next so a heap wil ldo verry nicely.

The costly operation will be to add the priority to run_next and to push the task back down onto the heap. Depending on its priority it will have to travel further down. The cost will be O(log num tasks) for every scheduling.

But you can now use the idea from above. Group all tasks with the same priority into a group and build a heap of such groups. Then pick the top group, run all tasks in that group once and push the group down heap. That cuts down your cost to O(log num priorities + 1) or for a fixed (maximum) number of priorities O(1).

MfG
Mrvn

Posted: Wed Mar 26, 2008 12:43 pm
by Candy
Brendan wrote:I spent ages trying to think of a faster method than this when I was implementing my last scheduler, and I couldn't do it. I just spent about 3 minutes thinking about it, and guess what? :D

The way to do it quickly is to have a seperate queue for each priority, so that the scheduler can find the first non-empty queue and select the first task on that queue.
Make a doubly-linked list of tasks that are ready to run, hold an array of tasks of a given priority that indicate what task is the first task of that priority or lower. When a task becomes available, put it in the list before the one at its priority in the list, replace all items with that task pointer at a higher priority with this task. When you want a task, take the top item in the list and put the pointers to the next available one. If you don't have extreme amounts of priorities (say, a constant amount like 8), this is essentially O(1) for any amount of processors, with priorities and round-robin within each priority. You can even limit the amount of time spent on each priority level this way.

It doesn't get much better than constant time.

Posted: Wed Mar 26, 2008 3:12 pm
by senaus
I personally use the method Brendan described, coupled with a little bitmap which tells the scheduler which queues are non-empty. I can then use a library function to find the most significant bit (a single instruction on i386), to save searching through the array. This is truly O(1) in both the number of tasks and the number of priorities.

The only problem is starvation, as mentioned above. I've been thinking about two possible solutions, described below.

One is to 'boost' the priority of tasks below the current maximum, every time we schedule a task which has greater than or equal priority to that of the previous. Using doubly linked lists this is easy, but this would make the algorithm O(#priorities).

Another is to 'rotate' the queues, by adding an offset (which is incremented periodically) to the priority before we queue a task, and then by bit rotating the bitmap when we schedule. This has the drawback of turning the highest priority tasks into the lowest priority when we increment the offset, so we need to move all tasks in the previous highest priority queue into the new highest queue.

Sean

Posted: Thu Mar 27, 2008 3:23 am
by Ready4Dis
senaus wrote:I personally use the method Brendan described, coupled with a little bitmap which tells the scheduler which queues are non-empty. I can then use a library function to find the most significant bit (a single instruction on i386), to save searching through the array. This is truly O(1) in both the number of tasks and the number of priorities.

The only problem is starvation, as mentioned above. I've been thinking about two possible solutions, described below.

One is to 'boost' the priority of tasks below the current maximum, every time we schedule a task which has greater than or equal priority to that of the previous. Using doubly linked lists this is easy, but this would make the algorithm O(#priorities).

Another is to 'rotate' the queues, by adding an offset (which is incremented periodically) to the priority before we queue a task, and then by bit rotating the bitmap when we schedule. This has the drawback of turning the highest priority tasks into the lowest priority when we increment the offset, so we need to move all tasks in the previous highest priority queue into the new highest queue.

Sean
Well, like you said, it has starvation issues, that need to be worked around. My post above describes a very similar method of having a list of queue's that will never starve a process, and it's O(1). I also support variable time slices, so for example, you could do 7ms for high priority, 5ms for medium, and 3ms for idle, so not only are the high priority queue's called more often, it's also run for a slightly longer time. I can tweek the settings by changing a few variables very easily, and I can even change mappings based on system setup or thread counts. Like I said, I may re-implement this method because it's very easily extensible, fast time to find next thread O(1), and relatively easy to implement (and can also add priority levels if I find the current isn't good enough).

An example of how to use in practice:

Code: Select all

#deifne MAX_PRIORITY 4   //sleep, idle, normal, high

List_S ThreadList[MAX_PRIORITY];
u32 PriorityTiming[MAX_PRIORITY] = {0,5,5,5};
u32 PriorityList[6] = {3,2,3,1,3,2};
u32 PriorityListLen = 6;
u32 PriorityListIndex =0;

Thread_S *FindNextThread(void)
{
 Thread_S *ptrThread;
 do //Until we find a list with threads!
 {
  ++PriorityListIndex;
  if (PriorityListIndex>PriorityListLen)
   PriorityListIndex = 0;
 }
 while (ThreadList[PriorityList[PriorityListIndex]].Head == NULL);
 ptrThread = NextInList(&ThreadList[PriorityList[PriorityListIndex]]);
 ptrThread->TimeToRun = PriorityTiming[PriorityList[PriorityListIndex]];
 return ptrThread;
}
NextInList just returns the next element in the list (the data, not the list container). Now, you can make your priority queue's, the time they are called, how long they are each run for, what order they are run very easy to modify, you will never have thread starvation (as long as you set it up right), and it will always move onto the next priority if this one doesn't have any left, so say in my example, high priority has no current threads, it will call it like so: (3 being high, 2 being normal, 1 being idle), and assuming equal timing

2,1,2,2,1,2,2,1 so normal priority would get 66% of the CPU, while idle would get 33%, now if normal had none, and both high and idle had some in the list it'd go like so:
3,3,1,3,3,3,1,3 so high priority would get 75% of the CPU and idle would get 25% (this assumes the timing's are equal, which they don't have to be). Now, if normal and high are the only 2:
3,2,3,3,2,3,3,2 so we are back to 66% vs. 33%, which can easily be modifed using the variable time slice, or making a different priority list (which can even be variable depending on whatever criteria you want to use). It's relatively extensible, fast and compact.

In your 2 "work arounds", the first is O(# of priorities), which is fine if there aren't to many, but with my method you can have 20 priorities, with very little slow-down (especially if you are smart and create a list based on priorities that aren't in use, like shown below). The second sounds like a pain, and moving data around is never fun ;). Below is an example to help speed up based on missing priorities (only needs to be called whena priority list is emptied or a first element is added to a list).

Code: Select all

#deifne MAX_PRIORITY 4   //sleep, idle, normal, high

List_S ThreadList[MAX_PRIORITY];
u32 PriorityTiming[MAX_PRIORITY] = {0,5,5,5};
u32 ListNotEmpty[MAX_PRIORITY] = {0,0,0,0}; //Mark them all as empty
u32 MasterPriorityList[6] = {3,2,3,1,3,2};
u32 PriorityList[6]; //Enough to fit the entire list if needed
u32 MasterListLen = 6;
u32 PriorityListLen = 0;
u32 PriorityListIndex = 0;

//ListNotEmpty is updated when tasks are added/removed!

void GeneratePriorityList(void)
{
 u32 Ctr;
 PriorityListLen = 0;
 for (Ctr=0;Ctr!=MasterListLen;++Ctr)
 {
  if (ListNotEmpty[MasterPriorityList[Ctr]]) //NotEmpty?
  {
    PriorityList[PriorityListLen] = MasterPriorityList[Ctr];
    ++PriorityListLen;
  }
 }
}
Now, if we add a thread to a priority level (excluding sleeping threads), we simply check if the list was empty first, if it was, then we call the GeneratePriorityList function and we can remove the blank list check since it will never be empty, and we no longer have ANY loops at all in our FindNextThread, bringing it back to the O(1) in all cases, with no thread starvation if you didn't mess up building your lists. Also, idle threads will get 100% cpu if no other threads are active, which is how it should be, without any special case code.

Posted: Thu Mar 27, 2008 3:23 am
by Ready4Dis
senaus wrote:I personally use the method Brendan described, coupled with a little bitmap which tells the scheduler which queues are non-empty. I can then use a library function to find the most significant bit (a single instruction on i386), to save searching through the array. This is truly O(1) in both the number of tasks and the number of priorities.

The only problem is starvation, as mentioned above. I've been thinking about two possible solutions, described below.

One is to 'boost' the priority of tasks below the current maximum, every time we schedule a task which has greater than or equal priority to that of the previous. Using doubly linked lists this is easy, but this would make the algorithm O(#priorities).

Another is to 'rotate' the queues, by adding an offset (which is incremented periodically) to the priority before we queue a task, and then by bit rotating the bitmap when we schedule. This has the drawback of turning the highest priority tasks into the lowest priority when we increment the offset, so we need to move all tasks in the previous highest priority queue into the new highest queue.

Sean
Well, like you said, it has starvation issues, that need to be worked around. My post above describes a very similar method of having a list of queue's that will never starve a process, and it's O(1). I also support variable time slices, so for example, you could do 7ms for high priority, 5ms for medium, and 3ms for idle, so not only are the high priority queue's called more often, it's also run for a slightly longer time. I can tweek the settings by changing a few variables very easily, and I can even change mappings based on system setup or thread counts. Like I said, I may re-implement this method because it's very easily extensible, fast time to find next thread O(1), and relatively easy to implement (and can also add priority levels if I find the current isn't good enough).

An example of how to use in practice:

Code: Select all

#deifne MAX_PRIORITY 4   //sleep, idle, normal, high

List_S ThreadList[MAX_PRIORITY];
u32 PriorityTiming[MAX_PRIORITY] = {0,5,5,5};
u32 PriorityList[6] = {3,2,3,1,3,2};
u32 PriorityListLen = 6;
u32 PriorityListIndex =0;

Thread_S *FindNextThread(void)
{
 Thread_S *ptrThread;
 do //Until we find a list with threads!
 {
  ++PriorityListIndex;
  if (PriorityListIndex>PriorityListLen)
   PriorityListIndex = 0;
 }
 while (ThreadList[PriorityList[PriorityListIndex]].Head == NULL);
 ptrThread = NextInList(&ThreadList[PriorityList[PriorityListIndex]]);
 ptrThread->TimeToRun = PriorityTiming[PriorityList[PriorityListIndex]];
 return ptrThread;
}
NextInList just returns the next element in the list (the data, not the list container). Now, you can make your priority queue's, the time they are called, how long they are each run for, what order they are run very easy to modify, you will never have thread starvation (as long as you set it up right), and it will always move onto the next priority if this one doesn't have any left, so say in my example, high priority has no current threads, it will call it like so: (3 being high, 2 being normal, 1 being idle), and assuming equal timing

2,1,2,2,1,2,2,1 so normal priority would get 66% of the CPU, while idle would get 33%, now if normal had none, and both high and idle had some in the list it'd go like so:
3,3,1,3,3,3,1,3 so high priority would get 75% of the CPU and idle would get 25% (this assumes the timing's are equal, which they don't have to be). Now, if normal and high are the only 2:
3,2,3,3,2,3,3,2 so we are back to 66% vs. 33%, which can easily be modifed using the variable time slice, or making a different priority list (which can even be variable depending on whatever criteria you want to use). It's relatively extensible, fast and compact.

In your 2 "work arounds", the first is O(# of priorities), which is fine if there aren't to many, but with my method you can have 20 priorities, with very little slow-down (especially if you are smart and create a list based on priorities that aren't in use, like shown below). The second sounds like a pain, and moving data around is never fun ;). Below is an example to help speed up based on missing priorities (only needs to be called whena priority list is emptied or a first element is added to a list).

Code: Select all

#deifne MAX_PRIORITY 4   //sleep, idle, normal, high

List_S ThreadList[MAX_PRIORITY];
u32 PriorityTiming[MAX_PRIORITY] = {0,5,5,5};
u32 ListNotEmpty[MAX_PRIORITY] = {0,0,0,0}; //Mark them all as empty
u32 MasterPriorityList[6] = {3,2,3,1,3,2};
u32 PriorityList[6]; //Enough to fit the entire list if needed
u32 MasterListLen = 6;
u32 PriorityListLen = 0;
u32 PriorityListIndex = 0;

//ListNotEmpty is updated when tasks are added/removed!

void GeneratePriorityList(void)
{
 u32 Ctr;
 PriorityListLen = 0;
 for (Ctr=0;Ctr!=MasterListLen;++Ctr)
 {
  if (ListNotEmpty[MasterPriorityList[Ctr]]) //NotEmpty?
  {
    PriorityList[PriorityListLen] = MasterPriorityList[Ctr];
    ++PriorityListLen;
  }
 }
}
Now, if we add a thread to a priority level (excluding sleeping threads), we simply check if the list was empty first, if it was, then we call the GeneratePriorityList function and we can remove the blank list check since it will never be empty, and we no longer have ANY loops at all in our FindNextThread, bringing it back to the O(1) in all cases, with no thread starvation if you didn't mess up building your lists. Also, idle threads will get 100% cpu if no other threads are active, which is how it should be, without any special case code.

Posted: Thu Mar 27, 2008 5:23 am
by jal
A number of things have been said about starvation, but in some systems starvation is not something to be worried about (or at least part of the design). In e.g. RTOSes, the thread with the highest priority runs indefinitly until it blocks or quits. Lower priority threads may be in cryo for ever (and of course don't actually starve, as their state doesn't change). A related problem is that of priority inversion: if a higher priority task calls a lower priority task, but there's another task with a priority in between those tasks, that task will run without countermeasures, freezing a higher priority task. On the other hand, if a lower priority task calls a higher priority task, and a task unblocks having a priority in between, that task will not be run until the higher priority tasks finishes. It's all so complicated :).


JAL

Posted: Thu Mar 27, 2008 6:38 am
by Ready4Dis
Typically what is done when a high priority task calls a lower one, is to suspend the high priority and promote the low priority to high (or normal, depending on who called) and once complete, it is brought back down to idle and the high priority is re-woken. Of course this sounds simple, but there are also times when a high priority class calls a low priority task, which in turn calls the same high priority task. How I envision this working is as such:

Thread 1: High priority calls low priorty and sleeps.
Thread 2: Low priority promoted to high priority.
Thread 2 now calls Thread 1. Thread 1 is woken up, and remains high priority, while thread 2 is put back to sleep. Tread 1 returns from call, and falls back asleep while thread 2 is re-awoken to high priority (it's priority when it called), when it's done, it wakes Thread 1 back up, and resumes it's low priority status. This is pretty tough to get right, and lots of testing needs to be done to catch these awkward scenario's. I do recommend getting basic multitasking working, then expand upon (or implement a new method) once you have all the bugs of task/thread switching in place, so you're not troubleshooting/bug tracking 2 things at once.