Peer Review: My Scheduler Design

astrocrep · Post by **astrocrep** » Fri Jan 11, 2008 12:57 pm

My Scheduler Spec v1.3

Scheduler will acknowledge 6 queue:
Very High
High
Norm
Low
Idle
Sleeping

The Scheduler will monitor two volatile Bytes used for status and notification in the scheduler.

Byte 1: Scheduler Key (unsigned char scheduler_key)
---------------------------------------------------
0 - 4 - Active Queue Bit - Used to signify what queue its currently in, 0 = no, 1 = yes (0 = Very High, 1 = High ...)
5 - Reserved
6 - Reserved
7 - Scheduler Lock Bit - This is set when entering the scheduler and cleared exiting. When entering the scheduler, this bit is checked, if its set, we leave, no harm, no foul. This is important if an active task yields back to the scheduler, for any reason, and as the scheduler is processing, a timer event fires.

Byte 2: Scheduler Queue (unsigned char scheduler_queue)
-------------------------------------------------------
0 - 4 - Queue Has Items - Used to mark if a queue has an threads waiting to run 0 = nothing in this queue, 1 = something(s) in this queue
5 - New Very High Notification - Signal that a new Very High Thread has been added...
6 - New High Notification - Signal that a new High Thread has been added...
7 - New Norm Notification - Signal that a new Norm Thread has been added...

Any given queue will work as such:
1.) At init, all queues are pointing to a unique tail_task (a seperate 1 per queue), and the active kernel thread is then added to the HEAD of the Norm Queue (and the kernel thread's -> next will point to the tail_thread)
2.) When a new task is added to the system, it is given a free initial run boost, and placed at the HEAD of its priority queue. At this point, we set [priority queue] Has items bit in scheduler_queue and if the priority is NORM or better, we set the new thread waiting flag, also in the scheduler_queue.
3.) When processing a queue, a queue will run until its head becomes the tail_task, at that point the next (lower) priority is run.
4.) When a Thread is put to sleep, (wait() is different) is will be removed from its priority queue and added to a generic sleep queue, it will also be assigned a ticks_to_wake_up_at value.
5.) When a Sleeping thread is awoken, it will be placed back at the HEAD of its queue, given a run boost of 50 ticks, and the scheduler_queue flags will be set to indicate that the priority has items, and has new items if applicable.

time_to_run:
------------
This is the amount of time a current task has left to run, it is calculated as follows (using the below key)

time_to_run = 100 - (priority_value * 10);

The priority_value key:
Very High = 0
High = 2
Norm = 5
Low = 7
Idle = 9

This is how the Timer handler will work (logically)

1.) Scheduler Checks [Scheduler Lock Bit] if set, it returns, if not set, it is then set and the logic flow continues.
2.) Scheduler will increment the ticks counter
3.) Scheduler will then loop through all sleeping threads, and wake any that are ready (ticks_to_wake_up = system_ticks).
4.) Scheduler will check the Notification Bits, if there is a new thread in a priority GREATER then the current, the Active Priority becomes that priority. All of this is done by calling select_next_task() *** Important things to remember *** If we are processing NORM, and a Very High comes in, the new very high is placed at the HEAD of the very high queue. Furthermore, since we have already completed the very high queue, the previous HEAD was actually the tail. So once the new very high task has completed, the scheduler will attempt to grab the next very high, since its the tail, it goes to the next priority lower...
5.) Scheduler will check current_task time_to_run != 0, in that case then we decrement it by 1, and return.
6.)If the current_task time_to_run == 0 --OR-- we had a new item notification in a queue GREATER than the current, we call select_next_task() (Explained below)
6.) Once here, we have a new task to run, we check the time_to_run, is it 0? If it is 0 we calculate the new time_to_run, if its not we give it a yield bonus of + 1/2 of a new time_to_run.
7.) The Scheduler switches to the new current_task and returns.

select_next_task() (logically)
--------------------------------
if New Very High and Active Queue != Very High
Active Queue Bit = Very High

if New High and Active Queue < High
Active Queue Bit = High

if New Norm and Active Queue < Norm
Active Queue Bit = Norm

[code for each priority, but only run for the active priority]
if current_priority_queue != priority_tail_task
active_task = current_priority_queue
current_priority_queue = current_priority_queue->next
[add active_task to the end of the current_priority_queue]
else
active priority is then set to the next lower one

if low_priority_queue == low_tail_task
if no other queue has items (bit checking) run the first idle thread
if idle_queue == idle_tail_task
kernel_idle()

if any queue has items, cycle their tail marker to the end of the queue.
Set the active queue to the first queue with items, call select_next_task(); and then return.

So basically this is my Scheduler Logic in a nut shell.
Also a side note, my task structure has the related variables mentioned above.

So what do you think, I am not looking for anything crazy good, but something that will show bias for priorities, and handle sleeping states...

Comments and critics welcome.
Thanks,
Rich

lukem95 · Post by **lukem95** » Fri Jan 11, 2008 1:57 pm

i dont have time to go through it particularly thoroughly, but the parts i did read seem very good.

i will most likely be using this as a resource when i implement multitasking, fairly soon.

... you should also consider adding this to the wiki after it is reviewed, i think it would make a good addition.

ucosty · Post by **ucosty** » Fri Jan 11, 2008 2:00 pm

I've become a bit of a benchmark whore so I'm going off to implement it to see how it goes.

My only concern is that you use proper mutual exclusion structures instead of a volatile lock bit. Can somebody with more knowledge in this area back me up please?

astrocrep · Post by **astrocrep** » Fri Jan 11, 2008 2:03 pm

I have implement Mutual Exclusion with a ticket algorithim...

However, instead of a thread spinning till its turn, it gives back the control to the scheduler.

What the lock bit is for is to prevent us from enter the scheduler twice... as that would obviously be bad... also this saves us from doing a cli / sti.

I guess I could / should be using an atomic lock... perhaps as a todo.

Found a good reference as to what atomic operations looks like...
http://joshua.raleigh.nc.us/docs/linux- ... 72905.html
From linux 2.4 I believe.

Thanks,
Rich

Brendan · Post by **Brendan** » Sun Jan 13, 2008 2:01 am

Hi,

astrocrep wrote:This is how the Timer handler will work (logically)

1.) Scheduler Checks [Scheduler Lock Bit] if set, it returns, if not set, it is then set and the logic flow continues.
2.) Scheduler will increment the ticks counter

That seems around the wrong way to me, unless you've got a ticks counter and a seperate system time counter. For example:

0.) Increment the system clock
1.) Scheduler Checks [Scheduler Lock Bit] if set, it returns, if not set, it is then set and the logic flow continues.
2.) Scheduler will increment the ticks counter
3.) Scheduler will then loop through all sleeping threads

Instead of:

1.) Scheduler will increment the ticks counter
2.) Scheduler Checks [Scheduler Lock Bit] if set, it returns, if not set, it is then set and the logic flow continues.
3.) Scheduler will then loop through all sleeping threads

Or perhaps:

1.) Scheduler will increment the ticks counter
2.) Scheduler will then loop through all sleeping threads
3.) Scheduler Checks [Scheduler Lock Bit] if set, it returns, if not set, it is then set and the logic flow continues.

astrocrep wrote:3.) Scheduler will then loop through all sleeping threads

How long will it take (during the timer IRQ) to scan the entire list of sleeping threads? For example, if there's 10000 threads that are sleeping until sometime next month, will you re-check them all during every timer IRQ?

astrocrep wrote:and wake any that are ready (ticks_to_wake_up = system_ticks).

A minor note here - there can be 3 types of time delays.

The first type of time delay sleeps for a set length of time, for e.g. "wake_me_up_after(delay_ticks)". This isn't effected by variations in the system time clock and is probably the most frequently used/misused type of time delay.

The second type of time delay sleeps until a specific time, for e.g. "wake_me_up_when( wake_tick )". This also isn't effected by variations in the system time clock, and is rarely supported by other OSs even though it's extremely useful for avoiding problems where delays are repeated (e.g. locking a game to a specific frame rate). For example:

Code: Select all

    next_tick = get_current_tick();
    for(;;) {
        next_tick += 100;
        wake_me_up_when( next_tick );
        foo();
    }

For this code "foo()" is called every 100 ticks, regardless of how long "foo()" takes to execute and other things. Often people stuff this up and try to use something like:

Code: Select all

    for(;;) {
        wake_me_up_after( 100 )
        foo();
    }

In this case, with unknown CPU speeds, unknown other threads running and no idea how long "foo()" will take to execute, there's no way to determine how often "foo()" will be called (except to say it'd be "100 or more" ticks between calls).

Some people try to do it correctly and write something like this:

Code: Select all

    spent_tick = 0;
    for(;;) {
        wake_me_up_after( 100 - spent_tick )
        start_time = get_current_tick();
        foo();
        spent_time = get_current_tick() - start_tick;
    }

The problem here is that thread switches can occur at the wrong time (e.g. between "wake_me_up_after()" and "start_tick = ") which messes up the timing - there's still no way to determine how often "foo()" will be called (except to say it'd be "100 or more" ticks between calls).

The last type of time delay sleeps until a specific time and *is* effected by variations in the system time clock. For example, if the system time clock is currently (incorrectly) set to the year 1008 and a thread wants to sleep until 9:00 on the first Thursday in June in the year 2008 then you could do some calculations and do "wake_me_up_after(calculated_ticks)" or "wake_me_up_when(calculated_wake_time)", but if someone corrects the system time clock you'll wake up at the wrong time (in this case, you'll be 1000 years too late). AFAIK, due to lack of OS support, most applications use polling in this case. For e.g.:

Code: Select all

    while( wake_time < get_current_time() ) {
        sleep(1)
    }

This works (sort of), but forces the application to compromise between precision (e.g. "sleep(1)" with more frequent checking) and overhead (e.g. "sleep(999)" with less frequent checking). Note: IMHO "polling" always means "pointless checking", but in this case it also means pointless thread switches for pointless checking...

For the first 2 types of time delays your scheduler would want to use something like "if( (remaining_ticks--) <= 0) wake_thread()". For the last type of time delay your scheduler would want to use "if(wake_time <= current_time) wake_thread()".

In all of these cases you might want to wake up when something happens or when the time delay expires. For example, for some OSs you might need to have functions with timeouts (a common example would be the "select()" function). For my OS messaging is used for everything, and "when something happens" means "when a message is received", so I'd be looking for 3 different functions, one for each type of delay (e.g. "get_message_with_timeout(ticks)", "get_message_with_expiry(wake_tick)" and "get_message_until(wake_time)").

If you implement support for all of this you'd need 2 different queues (one for the first 2 types of delays and another for the third type of delay) and you'd need to be able to remove a thread from these queues if the thread is woken up before the time delay expires.

astrocrep wrote:4.) Scheduler will check the Notification Bits, if there is a new thread in a priority GREATER then the current, the Active Priority becomes that priority. All of this is done by calling select_next_task() *** Important things to remember *** If we are processing NORM, and a Very High comes in, the new very high is placed at the HEAD of the very high queue. Furthermore, since we have already completed the very high queue, the previous HEAD was actually the tail. So once the new very high task has completed, the scheduler will attempt to grab the next very high, since its the tail, it goes to the next priority lower...

When a higher priority thread becomes "ready to run", why not just do a thread switch directly to that higher priority thread (instead of setting a Notification Bit and waiting for the scheduler to notice)?

astrocrep wrote:7 - Scheduler Lock Bit - This is set when entering the scheduler and cleared exiting. When entering the scheduler, this bit is checked, if its set, we leave, no harm, no foul. This is important if an active task yields back to the scheduler, for any reason, and as the scheduler is processing, a timer event fires.

What happens if you're in the scheduler deciding which thread to switch to and an IRQ occurs that should unblock a thread? The IRQ handler can't leave the thread blocked just because the scheduler's lock bit is set...

astrocrep wrote:6.) Once here, we have a new task to run, we check the time_to_run, is it 0? If it is 0 we calculate the new time_to_run, if its not we give it a yield bonus of + 1/2 of a new time_to_run.

astrocrep wrote:5.) When a Sleeping thread is awoken, it will be placed back at the HEAD of its queue, given a run boost of 50 ticks, and the scheduler_queue flags will be set to indicate that the priority has items, and has new items if applicable.

These 2 things combined look like a malicious CPU hog's dreams come true. If a thread repeatedly sleeps for as little time as possible just before it's time slice ends, then it gets a 50% yield bonus, gets put on the head of the queue and also gets a 50 tick boost?

Artificial boosts also add other problems. For example, imagine a HTTP server running as a very high priority thread. At a certain load (e.g. 1000 concurrent connections) it might work fine without any problem while using 90% of each time slice, however if something happens (e.g. due to some freak occurance, like several unrelated IRQs happening within the same time slice) and the HTTP server uses a full time slice, then it'll suddenly loses it's yield bonus. Once the yield bonus is gone it'd be unable to sustain a fraction of the load it previously handled and would continually use it's entire time slice trying. A network administrator would need to stop and restart the HTTP server to get the yield bonus back and restore the HTTP server's ability to handle the load.

Lastly, to me the scheduler seems like it may have similar problems to "variable time slicing", which always feels sluggish under load. For example, if there's a very high priority thread (e.g. the GUI), and 20 normal priority threads, 20 low priority threads and 40 idle priority threads running (and none of these threads block); then the very high priority thread would get 100 ticks, the low priority threads would get 30 ticks each and the idle priority threads would get 10 ticks each. That means the very high priority GUI thread would only end up with 5% of CPU time (or "100 / (20*50 + 20*30 + 40*10)") and would need to wait for 2000 ticks between time slices. If each tick is 1 ms, then the very high priority thread would run for 100 ms and then wait for 2 seconds between time slices.

Cheers,

Brendan

astrocrep · Post by **astrocrep** » Sun Jan 13, 2008 9:38 am

Thanks for your reply.

There seems to be a massive amount of usefull information hear... I am going to need to time to digest it all.

Again, thank you very much!

-Rich

OSDev.org

Peer Review: My Scheduler Design

Peer Review: My Scheduler Design

Re: Peer Review: My Scheduler Design