Hi,
Octacone wrote:This is what I've learned:
*Each task should be given a time slice aka quantum, higher priority tasks get smaller quantum while lower priority ones get bigger.
*Interrupts should only preempt tasks if they run out of their quantum aka don't block on time.
Most interrupts (e.g. disk controller's IRQ, network card's IRQ, USB controller's IRQ, etc) should (possibly indirectly, after a sequence of things) cause the currently running task to be preempted (without waiting for the currently running task's time slice to end) whenever they cause a higher priority task to unblock.
For example (for a monolithic kernel); maybe you get an IRQ from the USB controller; so the USB controller driver does some work and sends some data to a USB flash driver, then the USB flash driver does some work and sends some data to some file system code, then the file system code does some work and sends some data to the VFS layer, then the VFS layer does some work and realises that 4 different tasks were all waiting for that data (e.g. because those tasks called "read()" or maybe because of page faults in a memory mapped file area) and tells the scheduler to unblock those 4 tasks. Then, after all these separate pieces have each done a little work (that began with an IRQ), the scheduler realises that one of the tasks that was unblocked has a higher priority than the currently running task and does an immediate task switch.
The timer interrupt is a little special because it's more direct (there's no "driver calls driver that calls driver that..." involved); and because it handles 2 different causes of task switches - waking up sleeping tasks (e.g. if a task calls "sleep(2);" then the task will block, and then 2 seconds later the timer interrupt will realise that 2 seconds have passed and unblock the task again) and checking for "end of time slice".
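For a very rough idea of what the timer IRQ handler ends up doing, something like this (a sketch only - "ticks", "current_task", "sleeping_list", the field names and the helper functions are all assumptions, EOI handling is omitted, and it uses the "unblock_task()" and "Schedule()" functions described further down):
Code: Select all
void timer_irq_handler(void) {
    ticks++;

    // Cause 1: wake up any sleeping tasks whose wake time has arrived
    task_info * task = sleeping_list;
    while(task != NULL) {
        task_info * next = task->next;          // remember "next" before unlinking
        if(task->wake_time <= ticks) {
            remove_from_sleeping_list(task);    // hypothetical helper
            unblock_task(task);                 // may preempt the currently running task
        }
        task = next;
    }

    // Cause 2: check for "end of time slice"
    if(current_task->time_slice_remaining > 0) {
        current_task->time_slice_remaining--;
    } else {
        Schedule();                             // time slice used up; pick another task
    }
}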
Octacone wrote:*Most of the task switches don't happen because of the timer but because the task called System_Call_Block_For_This_Thread_And_Put_It_To_Sleep_And_Switch_To_Another_Thread_While_Waiting();
I'd say that (for crude guesses - it depends too much on what kinds of programs the user feels like running, the scheduler's algorithms, etc):
- 50% of task switches are caused by the currently running task blocking (either because it called a kernel API function like "sleep()" or "read()" or...; or because of a page fault where the page fault handler had to make the task wait for data or memory)
- 25% of task switches are caused by a higher priority task unblocking and preempting the currently running task
- 10% of task switches are caused by a task using its entire time slice
- 8% of task switches are caused by tasks being terminated (or crashing)
- 4% of task switches are caused by tasks being spawned
- 3% of task switches are caused by miscellaneous stuff (e.g. tasks reducing their own priority)
Note that I'd recommend having 2 general purpose functions in your scheduler - one that other kernel code can use for blocking the currently running task, like "block_task(int reason);"; and another that other kernel code can use for unblocking a task, like "unblock_task(task_info * task);".
For "block_task(int reason);" the scheduler would update the currently running task's state (set the reason why the task is blocked) and remove the task from whichever list of list of ready to run tasks it was on, and then call "Schedule()" to choose another task to give CPU time to.
For "unblock_task(task_info * task);"; the scheduler would update the requested task's state (e.g. set the reason why the task is blocked to "none") and put the task back on a list of tasks that are ready to run; then check if the unblocked task should preempt or not, and if it should preempt the scheduler would call "goto_task()" directly (skipping the overhead of choosing which task to switch to because it already knows).
Also don't forget that a lot of the time (especially for micro-kernels - not quite so much for monolithic kernels) task switches are caused by tasks waiting to receive data from other tasks, and then unblocking when some other task sends data to them; where no devices (and no IRQs) are involved at all. For a simple example, one task might call "printf()" causing it to write data to STDOUT, and then a shell or a terminal emulator might be unblocked because it was waiting to receive that data, and because the shell or terminal emulator is likely to be higher priority it will probably preempt; so you end up with "printf() causes a task switch".
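For example, the kernel's pipe (or terminal) write code might end up looking a little like this (purely illustrative - the names and the "single waiting reader" simplification are made up):
Code: Select all
void pipe_write(pipe_info * pipe, const void * data, size_t size) {
    append_to_pipe_buffer(pipe, data, size);    // hypothetical helper
    if(pipe->waiting_reader != NULL) {
        task_info * reader = pipe->waiting_reader;
        pipe->waiting_reader = NULL;
        unblock_task(reader);                   // may preempt the writer immediately;
                                                // this is how "printf() causes a task switch"
    }
}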
Octacone wrote:What I thought I knew but after reading all of this don't seem to know:
*What is a proper way to initialize multitasking? What should my Initialize(); function contain? To just make a fake stack frame (like I'm doing atm) and do an iretd while popping all the necessary stuff.
I like to think of it like this: When the first instruction in the boot loader is executed the OS is already running a task. The problem is that this initial task has none of the meta-data that the kernel wants tasks to have (e.g. where that meta-data might be some kind of "task info block" containing the task's name, how much CPU time the task has used, which virtual address space the task uses, what state the task is currently in, etc; plus maybe an entry in a list of running tasks that the scheduler uses to determine which task gets CPU time next). When the kernel initialises multi-tasking what it's really doing is creating the missing meta-data for a task that already exists.
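In code, "initialising multi-tasking" might be little more than this (a sketch only - the structure fields, constants and "kmalloc()" are assumptions):
Code: Select all
void Initialize_Multitasking(void) {
    // Create the missing meta-data for the task that has been running since boot
    task_info * task = kmalloc(sizeof(task_info));

    task->name = "kernel initial task";
    task->state = TASK_STATE_RUNNING;
    task->priority = PRIORITY_NORMAL;
    task->address_space = kernel_address_space;     // the address space already in CR3
    task->cpu_time_used = 0;

    // No stack or registers need to be set up here - this task is already running on
    // the existing boot stack, and its registers get saved at its first task switch
    current_task = task;
}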
Octacone wrote:*What should I set my initial EBP to? When creating a brand new task.
If you're using a language and calling convention that uses "stack frames", and if you didn't tell the compiler to omit stack frames to improve performance, and if you have some kind of kernel debugger that relies on EBP to find stack frames (and don't have a more modern debugger that doesn't rely on EBP to find stack frames); then I'd be tempted to set EBP to zero so that it's easier for the debugger to know when a "back trace" can't trace back further (to before the task existed). For all other cases EBP is just a normal register and you can do whatever you like with it.
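For example, when building the initial kernel stack for a brand new task you might do something like this (a sketch only; it assumes a software task switch routine that loads the new ESP, pops EBP, EDI, ESI, EBX and then does "ret", and the names are made up):
Code: Select all
uint32_t * stack_top = (uint32_t *)((uint8_t *)new_task->kernel_stack + KERNEL_STACK_SIZE);

*--stack_top = (uint32_t)entry_point;   // consumed by the final "ret" in the switch routine
*--stack_top = 0;                       // initial EBX
*--stack_top = 0;                       // initial ESI
*--stack_top = 0;                       // initial EDI
*--stack_top = 0;                       // initial EBP = 0, so back-traces know where to stop
new_task->esp = (uint32_t)stack_top;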
Octacone wrote:*How am I supposed to save the registers without using multitasking_registers_t* pointer returned by my timer handler? C to Assembly is a one way street? Inline assembly to save them maybe?
I'd write a function in pure assembly (not in inline assembly, where the compiler assumes it has complete control of the stack) to do the low-level task switch; similar to the "goto_task:" example code I showed before. For C code to call your function you'll probably need to tell the compiler that the function is "extern" or something; and you will need to write your assembly to match the calling conventions that your C compiler expects.
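As a rough idea only (32-bit x86, cdecl, no FPU/SSE state, no address space switch; and it assumes the saved ESP is the first field of your "task_info" structure - adjust everything to suit your own kernel), the C side could be just "extern void goto_task(task_info * task);" and the assembly could look like:
Code: Select all
;NASM syntax
global goto_task
extern current_task

goto_task:
    push ebx                  ;Save the callee-saved registers that cdecl requires
    push esi
    push edi
    push ebp

    mov edi,[current_task]    ;edi = address of the old task's "task_info"
    mov [edi],esp             ;Save the old task's ESP (assumed to be the first field)

    mov esi,[esp+20]          ;esi = "task" argument (4 pushes + return address deep)
    mov [current_task],esi    ;The new task is now the current task
    mov esp,[esi]             ;Load the new task's saved ESP

    pop ebp                   ;Restore the new task's callee-saved registers
    pop edi
    pop esi
    pop ebx
    ret                       ;"Return" into the new task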
Octacone wrote:*Should there be a standalone kernel thread? I don't see its purpose, nor how to create it.
You already have one kernel thread (the initial task that has existed since boot); but (assuming it's a monolithic kernel again) eventually you're probably going to want more kernel threads to do various things (e.g. update the DNS cache in the background, figure out which pages to send to swap or prefetch, etc).
Note that there are 2 different ways of handling idle CPUs. For some kernels, if there are no tasks that can be run, the scheduler waits (e.g. does HLT until a task is unblocked) and during this time no task is running at all. For other kernels there's a special "idle task" (that uses the lowest priority possible and never blocks) to make sure that there's always at least one task that the scheduler can give CPU time to.
The latter is easier because you don't need to handle the "no task running" case anywhere; but the former is a little more powerful (especially when you start doing power management and need to do things like change CPU speed).
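If you go with the "idle task" approach, the idle task itself can be trivial (a sketch, assuming a kernel task that runs with interrupts enabled):
Code: Select all
// Runs at the lowest possible priority and never blocks, so the scheduler
// always has at least one task it can give CPU time to
void idle_task(void) {
    for(;;) {
        __asm__ __volatile__("hlt");    // wait for the next IRQ instead of spinning
    }
}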
Octacone wrote:*How is my task supposed to enable IRQs? Within Switch_To_Thread(); function?
No; you'd disable IRQs (before the call) and enable IRQs again (after the call returns) in all the pieces of code that call the "Switch_To_Thread()" or "Schedule()" functions. The "Schedule()" and "Switch_To_Thread()" functions themselves wouldn't touch interrupts at all.
Part of the reason for this is to avoid race conditions. For example, you might have many pieces of code scattered throughout the kernel that do some work and then ask the scheduler to do a task switch, where you do not want an IRQ happening in between the work and the task switch; so you disable IRQs, then do the work, then do the task switch, to make sure an IRQ can't interrupt between the work and the task switch.
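For example (a sketch only - the names and helpers are made up), the kernel's "sleep()" handler might look like:
Code: Select all
void kernel_sleep(uint32_t milliseconds) {
    Hardware::Disable_Interrupts();     // no IRQ can sneak in between the work and the switch

    // The "work": put the task where the timer IRQ handler will find it later
    current_task->wake_time = ticks + milliseconds_to_ticks(milliseconds);  // hypothetical helper
    add_to_sleeping_list(current_task);                                     // hypothetical helper

    block_task(TASK_STATE_SLEEPING);    // switch to some other task

    Hardware::Enable_Interrupts();      // executed when this task is given CPU time again
}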
Octacone wrote:*Again I can't understand how is your idea supposed to work. There is no way to return to Hardware::Enable_Interrupts(); since threads don't return anywhere. One you switch context there is not way to get back to any kernel code (except if an IRQ occurs but that will just take just to Hardware::Disable_Interrupts();) after that point. I know I might be wrong, again I don't understand this at all.
Imagine if "Switch_To_Thread()" did literally nothing, like this:
Code: Select all
void Switch_To_Thread(thread_info * thread) {
    // Do nothing at all
}
And you have a piece of code somewhere that does this:
Code: Select all
Hardware::Disable_Interrupts();
Switch_To_Thread(thread);
Hardware::Enable_Interrupts();
Obviously in this case the "Hardware::Enable_Interrupts();" would be executed, because "Switch_To_Thread(thread);" doesn't do anything at all.
Now, what if your task calls "Switch_To_Thread(thread);" and it did a task switch and then 123 other tasks were given CPU time; but eventually your task gets CPU time again and "Switch_To_Thread(thread);" returns as if nothing happened? This is almost exactly the same as before - the only difference is that (as far as you can tell) "Switch_To_Thread(thread);" took a long time to do nothing. It still returns like any other function would; and "Hardware::Enable_Interrupts();" is still executed.
Octacone wrote:*Circular linked list doesn't have an end, how am I supposed to check for a null pointer in order to determine when to switch to the next lower priority queue?
For a circular linked list, if there's anything on the list you have a pointer to something on the list; and if the list is empty there's nothing to point to, so the pointer you keep for that list is NULL.
This was shown in the example code I posted previously:
Code: Select all
// WARNING: Caller is responsible for making sure IRQs are disabled before calling this!
void Schedule(void) {
    thread_info * thread;

    if(high_priority_queue != NULL) {
        thread = high_priority_queue;
        high_priority_queue = thread->next;
    } else if(medium_priority_queue != NULL) {
        thread = medium_priority_queue;
        medium_priority_queue = thread->next;
    } else if(normal_priority_queue != NULL) {
        thread = normal_priority_queue;
        normal_priority_queue = thread->next;
    } else if(idle_priority_queue != NULL) {
        thread = idle_priority_queue;
        idle_priority_queue = thread->next;
    } else {
        // No tasks are ready to run at all!
        return;
    }
    Switch_To_Thread(thread);
}
Cheers,
Brendan