Multi-tasking issue with hardware task switching (Intel)
Posted: Sun Apr 11, 2021 3:06 am
Hello,
I wrote a preemptive multitasking kernel that relies entirely on Intel (i386) hardware task switching.
Exceptions and hardware interrupts are handled through task gates. Consequently, when an exception or a hardware interrupt is triggered, the CPU automatically switches to the corresponding handler task.
The scheduler is a task as well. It is triggered by the hardware timer (PIT) at a frequency of 100 Hz.
My kernel is very simple: the number of tasks is fixed and the scheduler simply switches to the next task to be executed in a round-robin fashion.
To execute the next task, here is what I do:
1) Update the EFLAGS register to set the NT (Nested Task) bit to 1: this tells the CPU we're in a nested task
2) Set the busy flag in the TSS descriptor of the next task we're about to switch to
3) Make the next task the current task's parent by writing the next task's TSS selector into the current task's TSS previous task link. Note that the current task is the scheduler itself.
4) Make the scheduler task the parent of the next task, so that executing iret in the next task returns to the scheduler
5) Execute the iret instruction to switch to the next task
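To make the bookkeeping in steps 1 to 4 concrete, here is a minimal sketch that applies them to an in-memory model instead of the real CPU state. The names `task_model` and `prepare_switch` are hypothetical and not taken from my kernel; no privileged instruction is executed, so this only shows which fields get written. The NT flag is bit 14 of EFLAGS, the busy flag is bit 1 of the TSS descriptor type field (0x9 available, 0xB busy), and the previous task link is the first 16-bit field of the TSS.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_NT     (1u << 14) /* Nested Task flag, bit 14 of EFLAGS */
#define TSS_TYPE_BUSY 0x2u       /* busy bit of the TSS descriptor type field */

/* Hypothetical model of one hardware task (not the real TSS layout). */
struct task_model {
    uint16_t selector;       /* GDT selector of this task's TSS */
    uint16_t prev_task_link; /* models the TSS previous task link field */
    uint8_t  desc_type;      /* models the TSS descriptor type: 0x9 or 0xB */
};

/*
 * Applies steps 1-4 to the model and returns the selector that the
 * iret of step 5 would use as its target, i.e. the current (scheduler)
 * task's previous task link.
 */
static uint16_t prepare_switch(uint32_t *eflags,
                               struct task_model *sched,
                               struct task_model *next)
{
    assert(sched != next);

    *eflags |= EFLAGS_NT;                   /* step 1: set NT */
    next->desc_type |= TSS_TYPE_BUSY;       /* step 2: mark next task busy */
    sched->prev_task_link = next->selector; /* step 3: scheduler links to next */
    next->prev_task_link = sched->selector; /* step 4: next links back to scheduler */

    return sched->prev_task_link;           /* step 5 would iret to this task */
}
```

In the real kernel the same writes go to EFLAGS, the GDT entry, and the two TSS structures; the model only makes the order and the targets of those writes explicit.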
This works well: each task gets scheduled properly in a round-robin fashion, as expected.
However, this implementation suffers from two issues:
- Executing iret in a task to go back to the scheduler triggers a general protection fault (GPF)
- Once in the exception handler, which is also a task, I'm not able to go back to the scheduler more than once without triggering a double fault exception.
In the exception handler, I would like to "kill" the offending task that triggered the exception (such as a GPF). The idea is to mark this task as no longer schedulable and switch to the scheduler.
These are the steps I wrote to do this:
1) Mark the task as no longer schedulable (a boolean I keep in the metadata of each task); the scheduler will then simply skip this task.
2) Update the EFLAGS register to set the NT (Nested Task) bit to 1: this tells the CPU we're in a nested task
3) Set the busy flag in the TSS descriptor of the scheduler task we're about to switch to
4) Make the scheduler task the parent of the current task (the exception handler) by writing the scheduler's TSS selector into the current task's TSS previous task link.
5) Execute the iret instruction to switch to the scheduler task
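Step 1 and its effect on the round-robin selection can be sketched as follows. This is a simplified illustration, not my actual code: `NUM_TASKS`, `task_meta`, `kill_task` and `pick_next` are hypothetical names, and the sketch assumes the task metadata is a plain array indexed by task number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TASKS 4 /* hypothetical fixed number of tasks */

/* Hypothetical per-task metadata holding the "schedulable" boolean. */
struct task_meta {
    bool schedulable;
};

/* Step 1 of the exception handler: mark the offending task as dead. */
static void kill_task(struct task_meta tasks[], size_t offender)
{
    assert(offender < NUM_TASKS);
    tasks[offender].schedulable = false;
}

/*
 * Round-robin selection that skips killed tasks. Starts at the task
 * after `current` and wraps around. Returns NUM_TASKS if no task is
 * schedulable any more.
 */
static size_t pick_next(const struct task_meta tasks[], size_t current)
{
    for (size_t step = 1; step <= NUM_TASKS; step++) {
        size_t candidate = (current + step) % NUM_TASKS;
        if (tasks[candidate].schedulable)
            return candidate;
    }
    return NUM_TASKS;
}
```

For example, with four schedulable tasks, killing task 1 makes `pick_next(tasks, 0)` return 2 instead of 1.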
This works once: the CPU switches to the scheduler, which then schedules the next task.
All is well until a new exception is triggered (for instance by a task executing a privileged instruction). When another task triggers an exception and the code above is executed again, the CPU generates a double fault!
Does anyone have an idea what I am doing wrong above?
- Why does executing iret in a task to go back to the scheduler trigger a general protection fault?
- Why doesn't the way I "kill" a task work more than once?
From my understanding of how the CPU works, what I do should work, but clearly there is something I don't understand.
Thanks a lot for your help.