What is the problem with this code? Tasks and scheduling


What is the problem with this code? Tasks and scheduling

Post by Danyy »

I know this is not a very good post, but I can't work out why this code has a problem or how I might improve it. I also have some questions about tasking in general. Thank you in advance.

So I recently started writing a scheduler for basic tasks. I haven't even switched to ring 3; I am just trying to run some basic tasks in kernel mode. Here's my task structure:

Code: Select all

typedef struct task
{
	uint64_t PID;
	uint64_t entry;
	uint64_t privilege;
	uint64_t taskState;
	char * name;
	char * description;
	task_registers_t state;
	uint64_t kernel_stack;
	uint64_t user_stack;
	heap_t heap;
	pml4_t * pml4;
	struct task * next;
	
} __attribute__((packed)) task_t;

enum taskStates {
	NOTRUN = 0,
	RUNNING = 1,
	WAITING = 2,
	BLOCKED = 3,
	DELETE = 4
};


I have a lot of fields, but most are unused at this point. I will probably use them all eventually, so I added a bunch. Now I define two tasks: one for the kernel, and the other is a random task.

Code: Select all


void task1main()      // This is the random task
{
	printk("Hello world");

	exit(0);      // This basically marks the task as DELETE (meaning we should delete it at one point) and just halts
}
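
/* For illustration only (assumed implementation; not shown in the
   original post): a minimal exit() matching the comment above. It
   marks the task DELETE and halts until the scheduler switches away. */
void exit(int code)
{
	(void)code;                       // status code unused for now
	currentTask->taskState = DELETE;  // delete it at some point
	for(;;)
		asm("hlt");                   // wait for the next timer IRQ
}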

void initTasking()
{
	task_t *newTask = kalloc(&currentHeap, sizeof(task_t));
	task_t *kernelTask = kalloc(&currentHeap, sizeof(task_t));

	newTask->PID = 2;
	newTask->entry = (uint64_t)task1main;  // cast: entry is stored as a uint64_t
	newTask->privilege = 0;
	newTask->name = "TestTask";
	newTask->taskState = NOTRUN;
	newTask->description = "Test";
	newTask->next = 0;

	kernelTask->PID = 0;
	kernelTask->entry = 0x10000;
	kernelTask->privilege = 0;
	kernelTask->name = "Kernel";
	kernelTask->taskState = RUNNING;
	kernelTask->description = "Kernel";
	kernelTask->next = newTask;

	currentTask = kernelTask;
	taskQueue = kernelTask;

	isTasking = 1;
}
Here's my thinking on initialization. I will do all scheduling in the IRQ handler, so I already have the state (registers) of the currently running task. So I define the kernel task and set appropriate values. I set it to RUNNING since we are already in the kernel, meaning all its registers need to be saved and reloaded later. I set the other task to NOTRUN, meaning it has never run, so all of its saved state registers are 0, including RIP.

Here's the scheduler:

Code: Select all

void schedule(task_registers_t *currentState)
{
	if(isTasking == 1)
	{
		if(currentTask->taskState == RUNNING)
		{
			currentTask->taskState = WAITING;                                    // make the current process waiting
			memcpy(&currentTask->state, currentState, sizeof(task_registers_t)); // save the state

			task_t *nextTask;
			if(currentTask->next != 0)
				nextTask = currentTask->next;
			else
				nextTask = taskQueue;

			if(nextTask->taskState == NOTRUN)               // if it has never been run before
			{
				nextTask->taskState = RUNNING;              // let's run it
				nextTask->state.ss = currentTask->state.ss;
				nextTask->state.cs = currentTask->state.cs;
				nextTask->state.rflags = currentTask->state.rflags;
				nextTask->state.rsp = currentTask->state.rsp;
				nextTask->state.rbp = currentTask->state.rbp;
				nextTask->state.rip = nextTask->entry;      // the next task's RIP is its entry point
				memcpy(currentState, &nextTask->state, sizeof(task_registers_t)); // write it into the frame

				currentTask = nextTask;
				return;
			}
			else if(nextTask->taskState == WAITING)         // if it is waiting to be run
			{
				nextTask->taskState = RUNNING;              // let's run it
				memcpy(currentState, &nextTask->state, sizeof(task_registers_t)); // copy its state into the frame

				currentTask = nextTask;
				return;
			}
		}
	}
}
We run this in the IRQ 0 handler like this:

Code: Select all

void irq0_handler(interrupt_frame_t* frame)
{
	EnvironmentTick++;
	//printRegisters((task_registers_t*)frame);
	
	schedule((task_registers_t*)frame);
	
	default_irq_handler(frame);
}

Here's what this code does: if the current task is RUNNING (which we assume it is), first mark it WAITING and save the current registers into the task's state field. Then check the next task. If the next pointer is 0, we are at the end of the linked list, so we go back to the start of the queue; otherwise we fetch that task. If the next task is NOTRUN, set its segment and stack parameters to those of the current task — this is required because if we don't supply a valid cs, ss, rsp, etc. we get a GPF. Then write the next task's registers into the frame and return. The IRQ stub will pick up the values from the frame and put them where they need to go, so there is no problem after this. If the task has run at least once and is WAITING, just load its state into the frame. Finally, set currentTask to nextTask so that we switch to another task the next time.
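
For reference, that cast only works if task_registers_t mirrors the frame my interrupt stub builds. Roughly, the layout is something like this (the exact order of the general-purpose registers is just whatever the stub pushes):

Code: Select all

typedef struct
{
	/* pushed by the interrupt stub, in the stub's own order */
	uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
	uint64_t rdi, rsi, rbp, rdx, rcx, rbx, rax;
	/* pushed by the CPU on interrupt entry */
	uint64_t rip, cs, rflags, rsp, ss;
} __attribute__((packed)) task_registers_t;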

Problems with this code:

1. If I don't put while(1) at the end of the task I am running, I get an invalid-opcode exception, meaning the code jumps to random places.

2. If, instead of printing one line, I print a single letter in a loop with a delay, I get alternating letters of output (which is expected). For example, if I give
while(1) {printk("b"); for(int i = 0; i<1000000; i++);} to the kernel
and
while(1) {printk("c"); for(int i = 0; i<1000000; i++);} to the other task,

I get alternating letters like bcbcbbcbcbcbbcbcbcbcbcbcbcbcbcbcb,
which is expected, considering the switch is successful. However, after a while I suddenly get a page fault at RIP 0x7C, which is interesting, because my code shouldn't be jumping to random places.

Questions about this code:

1. Is this a good way to go about scheduling?

2. Are there obvious bugs in the code that you see and I am too blind/ignorant to notice?

3. How am I "supposed" to exit the task and how am I supposed to kill it?

General questions about tasking:

1. How do I apply these to ring 3?

2. What is a kernel stack per task, and why do we need it?

As far as I understand, it is for storing "things" for a task while it is executing kernel code, i.e. during an interrupt or a system call.

I would be more than happy to post more code if you require it.

Thank you once more.

Re: What is the problem with this code? Tasks and scheduling

Post by nullplan »

Danyy wrote:1. Is this a good way to go about scheduling?
No. There are several problems with this design. The way you are scheduling makes it impossible to schedule a new task from a context that isn't an interrupt. So how will you implement blocking system calls? You cannot overwrite the current task state then; you still need it when the time comes to return. You have tied the scheduler hard to interrupts, when that was really not necessary.

There are also technical problems with the implementation. So you have a single list of tasks. OK, that's great. But what if the next task is not currently runnable? You need to search for a task to execute. What if no task is runnable? I avoid that by having an idle task that is always runnable.

Then there are the stack problems, which are likely the source of your issues. I shall talk about them a little further down.
Danyy wrote:3. How am I "supposed" to exit the task and how am I supposed to kill it?
I would just mark the current task as "dead" and call schedule() in an infinite loop. schedule() will pass over dead tasks and the task allocator will re-use dead tasks. Alternatively, you could have the idle task scan the task list for dead tasks and remove them from the list.
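
For illustration, a minimal sketch of that exit path, reusing the names from your post (DEAD is an assumed extra task state):

Code: Select all

void exit_task(void)
{
	currentTask->taskState = DEAD;  /* schedule() passes over dead tasks */
	for(;;)
		schedule();                 /* never run this task again */
}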
Danyy wrote:1. How do I apply these to ring 3?
If you view user space tasks as kernel space tasks that just return to user space at the highest level of the kernel stack, you notice that user space and scheduling are independent issues.
Danyy wrote:2. What is a kernel stack per task, and why do we need it?
You need that because otherwise you get those weird problems you are currently having. Your tasks share a stack. This means that while one task is executing and putting things on stack, it can get interrupted at any time and let the other task resume. Then the other task will write into the same memory area, and they will just interfere with each other. Now, AMD64 uses registers for lots of things, so maybe you can get away with some stack corruption. But at some point someone is going to need to return from a function where the return pointer was overwritten. You assert that your tasks don't jump to random places, but the return instruction is an indirect jump, and if the return pointer was overwritten, it will jump to random places.

That's why you need to allocate a kernel stack for each task. I typically just allocate four pages of virtual memory, of which I set the first one to "no access" and the other three to read/write. At the very end of this memory area, I put a task descriptor. The rest is kernel stack. The task switcher will, in the end, just push all non-volatile registers, save rsp into the task structure, load rsp with the one from the new task structure, pop all registers, and return. Therefore, for a new task, I will put some dummy values on the kernel stack, then the address of a trampoline function, then whatever the trampoline needs. Basically, I create a new task as if it was mid-way through a scheduling call. That also means my task switcher does not have to care about task state. The initialization function sets up everything so it works when the task is scheduled in.
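
As a sketch, under assumed helper names (vmm_alloc_pages() and vmm_set_noaccess() are hypothetical), the allocation looks something like:

Code: Select all

#define KSTACK_PAGES 4
#define PAGE_SIZE    4096

struct task *alloc_task(void)
{
	char *area = vmm_alloc_pages(KSTACK_PAGES);  /* 4 pages of virtual memory */
	vmm_set_noaccess(area);                      /* first page: guard page */
	/* task descriptor at the very end of the area; the rest is kernel stack */
	struct task *tsk = (struct task *)(area + KSTACK_PAGES * PAGE_SIZE) - 1;
	tsk->rsp = (uint64_t)tsk;                    /* stack grows down from the descriptor */
	return tsk;
}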
Carpe diem!

Re: What is the problem with this code? Tasks and scheduling

Post by Danyy »

nullplan wrote: The way you are scheduling makes it impossible to schedule a new task from a context that isn't an interrupt. […] That's why you need to allocate a kernel stack for each task. […]
So, first, should I get rid of the schedule() call in the IRQ handler? The reason I put it there was that it gave me access to the current state of the running task. I thought about pushing all registers to the stack and retrieving them from there, but I couldn't find a way to push RIP, since pushing RIP doesn't really make sense. How can I store the RIP?

Also, do I remove schedule() entirely from the IRQ handler? If so, where do I do the scheduling?

Re: What is the problem with this code? Tasks and scheduling

Post by nullplan »

Danyy wrote:So, first, should I get rid of the schedule() call in the IRQ handler? The reason I put it there was that it gave me access to the current state of the running task. I thought about pushing all registers to the stack and retrieving them from there, but I couldn't find a way to push RIP, since pushing RIP doesn't really make sense. How can I store the RIP?
You push RIP every time you call a function. Here's how I do it: My task switch primitive is this:

Code: Select all

/* void arch_lowlevel_task_switch(uint64_t *this_rsp, uint64_t next_rsp); */
.global arch_lowlevel_task_switch
.type arch_lowlevel_task_switch, @function
arch_lowlevel_task_switch:
  pushq %r15
  pushq %r14
  pushq %r13
  pushq %r12
  pushq %rbp
  pushq %rbx
  movq %rsp, (%rdi)
  movq %rsi, %rsp
  popq %rbx
  popq %rbp
  popq %r12
  popq %r13
  popq %r14
  popq %r15
  retq
.size arch_lowlevel_task_switch, .-arch_lowlevel_task_switch
My task structure contains a member "rsp". The higher-level task switcher just calls this function, passing a pointer to the current task's rsp member and the next task's rsp. Once the current task gets scheduled in again, the non-volatile registers are restored and everything is all right. Importantly, this can be called from any place and can switch to any place. As a matter of policy, I never wait in interrupt handlers, but with this setup I could.

Creating a new task therefore means filling the stack with a first image that the above task switcher can use to start the task. So for a kernel task:

Code: Select all

extern const char startkthread[];
struct init_kthread {
  /* field order mirrors the stack image the switcher pops, lowest
     address first: the six kregs feed the pops, rip is consumed by
     retq, and func/arg are left on the stack for startkthread */
  uint64_t kregs[6];
  uint64_t rip;
  void (*func)(void*);
  void *arg;
};
int spawn_kernel_thread(void (*func)(void*), void* arg) {
  struct task *tsk = get_task_struct();
  if (!tsk)
    return -1;
  struct init_kthread *kt = (void*)((char*)tsk - sizeof (struct init_kthread));
  kt->rip = (uint64_t)startkthread;
  kt->func = func;
  kt->arg = arg;
  tsk->rsp = (uint64_t)kt;
  return 0;
}
Note: I allocate the kernel stack for a task below the task descriptor.

Then the only thing startkthread needs to do is set the "current task" variable, put "func" and "arg" into the right registers, do an indirect call, then jump to "exit_thread". Source code is left as an exercise to the reader. As is the version of that code that spawns a userspace thread.
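
For a rough idea only (the real code stays an exercise; set_current_task_from_rsp and exit_thread are assumed names, and this relies on the init_kthread layout above):

Code: Select all

/* entered via the retq in arch_lowlevel_task_switch; at this point
   RSP points at the func member, with arg directly above it */
.global startkthread
startkthread:
  call set_current_task_from_rsp  /* assumed helper: derive the task from RSP */
  popq %rax                       /* func */
  popq %rdi                       /* arg, the first argument per the SysV ABI */
  callq *%rax                     /* func(arg) */
  jmp exit_thread                 /* assumed: tear the thread down */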

Now you can already do cooperative multi-tasking and switch when the tasks yield. To make it pre-emptive, the only thing you need in addition is a timer interrupt, yes, but only yield from the handler after you acknowledge the interrupt. The way I do it is: the task struct contains a field for general information flags. If the flag for "needs scheduling" is set when returning to userspace, I don't return to userspace but run the scheduler instead, and then return to userspace afterwards (so after the task is scheduled in again). The timer interrupt handler merely sets that flag, as does any other interrupt that causes a higher-priority task to be unblocked. This allows me to handle interrupts normally and not intermingle the interrupt controller code with driver code or kernel logic.
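
Applied to the IRQ 0 handler from the first post, that approach would look something like this (TI_RESCHED and get_current_task() being assumed names):

Code: Select all

void irq0_handler(interrupt_frame_t *frame)
{
	EnvironmentTick++;
	get_current_task()->flags |= TI_RESCHED;  /* just mark; don't switch here */
	default_irq_handler(frame);               /* acknowledge the interrupt */
	/* the actual schedule() happens on the way back out of the kernel */
}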
Carpe diem!

Re: What is the problem with this code? Tasks and scheduling

Post by Danyy »

nullplan wrote: You push RIP every time you call a function. […] Now you can already do cooperative multi-tasking and switch when the tasks yield. […]
What I don't understand is how you store the RIP of the current task. I know that each time we call a function the RIP is pushed onto the stack; however, you do not call the low-level task switch directly from the task itself. You call it from your schedule() or switch_task() function, and when you call it you get the RIP of that function, not of the task that called switch_task().

You set kt->rip to be the start of the task, i.e. the entry point. I also don't understand how you stop a task and switch to the next one. Yes, you do your low-level switch, but where do you do it, if the only thing the IRQ does is mark the task as needing to be scheduled? How do you call your switch_task()?

I also still don't understand why my way is defective. I understand that I shouldn't be relying solely on the IRQ to switch tasks, but technically I don't(?). I could switch a task anywhere if I have the state of the current task.

Re: What is the problem with this code? Tasks and scheduling

Post by nullplan »

Danyy wrote:What I don't understand is how you store the RIP of the current task. I know that each time we call a function the RIP is pushed onto the stack; however, you do not call the low-level task switch directly from the task itself. You call it from your schedule() or switch_task() function, and when you call it you get the RIP of that function, not of the task that called switch_task().
But if I have the RIP and RSP of the calling task, I have the context to restore back to a higher level. Sure, the lowest RIP on the stack is just my arch_switch_task(), and that one only has the RIP of schedule() to return to. But then the stack frame of schedule() can return to wherever it was called from. And if, after some number of stack frames, it turns out that the task interrupted was a user space task, then the user space context will be at the very top of the stack, being restored before returning to user space.

You can think of the call stack as a sequence of stack frames. Technically a linked list, but its structure is not so set in stone. By saving RSP, I am not only saving the lowest stack frame, but the entire sequence of stack frames. And yes, the lowest few of them might always or often be very similar, but they can diverge further up.
Danyy wrote:You set kt->rip to be the start of the task, i.e. the entry point. I also don't understand how you stop a task and switch to the next one. Yes, you do your low-level switch, but where do you do it, if the only thing the IRQ does is mark the task as needing to be scheduled? How do you call your switch_task()?
Yes, that was the example for spawning. The actual switch is more like

Code: Select all

void schedule(void) {
  struct task *next_task = select_next_task(); /* left as an exercise for the reader. */
  arch_switch_task(next_task);
}
/* in arch-specific code: */
void arch_switch_task(struct task *next) {
  struct task *this = get_current_task();
  if (this->flags & TI_FPU)
    arch_fpusave(this->fpusave_area);
  if (this->flags & TI_DEBUG)
    arch_disable_debugregs();
  arch_lowlevel_task_switch(&this->rsp, next->rsp);
  /* when we get here, 'this' has been scheduled back in. */
  set_current_task(this);
  if (!(this->flags & TI_KERNEL)) {
    arch_load_cr3(this->cr3);
    arch_set_tss_rsp0(this);
  }
  if (this->flags & TI_DEBUG)
    arch_load_debugregs(this);
  if (this->flags & TI_FPU)
    arch_fpuload(this->fpusave_area);
}
schedule() can be called from anywhere. As I said, I handle the case of having to reschedule in the highest level interrupt handling code, namely the portion that returns to user space:

Code: Select all

return_to_userspace:
/* situation: Interrupt / system call has been handled. Handler has returned here. We have already determined that we will return to user space. r15 points to registers on stack, followed by IRET frame */
  movq %gs:CPU_CURRENT, %rax
  testq $TI_RESCHED, TASK_FLAGS(%rax)
  jz 1f
  movq %r15, %rdi            /* pass the saved register frame to reenter_kernel */
  call reenter_kernel
1:
  movq %r15, %rsp
  RESTORE_REGS
  iretq

Code: Select all

void reenter_kernel(struct regs *regs) {
  struct task *this = get_current_task();
  if (this->flags & TI_RESCHED) {
    this->flags &= ~TI_RESCHED;
    schedule();
  }
}
Danyy wrote:I also still don't understand why my way is defective.
All your tasks are sharing the same stack and end up clobbering each other's call frames. That's a pretty major defect. You also have no way to schedule anything that isn't in an interrupt. That also means you cannot schedule out of a system call without losing the context of the system call. For example, in my case, if a user process calls read() and the read is blocking, I can just call schedule() out of the loop that waits for data to arrive. You cannot.
Danyy wrote:I could switch a task anywhere if I have the state of the current task.
You have the state of the current task all the time; you can just call the task switcher, and it can see all the registers it wants.
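
For example, a blocking wait in a driver can then simply be (a sketch; mark_current_task_blocked() is an assumed helper):

Code: Select all

void wait_for_data(struct device *dev)
{
	while(!dev->data_ready)
	{
		mark_current_task_blocked();  /* scheduler will skip us until woken */
		schedule();                   /* switch away; we resume right here */
	}
}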
Carpe diem!

Re: What is the problem with this code? Tasks and scheduling

Post by Danyy »

nullplan wrote: But if I have the RIP and RSP of the calling task, I have the context to restore back to a higher level. […] schedule() can be called from anywhere. […]
OK, so let me see if I got everything correct (or if I am an idiot) and you'll be rid of me forever.

First, we have a switch-task function, which basically does the following: when it's called, we know that the first thing on the stack is the caller's RIP. We don't care who the caller is; we just care that they wanted to switch tasks (the caller in this case is the high-level switch task or schedule(), but it can be anything). They push their RIP onto the stack, we push the registers that we further want to save onto the stack, and we now have a stack that looks something like this:

rax rcx rbx rdx r8 r9 ... r15 rdi rsi rip

The first thing on the stack is the register we pushed last, and the last thing is the RIP that the caller pushed. We take the current RSP and store it somewhere in our task_t structure (in my case task_t.kernel_stack).

Then we get the next task's RSP as a passed parameter (it doesn't have to be just the RSP; it could be the whole task_t structure). By the next task I mean the next runnable task in the task queue. We load the stack from there, which has the same layout as the stack frame above. We pop all the registers in reverse order, except for RIP, which is popped automatically by ret.

Now, after this, everything starts becoming fuzzy for me, so bear with me. After we return from the low-level task switch, we are in the high-level task switch that caused the switch to the task we are currently switching to. This task switch also returns to its caller, which is schedule(). Of course, schedule() also has a caller, which is IRQ 0, but it really can be anything.

When the IRQ first happened, we saved the current task's RIP on the stack and called schedule(), which called the high-level task switch, which called the low-level task switch; so at every step we stored a RIP on the stack that we then saved into our task_t structure. So we just return from the low-level switch to the high-level switch, which returns to schedule(), which returns to the task we want to switch to, and everything works. When a new interrupt kicks in, we rinse and repeat in reverse order. But it doesn't have to be an IRQ per se; we could just call schedule() from the kernel and return to that location once we return to the kernel task in the queue. So let me see if I can "draw" a diagram illustrating each step. We have two tasks which we want to switch between:

Task1() -> IRQ0() -> returnFromInterrupt() -> schedule() -> highLevelSwitch() -> lowLevelSwitch()

Now we have the RIPs for all the steps above in the current stack, since we didn't pop or ret anything. We also saved the current stack and loaded the new stack of the task we are switching to, which contains the same structure of RIPs:

ret lowLevelSwitch() -> ret highLevelSwitch() -> ret schedule() -> ret returnFromInterrupt() -> Task2()

Now we are in the next task. The same thing happens when a new IRQ 0 fires (assuming the next runnable task is Task1 once again):

Task2() -> IRQ0() -> ... -> lowLevelSwitch()

ret lowLevelSwitch() -> ret ... -> Task1()

And rinse and repeat.

Also, about the stacks: don't worry about those. I am aware that I should allocate a new kernel stack for each task, or these stack frames would overwrite each other or cause overflows. If I am correct up to this point, please let me know. I am having a hard time sometimes, but the system is beautiful (if I understood it correctly).

Re: What is the problem with this code? Tasks and scheduling

Post by nullplan »

Danyy wrote:First, we have a switch-task function, which basically does the following: when it's called, we know that the first thing on the stack is the caller's RIP. We don't care who the caller is; we just care that they wanted to switch tasks (the caller in this case is the high-level switch task or schedule(), but it can be anything). They push their RIP onto the stack, we push the registers that we further want to save onto the stack, and we now have a stack that looks something like this:

rax rcx rbx rdx r8 r9 ... r15 rdi rsi rip
Well, yes, except that there is no reason to preserve RAX, RCX, RDX, RDI, RSI, R8, R9, R10, or R11. The place where you need to preserve those is on entry to the kernel (e.g. interrupt entry). The low-level switch function is an ABI-compliant function and only needs to preserve the registers the ABI designates as callee-saved (RBX, RBP, RSP, and R12–R15 in the System V AMD64 ABI).

The rest you wrote is essentially correct.
Carpe diem!