How to preserve Ring 0 Thread register state

b1gb4dw0lf
Posts: 2
Joined: Thu Oct 24, 2019 3:17 am

Post by b1gb4dw0lf »

Hello,

I have implemented a multi-threaded 64-bit kernel based on the JOS framework. I want to add a swap operation as a kernel thread.
Currently it blocks on disk operations. I'd like to make it non-blocking, so I want to be able to schedule kernel threads out before they finish, or while they wait on something else.

Since a kernel thread runs in ring 0, I can't use the syscall path (and don't want to). I want to make a syscall-like call that saves the kthread's frame, so that on reschedule it continues from where it left off.

Basically, I want to load TSS values whenever I call a kernel function from the kernel thread to preserve its register state.
Do you have any suggestions on how to do this? Do I have to implement a whole new handling path for kernel threads? Or is there a way to reuse the existing INT handling? Or some other way, by calling directly?

Thanks!
nullplan
Member
Posts: 1796
Joined: Wed Aug 30, 2017 8:24 am

Re: How to preserve Ring 0 Thread register state

Post by nullplan »

You're not making a lot of sense. You reference the TSS, which makes me think you want to use hardware task switching, but then you also say your kernel is 64-bit. In Long Mode, there is no hardware task switching. Even where it exists, hardware task switching is highly discouraged, since it is the path less travelled and therefore badly tested.

If you already are in ring 0, what is stopping you from saving your task state yourself and switching to another task? I'll try to sketch my task system for you.

In my OS, each thread is in one of several states. Threads that are sleeping interruptibly must belong to some sleep queue and wait for an event to happen. They cannot be scheduled, but they can be awoken with a signal. Threads that are sleeping uninterruptibly cannot be awoken with a signal at all, not even SIGKILL. Runnable threads are part of a queue in the scheduler.

If a thread wants to be scheduled out, it calls a function "schedule()", which takes no arguments and returns nothing. This function picks the next task to run (if all else fails, it takes the idle thread) and calls a function "switch_task()", which takes the current thread's and the next thread's task info structures as arguments. That function saves a few values in the former, calls "arch_switch_task()", then restores those values again. "arch_switch_task()" does the same thing on an arch-specific level, and calls "__arch_switch_task()". Inventive naming, I know. That function, finally, is an assembler function. It saves all non-volatile registers on the stack, saves RSP into the current thread's task info struct (as "kernel stack pointer"), loads RSP from the other thread's task info struct, restores those registers, sets the "current thread" variable, and returns. From the compiler's point of view, an external function is called, and it returns later on (maybe) with those registers intact that were meant to stay that way. All the other registers are assumed to be clobbered by any call anyway.

This means a suspended task is always in the middle of that function. A new thread gets a synthetic stack made up to look as if it were suspended there, so it "returns" to its own start function.
Carpe diem!
b1gb4dw0lf
Posts: 2
Joined: Thu Oct 24, 2019 3:17 am

Re: How to preserve Ring 0 Thread register state

Post by b1gb4dw0lf »

Thanks for the reply!

By TSS, I meant the usage as here for software multitasking: https://wiki.osdev.org/Task_State_Segment

In my kernel, tasks are handled this way:
- A task is created and added to the runq
--- task struct contains a frame
- It is scheduled by calling sched_yield
--- In this function a task's page table is loaded
--- Task's register state is popped by IRETQ
--- I'll leave out the other small things for now, such as status, run count, etc.
- On exit, a task raises an interrupt and enters the kernel, where the kernel state is popped and the task is killed and its resources freed.

When a userland task makes a syscall, it switches to the kernel data segment, so its register state can be preserved.
In my case, I can't find a way to save the register state because I am already running in ring 0, so I am already using the kernel data segment, with a private stack for the thread.
I can get the CPU core's stack by using the `swapgs` instruction; otherwise I can't find a way to accomplish this. However, I couldn't get past general protection faults when I tried to swap back.
Basically, I am lost on how to save a ring 0 task's register state, and where to save it. I hope it is clearer now.

My use case is simply this:

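struct page;
struct page *get_page(void);             // assumed prototypes
void swap_disk_write(struct page *page);
int swap_disk_done(void);                // hypothetical: nonzero once the write has completed
void sched_yield(void);

void kthread_swap(void)
{
    struct page *page = get_page();
    swap_disk_write(page);       // start the write
    while (!swap_disk_done())    // not done yet?
        sched_yield();           // let other threads run instead of blocking
}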
nullplan
Member
Posts: 1796
Joined: Wed Aug 30, 2017 8:24 am

Re: How to preserve Ring 0 Thread register state

Post by nullplan »

IRETQ only restores flags, RSP, RIP, CS, and SS. All other registers you have to save in a different way. I separate these two things out: My "__arch_switch_task()" routine restores the non-volatile registers, and my "__arch_return_to_user()" routine restores the volatile registers and does IRETQ. A kernel task never needs the volatile registers saved, and never needs the IRETQ, and also doesn't need CR3 reloaded, since a kernel task only accesses kernel memory, and that is mapped the same way in all address spaces.

A new user task gets a kernel stack made up of the non-volatile registers, then the address of __arch_return_to_user(), then the volatile registers, then the IRETQ frame. __arch_switch_task() will "return" to __arch_return_to_user() which will "interrupt return" to userspace.

Oh, and if a task should time out (a pretty rare occurrence), the interrupt handler sets a flag in the task information, which causes __arch_return_to_user() to call schedule().