Is this Context Switch correct?

Posted: Mon Sep 15, 2014 4:47 pm
by KemyLand
Hi!

I've just baked my ContextSwitch asm routine for x86. I'll give you some info before the code:

I have a TaskDigest struct which holds ONLY the info needed for task switching and other critical operations. It is generated from both a Process* and a Thread*. I use it to reduce the number of stack offsets I have to count when writing asm :oops: Here's its code:

Code:

typedef struct
{
    RegisterSet *regs; // Defined in an arch-specific header, included per architecture
    MemorySpaceDescriptor *mmdesc; // Same as above
} TaskDigest;
Okay. Inside my PMSwitchContext(Bool autoselect, PID pid, Size tid):

Code:

	/* If autoselect == true, ignore pid/tid and continue the chain. If false, switch to pid/tid */
	Process *newproc = autoselect ? _runningProcess->next : PMGetProcess(pid);
	Thread *newthread = PMGetThread(newproc, tid);
	
	TaskDigest olddigest = { .regs = &(PMCurrentThread->regs), .mmdesc = &(PMCurrentProc->memorySpace) };
	TaskDigest newdigest = { .regs = &(newthread->regs), .mmdesc = &(newproc->memorySpace) };
	
	/*
	Atomic_NE() is a macro that acquires a specific system resource but doesn't release it. It is completely abstract:
	it could be a spinlock, an interrupt-disable, or a semaphore. In this case, ATOM_MASTER is an interrupt-disable that
	blocks the entire system, because this operation is not for the faint of heart.
	ContextSwitch() is an ArchImplement-ish function which (obviously) changes
	to another context. ContextSwitch() is responsible for disabling the atomicity as
	its final act before the actual far jump. If it fails, a Kernel Panic will occur. (A
	simple exception message here isn't enough to know how to repair the damage done.)
	*/
	
	Atomic_NE(ATOM_MASTER,
		ContextSwitch(&olddigest, &newdigest);
	);
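To make the acquire-without-release idea concrete, here is a minimal sketch of how such a macro could look. It is an assumption, not the actual Atomic_NE(): `AtomAcquire` and `model_interrupts_enabled` are hypothetical names, and the flag merely stands in for a real `cli` so the sketch can run in user space.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ATOM_MASTER } AtomKind;

/* Stand-in for the CPU interrupt flag (hypothetical; a kernel would use cli). */
static bool model_interrupts_enabled = true;

static inline void AtomAcquire(AtomKind kind)
{
    assert(kind == ATOM_MASTER);      /* only this kind is modelled here */
    model_interrupts_enabled = false; /* "acquire": interrupts go off */
}

/* Acquire the resource, run the code, and deliberately do NOT release:
   the code passed in is expected to release as its final act. */
#define Atomic_NE(kind, ...) \
    do {                     \
        AtomAcquire(kind);   \
        __VA_ARGS__          \
    } while (0)
```

With this shape, the release has to happen inside the guarded code, which matches the description of ContextSwitch() dropping atomicity right before the jump.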
My implementation of ContextSwitch() for x86 is:

Code:

.code32
.text
.global ContextSwitch
ContextSwitch:

# push everything before using anything, to freeze the data
pushad
pushfd

# Both these are of type TaskDigest, as defined in /proc/task.c
# The %ebx one MUST be PMCurrent!
movl 44(%esp), %ebx # %ebx is now address to save old context.
movl 48(%esp), %ecx # %ecx is now address to get new context

# now dump the old registers to %ebx
movl 40(%esp), %eax # Return address
movl %eax, (%ebx) # Save return address
movl 4(%esp), %eax # Old %eflags
movl %eax, 4(%ebx) # Save eflags
movl 36(%esp), %eax # %EDI
movl %eax, 36(%ebx)
movl 32(%esp), %eax # %ESI
movl %eax, 32(%ebx)
movl 28(%esp), %eax # %EBP
movl %eax, 28(%ebx)
movl 24(%esp), %eax # %ESP
movl %eax, 24(%ebx)
movl 20(%esp), %eax # %EBX
movl %eax, 20(%ebx)
movl 16(%esp), %eax # %EDX
movl %eax, 16(%ebx)
movl 12(%esp), %eax # %ECX
movl %eax, 12(%ebx)
movl 8(%esp), %eax # Finally, %EAX!!!
movl %eax, 8(%ebx)

# now, to the black magic for paging >:D
# first, save %CR3
movl %cr3, %eax
movl %eax, 40(%ebx)

# load the new one
movl 40(%ecx), %eax
movl %eax, %cr3

# Note: if the kernel isn't mapped in the new %cr3, we
# page fault here, then double fault, and finally die with a triple fault

# Load the new registers (Again!!???)
movl 4(%ecx), %eax
push %eax
popfd # Just done some black magic for eflags
movl 8(%ecx), %eax
movl 12(%ecx), %ecx
movl 16(%ecx), %edx
movl 20(%ecx), %ebx
# %ESP must be here, but it would crash everything, so it must wait
movl 28(%ecx), %ebp
movl 32(%ecx), %esi
movl 36(%ecx), %edi

# Now the final (and most obscure) black magic. Unique for the x86
# First, make a false interrupt stack

push %eax # We'll use it for moving around
movl (%ecx), %eax
push %eax # Pushing EIP
movl 44(%ecx), %eax
push %eax # Pushing CS
movl 4(%ecx), %eax
push %eax # Pushing EFLAGS
movl 24(%ecx), %eax
push %eax # Remember ESP? It's here!
movl 48(%ecx), %eax
push %eax # Pushing SS

# Don't forget to retrieve %EAX! Else it would be equal to SS, hehe
movl 24(%esp), %eax

# Ladies and Gentlemen, awesome with an IRET, without a previous INT!!!
iret
I know you must compare some addressing with the TaskDigest implementation, but I need to know if this code is correct. Thanks!
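For anyone checking the addressing: the routine reads and writes the save area at fixed byte offsets, so whatever %ebx and %ecx point to has to match that layout. The sketch below is a hypothetical RegisterSet whose field order is inferred purely from the offsets used in the asm above (0 for the return address up to 48 for SS). The real RegisterSet was not posted, so every field name here is an assumption. Note also that TaskDigest as posted holds two pointers, so the routine would need to follow `regs` first to reach a structure like this one. C11 is assumed for `static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; offsets taken from the asm routine, names assumed. */
typedef struct {
    uint32_t eip;    /*  0: return address */
    uint32_t eflags; /*  4 */
    uint32_t eax;    /*  8 */
    uint32_t ecx;    /* 12 */
    uint32_t edx;    /* 16 */
    uint32_t ebx;    /* 20 */
    uint32_t esp;    /* 24 */
    uint32_t ebp;    /* 28 */
    uint32_t esi;    /* 32 */
    uint32_t edi;    /* 36 */
    uint32_t cr3;    /* 40 */
    uint32_t cs;     /* 44 */
    uint32_t ss;     /* 48 */
} RegisterSet;

/* Fail the build if the C side and the asm side ever drift apart. */
static_assert(offsetof(RegisterSet, eip) == 0, "EIP offset");
static_assert(offsetof(RegisterSet, eflags) == 4, "EFLAGS offset");
static_assert(offsetof(RegisterSet, esp) == 24, "ESP offset");
static_assert(offsetof(RegisterSet, cr3) == 40, "CR3 offset");
static_assert(offsetof(RegisterSet, ss) == 48, "SS offset");
static_assert(sizeof(RegisterSet) == 52, "RegisterSet size");
```

Compile-time checks like these turn a silent offset mismatch into a build error.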

Re: Is this Context Switch correct?

Posted: Mon Sep 15, 2014 5:16 pm
by SpyderTL
I just wrote this exact same code, albeit in a different language. (xml :shock:)

Your code looks correct, to me. But, just out of curiosity, why not just push all registers, flags and cr3, swap stack pointers to the new thread, and then pull cr3, flags, and registers? I'm sure it's faster (fewer clock cycles), but is there any reason to read/write registers one at a time?

Do you just want to force the registers into memory in a particular order?

Re: Is this Context Switch correct?

Posted: Mon Sep 15, 2014 5:24 pm
by KemyLand
SpyderTL wrote:I just wrote this exact same code, albeit in a different language. (xml :shock:)

Your code looks correct, to me. But, just out of curiosity, why not just push all registers, flags and cr3, swap stack pointers to the new thread, and then pull cr3, flags, and registers? I'm sure it's faster (fewer clock cycles), but is there any reason to read/write registers one at a time?

Do you just want to force the registers into memory in a particular order?
Exactly. I have to force the registers into specific locations.
By the way, how did you do that in XML? :shock: :shock: :shock:

Re: Is this Context Switch correct?

Posted: Mon Sep 15, 2014 6:54 pm
by SpyderTL
Short answer: I decided to write my own "language" when I realized that there are no good ASM IDEs out there. I thought for sure that someone would have made one by now.

Long answer: http://forum.osdev.org/viewtopic.php?f=15&t=27971

Re: Is this Context Switch correct?

Posted: Mon Sep 15, 2014 7:40 pm
by Brendan
Hi,

KemyLand wrote:I know you must compare some addressing with the TaskDigest implementation, but I need to know if this code is correct.
I didn't notice any bugs, but...


If you're saving registers in a struct, then you shouldn't need to save them on the stack first. For example, "movl 36(%esp), %eax # %EDI" and "movl %eax, 36(%ebx)" could be replaced with "movl %edi, 36(%ebx)" (this applies to EAX, ECX, EDX, ESI, EDI and EBP). This means that you'd need to wait until after saving ECX before doing "movl 48(%esp), %ecx # %ecx is now address to get new context". With those changes there's no need to use "PUSHAD" and you could just push EBX alone.

For C calling conventions, various registers are "caller preserved" and the called function can trash them without caring. For the "80x86 cdecl" calling convention, this means you don't need to save or load EAX, ECX or EDX. Also, by using one of these registers (instead of EBX) for the address to save the old context you could delete the "PUSHAD" (without pushing EBX). In the same way, your code needn't preserve the state of most flags (e.g. any of the "arithmetic" flags) in EFLAGS (more on this a little later).

Because you will always be switching from kernel code to kernel code you should not need to save or load SS (assuming a "flat paging" OS). In addition, you do not need to care if the caller was using virtual8086 or something, and the "interrupt enable" flag should always be clear; which means that (combined with the earlier "caller doesn't care about most flags in EFLAGS") you should not need to save or load EFLAGS.

If you aren't saving or loading SS or EFLAGS; then you don't need a slow IRET and can and should just use a normal RET.

A task's CR3 should never change and therefore you should never need to save it (and you only need to load it). If CR3 is the same for the old task and the new task (e.g. the tasks are different threads in the same process) then you should avoid reloading CR3 (and avoid flushing TLBs when it's not necessary).

If you combine all of the above; the context switch would become:
  • get "address to save old context" from the stack into EAX, ECX or EDX
  • save EBX, ESI, EDI and EBP in your structure
  • save ESP
  • get "address to load new context" from the stack into EAX, ECX or EDX
  • load ESP
  • check if CR3 needs to change; and load CR3 from the new context if it's necessary
  • load EBX, ESI, EDI and EBP from your structure
  • return normally
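
The steps above can be modelled in plain C to check the logic. This is a user-space model under stated assumptions, not switching code: `ModelCpu` stands in for the real registers, `TaskContext` and `model_switch` are hypothetical names, and the real routine would be a few lines of assembly ending in a plain RET.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task save area: only what the steps above keep. */
typedef struct {
    uint32_t ebx, esi, edi, ebp; /* callee-saved registers under cdecl */
    uint32_t esp;                /* kernel stack pointer; return EIP is on that stack */
    uint32_t cr3;                /* fixed per task: loaded, never saved */
} TaskContext;

/* Stand-in for the CPU so the logic can run outside a kernel. */
typedef struct {
    uint32_t ebx, esi, edi, ebp, esp, cr3;
    unsigned cr3_loads; /* counts address-space reloads (each one flushes TLBs) */
} ModelCpu;

void model_switch(ModelCpu *cpu, TaskContext *old, const TaskContext *next)
{
    assert(cpu && old && next);

    /* Save EBX, ESI, EDI, EBP and ESP of the outgoing task. */
    old->ebx = cpu->ebx;
    old->esi = cpu->esi;
    old->edi = cpu->edi;
    old->ebp = cpu->ebp;
    old->esp = cpu->esp;

    /* Load the incoming task's stack pointer. */
    cpu->esp = next->esp;

    /* Reload CR3 only when the address space actually changes. */
    if (cpu->cr3 != next->cr3) {
        cpu->cr3 = next->cr3;
        cpu->cr3_loads++;
    }

    /* Load EBX, ESI, EDI and EBP of the incoming task. */
    cpu->ebx = next->ebx;
    cpu->esi = next->esi;
    cpu->edi = next->edi;
    cpu->ebp = next->ebp;
    /* The real routine would now RET, popping the return EIP from the new stack. */
}
```

Switching between two threads of the same process leaves `cr3_loads` untouched in this model, which is the TLB-flush saving described earlier.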
Also note that there are several things that you don't care about yet, but may care about later. These include keeping track of how much time each task used, thread specific or thread local storage, saving and loading FPU/MMX/SSE/AVX state, and possibly saving/loading various other things (e.g. debug registers and performance monitoring counters).


Cheers,

Brendan

Re: Is this Context Switch correct?

Posted: Mon Sep 15, 2014 9:35 pm
by KemyLand
But I couldn't understand why not to save/load EFLAGS/SS, can you explain it again?
Also, this code is intended to explicitly switch from Kernel to Anywhere.
The ISR should have its own switcher, BTW...
And the IntEnable flag must be set when returning, so that the code can be switched out again. The only case where this isn't true is during atomic operations.
Since Kernel->User is possible, I'll keep using iret.

BTW, I use PID/TID format to switch even between kernel code because in my kernel, PID 0 is, well, all kernel space. Thus PID 0:TID 0 == Core Kernel.

Since userspace code could end up being loaded, I don't see your optimizations as valid here, but they are well thought out :wink:

Re: Is this Context Switch correct?

Posted: Mon Sep 15, 2014 10:56 pm
by Brendan
Hi,

First, there is a bug that I didn't notice before. If there is no privilege level change, IRET doesn't load SS or ESP from the stack.
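
A small sketch of that rule, assuming 32-bit protected mode and ignoring returns to virtual-8086 mode: IRET always pops EIP, CS and EFLAGS (in that order, so EIP must sit at the lowest address of the frame), and it pops ESP and SS as well only when the CS it pops asks for a less privileged level than the current one. The function names and the selector values in the usage note are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Requested privilege level: the low two bits of a segment selector. */
static inline unsigned selector_rpl(uint16_t selector)
{
    return selector & 3u;
}

/* True when IRET also pops ESP and SS, i.e. the return goes to an
   outer (numerically higher) privilege level than the current CPL. */
bool iret_pops_ss_esp(unsigned cpl, uint16_t return_cs)
{
    assert(cpl <= 3);
    return selector_rpl(return_cs) > cpl;
}

/* Number of 32-bit words IRET consumes from the stack. */
unsigned iret_frame_dwords(unsigned cpl, uint16_t return_cs)
{
    return iret_pops_ss_esp(cpl, return_cs) ? 5u : 3u;
}
```

With a common (assumed) GDT layout, `iret_frame_dwords(0, 0x08)` gives 3 for a kernel-to-kernel return, while `iret_frame_dwords(0, 0x1B)` gives 5 for a return to CPL=3, so a five-word frame built for a kernel-to-kernel switch leaves ESP and SS unread.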
KemyLand wrote:But I couldn't understand why not to save/load EFLAGS/SS, can you explain it again?
You could have code to find the first 10000 prime numbers in your context switching function; but that would be additional overhead that's both unnecessary and pointless.

In the same way, you could have code to save and load EFLAGS and/or SS in your context switching function if you really want to.
KemyLand wrote:Also, this code is intended to explicitly switch from Kernel to Anywhere.
Why? For this style of kernel; there is never any situation where you need to return directly to CPL=3; and privilege level switching should be considered a completely separate concern that has nothing to do with task switching.

Think of it like this:
  • CPL=3 code is running
  • Something causes a switch to CPL=0 (e.g. SYSCALL/SYSENTER, call gate, interrupt, whatever)
  • The kernel does some stuff (starts handling the system call or whatever it was)
  • Kernel may or may not do a task switch (from "task running kernel code" to "different task that returns to kernel code"), but that changes nothing and can mostly be ignored
  • The kernel does some more stuff (finishes handling the system call or whatever it was)
  • The kernel returns to CPL=3 using whatever method is suitable (e.g. SYSRET for the SYSCALL handler, RETF for a call gate handler, etc)
  • CPL=3 code is running
To put it another way; it's impossible for CPL=3 code to call your "ContextSwitch()" function directly; therefore the old task's "return EIP" that's saved during a task switch will always be for kernel code; therefore the new task's "return EIP" that's loaded during a task switch will always be for kernel code.

The only possible exception to this is when a new task is first created. However; when creating a new task you want to do the minimum necessary; then switch to the new task (later on), and then finish setting up the rest (later on) before returning to CPL=3. This means that if a high priority task creates a low priority task you don't waste time messing with less important things. It also means that (when creating a process) you are able to access the process' address space.
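
One way to picture the "minimum necessary" at creation time, sketched under assumptions: if the switch routine ends with a plain RET, a new task's kernel stack only needs a return address pointing at some kernel-mode routine that finishes the setup later. `prepare_initial_stack` and `TASK_ENTRY_EIP` are hypothetical, and the stack is modelled as an ordinary array so the sketch runs in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address of a kernel-mode routine that completes task setup
   and only then returns to CPL=3. */
#define TASK_ENTRY_EIP 0xC0100000u

/* Build the initial kernel stack for a new task and return the value
   to store as that task's saved ESP. */
uint32_t *prepare_initial_stack(uint32_t *stack_top, uint32_t entry_eip)
{
    assert(stack_top);
    uint32_t *sp = stack_top; /* x86 stacks grow downwards */
    *--sp = entry_eip;        /* popped by the RET that ends the first switch */
    return sp;
}
```

The first switch into the task then "returns" into kernel code like any other switch, so the switch routine itself never has to deal with CPL=3.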


Cheers,

Brendan