OSDev.org

Posted: **Fri Nov 25, 2011 6:33 pm**

First of all, thanks to everyone for reading and considering my question. I'm floored this actually compiled and executed the first time without any issues. Therefore, I have to have some something wrong!

I am building out the initialization of my kernel. GDT and IDT are initialized. I have created a heap and started paging (actually in reverse order with the heap created after paging is started). Then I create a null process which will end up being a kernel butler process to maintain some housekeeping (not sure what that will be yet, but it is ready just in case). I have then established the timer interrupt set to fire 50 times per second. All of this works great. Thanks to James Molloy's tutorial that has helped me to this point. http://www.jamesmolloy.co.uk/tutorial_html/

As a part of the timer interrupt handler, I have added a Reschedule procedure to perform a context switch. The interrupt handler saves all registers on the stack on the way in and pops them all back in place on the way back out. The interrupt handler does not use the kernel stack, but the process stack.

The issue I have is that it works. Well, it works with just a null process by itself anyway. What am I missing?

Code: Select all

#ifndef PROCESS_H
#define PROCESS_H

typedef enum {PRC_ERR = 0,
				PRC_CURR = 1,
				PRC_RDY  = 2,
				PRC_MSGW = 3,
				PRC_SEMW = 4,
				PRC_DLYW = 5,
				PRC_SUSP = 6} ProcessState;
				
typedef struct ProcessEntry {
	uint16 pid;
	ProcessState state;
	uint16 priority;
	uint32 sem;
	uint32 msg;
	uint8 hasMsg;
	uint32 esp;
	uint16 ss;
	struct ProcessEntry *next;
} ProcessEntry;

extern ProcessEntry *rdyHead;
extern ProcessEntry *rdyTail;
extern ProcessEntry *currProc;

extern ProcessEntry *NullProcess;

extern uint16 nextPID;
extern uint16 processCount;

void InitProcess(void);
void ReadyProcess(ProcessEntry *prc);
void Reschedule(void);

#endif

Code: Select all

#include "kernel.h"

ProcessEntry *currProc = 0;

void Reschedule(void)
{
	// Put the current process on the ready queue; if 
	// ready queue is empty, this process will be back
	// at the top of the queue.
	ReadyProcess(currProc);
	
	// Make sure the stack is static for the rest if
	// this function until the return
	
	// Save the current stack information into the 
	// process structure.  All other registers are 
	// saved on the stack already.
	asm volatile("mov %%ss,%0":"=r"(currProc->ss));
	asm volatile("mov %%esp,%0":"=r"(currProc->esp));
	
	// Get the new current process off the top of the 
	// ready queue.  It is possible that it is the 
	// process that just gave up the CPU.
	currProc = rdyHead;
	rdyHead = rdyHead->next;
	currProc->next = 0;
	
	// From the process structure, restore the stack
	// for the next process to get a slice of CPU.
	asm volatile("mov %0,%%ss"::"r"(currProc->ss));
	asm volatile("mov %0,%%esp"::"r"(currProc->esp));
	
	// The return of this function will return to the 
	// new current process.
}

Again, thank you for your assistance!

Posted: **Fri Nov 25, 2011 7:45 pm**

I am at an early stage of development myself, but I can see that there is a lot which needs to go in that your code doesn't seem to cover at the moment. Most obviously, all the machine registers need to be saved, because an application might get upset if it discovers that the contents of its registers have changed while it has been away. Also, in a protected mode operating system, each process runs in its own memory space, defined by page tables, so when a process switch is made, the page tables need to be changed by reloading the cr3 register. Then there are numerous things which you will need to keep track of, such as the memory and other resources which have been allocated to each application, so that it can be released when the program terminates.

Also there are more sophisticated scheduling alrorithms than round robin. Typically there are priority queues, and there will be at least one queue of applications which have been put on hold, whilst something like a disk access completes. (Such thinhgs take milliseconds to complete, and a computers can get an awful lot of work done in that time.)

Posted: **Sat Nov 26, 2011 5:53 am**

Hi,

eryjus wrote:What am I missing?

When your "Reschedule()" function gets more complicated, there will be function parameters and local variables on the stack. When you switch stacks any local variable or function parameter you relied on may become trashed (or not, if the compiler felt like storing it in a register).

Also; when you create a new thread you will need to create a new kernel stack for that thread, and the new thread's kernel stack will need to contain all of the things that the "Reschedule()" function will expect to pop off the stack the first time it switches to the new thread. At the moment it might be easy enough to guess what the new thread's kernel stack needs to contain; but as things get more complicated it will become harder ("guess the magic stack layout") and more fragile - minor changes in the "Reschedule()" function (or even just changing compiler version or changing from "-O1" to "-O2" or something) will result in a slightly different stack layout, and an OS that crashes when threads are created.

For features (things that your code currently lacks, that can/will complicate "Reschedule()" eventually); there's switching address spaces (CR3), support for CPL=3 (patching the "esp0" field in the TSS during task switches), saving/loading MMX/FPU/SSE/AVX state, support for multiple CPUs, tracking the amount of CPU time each thread has used (and/or each process), and possibly other things (changing the IO-permission bitmap in the TSS, per-thread debug registers, per-thread performance monitoring, emulated/per-thread TSC, etc).

Also, you'll find that "Reschedule()" is called from more than just the timer IRQ - for example, if a thread blocks for any reason (e.g. "sleep()", waiting for file IO, etc) then the kernel code that handles these things will call "Reschedule()" themselves. Once you upgrade to a better scheduling algorithm you'll probably also have cases where a high priority thread unblocks and you want it to preempt the currently running (lower priority) thread; and you'll want to switch directly to the high priority thread (without the extra overhead of deciding which thread to switch to, because you already know).

For all of the reasons above; I recommend that people begin with a "Switch_To_Thread(Thread_ID)" function. Your "Reschedule()" would decide which thread to switch to and then call the "Switch_To_Thread(Thread_ID)" function; and other parts of the kernel will call "Reschedule()" or "Switch_To_Thread(Thread_ID)" directly. The "Switch_To_Thread(Thread_ID)" function should probably be in 100% assembly to avoid all of the problems with stack layout and local variables, etc (but also because it ends up requiring a lot of assembly anyway, to access ESP directly and to access special things, like EFLAGS, CR3, FXSAVE, MSRs, debug registers, TSC, etc).

Cheers,

Brendan

Posted: **Sat Nov 26, 2011 11:51 am**

@Casm and @Brendan: Thank you both for your replies. Good call on the paging (and other) registers. Also, the point on preempting a process (or task -- BTW: which is the correct terminology?) directly when a process unblocks is what I have been missing. This was why the code was swimming for me.

@Brendan, you mention a Switch_To_Task(Task_ID) function. I like the thought. I assume that you are suggesting that Task_ID be implemented as PID. This then sparked the internal debate over implementing the task list as a singly or doubly linked list or as an array. In my kernel, I was planning on a linked list and establishing it as a priority queue, managing the order as the individual conditions change (such as priority, state, etc.). For performance reasons, this would have to be implemented as a doubly linked list. However, in that scenario, passing PID to Switch_To_Task would not be efficient.

Changing the design to a doubly linked list of pointers to array elements would then suffice since the PID could then be the array index. But then that imposes a hard limitation on the number of threads that can execute. It also creates scenario of PID reuse, which I was not planning for (I have no design plans that prevent it; I just hadn't considered it). So, which implementation is more common (I'm deliberately trying to stay away from a debate over which is "better")?

@Casm and @Brendan, you both also bring up a point about under what conditions the Switich_To_Task function is called, and the state of the stack for this call. Not only is this a concern for creating processes, but also for calling Switch_To_Task directly.

For example: ProcessA is in the ready queue with its own stack and the stack was left in a state where it was preempted through the timer interrupt. ProcessB, the current process, now enters a blocked state (such as sleep()), and therefore needs to give up the processor to the next process on the ready queue. Since the kernel function will call Reschedule without going through the interrupt handler, ProcessA will receive control and will exit in a different manner than it went in, meaning the code path in and the code path out will not match. Or, does this matter if I am storing all the registers (speaking specifically of the calls and returns, not the register saves and restores which is a clear issue in my code currently for direct calls) and return path on the process stack directly? Can I overcome this by calling executing INT 20h (timer interrupt handler) to trigger a reschedule from sleep()? I am wondering if the processor keeps track of whether it is processing an interrupt and cares about re-entrant interrupts (which will eventually happen with the above scenario). So, is my design flawed and the only way to overcome these challenges to maintain my own kernel stack (which I understand as a best practice, but I was hoping to stay simple for now)?

@Brendan, you also gave me a lot more to think about, which is what I was asking for... so thank you! I do not yet fully understand all the registers and thoughts you put out there, but believe me I will print your reply for future reference!

Next stop: the Intel Software Development Manual to figure out more about the TSS. Again, I was hoping I would not have to cross this bridge so soon....

Again, thank you for your consideration to my thoughts.

Posted: **Sat Nov 26, 2011 3:22 pm**

eryjus wrote:@Casm and @Brendan, you both also bring up a point about under what conditions the Switich_To_Task function is called, and the state of the stack for this call. Not only is this a concern for creating processes, but also for calling Switch_To_Task directly.

For example: ProcessA is in the ready queue with its own stack and the stack was left in a state where it was preempted through the timer interrupt. ProcessB, the current process, now enters a blocked state (such as sleep()), and therefore needs to give up the processor to the next process on the ready queue. Since the kernel function will call Reschedule without going through the interrupt handler, ProcessA will receive control and will exit in a different manner than it went in, meaning the code path in and the code path out will not match. Or, does this matter if I am storing all the registers (speaking specifically of the calls and returns, not the register saves and restores which is a clear issue in my code currently for direct calls) and return path on the process stack directly? Can I overcome this by calling executing INT 20h (timer interrupt handler) to trigger a reschedule from sleep()? I am wondering if the processor keeps track of whether it is processing an interrupt and cares about re-entrant interrupts (which will eventually happen with the above scenario).

One possible scenario is that a process would generate a software interrupt when it blocked, and that would give control to the sceduler. The scheduler would then save that processes eip, flags and other registers to the process (or thread) control block, having retrieved the first two from the processes's stack. then it would push the saved flags, code segment selector and eip of the next thread to run onto the stack (after reloading the stack pointer), and do an iret. That process would then start running at the point where it left off. If nothing else happens in the meantime, the timer interrupt will eventually go off, and then the process will be repeated.

You could do an int 20h, or alternatively some other interrupt vector, pointed at the relevant code, if that would serve your purposes better. Or you could avoid software interrupts entirely, and just do a call to get into the scheduler code. The only thing there is that, since a call pushes four bytes (eip) onto the stack, and an iret pops ten, you need to do some arithmetic on the stack pointer to make sure the implicit pushes and pops balance out.

The processor doesn't care how it came to be executing the code which is currently running, whether it be as the result of a hardware interrupt or anything else. The only thing you have to remember with a hardware generated interrupt is to send an end of interrupt signal to the interrupt controller.

Posted: **Sat Nov 26, 2011 5:08 pm**

eryjus wrote:Next stop: the Intel Software Development Manual to figure out more about the TSS. Again, I was hoping I would not have to cross this bridge so soon....

From Intel's System Programming Guide, Part 1, section 7.2 "Task Management Data Structures":

When operating in protected mode, a TSS and TSS descriptor must be created for at least one task, and the segment selector for the TSS must be loaded into the task register (using the LTR instruction).

This effectively resolves my thoughts about avoiding dealing with the TSS yet when switching tasks.

Posted: **Sun Nov 27, 2011 12:33 am**

Hi,

eryjus wrote:@Brendan, you mention a Switch_To_Task(Task_ID) function. I like the thought. I assume that you are suggesting that Task_ID be implemented as PID.

I use the word "task" to mean "process or thread or job or whatever you're switching". In most cases (but not all), the only things that get CPU time are threads (e.g. a process never gets CPU time and never executes, only threads that belong to a process get CPU time and are executed). In this case it'd be "Switch_To_Thread(Thread_ID)".

Of course OSs may be different. For example, a kernel might schedule processes instead (if threads aren't supported at all, or are implemented in user-space), or a kernel might be a messed up thing where there's just "entities" that have attributes (with no clear distinction between processes and threads), or you could have both kernel threads and user-space threads (or "fibres"), or some other arrangement.

eryjus wrote:This then sparked the internal debate over implementing the task list as a singly or doubly linked list or as an array.

I tend to use many singly linked lists of tasks (plus other data structures for special occasions); where tasks are moved between linked lists and/or other data structures when they change state. For example, when a task is given CPU time it might be removed from a "ready to run" queue (a task isn't in the "ready to run" state when it's in the "running" state); and then when the task stops running it might be put back onto a "ready to run" queue (not necessarily the same one) or put somewhere else instead if it blocked and isn't "ready to run" again (e.g. put onto a data structure that tracks when to wake up sleeping tasks).

eryjus wrote:For example: ProcessA is in the ready queue with its own stack and the stack was left in a state where it was preempted through the timer interrupt. ProcessB, the current process, now enters a blocked state (such as sleep()), and therefore needs to give up the processor to the next process on the ready queue. Since the kernel function will call Reschedule without going through the interrupt handler, ProcessA will receive control and will exit in a different manner than it went in, meaning the code path in and the code path out will not match.

The code path into and out of the "switch_to_task()" function will remain the same in all cases; and that's all that matters. The "switch_to_task()" function switches to the new task, then the new task returns to whatever that new task was doing before it called "switch_to_task()" (which may be an IRQ handler, "Reschedule()", any other kernel function, an exception handler or any anything else).

eryjus wrote:Or, does this matter if I am storing all the registers (speaking specifically of the calls and returns, not the register saves and restores which is a clear issue in my code currently for direct calls) and return path on the process stack directly?

For scheduling, it won't matter too much where the task's state is stored. However, there's no guarantee that a user-space stack has enough space (or is actually valid). Normally each thread would have a user-space stack and a kernel-stack; and the thread's state would be stored on the thread's kernel-stack during thread/task switches.

eryjus wrote:Can I overcome this by calling executing INT 20h (timer interrupt handler) to trigger a reschedule from sleep()?

The "sleep()" kernel function would put the task onto some sort of data structure (something designed to keep track of when sleeping tasks should be woken up), and then the "sleep()" kernel function would call "Reschedule()" itself. When it's time for the sleeping task to be woken up, something (e.g. a timer IRQ handler) would remove the task from whatever is being used to keep track of sleeping threads and tell the scheduler that the task is ready to run again. The scheduler would put the task onto a "raedy to run" queue (and/or it might look at the task's priority and decide to preempt the currently running task).

eryjus wrote:I am wondering if the processor keeps track of whether it is processing an interrupt and cares about re-entrant interrupts (which will eventually happen with the above scenario).

In general, the CPU doesn't keep track of whether it's processing an interrupt or not and can handle nested interrupts; but there are a few special cases (mostly involving SMI/SMM) that you probably won't ever need to worry about. The PIC and IO APIC chips do keep track of which IRQs the CPU/s are supposed to be servicing though; which is why an IRQ handler will typically disable preemption (or disable IRQs), then do it's work, then send an EOI, then enable preemption/IRQs after that (to make sure that a task switch can't occur before the EOI is sent).

Cheers,

Brendan

Posted: **Sun Nov 27, 2011 6:50 pm**

Thank you all for your help. Just to complete the thread, I thought I would post my current code, which work but is not as robust as I would like. If someone can help me understand, the following block cause me many hours of head-scratching trying to understand why esp would not get updated.

mov eax,cr0 ; disable paging; needs to be late
and eax,0x7fffffff ; because it clobbers ebx
mov cr0,eax

Beyond the above question, the rest is informational only. Please enjoy.
Anyway, my current "Process.h":

Code: Select all

#ifndef PROCESS_H
#define PROCESS_H

typedef enum {PRC_FREE = 0,
				PRC_CURR = 1,
				PRC_RDY  = 2,
				PRC_MSGW = 3,
				PRC_SEMW = 4,
				PRC_DLYW = 5,
				PRC_SUSP = 6} ProcessState;
				
typedef enum {PTY_ERR = 0,
				PTY_HI = 1,
				PTY_NORM = 2,
				PTY_LOW = 3} Priority;

typedef struct ProcessQueueEntry {
	uint16 pid;
	struct ProcessQueueEntry *prev;
	struct ProcessQueueEntry *next;
} ProcessQueueEntry;

typedef struct QueueHeadTail {
	ProcessQueueEntry *Head;
	ProcessQueueEntry *Tail;
} QueueHeadTail;

typedef struct ProcessEntry {
	ProcessState state;		// offset 0
	Priority priority;		// offset 4
	uint32 sem;				// offset 8
	uint32 msg;				// offset 12
	uint16 hasMsg;			// offset 16
	uint16 ss;				// offset 18
	uint32 esp;				// offset 20
	uint32 cr3;				// offset 24
	uint32 *stack;			// offset 28
	uint32 stackLength;		// offset 32
	ProcessQueueEntry *queueEntry;
} ProcessEntry;

typedef void (*TaskFunc)(void);

#define MAX_PROCESSES 1024
extern ProcessEntry ProcessList[MAX_PROCESSES];

extern ProcessQueueEntry *nullProcess;
extern ProcessQueueEntry *currentProcess;
extern QueueHeadTail freeQ;
extern QueueHeadTail waitQ;
extern QueueHeadTail rdyHighQ;
extern QueueHeadTail rdyNormQ;
extern QueueHeadTail rdyLowQ;
extern uint16 processCount;

void Enqueue(QueueHeadTail *queue, ProcessQueueEntry *pqe);
ProcessQueueEntry *Dequeue(QueueHeadTail *queue);

void InitProcess(void);
void ReadyProcess(uint16 pid);
void Reschedule(void);
void SwitchToTask(uint16 pid, uint16 pidOld);
uint16 GetPID(void);
uint16 CreateProcess(TaskFunc t);
void UserReturn(void);
uint32 GetFlags(void);
uint32 GetCR3(void);
uint16 GetDS(void);

#define SetFlags(x) EnableInterrupts(x)

#define INITIAL_STACK_SIZE 4096

#endif

I opted for a table of processes with a doubly linked list for a priority queue implementation. I do not like the limitation, so this will likely change but not right now.

My SwitchToTask.c:

Code: Select all

#include "kernel.h"

void _SwitchTasks(ProcessEntry *p, ProcessEntry *o);

void SwitchToTask(uint16 pid, uint16 pidOld)
{
	if (pid == pidOld) return;
	
	currentProcess = ProcessList[pid].queueEntry;
	_SwitchTasks(&ProcessList[pid], &ProcessList[pidOld]);
}

The intermediate step was my way of trying to keep some of the complexities of my implementation out of assembly. Many will suggest that I should drop the second parameter since we are performing a switch from the current task, which is known. And you are right. I will do this soon enough.

And my _SwitchTasks.s code:

Code: Select all

	global	_SwitchTasks

_SwitchTasks:
; the stack has eip on it already on the way in.  This can only
; be called from SwitchToTask, which guarantees a near call.

; save the state of the process
	push	ebp
	mov		ebp,esp		; address of new struct is now at [ebp+8]
						; address of old struct is now at [ebp+12]

	pushfd				; push the flags onto the stack
	cli					; disable interrupts; critical stuff going on
	push	eax			; save the registers: eax
	push	ebx			; save the registers: ebx
	push	ecx			; save the registers: ecx
	push	edx			; save the registers: edx
	push	esi			; save the registers: esi
	push	edi			; save the registers: edi

	xor		eax,eax		; clear eax, using lower 16
	mov		ax,ds		; save the registers: ds
	push	eax
	mov		ax,es		; save the registers: es
	push	eax
	mov		ax,fs		; save the registers: fs
	push	eax
	mov		ax,gs		; save the registers: gs
	push	eax

; perform a context switch; stack to remain static
	mov		ebx,[ebp+12]	; ptr to old Process

	mov		eax,cr3			; save cr3
	mov		[ebx+24],eax

	mov		eax,esp			; save esp
	mov		[ebx+20],eax

	xor		eax,eax			; save ss
	mov		ax,ss
	mov		[ebx+18],ax

	mov		ebx,[ebp+8]		; ptr to the new Process

	mov		ax,[ebx+18]		; get ss
	mov		ss,ax

	mov		eax,[ebx+20]	; get esp
	mov		esp,eax

	mov		eax,cr0			; disable paging; needs to be late
	and		eax,0x7fffffff	; because it clobbers ebx
	mov		cr0,eax

	mov		eax,[ebx+24]	; get cr3
	mov		cr3,eax

; restore the state of the register's process
	pop		eax				; restore gs
	mov		gs,ax
	pop		eax				; restore fs
	mov		fs,ax
	pop		eax				; restore es
	mov		es,ax
	pop		eax				; restore ds
	mov		ds,ax
	pop		edi				; restore edi
	pop		esi				; restore esi
	pop		edx				; restore edx
	pop		ecx				; restore ecx
	pop		ebx				; restore ebx

	mov		eax,cr0
	or		eax,0x80000000
	mov		cr0,eax

	pop		eax				; restore eax
	popfd
	pop		ebp				; restore ebp

	sti

	ret

Still no TSS. But I am also running everything at Priv Lvl 0. I know I still have lots to do.

Anyway, thanks again for the help!

Posted: **Sun Nov 27, 2011 7:49 pm**

Why do you disable paging ? I'm not sure what you are trying to do with that ?

Posted: **Sun Nov 27, 2011 7:54 pm**

I expect that when I get to my user space code, I will need to reset page tables (CR3) to be appropriate for the process receiving control. Though everything runs against the same page directory today, I am certain it will not be that way as I get deeper into the code. I thought it better to plan for it now rather than put is aside for later.

I also thought it was required to disable paging in order to update CR3, or at least was considered a best practice.

Posted: **Sun Nov 27, 2011 7:55 pm**

Nope, you should never disable paging. When you reload CR3 you only need to make sure that the code and variables you are currently using don't change. The usual and simplest way of doing this is to make sure that every set of pagetables contains within its context a copy of that code. Usually the entire kernel is mapped into every CR3 context.

[Edit - added] If I recall, I think JamesM's tutorial does disable paging so that some physical memory can be copied as part of the 'clone' functionality. This is just sample code and is not how it would be done in a real OS.

Posted: **Sun Nov 27, 2011 8:28 pm**

Thank you. I have made the necessary adjustments.

Nothing like a good code review!

Thanks again!

OSDev.org

Context switching question - can it really be this simple??

Context switching question - can it really be this simple??

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl

Re: Context switching question - can it really be this simpl