Mixing Multitasking Types

AJ · Post by AJ » Thu Jan 04, 2007 6:10 am

Hi all,

I have got the hang of stack-based multitasking and hardware task switching now, and have a 'sub version' of my kernel which uses each type.

Now I have finished playing, I want to put a more useable system in place and am looking at using stack switching as my main tasking system.

Just one question - how does everyone tackle privilege-level switching? Do you generally just have one kernel tss, one user tss and load the details in to each one as required?

Also, if I wanted to restrict port access from userland code, presumably I would have to do the same thing - set up one tss with the appropriate port bitmap, and then change this bitmap for each user program I want to restrict.

From articles on the net, this seems to be how most people do it, but I'm concerned that if I'm just using 2 tss's, I will lose the performance benefit gained through stack-based switching. I can also see that management of TSS contents is going to be a bit of a headache!

Thanks in advance,
Adam

Brendan · Post by **Brendan** » Thu Jan 04, 2007 7:34 am

Hi,

AJ wrote:I have got the hang of stack-based multitasking and hardware task switching now, and have a 'sub version' of my kernel which uses each type.

Excellent! - any chance of some performance statistics to compare each method?

AJ wrote:Just one question - how does everyone tackle privilege-level switching? Do you generally just have one kernel tss, one user tss and load the details in to each one as required?

No - you have a single static TSS for everything, and (if necessary) dynamically change the SS0:ESP0, SS1:ESP1 and/or SS2:ESP2 fields during each (software) task switch.

For most OS's SS0 never needs to be changed, and SS1:ESP1 and SS2:ESP2 aren't used. If your kernel uses a single kernel stack (for all tasks) then you don't need to change ESP0 either.

In addition, if you use the I/O permission bitmap you'd want to change that during the task switch. Having a flag to indicate when this bitmap contains "no I/O access" can speed things up (as most tasks wouldn't have access to any I/O ports).
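To make the single-TSS idea concrete, here is a minimal sketch. The field layout follows the 32-bit TSS format in the Intel manuals; the names `struct tss32`, `tss_init` and `tss_set_kernel_stack` are made up for illustration and are not from anyone's kernel in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS. Each 16-bit selector sits in the low half of a 32-bit slot
 * (upper half reserved), so plain uint32_t fields need no packing. */
struct tss32 {
    uint32_t prev_task_link;
    uint32_t esp0, ss0;          /* stack loaded on a CPL=3 -> CPL=0 entry */
    uint32_t esp1, ss1;          /* unused by most OSes */
    uint32_t esp2, ss2;          /* unused by most OSes */
    uint32_t cr3, eip, eflags;   /* only touched by hardware task switches */
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldt_selector;
    uint16_t trap;
    uint16_t iomap_base;         /* offset of the I/O permission bitmap */
};

_Static_assert(sizeof(struct tss32) == 104, "a 32-bit TSS is 104 bytes");

/* One-time setup of the single static TSS. The selector is whatever your
 * GDT uses for kernel data (an assumption of this sketch). */
void tss_init(struct tss32 *tss, uint32_t kernel_data_selector)
{
    assert(tss != NULL);
    memset(tss, 0, sizeof *tss);
    tss->ss0 = kernel_data_selector;          /* usually never changes */
    /* A bitmap base at or beyond the TSS limit means "no bitmap": every
     * port access from CPL > IOPL raises a general protection fault. */
    tss->iomap_base = (uint16_t)sizeof *tss;
}

/* Called from the software task switch: ESP0 is the only field that has
 * to change when each task has its own kernel stack. */
void tss_set_kernel_stack(struct tss32 *tss, uint32_t kernel_stack_end)
{
    assert(tss != NULL);
    tss->esp0 = kernel_stack_end;
}
```

With one kernel stack shared by all tasks, `tss_set_kernel_stack` would only be called once at boot.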

AJ wrote:Also, if I wanted to restrict port access from userland code, presumably I would have to do the same thing - set up one tss with the appropriate port bitmap, and then change this bitmap for each user program I want to restrict.

There are alternative methods - for example, you can restrict I/O port access using the general protection fault handler (i.e. user mode code always generates a GPF, and the GPF handler checks if the access should be allowed and emulates the I/O port access). This can save messing about with the TSS during task switches. It is slower, but I/O port access is slow anyway. Another way is to have kernel functions for I/O port access.

Which method is best depends on a lot of things (mostly, how often user-code accesses I/O ports, how much time it takes to fill the I/O permission bitmap when necessary, and how much time a kernel API call and/or general protection fault take). IMHO this means if you're doing task switches very frequently, the overhead of using the I/O permission bitmap will be more than the overhead of the kernel API and/or general protection fault handler.

You could also probably use several methods - for example, don't start using the I/O permission bitmap for a task until the task accesses several I/O ports (and occasionally stop using the I/O permission bitmap). That way, for something like a keyboard device driver you'd avoid the I/O permission bitmap for normal IRQs (where it only reads from one I/O port between task switches).
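A rough sketch of the bitmap bookkeeping with the "no I/O access" flag, assuming each task keeps its own copy of the bitmap and the kernel copies it into the TSS area on a task switch. All names here (`struct io_perms`, `io_perms_load`, and so on) are hypothetical. In the hardware bitmap a set bit denies the port and a clear bit allows it, and one extra byte of all ones has to follow the map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IO_PORTS        65536u
#define IO_BITMAP_BYTES (IO_PORTS / 8u)

/* Per-task I/O permissions. */
struct io_perms {
    bool    any_allowed;              /* false => task has no port access */
    uint8_t bitmap[IO_BITMAP_BYTES];  /* bit set = denied, bit clear = allowed */
};

void io_perms_init(struct io_perms *p)
{
    assert(p != NULL);
    memset(p->bitmap, 0xFF, sizeof p->bitmap);   /* deny everything */
    p->any_allowed = false;
}

void io_perms_allow(struct io_perms *p, uint16_t port)
{
    assert(p != NULL);
    p->bitmap[port >> 3] &= (uint8_t)~(1u << (port & 7u));
    p->any_allowed = true;
}

bool io_perms_allowed(const struct io_perms *p, uint16_t port)
{
    return p->any_allowed && !(p->bitmap[port >> 3] & (1u << (port & 7u)));
}

/* Task-switch step. tss_bitmap must hold IO_BITMAP_BYTES + 1 bytes.
 * *tss_deny_all remembers whether the TSS copy is already "all denied",
 * so switching between two tasks without port access copies nothing.
 * Returns the number of bitmap bytes written. */
size_t io_perms_load(const struct io_perms *next, bool *tss_deny_all,
                     uint8_t *tss_bitmap)
{
    if (!next->any_allowed) {
        if (*tss_deny_all)
            return 0;                             /* fast path: nothing to do */
        memset(tss_bitmap, 0xFF, IO_BITMAP_BYTES);
        *tss_deny_all = true;
    } else {
        memcpy(tss_bitmap, next->bitmap, IO_BITMAP_BYTES);
        *tss_deny_all = false;
    }
    tss_bitmap[IO_BITMAP_BYTES] = 0xFF;           /* required trailing byte */
    return IO_BITMAP_BYTES;
}
```

Copying 8 KiB on every switch to a task with port access is the cost being weighed against the GPF-handler and kernel-API alternatives.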

Cheers,

Brendan

Combuster · Post by **Combuster** » Thu Jan 04, 2007 7:45 am

I use one and the same TSS: during a task switch I write the new kernel SP into it. Since the TSS is at a fixed location, I can easily use paging to change the port bitmap.

AJ · Post by AJ » Thu Jan 04, 2007 7:56 am

Thanks for so much useful information.

Brendan wrote:Excellent! - any chance of some performance statistics to compare each method?

I had hoped to do this for myself but as I use the PIT for task switching, I'm b*ggered if I can think of a sensible way of doing this right now, but when I have something, I'll post it!

Thanks again for the comprehensive reply! Oh - one more thing - presumably you can also change CS, DS etc.. in the fixed TSS so that you can have some form of protection in use too? - I'll give it a try...

Adam

Brendan · Post by **Brendan** » Thu Jan 04, 2007 10:22 am

Hi,

AJ wrote:Thanks again for the comprehensive reply! Oh - one more thing - presumably you can also change CS, DS etc.. in the fixed TSS so that you can have some form of protection in use too? - I'll give it a try...

Unless you do a hardware task switch the CS, DS, etc fields in the TSS will never be read by the CPU.

If you do use a hardware task switch (e.g. for the double fault exception handler) the CPU will write the current values from the CPU into the TSS, then load new values from the TSS the CPU is switching to. This means that you never need anything in these fields for the "normal" TSS (but would need them in "extra" TSSs).

Cheers,

Brendan

AJ · Post by AJ » Thu Jan 11, 2007 1:03 pm

Sorry to whinge on about this, but it's something I don't seem to be doing successfully despite all the advice given to me - I'm getting fed up of GP faults and don't seem to be able to keep track of where my stack pointer is! I'm fine with same-segment switches, it's this usermode thing which is bothering me.

So, when I launch a new task I do the following:

1. Allocate a stack in a kernel-mode segment with all the register and segment values I want.
2. Allocate a stack in a user-mode segment.

When the task switch occurs for the first time:
3. Load SS0 and ESP0 in the TSS with the ring0 stack values.
4. Load SS and ESP in the TSS with the ring3 stack values.
5. Pop the new task values off my ss0:esp0 stack.
6. Switch back to my ring3 stack for execution.
7. iret, restoring a ring 3 CS, eip, eflags etc...

When it occurs again:
8. Push all registers on to the task's *kernel* mode stack (which is in SS0 and ESP0).
9. Run my task scheduler.
10. Resume from 3. for the new task.

Or am I completely barking up the wrong tree?

Cheers,
Adam

Combuster · Post by **Combuster** » Thu Jan 11, 2007 1:46 pm

A few (8)) semi-random remarks (might help, might not):

I started with stack based switching with just one thread, i.e. set up a scheduler, let it regularly switch stacks to the same task. Once this works, you can add extra tasks to the list. This saves you from debugging both the scheduler and task creation code at the same time.

ESPs are a difficult thing that need some thought: you need to keep track of three variations on ESP:
- ESP for userspace (this one is automatically pushed on the stack when your handler is called)
- Current ESP for ring 0: the ESP value when the task switch occurred. I save and restore it from the task table. Changing it is the basis for the stack-switching approach.
- Base ESP for ring 0: the value of ESP0. Forgetting to change it during a task switch will cause userland code to reuse the same kernel stack over again. Setting it to the ESP value is wrong, as it'll slowly eat your stack. The correct value should be the bottom of the new stack: (it should point to the end of the piece of memory you allocated for your stack)
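The difference between the "current" and "base" ring 0 ESP can be sketched like this. It is a simplified model with made-up names (`struct task`, `task_esp0`, `KSTACK_SIZE`), using plain integers for addresses; the userspace ESP does not appear as a field because it lives in the interrupt frame on the kernel stack.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_SIZE 4096u      /* assumed size of each task's kernel stack */

struct task {
    uint32_t kstack_mem;       /* lowest address of the allocated kernel stack */
    uint32_t saved_esp;        /* "current ESP" for ring 0, saved at each switch */
};

/* "Base ESP" for ring 0, i.e. the value that belongs in TSS.ESP0.
 * The stack grows downwards, so this is the END of the allocation.
 * It depends only on where the stack lives, never on saved_esp. */
uint32_t task_esp0(const struct task *t)
{
    return t->kstack_mem + KSTACK_SIZE;
}

/* Bytes of kernel stack in use at the moment the task was switched out. */
uint32_t task_kstack_used(const struct task *t)
{
    assert(t->saved_esp >= t->kstack_mem && t->saved_esp <= task_esp0(t));
    return task_esp0(t) - t->saved_esp;
}
```

Writing `saved_esp` into ESP0 instead of `task_esp0()` is the "slowly eat your stack" bug: every entry from userland would then start below whatever was left on the stack last time.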

And an annotation of your own algorithm:

1. Allocate a stack in a kernel-mode segment with all the register and segment values I want.
Make sure you add SS3, ESP3, eflags+ip+cs, segment registers, and GPRs in the correct order from the end of this bit of memory. Use the top and bottom of this precreated stack as Current ESP0 and Base ESP0 respectively
2. Allocate a stack in a user-mode segment.
If your userland code only does JMP $, that's not even necessary.

When the task switch occurs for the first time:
Switching to your first task is in my case the same as any other task switch. I'd do steps 8+ as well here
3. Load SS0 and ESP0 in the TSS with the ring0 stack values.
SS0 is usually a constant
4. Load SS and ESP in the TSS with the ring3 stack values.
this is not necessary - these values are on the stack and will be read from there
4½. Load ESP with the top of the new task's ring0 stack
see the note above, kernel ESP must be reloaded for step 5 and current ESP != base ESP
5. Pop the new task values off my ss0:esp0 stack.
GPRs and Segment registers, ok
6. Skip.
Forcing a jump to userland will break all tasks currently in kernel mode
7. iret, restoring a ring 3 CS, eip, eflags etc...
IRET is necessary. It pops EIP, CS and EFLAGS, checks the new CS, and if the privilege level goes down it also pops ESP and SS.

When it occurs again:
See note above. If you just write into the kernel stack it doesn't matter whether it gets scheduled later (or not).
8. Push all registers on to the task's *kernel* mode stack (which is in SS0 and ESP0).
Ok
8½. Save current ESP0 separately
POPAD does not restore it, and if you want to have a pre-emptible kernel it is NOT constant
9. Run my task scheduler.
Gets the next task (under the assumption that its fair and all that crap). Should return information about the next stack to pick.
10. Resume from 3. for the new task.
Ok
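Step 1 of the annotated list (pre-building the kernel stack for a new ring 3 task) could look roughly like the sketch below. It assumes the switch code restores registers with POPAD, then pops GS, FS, ES, DS, then executes IRET; the selector values and the function name `build_user_frame` are placeholders for illustration. If your handler pops in a different order, the frame has to be built to match.

```c
#include <assert.h>
#include <stdint.h>

#define USER_CS          0x1Bu   /* assumed ring 3 code selector (RPL=3) */
#define USER_DS          0x23u   /* assumed ring 3 data selector (RPL=3) */
#define EFLAGS_IF        0x200u  /* interrupts enabled in the new task */
#define EFLAGS_RESERVED1 0x002u  /* bit 1 of EFLAGS always reads as 1 */

enum { FRAME_WORDS = 17 };       /* 5 (IRET) + 4 (segments) + 8 (PUSHAD) */

static uint32_t *push32(uint32_t *sp, uint32_t value)
{
    *--sp = value;               /* x86 stacks grow downwards */
    return sp;
}

/* Builds the initial frame downwards from the end of the kernel stack.
 * kstack_end is the "base ESP0"; the return value is the "current ESP"
 * to store in the task table. */
uint32_t *build_user_frame(uint32_t *kstack_end, uint32_t entry,
                           uint32_t user_esp)
{
    uint32_t *sp = kstack_end;
    int i;

    assert(kstack_end != 0);

    /* What IRET pops when returning to a lower privilege level. */
    sp = push32(sp, USER_DS);                        /* SS3 */
    sp = push32(sp, user_esp);                       /* ESP3 */
    sp = push32(sp, EFLAGS_IF | EFLAGS_RESERVED1);   /* EFLAGS */
    sp = push32(sp, USER_CS);                        /* CS */
    sp = push32(sp, entry);                          /* EIP */

    /* Segment registers: DS, ES, FS, GS. */
    for (i = 0; i < 4; i++)
        sp = push32(sp, USER_DS);

    /* PUSHAD image: EAX, ECX, EDX, EBX, ESP (ignored by POPAD),
     * EBP, ESI, EDI. All start at zero. */
    for (i = 0; i < 8; i++)
        sp = push32(sp, 0);

    return sp;
}
```

A task created this way needs no special "first switch" path: the scheduler loads the returned ESP and runs the same restore code it uses for every other task.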

A tip for later: add some code to set the TS flag. It's not necessary for basic userland stuff, but it'll save you some problems when you're accidentally using FP math.

Also, poke debugging (outputting characters to video memory to see where the code breaks, optionally with CLI/HLT) and Bochs saved my task-switching code more than once.

So much for the mile-long post, I hope it's useful.

Brendan · Post by **Brendan** » Thu Jan 11, 2007 7:35 pm

Hi,

AJ wrote:Sorry to whinge on about this, but it's something I don't seem to be doing successfully despite all the advice given to me - I'm getting fed up of GP faults and don't seem to be able to keep track of where my stack pointer is! I'm fine with same-segment switches, it's this usermode thing which is bothering me.

Just a quick comment here....

For most OS's using software task switching there is no CPL=0 to CPL=3 task switch, and (usually) you can ignore CPL=3 entirely.

Imagine you've got some CPL=3 code happily running. When does a task switch need to happen? There's only 2 situations:

a) as the result of calling a kernel function (e.g. kernel functions like "sleep()", "spawn()", IPC related kernel functions, etc) - in this case CPL=3 code switches to the kernel and then the kernel decides to do a task switch.
b) as the result of an interrupt (e.g. timer IRQ, exception, etc) - in this case the interrupt switches to CPL=0, so the CPU is already in CPL=0 by the time the task switch occurs.

This means that you never need to do a task switch directly from CPL=3 to CPL=0 (the switch from CPL=3 to CPL=0 always occurs before the task switch). Because of this it means you also never need to do a task switch directly from CPL=0 to CPL=3, because tasks were at CPL=0 before they stopped running.

This is mostly just a difference in perspective - "all tasks are kernel tasks that may run CPL=3 code".

The only tricky part is creating a new user-level thread, but this isn't very tricky either. You just spawn a kernel thread, and then use it to "return" to CPL=3 code. If the new thread is a new process then you can spawn a new kernel thread, setup a new address space, load an executable from disk, do any linking and relocation, etc and then "return" to CPL=3.
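Seen this way, the task switch itself only ever swaps kernel stacks. A small model of that step, with a struct standing in for the real CPU registers (the names `struct ktask`, `struct cpu_model` and `switch_to` are invented for this sketch; real code would do the ESP swap in assembly):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ktask {
    uint32_t saved_esp;   /* kernel ESP at the point the task was switched out */
    uint32_t esp0;        /* end of this task's kernel stack */
};

/* Stand-in for the two pieces of CPU state a software switch touches. */
struct cpu_model {
    uint32_t esp;         /* current (ring 0) stack pointer */
    uint32_t tss_esp0;    /* ESP0 field of the single static TSS */
};

/* Always runs at CPL=0, whether we got here through a kernel API call or
 * an interrupt. Nothing in here knows or cares about CPL=3. */
void switch_to(struct cpu_model *cpu, struct ktask *prev, struct ktask *next)
{
    assert(cpu != NULL && prev != NULL && next != NULL);
    if (prev == next)
        return;

    prev->saved_esp = cpu->esp;     /* remember where the old task stopped */
    cpu->tss_esp0   = next->esp0;   /* stack for next's future entries from CPL=3 */
    cpu->esp        = next->saved_esp;
    /* From here on we are running on next's kernel stack. Its own return
     * path (eventually an IRET) takes it back to CPL=3 if it came from there. */
}
```

The return to userland is just whatever the resumed kernel stack unwinds into, which is why no direct CPL=0 to CPL=3 switch is ever needed.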

Cheers,

Brendan

AJ · Post by AJ » Fri Jan 12, 2007 3:06 am

Thanks again. Looking at the above, my problem with the transition from CPL0 to CPL3 was twofold:

* I didn't realise that if the CS changed, the CPU automatically pops SS3 and ESP3 off the stack. Looking back through the intel docs, this should have been obvious.

* As you said, I was experiencing stack creep - only tracking the top rather than the base of the stack.

As mentioned above, I do already have a working MT system - as long as I remain in CPL0 - it's just that final hurdle of getting in to userland. I will try re-implementing based on your thorough replies and let you know how I got on.

Thanks again,
Adam

AJ · Post by AJ » Fri Jan 12, 2007 10:50 am

OK - I feel like I'm getting there now. If the kernel is the only task, fine, the stack remains constant and the scheduling process is entirely transparent. I have my scheduler outputting a '-' to the screen, so I know it's being called!

When I add a new task, the switch seems to happen fine - until the first instruction when I get a triple fault (I have a full IDT set up which normally dumps regs to COM1, but this doesn't even appear. At the moment, double faults do not have their own TSS - I'll get round to that!).

The triple fault happens whatever the first instruction is (whether it is an asm function just doing jmp $, or whether I have used a c function, in which case it is push ebp).

After the triple fault, Bochs gives the registers exactly as I would expect for the ring 3 task (GP regs = 0, seg regs all = ring 3 data (rpl3), code seg = ring 3 code (rpl3), eip = first instruction of new task, cpl = 3, ESP is in the ring 3 stack, CR3 has the page directory address I would expect). I have not switched address spaces yet.

The only unusual symptom is that CR2 is loaded with a value in the new ring 3 stack, but I have checked with another function and the stack is *definitely* paged in!

I now have esp's swimming round my head so will give it another go tomorrow!

Cheers,
Adam

JAAman · Post by **JAAman** » Fri Jan 12, 2007 11:05 am

Combuster wrote:8½. Save current ESP0 separately
POPAD does not restore it, and if you want to have a pre-emptible kernel it is NOT constant

this is misleading:
this is usually not true for processes (since the kernel stack shouldn't be global anyway), and only usually correct for threads

this is a matter of design choice not necessity -- my design is a pre-emptible kernel where each thread's kernel stack is at the same virtual address

there are good reasons most people don't do this, but I'm not most people, I'm just pointing out that your statement, while largely considered to be true, is itself not a constant, and dependent on implementation

saying "it is usually not constant" would be a better choice of words, as it is possible to have a pre-emptible kernel with a constant ESP0 (doing it my way, or just not supporting threads at all -- either way your statement should be qualified)

Combuster · Post by **Combuster** » Fri Jan 12, 2007 3:32 pm

@JAAman:
I talked about Base ESP(0/3) and Current ESP(0/3) as two completely different things. The base is the bottom of the stack while the current stackpointer is the ESP value upon interrupt.
You WILL need to save the current ESP for ring 0 if your kernel is pre-emptible: the kernel stack is of undefined size when the scheduler is called and hence Current ESP is not constant. Whether the Base ESP (the ESP0 in the TSS) is constant indeed depends on design.

@AJ:
Some things that tend to mess up things badly in userland code:
- Forgetting to mark pages as non-supervisor (page fault, which explains bailing on first instruction with CR2 loaded)
- Forgetting to correctly set ESP0 and SS0 in the TSS (causes a series of GPFs -> triple fault)
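For the first point, a small sketch of the 32-bit (non-PAE) paging flags involved. The bit values are the architectural ones; the helper names `pg_entry`, `pg_user_can_read` and `pg_user_can_write` are invented here. The easy thing to miss is that the User bit must be set in both the page directory entry and the page table entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PG_PRESENT 0x001u   /* bit 0: page is mapped */
#define PG_RW      0x002u   /* bit 1: writable */
#define PG_USER    0x004u   /* bit 2: accessible from CPL=3 (non-supervisor) */

/* Combine a 4 KiB aligned physical address with flag bits. */
uint32_t pg_entry(uint32_t phys, uint32_t flags)
{
    assert((phys & 0xFFFu) == 0);
    return phys | (flags & 0xFFFu);
}

/* A CPL=3 access is only allowed if BOTH levels are present and have the
 * User bit set. Kernel-only code never notices a missing User bit. */
bool pg_user_can_read(uint32_t pde, uint32_t pte)
{
    const uint32_t need = PG_PRESENT | PG_USER;
    return (pde & need) == need && (pte & need) == need;
}

/* User-mode writes additionally need R/W at both levels. */
bool pg_user_can_write(uint32_t pde, uint32_t pte)
{
    const uint32_t need = PG_PRESENT | PG_USER | PG_RW;
    return (pde & need) == need && (pte & need) == need;
}
```

A user stack mapped with only Present and R/W faults on the very first push from ring 3, with CR2 pointing into that stack.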

AJ · Post by AJ » Mon Jan 15, 2007 2:46 am

@AJ:
Some things that tend to mess up things badly in userland code:
- Forgetting to mark pages as non-supervisor (page fault, which explains bailing on first instruction with CR2 loaded)
- Forgetting to correctly set ESP0 and SS0 in the TSS (causes a series of GPFs -> triple fault)

Genius! Of course, because I had only ever worked in kernel space, I had only worried about the 'Present' and 'R/W' bits of my paging mechanism. Once this was cleared up, fine!

One thing this multitasking thing has taught me in a *big* way is organisation and commenting of source code - if my source was less messy, I should have spotted that much earlier.

Thanks for putting so much time in to the answers. Oh - and I will certainly try to get around to the performance monitoring as mentioned in an earlier post...

Cheers,
Adam

AJ · Post by AJ » Tue Jan 16, 2007 7:12 am

I now have it working pretty well. There were some hairy moments with conforming/non conforming code segments, restoring proper segment addresses in interrupts etc... but I got there in the end - thanks!

Adam

Combuster · Post by **Combuster** » Tue Jan 16, 2007 7:19 am

AJ wrote:I now have it working pretty well. There were some hairy moments with conforming/non conforming code segments, restoring proper segment addresses in interrupts etc... but I got there in the end - thanks!

Adam

You're welcome.
Basically, all the tips I gave you were mistakes I once made myself.

OSDev.org
