Amazing Triple Fault

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt, post here.
virusx

Amazing Triple Fault

Post by virusx »

hi,
I am using software-based task switching and there is only one TSS. When a task switch occurs, I get a triple fault. Bochs shows the CPU was executing push %ebp (opcode 0x55). CR2 contains the address of TSS+8, i.e. the SS0 field. The switched task has IOPL 3.

What am I really missing?

Thanks in advance
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Amazing Triple Fault

Post by Pype.Clicker »

That sounds much like the SS0 field couldn't be accessed while the system was trying to raise an exception.

Do both threads run in the same address space? If not, did you properly map the TSS in both address spaces?

Can you tell what the original fault was (page? stack? GPF?)

Did you load TSS.SS0 with a valid selector and TSS.ESP0 with a valid offset?
bubach
Member
Posts: 1223
Joined: Sat Oct 23, 2004 11:00 pm
Location: Sweden

Re:Amazing Triple Fault

Post by bubach »

virusx wrote:I am using software-based task switching
Software task switching vs. TSS - which should I choose?
Any good/bad parts I need to know about?
"Simplicity is the ultimate sophistication."
http://bos.asmhackers.net/ - GitHub
distantvoices
Member
Posts: 1600
Joined: Wed Oct 18, 2006 11:59 am
Location: Vienna/Austria

Re:Amazing Triple Fault

Post by distantvoices »

Hmmm... if you take a look at the OSFAQ, you'll quickly discover *this one*:

http://www.osdev.org/osfaq2/index.php/C ... 0Switching

hope this helps :-)
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image
bubach
Member
Posts: 1223
Joined: Sat Oct 23, 2004 11:00 pm
Location: Sweden

Re:Amazing Triple Fault

Post by bubach »

Oops, sorry...
Dreamsmith

Re:Amazing Triple Fault

Post by Dreamsmith »

I've updated that particular page of the FAQ to properly cite the quote that Solar included but couldn't remember the source of.

It's interesting to note that the difference in speed between hardware and software task switching is actually decreasing on later Pentium processors -- the save/load-registers portion of a task switch is becoming a smaller and smaller percentage of the overall time spent during a task switch. On a P4, doing the impossible and eliminating it entirely would only shave 6% off the time involved (assuming you also had an instantaneous scheduling algorithm). On modern processors, at least, performance considerations appear to be moot. One should probably choose one's method based on ease of implementation vs. portability issues instead.
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Re:Amazing Triple Fault

Post by Colonel Kernel »

I read that quote... interesting stuff. I'm curious to see if the results are similar with various Athlons...

I'm also curious to know if anyone's recommendation of whether to use h/w or s/w task switching changes in light of this new info.
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
Dreamsmith

Re:Amazing Triple Fault

Post by Dreamsmith »

Colonel Kernel wrote:I'm also curious to know if anyone's recommendation of whether to use h/w or s/w task switching changes in light of this new info.
My guess would be, probably not. Speed wasn't the only reason for going with software task switching, so seeing the speed difference dwindle isn't likely to change anyone's mind. Also, it never was that much faster, so the speed argument only impressed those who wanted to shave off every last cycle they could. Although the percentage is dwindling, it is still a bit faster, so those same people will still prefer it for the same reason.

Now, hardware task switching is easier to implement, so I imagine there are probably more than a few of us who implemented it that way first, with the intention of going back and "doing it right" later. At least, that's always been my plan, but I hadn't gotten back to it because I had bigger fish to fry (for example, improving the performance of my TCP/IP stack is much higher on my priority list than optimizing task switches). All this does is push software task switching even further down my priority list.
mystran

Re:Amazing Triple Fault

Post by mystran »

I personally like the model where you save userspace state on the stack when you enter the kernel; then, in the kernel, the thread does whatever it wants to do. Now, a C kernel takes care of everything but ebx/ebp as part of the normal C calling convention, so switching from one kernel task to another is simply a question of saving/restoring ebx/ebp and swapping the stack.

Why do I like this model? Because I don't need to take the hit of switching the page directory unless I need to go to userland. This means I can schedule a task, check whether there's something more to do in kernel space, and possibly queue it again, scheduling another task. Only when I know I'm really going to return to userland is the virtual memory context switch actually needed.

Why would such things happen? Well, for example, it allows you to handle stuff like reading from a file directly in the thread that actually wants the data read. Say a block arrives from disk: we can switch to that thread temporarily, see what other blocks we still need, then switch back to the first thread and return to userland with no context switch at all. That means less stuff to coordinate, as more is implicit in the stacks of the threads.

So basically, there are co-operative kernel threads, which sometimes (quite often, actually) trip into userspace, and appear to be pre-emptive because any interrupt can bring them back into the kernel. As long as they stay in the kernel they don't need to care about which address space is actually active, because (almost) everything threads ever need in the kernel is mapped into every context anyway.

That's basically why I do it with software task switches. Besides, no offence to anyone, but I find this actually easier to implement, especially if you want to support "arbitrary" amounts of threads and processes.
Dreamsmith

Re:Amazing Triple Fault

Post by Dreamsmith »

mystran wrote:That's basically why I do it with software task switches. Besides, no offence to anyone, but I find this actually easier to implement, ...
I guess it depends on exactly what you want to be easier. Here, for example, is my task switch code:

Code: Select all

    ljmp(task->selector);
Not sure how any software task switching code could be simpler than a single instruction, or easier to implement.
mystran wrote:...especially if you want to support "arbitrary" amounts of threads and processes.
I've never seen this as an even remotely sensible argument. Yes, it's true, the hardware approach does limit you to around 8180 simultaneously executing tasks unless you implement allocating/deallocating task selectors on the fly. There'd be a real concern here, if it weren't for the fact that the system would have long since suffered memory exhaustion, and the fact that, at 128 task switches per second, each task would only be getting one 8ms time slice per minute, with over 1 minute intervals between slices, making the difference between this situation and being completely hung/crashed more or less academic.

Although theoretically different, I would say that there's no practical difference between being able to support 8180 simultaneously executing tasks and being able to support an infinite number.

In either case, that limit is WAY too high. A system that doesn't wish to be overly susceptible to buggy programs or DoS attacks should set far lower limits on the amount of forking/spawning it'll tolerate before a panic/reboot or some other emergency measure stops runaway resource allocation. I would classify actually allowing an arbitrary number of tasks as a design flaw...
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re:Amazing Triple Fault

Post by Brendan »

Hi,
Dreamsmith wrote:
mystran wrote:...especially if you want to support "arbitary" amounts of threads and processes.
I've never seen this as an even remotely sensible argument. Yes, it's true, the hardware approach does limit you to around 8180 simultaneously executing tasks unless you implement allocating/deallocating task selectors on the fly. There'd be a real concern here, if it weren't for the fact that the system would have long since suffered memory exhaustion, and the fact that, at 128 task switches per second, each task would only be getting one 8ms time slice per minute, with over 1 minute intervals between slices, making the difference between this situation and being completely hung/crashed more or less academic.
My OS supports up to 65536 threads, where each thread consumes a minimum of 12 KB (a thread data area which includes the kernel stack, a page table for the user stack, and one page of user stack). Therefore, to have the maximum number of threads you'd need at least 768 MB. IMHO, by the time I've completed the OS, computers with over 4 GB will be common.

You'd be right about the amount of CPU time each thread would get if all of these threads were actually running, but it's almost impossible for this to be the case - most of the threads would be blocked waiting for a message (e.g. waiting for a keypress or something). My OS also supports up to 256 CPUs, which would mean there'd only be 256 threads per CPU (with max. threads and max. CPUs). Considering this, a limit of 65536 threads may be too small.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
virusx

Re:Amazing Triple Fault

Post by virusx »

hi,
>> Did you load TSS.SS0 with a valid selector and TSS.ESP0 with a valid offset?
Yes, I have set SS0 at boot, and ESP0 at each context switch.

>> Can you tell what the original fault was (page? stack? GPF?)
On Bochs it gives: exception 3rd (14) occurred without resolution.
On VMware it says the VM suffered a stack fault in kernel mode...

>> Do both threads run in the same address space? If not, did you properly map the TSS in both address spaces?
No, they run in separate address spaces, and I have carefully loaded the CR3 value of the task to run.
What other things do I need to do?

Thanks
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re:Amazing Triple Fault

Post by Solar »

Brendan wrote:
IMHO, by the time I've completed the OS, computers with over 4 GB will be common.
And how do you rate the chance that, say, a Pentium V extends the number of supported HW-tasks?

And since it came up in a different thread, here is a good example of what user-level threads are good for: have the kernel only schedule the processes, and have the processes handle their threads. Reduces the load on the kernel...
Every good solution is obvious once you've found it.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re:Amazing Triple Fault

Post by Brendan »

Hi,
Solar wrote: And how do you rate the chance that, say, a Pentium V extends the number of supported HW-tasks?
Considering that AMD's 64-bit mode/long mode (which has already spread to almost all 80x86 manufacturers) does not support hardware task switching, and that Intel's Itanium doesn't support native hardware task switching either, I'd say the chance of any 80x86 CPU manufacturer caring about hardware task switching (other than for backwards compatibility reasons) is extremely small. The word "deprecated" springs to mind...
Solar wrote: And since it came up in a different thread, here is a good example for what user-level threads are good for: Have the kernel only schedule the processes, and have the processes handle their threads. Reduces the load on the kernel...
Reduces the load on the kernel while increasing the work that each (multi-threaded) process needs to do, which would increase overall workload. The only benefit I can see is that thread switches between threads in the same process wouldn't change address spaces, but the kernel's scheduler could do the same thing if this is desired.

Because the kernel can't know the priority of each thread within the process, and can't block each of these threads individually, it'd end up being less efficient IMHO. For computers with multiple CPUs, it would be impossible for a single process to have threads running on different CPUs. Also, things like file handles, IO ports, IRQs, etc. couldn't be allocated by a single thread, but would have to be allocated by the process, which means all threads within the process could use them without restriction (which wouldn't be the same as *nix, but may be desirable anyway).

The 8190 task hardware task switching limit isn't really a limit, as it's easily bypassed by dynamically changing TSS descriptors. Therefore I wouldn't consider this limit a good reason to use user-level threads.


Cheers,

Brendan
Dreamsmith

Re:Amazing Triple Fault

Post by Dreamsmith »

Brendan wrote:... IMHO by the time I've completed the OS computers with over 4 GB will be common. .... Considering this, a limit of 65536 threads may be too small.
Different design goals, different rules. My OS should be ready to deploy in a few months, and the first serious use of it is going to involve me shoehorning it into a handheld unit with as little RAM as possible. Each and every chip added to the design adds significant cost to the eventual (hopefully) hardware run. I want a stripped version runnable in 64K, and a full version running in 512K. (It should be noted that most computers in use today have far less than this -- check the chips in your microwave, your television, your VCR, your calculator, and all the other computers you have if you don't believe me. ;D)

This aside, however, it really ought to be noted that blocked processes shouldn't be counted against the segment descriptor limit. Creating and destroying segment descriptors on every task switch would be annoying, but creating and destroying segment descriptors as tasks are added or removed from the run queue is trivial. The descriptor limit really only significantly impacts the number of running tasks you can easily support (and even that isn't too hard to get around, really).

To me, the best argument against hardware task switching involves the size of the TSS, which holds many more entries than I really need. Look at all those wasted bytes! *shudder*