[SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
Hi Friends,
I am using the Intel x86 TSS framework for task management.
It works fine on an emulator, on my 4-year-old Dell laptop, on my PC with an AMD processor, and on a brand-new laptop with a quad-core Intel processor.
But I have a 1-year-old laptop which has a dual-core Xeon Pentium 3 family processor.
On that machine a task switch simply crashes my system.
Please note that the kernel itself boots fine and runs with its own TSS (no LDT, ring 0). The moment I switch to a TSS (user/kernel task) which has its own LDT, it crashes. In this case it is a TSS running in ring 0, with an LDT, using the same PDBR (page directory base register) as the kernel.
I have made sure that the TSS is zeroed with memset during initialization, that IO_MAP_SIZE is set to 103 (the same value as in the TSS descriptor), and that the Debug T bit is set to 0.
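For readers following along, here is a minimal sketch of the kind of initialization described above. The thread does not show the poster's actual code, so the struct name `tss32`, the function `tss32_reset`, and the constant `TSS32_LIMIT` are all hypothetical. The field layout follows the 32-bit TSS format from the Intel manuals (104 bytes when there is no I/O bitmap), which gives a descriptor limit of 103. Setting the I/O map base to the size of the structure is a common convention for "no I/O bitmap", not necessarily the value the poster used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS layout (104 bytes without an I/O permission bitmap). */
struct tss32 {
    uint16_t prev_task, rsvd0;
    uint32_t esp0; uint16_t ss0, rsvd1;
    uint32_t esp1; uint16_t ss1, rsvd2;
    uint32_t esp2; uint16_t ss2, rsvd3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, rsvd4, cs, rsvd5, ss, rsvd6;
    uint16_t ds, rsvd7, fs, rsvd8, gs, rsvd9;
    uint16_t ldt_selector, rsvd10;
    uint16_t debug_trap;   /* bit 0 is the T (debug trap) flag */
    uint16_t iomap_base;   /* offset of the I/O bitmap from the TSS base */
};

/* Catch accidental padding or a misplaced field at compile time. */
static_assert(sizeof(struct tss32) == 104, "TSS must be 104 bytes");
static_assert(offsetof(struct tss32, cr3) == 0x1C, "CR3 (PDBR) offset");
static_assert(offsetof(struct tss32, iomap_base) == 0x66, "I/O map base offset");

/* Limit to put in the TSS descriptor: size minus one, i.e. 103. */
enum { TSS32_LIMIT = sizeof(struct tss32) - 1 };

/* Hypothetical helper: zero the TSS, clear T, and mark "no I/O bitmap". */
void tss32_reset(struct tss32 *t)
{
    memset(t, 0, sizeof *t);
    t->debug_trap = 0;                            /* T bit clear */
    t->iomap_base = (uint16_t)sizeof(struct tss32); /* base beyond the limit */
}
```

The caller would still have to fill in CR3, EIP, ESP, the segment selectors and the LDT selector before switching to the task.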
Again, please note that it works fine on many of the new and old laptops except the new laptop I have
I know the details I have provided are very abstract but that is all I have.
Any sort of help (suggestions for debugging, common mistakes, anything processor-specific I have to take care of, etc.) is greatly appreciated.
Thanks in advance,
Regards,
- MosMan
Last edited by prajwal on Sun Dec 05, 2010 2:14 am, edited 1 time in total.
complexity is the core of simplicity
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: Triple Fault on TaskSwitch (Xeon Processor)
Common mistakes that fit the symptoms: buffer overflows, uninitialized memory, race conditions. None of them are easy to check though.
Did you try printf-debugging before switching to the "broken" tss?
Re: Triple Fault on TaskSwitch (Xeon Processor)
Thanks Combuster. A race condition is something I would want to check again.
But the behaviour is consistent: it either always crashes (on my new laptop) or never crashes (on the other PCs I mentioned in my initial post).
printf-debugging... yeah, I did. I put an infinite while loop in the PIT handler to see whether the task switch is successful, because the PIT handler is the only place control can reach after the task switch, when a timer IRQ arrives. In fact my task is also simply a while(1); loop.
Regards
- MosMan
complexity is the core of simplicity
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
Ok, the problem is fixed. Thanks for your help.
The issue was with the alignment of the TSS.
I changed my GDT, LDT and, more importantly, TSS to be page (4 KB) aligned in memory.
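The post does not show how the alignment was done. One way to get a page-aligned table in C11 is `alignas`; the following is only a sketch under that assumption. The names `tss_storage`, `user_tss`, `user_tss_base` and `TSS_PAGE_SIZE` are made up for illustration, and the 104-byte array merely stands in for the real TSS layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define TSS_PAGE_SIZE 4096u

/* Placeholder for a 104-byte TSS; the real field layout is omitted. */
struct tss_storage {
    uint8_t bytes[104];
};

/* Ask the compiler and linker to place the object on a 4 KB boundary. */
static alignas(TSS_PAGE_SIZE) struct tss_storage user_tss;

/* Base address that would go into the TSS descriptor. */
void *user_tss_base(void)
{
    return &user_tss;
}

/* Returns 1 if p sits on a 4 KB page boundary. */
int is_page_aligned(const void *p)
{
    return ((uintptr_t)p % TSS_PAGE_SIZE) == 0;
}
```

In a kernel that picks table addresses by hand (rather than letting the linker place them), the same effect comes from rounding the chosen address up to a multiple of 4096. GCC's `__attribute__((aligned(4096)))` is an alternative for pre-C11 builds.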
Thanks,
- MosMan
complexity is the core of simplicity
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
Just curious, are you using hardware task switching ? As far as I remember, the TSS need not be page aligned but it should be wholly within a single physical page.
If a trainstation is where trains stop, what is a workstation ?
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
Hi gerryg400, yes, I use hardware task switching.
Yeah, I think you are right. The TSS should be within a single page. I do remember that the Intel manuals impose no page-alignment constraint on the location of the TSS. I read the latest document while fixing this and didn't see any such constraint.
Also, in my case the first 64 MB of memory is 1:1 mapped in all processes (kernel area), so there is no question of a page fault in this area. My user TSS was at address 530 KB and 103 bytes long. So it was not spanning a page boundary and was within a single physical page. As I mentioned, it works on many of the new laptops, but not on my new laptop. So, as part of a "brute force bug fix strategy", I just changed the TSS base address to 532 KB. And there you go, it worked!!
I also moved the LDT to a page-aligned address, but I don't consider the LDT to be the cause, because I tried task switching with my new task using the GDT instead of the LDT... It still crashed! So I concluded that it is because of the TSS location. Once it started working I didn't spend much time on a postmortem.
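The arithmetic in this post can be checked with a few lines of C. This is only a sketch: `tss_in_single_page` and `page_offset` are hypothetical helper names, and 4 KB pages are assumed. With a base of 530 KB (0x84800) the TSS starts 2048 bytes into its page and ends at 0x84867, so it does stay inside one page, as stated; 532 KB (0x85000) is exactly page aligned.

```c
#include <assert.h>
#include <stdint.h>

#define TSS_PAGE_SHIFT 12u   /* 4 KB pages */

/* Offset of an address inside its 4 KB page. */
uint32_t page_offset(uint32_t base)
{
    return base & ((1u << TSS_PAGE_SHIFT) - 1u);
}

/*
 * Returns 1 if the bytes [base, base + limit] all lie in one 4 KB page.
 * 'limit' is the descriptor limit, i.e. size minus one (103 for a
 * 104-byte TSS).
 */
int tss_in_single_page(uint32_t base, uint32_t limit)
{
    assert(limit <= UINT32_MAX - base);   /* no 32-bit wrap-around */
    return (base >> TSS_PAGE_SHIFT) == ((base + limit) >> TSS_PAGE_SHIFT);
}
```

So the original placement already satisfied the "wholly within a single page" rule, which is why the fix looks like an alignment issue rather than a page-crossing one. The check cannot explain why moving the TSS helped on that one machine.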
complexity is the core of simplicity
- Combuster
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
You might want to look up the relevant errata lists for something relevant - hardware task switching with unaligned TSSes is uncommon enough to be unheard of. You do need the exact processor model to find the right list (I can't look it up with the listed info - there are no dual-core chips in the pentium 3 generation out there, only chips that can be used in multisocket configurations). If you find something relevant in there you can at least be sure it was not caused by some other nasties.
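On finding the exact processor model: errata lists are organised by family, model and stepping, which come from the signature that CPUID leaf 1 returns in EAX. Below is a small sketch of decoding that value using the documented extended family/model rules. The names `cpu_signature` and `decode_cpu_signature` are made up here, and actually executing CPUID (for example via the compiler's `<cpuid.h>` helpers or inline assembly) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Display family/model/stepping as used in processor errata documents. */
struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode the EAX value returned by CPUID leaf 1. */
struct cpu_signature decode_cpu_signature(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xFu;
    unsigned base_model  = (eax >> 4) & 0xFu;
    unsigned ext_family  = (eax >> 20) & 0xFFu;
    unsigned ext_model   = (eax >> 16) & 0xFu;
    struct cpu_signature s;

    s.stepping = eax & 0xFu;

    /* The extended family is only added when the base family is 0xF. */
    s.family = (base_family == 0xFu) ? base_family + ext_family
                                     : base_family;

    /* The extended model is only used for base family 6 or 0xF. */
    s.model = (base_family == 0x6u || base_family == 0xFu)
                  ? ((ext_model << 4) | base_model)
                  : base_model;
    return s;
}
```

For example, a signature of 0x000006FB decodes to family 6, model 15, stepping 11, which is the kind of triple needed to pick the matching specification update.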
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
I would say that is your problem. It's a good example of how a rarely used feature (e.g. hardware task switching) might perhaps receive less thorough testing and thus be more likely to contain a bug.
If a trainstation is where trains stop, what is a workstation ?
- Combuster
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
I'd still like to see an Intel document confirming it...
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
See page 54 item G45 http://www.orpheuscomputing.com/downloa ... n-spec.pdf
If a trainstation is where trains stop, what is a workstation ?
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
gerryg400 wrote: See page 54 item G45 http://www.orpheuscomputing.com/downloa ... n-spec.pdf
berkus wrote: 403 with referer logged.
Hey Berkus, what's that mean ? I can view that page and download the doc.
If a trainstation is where trains stop, what is a workstation ?
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
Should I attach the actual document to this post ?
Quote: But I have a 1 year old laptop which is having a dual core xeon pentium 3 family processor.
Isn't Pentium 3 a lot older than 1 year ?
If a trainstation is where trains stop, what is a workstation ?
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
Hi,
Quote: But I have a 1 year old laptop which is having a dual core xeon pentium 3 family processor.
gerryg400 wrote: Isn't Pentium 3 a lot older than 1 year ?
Heh. There are no dual-core Pentium III chips, and Xeons are for servers not laptops.
By omitting least-likely information and assuming "laptop" and "dual-core" is correct, and also assuming that "1 year old" means it was purchased one year ago (and designed anywhere from 12 months to 3 years ago), then I'd assume it's "Core" or "Core 2" (and maybe a "mobile" and/or "Celeron" variation of one of these). Normal Nehalem based CPUs use too much power for laptops and the "mobile Nehalem" CPUs are too new, and PentiumM and Netburst based CPUs are too old.
gerryg400 wrote: See page 54 item G45 http://www.orpheuscomputing.com/downloa ... n-spec.pdf
Item G45 (on page 40 of that document) is "Processor May Assert DRDY# on a Write with No Data", and begins with "When a MASKMOVQ instruction is misaligned across a chunk boundary". Item G72 (the only item on page 54 of that document) is "AGTL+ Receiver May Induce Falling Edge Ledges and Undershoot Levels". Neither of these seem likely.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: [SOLVED] Triple Fault on TaskSwitch (Xeon Processor)
Sorry should have been G54 on page 45. Two typos ! But since it's prolly not a Pentium III I guess it's not this issue .....
If a trainstation is where trains stop, what is a workstation ?