OSDev.org

Posted: **Sat Dec 03, 2005 3:23 pm**

Hi,
I'm working on multitasking with protection. I started with stack-based task switching, which runs properly in kernel space.
But now, if I try to run a task in ring 3, I get a double fault.
The output from my exception handler shows that my stack seems to be corrupted somehow. That is a bit strange, as the same layout works fine in kernel mode.
Here's the setup:

stack[0]=0x20 | 3;          //gs
stack[1]=0x20 | 3;          //fs
stack[2]=0x20 | 3;          //es
stack[3]=0x20 | 3;          //ds
stack[4]=0;                //edi
stack[5]=0;                //esi
stack[6]=0;                //ebp
stack[7]=0;                //esp
stack[8]=0;                //ebx
stack[9]=0;                //edx
stack[10]=0;             //ecx
stack[11]=0;             //eax
stack[12]=(uint)0x80002000;    //entry point (eip) 
stack[13]=0x18 | 0x03;        // user code seg
stack[14]=0x3202;           // eflags
stack[15]=0x80001FF4;    // ring 3 esp
stack[16]=0x20 | 0x03;      //  ss

I use the same layout for kernel mode, except that there the frame ends at stack[14].
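For reference, the same frame can be written as a struct, which makes it harder to get the slot order wrong. This is only a sketch: `make_selector`, `struct task_frame`, `init_user_frame` and the `EFLAGS_*` names are made up for illustration, and it assumes the GDT indices 3 (user code) and 4 (user data) implied by the 0x18 and 0x20 selectors above.

```c
#include <stdint.h>

/* Hypothetical helper: selector = (GDT index << 3) | RPL, TI bit left at 0. */
static uint16_t make_selector(unsigned gdt_index, unsigned rpl)
{
    return (uint16_t)((gdt_index << 3) | (rpl & 3u));
}

/* Frame in the order the stub consumes it (lowest address first). */
struct task_frame {
    uint32_t gs, fs, es, ds;              /* pop gs / fs / es / ds        */
    uint32_t edi, esi, ebp, esp_ignored;  /* popad (the esp slot is       */
    uint32_t ebx, edx, ecx, eax;          /*        skipped by the CPU)   */
    uint32_t eip, cs, eflags;             /* iretd, always                */
    uint32_t user_esp, user_ss;           /* iretd, only when returning   */
                                          /* to a less privileged ring    */
};

#define EFLAGS_RESERVED1 0x0002u  /* bit 1 always reads as 1   */
#define EFLAGS_IF        0x0200u  /* interrupts enabled        */
#define EFLAGS_IOPL3     0x3000u  /* IOPL = 3, as in 0x3202    */

/* Fill in a ring 3 frame; entry and stack_top are chosen by the caller. */
static void init_user_frame(struct task_frame *f,
                            uint32_t entry, uint32_t stack_top)
{
    uint16_t ucode = make_selector(3, 3);  /* assumed GDT index 3 -> 0x1B */
    uint16_t udata = make_selector(4, 3);  /* assumed GDT index 4 -> 0x23 */

    *f = (struct task_frame){0};           /* general registers start at 0 */
    f->gs = f->fs = f->es = f->ds = udata;
    f->eip      = entry;
    f->cs       = ucode;
    f->eflags   = EFLAGS_RESERVED1 | EFLAGS_IF | EFLAGS_IOPL3;
    f->user_esp = stack_top;
    f->user_ss  = udata;
}
```

Calling `init_user_frame(&f, 0x80002000, 0x80001FF4)` should give the same 17 values as the array above. The kernel-mode variant would simply stop after `eflags`, since `iretd` does not pop esp and ss when the privilege level stays the same.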

The assembly stub:

go_on: 
call ktask_handler
mov eax, [curProc]             ;switch to procs kstack
mov esp, [eax]                
mov [ktss + 4], esp            ;patch tss
mov al, 0x20
out 0x20,al
pop gs
pop fs
pop es
pop ds
popad
;cmp esp, 0x80000000    
;ja stop                          << here I jumped to a hlt for debugging
iretd

The hlt instruction above revealed that the stack layout is intact up to that point, but as soon as the iretd is executed it throws a double fault.
The fault handler shows that the values are 'shifted' by one, meaning that each value ends up in the register after the one it was meant for.

EIP:    0x05        ; should be 0x80002000
CS:     0x80002000  ; should be 0x18 | 3 = 0x1B
EFLAGS: 0x1B        ; should be 0x3202
ESP:    0x3202      ; should be 0x80001FF4
SS:     0x80001FF4  ; should be 0x20 | 3 = 0x23

As I said, when I used that cmp to inspect the stack just before the iretd, the value at the current stack position was the expected 0x80002000. I have absolutely no idea how the stack could get corrupted.

Posted: **Sat Dec 03, 2005 3:43 pm**

Hi,

OZ wrote:

EIP:    0x05        ; should be 0x80002000
CS:     0x80002000  ; should be 0x18 | 3 = 0x1B
EFLAGS: 0x1B        ; should be 0x3202
ESP:    0x3202      ; should be 0x80001FF4
SS:     0x80001FF4  ; should be 0x20 | 3 = 0x23

This looks a lot like the CPU pushed an error code on the stack. Read as a selector error code, 0x00000005 (EXT = 1, TI = 1, index 0) would be a problem with LDT entry 0x0000 triggered by an external event (e.g. a GPF raised while delivering an IRQ). Read as a page fault error code, it would be a page-level protection violation on a read while the CPU was in user mode.

I'd assume that a page fault is more likely, and that your task switch did work and also that the CPU left a "gift" for you in CR2 (but your page fault handler crashed too).
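To make the two readings of the value 5 concrete, here is a small sketch that decodes an error code both ways. The struct and function names are made up for illustration; the bit positions are the architectural ones (EXT/IDT/TI/index for selector error codes, P/W/U for page faults).

```c
#include <stdbool.h>
#include <stdint.h>

/* Error code format used by #TS, #NP, #SS and #GP. */
struct selector_error {
    bool     external;  /* bit 0 (EXT): raised while delivering an external event */
    bool     idt;       /* bit 1 (IDT): index refers to an IDT gate               */
    bool     ldt;       /* bit 2 (TI):  index refers to the LDT (when IDT = 0)    */
    uint16_t index;     /* bits 3..15:  descriptor table index                    */
};

/* Error code format used by #PF (low three bits only). */
struct page_fault_error {
    bool protection;    /* bit 0 (P):   1 = protection violation, 0 = not present */
    bool write;         /* bit 1 (W/R): 1 = write access, 0 = read access         */
    bool user;          /* bit 2 (U/S): 1 = CPU was in user mode                  */
};

static struct selector_error decode_selector_error(uint32_t code)
{
    struct selector_error e;
    e.external = (code & 0x1u) != 0;
    e.idt      = (code & 0x2u) != 0;
    e.ldt      = (code & 0x4u) != 0;
    e.index    = (uint16_t)((code >> 3) & 0x1FFFu);
    return e;
}

static struct page_fault_error decode_page_fault_error(uint32_t code)
{
    struct page_fault_error e;
    e.protection = (code & 0x1u) != 0;
    e.write      = (code & 0x2u) != 0;
    e.user       = (code & 0x4u) != 0;
    return e;
}
```

For a code of 5, the first decoder gives external = 1, ldt = 1, index = 0, and the second gives a protection violation on a read from user mode, which are the two interpretations described above. Which one applies depends on the exception vector, so the handler has to know which exception it is actually serving.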

Just a guess....

Cheers,

Brendan

Posted: **Sun Dec 04, 2005 11:12 am**

Hmm, I don't know what I should say now ...
On the one hand I feel like a total idiot; on the other I could say I got quite far without proper exception handling ... ::)
Well, while debugging I looked at the assembly stubs of the ISRs (I can't tell if I ever looked there before) and had to realize that, thanks to lazy copy 'n paste, one can end up with a kernel that only knows double faults ...
Actually, until now I never got suspicious that I only ever got double faults. Well, division by zero worked as well :-[

OK, back to the problem: the double fault has now magically mutated into a breakpoint exception, and I still don't know what this means ...
Thanks for your suggestions, Brendan, but I also discovered that I haven't used the supervisor bit so far. I thought that leaving the kernel's page directory in place could be the problem, but without protection that shouldn't be an issue.
I also haven't got an LDT.

PS: Bochs tells me that my LDT is invalid: [CPU ] load_seg_reg: LDT invalid. Isn't the LDT optional, or did I get that wrong?
I'm still searching for information on that ... Google hasn't given me much yet.

Posted: **Sun Dec 04, 2005 4:06 pm**

The LDT _is_ optional. You probably popped EFLAGS or whatnot off the stack improperly, setting LDT-related flags. That's the most common cause.

Posted: **Mon Dec 05, 2005 8:29 am**

If I understood the little the Intel manual says about the breakpoint exception correctly, it is one of those that can be generated by software but should not be triggered by hardware ?!?
Any suggestions on where I should look for the fault?
The manual also says that this is just a trap, meaning I could continue execution afterwards.

Posted: **Sat Dec 31, 2005 8:10 am**

This bug I'm encountering is getting even more mysterious ...
What I did so far:
I switched to ELF format and added debug symbols.
I ran it up and down through gdb, and it became clear that the switch itself works fine. But when execution jumps to the ring 3 code, it throws a breakpoint exception. Another point: although this function does nothing more than for(;;);, putting an __asm__("nop"); in front lets it switch back and forth to the task handler once or twice before it crashes with the breakpoint exception. The breakpoint exception itself is also strange: when execution reaches the ISR stub, the stack contains 6 values instead of the normal 5 (eip, cs, eflags, esp, ss). The last pushed value is always a 5, and I don't get where it comes from, as everything that happens in between is done by the CPU automatically and has nothing to do with my code afaik.
To sum it up, I have absolutely no idea why I get this breakpoint exception, nor do I know where this sixth value on the stack comes from :'(
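As a side note on the 5-versus-6 values: whether the CPU pushes an extra error code depends only on the exception vector. A minimal sketch of that lookup follows; the function name is made up, and it covers only the classic IA-32 vectors (later CPUs added a few more exceptions with error codes, which are left out here).

```c
#include <stdbool.h>

/* Returns true if the CPU pushes an error code for this exception vector.
 * Classic IA-32 set only: #DF, #TS, #NP, #SS, #GP, #PF and #AC. */
static bool exception_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8:   /* #DF double fault (error code is always 0) */
    case 10:  /* #TS invalid TSS                            */
    case 11:  /* #NP segment not present                    */
    case 12:  /* #SS stack-segment fault                    */
    case 13:  /* #GP general protection fault               */
    case 14:  /* #PF page fault                             */
    case 17:  /* #AC alignment check                        */
        return true;
    default:  /* e.g. 0 #DE, 1 #DB, 3 #BP, 6 #UD            */
        return false;
    }
}
```

A common use is to have the ISR stubs for the other vectors push a dummy 0, so that every handler sees the same frame layout. If a stub reports the wrong vector number, a frame with a real error code on it will look like it has one value too many.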

Posted: **Sat Dec 31, 2005 1:17 pm**

@OZ

stack[0]=0x20 | 3; //gs
stack[1]=0x20 | 3; //fs
stack[2]=0x20 | 3; //es
stack[3]=0x20 | 3; //ds
stack[4]=0; //edi
stack[5]=0; //esi
stack[6]=0; //ebp
stack[7]=0; //esp
stack[8]=0; //ebx
stack[9]=0; //edx
stack[10]=0; //ecx
stack[11]=0; //eax
stack[12]=(uint)0x80002000; //entry point (eip)
stack[13]=0x18 | 0x03; // user code seg
stack[14]=0x3202; // eflags
stack[15]=0x80001FF4; // ring 3 esp
stack[16]=0x20 | 0x03; // ss

I guess you can't do "stack[1]=0x20 | 3; ". You are ORing the selectors....? I guess this won't make a ring 3 descriptor; instead you must have a ring 3 descriptor in the GDT.

Posted: **Sat Dec 31, 2005 2:05 pm**

He is setting the RPL to 3. The DPL of the descriptor in the GDT must match as well.

Posted: **Sun Jan 01, 2006 7:43 am**

Hmm, both should match already. I set the descriptors like this:

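kgdt_set_gate(3, 0, 0xFFFFFFFF, PRESENT + USER + S + CODE + A, G + D);   // user code
kgdt_set_gate(4, 0, 0xFFFFFFFF, PRESENT + USER + S + DATA + A, G + D);   // user data

/****  excerpt of globals header file  *****/
#define    PRESENT    128   // 0x80: segment present
#define    USER       96    // 0x60: DPL 3
#define    KERNEL     0     // DPL 0
#define    S          16    // 0x10: code/data (non-system) descriptor
#define    CODE       10    // 0x0A: executable, readable
#define    DATA       2     // 0x02: data, writable
#define    A          1     // 0x01: accessed
#define    G          128   // 0x80: 4 KiB granularity
#define    D          64    // 0x40: 32-bit default operand size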
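As a quick sanity check of the RPL/DPL question, the access bytes can be computed by hand. The sketch below uses the same bit values as the header excerpt above; the `ACC_` prefixes and helper names are made up for illustration.

```c
#include <stdint.h>

/* Access-byte bits, same values as PRESENT, USER, S, CODE, DATA and A. */
enum {
    ACC_PRESENT  = 0x80,
    ACC_DPL_USER = 0x60,
    ACC_S        = 0x10,
    ACC_CODE     = 0x0A,
    ACC_DATA     = 0x02,
    ACC_ACCESSED = 0x01
};

static uint8_t user_code_access(void)
{
    return (uint8_t)(ACC_PRESENT | ACC_DPL_USER | ACC_S | ACC_CODE | ACC_ACCESSED);
}

static uint8_t user_data_access(void)
{
    return (uint8_t)(ACC_PRESENT | ACC_DPL_USER | ACC_S | ACC_DATA | ACC_ACCESSED);
}

/* DPL lives in bits 5..6 of the access byte. */
static unsigned access_dpl(uint8_t access)
{
    return (access >> 5) & 3u;
}

/* RPL lives in bits 0..1 of a selector, the table index in bits 3..15. */
static unsigned selector_rpl(uint16_t selector)
{
    return selector & 3u;
}

static unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}
```

With these values the user code access byte comes out as 0xFB and the user data one as 0xF3, both with DPL 3, and the selectors 0x1B and 0x23 have RPL 3 and point at GDT entries 3 and 4. So on paper the descriptors and the selectors in the stack frame do agree.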

I reread some parts of the Intel manual, and the breakpoint exception is not one of those that push an error code themselves. So my stack frame is corrupt when the ISR handler gets called, but why ...

Posted: **Thu Jan 05, 2006 8:13 am**

Sorry for bumping my thread again, just a few last questions ...
Since I don't know where to look for the cause of my bug, I'm going over everything else again. Am I right that adding protection to existing stack-based multitasking requires the following:

add ring 3 code and data segments
add a tss and fill in SS0
use ltr instruction to load tss selector into task register
adjust existing stack frame for kernel mode by adding 'ESP?' and 'SS?'
adjust existing task handler stub to patch ESP0 of tss each switch
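The TSS-related steps in the list above could look roughly like this in C. It is only a sketch: `struct tss32` and `tss_set_kernel_stack` are names made up for illustration, and the kernel data selector passed as `ss0` is whatever the kernel's own GDT uses (not shown in this thread). The `ltr` step itself needs inline assembly and is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS layout. For stack-based switching only esp0/ss0 (and
 * iomap_base) matter; the rest can stay zero. */
struct tss32 {
    uint32_t prev_task_link;
    uint32_t esp0, ss0;          /* stack loaded on a ring 3 -> ring 0 entry */
    uint32_t esp1, ss1;
    uint32_t esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldt_selector;
    uint16_t trap;
    uint16_t iomap_base;
};

/* esp0 sits at offset 4, which is what "mov [ktss + 4], esp" patches. */
_Static_assert(offsetof(struct tss32, esp0) == 4, "esp0 must be at offset 4");
_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");

/* Set the kernel stack the CPU switches to when ring 3 code is interrupted. */
static void tss_set_kernel_stack(struct tss32 *tss,
                                 uint16_t kernel_data_selector,
                                 uint32_t kernel_stack_top)
{
    tss->ss0  = kernel_data_selector;
    tss->esp0 = kernel_stack_top;
    /* An I/O map base at or beyond the TSS limit means "no I/O bitmap". */
    tss->iomap_base = (uint16_t)sizeof(struct tss32);
}
```

The task handler would call something like `tss_set_kernel_stack(&ktss, kernel_ds, top_of_this_tasks_kernel_stack)` on every switch. Since esp0 is simply the value the CPU loads into ESP when an interrupt arrives from ring 3, it is usually set to the top of the task's kernel stack.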

Is that everything, or have I forgotten anything essential?
Is there any way a breakpoint exception can be caused apart from a user-issued int 3? Aren't those normally always user generated?

In another recent thread there was some output of the Bochs debugger, including a command to show the contents of the stack. Is there something similar in gdb as well, or should I try the Bochs debugger instead?

Could the version of gcc be an issue? Has nobody else ever encountered a breakpoint exception where there should be none?

Posted: **Thu Jan 05, 2006 11:21 am**

Before you continue further with that, can you verify that you mean exception 3 (#BP) by breakpoint, and not exception 1 (#DB)?

Posted: **Thu Jan 05, 2006 11:49 am**

Thanks a lot. I guess no longer knowing all of one's own code is deadly ... ::)
Please delete the thread, it's somewhat embarrassing, but the fault 'mutated again magically' into a page fault ... :-[
So there really is an error code on the stack, and the frame isn't corrupt after all.

BTW, you should add a new section to the FAQ: 'OS deving for dummies'. First tip from me, the master of all dummies: never ever use copy 'n paste :'(

MT: strange stack behaviour