
Switching tasks and global protection fault

Posted: Mon Jul 06, 2020 3:49 pm
by mrjbom
Hi.
I'm trying to switch tasks using my scheduler. In pit_handler() I call task_switch(), which is where the problem occurs.
A "#GP" exception occurs during the task_switch() call (see serial.log).
As you can see in the log, the error code is 50 (binary 110010).

I looked at the Intel documentation (Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol. 3A, p. 6-42) to understand what it means.
It says: "Indicates that the processor detected one of a class of protection violations called “general-protection violations.” The conditions that cause this exception to be generated comprise all the protection violations that do not cause other exceptions to be generated (such as, invalid-TSS, segment-not-present, stack-fault, or page-fault exceptions)."

The error code is described as follows:
The processor pushes an error code onto the exception handler's stack. If the fault condition was detected while loading a segment descriptor, the error code contains a segment selector to or IDT vector number for the descriptor; otherwise, the error code is 0. The source of the selector in an error code may be any of the following:
• An operand of the instruction.
• A selector from a gate which is the operand of the instruction.
• A selector from a TSS involved in a task switch.
• IDT vector number.


I don't understand what the error code refers to...
But I suspect that I have problems with the TSS.

I know the TSS is somehow related to the GDT, and I may have loaded it incorrectly; you can see it here

How can I fix the thread structure and the task_switch() code so that the TSS is configured correctly?

Re: Switching tasks and global protection fault

Posted: Mon Jul 06, 2020 4:15 pm
by Octocontrabass
According to the error code, your IDT entry for interrupt 6, #UD, is invalid. Did you install a handler for #UD?

Re: Switching tasks and global protection fault

Posted: Mon Jul 06, 2020 5:14 pm
by mrjbom
Octocontrabass wrote:According to the error code, your IDT entry for interrupt 6, #UD, is invalid. Did you install a handler for #UD?
I added a handler for #UD; now only that exception occurs.

The documentation says that the exception saves CS and EIP, which point to the instruction that caused the exception. Where are they stored? I don't know how to get at this saved information or how to work out which instruction is the problem... Please tell me how to do this and I will provide the information.
serial.log

Re: Switching tasks and global protection fault

Posted: Mon Jul 06, 2020 6:25 pm
by Octocontrabass
CS and EIP are saved on the stack, the same way as every other exception and interrupt.

It's a good idea to define a struct for the stack frame, then pass a pointer to the top of the stack to your exception handler. That way, you can access anything you need in order to troubleshoot an exception when it happens.

Re: Switching tasks and global protection fault

Posted: Tue Jul 07, 2020 4:02 am
by mrjbom
Octocontrabass wrote:CS and EIP are saved on the stack, the same way as every other exception and interrupt.

It's a good idea to define a struct for the stack frame, then pass a pointer to the top of the stack to your exception handler. That way, you can access anything you need in order to troubleshoot an exception when it happens.
I wrote a handler for #UD, but it still doesn't give me any useful information...
Here is its code, just in case:

Code:

invalid_opcode:
  cli
  ;save all 32bit registers
  pushad
  push dword [esp + 32] ;push eip
  push word [esp + 48] ;push cs
  call invalid_opcode_handler
  pop dword [esp + 48] ;pop cs
  pop word [esp + 32] ;pop eip
  ;return all 32bit registers
  popad
  sti
  iretd

extern void invalid_opcode_handler(uint16_t cs, uint32_t eip);
I tried debugging task_switch(); the problem occurs after "ret", so perhaps control is passed to the wrong place...

Re: Switching tasks and global protection fault

Posted: Tue Jul 07, 2020 5:03 am
by iansjack
mrjbom wrote: I tried debugging task_switch(); the problem occurs after "ret", so perhaps control is passed to the wrong place...
If you've debugged correctly, single-stepping through the code in a debugger, there should be no "perhaps" about the source of the error. It should be immediately obvious if control is returning to the wrong address, and inspection of the stack as you proceed through the switch should make the cause clear.

Re: Switching tasks and global protection fault

Posted: Tue Jul 07, 2020 7:40 am
by mrjbom
iansjack wrote:
mrjbom wrote: I tried debugging task_switch(); the problem occurs after "ret", so perhaps control is passed to the wrong place...
If you've debugged correctly, single-stepping through the code in a debugger, there should be no "perhaps" about the source of the error. It should be immediately obvious if control is returning to the wrong address, and inspection of the stack as you proceed through the switch should make the cause clear.
I found the reason for the error.
The culprit was the "__attribute__((packed))" I used on my structures... Most likely, ESP was just not where I expected, so ESP was being loaded with something else.
Thanks for the answer.

Re: Switching tasks and global protection fault

Posted: Tue Jul 07, 2020 8:56 am
by Octacone
mrjbom wrote:
iansjack wrote:
mrjbom wrote: I tried debugging task_switch(); the problem occurs after "ret", so perhaps control is passed to the wrong place...
If you've debugged correctly, single-stepping through the code in a debugger, there should be no "perhaps" about the source of the error. It should be immediately obvious if control is returning to the wrong address, and inspection of the stack as you proceed through the switch should make the cause clear.
I found the reason for the error.
The culprit was the "__attribute__((packed))" I used on my structures... Most likely, ESP was just not where I expected, so ESP was being loaded with something else.
Thanks for the answer.
I don't think your problem is fixed. I think it is just masked.
I always use __attribute__((packed)) for all of my structures, so GCC doesn't waste space on padding.
Be careful: sometimes you'll spend days hunting a non-existent bug, only to find that your data is invalid because GCC added padding and you didn't compensate for it.

Re: Switching tasks and global protection fault

Posted: Tue Jul 07, 2020 9:11 am
by iansjack
You are correct that packing the structs was causing the error, but do you understand why? You will often find that you have to pack structs, so if you don't understand what the problem is here then you are setting yourself up for future problems.

The real error was not that you packed the structs but that you miscalculated the offsets. You might also want to reconsider the order of the fields in some of your structs so that it doesn't matter whether they are packed or not. And packing applies more widely than just structs. You can save a lot of memory by thinking about the order of structs and variable declarations.

Re: Switching tasks and global protection fault

Posted: Tue Jul 07, 2020 9:25 am
by mrjbom
iansjack wrote:You are correct that packing the structs was causing the error, but do you understand why? You will often find that you have to pack structs, so if you don't understand what the problem is here then you are setting yourself up for future problems.

The real error was not that you packed the structs but that you miscalculated the offsets. You might also want to reconsider the order of the fields in some of your structs so that it doesn't matter whether they are packed or not. And packing applies more widely than just structs. You can save a lot of memory by thinking about the order of structs and variable declarations.
I realized what the error was: task_switch() uses pre-calculated offsets, and because of the attribute the fields were no longer at the offsets I expected.

Re: Switching tasks and global protection fault

Posted: Tue Jul 07, 2020 9:30 am
by mrjbom
Octacone wrote: I don't think your problem is fixed. I think it is just masked.
I always use __attribute__((packed)) for all of my structures, so GCC doesn't waste space on padding.
Be careful: sometimes you'll spend days hunting a non-existent bug, only to find that your data is invalid because GCC added padding and you didn't compensate for it.
I don't think I "masked" the problem: the code works as it should, and the reason for the problem is clear.

Re: Switching tasks and global protection fault

Posted: Tue Jul 07, 2020 3:41 pm
by Octocontrabass
mrjbom wrote:

Code:

  pop dword [esp + 48] ;pop cs
  pop word [esp + 32] ;pop eip
Functions clobber the arguments you push onto the stack. Use "add esp, 8" or "pop eax; pop eax" to clean up the stack after the function call.

Remove CLI and STI from your interrupt handlers. If you want interrupts disabled in your interrupt handlers, set up the IDT to use an interrupt gate.

You need CLD before you call any C function from your interrupt handler.

Re: Switching tasks and global protection fault

Posted: Thu Jul 09, 2020 5:46 am
by mrjbom
Octocontrabass wrote:
mrjbom wrote:

Code:

  pop dword [esp + 48] ;pop cs
  pop word [esp + 32] ;pop eip
Functions clobber the arguments you push onto the stack. Use "add esp, 8" or "pop eax; pop eax" to clean up the stack after the function call.
If I can't push parameters to pass them to the C function, how can I pass parameters?

Re: Switching tasks and global protection fault

Posted: Thu Jul 09, 2020 5:51 am
by iansjack
mrjbom wrote: If I can't push parameters to pass them to the C function...?
That's not what was said. You do pass parameters via the stack (in 32-bit mode), but this means you have to clean up the stack when you exit the function.

Re: Switching tasks and global protection fault

Posted: Thu Jul 09, 2020 5:58 am
by mrjbom
iansjack wrote:
mrjbom wrote: If I can't push parameters to pass them to the C function...?
That's not what was said. You do pass parameters via the stack (in 32-bit mode), but this means you have to clean up the stack when you exit the function.
Oh, yes, you're right. I didn't quite understand what the problem was: I forgot that popping into memory like that doesn't really clean up the stack...
Thanks.