Hi,
I'm seeing some strange behaviour from a very simple software task switcher I'm trying to implement. Any ideas would be greatly appreciated.
My switch ISR is int 18. I'm using bochs to do my testing.
In a nutshell: at the end of the task switch ISR (when starting a new task), the new task's code seems to get executed, but it immediately (??) invokes interrupt #7 ("no coprocessor", I think). Here's the evidence:
Task switch ISR:
000100a4 <_TrapISR>:
100a4: 50 push %eax
100a5: 53 push %ebx
100a6: 51 push %ecx
100a7: 52 push %edx
100a8: 56 push %esi
100a9: 57 push %edi
100aa: 55 push %ebp
100ab: a1 ec 07 01 00 mov 0x107ec,%eax
100b0: 89 20 mov %esp,(%eax)
100b2: a1 e4 07 01 00 mov 0x107e4,%eax
100b7: 8b 20 mov (%eax),%esp
100b9: 5d pop %ebp
100ba: 5f pop %edi
100bb: 5e pop %esi
100bc: 5a pop %edx
100bd: 59 pop %ecx
100be: 5b pop %ebx
100bf: 58 pop %eax
100c0: cf iret
The following ISR gets invoked after _TrapISR completes.
000100c5 <_XXXISR>:
100c5: 58 pop %eax
100c6: 5b pop %ebx
100c7: 59 pop %ecx
100c8: 5a pop %edx
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I added these pops to work out how I get here.
100c9: eb fe jmp 100c9 <_XXXISR+0x4>
...
The output from Bochs
EAX has return address,
EBX has CS
ECX has flags
000100c9-p[XGUI ] >>PANIC<< POWER button turned off.
000100c9-i[SYS ] Last time is 1135900224
000100c9-i[XGUI ] Exit.
000100c9-i[CPU0 ] protected mode
000100c9-i[CPU0 ] CS.d_b = 32 bit
000100c9-i[CPU0 ] SS.d_b = 32 bit
000100c9-i[CPU0 ] | EAX=000100e0 EBX=00000010 ECX=00007202 EDX=00000000
000100c9-i[CPU0 ] | ESP=000106cc EBP=33333333 ESI=11111111 EDI=22222222
000100c9-i[CPU0 ] | IOPL=3 NV UP DI PL NZ NA PO NC
000100c9-i[CPU0 ] | SEG selector base limit G D
000100c9-i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
000100c9-i[CPU0 ] | CS:0010( 0002| 0| 0) 00000000 000fffff 1 1
000100c9-i[CPU0 ] | DS:0018( 0003| 0| 0) 00000000 000fffff 1 1
000100c9-i[CPU0 ] | SS:0020( 0004| 0| 0) 00000000 000fffff 1 1
000100c9-i[CPU0 ] | ES:0020( 0004| 0| 0) 00000000 000fffff 1 1
000100c9-i[CPU0 ] | FS:0020( 0004| 0| 0) 00000000 000fffff 1 1
000100c9-i[CPU0 ] | GS:0020( 0004| 0| 0) 00000000 000fffff 1 1
000100c9-i[CPU0 ] | EIP=000100c9 (000100c9)
000100c9-i[CPU0 ] | CR0=0x00000011 CR1=0 CR2=0x00000000
000100c9-i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
000100c9-i[ ] restoring default signal behavior
The return address (in EAX) has the following code:
000100e0 <idle_main>:
100e0: eb fe jmp 100e0 <idle_main>
100e2: 89 f6 mov %esi,%esi
What could possibly have caused that exception?
I think I have set up the IDT properly, but in case it helps, here it is:
#define ADDRESS_HI(a) a##hi
#define ADDRESS_LO(a) a##lo
#define ISR_DESC(offset) \
.word ADDRESS_LO(offset) ; \
.word 16; /* XXX bootldr sets*/ \
.byte 0; \
.byte 0x8e; \
.word ADDRESS_HI(offset)
idt: .word (idt_end-idt)
.long (idt)
.word 0
#define DEFINE_ISR(x) ISR_DESC(x)
#include "isr.h"
idt_end:
// isr.h
DEFINE_ISR(_UnhandledISR) // 0 Division by 0
DEFINE_ISR(_UnhandledISR) // 1 Single step
DEFINE_ISR(_UnhandledISR) // 2 NMI
DEFINE_ISR(_UnhandledISR) // 3 Breakpoint
DEFINE_ISR(_UnhandledISR) // 4 Overflow
DEFINE_ISR(_UnhandledISR) // 5 Bound range
DEFINE_ISR(_UnhandledISR) // 6 Invalid Opcode
DEFINE_ISR(_XXXISR) // 7 No coprocessor
DEFINE_ISR(_UnhandledISR) // 8 Double fault
DEFINE_ISR(_UnhandledISR) // 9 Co-pro seg overrun
DEFINE_ISR(_UnhandledISR) // 10 Invalid TSS
DEFINE_ISR(_UnhandledISR) // 11 Seg not present
DEFINE_ISR(_UnhandledISR) // 12 Stack exception
DEFINE_ISR(_GpfISR) // 13 Gen Protection Fault
DEFINE_ISR(_UnhandledISR) // 14 Page flt
//DEFINE_ISR(_UnhandledISR) // 15 missed on purpose
DEFINE_ISR(_UnhandledISR) // 16 Co-proc error
DEFINE_ISR(_UnhandledISR) // 17 Alignment check
DEFINE_ISR(_TrapISR) // 18
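For readers who find the ISR_DESC macro terse, here is a rough C view of the same 8-byte interrupt gate layout. The struct and the `make_gate` helper are hypothetical names used only for illustration; the field order, the selector value 16 and the 0x8e type byte are taken from the macro above.

```c
#include <stdint.h>

/* One 8-byte i386 interrupt gate, in the same field order as ISR_DESC. */
typedef struct {
    uint16_t offset_lo;  /* handler address bits 0..15 */
    uint16_t selector;   /* code segment selector */
    uint8_t  zero;       /* reserved, must be 0 */
    uint8_t  type_attr;  /* 0x8e = present, DPL 0, 32-bit interrupt gate */
    uint16_t offset_hi;  /* handler address bits 16..31 */
} IdtGate;

_Static_assert(sizeof(IdtGate) == 8, "an IDT gate must be exactly 8 bytes");

/* Hypothetical helper: build a gate for a handler at a 32-bit address. */
IdtGate make_gate(uint32_t handler, uint16_t selector, uint8_t type_attr)
{
    IdtGate g;
    g.offset_lo = (uint16_t)(handler & 0xffffu);
    g.selector  = selector;
    g.zero      = 0;
    g.type_attr = type_attr;
    g.offset_hi = (uint16_t)(handler >> 16);
    return g;
}
```

For example, `make_gate(0x000100a4, 16, 0x8e)` would describe a gate pointing at the _TrapISR address shown in the disassembly.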
Once again any help will be greatly appreciated.
no coprocessor exception (i386) invoked dunno why
Re:no coprocessor exception (i386) invoked dunno why
Hi,
dushara wrote: My switch ISR is int 18. I'm using bochs to do my testing.
Don't use int 18 for your ISR (your task switch service?) - the first 32 interrupts are reserved by Intel for exceptions, and int 18 is used for the machine check exception.
The "device not available exception" (int 7) can only be generated by floating point instructions, which you don't seem to use. This would imply that you're executing code where you shouldn't (e.g. returning to the wrong address), or something else is generating this interrupt (a software interrupt or an IRQ).
If the CPU is running the "jmp 100e0 <idle_main>" infinite loop with interrupts enabled when the int 7 occurs, then my best guess would be that the problem is caused by an IRQ, and that you've set up the PIC chips in a very unusual way (e.g. with IRQ 0 = int 0 and IRQ 7 = int 7, or perhaps IRQ 8 = int 0 and IRQ 15 = int 7). This doesn't sound too likely to me though.
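To make the IRQ-to-interrupt mapping concrete, here is a minimal sketch of how a cascaded pair of 8259 PICs turns an IRQ line into a vector number. The function and constant names are made up for illustration, and it assumes each base is a multiple of 8. The BIOS defaults (master base 0x08, slave base 0x70) put IRQ 0-7 on int 8-15, which overlaps the CPU exception vectors in protected mode.

```c
#include <assert.h>
#include <stdint.h>

/* Default vector bases programmed by the PC BIOS. */
enum { BIOS_MASTER_BASE = 0x08, BIOS_SLAVE_BASE = 0x70 };

/* Vector delivered for IRQ line 0..15, given the two PIC vector bases. */
uint8_t irq_to_vector(unsigned irq, uint8_t master_base, uint8_t slave_base)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? master_base + irq
                             : slave_base + (irq - 8));
}
```

With the BIOS bases, `irq_to_vector(0, BIOS_MASTER_BASE, BIOS_SLAVE_BASE)` gives 8 (the double fault vector) and IRQ 7 gives 15. A base of 0 would be needed for IRQ 7 to arrive as int 7.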
Another possibility is that you're overwriting your code with something else, so that while "objdump" or a disassembler tells you that there's a "jmp 100e0 <idle_main>" instruction at "0x100E0", it may have been accidentally overwritten with something entirely different (e.g. some data that looks like an FPU instruction).
My first test would be to disable interrupts for that "jmp 100e0 <idle_main>" to see if the task switch does work once (and if things like ESP are correct afterwards). If the machine locks up correctly and everything looks good, then I'd look at IRQ handling. Otherwise I'd use Bochs to single step from the middle of the task switch code to see what is happening.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re:no coprocessor exception (i386) invoked dunno why
Hi, thanks for your reply.
I have figured out a bit more on what happens (though I don't know why).
It seems this no-coprocessor interrupt is generated when I enable interrupts for the first time.
I had my interrupts disabled from the very beginning and the return from the TrapISR enables interrupts via IRET (for the first time) which triggers the no co-processor int. TrapISR was called directly with int 18 (which I've since changed to 32).
Now at main(), when I enable interrupts for the first time, this no co-processor interrupt is invoked.
I don't know why this happens. I assumed maybe the coprocessor needs initialising and that I'm running in emulation mode or something like that.
For the moment at least, I'm just returning from the ISR and I'm not locked in some sort of infinite loop.
Once again thanks for your help.
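As a small illustration of why the IRET is the point where interrupts first come on: IRET pops EIP, CS and EFLAGS, and bit 9 of the popped EFLAGS (IF) decides whether interrupts are enabled afterwards. The sketch below uses hypothetical names; the sample values are the ones from the Bochs dump in the first post (return address 0x100e0, CS 0x10, flags 0x7202).

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF 0x00000200u  /* bit 9: interrupt enable flag */

/* The three dwords IRET pops on a same-privilege return, lowest address first. */
typedef struct {
    uint32_t eip;
    uint32_t cs;      /* 16-bit selector stored in a 32-bit stack slot */
    uint32_t eflags;
} IretFrame;

/* Hypothetical helper: assemble a frame from its three values. */
IretFrame make_frame(uint32_t eip, uint32_t cs, uint32_t eflags)
{
    assert((cs & 0xffff0000u) == 0);  /* a selector must fit in 16 bits */
    IretFrame f = { eip, cs, eflags };
    return f;
}

/* Nonzero if returning through this frame leaves interrupts enabled. */
int iret_enables_interrupts(IretFrame frame)
{
    return (frame.eflags & EFLAGS_IF) != 0;
}
```

For the dumped frame, `iret_enables_interrupts(make_frame(0x000100e0, 0x10, 0x7202))` is nonzero, so any IRQ that was already pending can be delivered as soon as the IRET completes.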
Re:no coprocessor exception (i386) invoked dunno why
Hi,
dushara wrote: Now at main(), When I enable interrupts for the first time, this no co-processor interrupt is invoked.
I don't know why this happens, I assumed maybe the coprocessor needs initialising and that I'm running in emulation mode or something like that.
The coprocessor doesn't need initializing unless you try to use it, and even then you'd get a "device not available" exception immediately, regardless of whether interrupts are enabled or not (and only if the FPU isn't present, which is unlikely).
Floating point exceptions have to be unmasked in the FPU, and will (depending on the setting of the NE flag in CR0) either generate IRQ 13 (which is mapped to int 0x75 with default/BIOS PIC settings) or an exception 16 (where the PIC chips and the interrupt enable/disable flag are ignored).
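The two reporting paths can be summed up in a few lines of C. This is only a sketch with an invented function name: it assumes the BIOS PIC mapping (IRQ 13 on int 0x75) and ignores whether the exception is actually unmasked in the FPU. NE is bit 5 of CR0.

```c
#include <stdint.h>

#define CR0_NE 0x00000020u  /* bit 5: numeric error reporting mode */

/* Vector used to report an unmasked x87 error, assuming BIOS PIC mapping. */
uint8_t fpu_error_vector(uint32_t cr0)
{
    /* NE=1: native exception 16. NE=0: external IRQ 13, i.e. int 0x75. */
    return (cr0 & CR0_NE) ? 16 : 0x75;
}
```

With the CR0 value of 0x00000011 from the Bochs dump, NE is clear, so an FPU error would arrive as int 0x75. Neither path ends at int 7.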
IMHO it is completely impossible for this "int 7" to be caused by the FPU regardless of what your code does or doesn't do.
It looks to me like after you disable interrupts an IRQ occurs and this IRQ is still present when interrupts are enabled again. This isn't too unusual, but IRQs aren't normally mapped to interrupt 7.
I'm wondering what you do with the PIC chips - do you mask all IRQs when you disable interrupts, and do you reprogram the PICs so that IRQs are mapped to different interrupts?
Cheers,
Brendan
Re:no coprocessor exception (i386) invoked dunno why
i agree, it sounds like mismapped hard-ints
(you can also get a co-processor error if you disabled the hardware FPU -- there is a bit in one of the CRs (don't remember off-hand which one))
however that would still only trigger if you used FPU instructions, so i would say that your PIC has been mapped improperly
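For reference, the bits in question live in CR0: EM (bit 2) selects emulation and TS (bit 3) is the task-switched flag, and an x87 instruction raises int 7 when either is set. The check below is a small sketch with a made-up function name, using the CR0 value from the Bochs dump in the first post as an example.

```c
#include <stdint.h>

#define CR0_EM 0x00000004u  /* bit 2: emulate coprocessor */
#define CR0_TS 0x00000008u  /* bit 3: task switched */

/* Nonzero if executing an x87 instruction would raise int 7 (#NM). */
int x87_insn_raises_nm(uint32_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) != 0;
}
```

For CR0 = 0x00000011 (PE and ET set) the result is 0, so even an FPU instruction would not produce int 7 in that state.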
Re:no coprocessor exception (i386) invoked dunno why
I hadn't touched the PIC. All I do is this:
The bootloader disables ints, enters protected mode and loads the kernel to 0x10000. The kernel initialises the IDT with the table in the first message I posted. Then main calls a function that re-enables interrupts. At this point the no co-processor int is triggered.
I'm wondering whether the fact that I haven't remapped IRQs has something to do with it.
Looks like I should get the bochs source and compile it with debug enabled so I can step through from the beginning....
Once again, thanks everyone for all the info.
Re:no coprocessor exception (i386) invoked dunno why
dushara wrote: I hadn't touched the PIC. All I do is this:
The bootloader disables ints, enters protected mode and loads the kernel to 0x10000. The kernel initialises the IDT with the table in the first message I posted. Then main calls a function that re-enables interrupts. At this point the no co-processor int is triggered.
I'm wondering whether the fact that I haven't remapped IRQs has something to do with it.
Looks like I should get the bochs source and compile it with debug enabled so I can step through from the beginning....
Once again, thanks everyone for all the info.
You should _not_ be enabling interrupts until you've either remapped the PIC or masked out all IRQs. I'd say try this first thing. It's only a couple of lines of code, no real big deal... Why not try that now?
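A rough sketch of what that remap-and-mask sequence usually looks like on a standard PC with two 8259 PICs. To keep it testable off-target, it only describes the port writes instead of performing them; `PortWrite` and `pic_remap_step` are hypothetical names, and the kernel is assumed to have its own port-output routine to actually issue each write in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    PIC1_CMD = 0x20, PIC1_DATA = 0x21,  /* master 8259 */
    PIC2_CMD = 0xA0, PIC2_DATA = 0xA1,  /* slave 8259 */
    PIC_INIT_WRITES = 10
};

typedef struct { uint16_t port; uint8_t value; } PortWrite;

/* Return write number `step` (0..9) of the remap-and-mask sequence. */
PortWrite pic_remap_step(uint8_t master_base, uint8_t slave_base, size_t step)
{
    /* The 8259 ignores the low 3 bits of the base, so require multiples of 8. */
    assert((master_base & 7) == 0 && (slave_base & 7) == 0);
    assert(step < PIC_INIT_WRITES);

    const PortWrite seq[PIC_INIT_WRITES] = {
        { PIC1_CMD,  0x11 },         /* ICW1: begin init, cascade, ICW4 follows */
        { PIC2_CMD,  0x11 },
        { PIC1_DATA, master_base },  /* ICW2: vector base for IRQ 0-7 */
        { PIC2_DATA, slave_base },   /* ICW2: vector base for IRQ 8-15 */
        { PIC1_DATA, 0x04 },         /* ICW3: slave is attached to IRQ 2 */
        { PIC2_DATA, 0x02 },         /* ICW3: slave cascade identity 2 */
        { PIC1_DATA, 0x01 },         /* ICW4: 8086 mode */
        { PIC2_DATA, 0x01 },
        { PIC1_DATA, 0xFF },         /* OCW1: mask all master IRQs */
        { PIC2_DATA, 0xFF },         /* OCW1: mask all slave IRQs */
    };
    return seq[step];
}
```

A common choice is `master_base = 0x20` and `slave_base = 0x28`, which moves IRQ 0-15 to int 32-47, clear of the CPU exception vectors. Individual IRQs can then be unmasked once handlers for them are installed.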
Re:no coprocessor exception (i386) invoked dunno why
Yes! Remapping IRQs seems to have fixed it. Stepping through Bochs didn't invoke int 7 when interrupts were enabled.
Thanks.