Really strange problem...
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: Really strange problem...
The thing is, Bochs tells all of us the following:
- Your GDT is messed up: the code segment isn't a code segment.
- An interrupt directly causes the triple fault. The mov reg, constant cannot produce an exception, which means that an interrupt fires at that point. It also means no exception handler could be invoked in the process. It does not mean that an exception handler was successfully executed in the past.
- Based on that, your DF handler cannot be executed, and at least one other exception handler couldn't be executed either. The likely candidates are the GPF and PF handlers.
- A page fault happened somewhere, maybe as part of the crash.
That means that either the standard code segment is something other than 0x8 (which means your IDT is bogus) or something's wrong with the descriptor (meaning your GDT is bogus).
Instead of saying that certain things work fine (which Bochs stubbornly refuses to agree with), TEST, DOUBLECHECK, CONFIRM AND KNOW certain things work fine.
I want proof from you, using the Bochs debugger, that at the faulting instruction the IDT is correct and the GDT is correct (and that executing it still crashes the computer), since that is pretty much what you claim.
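To make "the code segment isn't a code segment" concrete, here is a minimal sketch of how the access byte of an 8-byte GDT descriptor can be tested. The helper names are made up for illustration and do not come from the poster's kernel; the bit positions are the standard x86 ones (P = bit 7, S = bit 4, executable = bit 3 of the access byte).

```c
#include <stdbool.h>
#include <stdint.h>

/* The access byte occupies bits 40..47 of the 8-byte descriptor. */
static uint8_t gdt_access_byte(uint64_t descriptor)
{
    return (uint8_t)(descriptor >> 40);
}

/* True if the descriptor is a present code segment:
 * P (bit 7) = 1, S (bit 4) = 1 (code/data rather than system),
 * E (bit 3) = 1 (executable). Hypothetical helper for illustration. */
static bool gdt_is_present_code_segment(uint64_t descriptor)
{
    uint8_t access = gdt_access_byte(descriptor);
    return (access & 0x80) && (access & 0x10) && (access & 0x08);
}
```

Reading the two dwords of GDT entry 1 at the moment of the crash and passing them through a check like this would show whether selector 0x08 still refers to a usable code segment.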
Re: Really strange problem...
Combuster wrote: The thing is, Bochs tells all of us the following:
- Your GDT is messed up: the code segment isn't a code segment.
- An interrupt directly causes the triple fault. The mov reg, constant cannot produce an exception, which means that an interrupt fires at that point. It also means no exception handler could be invoked in the process. It does not mean that an exception handler was successfully executed in the past.
- Based on that, your DF handler cannot be executed, and at least one other exception handler couldn't be executed either. The likely candidates are the GPF and PF handlers.
- A page fault happened somewhere, maybe as part of the crash.
That means that either the standard code segment is something other than 0x8 (which means your IDT is bogus) or something's wrong with the descriptor (meaning your GDT is bogus).
Instead of saying that certain things work fine (which Bochs stubbornly refuses to agree with), TEST, DOUBLECHECK, CONFIRM AND KNOW certain things work fine.
I want proof from you, using the Bochs debugger, that at the faulting instruction the IDT is correct and the GDT is correct (and that executing it still crashes the computer), since that is pretty much what you claim.
Code: Select all
Initializing VFS - Virtual File System
This is IRQ0
Initializing DevFS - Device File System
This is IRQ0
This is IRQ6
However, maybe it is the page fault handler that is doing something wrong. Can a page fault cause these problems?
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling
http://sourceforge.net/projects/jeko - Jeko Operating System
Re: Really strange problem...
I once had a strange problem with it triple faulting rather than pagefaulting. Turns out that the page fault handler caused a page fault, and then so did the double fault handler.
Re: Really strange problem...
Can we see your interrupt code?
OS-LUX V0.0
Working on...
Memory management: the Pool
Re: Really strange problem...
cr2 wrote: Can we see your interrupt code?
My interrupt code isn't wrong, because I used it many times, and it worked fine. I think the problem is in my page fault handler.
But here is the code:
Code: Select all
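/* Stub generator: pushes the IRQ number, then jumps to the common path. */
#define INTERRUPT(name, irq) \
    .global _##name ; \
    .align 16 ; \
    _##name: \
    pushl irq ; \
    jmp COMMON_ISR

.extern default_handler
.align 16
COMMON_ISR:
    pushal                  /* save general-purpose registers */
    pushl %ds               /* save segment registers */
    pushl %es
    pushl %fs
    pushl %gs
    movl %esp, %eax
    pushl %eax              /* argument: pointer to the saved frame */
    call default_handler
    popl %eax               /* the handler may have replaced the frame pointer */
    movl %eax, %esp
    popl %gs
    popl %fs
    popl %es
    popl %ds
    popal
    addl $4, %esp           /* drop the IRQ number pushed by the stub */
    iretl

INTERRUPT(irq_00, $0x00)
... //All the other IRQs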
Code: Select all
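void default_handler(unsigned int stack)
{
    /* &stack is the address of the argument pushed by COMMON_ISR;
       the saved register frame follows it directly on the stack. */
    struct irq_context_t *c = (struct irq_context_t*)&stack;
    int cur = c->irq;
    ...
    if(cur == 0 && scheduler_active)
        scheduler(&stack);
    outportb(0x20, 0x20);       /* EOI to the master PIC */
    if(cur & 8)
        outportb(0xA0, 0x20);   /* IRQ 8-15: the slave PIC needs an EOI too */
    printk("This is IRQ%d\n", cur);
}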
Code: Select all
void add_idt_entry(void (*handler)(void), int indice, unsigned short opz)
{
    idt[indice].offset0_15 = (unsigned int)handler & 0xFFFF;
    idt[indice].offset16_31 = (unsigned int)handler >> 16;
    idt[indice].segmento = 0x08;    /* kernel code selector */
    idt[indice].opzioni = opz;
}

void init_interrupt()
{
    /* Reinitialise both PICs: ICW1, ICW2 (vector base), ICW3, ICW4 */
    outportb( ICU0, ICU_RESET );
    outportb( ICU1, ICU_RESET );
    outportb( ICU0 + 1, 0x20 );     /* master: IRQ 0-7  -> vectors 0x20-0x27 */
    outportb( ICU1 + 1, 0x28 );     /* slave:  IRQ 8-15 -> vectors 0x28-0x2F */
    outportb( ICU0 + 1, 0x04 );
    outportb( ICU1 + 1, 0x02 );
    outportb( ICU0 + 1, 0x01 );
    outportb( ICU1 + 1, 0x01 );
    outportb( ICU0 + 1, 0xFF );     /* mask every IRQ for now */
    outportb( ICU1 + 1, 0xFF );
    irq_mask = 0xFFFF;

    add_idt_entry(_exc_00, 0x0, 0x470);
    ... //All the other exceptions
    add_idt_entry(syscall_handler, 0x80, 0x770);
    add_idt_entry(_irq_00, 0x20, 0x470);
    ... //All the other irqs

    /* Build the 6-byte IDTR operand at idt_reg+2: 16-bit limit, 32-bit base.
       Note: the limit is normally the table size minus one. */
    unsigned int idt_reg[2];
    idt_reg[0] = (DIM_IDT*8) << 16;
    idt_reg[1] = (unsigned int)idt;
    /* "r" keeps the address in a register so that (%0) is a valid operand */
    __asm__ __volatile__ ("lidt (%0)": :"r" ((char*)idt_reg+2));
    printk("");
}
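As a point of comparison for the IDTR setup above, here is a minimal sketch of the same pseudo-descriptor built with a packed struct. The names `struct idtr32` and `make_idtr` are hypothetical, not taken from the posted kernel. The one substantive difference is that the limit is written as the table size in bytes minus one, which is the usual convention for `lidt`.

```c
#include <stdint.h>

/* 6-byte operand of lidt in 32-bit protected mode. */
struct idtr32 {
    uint16_t limit;  /* size of the table in bytes, minus one */
    uint32_t base;   /* linear address of the first gate */
} __attribute__((packed));

/* Hypothetical helper: fill in the operand for a table of 8-byte gates. */
static struct idtr32 make_idtr(uint32_t base, uint32_t entries)
{
    struct idtr32 r;
    r.limit = (uint16_t)(entries * 8u - 1u);
    r.base = base;
    return r;
}

/* Inside a 32-bit kernel this could be used roughly as:
 *   struct idtr32 r = make_idtr((uint32_t)idt, DIM_IDT);
 *   __asm__ __volatile__("lidt %0" : : "m"(r));
 */
```

For a 256-entry table this gives a limit of 2047. An off-by-one limit is unlikely to be the cause of the crash discussed here, but it is cheap to rule out.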
Re: Really strange problem...
Could you show us all the data structures (namely: irq_context, idt,...)?
- Combuster
Re: Really strange problem...
Quote: My interrupt code isn't wrong, because I used it many times, and it worked fine.
Again, Bochs says otherwise. And mainly, two working cases don't at all imply that all cases work as expected and that there are absolutely no bugs left over. Which goes back to my first point: don't claim something is correct unless you can prove for all possible cases that it is correct. Bochs gives a counterexample, meaning that the original statement is false.
Will you do your homework instead of just giving us the sources to your OS at the rate of one line an hour and expecting us to fix your problem? I still want to see the following:
Quote: I want proof from you, using the Bochs debugger, that at the faulting instruction the IDT is correct and the GDT is correct (and that executing it still crashes the computer), since that is pretty much what you claim.
Re: Really strange problem...
Hi,
Combuster wrote:
Quote: My interrupt code isn't wrong, because I used it many times, and it worked fine.
Again bochs says otherwise.
Not necessarily - Bochs says the IDT and/or GDT is wrong when the exception occurs, but doesn't say the IDT and/or GDT wasn't right some time before this.
Basically what I'm saying is that any simple mistake (like using an uninitialized pointer or a bad array index) could have corrupted the GDT, and everything might work fine before the GDT is corrupted, and everything might work fine for a while after the GDT was corrupted until something causes a segment register load (e.g. an interrupt).
In this case, the worst thing is that the bug could be anywhere - what the CPU is doing when the exception occurs makes no difference.
The only way to find out what's going on is to find out what the GDT contains when the exception occurs - not before. If the GDT has been corrupted then you'd need to figure out what corrupted it, and every piece of code you've written could be causing the problem (except for any CPL=3 code that doesn't have permission to write to the GDT).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
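One way to narrow down when a table gets corrupted, in the spirit of the advice above, is to record a checksum of the GDT (or IDT) right after it is set up and re-check it at chosen points in the kernel. This is only a sketch: `table_checksum` and `table_unchanged` are hypothetical helpers, nobody in the thread proposed this exact approach, and the checksum is a simple rotate-and-XOR rather than anything robust.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate-and-XOR checksum over a byte range (hypothetical helper). */
static uint32_t table_checksum(const void *table, size_t size)
{
    const uint8_t *p = (const uint8_t *)table;
    uint32_t sum = 0;
    for (size_t i = 0; i < size; i++)
        sum = ((sum << 1) | (sum >> 31)) ^ p[i];
    return sum;
}

/* Returns 1 if the table still matches the checksum recorded earlier. */
static int table_unchanged(const void *table, size_t size, uint32_t expected)
{
    return table_checksum(table, size) == expected;
}
```

Calling `table_unchanged` at the start of each interrupt handler or after each initialisation stage would bracket the corruption between two known points. Note that the CPU itself sets the Accessed bit in a descriptor when a segment is loaded, so the reference checksum should be taken after the segment registers have been loaded.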
Re: Really strange problem...
IIRC, you can put a watch on a piece of memory (such as the GDT) in the Bochs debugger. In that case, you could find out exactly what code is corrupting the system tables.
Cheers,
Adam
Re: Really strange problem...
AJ wrote: IIRC, you can put a watch on a piece of memory (such as the GDT) in the Bochs debugger. In that case, you could find out exactly what code is corrupting the system tables.
Cheers,
Adam
Can you give me a link to something about the Bochs debugger? I don't know how to use it.
- Combuster
Re: Really strange problem...
STFW. It's one click from Bochs' homepage.
And my patience ends here as well. You've been warned before about not doing background research. If your next post doesn't show enough improvement, I see little choice but to add another one.
Re: Really strange problem...
cyr1x wrote:Could you show us all the data structures (namely: irq_context, idt,...)?
Code: Select all
struct irq_context_t
{
    unsigned int off;
    unsigned int gs, fs, es, ds;
    unsigned int edi, esi, ebp, esp, ebx, edx, ecx, eax;
    unsigned int irq;
    unsigned int eip;
    unsigned int cs;
    unsigned int eflags;
    unsigned int uesp;
    unsigned int uss;
    unsigned int v_ds, v_es, v_fs, v_gs;
} __attribute__ ((packed));

static unsigned short int irq_mask;

#define SLAVE_IRQ 8
#define MASTER_SLAVE 2
#define ICU0 0x20
#define ICU1 0xA0
#define ICU_RESET 0x11
#define DIM_IDT 256

struct idt_t {
    unsigned short int offset0_15;
    unsigned short int segmento;
    unsigned short int riservato : 5;
    unsigned short int opzioni : 11;
    unsigned short int offset16_31;
};

struct idt_t idt[DIM_IDT];
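The option values 0x470 and 0x770 passed to add_idt_entry earlier in the thread are easier to read once they are placed into the 16-bit flags word of the gate. The sketch below assumes that the compiler allocates bit-fields starting from the least significant bit (as GCC does on x86), so the 11-bit `opzioni` field sits above the 5 reserved bits. The helper names are hypothetical.

```c
#include <stdint.h>

/* Position of the 11-bit option field inside the 16-bit flags word,
 * assuming bit-fields are allocated from the least significant bit. */
#define GATE_OPTIONS_SHIFT 5

/* Flags word as it ends up in memory for a given option value. */
static uint16_t gate_flags_word(uint16_t options)
{
    return (uint16_t)((options & 0x7FFu) << GATE_OPTIONS_SHIFT);
}

static unsigned gate_present(uint16_t flags) { return (flags >> 15) & 1u; }
static unsigned gate_dpl(uint16_t flags)     { return (flags >> 13) & 3u; }
static unsigned gate_type(uint16_t flags)    { return (flags >> 8) & 0xFu; }
```

Under that assumption, 0x470 becomes 0x8E00 (present, DPL 0, type 0xE, i.e. a 32-bit interrupt gate) and 0x770 becomes 0xEE00 (the same gate type with DPL 3), which matches the "32-Bit Interrupt Gate ... DPL=0" lines in the Bochs `info idt` output for the exception and IRQ vectors.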
Code: Select all
...
00017554789e[CPU ] int_trap_gate(): selector null
00017554791e[CPU ] int_trap_gate(): selector null
00017554793e[CPU ] int_trap_gate(): selector null
00017554795e[CPU ] int_trap_gate(): selector null
00017554797e[CPU ] int_trap_gate(): selector null
00017554797e[CPU ] interrupt(): not accessable or not code segment cs=0x0008
00017554797i[CPU ] CPU is in protected mode (active)
00017554797i[CPU ] CS.d_b = 32 bit
00017554797i[CPU ] SS.d_b = 32 bit
00017554797i[CPU ] | EAX=d0000398 EBX=d0000000 ECX=00000086 EDX=00000340
00017554797i[CPU ] | ESP=c000a4ec EBP=c000a604 ESI=00000000 EDI=00000001
00017554797i[CPU ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF cf
00017554797i[CPU ] | SEG selector base limit G D
00017554797i[CPU ] | SEG sltr(index|ti|rpl) base limit G D
00017554797i[CPU ] | CS:0008( 0001| 0| 0) 00000000 000fffff 1 1
00017554797i[CPU ] | DS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017554797i[CPU ] | SS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017554797i[CPU ] | ES:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017554797i[CPU ] | FS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017554797i[CPU ] | GS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017554797i[CPU ] | EIP=00000161 (00000161)
00017554797i[CPU ] | CR0=0xe0010011 CR2=0xd0000398
00017554797i[CPU ] | CR3=0x00003000 CR4=0x00000000
Now I'll also put a watchpoint on the IDT memory to see if something changes it. Do you know how to put watchpoints on a range of physical addresses? For now I only use 'watch read address' and 'watch write address', so one address at a time.
Re: Really strange problem...
Code: Select all
<bochs:256> info gdt
Global Descriptor Table (base=0xc000a4e0, limit=47):
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, laddr=00000000, limit=fffff * 4Kbytes, Execute/Read, 32-bit
GDT[0x02]=Data segment, laddr=00000000, limit=fffff * 4Kbytes, Read/Write, Accessed
GDT[0x03]=Code segment, laddr=00000000, limit=fffff * 4Kbytes, Execute/Read, 32-bit
GDT[0x04]=Data segment, laddr=00000000, limit=fffff * 4Kbytes, Read/Write
GDT[0x05]=32-Bit TSS (Busy) at 0x00000000, length 0x00067
<bochs:257> info idt
Interrupt Descriptor Table (base=0xc000a5c0, limit=2048):
IDT[0x00]=32-Bit Interrupt Gate target=0x0008:0xc00000e0, DPL=0
IDT[0x01]=32-Bit Interrupt Gate target=0x0008:0xc00000f0, DPL=0
IDT[0x02]=32-Bit Interrupt Gate target=0x0008:0xc0000100, DPL=0
IDT[0x03]=32-Bit Interrupt Gate target=0x0008:0xc0000110, DPL=0
IDT[0x04]=32-Bit Interrupt Gate target=0x0008:0xc0000120, DPL=0
IDT[0x05]=32-Bit Interrupt Gate target=0x0008:0xc0000130, DPL=0
IDT[0x06]=32-Bit Interrupt Gate target=0x0008:0xc0000140, DPL=0
IDT[0x07]=32-Bit Interrupt Gate target=0x0008:0xc0000150, DPL=0
IDT[0x08]=32-Bit Interrupt Gate target=0x0008:0xc0000160, DPL=0
IDT[0x09]=32-Bit Interrupt Gate target=0x0008:0xc0000170, DPL=0
IDT[0x0a]=32-Bit Interrupt Gate target=0x0008:0xc0000180, DPL=0
IDT[0x0b]=32-Bit Interrupt Gate target=0x0008:0xc0000190, DPL=0
IDT[0x0c]=32-Bit Interrupt Gate target=0x0008:0xc00001a0, DPL=0
IDT[0x0d]=32-Bit Interrupt Gate target=0x0008:0xc00001b0, DPL=0
IDT[0x0e]=32-Bit Interrupt Gate target=0x0008:0xc00001c0, DPL=0
IDT[0x0f]=32-Bit Interrupt Gate target=0x0008:0xc00001d0, DPL=0
IDT[0x10]=32-Bit Interrupt Gate target=0x0008:0xc00001e0, DPL=0 from 0x10 to 0x1f
IDT[0x20]=32-Bit Interrupt Gate target=0x0008:0xc0000210, DPL=0
IDT[0x21]=32-Bit Interrupt Gate target=0x0008:0xc0000220, DPL=0
IDT[0x22]=32-Bit Interrupt Gate target=0x0008:0xc0000230, DPL=0
IDT[0x23]=32-Bit Interrupt Gate target=0x0008:0xc0000240, DPL=0
IDT[0x24]=32-Bit Interrupt Gate target=0x0008:0xc0000250, DPL=0
IDT[0x25]=32-Bit Interrupt Gate target=0x0008:0xc0000260, DPL=0
IDT[0x26]=32-Bit Interrupt Gate target=0x0008:0xc0000270, DPL=0
IDT[0x27]=32-Bit Interrupt Gate target=0x0008:0xc0000280, DPL=0
IDT[0x28]=32-Bit Interrupt Gate target=0x0008:0xc0000290, DPL=0
IDT[0x29]=32-Bit Interrupt Gate target=0x0008:0xc00002a0, DPL=0
IDT[0x2a]=32-Bit Interrupt Gate target=0x0008:0xc00002b0, DPL=0
IDT[0x2b]=32-Bit Interrupt Gate target=0x0008:0xc00002c0, DPL=0
IDT[0x2c]=32-Bit Interrupt Gate target=0x0008:0xc00002d0, DPL=0
IDT[0x2d]=32-Bit Interrupt Gate target=0x0008:0xc00002e0, DPL=0
IDT[0x2e]=32-Bit Interrupt Gate target=0x0008:0xc00002f0, DPL=0
IDT[0x2f]=32-Bit Interrupt Gate target=0x0008:0xc0000300, DPL=0
IDT[0x30]=32-Bit Interrupt Gate target=0x0008:0xc00001e9, DPL=0 from 0x30 to 0xff
<bochs:258> s 10000
Next at t=17971208
(0) [0x00100af1] 0008:c0000af1 (unk. ctxt): jbe .+0x00000045 (0xc0000b38) ; 7645
<bochs:259> s 10000
Next at t=17981208
(0) [0x00100bda] 0008:c0000bda (unk. ctxt): mov eax, dword ptr ss:[ebp+0x3c] ; 8b453c
<bochs:260> s 10000
Next at t=17991208
(0) [0x00104de1] 0008:c0004de1 (unk. ctxt): mov edx, dword ptr ds:[eax] ; 8b10
<bochs:261> s 10000
(0).[17996978] [0x00000161] 0008:00000161 (unk. ctxt): inc dword ptr ds:[eax] ; ff00
Code: Select all
00017996966e[CPU ] int_trap_gate(): selector null
00017996968e[CPU ] int_trap_gate(): selector null
00017996970e[CPU ] int_trap_gate(): selector null
00017996972e[CPU ] int_trap_gate(): selector null
00017996974e[CPU ] int_trap_gate(): selector null
00017996976e[CPU ] int_trap_gate(): selector null
00017996978e[CPU ] int_trap_gate(): selector null
00017996978e[CPU ] interrupt(): not accessable or not code segment cs=0x0008
00017996978i[CPU ] CPU is in protected mode (active)
00017996978i[CPU ] CS.d_b = 32 bit
00017996978i[CPU ] SS.d_b = 32 bit
00017996978i[CPU ] | EAX=d0000398 EBX=d0000000 ECX=00000086 EDX=00000340
00017996978i[CPU ] | ESP=c000a4ec EBP=c000a604 ESI=00000000 EDI=00000001
00017996978i[CPU ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF cf
00017996978i[CPU ] | SEG selector base limit G D
00017996978i[CPU ] | SEG sltr(index|ti|rpl) base limit G D
00017996978i[CPU ] | CS:0008( 0001| 0| 0) 00000000 000fffff 1 1
00017996978i[CPU ] | DS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017996978i[CPU ] | SS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017996978i[CPU ] | ES:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017996978i[CPU ] | FS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017996978i[CPU ] | GS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00017996978i[CPU ] | EIP=00000161 (00000161)
00017996978i[CPU ] | CR0=0xe0010011 CR2=0xd0000398
00017996978i[CPU ] | CR3=0x00003000 CR4=0x00000000
00017996978p[CPU ] >>PANIC<< exception(): 3rd (13) exception with no resolution
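One check that can be done purely with the numbers in the dumps above is to compare the stack registers at the time of the panic with the table ranges Bochs printed for `info gdt` and `info idt`. The helper below is a hypothetical sketch, and the comparison is plain arithmetic on the posted values, not a diagnosis made by anyone in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies inside a descriptor table, given its base and limit
 * (limit = offset of the last byte, as printed by "info gdt"/"info idt"). */
static bool inside_table(uint32_t addr, uint32_t base, uint32_t limit)
{
    return addr >= base && addr - base <= limit;
}
```

With the posted values, `inside_table(0xc000a4ec, 0xc000a4e0, 47)` (ESP against the GDT) and `inside_table(0xc000a604, 0xc000a5c0, 2048)` (EBP against the IDT) both return true. One possible reading is that the stack in use at the time of the crash overlaps the tables, but that would need to be confirmed in the debugger before drawing any conclusion.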
- Combuster
Re: Really strange problem...
What about "checking at the faulting instruction" do you not get?
The crash is obvious:
Quote: EAX=d0000398
CR2=0xd0000398
EAX points nowhere, causing a page fault.
Quote: int_trap_gate(): selector null
The IDT entry for the page fault contains CS=0, causing a GP, but since a GP and a PF don't stack, the processor panics into a double fault.
Quote: interrupt(): not accessable or not code segment cs=0x0008
>>PANIC<< exception(): 3rd (13) exception with no resolution
The IDT for #DF points to cs=0x0008, but the corresponding GDT entry is corrupt, causing the third and final exception.
Now we are two full pages further and we haven't gotten a single step closer to discovering the cause of the problem, and I wonder why.
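The "GP and PF don't stack" step follows the exception classes defined in the Intel manuals: a second exception raised while delivering the first is either handled serially or escalated to a double fault, depending on the class of each. A minimal sketch of that rule, with hypothetical function names and covering only the classic vectors:

```c
#include <stdbool.h>

enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* Classify a vector the way the double-fault rules do. */
static enum exc_class classify(int vector)
{
    switch (vector) {
    case 0:   /* #DE divide error */
    case 10:  /* #TS invalid TSS */
    case 11:  /* #NP segment not present */
    case 12:  /* #SS stack fault */
    case 13:  /* #GP general protection */
        return EXC_CONTRIBUTORY;
    case 14:  /* #PF page fault */
        return EXC_PAGE_FAULT;
    default:
        return EXC_BENIGN;
    }
}

/* True if `second`, raised while delivering `first`, becomes a #DF.
 * A fault raised while delivering #DF itself (vector 8) is not modelled
 * here: in that case the CPU shuts down, i.e. a triple fault. */
static bool escalates_to_double_fault(int first, int second)
{
    enum exc_class a = classify(first), b = classify(second);
    if (a == EXC_CONTRIBUTORY)
        return b == EXC_CONTRIBUTORY;
    if (a == EXC_PAGE_FAULT)
        return b != EXC_BENIGN;
    return false;
}
```

In the log above the first exception is the page fault (vector 14) and the second is the general protection fault (vector 13) caused by the null selector in the gate, so `escalates_to_double_fault(14, 13)` is true. The failure to deliver the #DF through the corrupt code segment is then the third exception.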
Re: Really strange problem...
Combuster wrote: What about "checking at the faulting instruction" do you not get?
Ok, so at the fault the IDT is completely wrong. So there is something that changes values in the IDT. I need to put a breakpoint on all the memory occupied by the IDT, but I don't know how to do this, because I only know how to put a breakpoint on a single address.
Combuster wrote: The crash is obvious:
Quote: EAX=d0000398
CR2=0xd0000398
EAX points nowhere, causing a pagefault.
So in your opinion the first cause of the crash is a page fault?
Combuster wrote: Now we are two full pages further and we've haven't gotten a single step closer to discovering the cause of the problem, and I wonder why
Sorry... But I'm very confused... This is the first time I have debugged my kernel... This is absolutely the first time that I have debugged anything, so I need your help...
Sorry for the stupid questions and the like, but I know very little about this topic.