Re: Really strange problem...

Posted: Sat Oct 25, 2008 9:18 am
by Combuster
The thing is, bochs tells all of us the following:
- Your GDT is messed up: the code segment isn't a code segment.
- An interrupt directly causes the triple fault. A mov reg, constant cannot produce an exception, which means that an interrupt fired at that point. It also means no exception handler could be invoked in the process. It does not mean that an exception handler was successfully executed in the past.
- Based on that, your DF handler cannot be executed, and at least one other exception handler couldn't be executed either. The likely candidates are the GPF and PF handlers.
- A page fault happened somewhere, maybe as part of the crash.

That means that either the standard code segment is something other than 0x8 (which means your IDT is bogus) or something's wrong with the descriptor (meaning your GDT is bogus).

Instead of saying that certain things work fine (which bochs stubbornly refuses to agree with), TEST, DOUBLECHECK, CONFIRM AND KNOW that certain things work fine.

I want proof of you using the bochs debugger that at the faulting instruction the IDT is correct, and the GDT is correct (and by executing it, the computer crashes) since that is pretty much what you claim.

Re: Really strange problem...

Posted: Sun Oct 26, 2008 5:31 am
by Jeko
Code:

Initializing VFS - Virtual File System
This is IRQ0
Initializing DevFS - Device File System
This is IRQ0
This is IRQ6
Some IRQs work. I've tested with bochs that some IRQs work. If the GDT or the IDT are incorrect, how can some IRQs work?

However, maybe it is the page fault handler that is going wrong. Can a page fault cause these problems?

Re: Really strange problem...

Posted: Sun Oct 26, 2008 5:42 am
by CodeCat
I once had a strange problem with it triple faulting rather than pagefaulting. Turns out that the page fault handler caused a page fault, and then so did the double fault handler.

Re: Really strange problem...

Posted: Sun Oct 26, 2008 11:58 am
by cr2
Can we see your interrupt code?

Re: Really strange problem...

Posted: Sun Oct 26, 2008 2:19 pm
by Jeko
cr2 wrote:Can we see your interrupt code?
My interrupt code isn't wrong, because I used it many times, and it worked fine. I think the problem is in my page fault handler.
But here is the code:

Code:

#define INTERRUPT(name, irq) \
	.global _##name ; \
	.align 16 ; \
_##name: \
	pushl	##irq ; \
	jmp	COMMON_ISR ; \

.extern default_handler
.align 16
COMMON_ISR:
	pushal

	pushl	%ds
	pushl	%es
	pushl	%fs
	pushl	%gs

	movl	%esp, %eax
	pushl	%eax

	call	default_handler

	popl	%eax
	movl	%eax, %esp

	popl	%gs
	popl	%fs
	popl	%es
	popl	%ds

	popal

	addl	$4, %esp

	iretl

INTERRUPT(irq_00, $0x00)
... //All the other IRQs

Code:

void default_handler(unsigned int stack)
{
	struct irq_context_t *c = (struct irq_context_t*)&stack;
	int cur = c->irq;

	...

	if(cur == 0 && scheduler_active)
		scheduler(&stack);

	outportb(0x20, 0x20);
	if(cur & 8)
		outportb(0x20, 0x20);

	printk("This is IRQ%d\n", cur);
}
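
A side note on the EOI logic above: the second outportb(0x20, 0x20) goes to the master PIC again. On a standard cascaded 8259A pair, IRQs 8 to 15 are normally acknowledged on the slave's command port (0xA0) as well as on the master's. A minimal sketch of that rule, assuming c->irq holds the IRQ line number 0 to 15; eoi_ports is a hypothetical helper, not part of the posted kernel, and this is probably not what causes the triple fault:

```c
#include <stdint.h>

/* Standard 8259A command ports and the non-specific EOI command byte. */
#define PIC_MASTER_CMD 0x20
#define PIC_SLAVE_CMD  0xA0
#define PIC_EOI        0x20

/* Hypothetical helper: fills ports[] with the command ports that should
 * receive PIC_EOI for the given IRQ line and returns how many were written. */
static int eoi_ports(unsigned int irq, uint16_t ports[2])
{
    int n = 0;

    if (irq >= 8)
        ports[n++] = PIC_SLAVE_CMD;  /* IRQ 8-15 arrive through the slave */
    ports[n++] = PIC_MASTER_CMD;     /* the master is always acknowledged */
    return n;
}
```

In the handler, each returned port would then get an outportb(port, PIC_EOI).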
This code handles the IDT:

Code:

void add_idt_entry(void (*handler)(void), int indice, unsigned short opz)
{
	idt[indice].offset0_15 = (unsigned int)handler & 0xFFFF;
	idt[indice].offset16_31 = (unsigned int)handler >> 16;
	idt[indice].segmento = 0x08;
	idt[indice].opzioni = opz;
}

void init_interrupt()
{
	outportb( ICU0, ICU_RESET );
	outportb( ICU1, ICU_RESET );
	outportb( ICU0 + 1, 0x20 );
	outportb( ICU1 + 1, 0x28 );
	outportb( ICU0 + 1, 0x04 );
	outportb( ICU1 + 1, 0x02 );
	outportb( ICU0 + 1, 0x01 );
	outportb( ICU1 + 1, 0x01 );
	outportb( ICU0 + 1, 0xFF );
	outportb( ICU1 + 1, 0xFF );
	irq_mask = 0xFFFF;

	add_idt_entry(_exc_00, 0x0, 0x470);
        ... //All the other exceptions

	add_idt_entry(syscall_handler, 0x80, 0x770);

	add_idt_entry(_irq_00, 0x20, 0x470);
	... //All the other irqs

	unsigned int idt_reg[2];
	idt_reg[0] = (DIM_IDT*8) << 16;
	idt_reg[1] = (unsigned int)idt;
	__asm__ __volatile__ ("lidt (%0)": :"g" ((char*)idt_reg+2));

	printk("");
}
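
Two details of the code above can be checked on paper. First, with the idt_t bitfields posted later in this thread (5 reserved bits followed by 11 option bits), the opz argument ends up shifted left by 5 in the third word of the gate. Second, the limit loaded by lidt is defined as the offset of the last valid byte, so a 256-entry table would normally use DIM_IDT*8 - 1 rather than DIM_IDT*8. A minimal sketch, assuming GCC's usual low-bit-first bitfield layout on x86; both helper names are hypothetical:

```c
#include <stdint.h>

/* Third 16-bit word of a 32-bit gate: bits 0-4 reserved, bits 5-15 options
 * (bits 8-11 type, bits 13-14 DPL, bit 15 present). */
static uint16_t gate_flags_word(uint16_t opz)
{
    return (uint16_t)((opz & 0x7FF) << 5);
}

/* IDTR/GDTR limit: offset of the last valid byte, i.e. size in bytes - 1. */
static uint16_t idt_limit(unsigned int entries)
{
    return (uint16_t)(entries * 8u - 1u);
}
```

Under that layout 0x470 becomes 0x8E00 (present, DPL 0, 32-bit interrupt gate) and 0x770 becomes 0xEE00 (the same gate with DPL 3), so the option values themselves look sane; only the limit is off by one.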

Re: Really strange problem...

Posted: Mon Oct 27, 2008 2:10 am
by cyr1x
Could you show us all the data structures (namely: irq_context, idt,...)?

Re: Really strange problem...

Posted: Mon Oct 27, 2008 4:41 am
by Combuster
My interrupt code isn't wrong, because I used it many times, and it worked fine.
Again bochs says otherwise. And mainly, two working cases don't at all imply that all cases work as expected, and that there are absolutely no bugs left over. Which goes back to my first point: Don't claim something is correct unless you can prove for all possible cases that it is correct. Bochs gives a counterexample, meaning that the original statement is false.

Will you do your homework instead of just giving us the sources to your OS at the rate of 1 line an hour and expecting us to fix your problem? I still want to see the following:
I want proof of you using the bochs debugger that at the faulting instruction the IDT is correct, and the GDT is correct (and by executing it, the computer crashes) since that is pretty much what you claim.

Re: Really strange problem...

Posted: Mon Oct 27, 2008 5:53 am
by Brendan
Hi,
Combuster wrote:
My interrupt code isn't wrong, because I used it many times, and it worked fine.
Again bochs says otherwise.
Not necessarily - Bochs says the IDT and/or GDT is wrong when the exception occurs, but doesn't say the IDT and/or GDT wasn't right some time before this.

Basically what I'm saying is that any simple mistake (like using an uninitialized pointer or a bad array index) could have corrupted the GDT, and everything might work fine before the GDT is corrupted, and everything might work fine for a while after the GDT was corrupted until something causes a segment register load (e.g. an interrupt).

In this case, the worst thing is that the bug could be anywhere - what the CPU is doing when the exception occurs makes no difference.

The only way to find out what's going on is to find out what the GDT contains when the exception occurs - not before. If the GDT has been corrupted then you'd need to figure out what corrupted it, and every piece of code you've written could be causing the problem (except for any CPL=3 code that doesn't have permission to write to the GDT).


Cheers,

Brendan
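
One way to narrow down when a table gets corrupted is to keep a copy taken right after initialization and compare against it at chosen checkpoints. The following is only a sketch of that idea, not something from the posted kernel: table_guard, guard_arm and guard_first_diff are hypothetical names, and it assumes a memcpy is available in the kernel.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_MAX 2048  /* enough for a 256-entry IDT (256 * 8 bytes) */

/* Hypothetical guard: remembers where a table lives and what it contained. */
struct table_guard {
    const unsigned char *live;       /* the table being watched */
    size_t size;                     /* number of bytes watched */
    unsigned char copy[GUARD_MAX];   /* snapshot taken by guard_arm() */
};

/* Take the snapshot; returns 0 on success, -1 if the table is too large. */
static int guard_arm(struct table_guard *g, const void *table, size_t size)
{
    if (size > GUARD_MAX)
        return -1;
    g->live = (const unsigned char *)table;
    g->size = size;
    memcpy(g->copy, table, size);
    return 0;
}

/* Returns the offset of the first byte that changed, or -1 if intact. */
static long guard_first_diff(const struct table_guard *g)
{
    for (size_t i = 0; i < g->size; i++)
        if (g->live[i] != g->copy[i])
            return (long)i;
    return -1;
}
```

Calling guard_first_diff at a few points (for example on every timer tick) and printing the offset brackets the moment of corruption. Note that the CPU sets the Accessed bit in a GDT descriptor when a segment is loaded, so a one-bit difference in an access byte is expected and harmless.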

Re: Really strange problem...

Posted: Mon Oct 27, 2008 6:41 am
by AJ
IIRC, you can put a watch on a piece of memory (such as the GDT) in the Bochs debugger? In that case, you could find out exactly what code is corrupting the system tables.

Cheers,
Adam

Re: Really strange problem...

Posted: Mon Oct 27, 2008 3:01 pm
by Jeko
AJ wrote:IIRC, you can put a watch on a piece of memory (such as the GDT) in the Bochs debugger? In that case, you could find out exactly what code is corrupting the system tables.
Can you give me a link to something about the Bochs debugger? I don't know how to use it.

Re: Really strange problem...

Posted: Mon Oct 27, 2008 3:26 pm
by Combuster
STFW. It's one click from bochs' homepage :evil:

And my patience ends here as well. You've been warned before about not doing background research. If your next post doesn't show enough improvement I see little choice but to add another one.

Re: Really strange problem...

Posted: Tue Oct 28, 2008 7:57 am
by Jeko
cyr1x wrote:Could you show us all the data structures (namely: irq_context, idt,...)?

Code:

struct irq_context_t
{
	unsigned int off;

	unsigned int	gs, fs, es, ds;

	unsigned int	edi, esi, ebp, esp, ebx, edx, ecx, eax;

	unsigned int	irq;

	unsigned int	eip;

	unsigned int	cs;

	unsigned int	eflags;

	unsigned int	uesp;

	unsigned int	uss;

	unsigned int v_ds, v_es, v_fs, v_gs;
} __attribute__ ((packed));

static unsigned short int irq_mask;

#define SLAVE_IRQ 8
#define MASTER_SLAVE 2
#define ICU0 0x20
#define ICU1 0xA0
#define ICU_RESET 0x11

#define DIM_IDT 256

struct idt_t {
	unsigned short int offset0_15;
	unsigned short int segmento;
	unsigned short int riservato : 5;
	unsigned short int opzioni : 11;
	unsigned short int offset16_31;
};

struct idt_t idt[DIM_IDT];
However now the error is:

Code:

...
00017554789e[CPU  ] int_trap_gate(): selector null
00017554791e[CPU  ] int_trap_gate(): selector null
00017554793e[CPU  ] int_trap_gate(): selector null
00017554795e[CPU  ] int_trap_gate(): selector null
00017554797e[CPU  ] int_trap_gate(): selector null
00017554797e[CPU  ] interrupt(): not accessable or not code segment cs=0x0008
00017554797i[CPU  ] CPU is in protected mode (active)
00017554797i[CPU  ] CS.d_b = 32 bit
00017554797i[CPU  ] SS.d_b = 32 bit
00017554797i[CPU  ] | EAX=d0000398  EBX=d0000000  ECX=00000086  EDX=00000340
00017554797i[CPU  ] | ESP=c000a4ec  EBP=c000a604  ESI=00000000  EDI=00000001
00017554797i[CPU  ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF cf
00017554797i[CPU  ] | SEG selector     base    limit G D
00017554797i[CPU  ] | SEG sltr(index|ti|rpl)     base    limit G D
00017554797i[CPU  ] |  CS:0008( 0001| 0|  0) 00000000 000fffff 1 1
00017554797i[CPU  ] |  DS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017554797i[CPU  ] |  SS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017554797i[CPU  ] |  ES:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017554797i[CPU  ] |  FS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017554797i[CPU  ] |  GS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017554797i[CPU  ] | EIP=00000161 (00000161)
00017554797i[CPU  ] | CR0=0xe0010011 CR2=0xd0000398
00017554797i[CPU  ] | CR3=0x00003000 CR4=0x00000000
I've put a watch on gdt memory, and it is accessed (read and written) only when the OS is loaded by GRUB and when the OS initializes the GDT, so the GDT is correct.

Now I'll put a watchpoint also on IDT memory to see if something changes it. Do you know how to put watchpoints on a range of physical addresses? For now I do only 'watch read address' and 'watch write address', so an address at a time.

Re: Really strange problem...

Posted: Tue Oct 28, 2008 3:45 pm
by Jeko

Code:

<bochs:256> info gdt
Global Descriptor Table (base=0xc000a4e0, limit=47):
GDT[0x00]=??? descriptor hi=0x00000000, lo=0x00000000
GDT[0x01]=Code segment, laddr=00000000, limit=fffff * 4Kbytes, Execute/Read, 32-bit
GDT[0x02]=Data segment, laddr=00000000, limit=fffff * 4Kbytes, Read/Write, Accessed
GDT[0x03]=Code segment, laddr=00000000, limit=fffff * 4Kbytes, Execute/Read, 32-bit
GDT[0x04]=Data segment, laddr=00000000, limit=fffff * 4Kbytes, Read/Write
GDT[0x05]=32-Bit TSS (Busy) at 0x00000000, length 0x00067
<bochs:257> info idt
Interrupt Descriptor Table (base=0xc000a5c0, limit=2048):
IDT[0x00]=32-Bit Interrupt Gate target=0x0008:0xc00000e0, DPL=0
IDT[0x01]=32-Bit Interrupt Gate target=0x0008:0xc00000f0, DPL=0
IDT[0x02]=32-Bit Interrupt Gate target=0x0008:0xc0000100, DPL=0
IDT[0x03]=32-Bit Interrupt Gate target=0x0008:0xc0000110, DPL=0
IDT[0x04]=32-Bit Interrupt Gate target=0x0008:0xc0000120, DPL=0
IDT[0x05]=32-Bit Interrupt Gate target=0x0008:0xc0000130, DPL=0
IDT[0x06]=32-Bit Interrupt Gate target=0x0008:0xc0000140, DPL=0
IDT[0x07]=32-Bit Interrupt Gate target=0x0008:0xc0000150, DPL=0
IDT[0x08]=32-Bit Interrupt Gate target=0x0008:0xc0000160, DPL=0
IDT[0x09]=32-Bit Interrupt Gate target=0x0008:0xc0000170, DPL=0
IDT[0x0a]=32-Bit Interrupt Gate target=0x0008:0xc0000180, DPL=0
IDT[0x0b]=32-Bit Interrupt Gate target=0x0008:0xc0000190, DPL=0
IDT[0x0c]=32-Bit Interrupt Gate target=0x0008:0xc00001a0, DPL=0
IDT[0x0d]=32-Bit Interrupt Gate target=0x0008:0xc00001b0, DPL=0
IDT[0x0e]=32-Bit Interrupt Gate target=0x0008:0xc00001c0, DPL=0
IDT[0x0f]=32-Bit Interrupt Gate target=0x0008:0xc00001d0, DPL=0
IDT[0x10]=32-Bit Interrupt Gate target=0x0008:0xc00001e0, DPL=0 from 0x10 to 0x1f
IDT[0x20]=32-Bit Interrupt Gate target=0x0008:0xc0000210, DPL=0
IDT[0x21]=32-Bit Interrupt Gate target=0x0008:0xc0000220, DPL=0
IDT[0x22]=32-Bit Interrupt Gate target=0x0008:0xc0000230, DPL=0
IDT[0x23]=32-Bit Interrupt Gate target=0x0008:0xc0000240, DPL=0
IDT[0x24]=32-Bit Interrupt Gate target=0x0008:0xc0000250, DPL=0
IDT[0x25]=32-Bit Interrupt Gate target=0x0008:0xc0000260, DPL=0
IDT[0x26]=32-Bit Interrupt Gate target=0x0008:0xc0000270, DPL=0
IDT[0x27]=32-Bit Interrupt Gate target=0x0008:0xc0000280, DPL=0
IDT[0x28]=32-Bit Interrupt Gate target=0x0008:0xc0000290, DPL=0
IDT[0x29]=32-Bit Interrupt Gate target=0x0008:0xc00002a0, DPL=0
IDT[0x2a]=32-Bit Interrupt Gate target=0x0008:0xc00002b0, DPL=0
IDT[0x2b]=32-Bit Interrupt Gate target=0x0008:0xc00002c0, DPL=0
IDT[0x2c]=32-Bit Interrupt Gate target=0x0008:0xc00002d0, DPL=0
IDT[0x2d]=32-Bit Interrupt Gate target=0x0008:0xc00002e0, DPL=0
IDT[0x2e]=32-Bit Interrupt Gate target=0x0008:0xc00002f0, DPL=0
IDT[0x2f]=32-Bit Interrupt Gate target=0x0008:0xc0000300, DPL=0
IDT[0x30]=32-Bit Interrupt Gate target=0x0008:0xc00001e9, DPL=0 from 0x30 to 0xff
<bochs:258> s 10000
Next at t=17971208
(0) [0x00100af1] 0008:c0000af1 (unk. ctxt): jbe .+0x00000045 (0xc0000b38) ; 7645
<bochs:259> s 10000
Next at t=17981208
(0) [0x00100bda] 0008:c0000bda (unk. ctxt): mov eax, dword ptr ss:[ebp+0x3c] ; 8b453c
<bochs:260> s 10000
Next at t=17991208
(0) [0x00104de1] 0008:c0004de1 (unk. ctxt): mov edx, dword ptr ds:[eax] ; 8b10
<bochs:261> s 10000
(0).[17996978] [0x00000161] 0008:00000161 (unk. ctxt): inc dword ptr ds:[eax]    ; ff00

The error is the same:

Code:

00017996966e[CPU  ] int_trap_gate(): selector null
00017996968e[CPU  ] int_trap_gate(): selector null
00017996970e[CPU  ] int_trap_gate(): selector null
00017996972e[CPU  ] int_trap_gate(): selector null
00017996974e[CPU  ] int_trap_gate(): selector null
00017996976e[CPU  ] int_trap_gate(): selector null
00017996978e[CPU  ] int_trap_gate(): selector null
00017996978e[CPU  ] interrupt(): not accessable or not code segment cs=0x0008
00017996978i[CPU  ] CPU is in protected mode (active)
00017996978i[CPU  ] CS.d_b = 32 bit
00017996978i[CPU  ] SS.d_b = 32 bit
00017996978i[CPU  ] | EAX=d0000398  EBX=d0000000  ECX=00000086  EDX=00000340
00017996978i[CPU  ] | ESP=c000a4ec  EBP=c000a604  ESI=00000000  EDI=00000001
00017996978i[CPU  ] | IOPL=0 id vip vif ac vm RF nt of df if tf SF zf af PF cf
00017996978i[CPU  ] | SEG selector     base    limit G D
00017996978i[CPU  ] | SEG sltr(index|ti|rpl)     base    limit G D
00017996978i[CPU  ] |  CS:0008( 0001| 0|  0) 00000000 000fffff 1 1
00017996978i[CPU  ] |  DS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017996978i[CPU  ] |  SS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017996978i[CPU  ] |  ES:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017996978i[CPU  ] |  FS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017996978i[CPU  ] |  GS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00017996978i[CPU  ] | EIP=00000161 (00000161)
00017996978i[CPU  ] | CR0=0xe0010011 CR2=0xd0000398
00017996978i[CPU  ] | CR3=0x00003000 CR4=0x00000000
00017996978p[CPU  ] >>PANIC<< exception(): 3rd (13) exception with no resolution
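
One comparison worth making between the two dumps above: info gdt reports the GDT at base=0xc000a4e0 with limit=47 and the IDT at base=0xc000a5c0 with limit=2048, while the crash dump shows ESP=c000a4ec and EBP=c000a604. The sketch below checks whether an address falls inside a descriptor table and which 8-byte entry it lands in. It assumes limits are inclusive byte offsets; the helper names are hypothetical.

```c
#include <stdint.h>

/* A descriptor table covers the bytes base .. base + limit, inclusive. */
static int addr_in_table(uint32_t addr, uint32_t base, uint32_t limit)
{
    return addr >= base && addr - base <= limit;
}

/* Index of the 8-byte descriptor containing addr.
 * Only meaningful when addr_in_table() is true. */
static unsigned int table_entry_index(uint32_t addr, uint32_t base)
{
    return (addr - base) / 8u;
}
```

With the posted values, ESP lands in GDT entry 1 (selector 0x08, the kernel code segment) and EBP lands in IDT entry 8 (the double fault gate). That would be consistent with a stack growing down through the tables, although it is not proof: EIP=00000161 is not kernel code, so the registers at the crash may themselves be garbage.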

Re: Really strange problem...

Posted: Tue Oct 28, 2008 4:05 pm
by Combuster
What about "checking at the faulting instruction" do you not get? :-({|=

The crash is obvious:
EAX=d0000398
CR2=0xd0000398
EAX points nowhere, causing a pagefault.
int_trap_gate(): selector null
The IDT entry for the page fault contains CS=0, causing a GP, but since a GP and a PF don't stack, the processor panics into a double fault.
interrupt(): not accessable or not code segment cs=0x0008
>>PANIC<< exception(): 3rd (13) exception with no resolution
The IDT entry for #DF points to cs=0x0008, but the corresponding GDT entry is corrupt, causing the third and final exception.

Now we are two full pages further and we haven't gotten a single step closer to discovering the cause of the problem, and I wonder why :evil:
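
The "don't stack" rule in the analysis above follows the exception classes in the Intel manuals: a fault raised while delivering an earlier one either gets handled serially or is promoted to a double fault, depending on the classes of both. A minimal sketch of that table for the classic vectors; the function names are hypothetical, and vector 8 itself is deliberately left out, since any fault while delivering #DF shuts the CPU down (the triple fault):

```c
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* Classify an exception vector: #DE, #TS, #NP, #SS and #GP are
 * contributory, #PF is its own class, everything else is benign. */
static enum exc_class classify(unsigned int vector)
{
    switch (vector) {
    case 0: case 10: case 11: case 12: case 13:
        return EXC_CONTRIBUTORY;
    case 14:
        return EXC_PAGE_FAULT;
    default:
        return EXC_BENIGN;
    }
}

/* Nonzero if raising `second` while delivering `first` yields a #DF. */
static int raises_double_fault(unsigned int first, unsigned int second)
{
    enum exc_class a = classify(first);
    enum exc_class b = classify(second);

    if (a == EXC_CONTRIBUTORY)
        return b == EXC_CONTRIBUTORY;
    if (a == EXC_PAGE_FAULT)
        return b != EXC_BENIGN;  /* a second #PF or any contributory fault */
    return 0;                    /* benign first: handled serially */
}
```

For this crash, raises_double_fault(14, 13) is nonzero: the #GP from the null selector in the page fault gate turns into a #DF, and the failure to deliver that #DF is the third exception bochs reports.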

Re: Really strange problem...

Posted: Tue Oct 28, 2008 4:40 pm
by Jeko
Combuster wrote:What about "checking at the faulting instruction" do you not get? :-({|=
OK, so at the fault the IDT is completely wrong, which means something is changing values in the IDT. I need to put a breakpoint on all the memory occupied by the IDT, but I don't know how to do this, because I only know how to put a breakpoint on a single address.
Combuster wrote:The crash is obvious:
EAX=d0000398
CR2=0xd0000398
EAX points nowhere, causing a pagefault.
So in your opinion the first cause of the crash is a page fault?
Combuster wrote:Now we are two full pages further and we haven't gotten a single step closer to discovering the cause of the problem, and I wonder why :evil:
Sorry... But I'm very confused... This is the first time I've debugged my kernel... It is the very first time I've debugged anything, so I need your help...
Sorry for the stupid questions and the like, but I'm very ignorant about this topic.