A triple-fault I can't work out

KrnlHckr
Member
Posts: 36
Joined: Tue Jul 17, 2007 9:16 am
Location: Washington, DC Metro Area

Post by KrnlHckr

Hi everyone,

I've been pretty quiet on the board for many months. During my absence, I've been putting a bit of time into playing with my skeleton OS. I'm able to boot it and get it to display a simple "Hi I'm alive!" message. Thanks to so many of you who helped me get over the GDT hurdle.

I've been poking around with my exception handling and noticed one problem. I've written an #ifdef block that lets me selectively enable this code through the DEBUG variable in my Makefile. While trying to work out this odd triple fault, I've commented out that whole block.

Below, I've included snips of the (hopefully) relevant code. You can see where, in loader.asm, I call main and follow it up with a jmp $. Presumably this will park the processor in a tight infinite loop should main() return for some cornball reason.

In my kernel.c, you'll see where I print some stuff to the screen and then return to the asm stub. Note that I've disabled my DEBUG #ifdef. When I run this, the "Testing" message is displayed, followed by my "I GET THIS FAR!" sanity check. Then the processor immediately triple-faults.

My question is: why? Shouldn't the jmp $ that main() returns to keep me looping? I think it might have to do with stack handling from main back to the asm code, but I'm not certain. :?

Here's something even odder... if I uncomment the for( ; ; ) loop in the C code, it still triple-faults! No endless loop in either case. What is going on here?

Thanks (again) for your help!

-Sean

From my loader.asm file:

Code: Select all

start:
        mov     esp, _sys_stack ; points stack to new stack area
        call    main            ; calls the C kernel
        jmp     $               ; just in case kernel returns

From my kernel.c file:

Code: Select all

        {snip}
        cls();
        settextcolor(WHITE,BLUE);
        puts("Test Kernel 0.1\n");
        puts("---------------\n");

/* exception handling testing */
/* #ifdef DEBUG */
/*      int test; */
/*      test = 1 / 0; */
/*      puts(test); */
/* #endif */

        /* loop forever - do not return to start.asm jmp $ */
        /* commented for testing purposes */
        /* for (;;) ; */

        puts("I GET THIS FAR!\n");
        return -1;
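A note on the commented-out DEBUG block, for when it is switched back on: puts() takes a string, so puts(test) hands it an int where a pointer is expected, and 1 / 0 is a constant division by zero that the compiler warns about and is not obliged to turn into a real divide instruction. A minimal sketch of both fixes follows. int_to_dec and trigger_divide_error are hypothetical helpers, not part of the original kernel, and reading the divisor through a volatile is one common way (an assumption, not a guarantee) to keep the division at run time so the CPU raises #DE (vector 0). The sketch also assumes a 32-bit int, as on i386.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: format a signed int as decimal so it can be passed
 * to a string-taking puts(). Returns buf. Assumes a 32-bit int. */
char *int_to_dec(int value, char *buf, size_t len)
{
    char digits[10];              /* a 32-bit unsigned value has at most 10 digits */
    size_t n = 0, i = 0;
    /* Negate in unsigned arithmetic so INT_MIN does not overflow. */
    unsigned int u = value < 0 ? 0u - (unsigned int)value : (unsigned int)value;

    assert(buf != NULL && len >= 12);   /* room for "-2147483648" plus NUL */
    do {
        digits[n++] = (char)('0' + u % 10u);
        u /= 10u;
    } while (u != 0u);
    if (value < 0)
        buf[i++] = '-';
    while (n > 0)
        buf[i++] = digits[--n];   /* digits were produced least significant first */
    buf[i] = '\0';
    return buf;
}

/* Hypothetical DEBUG helper: the divisor is read through a volatile, so the
 * compiler cannot see a constant zero and has to emit the divide. Dividing
 * by zero is still undefined in C; in practice this executes an idiv that
 * faults with #DE. */
int trigger_divide_error(void)
{
    volatile int zero = 0;
    return 1 / zero;
}
```

With DEBUG defined, the block could then read char buf[12]; puts(int_to_dec(trigger_divide_error(), buf, sizeof buf)); adding a cast if the kernel's puts() is declared with an unsigned char pointer.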
"If your code won't run, verify that you are, indeed, using the STABLE branches of your toolchain!" -- KrnlHckr, 2007 :oops:
suthers
Member
Posts: 672
Joined: Tue Feb 20, 2007 3:00 pm
Location: London UK

Post by suthers

Yeah, I have a similar problem.
I can't do for(;;); or jmp $ without getting a triple fault, so I just resorted to doing cli; hlt to stop. The weird thing is that it just started happening: one day it was working fine, and the next, without my making any major changes, it wasn't.
Anybody know why?
Thanks in advance,

Jules
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Post by Combuster

Where is the stack pointing: at the end of its strip of memory (where it should be) or at the beginning?
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
pietro10
Posts: 12
Joined: Wed Feb 13, 2008 10:06 pm

Post by pietro10

From the looks of it, you're putting ESP at the start (lowest address) of the stack area. The stack grows DOWN, so ESP needs to start at the end (highest address) of the reserved area. There are two ways to do this:

Code: Select all

stacksize EQU 4096
[SECTION .text]
kstart:
MOV EAX, stack
ADD EAX, stacksize
MOV ESP, EAX
[SECTION .bss]
stack: RESB stacksize
or

Code: Select all

[SECTION .text]
kstart:
MOV EAX, stack
MOV ESP, EAX
[SECTION .bss]
RESB 4096
stack:
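pietro10's point in C terms: an x86 push first subtracts 4 from ESP and then stores, so ESP has to start just past the highest byte of the reserved area. The sketch below is a hypothetical user-space model (struct sim_stack and its helpers are for illustration only, not kernel code) of RESB 4096 followed by the stack: label. Starting esp at 0 instead of STACK_SIZE would make the very first push, such as the return address stored by call main, fall below the reservation, which the assert catches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 4096u   /* mirrors "RESB 4096" */

/* Hypothetical model of a downward-growing 32-bit stack. */
struct sim_stack {
    uint8_t  mem[STACK_SIZE];   /* the reserved .bss area */
    uint32_t esp;               /* offset into mem, playing the role of ESP */
};

/* ESP starts one past the highest byte, like the label after RESB. */
void stack_init(struct sim_stack *s)
{
    s->esp = STACK_SIZE;
}

/* push: decrement first, then store, so the first push fills mem[4092..4095]. */
void stack_push(struct sim_stack *s, uint32_t value)
{
    assert(s->esp >= 4u);   /* would otherwise write below the reserved area */
    s->esp -= 4u;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* pop (and ret): load, then increment. */
uint32_t stack_pop(struct sim_stack *s)
{
    uint32_t value;
    assert(s->esp + 4u <= STACK_SIZE);   /* popping past the top is a bug too */
    memcpy(&value, &s->mem[s->esp], sizeof value);
    s->esp += 4u;
    return value;
}
```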
PGOS. It's my hand-written OS. Deal with it.
KrnlHckr

Post by KrnlHckr

Here is my link.ld file:

Code: Select all

OUTPUT_FORMAT("elf32-i386")
ENTRY(start)
phys = 0x00100000;
SECTIONS
{
        .text phys : AT(phys)
        {
                code = .;
                *(.text)
                . = ALIGN(4096);
        }
        .data : AT(phys + (data - code))
        {
                data = .;
                *(.data)
                *(.rodata*)
                . = ALIGN(4096);
        }
        .bss : AT(phys + (bss - code))
        {
                bss = .;
                *(.bss)
                *(COMMON)
                . = ALIGN(4096);
        }
        end = .;
}
The behavior that I'm seeing is so random. If I run the code with the for ( ; ; ) loop included, it never loops: triple-fault. If I use a return, it triple-faults. If I return and put a hlt between the call main and the jmp $, triple-fault.

If I use the DEBUG block to test, sometimes it catches the exception. Sometimes it triple-faults. Rinse, repeat. It's like Bochs is on crack.

But a for loop should loop, and it is not doing so...
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Post by JamesM

Hi,

Firstly, where is _sys_stack declared? How is it defined?

Secondly, what does bochs output when the triple fault occurs? The initial register dump along with the fault code will help us immensely.

Cheers,

James
KrnlHckr

Post by KrnlHckr

JamesM wrote:The initial register dump along with the fault code will help us immensely.
Sure thing!

Code: Select all

00041967871i[CPU0 ] CPU is in protected mode (active)
00041967871i[CPU0 ] CS.d_b = 32 bit
00041967871i[CPU0 ] SS.d_b = 32 bit
00041967871i[CPU0 ] EFER   = 0x00000000
00041967871i[CPU0 ] | RAX=0000000000000000  RBX=000000000000000a
00041967871i[CPU0 ] | RCX=0000000041200f69  RDX=0000000000000000
00041967871i[CPU0 ] | RSP=0000000000102ff8  RBP=0000000000067ebc
00041967871i[CPU0 ] | RSI=00000000001000f3  RDI=00000000000b8000
00041967871i[CPU0 ] |  R8=0000000000000000   R9=0000000000000000
00041967871i[CPU0 ] | R10=0000000000000000  R11=0000000000000000
00041967871i[CPU0 ] | R12=0000000000000000  R13=0000000000000000
00041967871i[CPU0 ] | R14=0000000000000000  R15=0000000000000000
00041967871i[CPU0 ] | IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf af PF cf
00041967871i[CPU0 ] | SEG selector     base    limit G D
00041967871i[CPU0 ] | SEG sltr(index|ti|rpl)     base    limit G D
00041967871i[CPU0 ] |  CS:0008( 0001| 0|  0) 00000000 000fffff 1 1
00041967871i[CPU0 ] |  DS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00041967871i[CPU0 ] |  SS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00041967871i[CPU0 ] |  ES:0010( 0002| 0|  0) 00000000 000fffff 1 1
00041967871i[CPU0 ] |  FS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00041967871i[CPU0 ] |  GS:0010( 0002| 0|  0) 00000000 000fffff 1 1
00041967871i[CPU0 ] |  MSR_FS_BASE:0000000000000000
00041967871i[CPU0 ] |  MSR_GS_BASE:0000000000000000
00041967871i[CPU0 ] | RIP=000000000010006e (000000000010006e)
00041967871i[CPU0 ] | CR0=0x00000011 CR1=0x0 CR2=0x0000000000000000
00041967871i[CPU0 ] | CR3=0x00000000 CR4=0x00000000
00041967871i[CPU0 ] >> add edi, eax : 01C7
00041967871e[CPU0 ] exception(): 3rd (13) exception with no resolution, shutdown status is 00h, resetting
00041967871i[SYS  ] bx_pc_system_c::Reset(SOFTWARE) called
00041967871i[CPU0 ] cpu software reset
The loader.asm code is attached. _sys_stack is at the very end, after the 4k reservation in the bss.

This is what I mean about random:

Code: Select all

00031641887i[CPU0 ] >> ret  : C3
00031641887p[CPU0 ] >>PANIC<< exception(): 3rd (13) exception with no resolution

Code: Select all

00022963667i[CPU0 ] >> mov eax, dword ptr ds:[esi] : 8B06
00022963667p[CPU0 ] >>PANIC<< exception(): 3rd (13) exception with no resolution

Code: Select all

00022414411i[CPU0 ] >> mov eax, dword ptr ds:[esi] : 8B06
00022414411p[CPU0 ] >>PANIC<< exception(): 3rd (13) exception with no resolution
Every run seems to choke in a different location, even though I haven't changed anything between runs.
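For reading these panics: as far as I can tell, the number in parentheses in Bochs's "3rd (13) exception with no resolution" is the vector of the exception that could not be delivered, and vector 13 is a general protection fault (#GP). A small lookup table for the CPU-defined vectors 0-19, following the exception list in Intel's manuals, is also handy for ISR panic messages. exception_name is a hypothetical helper, sketched here:

```c
#include <assert.h>
#include <stddef.h>

/* Names of the CPU-defined exception vectors 0-19 (Intel SDM Vol. 3,
 * exception and interrupt reference). 20-31 are reserved or newer. */
static const char *const exception_names[] = {
    "Divide Error (#DE)",             /* 0  */
    "Debug (#DB)",                    /* 1  */
    "Non-Maskable Interrupt",         /* 2  */
    "Breakpoint (#BP)",               /* 3  */
    "Overflow (#OF)",                 /* 4  */
    "BOUND Range Exceeded (#BR)",     /* 5  */
    "Invalid Opcode (#UD)",           /* 6  */
    "Device Not Available (#NM)",     /* 7  */
    "Double Fault (#DF)",             /* 8  */
    "Coprocessor Segment Overrun",    /* 9  */
    "Invalid TSS (#TS)",              /* 10 */
    "Segment Not Present (#NP)",      /* 11 */
    "Stack-Segment Fault (#SS)",      /* 12 */
    "General Protection (#GP)",       /* 13 */
    "Page Fault (#PF)",               /* 14 */
    "Reserved",                       /* 15 */
    "x87 Floating-Point Error (#MF)", /* 16 */
    "Alignment Check (#AC)",          /* 17 */
    "Machine Check (#MC)",            /* 18 */
    "SIMD Floating-Point (#XM)",      /* 19 */
};

static_assert(sizeof exception_names / sizeof exception_names[0] == 20,
              "one name per vector 0-19");

/* Hypothetical helper: map a vector number to a printable name. */
const char *exception_name(unsigned int vector)
{
    size_t count = sizeof exception_names / sizeof exception_names[0];
    return vector < count ? exception_names[vector] : "Unknown/Reserved";
}
```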

In bochs.png, you can see a "complete" run. The white-on-blue text is from the C code, while the white-on-black text is from the assembly code. In bochs2.png, you see a subsequent run where there is no visual evidence that the code ran at all.

FWIW:

Code: Select all

$ gcc34 --version
gcc34 (GCC) 3.4.6 20060404 (Red Hat 3.4.6-4)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ld --version
GNU ld version 2.17.50.0.6-2.el5 20061020
Copyright 2005 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License.  This program has absolutely no warranty.

$ nasm -v
NASM version 0.98.39 compiled on Jul 12 2006

$ bochs -h
00000000000i[APIC?] local apic in  initializing
=====================================================
                       Bochs x86 Emulator 2.3.6
             Build from CVS snapshot, on December 24, 2007
=====================================================

$ make clean
rm -f *.o kernel.bin
$ make
nasm -f elf -o loader.o loader.asm
gcc34 -Wall -O -fstrength-reduce -fomit-frame-pointer -finline-functions -nostdinc -fno-builtin  -I include  -c -o gdt.o gdt.c
gdt.c: In function `gdt_install':
gdt.c:27: warning: assignment makes integer from pointer without a cast
gcc34 -Wall -O -fstrength-reduce -fomit-frame-pointer -finline-functions -nostdinc -fno-builtin  -I include  -c -o idt.o idt.c
idt.c: In function `idt_install':
idt.c:27: warning: passing arg 1 of `memset' from incompatible pointer type
gcc34 -Wall -O -fstrength-reduce -fomit-frame-pointer -finline-functions -nostdinc -fno-builtin  -I include  -c -o isrs.o isrs.c
gcc34 -Wall -O -fstrength-reduce -fomit-frame-pointer -finline-functions -nostdinc -fno-builtin  -I include  -c -o string.o string.c
gcc34 -Wall -O -fstrength-reduce -fomit-frame-pointer -finline-functions -nostdinc -fno-builtin  -I include  -c -o system.o system.c
gcc34 -Wall -O -fstrength-reduce -fomit-frame-pointer -finline-functions -nostdinc -fno-builtin  -I include  -c -o screen.o screen.c
screen.c: In function `scroll':
screen.c:24: warning: passing arg 2 of `memcpy' from incompatible pointer type
gcc34 -Wall -O -fstrength-reduce -fomit-frame-pointer -finline-functions -nostdinc -fno-builtin  -I include  -c -o kernel.o kernel.c
kernel.c: In function `main':
kernel.c:44: warning: control reaches end of non-void function
ld -T link.ld -o kernel.bin loader.o gdt.o idt.o isrs.o string.o system.o screen.o kernel.o
$ 
The warning about main and non-void, I know, is due to my not including a return at the end of main(), which is declared to return int. The warnings about pointer types, etc., are (I think) due to prototyping and data type assumptions in parameters.
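On the pointer warnings: the source of gdt.c:27, idt.c:27 and screen.c:24 isn't shown, so this assumes the usual pattern of storing &gdt in a 32-bit base field and passing table or video-memory pointers to a memset/memcpy declared with unsigned char pointers. Declaring the helpers with void pointers, as the C library does, and making the pointer-to-integer conversion an explicit cast silences those warnings without hiding real type errors. A sketch with hypothetical names (kmemset, kmemcpy, struct table_ptr, set_table_ptr):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* void* parameters accept any object pointer without a cast or warning,
 * e.g. kmemset(&idt, 0, sizeof idt) or kmemcpy(vidmem, vidmem + 80, n). */
void *kmemset(void *dest, int val, size_t count)
{
    unsigned char *d = dest;
    while (count--)
        *d++ = (unsigned char)val;
    return dest;
}

void *kmemcpy(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (count--)
        *d++ = *s++;
    return dest;
}

/* The 6-byte operand of lgdt/lidt: 16-bit limit, 32-bit base. */
struct table_ptr {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

void set_table_ptr(struct table_ptr *tp, const void *table, size_t bytes)
{
    assert(bytes > 0 && bytes <= 0x10000);   /* limit is bytes - 1 in 16 bits */
    tp->limit = (uint16_t)(bytes - 1);
    /* Explicit pointer-to-integer cast; exact on i386, truncating on 64-bit hosts. */
    tp->base = (uint32_t)(uintptr_t)table;
}
```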

Regards!
-Sean
Attachments
bochs.png (14.62 KiB): The C code and ASM code were all called before triple-fault.
bochs2.png (16.56 KiB): Did anything run at all?
kernel.tar.gz (6.54 KiB): Untars into <pwd>/kernel/

I use a loopback device for fd0 and install kernel.bin onto it. GRUB has already been installed on the floppy image with the usual commands.
KrnlHckr

Just plain screwy...

Post by KrnlHckr

I've been playing around with this now, reworking sections, to try and learn what is causing my problem. I must have a brainpan loaded with bricks.

Code: Select all

00021755432i[CPU  ] >> jmp .+0xfffffffe (0x00100034) : EBFE
00021755432p[CPU  ] >>PANIC<< exception(): 3rd (13) exception with no resolution
This clearly shows that the 'jmp' instruction triple-faults the processor. I checked the code to see where this is, and it's my ASM code:

Code: Select all

start:
        mov     esp, _sys_stack ; points stack to new stack area
        call    main            ; calls the C kernel
        mov     esi, msg
        call    sprint
loop1:
        jmp     loop1           ; just in case kernel returns
Using 'objdump', I can see that this chunk of code is oh-so-straightforward:

Code: Select all

00100020 <start>:
  100020:       bc 00 30 10 00          mov    $0x103000,%esp
  100025:       e8 26 0a 00 00          call   100a50 <main>
  10002a:       be 07 02 10 00          mov    $0x100207,%esi
  10002f:       e8 07 00 00 00          call   10003b <sprint>

00100034 <loop1>:
  100034:       eb fe                   jmp    100034 <loop1>
How can a simple 'jmp' cause this problem? Why is a loop not looping?
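Decoding the bytes by hand agrees with objdump: for a relative JMP or CALL, the CPU adds the sign-extended displacement to the address of the next instruction. 'eb fe' is a 2-byte short jump with displacement 0xfe (-2), so it targets its own address, and the two 'e8' calls land on main (0x100a50) and sprint (0x10003b) as listed. A sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* target = address of the next instruction + sign-extended displacement,
 * with 32-bit wraparound, as the CPU computes it. */
uint32_t branch_target(uint32_t insn_addr, uint32_t insn_len, int32_t disp)
{
    assert(insn_len >= 2u && insn_len <= 15u);   /* x86 instructions are at most 15 bytes */
    return insn_addr + insn_len + (uint32_t)disp;
}

/* EB xx: short JMP, 8-bit signed displacement, 2 bytes long. */
uint32_t jmp_rel8_target(uint32_t insn_addr, uint8_t disp8)
{
    return branch_target(insn_addr, 2u, (int8_t)disp8);
}

/* E8 xx xx xx xx: near CALL, little-endian 32-bit displacement, 5 bytes long. */
uint32_t call_rel32_target(uint32_t insn_addr, const uint8_t disp_bytes[4])
{
    uint32_t raw = (uint32_t)disp_bytes[0]
                 | (uint32_t)disp_bytes[1] << 8
                 | (uint32_t)disp_bytes[2] << 16
                 | (uint32_t)disp_bytes[3] << 24;
    return branch_target(insn_addr, 5u, (int32_t)raw);
}
```

Since the jump can only ever target itself, a fault reported at 0x100034 more likely comes from something arriving asynchronously while the loop spins than from the jmp instruction.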
"If your code won't run, verify that you are, indeed, using the STABLE branches of your toolchain!" -- KrnlHckr, 2007 :oops:
robos
Member
Posts: 33
Joined: Sun Apr 06, 2008 7:04 pm
Location: Southern California

Post by robos

Are interrupts disabled?

And on your 640k quote in your signature... that's an urban myth. There's no proof he ever said that (try searching on 640k myth)
- Rob
KrnlHckr

Post by KrnlHckr

robos wrote:Are interrupts disabled?
I disable interrupts to set up the GDT and LDT and the ISRs. Afterwards I re-enable them. In the ISR asm code, each block starts out with 'cli', but I don't see 'sti' called anywhere.
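If interrupts are back on, the PIT timer raises IRQ0 about 18.2 times a second at its BIOS default rate. The build log lists no IRQ or PIC source file, so assuming the 8259 PICs are still at the BIOS mapping, IRQ0 arrives on vector 8 (the double-fault vector) and every tick goes through the IDT. A tick landing on an arbitrary instruction would fit the different crash location on every run. A sketch of the mapping, with a hypothetical irq_to_vector helper; many kernels remap the master PIC to 0x20 and the slave to 0x28 so IRQs stay clear of the CPU exception vectors 0-31:

```c
#include <assert.h>
#include <stdint.h>

/* Vector bases the BIOS leaves in the two cascaded 8259A PICs. */
#define BIOS_MASTER_BASE 0x08u   /* IRQ0-7  -> vectors 0x08-0x0F */
#define BIOS_SLAVE_BASE  0x70u   /* IRQ8-15 -> vectors 0x70-0x77 */

/* Which IDT vector an IRQ line is delivered on, given the PIC bases. */
uint8_t irq_to_vector(unsigned int irq, uint8_t master_base, uint8_t slave_base)
{
    assert(irq < 16u);
    return (uint8_t)(irq < 8u ? master_base + irq : slave_base + (irq - 8u));
}
```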
robos wrote:And on your 640k quote in your signature... that's an urban myth. There's no proof he ever said that (try searching on 640k myth)
He can sue me for libel if he's torqued about it. :)
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Post by Brendan

Hi,
KrnlHckr wrote:
robos wrote:Are interrupts disabled?
I disable interrupts to set up the GDT and LDT and the ISRs. Afterwards I re-enable them. In the ISR asm code, each block starts out with 'cli', but I don't see 'sti' called anywhere.
Then interrupts are probably enabled, and the general protection fault/triple fault is probably caused by an unhandled IRQ.

When the CPU starts an interrupt it pushes CS, EIP and EFLAGS onto the stack, and when you do IRET it pops CS, EIP and EFLAGS off the stack, *including* the previous state of the IF flag in EFLAGS.
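Brendan's point can be checked against the register dump posted earlier. Bochs prints each EFLAGS bit as a mnemonic, upper case when set and lower case when clear, so 'RF', 'IF' and 'PF' being upper case means EFLAGS was 0x00010206 and interrupts were enabled. The sketch below (hypothetical helpers, illustration only) rebuilds that value and models the save/restore Brendan describes: delivery through an interrupt gate clears IF, but the EFLAGS copy pushed on the stack keeps the old IF, and IRET restores it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EFLAGS_IF (1u << 9)   /* interrupt enable flag */

/* Rebuild EFLAGS from a Bochs flag line such as
 * "IOPL=0 id vip vif ac vm RF nt of df IF tf sf zf af PF cf". */
uint32_t eflags_from_bochs(const char *flags)
{
    static const struct { const char *name; unsigned bit; } map[] = {
        {"CF", 0},  {"PF", 2},  {"AF", 4},  {"ZF", 6},   {"SF", 7},
        {"TF", 8},  {"IF", 9},  {"DF", 10}, {"OF", 11},  {"NT", 14},
        {"RF", 16}, {"VM", 17}, {"AC", 18}, {"VIF", 19}, {"VIP", 20},
        {"ID", 21},
    };
    uint32_t value = 1u << 1;   /* bit 1 is reserved and always reads as 1 */
    const char *p;

    assert(flags != NULL);
    p = flags + strspn(flags, " ");
    while (*p != '\0') {
        size_t len = strcspn(p, " ");   /* length of the current token */
        for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
            /* case-sensitive exact match: only upper-case (set) flags hit */
            if (strlen(map[i].name) == len && strncmp(p, map[i].name, len) == 0)
                value |= 1u << map[i].bit;
        }
        p += len;
        p += strspn(p, " ");
    }
    return value;   /* IOPL is ignored (it was 0 in the dump) */
}

/* Same-privilege interrupt frame, lowest address first: the CPU pushes
 * EFLAGS, then CS, then EIP (error codes are not modelled). */
struct int_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
};

/* Interrupt-gate delivery: save EFLAGS, then clear IF (TF, NT, RF and VM
 * are cleared too; not modelled). Returns the EFLAGS the handler runs with. */
uint32_t enter_interrupt_gate(uint32_t eflags, uint32_t cs, uint32_t eip,
                              struct int_frame *frame)
{
    frame->eflags = eflags;
    frame->cs = cs;
    frame->eip = eip;
    return eflags & ~EFLAGS_IF;
}

/* IRET reloads EFLAGS from the frame, so the interrupted code's IF comes
 * back regardless of any cli inside the handler. */
uint32_t iret_eflags(const struct int_frame *frame)
{
    return frame->eflags;
}
```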


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
KrnlHckr

Post by KrnlHckr

Brendan wrote:Then interrupts are probably enabled, and the general protection fault/triple fault is probably caused by an unhandled IRQ.

When the CPU starts an interrupt it pushes CS, EIP and EFLAGS onto the stack, and when you do IRET it pops CS, EIP and EFLAGS off the stack, *including* the previous state of the IF flag in EFLAGS.
This certainly looks to be the case. I've stepped through the executable with bochs and the triple-fault occurs at EIP=5b100051. When I run 'objdump', I can see:

Code: Select all

00100051 <isr0>:
  100051:       fa                      cli    
  100052:       6a 00                   push   $0x0
  100054:       6a 00                   push   $0x0
  100056:       e9 2a 01 00 00          jmp    100185 <isr_common_stub>
What this suggests to me is that the call to my isr0 handler is causing the fault. My code in kernel.c is deliberately attempting a division by zero, and that -does- seem to be working, but it faults at the start of the isr0 code.

It's also confusing me why the EIP shows '5b100051' when objdump shows '00100051'. The machine still seems to be able to locate the code in question since the div by 0 causes the jump to isr0 code.

edit: but as I continue the stepping:

Code: Select all

bx_dbg_read_linear: physical memory read error (phy=0x5b100051, lin=0x5b100051
Makes sense since isr0 -is- at 0x00100051. I'm guessing that I've mapped the IDT incorrectly.

Code: Select all

gdtr:base=0x00104006, limit=0x17
idtr:base=0x00104040, limit=0x6ff
:?
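The limits in gdtr/idtr make a quick sanity check: a limit is the table size in bytes minus one, and every protected-mode descriptor or gate is 8 bytes. Assuming the limits were computed as sizeof(entry) * count - 1 (a common pattern), gdtr's 0x17 is 3 * 8 - 1 (null, code and data descriptors), but idtr's 0x6ff is 256 * 7 - 1; a 256-gate IDT should have a limit of 0x7ff. A sketch of the arithmetic with a hypothetical table_limit helper:

```c
#include <assert.h>
#include <stdint.h>

/* GDTR/IDTR limit: table size in bytes minus one, stored in 16 bits. */
uint16_t table_limit(unsigned int entries, unsigned int entry_size)
{
    assert(entries > 0u && entry_size > 0u);
    assert(entries * entry_size <= 0x10000u);   /* must fit the 16-bit limit */
    return (uint16_t)(entries * entry_size - 1u);
}
```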
KrnlHckr

Post by KrnlHckr

KrnlHckr wrote:

Code: Select all

bx_dbg_read_linear: physical memory read error (phy=0x5b100051, lin=0x5b100051
Makes sense since isr0 -is- at 0x00100051. I'm guessing that I've mapped the IDT incorrectly.
GOT IT!

Since the IDT tells the CPU where (segment selector and offset) the ISR code can be found, I looked at that. Inside my idt.h file was the definition of the IDT entries. I had my base_hi declared as an unsigned char and NOT an unsigned short.
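This also explains the odd EIP. The sketch below (struct layout modelled on the post, helper names hypothetical, little-endian host assumed, GCC's packed attribute as in the thread's toolchain) packs the buggy 7-byte entries and then reads entry 0 the way the CPU does, 8 bytes at a time. Assuming isr1 immediately follows the 10-byte isr0 stub shown in the objdump, it sits at 0x10005b, so the CPU's high offset word for vector 0 is 0x10 from entry 0 followed by 0x5b from entry 1, giving exactly 0x5b100051. The corrected struct with a 16-bit base_hi reads back properly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit interrupt gate as the CPU reads it: exactly 8 bytes. */
struct idt_entry {
    uint16_t base_lo;   /* handler offset, bits 0-15 */
    uint16_t sel;       /* code segment selector, e.g. 0x08 */
    uint8_t  always0;
    uint8_t  flags;     /* 0x8E = present, ring 0, 32-bit interrupt gate */
    uint16_t base_hi;   /* handler offset, bits 16-31: needs 16 bits */
} __attribute__((packed));

static_assert(sizeof(struct idt_entry) == 8, "an IDT gate is 8 bytes");

/* The buggy variant: base_hi as an unsigned char makes each entry 7 bytes. */
struct idt_entry_bad {
    uint16_t base_lo;
    uint16_t sel;
    uint8_t  always0;
    uint8_t  flags;
    uint8_t  base_hi;
} __attribute__((packed));

void set_gate(struct idt_entry *e, uint32_t base, uint16_t sel, uint8_t flags)
{
    e->base_lo = (uint16_t)(base & 0xFFFFu);
    e->base_hi = (uint16_t)(base >> 16);
    e->sel = sel;
    e->always0 = 0;
    e->flags = flags;
}

void set_gate_bad(struct idt_entry_bad *e, uint32_t base, uint16_t sel, uint8_t flags)
{
    e->base_lo = (uint16_t)(base & 0xFFFFu);
    e->base_hi = (uint8_t)(base >> 16);   /* 0x10 still fits; the layout is what breaks */
    e->sel = sel;
    e->always0 = 0;
    e->flags = flags;
}

/* Fetch the handler offset for a vector the way the CPU does:
 * 8 bytes at table + vector * 8. */
uint32_t cpu_gate_offset(const uint8_t *table, unsigned int vector)
{
    struct idt_entry gate;
    memcpy(&gate, table + vector * 8u, sizeof gate);
    return (uint32_t)gate.base_hi << 16 | gate.base_lo;
}
```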

DOH!!!

How the hell this code worked on my old laptop is beyond me, but I'm so happy that I got that bug worked out. Thanks everyone!

-sean