(Fixed) Array Triple Fault

Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Undefined Array Triple Fault

Post by Octacone »

simeonz wrote:
Octacone wrote:Is this a joke!?

This actually doesn't crash:

Code: Select all

command_t command_list[10000]; //but when set to 10 does
In this manner, if you access the first element, you are accessing some location 1MB below the stack. This will jump over the unmapped memory gap and into some other memory, which in your case appears to be mapped. It is still buggy, but doesn't trigger crash immediately. If you enable "-fstack-check", it should crash. The option tells the compiler to probe all the allocated stack pages, whenever the stack frame becomes too big. (This switch is useful to prevent user mode exploits, where a function with very big frame can be used to jump over the guard page of the stack into heap territory.)
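To make the "jump over the gap" idea concrete, here is a minimal host-side sketch of the arithmetic. All numbers and names are hypothetical (4 KiB pages, a downward-growing stack, a single guard page); it only illustrates why a large frame can land below a guard page and which pages a probing prologue would have to touch, and it is not how GCC implements -fstack-check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical numbers throughout: 4 KiB pages, a stack that grows downward.
static const std::uint32_t kStackPageSize = 4096;

// Lowest address occupied by a frame of frame_bytes allocated just below sp.
inline std::uint32_t frame_base(std::uint32_t sp, std::uint32_t frame_bytes) {
    assert(frame_bytes <= sp);  // keep the subtraction from wrapping around
    return sp - frame_bytes;
}

// True when the frame starts below the guard page, so an access to its first
// element never touches the guard page itself.
inline bool jumps_over_guard(std::uint32_t sp, std::uint32_t frame_bytes,
                             std::uint32_t guard_page) {
    return frame_base(sp, frame_bytes) < guard_page;
}

// Pages a probing prologue would touch: from the page holding sp down to the
// page holding the lowest byte of the frame.
inline std::vector<std::uint32_t> probed_pages(std::uint32_t sp,
                                               std::uint32_t frame_bytes) {
    std::vector<std::uint32_t> pages;
    const std::uint32_t lowest =
        frame_base(sp, frame_bytes) & ~(kStackPageSize - 1);
    for (std::uint32_t page = sp & ~(kStackPageSize - 1);; page -= kStackPageSize) {
        pages.push_back(page);
        if (page == lowest) break;
    }
    return pages;
}
```

With a stack pointer of 0x0011CFE8 and a 40000-byte frame, the frame base ends up ten pages lower, so a single guard page in between would never be touched by an access to the first element.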

The "-fstack-usage" output from the previous discussion is somewhat suspicious, because even 10 pointers should be 80 bytes. Is this array statically or stack allocated?
I enabled -fstack-check and it didn't crash. Really suspicious.

@LtG
What exactly should I look for? I managed to disassemble all the code and dump the .rodata and .text sections.
OS: Basic OS
About: 32 Bit Monolithic Kernel Written in C++ and Assembly, Custom FAT 32 Bootloader
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Undefined Array Triple Fault

Post by LtG »

Post your linker.ld and the disassembled code that is responsible for the triple-fault. Or in the case of 10 vs 10k check for the difference in the disassembly and post that too. You could also post the relevant pieces of readelf or if it's not too big maybe "readelf -a" to post all of it.

Obviously, you are using a proper cross-compiler, right?
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Undefined Array Triple Fault

Post by Octacone »

LtG wrote:Post your linker.ld and the disassembled code that is responsible for the triple-fault. Or in the case of 10 vs 10k check for the difference in the disassembly and post that too. You could also post the relevant pieces of readelf or if it's not too big maybe "readelf -a" to post all of it.

Obviously, you are using a proper cross-compiler, right?
Sure, I am using a proper cross-compiler.

Here is my Linker.ld:

Code: Select all

ENTRY(Bootloader_Main)
OUTPUT_FORMAT(elf32-i386)

SECTIONS
{
	. = 1M;
	kernel_start = .; _kernel_start = .; __kernel_start = .;
	.text BLOCK(4K) : ALIGN(4K)
	{
		*(.multiboot)
		*(.text)
		*(.rodata)
	}
	.data BLOCK(4K) : ALIGN(4K)
	{
		start_constructors  = .;
		*(.ctor*)
		KEEP(*(.init_array));
		KEEP(*(SORT_BY_INIT_PRIORITY(.init_array.*)));
		end_constructors = .;
		*(.data)
	}
	.bss BLOCK(4K) : ALIGN(4K)
	{
		*(COMMON)
		*(.bss)
	}
	kernel_end = .; _kernel_end = .; __kernel_end = .;
	/DISCARD/ : 
	{ 
		*(.fini_array*)
		*(.comment)
	}
}
I don't know about posting 500000 lines of code in here. That is kind of a lot.
Since the command_list variable is causing this, maybe I should post that instead?

Code: Select all

 <3><1723>: Abbrev Number: 7 (DW_TAG_member)
    <1724>   DW_AT_name        : (indirect string, offset: 0xbb2): command_list
    <1728>   DW_AT_decl_file   : 21
    <1729>   DW_AT_decl_line   : 23
    <172a>   DW_AT_type        : <0x1b9d>
    <172e>   DW_AT_data_member_location: 0
 <3><8f28>: Abbrev Number: 7 (DW_TAG_member)
    <8f29>   DW_AT_name        : (indirect string, offset: 0xbb2): command_list
    <8f2d>   DW_AT_decl_file   : 7
    <8f2e>   DW_AT_decl_line   : 23
    <8f2f>   DW_AT_type        : <0x93c9>
    <8f33>   DW_AT_data_member_location: 0
 <3><d1c8>: Abbrev Number: 7 (DW_TAG_member)
    <d1c9>   DW_AT_name        : (indirect string, offset: 0xbb2): command_list
    <d1cd>   DW_AT_decl_file   : 8
    <d1ce>   DW_AT_decl_line   : 23
    <d1cf>   DW_AT_type        : <0xd3c5>
    <d1d3>   DW_AT_data_member_location: 0
This is from readelf -w -s, I don't quite remember exactly.
readelf reports no read-only data, what?
What is the exact command you want me to run?
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Undefined Array Triple Fault

Post by LtG »

Readelf -a produces 500k lines? or 500?

I was mainly interested in the section and program headers (-a prints them among the first things). And I was interested to see the difference between the triple-fault 10 vs non-triple-fault 10k.

Also I'm a bit unclear as to when the triple-fault occurs, how far into your code can you let it progress before you get the triple-fault?

I think you still didn't mention whether the array is global or local to some function. Since GRUB is loading your kernel, how does your entry point work? Do you have some assembly first which then calls some C function? If so, then certainly you can breakpoint before the C function, right?

Since you got gdb working, what exactly causes the triple-fault, and what is the state of your IDT just prior to it? (Check IDTR -> relevant page tables, and of course double-check that CR3 has what you think it should have.)
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Undefined Array Triple Fault

Post by Octacone »

LtG wrote:Readelf -a produces 500k lines? or 500?

I was mainly interested in the section and program headers (-a prints them among the first things). And I was interested to see the difference between the triple-fault 10 vs non-triple-fault 10k.

Also I'm a bit unclear as to when the triple-fault occurs, how far into your code can you let it progress before you get the triple-fault?

I think you still didn't mention whether the array is global or local to some function. Since GRUB is loading your kernel, how does your entry point work? Do you have some assembly first which then calls some C function? If so, then certainly you can breakpoint before the C function, right?

Since you got gdb working, what exactly causes the triple-fault, and what is the state of your IDT just prior to it? (Check IDTR -> relevant page tables, and of course double-check that CR3 has what you think it should have.)
Oh, here they are: (symbol tables not shown, too big, if you need them, just say)
10:
https://pastebin.com/x7YBkQK0
10000:
https://pastebin.com/cgrxbiyh

It is local to Shell_Class, private.
Yes I do have some assembly code before calling the main C code. I can break it wherever I want, not a problem.
I can go as far as the moment before calling Shell.Initialize(); then it crashes and reboots.
The thing is, no shell code -> everything works perfectly fine, paging works like a charm.
I will try to acquire some additional IDT info and those other things.
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Undefined Array Triple Fault

Post by LtG »

I don't have much time now, I'll take a look at those later..

You can breakpoint just prior to Shell.Initialize(), then check CR3 (and relevant paging structures) as well as IDTR (and where that virtual address points, to see that your IDT is in fact valid), and then single step until the triple-fault.

I would probably start with the break-point in gdb set to Shell.Initialize(), single step until triple-fault so I'd know exactly what assembly instruction causes the triple-fault, and then do the same again but stop just prior to that last instruction and check the CR3, IDTR and paging structures. At this point they have to be wrong or it's that last instruction that makes them wrong. And from there it should be relatively simple to back trace to find the code that does the wrong thing.

You'll want to instruct gdb to always print the next assembly instruction while single stepping, you can look it up, can't remember exactly what it was. I think it was "display/i $pc" or something similar. If you want, you can copy/paste your gdb debugging session here, the relevant thing is to show contents of registers (especially CR3, IDTR, GDTR and segment registers) as well as the contents for IDT and the relevant paging structures.
Ch4ozz
Member
Posts: 170
Joined: Mon Jul 18, 2016 2:46 pm
Libera.chat IRC: esi

Re: Undefined Array Triple Fault

Post by Ch4ozz »

I don't know how often I have already posted this, but:
1. Get your system to crash
2. Take the qemu.log and look at the addresses (mainly CR3)
3. Open your binary in any disassembler (I'd suggest IDA because it's great)
4. Go to the address and you will instantly see why this crash happens, even with a minimum of assembly knowledge (aka common sense)
dozniak
Member
Posts: 723
Joined: Thu Jul 12, 2012 7:29 am
Location: Tallinn, Estonia

Re: Undefined Array Triple Fault

Post by dozniak »

So, you have 10 items in the array, how do you access it?

Do you iterate from 0 to 9, or maybe from 0 to 10?
Learn to read.
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Undefined Array Triple Fault

Post by Octacone »

LtG wrote:I don't have much time now, I'll take a look at those later..

You can breakpoint just prior to Shell.Initialize(), then check CR3 (and relevant paging structures) as well as IDTR (and where that virtual address points, to see that your IDT is in fact valid), and then single step until the triple-fault.

I would probably start with the break-point in gdb set to Shell.Initialize(), single step until triple-fault so I'd know exactly what assembly instruction causes the triple-fault, and then do the same again but stop just prior to that last instruction and check the CR3, IDTR and paging structures. At this point they have to be wrong or it's that last instruction that makes them wrong. And from there it should be relatively simple to back trace to find the code that does the wrong thing.

You'll want to instruct gdb to always print the next assembly instruction while single stepping, you can look it up, can't remember exactly what it was. I think it was "display/i $pc" or something similar. If you want, you can copy/paste your gdb debugging session here, the relevant thing is to show contents of registers (especially CR3, IDTR, GDTR and segment registers) as well as the contents for IDT and the relevant paging structures.
There is a slight issue...
In order for GDB to work I have to disable optimizations, which for some reason breaks paging; it only works with optimizations enabled. So I can't do anything else.
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Undefined Array Triple Fault

Post by Octacone »

Ch4ozz wrote:I don't know how often I have already posted this, but:
1. Get your system to crash
2. Take the qemu.log and look at the addresses (mainly CR3)
3. Open your binary in any disassembler (I'd suggest IDA because it's great)
4. Go to the address and you will instantly see why this crash happens, even with a minimum of assembly knowledge (aka common sense)
Here it is:
With -O2 enabled, faulty shell line also enabled

Code: Select all

Triple fault
CPU Reset (CPU 0)
EAX=00000019 EBX=000003e8 ECX=0000096c EDX=000003d5
ESI=00000000 EDI=00000000 EBP=00000000 ESP=0011cfe8
EIP=00100b81 EFL=00200206 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     0011f5c6 00000017
IDT=     0011edc0 000007ff
CR0=80000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000004 CCD=0011cfe8 CCO=EFLAGS  
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Can't install IDA, still looking into that.
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Undefined Array Triple Fault

Post by Octacone »

dozniak wrote:So, you have 10 items in the array, how do you access it?

Do you iterate from 0 to 9, or maybe from 0 to 10?
Never had a chance to try. But anyways (command_t* command_list[10]) -> command_list[1].something = something -> fault.
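A small sketch of the difference between an array of pointers and an array of objects, since the declaration as written here is an array of pointers. The command_t layout and the struct names below are hypothetical stand-ins, not the real shell code. The point is only that `command_t* command_list[10]` reserves ten pointers and no command_t objects, so each slot has to point at something before it is dereferenced.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real command_t; the field is an assumption.
struct command_t {
    const char* name;
};

struct shell_with_pointers {
    command_t* command_list[10];  // ten pointers, no command_t objects behind them yet
};

struct shell_with_objects {
    command_t command_list[10];   // ten command_t objects stored inline
};

// Counts how many slots of the pointer array actually point at an object.
inline std::size_t usable_slots(const shell_with_pointers& shell) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        if (shell.command_list[i] != nullptr) ++count;
    }
    return count;
}
```

With the pointer version, `command_list[1]->name = ...` is only safe once slot 1 has been pointed at a real object; with the object version, `command_list[1].name = ...` works directly. In a kernel where low memory is unmapped, writing through a zeroed pointer would fault at or near address 0.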
eryjus
Member
Posts: 286
Joined: Fri Oct 21, 2011 9:47 pm
Libera.chat IRC: eryjus
Location: Tustin, CA USA

Re: Undefined Array Triple Fault

Post by eryjus »

A couple of observations here:

* Paging is enabled
* The address of the PD in CR3 is 0x00000000 (confirm this is correct)
* If this is a page fault, the faulting address (CR2) is 0x00000000 (confirm this address is identity mapped)
* ESI, EDI, EBP are all 0x00000000
* EIP is 00100b81, when you disassemble the object, what is happening at that address? (Are ESI, EDI, or EBP used?)
* Null pointer assignment/reference?
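The first two observations can be read straight off the register dump. A minimal sketch of that decoding follows; the helper names are made up for illustration, while the bit positions are the architectural ones for 32-bit x86 (CR4 is 0 in the dump, so PAE is off).

```cpp
#include <cstdint>

// Architectural CR0 bit positions on x86.
static const std::uint32_t kCr0ProtectedMode = 1u << 0;   // PE
static const std::uint32_t kCr0Paging        = 1u << 31;  // PG

inline bool protected_mode_enabled(std::uint32_t cr0) {
    return (cr0 & kCr0ProtectedMode) != 0;
}

inline bool paging_enabled(std::uint32_t cr0) {
    return (cr0 & kCr0Paging) != 0;
}

// With 32-bit (non-PAE) paging, bits 31..12 of CR3 hold the physical base
// address of the page directory; the low bits are flags.
inline std::uint32_t page_directory_base(std::uint32_t cr3) {
    return cr3 & 0xFFFFF000u;
}
```

For the dump above, CR0=80000011 has both PE and PG set, and CR3=00000000 gives a page directory base of physical address 0.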
Adam

The name is fitting: Century Hobby OS -- At this rate, it's gonna take me that long!
Read about my mistakes and missteps with this iteration: Journal

"Sometimes things just don't make sense until you figure them out." -- Phil Stahlheber
Octacone
Member
Posts: 1138
Joined: Fri Aug 07, 2015 6:13 am

Re: Undefined Array Triple Fault

Post by Octacone »

eryjus wrote:A couple of observations here:

* Paging is enabled
* The address of the PD in CR3 is 0x00000000 (confirm this is correct)
* If this is a page fault, the faulting address (CR2) is 0x00000000 (confirm this address is identity mapped)
* ESI, EDI, EBP are all 0x00000000
* EIP is 00100b81, when you disassemble the object, what is happening at that address? (Are ESI, EDI, or EBP used?)
* Null pointer assignment/reference?
I was about to post that. When I checked the address of the PD (page_directory->physical_page_tables) it was 0, like wow!
Why would that be 0? I am for sure allocating it correctly, so PD (page_directory_t* page_directory) returns 0x11E050 (correct, expected), but page_directory->physical_page_tables returns 0x0.
That is soooo suspicious. Also, this data was gathered with optimizations enabled and the shell disabled, which means no triple fault or anything.
I hope that replacing every NULL with 0 inside LibAlloc had nothing to do with it. (Why would I do that? To get around C++ casting errors.)

Disabling the optimizations revealed: //With -O1/2/3/4/5 everything is mapped and working fine

Code: Select all

(0).[7374714438] ??? (physical address not available)
(0).[7374714439] ??? (physical address not available)
bx_dbg_read_linear: physical address not available for linear 0x0000000000100a3b
00100b81 disassembly (shell with -O2)

Code: Select all

Does not exist when disassembled. 
00100a3b disassembly (no -O2 no shell)

Code: Select all

Does not exist when disassembled
Edit: The only way I can get it to work without crashing is with -O2 and shell disabled. Then everything is mapped correctly but page_directory->physical_page_tables still returns 0x0.

Edit 2: -O2 enabled shell enabled -> VMM.Initialize(); commented out, but it is not Paging_Enable that causes it, so it is something inside the initialization chunk that overwrites something.
Even even even stranger:
uint32_t page_directory_address = PMM.Allocate_Blocks(sizeof(page_directory_t) / 4096); ---->>>> this returns 0x11D000
then
page_directory = (page_directory_t*) page_directory_address;
so TUI.Put_Hex((uint32_t) & page_directory, 0x0E); should return 0x11D000 right? No! It returns 0x11E050. Stuff likely getting overwritten. Right?

Edit 3: So the main question is: why would paging work perfectly with -O2 enabled and crash without -O2? Am I hunting a bug that does not exist? Likely a compiler bug?
Also, this is what my page_directory looks like:

Code: Select all

typedef struct page_directory_t
{
	page_directory_entry_t physical_page_tables[1024];
	page_table_t* virtual_page_tables[1024];
}page_directory_t;

page_directory_t* page_directory;
This is how I allocate it:

Code: Select all

uint32_t page_directory_address = PMM.Allocate_Blocks(sizeof(page_directory_t) / 4096);
page_directory = (page_directory_t*) page_directory_address;
Then why would a damn compiler put page_directory->physical_page_tables at 0x0? Also don't worry about virtual_page_tables they are all perfectly allocated (I checked).
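One thing that can be checked arithmetically from the numbers posted so far is whether the allocated region overlaps anything else. The sketch below rests on several assumptions that are not confirmed in the thread: page_directory_entry_t and pointers are 4 bytes each (so sizeof(page_directory_t) is 8192), Allocate_Blocks takes a count of 4096-byte blocks, and it returns a contiguous region starting at the printed 0x11D000. The helper names are hypothetical, and the values come from different builds and runs, so they may not be directly comparable.

```cpp
#include <cstdint>

static const std::uint32_t kBlockSize = 4096;  // assumed PMM block size

// Half-open address range [begin, end).
struct address_range {
    std::uint32_t begin;
    std::uint32_t end;
};

// Range covered by `blocks` contiguous blocks starting at `base`.
inline address_range allocation_range(std::uint32_t base, std::uint32_t blocks) {
    address_range r = { base, base + blocks * kBlockSize };
    return r;
}

// True if `address` lies inside the range.
inline bool contains(const address_range& r, std::uint32_t address) {
    return address >= r.begin && address < r.end;
}
```

Under those assumptions the allocation covers 0x11D000 up to (not including) 0x11F000, which would contain both 0x11E050 (the value printed for &page_directory) and 0x11EDC0 (the IDT base in the QEMU dump). That is not a diagnosis, only an overlap that may be worth ruling out.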
LtG
Member
Posts: 384
Joined: Thu Aug 13, 2015 4:57 pm

Re: Undefined Array Triple Fault

Post by LtG »

Octacone wrote: page_directory = (page_directory_t*) page_directory_address;
so TUI.Put_Hex((uint32_t) & page_directory, 0x0E); should return 0x11D000 right? No! It returns 0x11E050. Stuff likely getting overwritten. Right?
page_directory is a pointer; you would want to print that, not the _address_ of the pointer (the ampersand "&" in the second line).
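A minimal host-side sketch of the distinction, using a simplified stand-in for page_directory_t and hypothetical helper names. It uses std::uintptr_t so it also runs on a 64-bit host; the kernel code casts to uint32_t instead.

```cpp
#include <cstdint>

// Simplified stand-in for the real struct; only the first member matters here.
struct page_directory_t {
    std::uint32_t physical_page_tables[1024];
};

static page_directory_t* page_directory = nullptr;

// The value stored in the pointer: where the page directory itself lives.
inline std::uintptr_t pointer_value() {
    return reinterpret_cast<std::uintptr_t>(page_directory);
}

// The address of the pointer variable: where the 4 (or 8) bytes holding that
// value are stored, typically somewhere in .bss or .data.
inline std::uintptr_t pointer_location() {
    return reinterpret_cast<std::uintptr_t>(&page_directory);
}
```

Printing `(uint32_t) page_directory` corresponds to pointer_value(), while `(uint32_t) &page_directory` corresponds to pointer_location(), and there is no reason for the two to match.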

Also I don't really get the point in your page_directory_t, why are there some virtual page tables? Also, I think customarily the "typedef struct X {..} X_t" stuff has the _t only on the latter "name", not the first one (X vs X_t)..
eryjus
Member
Posts: 286
Joined: Fri Oct 21, 2011 9:47 pm
Libera.chat IRC: eryjus
Location: Tustin, CA USA

Re: Undefined Array Triple Fault

Post by eryjus »

OK, so what do we know? I mean, really know and not assume to be true.

We know that cr3 has the value 0 in it when the system triple faults. I assume that it is written correctly initially (and the triple fault does not happen on the instruction immediately following setting the cr3 register on purpose).

Now, if you are setting cr3 on purpose only once in your code, then you are executing some other thing that is setting it not on purpose. What might that be?

* You are executing data
* You are overwriting code
* You have several pages mapped to the same frame (my bet is on this one right now)
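The third possibility can be checked mechanically by walking a page table and looking for two present entries that point at the same frame. This is a host-side sketch only: it uses std::set, which a freestanding kernel would not have, and the function name is made up. The entry layout is the standard 32-bit non-PAE one (bit 0 is the present flag, bits 31..12 are the frame address).

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

static const std::uint32_t kPresent   = 0x1u;         // bit 0 of a 32-bit PTE
static const std::uint32_t kFrameMask = 0xFFFFF000u;  // bits 31..12: frame address

// Returns true if two present entries in the table point at the same frame.
inline bool has_shared_frame(const std::uint32_t* entries, std::size_t count) {
    std::set<std::uint32_t> seen;
    for (std::size_t i = 0; i < count; ++i) {
        if ((entries[i] & kPresent) == 0) continue;  // skip non-present entries
        const std::uint32_t frame = entries[i] & kFrameMask;
        if (!seen.insert(frame).second) return true; // frame already mapped elsewhere
    }
    return false;
}
```

Inside the kernel the same idea could be run over each page table before enabling paging, for example with a simple nested loop instead of a set.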

In particular, you are going to have a very big uphill battle to convince many people you found a compiler bug. Trust me on this (personal experience), even when you have convinced yourself 47 different ways you have a bug you probably don't. Let's assume there isn't a compiler bug.

So, how do we go about determining the real reason for the failure? Guessing at the cause and commenting out some code is not the optimal way to get to the root cause.

I would recommend you use something like `i686-elf-objdump -d kernel.elf` to compare your registers at crash with the line of assembly in EIP. Then scroll up until you can determine which function that is. Then go and review your C++ function to determine what line it is failing on and in particular which part of that line it is. Then, what are you assuming to be true with that line/state and are those valid assumptions?

I'm not trying to call you out, but I would not assume that your paging code is perfect yet (http://forum.osdev.org/viewtopic.php?f= ... 36&start=0). Remember, this is all still new and you might still have a bug buried deep in this code. My point is that you will want to disassemble what is in memory at the address of EIP at the time of the crash and compare it to what was in the output of *-objdump -- do they match? If not, find out why not. Verify your paging structures. You already indicate that this might be a problem, so it might be worthwhile to follow this line of thinking.