(Fixed) Array Triple Fault

Octacone · Post by **Octacone** » Thu Jul 13, 2017 1:35 am

I moved this thread over here because it is more related.

To reintroduce you:

Declaring command_t command_list[10]; causes a triple fault, while command_t* command_list[10]; or command_t command_list[]; does not.

When paging and/or interrupts are disabled, it goes away.

Also this is my struct:

Code:

typedef struct command_t
{
	char name[32];
	char description[128];
	void (*function_pointer)();
}command_t;

When I remove the array sizes (32 and 128) from the struct members, it goes away.

I've been struggling with this for days without any success.

GDB is no help either: when I do "break Kernel_Main, c, stepi" it fails with: gdb/inline-frame.c:171: internal-error: inline_frame_this_id: Assertion `!frame_id_eq (*this_id, outer_frame_id)' failed. So no debugging is possible. Bochs freezes after PMM initialization and before VMM initialization, with a blinking cursor and no error codes or anything.

LtG · Post by **LtG** » Thu Jul 13, 2017 1:47 am

Both the first and second forms create an array of 10 elements. The first form holds 10 structs; the second holds only pointers, so no structs are allocated in that one.
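To make the size difference concrete, here is a minimal sketch that reuses the command_t struct from the first post. The array names struct_list and pointer_list and the two helper functions are made up for this comparison; they are not from the original kernel.

```c
#include <stddef.h>

/* Same layout as the struct posted at the top of the thread. */
typedef struct command_t
{
	char name[32];
	char description[128];
	void (*function_pointer)();
} command_t;

/* Hypothetical names, used only for this comparison. */
command_t  struct_list[10];   /* storage for 10 complete structs */
command_t *pointer_list[10];  /* storage for 10 pointers only */

/* Bytes the compiler reserves for the array of structs. */
size_t struct_list_bytes(void)
{
	return sizeof struct_list;
}

/* Bytes the compiler reserves for the array of pointers. */
size_t pointer_list_bytes(void)
{
	return sizeof pointer_list;
}
```

On a typical 32-bit x86 build this should come to 1640 bytes for the struct array against 40 bytes for the pointer array, so the two declarations differ mainly in how much memory they reserve.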

Are you saying that merely having that line of code causes the triple fault? If you change the second form from 10 to, say, 1000 or 10000, does it crash too? If it does, you may simply be allocating too much memory, and the issue is not this code but your memory setup: perhaps you don't handle page faults correctly (or you have segmentation issues that cause a #GP), and that leads to a double fault and then a triple fault.

If you don't have a double fault handler, create one. Then figure out what caused the double fault and create a handler for that too. Of course you could (and should) start with handlers for all possible exceptions: just kpanic() in them for now and add proper code later (i.e. handle the actual exception instead of panicking).
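The "panic everywhere first, refine later" approach can be sketched roughly as follows. This is a hosted-C illustration only: kpanic_stub, install_default_handlers, set_handler, dispatch_exception and get_last_panic_vector are hypothetical names, and the assembly entry stubs and the IDT setup a real kernel needs are not shown.

```c
#include <stddef.h>

#define EXCEPTION_COUNT 32  /* x86 reserves vectors 0-31 for CPU exceptions */

typedef void (*exception_handler_t)(unsigned vector);

static exception_handler_t handlers[EXCEPTION_COUNT];
static int last_panic_vector = -1;  /* -1 means no panic recorded yet */

/* Stand-in for kpanic(): a real kernel would print state and halt here. */
static void kpanic_stub(unsigned vector)
{
	last_panic_vector = (int)vector;
}

/* Point every exception vector at the catch-all handler. */
void install_default_handlers(void)
{
	for (size_t i = 0; i < EXCEPTION_COUNT; i++)
		handlers[i] = kpanic_stub;
}

/* Replace one vector with a proper handler once it exists. */
void set_handler(unsigned vector, exception_handler_t handler)
{
	if (vector < EXCEPTION_COUNT)
		handlers[vector] = handler;
}

/* Called from the (not shown) assembly stubs with the vector number. */
void dispatch_exception(unsigned vector)
{
	if (vector < EXCEPTION_COUNT && handlers[vector] != NULL)
		handlers[vector](vector);
}

/* Lets a test or debugger see which vector panicked last. */
int get_last_panic_vector(void)
{
	return last_panic_vector;
}
```

With every vector covered, a fault that is not yet understood at least reports its vector number (8 for a double fault, 14 for a page fault) instead of escalating silently.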

As for debugging, you can always debug, even if gdb doesn't work. For instance, you can place HLT instructions at various points and see how far your code gets before the triple fault.

As for gdb, have you tried compiling your code with debugging enabled and optimizations disabled? And did gdb actually _crash_ or just report an error? If it crashed, I'd consider that a bug and you should report it.

Octacone · Post by **Octacone** » Thu Jul 13, 2017 2:18 am

LtG wrote:Both the first and second form you used for the array create 10 elements in the array, the first form has 10 structs in the array the second just pointers, so no structs allocated in that one.

Are you saying that having that line of code causes it to triple fault? If you modify the second form from 10 to say 1000 or 10000 will it now crash too? If it does it could mean that you are just allocating too much memory and the issue isn't with this code, it's with your memory allocation, where you potentially don't handle page faults (or you have segmentation issues and cause a #GP) correctly, and that causes double fault -> triple.

If you don't have a double fault handler, create it. Then figure out what caused the double fault, and create a handler for that too. Of course you could (and should) start by having handlers for all the possible exceptions, just kpanic() in them and later add more proper code (ie. handling the actual exception instead of panicking).

As for debugging, you can always debug, even if gdb doesn't work. For instance you can add HLT in many places and see how far in your code you can get before triple fault, etc.

As for gdb, have you tried compiling your code with debugging enabled and optimizations disabled? And did gdb actually _crash_ or just give error? If it crashes then I'd consider that a bug and you should report it.

I want command_t command_list[10];. I don't need to allocate anything, no pointers, nothing. All I want is an array that contains 10 commands (struct command_t).
That exact line causes the triple fault.
I already have an exception handler that covers every possible fault. It just does not trigger.
Actually there is nothing to "hlt" debug in the first place; there is only one line of code that does it.
GDB didn't just crash: that error showed up and debugging had to be terminated (Y/N question).

Edit:
When I compiled it without any optimizations GDB started working again. Thanks!

LtG · Post by **LtG** » Thu Jul 13, 2017 2:29 am

Octacone wrote: I want this command_t command_list[10];. I don't need to allocate anything, no pointers, nothing. Basically what I want is to have an array that contains 10 commands (struct command_t).
That exact line causes it to triple fault.
I already have an exception handler that contains every possible fault code. It just does not trigger.
Actually there is nothing to "hlt" debug in the first place, there is only one line of code that does it.
It didn't just crash, that error showed up and debugging had to be terminated (Y/N question).

You don't need to allocate it; the compiler does. This should most likely end up in the .bss section.

A triple fault means that the CPU could not invoke the double fault handler, so effectively it's not there. Either the handler code is invalid (possibly because something overwrote it), or its IDT entry is invalid (again, something may have overwritten it), or your paging setup puts the IDT somewhere other than where the CPU expects it (IDTR doesn't point to the correct IDT address once paging is taken into account).

Note that your binary size may increase due to the [10] allocation, so things might end up in different places. It's also possible that this increase in size causes your stack to be overwritten, or your stack to overwrite something else. You may have to check your linker script too.

What happens if you decrease the number from 10 to, say, 1 or 2? Also, did you test what I suggested, increasing the pointer version from 10 to, say, 1k or 10k? Does that cause a triple fault too?
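One way to check the "invalid IDT entry" theory is to decode the 8 bytes of the double fault gate (vector 8) by hand or in a debugger. The sketch below assumes 32-bit protected mode, which is not stated at this point in the thread; idt_gate_t and the idt_* helpers are hypothetical names, not part of the poster's kernel.

```c
#include <stdint.h>

/* 32-bit protected-mode IDT gate descriptor (8 bytes). */
typedef struct idt_gate_t
{
	uint16_t offset_low;   /* handler address, bits 0-15 */
	uint16_t selector;     /* code segment selector */
	uint8_t  zero;         /* reserved, must be 0 */
	uint8_t  type_attr;    /* P (bit 7), DPL (bits 5-6), gate type (bits 0-3) */
	uint16_t offset_high;  /* handler address, bits 16-31 */
} idt_gate_t;

_Static_assert(sizeof(idt_gate_t) == 8, "IDT gate must be 8 bytes");

/* Build a gate, e.g. type_attr 0x8E = present, ring 0, 32-bit interrupt gate. */
idt_gate_t idt_make_gate(uint32_t handler, uint16_t selector, uint8_t type_attr)
{
	idt_gate_t gate;
	gate.offset_low  = (uint16_t)(handler & 0xFFFFu);
	gate.selector    = selector;
	gate.zero        = 0;
	gate.type_attr   = type_attr;
	gate.offset_high = (uint16_t)(handler >> 16);
	return gate;
}

/* Reassemble the handler address the CPU would jump to. */
uint32_t idt_gate_offset(const idt_gate_t *gate)
{
	return ((uint32_t)gate->offset_high << 16) | gate->offset_low;
}

/* A gate whose present bit is clear cannot be used by the CPU. */
int idt_gate_present(const idt_gate_t *gate)
{
	return (gate->type_attr & 0x80u) != 0;
}
```

If the decoded offset does not match the address of the double fault handler in the kernel image, or the present bit is clear, the entry was never set up or has been overwritten.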

What binary format do you use? ELF? How is your OS/kernel loaded, GRUB?

Octacone · Post by **Octacone** » Thu Jul 13, 2017 2:44 am

LtG wrote:
Octacone wrote: I want this command_t command_list[10];. I don't need to allocate anything, no pointers, nothing. Basically what I want is to have an array that contains 10 commands (struct command_t).
That exact line causes it to triple fault.
I already have an exception handler that contains every possible fault code. It just does not trigger.
Actually there is nothing to "hlt" debug in the first place, there is only one line of code that does it.
It didn't just crash, that error showed up and debugging had to be terminated (Y/N question).

You don't need to allocate, the compiler needs to allocate. This should likely end up in the .bss section.

Given that you get triple fault it means that the CPU couldn't find the double fault handler, so it's not there. Either it's invalid code (possibly because something overwrote it), it has invalid IDT entry (something overwrote it), possibly your paging causes IDT to be in some different place than is expected by the CPU (IDTR doesn't point to correct IDT address when paging is considered).

Note, your binary size may increase due to the [10] allocation so stuff might be in different places, it's also possible that this increase in size causes your stack to be overwritten or your stack to overwrite something. You may have to check your linker script too.

What happens if you decrease the number 10 to say 1 or 2? Also, did you test what I suggested, increasing the pointer version from 10 to say 1k or 10k? Will it cause it triple fault also?

What binary format do you use? ELF? How is your OS/kernel loaded, GRUB?

That is what I am saying: I don't need to allocate because the compiler will do it for me.
Why would it overwrite anything? My kernel is loaded at the 1 MB mark. The entire kernel is identity mapped, no problems there.
It crashes both when I increase it and when I decrease it to 1 or 2.
Yup, GRUB + ELF.

Octacone · Post by **Octacone** » Thu Jul 13, 2017 3:02 am

Is this normal behavior?

a) -O2 enabled, ignoring the shell everything is fine
b) -O2 disabled, ignoring the shell it crashes

LtG · Post by **LtG** » Thu Jul 13, 2017 3:05 am

In normal user-space development you don't really have to worry about allocation. Here you do, to some extent: the compiler will take care of it, but the memory still has to be allocated by the "OS", and since you're doing osdev, you need to worry about that.

Not sure if you tested the following:
command_t* command_list[10]; // works according to you
vs
command_t* command_list[10000]; // does this work?

It's possible that the only relevant difference between your first and second forms (structs vs. pointers to structs) is the size of the allocation the compiler makes for the array at compile time, which the linker then reflects in the ELF file, so you need to ensure that all of this is taken care of. You can inspect the generated ELF file with objdump and readelf on Linux.

You may also want to use gdb to set a breakpoint at an early stage of your boot code (i.e. after GRUB hands over and you have set up paging and the IDT, but before anything else) and check your IDT and paging, since clearly the CPU can't find your double fault handler.

Edit: note that the [10] vs [10000] test could just as well be:
char blaa[100*1000]; // allocate lots of static memory

It's also important to consider whether this allocation is made inside a function (and thus likely ends up on the stack, which may not be large enough to hold it) or globally, where it would likely end up in the .bss section I mentioned earlier.
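The global-versus-local distinction can be illustrated with a small hosted-C sketch. The names global_list, global_list_is_zeroed and local_list_stack_bytes are invented for the example; only command_t comes from the thread.

```c
#include <stddef.h>

/* Same layout as the struct posted at the top of the thread. */
typedef struct command_t
{
	char name[32];
	char description[128];
	void (*function_pointer)();
} command_t;

/* File scope: static storage duration, normally placed in .bss. */
command_t global_list[10];

/* Returns 1 if every byte of global_list is zero. C zero-initializes
   static objects that have no initializer; on x86 that means all-zero bytes. */
int global_list_is_zeroed(void)
{
	const unsigned char *bytes = (const unsigned char *)global_list;
	for (size_t i = 0; i < sizeof global_list; i++)
		if (bytes[i] != 0)
			return 0;
	return 1;
}

/* Function scope: automatic storage, normally carved out of the stack.
   Returns how many bytes of stack this one array needs. */
size_t local_list_stack_bytes(void)
{
	command_t local_list[10];
	return sizeof local_list;
}
```

In a hosted program the runtime provides both the zeroed .bss and a large enough stack. In a kernel, the boot loader or startup code has to clear .bss, and the stack is only as big as the kernel made it, so a large local array can easily run past it.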

LtG · Post by **LtG** » Thu Jul 13, 2017 3:13 am

Octacone wrote:Is this normal behavior?

a) -O2 enabled, ignoring the shell everything is fine
b) -O2 disabled, ignoring the shell it crashes

Normal in the sense that it is expected to happen: likely yes. When you get problems due to optimization changes, it's usually because you're doing something "wrong". It's possible, though very unlikely, that it is caused by a bug in the compiler.

Essentially, suppose some code dereferences a null pointer, which should cause a crash. If the optimizer can deduce that the value read from that location is never actually used, it may remove that piece of code, and then it no longer crashes. The compiler (and optimizer) is only required to maintain "observable behavior".

So if you do something "wrong", optimizations may affect the outcome. For instance, optimized code may be smaller and execute in less time, which in some cases hides race conditions that the slower non-optimized code would expose, though this can also work the other way around. So it's generally good to test both debug and non-debug builds as well as several optimization levels. If any of them behave differently, you should always stop and fix that before continuing; otherwise you have nasty surprises waiting that might explode a year later, and you will have no clue why some completely unrelated piece of new code causes things to break.

Of course everyone has different goals for their project, so it depends how far you want to take this, up to automated builds at different optimization levels with automated tests run against each. Doing it manually won't work in the long run, but even occasionally testing other build settings (debug, optimization, etc.) by hand can be useful.
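A small sketch of the pattern described above, with invented function names. The first function reads through a pointer and throws the value away, so an optimizing build is allowed to drop the read; the second uses a volatile access, which the C standard counts as observable behavior, so the read has to stay.

```c
/* The loaded value is never used, so an optimizing compiler is allowed to
   drop the load entirely; a bad pointer may then go unnoticed. */
int read_and_discard(const int *pointer)
{
	int unused = *pointer;
	(void)unused;
	return 0;
}

/* A volatile access counts as observable behavior, so the load must be
   performed at every optimization level. */
int read_and_discard_volatile(const volatile int *pointer)
{
	int unused = *pointer;
	(void)unused;
	return 0;
}
```

Called with a valid pointer, both return 0. Called with a bad pointer, the first one may or may not fault depending on the optimization level, which is one way a bug can appear or disappear when the compiler flags change.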

Octacone · Post by **Octacone** » Thu Jul 13, 2017 3:18 am

LtG wrote:In normal user space dev work you don't really have to worry about the allocation. You do need to worry about it some extent, the compiler will take care of it but it still has to be allocated by the "OS", and since you're doing osdev you need to worry about it.

Not sure if you tested the following:
command_t* command_list[10]; // works according to you
vs
command_t* command_list[10000]; // does this work?

It's possible that the only relevant difference between your first and second forms (structs vs pointers to structs) is the size of the allocation the compiler makes for the arrays at compile time which is then reflected in the ELF file by the linker, thus you need to ensure that all of this is taken care of. You can inspect the generated ELF file with objdump and readelf in Linux.

You may also want to use gdb to breakpoint at some early stage of your boot code (ie after GRUB and you setting up paging and IDT, but before anything else) and check your IDT and paging, since clearly the CPU can't find your double fault handler..

When I use a pointer it does work, even if I put 10000. The problem is that even if I allocate it (pointer case), accessing it causes a fault.

I will take a look at objdump and readelf.

As far as I know the IDT is perfectly fine. If I purposely trigger a division by zero exception, the handler runs.

Octacone · Post by **Octacone** » Thu Jul 13, 2017 3:25 am

LtG wrote:
Octacone wrote:Is this normal behavior?

a) -O2 enabled, ignoring the shell everything is fine
b) -O2 disabled, ignoring the shell it crashes

Normal in the sense that it should happen, likely yes. When you get problems due to optimization changes it's because you're doing something "wrong". It's possible, though very unlikely, that it is caused by bugs in the compiler.

Essentially, suppose some code has a null pointer which is used, that should cause a crash, but if the optimizer can deduce that the value from the pointed location is never actually used it may remove that piece of code and thus it no longer crashes. The compiler (and optimizer) is required to maintain "observable behavior"..

So if you do something "wrong" the optimizations may affect the outcome. For instance in some cases optimized code may be smaller, thus execute in less time and thus in some cases it may prevent certain race conditions where the slower non-optimized code would expose it, but this can also work the other way around. So generally it's good to test both debug and non-debug as well as some different optimization levels. If any of them have different behavior then you should always stop and fix those before continuing, otherwise you have nasty surprises waiting that might explode a year later and you have no clue why some completely unrelated piece of new code causes it to break.

Of course everyone has different desires for their project so it depends how far you want to take certain things, including automated builds to build different optimization levels and run automated tests against those. Doing it manually won't work in the long run, but even manually testing other build settings (debug, optimization, etc) occasionally can be useful.

That is very interesting. Something wrong with my code, oh boy. I'm questioning my entire OS right now. What could I have done wrong? Testing manually is okay for now, since my kernel is very small. Maybe I am experiencing one of those nasty surprises right now.

LtG · Post by **LtG** » Thu Jul 13, 2017 3:26 am

Octacone wrote:
LtG wrote:In normal user space dev work you don't really have to worry about the allocation. You do need to worry about it some extent, the compiler will take care of it but it still has to be allocated by the "OS", and since you're doing osdev you need to worry about it.

Not sure if you tested the following:
command_t* command_list[10]; // works according to you
vs
command_t* command_list[10000]; // does this work?

It's possible that the only relevant difference between your first and second forms (structs vs pointers to structs) is the size of the allocation the compiler makes for the arrays at compile time which is then reflected in the ELF file by the linker, thus you need to ensure that all of this is taken care of. You can inspect the generated ELF file with objdump and readelf in Linux.

You may also want to use gdb to breakpoint at some early stage of your boot code (ie after GRUB and you setting up paging and IDT, but before anything else) and check your IDT and paging, since clearly the CPU can't find your double fault handler..

When I put a pointer it does work. Even if I put 10000. The problem is even if I allocate it (pointer case), accessing it would cause a fault.

I will take a look at objdump and readelf.

As far as I know IDT is perfectly fine. If I purposely trigger a division by zero exception, it triggers.

I added an edit to my earlier post: is this array global or local to some function? If it's local to some function, can you disassemble that...

Octacone · Post by **Octacone** » Thu Jul 13, 2017 3:27 am

Is this a joke!?

This actually doesn't crash:

Code:

command_t command_list[10000]; // but it does crash when set to 10

LtG · Post by **LtG** » Thu Jul 13, 2017 3:31 am

Octacone wrote:Is this a joke!?

This actually doesn't crash:
Code:
command_t command_list[10000]; //but when set to 10 does

You might be able to use that to your advantage: disassemble both and compare them to see what the difference is and why one crashes but the other doesn't.

Note that the generated code may vary because the compiler decides that handling 10k elements is more efficient when done a different way. Of course that's just a guess.

Octacone · Post by **Octacone** » Thu Jul 13, 2017 3:36 am

LtG wrote:
Octacone wrote:Is this a joke!?

This actually doesn't crash:
Code:
command_t command_list[10000]; //but when set to 10 does

You might be able to use that to your advantage, disassemble both and compare, see what the difference is. Why one crashes but the other doesn't..

Note, the code generated may vary because the compiler decides doing something for 10k elements is more efficient in a different way.. Of course that's just a guess.

Great, give me a moment to figure out objdump switches.

simeonz · Post by **simeonz** » Thu Jul 13, 2017 3:48 am

Octacone wrote:Is this a joke!?

This actually doesn't crash:
Code:
command_t command_list[10000]; //but when set to 10 does

With this declaration, accessing the first element touches a location roughly 1.6 MB below the top of the stack frame (10000 elements of about 164 bytes each). That jumps over the unmapped memory gap and into some other memory, which in your case appears to be mapped. It is still buggy, but it doesn't trigger a crash immediately. If you enable "-fstack-check", it should crash. The option tells the compiler to probe all the allocated stack pages whenever the stack frame becomes too big. (This switch is useful for preventing user-mode exploits in which a function with a very big frame is used to jump over the stack's guard page into heap territory.)

The "-fstack-usage" output from the previous discussion is somewhat suspicious, because even 10 pointers should take 80 bytes. Is this array statically allocated or stack allocated?
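The address arithmetic behind that explanation can be sketched as follows. The stack bounds used in the note after the code (a 16 KiB stack from 0x200000 to 0x204000) are purely hypothetical, as are the helper names; only command_t comes from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct posted at the top of the thread. */
typedef struct command_t
{
	char name[32];
	char description[128];
	void (*function_pointer)();
} command_t;

/* Lowest address touched by a local command_t array of `count` elements,
   if the array ends at frame_top (the stack grows downward on x86). */
uintptr_t first_element_address(uintptr_t frame_top, size_t count)
{
	return frame_top - count * sizeof(command_t);
}

/* Returns 1 if element 0 of such an array lies below the stack region. */
int first_element_below_stack(uintptr_t stack_bottom, uintptr_t frame_top,
                              size_t count)
{
	return first_element_address(frame_top, count) < stack_bottom;
}
```

With the assumed bounds, 10 elements stay inside the stack, while 10000 elements put element 0 far below 0x200000. Whatever memory happens to be mapped there is then read or written without any fault, which matches the "buggy but not crashing" behavior described above.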

OSDev.org