
[Closed] Fake Faults

Posted: Mon Jul 10, 2017 12:56 pm
by Octacone
I am literally thinking of deleting my entire project and smashing my PC into pieces.
What is going on:

Code: Select all

command_t command_list[10];
This freaking thing triggers a page fault out of nowhere. It has nothing to do with paging. Also, it is not even a real page fault! If it were real, it would have shown up as a red panic screen with all the registers printed. But it is not showing anything, just a regular triple fault that flashes that red screen for barely a second, with no registers printed in the console output window.
Something even more bizarre!
This:

Code: Select all

command_t* command_list[10];
does not cause an exception. But when I try to modify it, it surely does (even if I do *command_list = (command_t*) Heap.Malloc(sizeof(command_t) * 10)).

What does the f̶o̶x̶ Bochs say? Nothing, 3rd exception with no resolution.

Re: Pulling my hair out! "Fake" Faults

Posted: Mon Jul 10, 2017 1:11 pm
by Brendan
Hi,

I'd recommend forgetting about whatever causes the first page fault and focusing on whatever causes the second problem (finding out why the page fault handler crashes).
Octacone wrote:This freaking thing triggers a page fault out of nowhere. It has nothing to do with paging.
Are you sure? Some kinds of bugs (like not invalidating the TLB correctly) can be elusive. ;)


Cheers,

Brendan

Re: Pulling my hair out! "Fake" Faults

Posted: Mon Jul 10, 2017 1:12 pm
by iansjack
Sounds like a stack overrun. Debugging will confirm that.

Re: Pulling my hair out! "Fake" Faults

Posted: Mon Jul 10, 2017 1:22 pm
by simeonz
This is an array of pointers.

Code: Select all

command_t* command_list[10];
This is a pointer to an array.

Code: Select all

command_t(* command_list)[10];
I'm mentioning this, because it is a point of confusion sometimes. The rule for declaration types is to read from right to left within a given precedence level, where a parenthesis increases the precedence as usual.

That being said, fixing the declaration won't fix your problem.

Code: Select all

*command_list = (command_t*) Heap.Malloc(sizeof(command_t) * 10);
assigns to the first "pointer" in the array, which should be ok, but the location of the array is apparently faulty for some reason. So see the other suggestions for that.
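
To make the difference between the two declarations concrete, here is a minimal C++ sketch. The command_t layout is borrowed from the struct posted later in this thread; the alias names ArrayOfPointers and PointerToArray and the helper pointed_to_bytes() are made up for illustration. It only checks types and sizes at compile time, so it says nothing about where the fault comes from.

```cpp
#include <cstddef>
#include <type_traits>

// Layout borrowed from the struct posted later in the thread.
struct command_t {
    char name[32];
    char description[128];
    void (*function_pointer)();
};

// Hypothetical alias names, for illustration only.
using ArrayOfPointers = command_t*[10];     // ten separate pointers
using PointerToArray  = command_t (*)[10];  // one pointer to a block of ten structs

// The array of pointers holds ten pointer-sized slots and no command_t objects.
static_assert(sizeof(ArrayOfPointers) == 10 * sizeof(command_t*),
              "array of pointers is ten pointers wide");

// The pointer-to-array points at a whole command_t[10].
static_assert(std::is_same<std::remove_pointer<PointerToArray>::type,
                           command_t[10]>::value,
              "pointer to array points at command_t[10]");

// Number of bytes covered by the object a PointerToArray points at.
constexpr std::size_t pointed_to_bytes() {
    return sizeof(std::remove_pointer<PointerToArray>::type);
}
```

With the array of pointers, *command_list = ... writes a single pointer into slot 0; the ten structs themselves still have to live somewhere else.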

Re: Pulling my hair out! "Fake" Faults

Posted: Mon Jul 10, 2017 1:39 pm
by Octacone
Brendan wrote:Hi,

I'd recommend forgetting about whatever causes the first page fault and focusing on whatever causes the second problem (finding out why the page fault handler crashes).
Octacone wrote:This freaking thing triggers a page fault out of nowhere. It has nothing to do with paging.
Are you sure? Some kinds of bugs (like not invalidating the TLB correctly) can be elusive. ;)


Cheers,

Brendan
I actually completely forgot about that. :oops: Not sure when it is needed though, since I am only allocating memory addresses and mapping them, not freeing/modifying them. If I modify a page directory aka change something in it, it won't automatically update, right? So I need to wipe the TLB and "rebuild" it. How could I have forgotten this? Edit: What address do I need to pass to "invlpg"? I can't figure this one out.

Re: Pulling my hair out! "Fake" Faults

Posted: Mon Jul 10, 2017 1:44 pm
by Octacone
iansjack wrote:Sounds like a stack overrun. Debugging will confirm that.
Hmm. 32 KB is not enough? Could be, possibly.

As far as debugging goes, GDB is not my friend. It throws an error when doing "stepi". I can only go as far as one step. Then some crazy symbol error happens. Will try to replicate and post it.
Edit: It is not "stepi". I just can't get it to break on Shell.Initialize(); the symbol is not found, even though I loaded an appropriate symbol table.
Edit 2: I wasn't lazy, so I explored a bit: when it comes to C++ and GDB, I need to specify the related namespace. So I managed to replicate it:

Code: Select all

(gdb) break Basic_OS::Shell_Class::Initialize
Breakpoint 3 at 0x107160: file Sources/Kernel/Shell.cpp, line 136.
(gdb) c
Continuing.
/build/gdb-cXfXJ3/gdb-7.11.1/gdb/inline-frame.c:171: internal-error: inline_frame_this_id: Assertion `!frame_id_eq (*this_id, outer_frame_id)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) 

Re: Pulling my hair out! "Fake" Faults

Posted: Mon Jul 10, 2017 1:49 pm
by Octacone
simeonz wrote:This is an array of pointers.

Code: Select all

command_t* command_list[10];
This is a pointer to an array.

Code: Select all

command_t(* command_list)[10];
I'm mentioning this, because it is a point of confusion sometimes. The rule for declaration types is to read from right to left within a given precedence level, where a parenthesis increases the precedence as usual.

That being said, fixing the declaration won't fix your problem.

Code: Select all

*command_list = (command_t*) Heap.Malloc(sizeof(command_t) * 10);
assigns to the first "pointer" in the array, which should be ok, but the location of the array is apparently faulty for some reason. So see the other suggestions for that.
It is definitely confusing sometimes. I have actually already tried this. Something else is causing it for sure. In fact, I don't even need any of this. The problem lies somewhere else, since command_t command_list[10] should be okay by itself; it does not need to be dynamically allocated since it is a fixed-size array.

Re: Pulling my hair out! "Fake" Faults

Posted: Mon Jul 10, 2017 2:12 pm
by simeonz
Note that if the stack is exhausted, as iansjack suggested, you cannot use dynamic allocation. Any further function call will trigger the issue, because it will try to create another call frame. Even printing a message will get you toast. If your NMIs are configured to use a separate stack through the IST, you can arrange to print the old stack pointer from there, but if they use the normal kernel stack, you will get a recurring fault.
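
One way to test the stack-exhaustion theory without a working debugger is "stack painting": fill the stack region with a known pattern before it is used, then count how much of the pattern survives. The sketch below is only an illustration under assumptions: the names paint_stack, untouched_words and kStackPattern are made up, and it assumes the stack is a contiguous region that grows downward (as on x86) and has not been switched to yet when it is painted.

```cpp
#include <cstddef>
#include <cstdint>

// Arbitrary marker value; any pattern unlikely to appear in real data works.
constexpr std::uint32_t kStackPattern = 0xDEADBEEFu;

// Fill the whole stack region with the marker.
// Assumption: this runs before the region is used as a stack, otherwise it
// would overwrite live call frames.
inline void paint_stack(std::uint32_t* lowest, std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) {
        lowest[i] = kStackPattern;
    }
}

// The stack grows downward, so untouched words sit at the lowest addresses.
// Returns how many words, counted from the bottom, still hold the marker.
inline std::size_t untouched_words(const std::uint32_t* lowest, std::size_t words) {
    std::size_t n = 0;
    while (n < words && lowest[n] == kStackPattern) {
        ++n;
    }
    return n;
}
```

If untouched_words() returns 0 after the crash path has run, the stack was used all the way to the bottom and an overrun is plausible; a large value points to corruption from somewhere else. The check itself should run from code that is known to still have stack available.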

Re: Pulling my hair out! "Fake" Faults

Posted: Mon Jul 10, 2017 3:49 pm
by BrightLight
Octacone wrote:If I modify a page directory aka change something in it, it won't automatically update, right? So I need to wipe the TLB and "rebuild" it. How could I have forgotten this? Edit: What address do I need to pass to "invlpg", can't figure this one out.
You run the INVLPG instruction on each 4 KB-aligned address you modified the mapping for. For example, if you map/unmap or modify three pages starting at address 0x2000, you'd do something like:

Code: Select all

mov eax, 0x2000	; first page
invlpg [eax]
add eax, 4096	; second page
invlpg [eax]
add eax, 4096	; third page
invlpg [eax]
EDIT: Don't put a shell inside your kernel. Now that you have proper paged memory management, the next thing to work on is multitasking or userspace.
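
The three-page INVLPG sequence above can also be written as a loop over a byte range. This is a minimal C++ sketch, not anyone's actual kernel code: invalidate_range, InvalidateFn and kPageSize are hypothetical names, and the real INVLPG is passed in as a callback so the page arithmetic can be checked outside ring 0.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPageSize = 4096;

// Hypothetical hook. In a ring-0 build with GCC or Clang it could be:
//   void invlpg(std::uintptr_t addr) {
//       asm volatile("invlpg (%0)" : : "r"(addr) : "memory");
//   }
using InvalidateFn = void (*)(std::uintptr_t);

// Calls `invalidate` once for every 4 KB page overlapped by
// [start, start + length) and returns the number of pages visited.
inline std::size_t invalidate_range(std::uintptr_t start, std::size_t length,
                                    InvalidateFn invalidate) {
    if (length == 0) {
        return 0;
    }
    const std::uintptr_t first = start & ~(kPageSize - 1);
    const std::uintptr_t last  = (start + length - 1) & ~(kPageSize - 1);
    std::size_t count = 0;
    for (std::uintptr_t page = first;; page += kPageSize) {
        invalidate(page);
        ++count;
        if (page == last) {
            break;
        }
    }
    return count;
}
```

Calling invalidate_range(0x2000, 3 * 4096, invlpg) would touch the same three pages as the assembly example. Reloading CR3 flushes all non-global TLB entries at once, which is simpler but more expensive when only a few pages changed.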

Re: Pulling my hair out! "Fake" Faults

Posted: Tue Jul 11, 2017 2:43 pm
by Octacone
omarrx024 wrote:
Octacone wrote:If I modify a page directory aka change something in it, it won't automatically update, right? So I need to wipe the TLB and "rebuild" it. How could I have forgotten this? Edit: What address do I need to pass to "invlpg", can't figure this one out.
You run the INVLPG instruction on each 4 KB-aligned address you modified the mapping for. For example, if you map/unmap or modify three pages starting at address 0x2000, you'd do something like:

Code: Select all

mov eax, 0x2000	; first page
invlpg [eax]
add eax, 4096	; second page
invlpg [eax]
add eax, 4096	; third page
invlpg [eax]
EDIT: Don't put a shell inside your kernel. Now that you have proper paged memory management, the next thing to work on is multitasking or userspace.
That solves my dilemma, thanks! I can't quite do anything user space related yet. Still have to gather some knowledge before I can get into that. I can't function without a shell, definitely need one.

Re: Pulling my hair out! Fake Faults

Posted: Wed Jul 12, 2017 2:24 pm
by Octacone
I increased the stack size, but it didn't fix anything (64 KB should be enough; I tried 512 KB and it is still the same). I also implemented TLB invalidation, still the same.
What is going on? When I disable paging it goes away, but why? Everything is mapped properly, it is not a higher half kernel or anything. Also there are no page faults or anything, just triple faults (restarting).

Edit:
When I change:

Code: Select all

typedef struct command_t
{
	char name[32];
	char description[128];
	void (*function_pointer)();
}command_t;
to

Code: Select all

typedef struct command_t
{
	char name;
	char description;
	void (*function_pointer)();
}command_t;
It goes away. Any clues?
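
For scale, the two layouts differ a lot in size. The sketch below just asks the compiler; the names command_big, command_small and kCommandCount are made up for illustration. On a typical 32-bit x86 build with 4-byte pointers, the first layout should come out at 164 bytes per entry (1640 bytes for ten entries) and the second at 8 bytes per entry after padding (80 bytes for ten). This does not identify the bug; it only shows how much the footprint of command_list changes between the two versions.

```cpp
#include <cstddef>

// First version from the post: two character buffers plus a function pointer.
struct command_big {
    char name[32];
    char description[128];
    void (*function_pointer)();
};

// Second version from the post: two single chars plus a function pointer.
struct command_small {
    char name;
    char description;
    void (*function_pointer)();
};

// The thread declares command_list with ten entries.
constexpr std::size_t kCommandCount = 10;

// Total bytes occupied by a ten-entry list of each layout.
constexpr std::size_t big_list_bytes() {
    return sizeof(command_big) * kCommandCount;
}
constexpr std::size_t small_list_bytes() {
    return sizeof(command_small) * kCommandCount;
}
```

A change that only shrinks an object and makes a crash disappear usually means something is moving in memory (stack, .bss, or whatever sits next to the array), which fits the stack and mapping suspicions raised earlier in the thread.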

Re: Pulling my hair out! Fake Faults

Posted: Wed Jul 12, 2017 5:18 pm
by simeonz
Probably won't help much, but you can try "-fstack-usage". You can also turn on the "-Wstack-usage=len" option in general. Having a working gdb is the best method as you can break on function entry, but you said there is an issue with that.

To be honest, I'm not sufficiently competent in these matters, but if your page fault handler does not use a separate exception stack, it seems logical that it should triple fault.

Moved: Pulling my hair out! Fake Faults

Posted: Thu Jul 13, 2017 12:36 am
by Octacone
simeonz wrote:Probably won't help much, but you can try "-fstack-usage". You can also turn on the "-Wstack-usage=len" option in general. Having a working gdb is the best method as you can break on function entry, but you said there is an issue with that.

To be honest, I'm not sufficiently competent in these matters, but if your page fault handler does not use a separate exception stack, it seems logical that it should triple fault.
I enabled those two switches and everything seems reasonable:

Code: Select all

xxx	32	dynamic,bounded
xxx	32	dynamic,bounded
xxx	4	static
xxx	32	dynamic,bounded
xxx	4	static
xxx	32	dynamic,bounded
xxx	32	dynamic,bounded
xxx	4	static
xxx	48	dynamic,bounded
xxx	32	dynamic,bounded
xxx = some function, removed for easier readability
I only have 1 stack for the entire OS. Nobody told me about the exception handler one.

Moved to: http://forum.osdev.org/viewtopic.php?f= ... 64#p277564

Re: Moved: Pulling my hair out! Fake Faults

Posted: Thu Jul 13, 2017 1:38 am
by iansjack
If you only have one stack then a corrupted stack, or a stack overrun, is always going to lead to a triple-fault, masking the original GPF or PF. Set a separate stack to be used exclusively by the page fault handler at least. Your current problem may not be a stack overrun but it certainly sounds like some form of stack corruption.

Actually, I think your first move should be to solve your problem with gdb. Without proper debugging facilities life is always going to be tough. Alternatively, you could use Bochs with its own debugger.
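
On 32-bit protected mode, the usual way to give a fault handler its own stack is a hardware task switch: the IDT entry is a task gate that points at a separate TSS carrying its own ESP. This is most commonly done for the double fault (vector 8). The sketch below only fills in such a TSS and is an illustration under assumptions: the name make_fault_tss and all its parameters are hypothetical, the selectors must match your own GDT, and the GDT TSS descriptor and the IDT task gate still have to be set up separately.

```cpp
#include <cstdint>

// 32-bit Task State Segment as laid out by the x86 architecture (104 bytes).
// Segment fields are 16 bits wide; the upper halves are reserved.
struct Tss32 {
    std::uint32_t prev_task_link;
    std::uint32_t esp0, ss0, esp1, ss1, esp2, ss2;
    std::uint32_t cr3, eip, eflags;
    std::uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    std::uint32_t es, cs, ss, ds, fs, gs;
    std::uint32_t ldt_selector;
    std::uint16_t trap;
    std::uint16_t iomap_base;
};
static_assert(sizeof(Tss32) == 104, "32-bit TSS must be 104 bytes");

// Builds a TSS for a handler that runs on its own stack.
// All arguments are assumptions about the caller's kernel:
//   handler_entry       - address of the fault handler entry point
//   stack_top           - top of a stack reserved only for this handler
//   page_directory_phys - physical address to load into CR3
//   code_selector / data_selector - ring-0 GDT selectors
inline Tss32 make_fault_tss(std::uint32_t handler_entry, std::uint32_t stack_top,
                            std::uint32_t page_directory_phys,
                            std::uint16_t code_selector,
                            std::uint16_t data_selector) {
    Tss32 tss{};                 // zero every field first
    tss.cr3 = page_directory_phys;
    tss.eip = handler_entry;
    tss.eflags = 0x2;            // bit 1 is always set; interrupts stay disabled
    tss.esp = stack_top;
    tss.esp0 = stack_top;
    tss.ss0 = data_selector;
    tss.cs = code_selector;
    tss.es = tss.ss = tss.ds = tss.fs = tss.gs = data_selector;
    tss.iomap_base = sizeof(Tss32);  // no I/O permission bitmap
    return tss;
}
```

Because the CPU loads ESP from this TSS on entry, the handler can still run and print registers even when the normal kernel stack is exhausted or corrupted. In 64-bit long mode the equivalent mechanism is the IST mentioned earlier in the thread.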

Moved: Pulling my hair out! Fake Faults

Posted: Thu Jul 13, 2017 2:52 am
by Octacone
iansjack wrote:If you only have one stack then a corrupted stack, or a stack overrun, is always going to lead to a triple-fault, masking the original GPF or PF. Set a separate stack to be used exclusively by the page fault handler at least. Your current problem may not be a stack overrun but it certainly sounds like some form of stack corruption.

Actually, I think your first move should be to solve your problem with gdb. Without proper debugging facilities life is always going to be tough. Alternatively, you could use Bochs with its own debugger.
GDB is now working properly, thanks to LtG.
How exactly am I supposed to do it? Any references?

Can you please reply to: http://forum.osdev.org/viewtopic.php?f= ... 69#p277569 since the main discussion is over there? Topic moved.