move_stack() fails for some unknown reason!!!

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

move_stack() fails for some unknown reason!!!

Post by mariuszp »

Hi.

I just implemented scheduling in my kernel. I used JamesM's kernel dev tuts to give me an idea of how this works. (Pratically the whole kernel is somewhat based on the tutorial). I pretty much copied his move_stack() function. Now, I tested the scheduler and it gave me a page fault, like you could see on the "What your Os goes carzy..." thread. I debugged it and it revealed that the whole thing crashed because of move_stack(). Here's the code:

Code: Select all

void move_stack(void * new_stack_start, u32int size)
{
	u32int i;
	for (i = (u32int) new_stack_start;
		i >= ((u32int) new_stack_start-size);
		i -= 0x1000)
	{
		alloc_frame( get_page(i, 1, current_directory), 0 /* User mode */, 1 /* Is writable */ );
	};
	u32int pd_addr;
	asm volatile("mov %%cr3, %0" : "=r" (pd_addr));
	asm volatile("mov %0, %%cr3" : : "r" (pd_addr));
	u32int old_stack_pointer; asm volatile("mov %%esp, %0" : "=r" (old_stack_pointer));
	u32int old_base_pointer;  asm volatile("mov %%ebp, %0" : "=r" (old_base_pointer));
	u32int offset            = (u32int)new_stack_start - initial_esp;
	u32int new_stack_pointer = old_stack_pointer + offset;
	u32int new_base_pointer  = old_base_pointer  + offset;
	memcpy((void*)new_stack_pointer, (void*)old_stack_pointer, initial_esp-old_stack_pointer);
	for(i = (u32int)new_stack_start; i > (u32int)new_stack_start-size; i -= 4)
	{
	   u32int tmp = * (u32int*)i;
	   // If the value of tmp is inside the range of the old stack, assume it is a base pointer
	   // and remap it. This will unfortunately remap ANY value in this range, whether they are
	   // base pointers or not.
	   if (( old_stack_pointer < tmp) && (tmp < initial_esp))
	   {
	     tmp = tmp + offset;
	     u32int *tmp2 = (u32int*)i;
	     *tmp2 = tmp;
	   }
	};
	asm volatile("mov %0, %%esp" : : "r" (new_stack_pointer));
	asm volatile("mov %0, %%ebp" : : "r" (new_base_pointer));
};
Anyone know where the problem could be, perhaps?
gerryg400
Member
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: move_stack() fails for some unknown reason!!!

Post by gerryg400 »

I debugged it and it revealed that the whole thing crashed because of move_stack().
How did debugging reveal that ?
If a trainstation is where trains stop, what is a workstation ?
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: move_stack() fails for some unknown reason!!!

Post by Brendan »

Hi,
mariuszp wrote:I just implemented scheduling in my kernel. I used JamesM's kernel dev tuts to give me an idea of how this works. (Pratically the whole kernel is somewhat based on the tutorial). I pretty much copied his move_stack() function. Now, I tested the scheduler and it gave me a page fault, like you could see on the "What your Os goes carzy..." thread. I debugged it and it revealed that the whole thing crashed because of move_stack().
For arbitrary code, it's impossible to move the stack without risking modifying data on the stack that looks like a pointer, leaving pointers (e.g. in global variables) pointing to the old stack, etc. In this case, if you need to move a stack your design is broken. Don't do it.

For some rare special cases it is possible. For example, for OS initialisation code written in assembly, where you know that there are no pointers pointing to anything on the stack. This case doesn't apply to you - you're using a compiler so all bets are off. Other cases include shifting a stack somewhere then shifting it back to where it was before it's used again (which seems pointless to me), or using a special language (e.g. a managed/interpreted/whatever language where you know about all pointers).

The last case is where you shift the stack to different physical pages, but you use paging (or maybe segmentation) to make sure it remains at the same virtual address. This is fine.

I found the code in JamesM's tutorial. Section 9.3. Creating a new stack says:

"Currently, we have been using an undefined stack. What does that mean? well, GRUB leaves us, stack-wise, in an undefined state. The stack pointer could be anywhere. In all practical situations, GRUB's default stack location is large enough for our startup code to run without problems. However, it is in lower memory (somewhere around 0x7000 physical), which causes us problems as it'll be 'linked' instead of 'copied' when a page directory is changed (because the area from 0x0 - approx 0x150000 is mapped in the kernel_directory). So, we really need to move the stack."

This is insanely bad. If someone uses any other multi-boot compliant boot loader to boot your OS, or if a new/different version of GRUB is released, then there's no guarantee that the stack you've been using will be sane (your "stack" might be in ROM or in video memory or overwriting your code or overwriting data in the multi-boot structure that you need later or...). Before using any stack you should setup a valid stack. This means (for example) allowing some space in your ".bss" for a valid stack and having some assembly language startup code where one of the first instructions executed sets ESP.

Basically, you're moving the stack (which is "bad") because you failed to setup a valid stack to begin with (which is also "bad"). Do it right and both problems go away.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: move_stack() fails for some unknown reason!!!

Post by JamesM »

"Currently, we have been using an undefined stack. What does that mean? well, GRUB leaves us, stack-wise, in an undefined state. The stack pointer could be anywhere. In all practical situations, GRUB's default stack location is large enough for our startup code to run without problems. However, it is in lower memory (somewhere around 0x7000 physical), which causes us problems as it'll be 'linked' instead of 'copied' when a page directory is changed (because the area from 0x0 - approx 0x150000 is mapped in the kernel_directory). So, we really need to move the stack."

This is insanely bad. If someone uses any other multi-boot compliant boot loader to boot your OS, or if a new/different version of GRUB is released, then there's no guarantee that the stack you've been using will be sane (your "stack" might be in ROM or in video memory or overwriting your code or overwriting data in the multi-boot structure that you need later or...). Before using any stack you should setup a valid stack. This means (for example) allowing some space in your ".bss" for a valid stack and having some assembly language startup code where one of the first instructions executed sets ESP.

Basically, you're moving the stack (which is "bad") because you failed to setup a valid stack to begin with (which is also "bad"). Do it right and both problems go away.
No arguments here. I'm far too busy to rewrite the documentation for those tutorials to match the new, better code at http://jamesm-tutorials.googlecode.com/. Just too much work!
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: move_stack() fails for some unknown reason!!!

Post by JamesM »

berkus wrote:Maybe time to offer monetary compensations for writing docs!

j/k
;)
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: move_stack() fails for some unknown reason!!!

Post by mariuszp »

gerryg400 wrote:
I debugged it and it revealed that the whole thing crashed because of move_stack().
How did debugging reveal that ?
Because I got my init_sched() to print a message before and after moving the stack. The first message apperead, and then i got page fault and the second message never appeared.
gerryg400
Member
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: move_stack() fails for some unknown reason!!!

Post by gerryg400 »

That doesn't mean it crashed because of move_stack(). Which instruction faulted ? What was the page fault address ? Was that address correct ? If correct, why was it not mapped correctly ? If it was incorrect, why was it incorrect ? When did it become bad ? You will need to do some real debugging here. You need to review your testing to unit test all your low level functionality to be sure it's correct. This type of bug (where the problem occurs the same way every time you run your program) is absolutely the easiest bug in osdev to identify and fix.

My advice is to identify this problem and fix it. It may not be move_stack(), it may be a latent bug somewhere else. Once you have found the bug, take Brendan's advice and don't use that function at all, even if it is not the cause, because it is fundamentally dangerous.
If a trainstation is where trains stop, what is a workstation ?
User avatar
piranha
Member
Member
Posts: 1391
Joined: Thu Dec 21, 2006 7:42 pm
Location: Unknown. Momentum is pretty certain, however.
Contact:

Re: move_stack() fails for some unknown reason!!!

Post by piranha »

Does the mapping complete? Is your VMM mapping function working right? Have you tested it?

What values are you passing to the function? What is the value of initial_esp? Are you sure it is set correctly?

-JL
SeaOS: Adding VT-x, networking, and ARM support
dbittman on IRC, @danielbittman on twitter
https://dbittman.github.io
mariuszp
Member
Member
Posts: 587
Joined: Sat Oct 16, 2010 3:38 pm

Re: move_stack() fails for some unknown reason!!!

Post by mariuszp »

berkus wrote:As a person who used JamesM tutorial to start, I can tell that move_stack() function was indeed invalid - calling it caused stack corruption, and it was the first function I totally got rid of before moving on.
Hey, you were right...

After I got rid of initial_esp and move_stack() and allocated the stack in the asm file:

Code: Select all

stack: resb 0x4000
...and then set esp to stack-0x4000 it started working right.
Post Reply