strange Page Fault

Posted: Sat Apr 09, 2005 10:01 am
by amirsadig
Until recently my kernel worked fine. Today I made a small change in my kernel, one that does not do any pointer operations and should not affect anything. After this change, user applications crash with a page fault. Take a look at the attached picture. The first fault, at 0x8048080, is one I expect: I don't load all the code into memory up front, I load each page when its page fault occurs. The kernel handles this page fault and returns to user mode. Then my application tries to call strlen on a constant string, which is located in the data segment:

Code:

write(1, "\x1B[2J", strlen("\x1B[2J"));
 80480a3:   83 ec 04                sub    $0x4,%esp
 80480a6:   83 ec 08                sub    $0x8,%esp
 80480a9:   68 d7 8b 04 08          push   $0x8048bd7
 80480ae:   e8 ad 04 00 00          call   8048560 <strlen>
 80480b3:   83 c4 0c                add    $0xc,%esp
 80480b6:   50                      push   %eax
 80480b7:   68 d7 8b 04 08          push   $0x8048bd7
 80480bc:   6a 01                   push   $0x1
 80480be:   e8 08 02 00 00          call   80482cb <write>
 80480c3:   83 c4 10                add    $0x10,%esp
Then I get a page fault at 0x804B000, which is outside the code/data/bss segments. The application has not extended its heap yet.
The CPU tells me the fault was raised in this code:

Code:

08048560 <strlen>:
 8048560:   55                      push   %ebp
 8048561:   89 e5                   mov    %esp,%ebp
 8048563:   8b 55 08                mov    0x8(%ebp),%edx
 8048566:   89 d0                   mov    %edx,%eax
 8048568:   80 3a 00                cmpb   $0x0,(%edx)
 804856b:   74 09                   je     8048576 <strlen+0x16>
 804856d:   8d 76 00                lea    0x0(%esi),%esi
 8048570:   40                      inc    %eax
 8048571:   80 38 00                cmpb   $0x0,(%eax)
 8048574:   75 fa                   jne    8048570 <strlen+0x10>
 8048576:   29 d0                   sub    %edx,%eax
 8048578:   5d                      pop    %ebp
 8048579:   c3                      ret    
 804857a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
I can't figure out why "mov %esp,%ebp" caused a page fault at 0x804B000!

I am sure something in my kernel has caused this. The application I wrote works like a charm on Linux, which means something in my kernel is doing the damage.
I would appreciate any hints!

[tt]
CODE start at 8048000
DATA start at 8049be0
error copy segment 8049be0 offset BE
Exception #14 (pagefault)
EDI=0 ESI=0 EBP=804aff4 ESP=d033dfe0
EBX=0 EDX=0 ECX=0 EAX=1
DS=2B ES=2B FS=2B GS=2B
int=0E err=06 EIP=8048561 CVS=33
uSP=804afbc uSS=2B
[/tt]
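
For readers decoding the dump above: the low three bits of the x86 page-fault error code say whether the page was present, whether the access was a write, and whether it came from user mode. The helper below is only a sketch; the names pf_info, pf_decode and pf_describe are made up for illustration and are not from the poster's kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code pushed with exception 14. */
#define PF_PRESENT 0x1u /* 0: page not present, 1: protection violation */
#define PF_WRITE   0x2u /* 0: read access,      1: write access         */
#define PF_USER    0x4u /* 0: supervisor mode,  1: user mode (CPL 3)    */

/* Hypothetical helper type, for illustration only. */
struct pf_info {
    bool present;
    bool write;
    bool user;
};

/* Split the error code into its three low flags. */
struct pf_info pf_decode(uint32_t err)
{
    struct pf_info info = {
        .present = (err & PF_PRESENT) != 0,
        .write   = (err & PF_WRITE) != 0,
        .user    = (err & PF_USER) != 0,
    };
    return info;
}

/* Write a human-readable summary into buf; returns snprintf's result. */
int pf_describe(uint32_t err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    struct pf_info i = pf_decode(err);
    return snprintf(buf, len, "%s, %s access, %s mode",
                    i.present ? "protection violation" : "page not present",
                    i.write ? "write" : "read",
                    i.user ? "user" : "supervisor");
}
```

For err=06 as shown in the dump, pf_describe() yields "page not present, write access, user mode".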

Re:strange Page Fault

Posted: Mon Apr 11, 2005 3:59 am
by Pype.Clicker
Any chance you're going outside your stack's space?

Re:strange Page Fault

Posted: Mon Apr 11, 2005 5:45 am
by amirsadig
After long debugging, I have noticed that the problem happens during a task switch. I normally set up a 4k stack for each task.

I will describe my task structure. I use the TSS to implement task switching, and I give each task a 4k user stack (ESP) and a 4k kernel stack (ESP0). I think this is enough for my small applications, which don't call enough functions to need a big stack.

I have two applications. The first is a small shell, which waits for a command to execute. The second is also a small app, which asks you to enter your name, then prints back whatever you type, and exits when you type "quit".

I have now cleaned up the code a bit, but it still crashes. In both applications it crashes when it tries to access the ebp register. But as you can see in the dump, ebp points to a correct stack address, so I don't know how it can crash on that pointer.

I will do further debugging to get more info.

Re:strange Page Fault

Posted: Mon Apr 11, 2005 7:09 am
by Pype.Clicker
Maybe you could show a few bytes at EIP to make sure the loaded binary actually matches the expected content.

Re:strange Page Fault

Posted: Mon Apr 11, 2005 12:36 pm
by amirsadig
I have now done some debugging. I set a breakpoint before the EIP that causes the fault. BOCHS stopped at my breakpoint, I took a look at the code and it was OK, then I single-stepped a few instructions and the code worked. Then my code made some operating system calls and I hit my breakpoint again. Now the memory is damaged: the virtual address no longer has an entry in the page table. BOCHS tells me there is no physical address for it and thus it can't read the instructions.

Re:strange Page Fault

Posted: Mon Apr 11, 2005 4:47 pm
by Pype.Clicker
So it looks like something was aliased to the page tables? Might that be due to some uninitialized pointer?

Re:strange Page Fault

Posted: Tue Apr 12, 2005 2:51 am
by amirsadig
I will double-check my source code. Normally I compile with -Wall -Werror, which makes the build fail on any warning, such as a variable being used before it is initialized.

Re:strange Page Fault

Posted: Tue Apr 12, 2005 3:44 am
by Pype.Clicker
I meant, there's probably something nasty being performed from the system call that trashes the page tables (somehow)... Maybe you could inspect those tables before and after and see what has changed (dump_cpu should give you the PDBR, and xp should allow you to inspect the tables).

Then, if you manage to set a memory watch on those tables, you can see when things go wrong...
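
To know which physical addresses to examine, the faulting virtual address has to be split into a page-directory index and a page-table index. The sketch below assumes classic 32-bit paging with 4 KiB pages (no PAE, no 4 MiB pages); the function names are made up for illustration, and the PDBR value has to be read from the debugger.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x1u        /* bit 0 of a PDE or PTE */
#define FRAME_MASK    0xFFFFF000u /* bits 31..12: physical frame address */

/* Bits 31..22 select the page-directory entry. */
uint32_t pd_index(uint32_t vaddr) { return vaddr >> 22; }

/* Bits 21..12 select the page-table entry. */
uint32_t pt_index(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }

/* Physical address of the directory entry for vaddr; pdbr is the CR3 value. */
uint32_t pde_phys(uint32_t pdbr, uint32_t vaddr)
{
    return (pdbr & FRAME_MASK) + pd_index(vaddr) * 4u;
}

/* Physical address of the table entry, given the PDE value read at pde_phys(). */
uint32_t pte_phys(uint32_t pde, uint32_t vaddr)
{
    assert(pde & ENTRY_PRESENT); /* a non-present PDE has no page table */
    return (pde & FRAME_MASK) + pt_index(vaddr) * 4u;
}
```

For the faulting address 0x804B000 the directory index is 32 and the table index is 75, so the entries to compare before and after the system call sit at PDBR + 0x80 and at (page table base) + 0x12C.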

Re:strange Page Fault

Posted: Tue Apr 12, 2005 8:33 am
by DruG5t0r3
I've made that same mistake, where my kmalloc wasn't really working and overwrote my LDT with some 0s... took me some time to realise it.

Re:strange Page Fault

Posted: Wed Apr 13, 2005 10:22 am
by amirsadig
I have found the basic fault, but not everything. The fault is that I reload CR3 each time I change something in the page table, to flush the TLB. But I forgot that when I create a task and map its code/data segments, I create a new page table, and there I reload CR3 with this freshly created page table. The problem is that I do this reload inside another task, which has a different CR3, and thus during the task switch the CPU saves the wrong CR3 for it.
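
One way to avoid this kind of mistake is to never load CR3 as a side effect of editing page tables, and to decide the flush from whose address space was edited. The sketch below is not the poster's code: it assumes a single CPU, and tlb_action_after_map is a hypothetical helper that only makes the decision, leaving the actual invlpg to the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_MASK 0xFFFFF000u /* bits 31..12 of CR3: page-directory frame */

enum tlb_action {
    TLB_NOTHING, /* edited another task's tables: nothing is cached for them */
    TLB_INVLPG   /* edited the live address space: invalidate that one page  */
};

/*
 * Decide what to do after changing a mapping in the page directory at
 * target_dir_phys while current_cr3 is loaded. CR3 itself is never touched,
 * so the running task keeps its own address space.
 */
enum tlb_action tlb_action_after_map(uint32_t current_cr3,
                                     uint32_t target_dir_phys)
{
    assert((target_dir_phys & ~FRAME_MASK) == 0); /* must be page aligned */
    if ((current_cr3 & FRAME_MASK) == target_dir_phys)
        return TLB_INVLPG;
    /* The new task's directory is loaded from its TSS on the first switch
     * to it, and that load flushes the non-global TLB entries anyway. */
    return TLB_NOTHING;
}
```

In a kernel, TLB_INVLPG would be followed by an invlpg instruction on the changed virtual address; when mapping code/data for a freshly created task from inside another task, the result is TLB_NOTHING and CR3 stays as it is.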

Now the page fault shows up in another part of the code, and it is related to the stack. The content of the user stack gets corrupted after executing a syscall, though not right after the first call but after several syscalls.

I am trying to figure out what happened!

Re:strange Page Fault

Posted: Wed Apr 13, 2005 1:36 pm
by amirsadig
Ahh,
I have caught the problem. It was really hard to find. The compiler generated wrong code for my keyboard IRQ handler, because I had defined it in a way that made the generated code touch the stack.
Normally my IRQ handlers get, as a parameter, a pointer to the registers saved on the stack during the exception (interrupt)/task switch. For those IRQ handlers that don't need them, I define them as void xxx_irq(void). I forgot to put (void) on keyboard_irq, and this forgotten void caused a disaster for the system call sys_read, which waits for the keyboard. So after returning from the system call, the local variables of the user application are no longer at the right position on the stack, and thus get strange values, which caused the page faults.
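
Whether or not the prototype was the real cause, a common way to rule out this class of mismatch is to give every handler the same signature and let the compiler check it through a function-pointer type. (Before C23, void f() leaves the parameters unspecified, while void f(void) declares that there are none.) The sketch below is only an illustration: struct regs, irq_install and irq_dispatch are hypothetical names, and the register layout is an assumption, not the poster's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the frame saved by the common interrupt stub. */
struct regs {
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t int_no, err_code;
    uint32_t eip, cs, eflags, user_esp, user_ss;
};

/* Every handler has exactly this type, so a mismatch fails to compile. */
typedef void (*irq_handler_t)(struct regs *r);

#define NUM_IRQS 16

static irq_handler_t irq_table[NUM_IRQS];

void irq_install(unsigned irq, irq_handler_t handler)
{
    assert(irq < NUM_IRQS);
    irq_table[irq] = handler;
}

/* Called by the stub with the saved frame; returns 1 if a handler ran. */
int irq_dispatch(unsigned irq, struct regs *r)
{
    if (irq >= NUM_IRQS || irq_table[irq] == NULL)
        return 0;
    irq_table[irq](r);
    return 1;
}

static unsigned keyboard_irq_count;

/* A handler that does not need the registers still accepts the pointer. */
void keyboard_irq(struct regs *r)
{
    (void)r; /* explicitly unused */
    keyboard_irq_count++;
}
```

The keyboard handler would be registered with irq_install(1, keyboard_irq), since the PC keyboard is on IRQ 1.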

Now I will go through the code file by file to document and clean everything up. I will do that before adding any new features to my OS.
:D

Re:strange Page Fault

Posted: Thu Apr 14, 2005 1:40 am
by Pype.Clicker
amirsadig wrote: ahh,
I have caught the problem.
I truly hope you actually caught it, because your explanation seems very odd to me.
amirsadig wrote: now I will go through the code file by file to document and clean everything up, I will do that before adding any new features to my OS.
:D
That'll probably be a WiseThing (tm).

Re:strange Page Fault

Posted: Thu Apr 14, 2005 11:05 am
by amirsadig
Pype.Clicker wrote: I truly hope you actually caught it, because your explanation seems very odd to me.
I hope so too. I agree, I am also not sure about the solution. What shows that the problem is not 100% understood is that a few minutes ago I made the same mistake in the function prototype again, and the code still works. This drives me mad, because yesterday when I did that I got a page fault. This happened many times, and the page fault went away whenever I corrected the function prototype, which is why I gave yesterday's explanation. Now it works even in BOCHS, VMware and on a real PC.

As I said, I should do a complete review of my code to clean it up.