Newbie: Problem with fork() in Tutorial ?

Lynton
Posts: 4
Joined: Sat Jun 06, 2009 5:05 pm

Post by Lynton »

Hi,
I'm very new to OS development and theory but have used Linux since the mid 90's and now feel I want to find out more about what makes an OS tick. I am very willing to learn, so I thought the best place to start would be to look at other people's code along with some nice tutorials, plus lots of books and Google searches.

I found JamesM's tutorial and have downloaded the code and can compile it quite happily using GCC on Linux. The resultant kernel runs and all seems well except fork() which gives me a page fault for some reason.

Now, I know I could just go away and figure it out myself as part of the (steep) learning curve but figure I'd be better off looking at a kernel that doesn't have any major problems caused by something I have maybe done wrong.

The error I get in Bochs is this:-

Page fault! ( present ) at 0x0x123890ab - EIP: 0x102a31
PANIC(Page fault) at paging.c:230

Could one of you kind people enlighten me as to what/where the problem may be ?

In main.c I have:-
......................
......................
// Don't trample our module with placement accesses, please!
placement_address = initrd_end;

// Start paging.
initialise_paging();

// Start multitasking.
initialise_tasking();

// Initialise the initial ramdisk, and set it as the filesystem root.
fs_root = initialise_initrd(initrd_location);

// Create a new process in a new address space which is a clone of this.
int ret = fork();

monitor_write("fork() returned ");
....................
....................
As soon as the int ret = fork(); line is executed I get the page fault. If I comment out the line and retry, things look better and the VFS files and their contents are printed to the screen.

The Bochs log is shown below:-

00014779648i[BIOS ] Booting from 0000:7c00
00016397340i[BIOS ] int13_harddisk: function 41, unmapped device for ELDL=80
00016402117i[BIOS ] int13_harddisk: function 08, unmapped device for ELDL=80
00016406776i[BIOS ] *** int 15h function AX=00c0, BX=0000 not yet supported!
00512960000i[ ] cpu loop quit, shutting down simulator
00512960000i[CPU0 ] CPU is in protected mode (active)
00512960000i[CPU0 ] CS.d_b = 32 bit
00512960000i[CPU0 ] SS.d_b = 32 bit
00512960000i[CPU0 ] EFER = 0x00000000
00512960000i[CPU0 ] | RAX=0000000000000000 RBX=0000000000000200
00512960000i[CPU0 ] | RCX=00000000000b8000 RDX=00000000000003d5
00512960000i[CPU0 ] | RSP=00000000dffffd5c RBP=00000000dffffd64
00512960000i[CPU0 ] | RSI=0000000000053c9d RDI=0000000000053c9e
00512960000i[CPU0 ] | R8=0000000000000000 R9=0000000000000000
00512960000i[CPU0 ] | R10=0000000000000000 R11=0000000000000000
00512960000i[CPU0 ] | R12=0000000000000000 R13=0000000000000000
00512960000i[CPU0 ] | R14=0000000000000000 R15=0000000000000000
00512960000i[CPU0 ] | IOPL=0 id vip vif ac vm rf nt of df if tf sf ZF af PF cf
00512960000i[CPU0 ] | SEG selector base limit G D
00512960000i[CPU0 ] | SEG sltr(index|ti|rpl) base limit G D
00512960000i[CPU0 ] | CS:0008( 0001| 0| 0) 00000000 000fffff 1 1
00512960000i[CPU0 ] | DS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00512960000i[CPU0 ] | SS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00512960000i[CPU0 ] | ES:0010( 0002| 0| 0) 00000000 000fffff 1 1
00512960000i[CPU0 ] | FS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00512960000i[CPU0 ] | GS:0010( 0002| 0| 0) 00000000 000fffff 1 1
00512960000i[CPU0 ] | MSR_FS_BASE:0000000000000000
00512960000i[CPU0 ] | MSR_GS_BASE:0000000000000000
00512960000i[CPU0 ] | RIP=00000000001008c1 (00000000001008c1)
00512960000i[CPU0 ] | CR0=0xe0000011 CR1=0x0 CR2=0x00000000123890ab
00512960000i[CPU0 ] | CR3=0x00482000 CR4=0x00000000
00512960000i[CPU0 ] >> jmp .+0xfffffffe (0x001008c1) : EBFE
00512960000i[CMOS ] Last time is 1244330815 (Sun Jun 7 00:26:55 2009)
00512960000i[XGUI ] Exit
00512960000i[ ] restoring default signal behavior
00512960000i[CTRL ] quit_sim called with exit code 1

Thanks in advance

Lynton
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: Newbie: Problem with fork() in Tutorial ?

Post by NickJohnson »

The general purpose of JamesM's tutorial is to get you to understand how a kernel is not only put together, but also built. As such, the instructions are precise and useful, but the product is nowhere near a fully functional kernel. If you want to actually write an OS, you probably first want to work through the tutorial step by step to gain an understanding of the concepts. However, it is not intended as a base for a real kernel: you will probably end up making small modifications to it to get a feel for OS development, but then start completely from scratch on your own design and implementation.

If you want to take a look inside a functional kernel, there are open source OS projects out there (Linux (esp. older versions), MINIX, L4 etc.), as well as quite a few written by the developers here (which may or may not be well documented). I'm sure they would be flattered if you took a look at their code; for now, however, it would probably be easier for you to work with the tutorial code, until you get a good grip on things.

And your problem may stem from the fact that you're not using a cross compiler, so things are generally set up to work on your real machine, not the emulated one in Bochs. There is a good guide on how to do this in the wiki: http://wiki.osdev.org/GCC_Cross-Compiler
Lynton
Posts: 4
Joined: Sat Jun 06, 2009 5:05 pm

Re: Newbie: Problem with fork() in Tutorial ?

Post by Lynton »

OK thanks for the reply.
I've now built the Cross Compiler using the gcc-core-4.3.4 and binutils-2.19 as per step 1 in the Wiki article.
I then added a cc=i586-elf-gcc flag to the Makefile for the JamesM Kernel.
The compilation works perfectly but I still end up with the page fault unfortunately.
I'll do some more investigating with my limited knowledge but do you still think it is a cross compiler issue?

Many Thanks

Lynton
pcmattman
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)

Re: Newbie: Problem with fork() in Tutorial ?

Post by pcmattman »

Can you add extra output function calls in between the other functions so you know where it's failing? For all we know, it may not even be faulting due to fork().
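
A minimal sketch of what I mean, written so it runs outside the kernel. Here monitor_write is a stand-in built on fputs, and the CHECKPOINT macro and checkpoints_hit counter are hypothetical names, not part of the tutorial. The commented-out calls mirror the main.c excerpt posted above.

```c
#include <stdio.h>

/* Stand-in for the tutorial's monitor_write(); here it just prints to stdout. */
static void monitor_write(const char *s)
{
    fputs(s, stdout);
}

/* Hypothetical counter so the number of checkpoints reached can be inspected. */
static int checkpoints_hit = 0;

/* Hypothetical helper: announce a step *before* running it, so the last
 * line on screen names the step that faulted. `label` must be a string
 * literal because it is concatenated at compile time. */
#define CHECKPOINT(label)                            \
    do {                                             \
        monitor_write("[about to run] " label "\n"); \
        checkpoints_hit++;                           \
    } while (0)

static void boot_sequence(void)
{
    CHECKPOINT("initialise_paging");
    /* initialise_paging(); */

    CHECKPOINT("initialise_tasking");
    /* initialise_tasking(); */

    CHECKPOINT("fork");
    /* int ret = fork(); */
}
```

If the last line printed is "[about to run] fork", the fault happened inside fork() or something it calls; if an earlier label is the last one shown, fork() is not the culprit.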
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: Newbie: Problem with fork() in Tutorial ?

Post by Combuster »

CR2=0x00000000123890ab
Anybody else think that's a weird number?
NickJohnson
Member
Posts: 1249
Joined: Tue Mar 24, 2009 8:11 pm
Location: Sunnyvale, California

Re: Newbie: Problem with fork() in Tutorial ?

Post by NickJohnson »

Hmm... I think that's the magic number used by fork() to distinguish the parent from the child. The weird part is that it's a magic number returned by read_eip(), but you're obviously not jumping to it to get the fault. Is this code just from the tarball on the tutorial, or did you manually copy it? If it's the latter, please post the fork() function.
Lynton
Posts: 4
Joined: Sat Jun 06, 2009 5:05 pm

Re: Newbie: Problem with fork() in Tutorial ?

Post by Lynton »

OK
I admit the code I am working with has been hacked around with very slightly but not in a way that should break anything. Eg. I have moved all the .h files to an include folder!
Just to make sure I didn't break anything though I unpacked the original user_mode.tar.gz tarball and recompiled and ran that with no problems. ie. the "Welcome to user world message" appears.
I then modified the main.c file to include the int ret=fork(); line and recompiled and straight away I get the exact same page fault without touching or moving any of the other files from the original tarball.
So.....I either have a genuine problem with fork() or my compiler is doing something weird.

Anyways here is the fork() function as found in task.c.

int fork()
{
    // We are modifying kernel structures, and so cannot be interrupted.
    asm volatile("cli");

    // Take a pointer to this process' task struct for later reference.
    task_t *parent_task = (task_t*)current_task;

    // Clone the address space.
    page_directory_t *directory = clone_directory(current_directory);

    // Create a new process.
    task_t *new_task = (task_t*)kmalloc(sizeof(task_t));
    new_task->id = next_pid++;
    new_task->esp = new_task->ebp = 0;
    new_task->eip = 0;
    new_task->page_directory = directory;
    current_task->kernel_stack = kmalloc_a(KERNEL_STACK_SIZE);
    new_task->next = 0;

    // Add it to the end of the ready queue.
    // Find the end of the ready queue...
    task_t *tmp_task = (task_t*)ready_queue;
    while (tmp_task->next)
        tmp_task = tmp_task->next;
    // ...And extend it.
    tmp_task->next = new_task;

    // This will be the entry point for the new process.
    u32int eip = read_eip();

    // We could be the parent or the child here - check.
    if (current_task == parent_task)
    {
        // We are the parent, so set up the esp/ebp/eip for our child.
        u32int esp; asm volatile("mov %%esp, %0" : "=r"(esp));
        u32int ebp; asm volatile("mov %%ebp, %0" : "=r"(ebp));
        new_task->esp = esp;
        new_task->ebp = ebp;
        new_task->eip = eip;
        // All finished: Reenable interrupts.
        asm volatile("sti");

        // And by convention return the PID of the child.
        return new_task->id;
    }
    else
    {
        // We are the child - by convention return 0.
        return 0;
    }

}


If I add monitor_write lines in fork() as an aid to diagnosis, I find that monitor_write("Hello") displays but monitor_write("Hello2") does not, so I guess the page fault occurs inside clone_directory(), the only call between the two:

// Take a pointer to this process' task struct for later reference.
task_t *parent_task = (task_t*)current_task;
monitor_write("Hello");
// Clone the address space.
page_directory_t *directory = clone_directory(current_directory);
monitor_write("Hello2");

Does any of that help to isolate where the problem might be ?

Thanks

Lynton
CoryXie
Posts: 3
Joined: Sun Jul 12, 2009 6:50 am

Re: Newbie: Problem with fork() in Tutorial ?

Post by CoryXie »

I had the same problem as you, and I worked around it with a small hack.
You just need to change where the kernel_stack is allocated in task.c.
In initialise_tasking(), change
current_task->kernel_stack = kmalloc_a(KERNEL_STACK_SIZE);
to
current_task->kernel_stack = kmalloc(KERNEL_STACK_SIZE);

In fork(), change
new_task->page_directory = directory;
current_task->kernel_stack = kmalloc_a(KERNEL_STACK_SIZE);
to
new_task->page_directory = directory;
new_task->kernel_stack = kmalloc(KERNEL_STACK_SIZE);

Note that there are two changes in fork(): besides switching to kmalloc, the stack is assigned to new_task instead of current_task, because the code is obviously setting up the new task struct and its stack area, not the current task's stack.
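
The new_task versus current_task point can be sketched outside the kernel. This is only an illustration under stated assumptions, not the tutorial's code: task_t is cut down to three fields, malloc stands in for kmalloc, the value of KERNEL_STACK_SIZE is assumed, and make_child is a hypothetical helper.

```c
#include <stdlib.h>

#define KERNEL_STACK_SIZE 2048   /* assumed value, for illustration only */

/* Cut-down, hypothetical version of the tutorial's task struct. */
typedef struct task {
    int id;
    void *kernel_stack;
    struct task *next;
} task_t;

/* Create a child task and append it to the list starting at `parent`.
 * The freshly allocated stack is stored in the *child*; storing it in the
 * parent instead would overwrite the parent's stack pointer and leave the
 * child's kernel_stack field uninitialised. */
static task_t *make_child(task_t *parent, int child_id)
{
    task_t *child = malloc(sizeof *child);
    if (child == NULL)
        return NULL;

    child->id = child_id;
    child->kernel_stack = malloc(KERNEL_STACK_SIZE);  /* not parent->kernel_stack */
    child->next = NULL;
    if (child->kernel_stack == NULL) {
        free(child);
        return NULL;
    }

    /* Find the end of the list and extend it. */
    task_t *tail = parent;
    while (tail->next)
        tail = tail->next;
    tail->next = child;

    return child;
}
```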

I still have not figured out why kmalloc_a causes the page fault.

The value 0x123890ab is KHEAP_MAGIC, but I don't know why this
value ends up in CR2.

Maybe we can dig more, or maybe JamesM can post some insights on this.

BTW, the Tutorial is very good for me to learn the basic things, so I'd like to
say thanks to JamesM!

Thanks,
Cory
bpaterni
Posts: 12
Joined: Sat May 30, 2009 4:38 pm

Re: Newbie: Problem with fork() in Tutorial ?

Post by bpaterni »

I had a similar problem except my kernel would triple fault when I tried to clone_directory(). After a few hours of inserting infinite loops and making 'debugging' calls to kmalloc_ap() I finally tracked it down to the find_smallest_hole function where it searches for a hole big enough to hold the page-aligned size. Here's what I did:

		if(page_align) {
#if 0
			unsigned int location = (unsigned int)header;
			signed int offset = 0;
			if((location+sizeof(headerT)) & 0xFFFFF000)
				offset = 0x1000 - (location+sizeof(headerT) % 0x1000);
			signed int hole_size = (signed int)header->size - offset;
			if(hole_size >= (signed int)size)
				break;
#endif
#if 1
			unsigned int preview_location = PAGE_ALIGN_ADDPAGE((unsigned int)header);
			unsigned int offset = preview_location - (unsigned int)header;
			if(header->size >= offset)
				if(header->size-offset >= size)
					break;
#endif
		}
where

#define PAGE_ALIGN_ADDPAGE(addr) ((addr) + 0x1000 - ((addr) & 0xFFF))
I'm not sure how it fixes the problem though, because essentially it does the same thing as JamesM's tutorial code, just in a more obvious and understandable manner.
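
For comparison, here is a minimal standalone sketch of the offset calculation using unsigned arithmetic only. It assumes the intent is to page-align the address just past the header, as the disabled branch above tries to do, which is not exactly what my replacement branch computes. The names align_offset and hole_fits are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Hypothetical helper: number of bytes to skip so that the data area,
 * which starts `header_size` bytes after `header_addr`, begins on a page
 * boundary. Returns 0 when it is already aligned. */
static uint32_t align_offset(uint32_t header_addr, uint32_t header_size)
{
    uint32_t data_start = header_addr + header_size;
    uint32_t misalignment = data_start & (PAGE_SIZE - 1);
    return misalignment ? PAGE_SIZE - misalignment : 0;
}

/* Hypothetical helper: can a hole of `hole_size` bytes still hold `size`
 * bytes once the alignment offset has been skipped? Checking
 * hole_size >= offset first avoids unsigned wrap-around in the subtraction. */
static int hole_fits(uint32_t header_addr, uint32_t header_size,
                     uint32_t hole_size, uint32_t size)
{
    uint32_t offset = align_offset(header_addr, header_size);
    return hole_size >= offset && hole_size - offset >= size;
}
```

For a header at 0x1000 with a 16-byte header struct, the data would start at 0x1010, so 0xFF0 bytes have to be skipped to reach 0x2000.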
gzaloprgm
Member
Posts: 141
Joined: Sun Sep 23, 2007 4:53 pm
Location: Buenos Aires, Argentina

Re: Newbie: Problem with fork() in Tutorial ?

Post by gzaloprgm »

I think I found a problem that is repeated at least 4 times in JamesMolloy's Heap Tutorial:

if((location+sizeof(headerT)) & 0xFFFFF000)
The if is meant to ask "is location+sizeof(headerT) not aligned to a 4096-byte boundary?", but it doesn't work!

The condition is always true unless location+sizeof(headerT) < 4096.

Check this example:

0x00000000 & 0xFFFFF000 = 0x00000000 (false)
0x00000001 & 0xFFFFF000 = 0x00000000 (false)
0x00000002 & 0xFFFFF000 = 0x00000000 (false)
...
0x00000FFF & 0xFFFFF000 = 0x00000000 (false)
0x00001000 & 0xFFFFF000 = 0x00001000 (true)
0x00001001 & 0xFFFFF000 = 0x00001000 (true)
...
0xFFFFFFFF & 0xFFFFF000 = 0xFFFFF000 (true)

I am very sure it should be

if((something & 0xFFF) != 0) /* it's not aligned */
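
A small standalone sketch of the two tests side by side, in case anyone wants to try values for themselves. The function names high_bits_check and is_misaligned are made up for this example.

```c
#include <stdint.h>

/* The mask test as quoted above: nonzero for every address >= 0x1000,
 * whether or not that address is page aligned. */
static int high_bits_check(uint32_t addr)
{
    return (addr & 0xFFFFF000u) != 0;
}

/* Test on the low 12 bits: nonzero only when the address is NOT on a
 * 4096-byte boundary. */
static int is_misaligned(uint32_t addr)
{
    return (addr & 0xFFFu) != 0;
}
```

With these, 0x2000 is page aligned yet the first test still reports it, while the second test fires only for addresses such as 0x2001.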
Cheers, gzaloprgm
Lynton
Posts: 4
Joined: Sat Jun 06, 2009 5:05 pm

Re: Newbie: Problem with fork() in Tutorial ?

Post by Lynton »

Hi,
You guys are obviously well ahead of me with this :D
I have looked at the changes you both outline but to be honest after having had several weeks absent from OS programming due to work commitments I cannot replicate the problem any longer.
I did have to rebuild my Linux machine where I do the compiling so maybe now I have a different version of Bochs or something.
On my original Linux install I do remember also trying QEMU which didn't seem to exhibit the problem so maybe it was Bochs being a bit strange?
Anyway thanks for the help with this and the code mods as I think they will come in useful in the very near future as I progress with this.

Lynton
bpaterni
Posts: 12
Joined: Sat May 30, 2009 4:38 pm

Re: Newbie: Problem with fork() in Tutorial ?

Post by bpaterni »

Ha, you are correct, gzaloprgm. Aside from a small typo, my kernel now works with the tutorial code enabled. Thanks for pointing that out.

And hopefully the OP doesn't mind if I hijack the thread, because I'm having another problem with fork()/switch_task(): my kernel doesn't seem to switch back to process 1 when timer_callback is called, as shown at the bottom of James' multitasking tutorial. In other words, I'm only getting the output of process 2. I'm not sure how to go about solving this either, since it gets really hairy around read_eip(). If someone smarter than me would like to take a gander at the code, it's available at github
Creature
Member
Posts: 548
Joined: Sat Dec 27, 2008 2:34 pm
Location: Belgium

Re: Newbie: Problem with fork() in Tutorial ?

Post by Creature »

gzaloprgm wrote:I think I found a problem that is repeated at least 4 times in JamesMolloy's Heap Tutorial:

if((location+sizeof(headerT)) & 0xFFFFF000)
The if is meant to ask "is location+sizeof(headerT) not aligned to a 4096-byte boundary?", but it doesn't work!

The condition is always true unless location+sizeof(headerT) < 4096.

Check this example:

0x00000000 & 0xFFFFF000 = 0x00000000 (false)
0x00000001 & 0xFFFFF000 = 0x00000000 (false)
0x00000002 & 0xFFFFF000 = 0x00000000 (false)
...
0x00000FFF & 0xFFFFF000 = 0x00000000 (false)
0x00001000 & 0xFFFFF000 = 0x00001000 (true)
0x00001001 & 0xFFFFF000 = 0x00001000 (true)
...
0xFFFFFFFF & 0xFFFFF000 = 0xFFFFF000 (true)

I am very sure it should be

if((something & 0xFFF) != 0) /* it's not aligned */
Cheers, gzaloprgm
That is indeed a problem; JamesM has mentioned it in another forum topic before (can't remember where). It's supposed to be:

if(address & 0x00000FFF)
{
   address &= 0xFFFFF000;
   address += PAGE_SIZE;
}
to align an address.
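
Wrapped into a function, as a minimal sketch: PAGE_SIZE is assumed to be 4096 here and page_align_up is a name chosen for this example.

```c
#include <stdint.h>

#define PAGE_SIZE 0x1000u   /* assumed 4 KiB pages */

/* Round `address` up to the next page boundary; addresses that are
 * already aligned are returned unchanged. */
static uint32_t page_align_up(uint32_t address)
{
    if (address & (PAGE_SIZE - 1)) {
        address &= ~(PAGE_SIZE - 1);   /* clear the low 12 bits */
        address += PAGE_SIZE;          /* step to the next page */
    }
    return address;
}
```

Note that in this sketch an unaligned address above 0xFFFFF000 wraps around to 0, so a real allocator would need to guard against that.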
bpaterni wrote:Ha, you are correct, gzaloprgm. Aside from a small typo, my kernel now works with the tutorial code enabled. Thanks for pointing that out.

And hopefully the OP doesn't mind if I hijack the thread, because I'm having another problem with fork()/switch_task(): my kernel doesn't seem to switch back to process 1 when timer_callback is called, as shown at the bottom of James' multitasking tutorial. In other words, I'm only getting the output of process 2. I'm not sure how to go about solving this either, since it gets really hairy around read_eip(). If someone smarter than me would like to take a gander at the code, it's available at github
When I read about your problem, I immediately checked your ISR.c and I was right (I had the same issue before): you're acknowledging the IRQ AFTER calling the IRQ handler. When multitasking, the handler switches processes before the IRQ is acknowledged, so the acknowledgement never happens.

Make sure the IRQ is acknowledged BEFORE you go into the ISR handler.
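
A minimal sketch of that ordering, with the port writes mocked so it can run on a normal machine. The port numbers 0x20/0xA0 and the EOI value 0x20 are the standard 8259 PIC values; everything else is an assumption for illustration: the event log, the mock outb, the stand-in timer_callback, and IRQs being remapped to vectors 32-47.

```c
#include <stdint.h>

#define PIC1_COMMAND 0x20   /* master PIC command port */
#define PIC2_COMMAND 0xA0   /* slave PIC command port */
#define PIC_EOI      0x20   /* end-of-interrupt command */

/* Event log so the ordering can be inspected without real hardware. */
enum event { EV_EOI_MASTER, EV_EOI_SLAVE, EV_HANDLER };
static enum event event_log[8];
static int event_count = 0;

static void log_event(enum event e)
{
    if (event_count < 8)
        event_log[event_count++] = e;
}

/* Mock of outb(): records EOI writes instead of touching I/O ports. */
static void outb(uint16_t port, uint8_t value)
{
    if (value == PIC_EOI)
        log_event(port == PIC2_COMMAND ? EV_EOI_SLAVE : EV_EOI_MASTER);
}

/* Stand-in for a timer callback that may switch tasks and therefore not
 * return to irq_handler() for a long time. */
static void timer_callback(void)
{
    log_event(EV_HANDLER);
}

/* Acknowledge first, then dispatch. Vectors 40-47 are assumed to belong
 * to the slave PIC, which needs its own EOI. */
static void irq_handler(uint32_t int_no)
{
    if (int_no >= 40)
        outb(PIC2_COMMAND, PIC_EOI);
    outb(PIC1_COMMAND, PIC_EOI);

    timer_callback();
}
```

Whether this ordering is right for every device is a separate question, which is debated in the replies that follow.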

Hope this helps and that this is indeed the problem,
Creature
frank
Member
Posts: 729
Joined: Sat Dec 30, 2006 2:31 pm
Location: East Coast, USA

Re: Newbie: Problem with fork() in Tutorial ?

Post by frank »

Creature wrote:When I read about your problem, I immediately checked your ISR.c and I was right (I had the same issue before): you're acknowledging the IRQ AFTER calling the IRQ handler. When multitasking, the handler switches processes before the IRQ is acknowledged, so the acknowledgement never happens.

Make sure the IRQ is acknowledged BEFORE you go into the ISR handler.
I would say that's not the correct way to do it. If you acknowledge the IRQ before the IRQ handler can run to deal with the device then you may end up with another interrupt. Take for example the hard drive. When the hard drive needs something it raises its interrupt line. The only way to disable that interrupt line is to read from the status register. Therefore if you were to acknowledge the IRQ before reading from the status register then the PIC would see the interrupt line as still raised and it will refire the IRQ.
Creature
Member
Posts: 548
Joined: Sat Dec 27, 2008 2:34 pm
Location: Belgium

Re: Newbie: Problem with fork() in Tutorial ?

Post by Creature »

frank wrote:
Creature wrote:When I read about your problem, I immediately checked your ISR.c and I was right (I had the same issue before): you're acknowledging the IRQ AFTER calling the IRQ handler. When multitasking, the handler switches processes before the IRQ is acknowledged, so the acknowledgement never happens.

Make sure the IRQ is acknowledged BEFORE you go into the ISR handler.
I would say that's not the correct way to do it. If you acknowledge the IRQ before the IRQ handler can run to deal with the device then you may end up with another interrupt. Take for example the hard drive. When the hard drive needs something it raises its interrupt line. The only way to disable that interrupt line is to read from the status register. Therefore if you were to acknowledge the IRQ before reading from the status register then the PIC would see the interrupt line as still raised and it will refire the IRQ.
I agree; that's why, the first time I learned about it, I switched it the other way around. I thought "normally you have to acknowledge the IRQ after you're done handling it, don't you?" but then I got problems with multitasking (which apparently is still broken now, even when loading and saving the same process, so I must've seriously messed something up).