The important thing to understand is that fork() creates a copy of the current process. Assume for now that the process has only a single thread (the one calling fork()). This is how it works:
- Allocate a new process structure; this is the child process.
- Allocate a new address space for the child process and map the kernel into it as well.
- Iterate over all the user-space memory mappings of the parent process, map new memory at the same addresses in the child process, and copy the data so both address spaces are identical.
- Copy other miscellaneous process attributes, such as signal handlers, into the child process. Note that the copy is not perfect: for instance, signals pending in the parent are not pending in the child.
- Allocate a new thread structure associated with the child process; this is the child thread.
- Locate where the system call will return to and set the registers of the child thread so it resumes execution there (a copy of the original thread's registers), except that the register holding the system call result is set to zero (marking it as the child).
- Mark the child thread as runnable; the current process has now been forked.
- Return the pid of the child process to the calling thread (marking it as the parent).
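The steps above can be sketched as a toy model in C that runs in user space. All structure and function names here (struct process, toy_fork, and so on) are hypothetical, address spaces are modeled as plain arrays of mappings backed by malloc'd buffers, and step 2 (creating page tables and mapping the kernel) is left out entirely. It also shows the "undo and return -1" path on allocation failure:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified kernel structures for illustration. */
#define MAX_MAPPINGS 16

struct mapping {
    uintptr_t addr;       /* user-space virtual address of the region */
    size_t size;          /* length of the region in bytes */
    unsigned char *data;  /* stands in for the physical pages */
};

struct process {
    int pid;
    struct mapping maps[MAX_MAPPINGS];
    size_t nmaps;
    void (*signal_handlers[32])(int);
    uint64_t pending_signals;  /* bitmask; deliberately not inherited */
};

struct registers {
    uintptr_t ip;      /* where the system call returns to */
    uintptr_t sp;
    intptr_t result;   /* register holding the system call result */
};

struct thread {
    struct process *process;
    struct registers regs;
    int runnable;
};

static int next_pid = 2;

/* Forks the process owning `current`. Returns the child pid (what the
   parent sees) or -1 on failure; on success *child_out is the child thread. */
static int toy_fork(struct thread *current, struct thread **child_out)
{
    struct process *parent = current->process;
    struct thread *thread;
    assert(parent->nmaps <= MAX_MAPPINGS);

    /* Step 1: allocate the child process (calloc zeroes every field). */
    struct process *child = calloc(1, sizeof(*child));
    if (!child)
        return -1;

    /* Step 3: copy every user-space mapping to the same address. */
    for (size_t i = 0; i < parent->nmaps; i++) {
        const struct mapping *src = &parent->maps[i];
        unsigned char *copy = malloc(src->size);
        if (!copy)
            goto fail;
        memcpy(copy, src->data, src->size);
        child->maps[i] = (struct mapping){ src->addr, src->size, copy };
        child->nmaps = i + 1;
    }

    /* Step 4: copy signal handlers, but not pending signals. */
    memcpy(child->signal_handlers, parent->signal_handlers,
           sizeof(child->signal_handlers));
    child->pending_signals = 0;

    /* Step 5: allocate the child thread. */
    thread = malloc(sizeof(*thread));
    if (!thread)
        goto fail;

    /* Step 6: same registers as the caller, but the child sees 0. */
    thread->process = child;
    thread->regs = current->regs;
    thread->regs.result = 0;
    child->pid = next_pid++;

    /* Steps 7 and 8: make it runnable and hand the pid to the parent. */
    thread->runnable = 1;
    *child_out = thread;
    return child->pid;

fail:
    /* Undo whatever was done so far. */
    for (size_t i = 0; i < child->nmaps; i++)
        free(child->maps[i].data);
    free(child);
    return -1;
}
```

In a real kernel the parent's return value is written into its own result register on the way out of the system call; here it is simply the function's return value.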
If there are multiple threads in the current process, the others are not copied; only the calling thread is. You can think of fork() as a primitive that creates a copy of the current thread (except for the return value) as a new thread in a mostly identical process. Of course, if any step fails, you undo the work done so far and return -1.
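From user space, this behavior (a copy of the calling thread in a separate, nearly identical process, differing only in fork()'s return value) can be observed with the standard POSIX fork() and waitpid() calls. The helper name demo_fork_copy is made up for this sketch:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks; the child changes its copy of `value` and exits with it.
   Returns the child's exit status in the parent, or -1 on failure,
   and stores the parent's own copy of `value` in *value_after. */
static int demo_fork_copy(int *value_after)
{
    int value = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;          /* fork failed: no child was created */
    if (pid == 0) {
        value = 42;         /* the write lands in the child's private copy */
        _exit(value);       /* leave without running the parent's cleanup */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *value_after = value;   /* the parent's copy is untouched */
    return WEXITSTATUS(status);
}
```

Both sides execute the same code after fork() returns; only the return value (0 in the child, the child's pid in the parent) tells them apart.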
Naturally, you can speed things up with copy-on-write, where the contents of the address space are not copied until one of the two threads writes to them. This is somewhat complicated, and I recommend against doing it in your first OS until you have a basic fork working; in my experience you won't need this optimization for a long while anyway (I still don't have it). Keep in mind that in many cases you don't actually do a full copy, for instance with shared mmap'd files and the like. There's a heap of special cases you'll want to handle eventually.
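The core idea of copy-on-write can be sketched with a reference-counted page and a toy write-fault handler. This is a simplified model, not real paging code: struct page, struct pte, and the cow_* functions are hypothetical, there is no TLB, and it assumes every shared page was originally writable:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical physical page frame with a reference count. */
struct page {
    int refcount;
    unsigned char bytes[PAGE_SIZE];
};

/* One page-table entry in a toy address space. */
struct pte {
    struct page *page;
    int writable;   /* cleared while the page is shared copy-on-write */
};

/* At fork time: share the page and write-protect both entries. */
static void cow_share(struct pte *parent, struct pte *child)
{
    parent->writable = 0;
    child->page = parent->page;
    child->writable = 0;
    parent->page->refcount++;
}

/* Toy write-fault handler: if the page is still shared, give the faulting
   side a private copy; then make the entry writable again. */
static int cow_write_fault(struct pte *pte)
{
    struct page *old = pte->page;
    if (old->refcount > 1) {
        struct page *copy = malloc(sizeof(*copy));
        if (!copy)
            return -1;
        memcpy(copy->bytes, old->bytes, PAGE_SIZE);
        copy->refcount = 1;
        old->refcount--;
        pte->page = copy;
    }
    pte->writable = 1;   /* last user of the page keeps it without copying */
    return 0;
}

/* Store a byte, taking the write fault first if the entry is protected. */
static int cow_write(struct pte *pte, size_t offset, unsigned char value)
{
    assert(offset < PAGE_SIZE);
    if (!pte->writable && cow_write_fault(pte) < 0)
        return -1;
    pte->page->bytes[offset] = value;
    return 0;
}
```

Once one side has taken its private copy, the other side's next write fault finds a reference count of 1 and simply re-enables writing, so each page is copied at most once.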
My kernel doesn't have a fork() system call as such; instead I have a more general thread_create() system call that creates a thread with the given registers, optionally in a new process. A user-space fork() fills in the registers desired for the new thread (identical to its own, except for a single value) and creates that thread in a cloned process. This gives the same result, but with a more powerful primitive. Linux does something similar with the clone() system call, though mine is better designed.
I suspect you don't know paging well yet; you'll want to learn it.