OSDev.org

Posted: **Wed Feb 14, 2024 8:38 pm**

Hello again folks! I've been recently attempting to implement the fork() command, and have successfully gotten to a point where I can fork once and have two processes. However, if I try to fork again, it appears that the stack section I create for the process and copy from the parent now contains only zeroes - despite me never writing anything to the stack explicitly.

For example, here are the stacks in physical memory after running fork(), where esp is marked with a left arrow (<):

Code: Select all

**** PARENT (first fork) esp: 0xC020EF94 ****
0x20EF80: 0xC0205014
0x20EF84: 0xC020EF80
0x20EF88: 0xC0211000
0x20EF8C: 0x0
0x20EF90: 0xC0204E45
0x20EF94: 0xC0209EA8 <
0x20EF98: 0xC020EF94
0x20EF9C: 0xC020F000
0x20EFA0: 0xC020EFCC
0x20EFA4: 0xC0204E45
0x20EFA8: 0x4000
**** CHILD (first fork -> PID 1) esp: 0xBFFFFF94 ****
0x803F80: 0xC0204FD8
0x803F84: 0x20B000
0x803F88: 0x800000
0x803F8C: 0x0
0x803F90: 0xC0204E45
0x803F94: 0xC0209EA8 <
0x803F98: 0xC020EF94
0x803F9C: 0xC020F000
0x803FA0: 0xC020EFCC
0x803FA4: 0xC0204E45
0x803FA8: 0x4000
**** PARENT (second fork) esp: 0xC020EF94 ****
0x20EF80: 0xC0205014
0x20EF84: 0xC020EF80
0x20EF88: 0xC0211000
0x20EF8C: 0x4000
0x20EF90: 0xC0204E45
0x20EF94: 0xC0200030 <
0x20EF98: 0xC020EF94
0x20EF9C: 0xC020F000
0x20EFA0: 0xC020EFCC
0x20EFA4: 0xC0204E45
0x20EFA8: 0x4000
**** CHILD (second fork -> PID 2) esp: 0xBFFFFF94 ****
0x807F80: 0xC0204FD8
0x807F84: 0x20B000
0x807F88: 0x804000
0x807F8C: 0x4000
0x807F90: 0xC0204E45
0x807F94: 0xC0200030 <
0x807F98: 0xC020EF94
0x807F9C: 0xC020F000
0x807FA0: 0xC020EFCC
0x807FA4: 0xC0204E45
0x807FA8: 0x4000

This all seems fine - given an downwards-growing stack, as long as the values above the pointer in memory (below in the text-based model) haven't changed, I believe it should properly pop the return value, while keeping the return PID in eax. And for the first fork(), this appears to work perfectly!

Code: Select all

**** SCHEDULER esp (PID=1): 0xBFFFFF94 ****
0x803F80: 0xC0204FD8
0x803F84: 0x20B000
0x803F88: 0x804000
0x803F8C: 0x4000
0x803F90: 0xC0204E45
0x803F94: 0xC0200030 <
0x803F98: 0xC020EF94
0x803F9C: 0xC020F000
0x803FA0: 0xC020EFCC
0x803FA4: 0xC0204E45
0x803FA8: 0x4000
**** CHILD (FORKED) esp: 0xBFFFFF94 ****
0x803F80: 0xC0204E6D
0x803F84: 0xBFFFFF80
0x803F88: 0xC021F000
0x803F8C: 0x4000
0x803F90: 0xC0204E45
0x803F94: 0xC0200030 <
0x803F98: 0xC020EF94
0x803F9C: 0xC020F000
0x803FA0: 0xC020EFCC
0x803FA4: 0xC0204E45
0x803FA8: 0x4000

However, when running a second time, the stack ends up turning to zeroes:

Code: Select all

**** SCHEDULE esp (PID=2): 0xBFFFFF94 ****
0x807F80: 0x0
0x807F84: 0x0
0x807F88: 0x0
0x807F8C: 0x0
0x807F90: 0x0
0x807F94: 0x0 <
0x807F98: 0x0
0x807F9C: 0x0
0x807FA0: 0x0
0x807FA4: 0x0
0x807FA8: 0x0
**** CHILD (FORKED) esp: 0xBFFFFF94 ****
0x807F80: 0xC0204E6D
0x807F84: 0xBFFFFF80
0x807F88: 0xC022A000
0x807F8C: 0x0
0x807F90: 0x0
0x807F94: 0x0 <
0x807F98: 0x0
0x807F9C: 0x0
0x807FA0: 0x0
0x807FA4: 0xC0204E45
0x807FA8: 0x0

This, of course, results in a 'ret' instruction jumping to NULL - and a page fault. Additionally, there appears to be a significant amount of change between the scheduler running and its return to the child process, of which I am unsure of the origin. My question, then, lies in where this comes from - is this likely something coming from the hardware that's overwriting these values, or something within my own code? I've been trying to crack this issue for around a week and a half now, and I don't see anything in the manuals about the instructions used that would cause such a thing.

Here are some relevant code snippets, for context - and I should probably make the repo public too - the branch for this issue is at https://github.com/Minater247/XanaduOS/tree/fix_fork. I understand this is very complicated so I'm not expecting anyone to read the whole thing - but hopefully this helps for understanding control flow, what the stack is doing, and the like.

1. Fork

Code: Select all

extern uint32_t read_eip();
uint32_t fork() {
    asm volatile ("cli");

    process_t *new_process = create_task(NULL, current_process->stack_size, clone_page_directory(current_pd), 0, 0, 0);

    new_process->status = TASK_STATUS_FORKED; //don't run yet (set to TASK_STATUS_FORKED when we're ready to run it)

    process_t *parent_process = current_process;

    uint32_t eip = read_eip();

    asm volatile ("cli");

    if (current_process == parent_process) {
        uint32_t esp, ebp;
        asm volatile ("mov %%esp, %0" : "=r" (esp));
        asm volatile ("mov %%ebp, %0" : "=r" (ebp));

        //copy the stack from the current process and update pointers
        uint32_t old_stack_offset = parent_process->stack_pos - esp;

        //copy all pages
        uint32_t smaller_stack = (current_process->stack_size < new_process->stack_size) ? current_process->stack_size : new_process->stack_size;

        for (uint32_t i = 0; i < smaller_stack; i += 0x1000) {
            uint32_t phys = virt_to_phys(parent_process->stack_pos - i - 0x1000, parent_process->pd);
            uint32_t new_phys = virt_to_phys(new_process->stack_pos - i - 0x1000, new_process->pd);
            phys_copypage(phys, new_phys);
        }

        new_process->esp = new_process->stack_pos - old_stack_offset;
        new_process->ebp = new_process->esp + (ebp - esp);

        new_process->entry_or_return = eip;

        asm volatile ("sti");
        return new_process->pid;
    } else {
        asm volatile ("sti");
        return 0;
    }
}

2. create_task

Code: Select all

process_t *create_task(void *entry_point, uint32_t stack_size, page_directory_t *pd, int argc, char **argv, char **envp) {
    process_t *new_process = (process_t *)kmalloc(sizeof(process_t));

    asm volatile ("cli");

    if (current_process->pid != 0) {
        //free the previous stack's pages
        for (uint32_t i = 0; i < current_process->stack_size; i += 0x1000) {
            free_page(virt_to_phys(current_process->stack_pos - i, pd), pd);
        }
    }

    uint32_t stack = 0xC0000000 - stack_size;
    for (uint32_t i = 0; i < stack_size; i += 0x1000) {
        page_table_entry_t first = first_free_page();
        alloc_page_kmalloc(stack + i, first.pd_entry * 0x400000 + first.pt_entry * 0x1000, true, false, true, pd);
    }
    stack = 0xC0000000;

    new_process->pid = next_pid++;
    new_process->next = NULL;
    new_process->num_fds = current_process->num_fds;
    new_process->max_fds = 256;
    new_process->pd = pd;
    new_process->status = TASK_STATUS_INITIALIZED;
    new_process->esp = (uint32_t)stack;
    new_process->ebp = (uint32_t)stack;
    new_process->stack_pos = stack; //static location of the stack top in memory
    new_process->stack_size = stack_size;
    new_process->argc = argc;
    new_process->argv = argv;
    new_process->envp = envp;
    memset(new_process->fds, 0, sizeof(new_process->fds));
    for (int i = 0; i < 256; i++) {
        if (current_process->fds[i] != NULL) {
            new_process->fds[i] = copy_descriptor(current_process->fds[i], i);
        } else {
            new_process->fds[i] = NULL;
        }
    }

    // Add to linked list
    process_t *curr_process = head_process;
    while (curr_process->next != NULL) {
        curr_process = curr_process->next;
    }
    curr_process->next = new_process;

    new_process->entry_or_return = (uint32_t)entry_point;

    return new_process;
}

3. Scheduler

Code: Select all

void timer_interrupt_handler(uint32_t ebp, uint32_t esp)
{
	outb(0x20, 0x20); //this isn't a regular IRQ handler, this is coming directly tables_asm, so we need to send an EOI to the PIC

	process_t *old_process = current_process;

	if (old_process->status == TASK_STATUS_RUNNING)
	{
        // Already running, so save context
		old_process->ebp = ebp;
		old_process->esp = esp;
	}

	process_t *new_process = old_process->next;
    if (new_process == NULL)
    {
        new_process = head_process; // If we ran off the end of the list, go back to the beginning
    }
    while (new_process->status != TASK_STATUS_RUNNING && new_process->status != TASK_STATUS_INITIALIZED && new_process->status != TASK_STATUS_FORKED)
    {
        new_process = new_process->next;
        if (new_process == NULL)
        {
            new_process = head_process; // If we ran off the end of the list, go back to the beginning
        }
    }

    current_process = new_process;

    terminal_printf("pswitch %d -> %d\n", old_process->pid, new_process->pid);
    terminal_printf("New stack physLoc: 0x%x\n", virt_to_phys(new_process->stack_pos - 0x1000, new_process->pd));

	if (new_process->status == TASK_STATUS_INITIALIZED)
	{
		new_process->status = TASK_STATUS_RUNNING;
        // First time running, so set up stack and jump to entry point
        asm volatile ("mov %0, %%cr3" : : "r" (new_process->pd->phys_addr));
        current_pd = new_process->pd;
        uint32_t stack = new_process->stack_pos;
        if (new_process->argv != NULL) {
            uint32_t argc = 0;
            uint32_t argv_size = 0;
            argc = 0;
            while (new_process->argv[argc] != NULL) {
                argv_size += strlen(new_process->argv[argc]) + 1;
                argc++;
            }

            stack -= (argc + 1) * sizeof(char *); // argv + NULL
            char **new_argv = (char **)stack;
            stack -= argv_size; // argv strings
            uint32_t new_argv_pos = stack;
            //copy the arguments
            for (uint32_t i = 0; i < argc; i++) {
                new_argv[i] = (char *)new_argv_pos;
                strcpy((char *)new_argv_pos, new_process->argv[i]);
                new_argv_pos += strlen(new_process->argv[i]) + 1;
            }
            new_argv[argc] = NULL;
            new_process->argv = new_argv;
        }
        if (new_process->envp != NULL) {
            uint32_t envp_size = 0;
            uint32_t envc = 0;
            if (new_process->envp != NULL) {
                while (new_process->envp[envc] != NULL) {
                    envp_size += strlen(new_process->envp[envc]) + 1;
                    envc++;
                }
            }

            stack -= (envc + 1) * sizeof(char *); // envp + NULL
            char **new_envp = (char **)stack;
            stack -= envp_size; // envp strings
            uint32_t new_envp_pos = stack;

            //copy the environment
            for (uint32_t i = 0; i < envc; i++) {
                new_envp[i] = (char *)new_envp_pos;
                strcpy((char *)new_envp_pos, new_process->envp[i]);
                new_envp_pos += strlen(new_process->envp[i]) + 1;
            }
            new_envp[envc] = NULL;
            new_process->envp = new_envp;
        }

        //adjust the new process' esp
        new_process->esp = stack;

        asm volatile ("mov %0, %%esp" : : "r" (new_process->esp));
        asm volatile ("mov %0, %%ebp" : : "r" (new_process->ebp));
        asm volatile ("sti");
        //set the function up as an function returning an int
        int return_code = init_program(new_process->argc, new_process->argv, new_process->envp, (void *)new_process->entry_or_return);
        asm volatile ("cli");
        new_process->entry_or_return = return_code;
        //remove the process from the scheduler
        free_process(new_process);
        kpanic("Something went wrong with the scheduler!");
	} else if (new_process->status == TASK_STATUS_FORKED) {
        // We're just returning to the parent process' address, so set esp/ebp and jump to entry
        asm volatile ("mov %0, %%cr3" : : "r" (new_process->pd->phys_addr));
        current_pd = new_process->pd;

        new_process->status = TASK_STATUS_RUNNING;
        asm volatile ("mov %0, %%esp" : : "r" (new_process->esp));
        asm volatile ("mov %0, %%ebp" : : "r" (new_process->ebp));
        asm volatile ("sti");
        asm volatile ("jmp *%0" : : "r" (new_process->entry_or_return));
    }

    asm volatile ("mov %0, %%cr3" : : "r" (new_process->pd->phys_addr));
    current_pd = new_process->pd;

    asm volatile ("mov %0, %%esp" : : "r" (new_process->esp));
    asm volatile ("mov %0, %%ebp" : : "r" (new_process->ebp));
    asm volatile ("popa");
    asm volatile ("sti");
    asm volatile ("iret");


	kpanic("Something went wrong with the scheduler!");
}

4. main

Code: Select all

void kernel_main() {
	asm volatile ("cli");

	boot_initialize();

	fopen("/dev/kbd0", "r"); // stdin
	fopen("/dev/trm", "r+"); // stdout
	fopen("/dev/trm", "r+"); // stderr

	asm volatile ("sti");

	uint32_t pid = fork();

	terminal_printf("Fork returned %d for PID (f1)%d\n", pid, getpid());

	if (pid == 0) {
		terminal_printf("Hello from child!\n");
		terminal_printf("First fork PID is %d\n", getpid());
		while (true);
	} else {
		terminal_printf("Hello from parent!\n");
		terminal_printf("After fork, PID is %d\n", getpid());
	}

	//fork again!
	terminal_printf("prefork PID: %d\n", getpid());
	pid = fork();
	terminal_printf("Fork returned %d for PID (f2)%d\n", pid, getpid());

	if (pid == 0) {
		terminal_printf("Hello from child 2!\n");
		terminal_printf("Second fork PID is %d\n", getpid());
		while (true);
	} else {
		terminal_printf("Hello from parent 2!\n");
		terminal_printf("After second fork, PID is %d\n", getpid());
	}

	asm volatile ("cli");
	process_t *new_process = process_load_elf("/mnt/ramdisk/bin/xansh.elf");
	terminal_printf("Process loaded with PID %d\n", new_process->pid);

	while (new_process->status != TASK_STATUS_FINISHED) {}
	terminal_printf("\nProcess finished with code 0x%x\n", new_process->entry_or_return);

	terminal_printf("Kernel is finished running. Press q to page fault!\n");

    char buf;
	file_descriptor_t *kbd = stdin;
	while (true) {
		int read = fread(&buf, 1, 1, kbd);
		if (read != 0) {
			if (buf == 'q') {
				terminal_printf("%c", *(char *)0xA0000000);
			}
		}
	}
}

Any help will be very much appreciated! And if there is anything I left out, I will try and respond quickly.

Posted: **Thu Feb 15, 2024 12:13 am**

minater247 wrote:is this likely something coming from the hardware that's overwriting these values, or something within my own code?

Definitely your code. Were you following a kernel development tutorial?

minater247 wrote:1. Fork

The fork() system call duplicates the userspace caller's context, not the kernel's context. The new kernel thread's context has to be created entirely from scratch.

minater247 wrote:3. Scheduler

You can't switch stacks in inline assembly. Here's a pretty good example of what task switching should look like. Task switching should not require a timer interrupt.

minater247 wrote:4. main

A typical kernel has a single address space, which makes fork() impossible. Why do you want to use fork() inside your kernel?

Posted: **Thu Feb 15, 2024 1:57 am**

Definitely your code.

Alright, good to know. Wanted to make sure it wasn't some oddity of the stack I hadn't seen, thank you!

Were you following a kernel development tutorial?

Actually no in this case, I've seen that page, and while I took some ideas from it, absolutely none of the code came from that set of tutorials. The only place I took any semblance of code from was Bran's IDT/GDT tutorials.

The fork() system call duplicates the userspace caller's context, not the kernel's context. The new kernel thread's context has to be created entirely from scratch.

Oh, I see - so this is more fitting for always being a syscall, then. This was meant to be run on userspace processes, where the kernel copies the stack over and jumps to the process on schedule, but reading through this I'm guessing the first non-kernel thread shouldn't be a fork of the kernel's thread which switches to userspace. And, related:

Why do you want to use fork() inside your kernel?

That's why, to make the first userspace process, while making the fork() function in such a way that any process may be cloned. Although it seems that's not really how it's meant to be done, then.

You can't switch stacks in inline assembly. [...] Task switching should not require a timer interrupt.

I'm not sure I understand why? The reason I used inline assembly was because jumping to actual external assembly required using the stack (which was in an undefined state), and to my present knowledge it just inserts the assembly instruction at the given location? What would be the difference between this and writing the entire function in assembly? I had used C for accessing the process struct's fields simply.

And on the the status of interrupts, why not? Doesn't every pre-emptive scheduler use some sort of timer interrupt to run task scheduling? In my case I've been using the PIT to call the timer_interrupt function to schedule the next process, but is this a bad way to do this for some reason?

Posted: **Thu Feb 15, 2024 9:50 am**

minater247 wrote:
You can't switch stacks in inline assembly. [...] Task switching should not require a timer interrupt.
I'm not sure I understand why? The reason I used inline assembly was because jumping to actual external assembly required using the stack (which was in an undefined state), and to my present knowledge it just inserts the assembly instruction at the given location? What would be the difference between this and writing the entire function in assembly? I had used C for accessing the process struct's fields simply.

It's because the optimizer should run as late as possible in the compilation process. I don't exactly understand why, but I remember computer science articles strongly warning against premature optimization. By the time the optimizer gets to work, there's no longer any way to tell which code was generated by the compiler and which is inline assembly. And optimizers these days will absolutely mangle code they don't understand.

Besides, compilers these days try to mother you. A coder I knew once wrote some bad C code, expecting the compiler to have a minimalistic implementation which would just work. We expected C to be minimalistic. GCC silently inserted code to move an entire array twice -- not minimalistic at all -- just to be "safe". (I no longer agree with GCC or my old friend, but that's another story. I'm just explaining compiler behaviour.)

Posted: **Thu Feb 15, 2024 10:47 am**

eekee wrote:but I remember computer science articles strongly warning against premature optimization.

That's a whole different kind of premature optimization those articles were talking about. They mean programmers squeezing every drop of performance out of code even if the code isn't a bottleneck. Optimizers really have only one place they can run - after the AST is generated but before assembly is output.

The reason why you shouldn't switch stacks in C is simple - switching stacks results in switching to a new call frame, and who knows what will happen because of that.

Posted: **Thu Feb 15, 2024 9:27 pm**

minater247 wrote:I'm not sure I understand why?

Because the compiler generates code that assumes inline assembly will never do that. (Also, because the compiler developers said so.)

minater247 wrote:The reason I used inline assembly was because jumping to actual external assembly required using the stack (which was in an undefined state),

Compiled C code requires the stack to always be in a defined state. If the stack is undefined, you can't use C.

minater247 wrote:and to my present knowledge it just inserts the assembly instruction at the given location?

It also tells the compiler what that inline assembly does so the compiler can optimize around it. If your inline assembly changes something the compiler relies on without telling the compiler, the compiler will generate broken code.

minater247 wrote:What would be the difference between this and writing the entire function in assembly?

You wouldn't write the entire function in assembly, just the part that switches stacks.

minater247 wrote:And on the the status of interrupts, why not? Doesn't every pre-emptive scheduler use some sort of timer interrupt to run task scheduling? In my case I've been using the PIT to call the timer_interrupt function to schedule the next process, but is this a bad way to do this for some reason?

Preemptive schedulers use lots of different things to run task switching, not just timer interrupts. You need to write your task switching code so it can be called from (almost) anywhere in your kernel, so you can have tasks that yield and tasks that block in addition to tasks that get interrupted by a timer.

Posted: **Fri Feb 16, 2024 5:54 am**

nexos wrote:
eekee wrote:but I remember computer science articles strongly warning against premature optimization.
That's a whole different kind of premature optimization those articles were talking about. They mean programmers squeezing every drop of performance out of code even if the code isn't a bottleneck. Optimizers really have only one place they can run - after the AST is generated but before assembly is output.

Thanks, you're probably right about premature optimization. About optimizers though, I recall a very different arrangement. Years ago, GCC would generate assembly language, run GAS to assemble the code, then optimize the resultant machine code. Similarly, Ken Thompson's C compiler outputs machine code, (it's architecture-specific,) but the optimizer is in the linker. There's also the question of how whole-program optimization works if it can't run very late in the build process.

If you like, I could try to find a version of GCC which does this. I probably still have it and its texinfo files in an old Linux VM; I just don't want to look it up right now.

Posted: **Fri Feb 16, 2024 7:10 am**

Hmm interesting. I know modern GCC converts to AST into GIMPLE, and then the optimizer works on GIMPLE. I can understand this arrangement working better as GIMPLE is designed for being optimized, whereas machine code is not. More than likely though GCC got that idea from the LLVM guys.

Posted: **Fri Feb 16, 2024 9:18 am**

nexos wrote:I can understand this arrangement working better as GIMPLE is designed for being optimized, whereas machine code is not. More than likely though GCC got that idea from the LLVM guys.

Unless time travel is involved, this is unlikely, since the GIMPLE optimization stuff is far older than LLVM. And the documentation claims as primary inspiration this paper from 1992. Way before the ecgs war, let alone LLVM.

Posted: **Fri Feb 16, 2024 10:17 am**

Ok, thanks for the clarification. I wasn't entirely sure.

OSDev.org

Return section of stack zeroed when copying: HW or OS cause?

Return section of stack zeroed when copying: HW or OS cause?

Re: Return section of stack zeroed when copying: HW or OS ca

Re: Return section of stack zeroed when copying: HW or OS ca

Re: Return section of stack zeroed when copying: HW or OS ca

Re: Return section of stack zeroed when copying: HW or OS ca

Re: Return section of stack zeroed when copying: HW or OS ca

Re: Return section of stack zeroed when copying: HW or OS ca

Re: Return section of stack zeroed when copying: HW or OS ca

Re: Return section of stack zeroed when copying: HW or OS ca

Re: Return section of stack zeroed when copying: HW or OS ca