User mode problems

Posted: Sat Aug 27, 2011 4:31 am
by zity
Hi
I have recently implemented processes in my system. Everything seems to work properly in kernel space, but unfortunately not in user space. The problem arises when I execute an application from within another application. The first application runs as expected, but as soon as I spawn a second process from it, the first process crashes while waiting for the second process to be up and running. I have considered several possible causes and made several minor fixes related to the stack and the segment selectors, but without any change. Now I've run out of ideas.

The code below is used to spawn a new process. The status variable should of course be thread/process specific, but until I get it working with just two processes I'll leave it as is.

Code:

static volatile int status;

void process_to_usermode(uint64_t rip)
{
   asm volatile("cli");
   asm volatile("mov $0x23, %ax");
   asm volatile("mov %ax, %ds");
   asm volatile("mov %ax, %es");
   asm volatile("mov %ax, %fs");
   asm volatile("mov %ax, %gs");
   asm volatile("mov %rsp, %rax");
   asm volatile("push $0x23");
   asm volatile("push %rax");
   asm volatile("pushf");
   asm volatile("pop %rax");
   asm volatile("or $0x200, %rax");
   asm volatile("push %rax");
   asm volatile("push $0x1B");
   asm volatile("push %%rdx" :: "d"(rip));
   asm volatile("iretq");

// IT WORKS IF I USE A SIMPLE JUMP AND LEAVE THEM RUNNING IN KERNEL MODE (CODE BELOW)
//       void (*init)();
//       init = (void*)rip;
//       init();
}

void process_init(args_t *args)
{
   uint64_t addr, entry;

   addr = elf64_load(args->path, 0);
   if(!addr) // File not loaded
   {
      status = 0;
      printf("ELF loading failed!\n");
      process_exit(1);
   }

   entry = elf64_prepare(addr);
   if(!entry) // Linking failed
   {
      status = 0;
      printf("ELF linking failed!\n");
      process_exit(1);
   }

   // Allocate a new stack in user space
   if(!alloc_page(get_page(USER_STACK_ADDR, 1)))
   {
      printf("Failed to allocate memory for stack!\n");
      process_exit(1);
   }
   memset((void*)USER_STACK_ADDR, 0, USER_STACK_SIZE);
   asm volatile("movq %0, %%rsp" :: "r"(USER_STACK_ADDR+USER_STACK_SIZE));
   
   status = 2; // All good!
   
   kfree((void*)addr);
   process_to_usermode(entry);
}

int process_spawn(char *path, char *str, char *cwd, int modal)
{
   stream_t *file = vfs_open(path, "r");
   int value;

   if(file)
   {
      vfs_close(file);
      args_t *args = (args_t*)kmalloc(sizeof(args_t));
      strcpy(args->path, path);
      strcpy(args->param, str);
      strcpy(args->cwd, cwd);
      
      if(modal)
      {
         args->parent = smp_get_thread();
         args->modal = modal;
      }
      else
      {
         args->parent = 0;
         args->modal = 0;
      }

      status = 1;
      uint64_t pml4 = vmm_create_user_space();
      new_thread(&process_init, args, pml4);
      while(status == 1); // THE FIRST PROCESS CRASHES HERE WHILE WAITING ON THE SECOND PROCESS
      value = status;
      
      if(!value)
      {
         return 0;
      }
      else
      {
         if(args->modal)
         {
            // This is a temporary solution (the parent process should be put to sleep in this case)
            status = 1; 
            while(status == 1);
         }

         return 1;
      }
   }

   return 0;
}
When the first process crashes, it is usually from a page fault at 0x0 because RIP = 0x0. Sometimes it causes a GP fault instead, with a random (but rather large, ~0x1000) GDT descriptor as the error code. The error seems to occur right after the while(status == 1); loop ends.

I really don't know what else to tell you; I'm pretty lost. Please ask if you need more information.
All help is appreciated!
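
As a side note on the remark that the status variable should be thread/process specific: a minimal sketch of one way to do that is to keep one status word per spawn request instead of a single global. The names spawn_state_t, spawn_state_init, spawn_state_set and spawn_state_wait are hypothetical and not part of the posted code; the values 0, 1 and 2 follow the meaning used in process_spawn above. C11 atomics are assumed to be available in the kernel build.

```c
#include <stdatomic.h>

/* Same meaning as the values used in the post:
 * 0 = failed, 1 = still starting, 2 = up and running. */
enum { SPAWN_FAILED = 0, SPAWN_PENDING = 1, SPAWN_RUNNING = 2 };

/* Hypothetical per-spawn state; it could live inside args_t. */
typedef struct {
    _Atomic int status;
} spawn_state_t;

void spawn_state_init(spawn_state_t *s)
{
    atomic_init(&s->status, SPAWN_PENDING);
}

/* Called by the child once loading succeeded or failed. */
void spawn_state_set(spawn_state_t *s, int value)
{
    atomic_store_explicit(&s->status, value, memory_order_release);
}

/* Called by the parent; returns the final status.
 * This still busy-waits, a real kernel would yield or sleep here. */
int spawn_state_wait(spawn_state_t *s)
{
    int v;
    while ((v = atomic_load_explicit(&s->status, memory_order_acquire))
           == SPAWN_PENDING) {
        /* spin */
    }
    return v;
}
```

With this layout two spawns in flight at the same time no longer overwrite each other's status. It does not by itself explain the crash described above.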

Re: User mode problems

Posted: Sat Aug 27, 2011 4:47 am
by xenos
First of all, instead of having lots of asm volatile blocks, I would put all assembler code into a single asm volatile block.

Have you checked which instruction is causing the PF / GPF?
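
A rough sketch of what a single asm volatile block could look like, in case it helps. With separate asm statements the compiler is free to touch registers and the stack between them, so one block with declared inputs and clobbers is safer. The selectors 0x23 and 0x1B and the 0x200 (IF) bit are taken from the posted code; iret_frame_t, make_user_frame and enter_user_mode are hypothetical names, and passing the user stack pointer explicitly (instead of reusing the current RSP) is an assumption of this sketch.

```c
#include <stdint.h>

#define USER_CS   0x1BULL   /* user code selector from the post */
#define USER_DS   0x23ULL   /* user data selector from the post */
#define RFLAGS_IF 0x200ULL  /* interrupt enable flag */

/* The five quadwords iretq pops, lowest address first. */
typedef struct {
    uint64_t rip, cs, rflags, rsp, ss;
} iret_frame_t;

/* Plain C helper: fill in the frame for a jump to user mode. */
iret_frame_t make_user_frame(uint64_t rip, uint64_t user_rsp, uint64_t rflags)
{
    iret_frame_t f = { rip, USER_CS, rflags | RFLAGS_IF, user_rsp, USER_DS };
    return f;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Kernel-only: loads the data segments, pushes the frame and executes
 * iretq in ONE asm statement. Must never be called from user space. */
__attribute__((noreturn))
void enter_user_mode(const iret_frame_t *f)
{
    __asm__ volatile(
        "cli\n\t"
        "mov $0x23, %%ax\n\t"
        "mov %%ax, %%ds\n\t"
        "mov %%ax, %%es\n\t"
        "mov %%ax, %%fs\n\t"
        "mov %%ax, %%gs\n\t"
        "pushq 32(%[f])\n\t"   /* ss     */
        "pushq 24(%[f])\n\t"   /* rsp    */
        "pushq 16(%[f])\n\t"   /* rflags */
        "pushq 8(%[f])\n\t"    /* cs     */
        "pushq 0(%[f])\n\t"    /* rip    */
        "iretq"
        :
        : [f] "r"(f)
        : "rax", "memory");
    __builtin_unreachable();
}
#endif
```

Building the frame in C first also makes it easy to print or inspect the values before the switch, which may help when tracking down which instruction faults.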

Re: User mode problems

Posted: Sat Aug 27, 2011 6:39 am
by zity
The assembly code works as expected. The first process ends up in user mode.
I have very little experience with debuggers, so I don't really know how to determine what instruction caused the error. But I might be able to figure it out at some point.

Re: User mode problems

Posted: Sat Aug 27, 2011 7:41 am
by xenos
How do you test your code? On a real computer or with a PC simulator? I recommend using Bochs - it has many nice logging and debugging features and makes tracking errors rather simple.

Re: User mode problems

Posted: Tue Aug 30, 2011 12:32 am
by zity
I use Bochs to test my system, but I haven't really needed the debugging tools until now. I enabled the debugging report and found a small bug in my VMM, but that's another story. I have been busy the last couple of days, but I managed to pin down the error. It seems to be an error in my scheduler. If I disable scheduling when the second process is executed, both processes keep running as expected on two different cores without any errors.

Guess I'll have to take a closer look at my scheduler.

Re: User mode problems

Posted: Tue Aug 30, 2011 1:41 pm
by zity
So apparently the system crashes the first time the scheduler switches from the child process back to the parent. I've been working on this project for two years now and never have I stumbled upon a more irritating bug.

My system changes tasks in the following way.
  • When the PIT handler is called I send an IPI to all cores and the PIT handler returns.
  • All cores enter the scheduler in order to change task.
  • The scheduler is protected by a spin-lock such that no cores enter the scheduler simultaneously.
  • Every core is assigned a new task and returns with a new RSP pointer.
Every core is assigned a unique TSS with a valid RSP0 value, pointing to a 4 KiB stack. All other values are zeroed.

Anyone seeing a flaw in this design?
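
For reference, the task-switch steps listed above can be sketched roughly as follows. This is a simplified model and not the actual scheduler: the fixed-size task table, the round-robin choice and the names sched_init, sched_add and schedule are all assumptions made for illustration. Each core calls schedule() with its current RSP and gets back the RSP of the task it should run next, with one global spin-lock around the whole decision.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_TASKS 8
#define MAX_CORES 4

typedef struct {
    uint64_t rsp;   /* saved stack pointer of the task */
    int used;       /* slot holds a task */
    int running;    /* some core is currently executing it */
} task_t;

static task_t tasks[MAX_TASKS];
static int current_task[MAX_CORES];              /* -1 = core is idle */
static atomic_flag sched_lock = ATOMIC_FLAG_INIT;

void sched_init(void)
{
    for (int i = 0; i < MAX_TASKS; i++)
        tasks[i].used = tasks[i].running = 0;
    for (int c = 0; c < MAX_CORES; c++)
        current_task[c] = -1;
}

/* Not locked: assumed to be called before the cores start scheduling. */
int sched_add(uint64_t rsp)
{
    for (int i = 0; i < MAX_TASKS; i++) {
        if (!tasks[i].used) {
            tasks[i].rsp = rsp;
            tasks[i].used = 1;
            tasks[i].running = 0;
            return i;
        }
    }
    return -1;
}

/* Called on every core from the IPI handler. */
uint64_t schedule(int core, uint64_t old_rsp)
{
    uint64_t new_rsp = old_rsp;

    while (atomic_flag_test_and_set_explicit(&sched_lock, memory_order_acquire))
        ;   /* only one core inside the scheduler at a time */

    int cur = current_task[core];
    if (cur >= 0) {             /* save the outgoing task */
        tasks[cur].rsp = old_rsp;
        tasks[cur].running = 0;
    }
    current_task[core] = -1;

    /* Round-robin: first task after the current one that no core runs. */
    for (int i = 0; i < MAX_TASKS; i++) {
        int idx = (cur + 1 + i) % MAX_TASKS;
        if (tasks[idx].used && !tasks[idx].running) {
            tasks[idx].running = 1;
            current_task[core] = idx;
            new_rsp = tasks[idx].rsp;
            break;
        }
    }

    atomic_flag_clear_explicit(&sched_lock, memory_order_release);
    return new_rsp;
}
```

One thing the sketch glosses over: the outgoing task is marked as not running while the core is still executing on its stack, so a real kernel has to make sure no other core resumes that task before the first core has actually switched away.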

Re: User mode problems

Posted: Tue Aug 30, 2011 2:52 pm
by Combuster
zity wrote:Anyone seeing a flaw in this design?
That's some severe lock abuse in there, but if properly implemented it should only make your system unscalable, not cause a crash.