A Better Way of Loading Processes

AJ · Post by AJ » Tue Jul 03, 2007 8:07 am

Hi,

I am taking the time to go through my scheduler and task management systems (it's time to look for something a little more robust) and was re-visiting my process-initialisation code. I just wondered how people generally tackle task initialisation. At present, I do the following (section which probably needs re-coding highlighted...):

* Load the object file 'as-is' into RAM.
* Extract relocation data and store it in a 'unified format' understood by my kernel (so ELF, COFF, PE, etc. end up looking the same).
* Create a new page table, which includes the top 1GB containing my kernel.
------------Optimisation Needed Below?---------------------
* Switch to new process space having disabled the scheduler.
* Relocate the executable, set up stack(s).
* Switch back to kernel space.
* Add the task to the currently running list.
* Re-enable the scheduler.
-------------------Optimisation Ends?------------------------------
* Run the task when it is due for scheduling.

The bits I don't like are having to disable the scheduler for so long and having to relocate the entire exe rather than paging on-demand. I experimented with not disabling the scheduler at all, but that would mean the additional overhead of having to save the outgoing CR3 for task switches (otherwise, CR3 gets trashed if a task switch occurs during initialisation).
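
For reference, saving the outgoing CR3 on every switch might look roughly like the sketch below. It is a user-space simulation, not AJ's code: `struct task`, `switch_address_space` and the `simulated_cr3` variable standing in for the real register are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical per-task record; only the field needed here is shown. */
struct task {
    uint32_t cr3;  /* page directory base the task was last running under */
};

/* Stand-in for the real CR3 register so the logic runs in user space.
 * A kernel would use inline assembly (mov to/from CR3) instead. */
static uint32_t simulated_cr3;

static uint32_t read_cr3(void)        { return simulated_cr3; }
static void     write_cr3(uint32_t v) { simulated_cr3 = v; }

/* Switch address spaces, recording where the outgoing task really was.
 * If the outgoing task had temporarily entered a half-built process
 * space, that is the value restored when it next runs. */
void switch_address_space(struct task *outgoing, const struct task *incoming)
{
    outgoing->cr3 = read_cr3();
    if (incoming->cr3 != outgoing->cr3)   /* avoid a needless TLB flush */
        write_cr3(incoming->cr3);
}
```

With this in place, a task that is pre-empted while it is inside the new process space resumes in that same space, so the scheduler would not need to be disabled during relocation.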

As for paging on demand, that's a bit beyond my kernel at present (I do it for heaps, but not for loadable data).

Any ideas how I can optimise things a bit (or not have to switch to the new task space)?

Cheers,
Adam

frank · Post by **frank** » Tue Jul 03, 2007 8:10 am

In my OS instead of switching the address spaces I just map the application pages into the kernel space mapping temporarily. Then when the application is done being set up I can just unmap those pages. No address space switching.
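
A rough sketch of that idea, simulated in user space. The names `phys_mem`, `map_temp`, `unmap_temp` and `fill_frame` are hypothetical, and the "mapping" here is just a pointer into an array rather than a real page-table update:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NFRAMES   8u

static uint8_t phys_mem[NFRAMES][PAGE_SIZE];  /* pretend physical RAM */
static int     window_frame = -1;             /* frame in the temp window, -1 if free */

/* Map one frame at the kernel's temporary window; fails if the window is busy.
 * A real kernel would write a page-table entry and invalidate the old TLB entry. */
static uint8_t *map_temp(unsigned frame)
{
    if (frame >= NFRAMES || window_frame != -1)
        return NULL;
    window_frame = (int)frame;
    return phys_mem[frame];
}

/* Release the temporary window so another frame can be mapped. */
static void unmap_temp(void)
{
    window_frame = -1;
}

/* Copy up to one page of executable data into a frame that will later be
 * mapped into the new process, without ever switching address space. */
int fill_frame(unsigned frame, const void *src, size_t len)
{
    uint8_t *va;

    if (len > PAGE_SIZE)
        return -1;
    va = map_temp(frame);
    if (va == NULL)
        return -1;
    memcpy(va, src, len);
    memset(va + len, 0, PAGE_SIZE - len);  /* zero the tail of the page */
    unmap_temp();
    return 0;
}
```

With a single window like this, two launches running at the same time would collide, so the window needs a lock (or there needs to be one window per CPU).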

AJ · Post by AJ » Tue Jul 03, 2007 8:42 am

Hi,

I used to do that too, but of course you have to be very careful to set up only one task at a time (launching 2 processes at once may not come up too often...). I guess that may be a better option than what I am currently doing?...

Thanks for the answer,
Adam

frank · Post by **frank** » Tue Jul 03, 2007 8:58 am

You could use a mutex to protect the code that launches tasks. That way you are guaranteed to avoid launching more than one process at once. Of course you should probably do that anyway, regardless of the method you choose. I just know that my way has no problems with leaving the scheduler running. Just remember that the choice is up to you.
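
As a minimal sketch of that guard, here is a try-lock built on C11 `atomic_flag`, standing in for whatever mutex the kernel provides. The names `launch_begin` and `launch_end` are made up for this example, and a real kernel would more likely block or yield than ask the caller to retry:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One flag guarding the whole task-launch path. */
static atomic_flag launch_lock = ATOMIC_FLAG_INIT;

/* Try to enter the launch code. Returns true if the caller now owns the
 * lock, false if another launch is already in progress. */
bool launch_begin(void)
{
    return !atomic_flag_test_and_set_explicit(&launch_lock,
                                              memory_order_acquire);
}

/* Leave the launch code so the next launcher can proceed. */
void launch_end(void)
{
    atomic_flag_clear_explicit(&launch_lock, memory_order_release);
}
```

The launcher would call `launch_begin()` before mapping the new task's pages and `launch_end()` once they are unmapped again.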

Brendan · Post by **Brendan** » Tue Jul 03, 2007 9:11 am

Hi,

AJ wrote: I am taking the time to go through my scheduler and task management systems (it's time to look for something a little more robust) and was re-visiting my process-initialisation code. I just wondered how people generally tackle task initialisation.

I tend to use something like this:

* Ask the linear memory manager to create a new address space containing the kernel's pages and nothing else
* Create the framework for the new process (allocate a "process ID" and set some information about the process)
* Create an initial thread for the process (kernel stack, scheduler priority, etc)
* Make the new thread "ready to run"
* Send a "startup" message to the new thread containing details
* Return back to the caller
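
The creation half can be sketched as below. This is a simplified user-space model, not Brendan's kernel: the address space and kernel stack are left out, the "startup" message is reduced to a pointer to the executable's name, and every identifier (`struct thread`, `make_ready`, `create_process`) is an assumption.

```c
#include <stddef.h>

#define MAX_THREADS 16

enum thread_state { T_FREE = 0, T_READY, T_RUNNING };

struct thread {
    enum thread_state state;
    int               pid;
    const char       *startup_msg;  /* pending "startup" message, NULL if none */
    struct thread    *next_ready;   /* link in the ready queue */
};

static struct thread  threads[MAX_THREADS];
static struct thread *ready_head, *ready_tail;
static int            next_pid = 1;

/* Generic "put this thread on the run queue" (FIFO here for simplicity). */
void make_ready(struct thread *t)
{
    t->state = T_READY;
    t->next_ready = NULL;
    if (ready_tail != NULL)
        ready_tail->next_ready = t;
    else
        ready_head = t;
    ready_tail = t;
}

/* Create a process: fill in the bookkeeping, queue the initial thread,
 * post the startup message and return at once. No file I/O happens here.
 * Returns the new process ID, or -1 if no thread slot is free. */
int create_process(const char *exe_path)
{
    size_t i;

    for (i = 0; i < MAX_THREADS; i++) {
        struct thread *t = &threads[i];
        if (t->state != T_FREE)
            continue;
        t->pid = next_pid++;
        t->startup_msg = exe_path;  /* the message carries the file name */
        make_ready(t);
        return t->pid;
    }
    return -1;
}
```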

When the scheduler gives the new thread some CPU time, it starts running the kernel's "process startup" routine. This goes like this:

* Wait until the "startup" message is received
* Parse the "startup" message (including getting the executable's file name)
* Attempt to open the executable file
* Attempt to load the executable file's header into the address space
* Check stuff!
* Load the rest of the executable file into the address space (unless I support memory mapped files - never got that far yet though)
* Send a "started OK" message to the thread that started the process
* Jump to the executable file's entry point (actually an IRET to switch back to CPL=3)
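
As one possible reading of the "Check stuff!" step, here is a sketch that validates a header before the rest of the file is loaded. Brendan doesn't say which executable format he loads; 32-bit little-endian x86 ELF is assumed here only because ELF was mentioned earlier in the thread, and `check_elf32_header` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 0 and stores the entry point if hdr looks like a 32-bit,
 * little-endian, x86 ELF executable; returns -1 otherwise. */
int check_elf32_header(const uint8_t *hdr, size_t len, uint32_t *entry)
{
    static const uint8_t magic[4] = { 0x7F, 'E', 'L', 'F' };
    uint16_t type, machine;

    if (len < 28)                       /* need everything up to e_entry */
        return -1;
    if (memcmp(hdr, magic, 4) != 0)     /* e_ident magic bytes */
        return -1;
    if (hdr[4] != 1)                    /* EI_CLASS: 1 = 32-bit */
        return -1;
    if (hdr[5] != 1)                    /* EI_DATA: 1 = little-endian */
        return -1;

    type    = (uint16_t)(hdr[16] | (hdr[17] << 8));   /* e_type */
    machine = (uint16_t)(hdr[18] | (hdr[19] << 8));   /* e_machine */
    if (type != 2)                      /* ET_EXEC */
        return -1;
    if (machine != 3)                   /* EM_386 */
        return -1;

    /* e_entry is a 4-byte little-endian field at offset 24. */
    *entry = (uint32_t)hdr[24]
           | ((uint32_t)hdr[25] << 8)
           | ((uint32_t)hdr[26] << 16)
           | ((uint32_t)hdr[27] << 24);
    return 0;
}
```

If the check fails, the startup routine can send a failure message back to the thread that started the process instead of "started OK".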

Of course this means my kernel function to start a new process looks relatively fast because there's no waiting for disk I/O or task switches involved (although a low priority thread may be pre-empted by the process it started). It's also good for multi-CPU and NUMA, as the new process can start running on the CPU it's meant to be run on (not necessarily the same CPU as the thread that started the process).

Cheers,

Brendan

AJ · Post by AJ » Tue Jul 03, 2007 9:30 am

I like the sound of that method, but the whole CPL0 --> CPL3 thing threw me a bit.

If I understand you correctly, you would need to set up two ring 0 stacks - one to switch to the kernel-mode portion of code and then one to switch to the user-mode portion of code. These two stacks could be set up next to each other in RAM. When the kernel-mode code finishes executing and needs the iret to start the CPL3 code, it would get the scheduler to adjust its ESP, resulting in the next iret switching to user mode.

I'm going to start playing with it now...

Thanks,
Adam

Brendan · Post by **Brendan** » Tue Jul 03, 2007 11:47 am

Hi,

AJ wrote: If I understand you correctly, you would need to set up two ring 0 stacks - one to switch to the kernel-mode portion of code and then one to switch to the user-mode portion of code. These two stacks could be set up next to each other in RAM. When the kernel-mode code finishes executing and needs the iret to start the CPL3 code, it would get the scheduler to adjust its ESP, resulting in the next iret switching to user mode.

All my kernels have always been interruptible and preemptable. Every thread has its own kernel stack (and possibly another user-level stack, if it's not a "kernel thread"). This means that if there are 1234 threads then there are 1234 kernel stacks, and spawning any new thread for any reason involves allocating a new kernel stack for it. In this case, spawning a process like I do is simple.

Some people use a "do the task switch when returning to CPL=3" trick, which seems simpler and means that you can have a single kernel stack (or one per CPU) shared by everything, but it also means you can't have a preemptable kernel and can't have kernel threads. In this case, spawning a process like I do will be hard...

Cheers,

Brendan

AJ · Post by AJ » Tue Jul 03, 2007 2:13 pm

Hi,

I currently have a separate PL0 stack for each task and plan to keep it that way. I have a lot of design to do on paper first before I properly implement my 'final' task manager, but given your first answer, I am now thinking along the lines of creating a 'task setup process' in kernel space.

Once process setup is complete (the new task space, PD, stack, etc. are set up), this kernel-stack will then call a 'privilege demotion' interrupt. The task then gets put on the sleep queue. When the scheduler gets round to putting it in the run queue again (depending on priority), it will change the segment selectors to RPL3. I am thinking that a space will be left on the stack for ESP3 and SS3 during the original process creation so that the scheduler can easily 'demote' any process to ring 3. Although it sounds convoluted, I'm sure it won't be too bad to implement...

As you can see, I still have some thinking ahead of me.

Thanks again for the help and ideas,
Adam

Brendan · Post by **Brendan** » Tue Jul 03, 2007 9:24 pm

Hi,

AJ wrote: Once process setup is complete (the new task space, PD, stack, etc. are set up), this kernel-stack will then call a 'privilege demotion' interrupt.

Why? The kernel can just push some data onto its own stack, then use IRET to "return" to the CS:EIP that it just pushed onto its stack. You don't need any interrupt (you just pretend there was one).
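
A minimal sketch of building that pretend interrupt frame in 32-bit protected mode. The selector values assume a hypothetical GDT with user code at 0x18 and user data at 0x20 (both with RPL 3), and `build_user_iret_frame` is an illustrative name; the final `iret` itself would be a line of assembly that loads ESP with the returned pointer.

```c
#include <stdint.h>

/* Assumed GDT layout (hypothetical): adjust to the kernel's own selectors. */
#define USER_CS 0x1Bu            /* 0x18 | RPL 3 */
#define USER_SS 0x23u            /* 0x20 | RPL 3 */

#define EFLAGS_RESERVED 0x002u   /* bit 1 is always set */
#define EFLAGS_IF       0x200u   /* interrupts enabled in user mode */

/* Push the five dwords that IRET pops when returning to a less privileged
 * level. From the lowest address up they are EIP, CS, EFLAGS, ESP, SS, so
 * they are pushed in the reverse order. Returns the new kernel stack
 * pointer, which is what ESP must hold when IRET executes. */
uint32_t *build_user_iret_frame(uint32_t *kstack_top,
                                uint32_t entry, uint32_t user_esp)
{
    uint32_t *sp = kstack_top;

    *--sp = USER_SS;                      /* SS for CPL=3 */
    *--sp = user_esp;                     /* ESP for CPL=3 */
    *--sp = EFLAGS_RESERVED | EFLAGS_IF;  /* EFLAGS */
    *--sp = USER_CS;                      /* CS for CPL=3 */
    *--sp = entry;                        /* EIP: the executable's entry point */
    return sp;
}
```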

AJ wrote: The task then gets put on the sleep queue.

Why? At this point the new thread was running, and can continue running...

AJ wrote: When the scheduler gets round to putting it in the run queue again (depending on priority), it will change the segment selectors to RPL3.

I use the same DS, ES, FS, GS segments for both CPL=0 and CPL=3, and never change them. The IRET changes SS and CS as part of switching from CPL=0 to CPL=3. If you use different DS, ES, FS, GS segments for CPL=0 and CPL=3, then you can load the CPL=3 segments into the segment registers before you do the IRET.

AJ wrote: I am thinking that a space will be left on the stack for ESP3 and SS3 during the original process creation so that the scheduler can easily 'demote' any process to ring 3. Although it sounds convoluted, I'm sure it won't be too bad to implement...

Think of it like this...

The kernel API's "create process" function allocates some RAM for some data structures, stuffs some data into those data structures and returns to whoever called it (no blocking, no task switching, etc). It doesn't really need anything special, except for a "put Thread X on a scheduler run queue" function (that should be a generic scheduler function that's used when any blocked thread is unblocked for any reason).

The kernel's "process startup code" (that the new process' initial thread runs) doesn't really need anything special either - some code to get an executable file into an address space (standard file I/O and some memory management), and a (pretend) IRET.

Cheers,

Brendan

OSDev.org
