Lazy address space switch schedulers
- OSwhatever
- Member
- Posts: 595
- Joined: Mon Jul 05, 2010 4:15 pm
Lazy address space switch schedulers
Hello,
I wonder how you have solved minimizing process switches (virtual address space switches) when scheduling. For example, if a process has several threads with the same priority, you might want to run as many of that process's threads as possible during the current time slot before switching to another process.
How do you do it?
Do you have global ready queues with processes, and then ready queues for threads in every process?
Do you instead sort the ready queue so that threads from the same process get inserted next to each other?
Do you let processes handle their threads themselves (the kernel has no concept of threads), so that user processes do their own scheduling?
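For illustration, the second idea (one global ready queue where threads of the same process are kept adjacent) could be sketched roughly like this; the struct layout and names are invented:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: keep one global ready queue, but insert each thread right
 * after the last queued thread of the same process, so the scheduler
 * runs them back to back and switches address spaces less often. */

typedef struct thread {
    int pid;                /* owning process */
    int tid;
    struct thread *next;    /* ready-queue link */
} thread_t;

static thread_t *ready_head = NULL;

static void ready_insert_grouped(thread_t *t)
{
    thread_t **link = &ready_head;
    thread_t *last_same = NULL;

    /* Find the last queued thread belonging to the same process. */
    for (thread_t *cur = ready_head; cur; cur = cur->next)
        if (cur->pid == t->pid)
            last_same = cur;

    if (last_same) {            /* splice in right after it */
        t->next = last_same->next;
        last_same->next = t;
    } else {                    /* no sibling queued: append at tail */
        while (*link)
            link = &(*link)->next;
        t->next = NULL;
        *link = t;
    }
}
```

A real scheduler would of course also have to weigh this grouping against per-thread priorities.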
Re: Lazy address space switch schedulers
Does your OS have any concept of threads, or is that something that the process must take care of itself?
Re: Lazy address space switch schedulers
I'm doing the very opposite: the scheduling primitive is the thread, not the process. The process is only a container for all the threads. I'm not really satisfied with this, as the scheduling would be better on a process basis. It would be easier to do fairer scheduling and to coalesce all the threads of a process.
I would let the process handle its own threads in user space, though I want thread preemption, and the question remains how to solve that. Also, if thread priorities differ, you only get competition inside that process. Now, I'm not sure how crucial that is for real-world implementations, for example a very high priority thread that should compete with all threads in the system, not only the threads inside its own process. Priority inheritance would not have the same meaning in such a system.
Re: Lazy address space switch schedulers
Doesn't your scheduler know the pid of each thread from the thread struct? If so, can't it scan the ready list for a thread with the same pid? Or better still, maintain a linked list of ready threads from the same process?OSwhatever wrote:I'm doing the very opposite: the scheduling primitive is the thread, not the process. The process is only a container for all the threads. I'm not really satisfied with this, as the scheduling would be better on a process basis. It would be easier to do fairer scheduling and to coalesce all the threads of a process.
If a trainstation is where trains stop, what is a workstation ?
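gerryg400's first suggestion (scanning the ready list for a thread of the currently running process) could be sketched roughly like this; the names are invented, and a real scheduler would also have to respect priorities:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: pick the next ready thread, preferring one that belongs to
 * the currently running process so no address space switch is needed. */

typedef struct thread {
    int pid;                /* owning process, read from the thread struct */
    struct thread *next;    /* ready-list link */
} thread_t;

static thread_t *pick_next(thread_t *ready, int current_pid)
{
    /* First pass: look for a thread of the current process. */
    for (thread_t *t = ready; t; t = t->next)
        if (t->pid == current_pid)
            return t;
    /* Otherwise fall back to the head of the ready list. */
    return ready;
}
```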
Re: Lazy address space switch schedulers
That is the obvious solution I've thought about: a ready list of threads for each process (actually one ready list per priority, for every process). These same lists are then inserted into the scheduler's ready lists, which represent the processes that are ready. So the scheduler's ready lists basically point to lists of ready threads. Currently in my system each thread has a "pid", as a consequence of the scheduling primitive being threads. Process pids are not available, since I haven't found any use for them yet.gerryg400 wrote:Doesn't your scheduler know the pid of each thread from the thread struct? If so, can't it scan the ready list for a thread with the same pid? Or better still, maintain a linked list of ready threads from the same process?
This type of structure would give you a lot of flexibility in how you schedule the threads.
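The two-level structure described above might look roughly like this; the names and the number of priorities are invented:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: the scheduler's per-priority ready lists link *processes*,
 * and each entry in turn heads a per-priority list of that process's
 * ready threads. Picking the next thread walks two levels, but
 * naturally runs a batch of same-process threads before the next
 * address space switch. */

#define NPRIO 4                 /* priority 0 is highest here */

typedef struct thread {
    struct thread *next;        /* per-process ready-list link */
} thread_t;

typedef struct process {
    thread_t *ready[NPRIO];     /* this process's ready threads, per priority */
    struct process *next;       /* scheduler ready-list link */
} process_t;

typedef struct {
    process_t *ready[NPRIO];    /* ready processes, per priority */
} sched_t;

/* Return the highest-priority ready thread; its owner via *out_proc. */
static thread_t *sched_pick(sched_t *s, process_t **out_proc)
{
    for (int p = 0; p < NPRIO; p++)
        for (process_t *proc = s->ready[p]; proc; proc = proc->next)
            if (proc->ready[p]) {
                *out_proc = proc;
                return proc->ready[p];
            }
    return NULL;
}
```

The invariant here would be that a process is linked into the scheduler's list at every priority where it has at least one ready thread.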
Re: Lazy address space switch schedulers
If the kernel only knows about the main app thread, how are system calls handled from threads created by the user process?berkus wrote:I read the "CPU Inheritance Scheduling" whitepaper, after which I decided that a recursive system allowing threads to schedule other threads is a good idea, so what I have now is more or less an implementation of this idea - the kernel only knows of the "highest level threads", which are the apps' main threads, and such a thread is free to do whatever scheduling it needs to do. Going deeper (scheduled threads scheduling their own threads) does not seem necessary at this point, so I probably won't even go there.
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: Lazy address space switch schedulers
It's simple: all threads are treated as one and the same. Mind you, there are no subthread-specific calls to the kernel, since you do that stuff in userspace anyway. So the kernel doesn't care and doesn't need to care.
Re: Lazy address space switch schedulers
How do you, for example, read from a file in one thread? One way or another you need to involve the kernel for that request.Combuster wrote:It's simple: all threads are treated as one and the same. Mind you, there are no subthread-specific calls to the kernel, since you do that stuff in userspace anyway. So the kernel doesn't care and doesn't need to care.
If the kernel sees all threads "as one", does that mean "the big kernel lock" for every process? You lose concurrency if you choose that method.
Re: Lazy address space switch schedulers
Who needs the big kernel lock anyway? Just because Linux does it doesn't mean you should make the same mistake.
Re: Lazy address space switch schedulers
Depends on the system.berkus wrote:I'd be very curious to hear which part exactly REQUIRES kernel involvement here?OSwhatever wrote:How do you, for example, read from a file in one thread? One way or another you need to involve the kernel for that request.
If the file system code is in the kernel, you obviously have to go there for the operation.
If you have a file system server somewhere, you need to use IPC to request the operation, and just using IPC means you have to go to the kernel.
Re: Lazy address space switch schedulers
Linux has no general big kernel lock. That was removed years ago.Combuster wrote:Who needs the big kernel lock anyway? Just because Linux does it doesn't mean you should make the same mistake.
If you don't have a kernel stack for every thread, you don't have concurrency in the kernel: only one thread can be in the kernel at a time, so from the point of view of the process it is like a big kernel lock.
The kernel stack is normally created during thread creation. If the kernel was not involved in the stack creation, there is no specific kernel stack for that thread.
This only applies to kernels that want kernel stacks. You can actually use the thread's user stack in the kernel, but that requires that the kernel validates the stack pointer, checks whether the area is mapped, and so on (if you want full protection, that is).
I'm sure there are more ways to solve this, preallocating a set of kernel stacks for example.
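The preallocation idea could be sketched as a simple free pool of kernel stacks; the sizes and names here are invented:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: instead of one kernel stack per thread, preallocate a small
 * pool of kernel stacks and hand one out whenever a thread enters the
 * kernel. The pool size bounds how many threads can be inside the
 * kernel concurrently. */

#define KSTACK_SIZE  4096
#define KSTACK_COUNT 8

static unsigned char pool[KSTACK_COUNT][KSTACK_SIZE];
static void *free_list[KSTACK_COUNT];
static int free_top = 0;            /* number of stacks currently free */

static void kstack_pool_init(void)
{
    for (int i = 0; i < KSTACK_COUNT; i++)
        free_list[free_top++] = pool[i];
}

/* Returns a kernel stack base, or NULL if the pool is exhausted
 * (a real kernel would have to block or grow the pool here). */
static void *kstack_alloc(void)
{
    return free_top > 0 ? free_list[--free_top] : NULL;
}

static void kstack_free(void *stack)
{
    free_list[free_top++] = stack;
}
```

In a real kernel the alloc/free paths would also need locking or per-CPU pools, since entry to the kernel can happen concurrently.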
- alethiophile
- Member
- Posts: 90
- Joined: Sat May 30, 2009 10:28 am
Re: Lazy address space switch schedulers
Presumably your filesystem is on a disk somewhere? If it's purely a ramfs, then you can do it in user space; on a disk, you need the kernel.berkus wrote:I'd be very curious to hear which part exactly REQUIRES kernel involvement here?
If I had an OS, there would be a link here.
Re: Lazy address space switch schedulers
Years ago? The Linux kernel, as of 2.6.34-rc2, had hundreds of instances of the BKL.OSwhatever wrote:Linux has no general big kernel lock. That was removed years ago.
If a trainstation is where trains stop, what is a workstation ?
Re: Lazy address space switch schedulers
My threads don't have anything but a "local" stack mapped in the parent process's address space. This stack is not mapped into the kernel's address space in any way, nor is it shared; it is part of the process's private heap. This of course causes difficulties when the thread's stack is active and one needs to access parts of the kernel. Although not very efficient (I haven't actually done any measurements), I've solved the issue by reserving a small part of the shared memory region. The memory of the thread's stack is copied to that region, and that region is set as the active stack until any activities that need the thread's stack have been performed. Finally, the modified stack is copied back.OSwhatever wrote:This only applies to kernels that want kernel stacks. You can actually use the thread's user stack in the kernel, but that requires that the kernel validates the stack pointer, checks whether the area is mapped, and so on (if you want full protection, that is).
I'm sure there are more ways to solve this, preallocating a set of kernel stacks for example.
Although it works, I can imagine that copying 8 kB (it is usually less, since only the "used" part of the stack is copied) twice each time the thread's stack needs to be used in the kernel's address space isn't very efficient. It was more of a test to see whether I could actually pull it off properly. Unfortunately, I recently noticed it's still somewhat unstable (e.g. VirtualBox seems to give problems with processes, but not with threads), but that may be due to a bug somewhere.
When the chance of succeeding is 99%, there is still a 50% chance of that success happening.
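The copy-in/copy-out scheme described in the post above might be sketched roughly like this; the region size and names are invented, and a real implementation would also switch the active stack pointer between the two copies:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch: the used part of the thread's private stack is copied into a
 * reserved shared region before the kernel works on it, and copied
 * back afterwards. Stacks are assumed to grow downward, so the used
 * bytes lie between the current stack pointer and the stack top. */

#define SHARED_STACK_SIZE 8192

static unsigned char shared_region[SHARED_STACK_SIZE];

/* Copy the used portion of the stack into the top of the shared
 * region; returns the number of bytes copied. */
static size_t stack_copy_in(const unsigned char *stack_top,
                            const unsigned char *stack_ptr)
{
    size_t used = (size_t)(stack_top - stack_ptr);   /* bytes in use */
    memcpy(shared_region + SHARED_STACK_SIZE - used, stack_ptr, used);
    return used;
}

/* Copy the (possibly modified) contents back to the private stack. */
static void stack_copy_out(unsigned char *stack_ptr, size_t used)
{
    memcpy(stack_ptr, shared_region + SHARED_STACK_SIZE - used, used);
}
```

This matches the post's observation that only the "used" part of the stack needs to move, but also makes the per-entry copy cost explicit.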