Killing a waiting thread

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Killing a waiting thread

Post by FlashBurn »

So, as the topic says: how do you kill a waiting thread?

I just wanted to write my taskKill function, but then I came across the problem of waiting threads. I don't know how to handle it, because either I have to know what the thread is waiting for and write code to take it out of a semaphore (for example), or I have to find a way to let it finish its code and then kill it; but that raises the problem of how I can tell that it is finished.

I can kill a thread that is ready, sleeping, or that asks the kernel to kill itself. I also can't kill a thread which is running on another CPU (the same problem as with the waiting thread).
Creature
Member
Posts: 548
Joined: Sat Dec 27, 2008 2:34 pm
Location: Belgium

Re: Killing a waiting thread

Post by Creature »

This really isn't my area of expertise (but then again, nothing really is), but what if your threads took one or more function arguments, one of them being something like a "volatile bool *ready" (set to false at the start)? You could give each thread object a boolean variable that is passed through that pointer, and the thread would set the object it points to to true when it is finished.
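A minimal sketch of that idea in C, using C11 atomics rather than a plain volatile (volatile alone does not give you cross-thread ordering guarantees); the names worker and run_worker are invented for illustration:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Each worker receives a pointer to a flag it sets when finished. */
static void *worker(void *arg)
{
    atomic_bool *done = arg;
    /* ... do the thread's real work here ... */
    atomic_store(done, true);        /* signal completion before returning */
    return NULL;
}

int run_worker(void)
{
    atomic_bool done = false;
    pthread_t t;
    if (pthread_create(&t, NULL, worker, &done) != 0)
        return -1;
    pthread_join(&t, NULL);          /* wait until the thread has exited */
    return atomic_load(&done) ? 0 : -1;  /* the flag confirms it finished */
}
```

Note that pthread_join already guarantees the thread is gone; the flag matters in designs where something polls for completion instead of joining.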
When the chance of succeeding is 99%, there is still a 50% chance of that success happening.
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: Killing a waiting thread

Post by FlashBurn »

This solves one problem with that approach, but there is still another to be solved ;)

So the thread sets this var to true, but it still has time left to run, it enters another semaphore, and the same thing starts all over again ;)

I thought a flag indicating that the thread should be killed would be good, but then I need to check for this flag whenever a thread wants to start waiting (I mean not in the scheduler, but in the code which puts the thread into a waiting queue). If the flag is set, that code won't put the thread into the queue, but will instead ask the scheduler to kill the thread.

I think the problem of a thread that holds a semaphore and then goes crazy and needs to be killed will never be solved, or will be hard to solve. So I will concentrate on killing a thread which is waiting, and kill it once it has finished everything.
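That "check the kill flag before parking the thread" idea could look roughly like this (a user-space simulation, not real kernel code; the struct fields and the reap helper are invented):

```c
#include <stdbool.h>

struct thread {
    bool kill_pending;          /* set by taskKill on a waiting thread    */
    bool on_wait_queue;
    bool dead;
};

/* Stand-in for the scheduler's actual kill path. */
static void reap(struct thread *t) { t->dead = true; }

/* Hypothetical hook: every code path that would block a thread calls
 * this instead of enqueueing it directly. */
static void begin_wait(struct thread *t)
{
    if (t->kill_pending) {      /* don't enqueue a doomed thread...       */
        reap(t);                /* ...hand it to the scheduler instead    */
        return;
    }
    t->on_wait_queue = true;    /* normal path: park it in the queue      */
}
```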
gravaera
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Re: Killing a waiting thread

Post by gravaera »

Hi:

The very fact that a thread holds a lock means that it is currently accessing a shared resource, executing code that accesses hardware registers, or performing some other critical operation. As a rule, you do not want to kill a thread that holds a lock, or else the critical resource is very likely to be left in an unusable state.

To kill a task which holds a lock, you must then have a 'schedulerIsWaiting' flag on every task. Before a CPU pulls a new task, it checks this flag. If the flag is set, the CPU pulls another task and executes it instead. The scheduler can set this flag at any time, and thereby lock a task off from being pulled by CPUs.

If a task is currently being executed and the scheduler needs to modify it, the scheduler sets this flag (no lock is needed to set it: only the scheduler can set or unset it, and it can easily be designed such that race conditions cannot occur on it). When the CPU that is executing the task gets a timer IRQ, it will of course check the current task's quantum for expiry.

If the quantum has expired, the CPU will pull a new task. But here you also insert a check for the 'schedulerIsWaiting' flag: if the flag is set AND the lockCount for the process == 0, the CPU will pull a new task even if the task's quantum has not expired.

Read that as: if the scheduler wants the thread, but the thread is currently being executed AND holds a lock, then continue executing the thread until its lockCount == 0, and then release it. The scheduler then has exclusive access to the thread, since other CPUs won't pull it while the 'schedulerIsWaiting' flag is set. Thus you effectively stop the scheduler from being able to modify a process (kill it, move it to another node in a non-CC-NUMA setup, or to another computer in a distributed environment where that is supported) until the process holds no locks.

The 'lockCount' is simply a counter placed into every thread's control block (assuming that you schedule threads and not processes and their threads at once), which is incremented every time a thread acquires a waitlock or sleeplock, and decremented whenever a thread releases one.

The overhead for the increment/decrement on every lock acquire/release is slightly cumbersome, but justifiable.

In other words, before a task is killed, the scheduler must set the 'schedulerIsWaiting' flag on all the threads of the process it wants to kill, and then spin on every thread in the process in sequence waiting for the lockCount to reach zero. After it has spun on the last thread in the process, the scheduler knows it now has exclusive access to both the process and all of its threads, and it may do as it pleases.

So we actually don't kill processes/threads that hold locks.
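The bookkeeping described above might be sketched like this (a simplified single-address-space model for illustration; the field and function names are assumptions, not from any real kernel):

```c
#include <stdbool.h>

struct thread {
    int  lock_count;            /* locks currently held                    */
    bool scheduler_is_waiting;  /* set and cleared only by the scheduler   */
};

/* Called on every waitlock/sleeplock acquire and release. */
static void lock_acquired(struct thread *t) { t->lock_count++; }
static void lock_released(struct thread *t) { t->lock_count--; }

/* A CPU may pull this thread only if the scheduler hasn't claimed it. */
static bool cpu_may_pull(const struct thread *t)
{
    return !t->scheduler_is_waiting;
}

/* On a timer tick, the CPU hands the thread over early if the scheduler
 * wants it and it no longer holds any lock. */
static bool must_release_to_scheduler(const struct thread *t)
{
    return t->scheduler_is_waiting && t->lock_count == 0;
}
```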

--All the best,
gravaera
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: Killing a waiting thread

Post by FlashBurn »

Yeah, but then you will have the same problem I mentioned above: what if the thread acquires a semaphore again and again between the timer IRQs?

But I will implement the idea with the counter for held locks.

I was just about to press "submit" when another problem came to my mind. The problem with killing a thread is not only locks, but also the thread waiting for something. The trouble here is that it makes the code very complicated to record what a thread is waiting for and where, then take it out of the waiting queue, and inform any other threads which may be waiting on the thread being killed.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Killing a waiting thread

Post by Owen »

Both POSIX and Win32 threads say "don't kill a thread". PThreads, in particular, says a thread killed while holding a lock leaves the lock in an undefined state, and most implementations will leave it locked by the now-dead thread (with the added fun that it's illegal to destroy a locked mutex).

In general, the only time your OS should kill a thread is when, for one reason or another (e.g. exit()), the process is exiting. In this case, whatever resource the thread was working on is going to go away anyway (and if the resource isn't, e.g. a file, then the process is misbehaved). Well-designed processes should probably tell their worker threads to quit and then join them before exiting. Besides a process asking to terminate itself, the primary reason to kill one is that it has done something wrong (e.g. SIGSEGV); in that case, the process is in a bad state anyway.

Killing processes has the added complexity that shared mutexes often need to be put back into a known state, generally unlocked (though some cases, such as PThreads inter-process mutexes, don't need this!).

It's important that processes don't become unkillable. If something asks for a process to be killed (rather than politely asking it to terminate), it generally has good reasons.
gravaera
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Re: Killing a waiting thread

Post by gravaera »

Hi:
FlashBurn wrote:Yeah, but then you will have the same problem I mentioned above: what if the thread acquires a semaphore again and again between the timer IRQs?
...? This is not really a problem. A well-written scheduler will not wait too long for a thread to release its locks. And the problem of a thread re-acquiring locks is, again, not really a problem.

When you write a fully (as far as possible) re-entrant kernel, you ensure that IRQs are disabled on the local processor so that the thread does not receive the timer IRQ while executing in a critical section. Therefore, how fairly you can schedule threads really depends on the percentage of time the process spends outside of critical paths in the kernel.

You could argue, then, that this allows threads to gain more than their quantum of execution time. And it surely does: if a thread enters the kernel in its last quantum for the run (assume a run is the number of times the thread should be scheduled per second), and then acquires a lock just before the timer IRQ arrives for the scheduler to force the CPU to pull a new thread, you have effectively the same problem as the one you outlined above: IRQs are disabled in the critical section, so the thread gets a whole extra quantum.

Yet Linux, NT, etc. never seem to buckle under that kind of situation. Assuming 1000 'ticks' per second, the scheduler gets 1000 chances to check a thread's lockCount. The chances of a thread exceeding, say, even three quantums without ever reaching a 'zero locks held' state are very low; in fact, they match the chances of the scheduler being able to reschedule the thread onto a CPU. So if your scheduler has to wait long periods for a thread to reach a non-lock-holding state, it's not due to the method in question. It's actually caused by how you use locks in your kernel.

--All the best,
gravaera
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: Killing a waiting thread

Post by FlashBurn »

gravaera wrote: So if you have problems with your scheduler having to wait long periods for a thread to reach a non-lock-holding state, it's not due to the method in question. It's actually caused by how you use locks in your kernel.
I don't have this situation at the moment, but I'm planning for the future ;)

At the moment I can't imagine why a thread would acquire a semaphore, work on the critical code, release the semaphore, acquire a new semaphore, and so on...

But just because I can't imagine such a situation doesn't mean it won't exist ;)

So I came to the same conclusion as you: there shouldn't be a situation like this (which makes my life easier).

While we were discussing the semaphore problems, another problem got buried: the waiting queues. At the moment a thread can wait for a port to receive a msg, for a task to finish, and for a thread to finish, and now I'm working on my pipes, where you can also wait. If I only have these few cases, I could save the kind of structure the thread is waiting for plus a ptr to that structure, so that I can get it out of the queue (or waiting state). But what if there are too many cases a thread could wait for, and you don't want to write code for every case to take the thread out of the list? Then you would have to wait until the thread leaves the waiting state by itself, and that could be a while (e.g. a thread is waiting for user input, but another thread got an error and wants the task to exit).
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Killing a waiting thread

Post by Brendan »

Hi,
FlashBurn wrote:If I only have these few cases, I could save the kind of structure the thread is waiting for plus a ptr to that structure, so that I can get it out of the queue (or waiting state). But what if there are too many cases a thread could wait for, and you don't want to write code for every case to take the thread out of the list? Then you would have to wait until the thread leaves the waiting state by itself, and that could be a while (e.g. a thread is waiting for user input, but another thread got an error and wants the task to exit).
First, I use a "thread state" variable to keep track of what state each thread is in (running, ready to run, waiting for message, sleeping, etc.). Second, when I kill a thread I don't kill it immediately; instead, I set the thread's CS:EIP to the address of a "terminate this thread" kernel function and put the thread in the "ready to run" state. This is because it can take some time to free all the resources a thread may have left allocated, especially if it's the last thread in the process and the entire process needs to be cleaned up (e.g. emptying message queues, closing file descriptors, removing memory-mapped files, freeing pages, etc.).

If the thread being killed is in the "running" state and is running on the current CPU, then you call the kernel's "terminate this thread" function directly. If it is in the "running" state but running on a different CPU, then you use an IPI to get the other CPU to call the kernel's "terminate this thread" function.

If the thread being killed is in the "ready to run" state, then you only need to change its CS:EIP to the kernel's "terminate this thread" function, so it kills itself next time it gets CPU time. Note: if the scheduler uses multiple queues, you don't need to care which queue the thread may be on in this case.

If the thread is in the "waiting for a message" (or "waiting for a pipe") state, then you don't need a queue. Whenever any thread receives a message (or bytes on a pipe), you check if the "thread state" is "waiting for a message", and if it is you change its state to "ready to run" and put it on the scheduler's queue. In this case, if the thread is killed it can go directly from the "waiting for message/pipe" state to the "ready to run" state as normal (the only difference is that you change its CS:EIP). There are also (potentially) other states (my last scheduler had a "paused" state, mostly intended for debugging) which don't involve any sort of queue.

That only leaves the "sleeping" state, which is probably the only case where the thread needs to be removed from some sort of queue.
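This kill-by-redirecting-the-saved-instruction-pointer approach can be sketched in plain C, with a function pointer standing in for the saved CS:EIP (a toy model, not real context-switch code; all names are invented):

```c
#include <stdbool.h>

enum state { RUNNING, READY, WAITING_MSG, SLEEPING };

struct thread {
    enum state state;
    void (*resume)(struct thread *);  /* stands in for the saved CS:EIP */
    bool terminated;
};

/* The "terminate this thread" kernel function: frees resources etc. */
static void thread_terminate(struct thread *t) { t->terminated = true; }

/* Kill = redirect the thread to the terminate function and make it
 * runnable; the actual cleanup happens when it next gets CPU time.
 * (A thread RUNNING on another CPU would need an IPI instead.) */
static void kill_thread(struct thread *t)
{
    t->resume = thread_terminate;
    if (t->state != RUNNING)
        t->state = READY;
}

/* Giving the thread CPU time now runs the terminate function. */
static void schedule(struct thread *t)
{
    if (t->state == READY)
        t->resume(t);
}
```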


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: Killing a waiting thread

Post by FlashBurn »

Brendan wrote: First, I use a "thread state" variable to keep track of what state each thread is in (running, ready to run, waiting for message, sleeping, etc). Second, when I kill a thread I don't kill a thread immediately - instead, I set the thread's CS:EIP to the address of a "terminate this thread" kernel function and put the thread in the "ready to run" state. This is because it can take some time to free all the resources a thread may have left allocated, especially if it's the last thread in the process and the entire process needs to be cleaned up (e.g. emptying message queues, closing file descriptors, removing memory mapped files, freeing pages, etc).
I use the same system. When I create a thread, it is initialized so that when the main thread function returns, it calls this kernel function (this holds for both kernel and user threads), and this function does the clean-up; especially when it's the last thread, it needs to clean up the whole task.
I also have thread states, but I just have one "waiting" state, not many states for things like "waiting for msg", "waiting for pipe" and so on.

I will try to use a "thread needs to be killed" flag: every time a thread (which was waiting or held a lock) releases the lock, it checks whether it should be killed and, if so, calls the threadKill function. Getting a thread out of my sleeping queue is no problem (it's the easiest thing to do ;) ). So I'm left with user threads.

As I don't know at the moment how I can and will do user locks, I just set this "thread needs to be killed" flag and then wait for the scheduler to kick these threads out.

So I just have to figure out how I can take "crazy" threads out of a waiting state for a msg (a pipe is easy, because there I needn't manipulate a queue).
Gigasoft
Member
Posts: 855
Joined: Sat Nov 21, 2009 5:11 pm

Re: Killing a waiting thread

Post by Gigasoft »

In my OS design, user-initiated waits are cancelled when the waiting thread is terminated. The system keeps a list of pending operations to be performed before a thread returns to user mode. When termination of the thread is requested, the termination is postponed until the thread is about to return to user mode. In a multiprocessor system, if the thread is running, the processor running it is interrupted so that it will reach the termination function. If the thread is in a user-mode callback, the system immediately returns from the callback and reissues the termination operation.
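A simplified model of that pending-operations list, drained at the kernel-to-user-mode boundary (the fixed-size array and all names are assumptions for illustration):

```c
#include <stdbool.h>

#define MAX_PENDING 8

struct thread {
    void (*pending[MAX_PENDING])(struct thread *); /* deferred operations */
    int  npending;
    bool terminated;
};

static void op_terminate(struct thread *t) { t->terminated = true; }

/* Request termination: queue it rather than acting immediately, so the
 * thread dies at a safe point instead of mid-kernel. */
static void request_kill(struct thread *t)
{
    if (t->npending < MAX_PENDING)
        t->pending[t->npending++] = op_terminate;
}

/* Called on every kernel->user transition: drain the pending list, which
 * is where cancelled waits and termination actually take effect. */
static void return_to_user_mode(struct thread *t)
{
    for (int i = 0; i < t->npending; i++)
        t->pending[i](t);
    t->npending = 0;
}
```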