In the old x86 semaphore code, the up() and down() operations are implemented as inline functions:
Code:
static inline void down(struct semaphore * sem)
{
        __asm__ __volatile__(
                "# atomic down operation\n\t"
                LOCK "decl %0\n\t"      /* --sem->count */
                "js 2f\n"
                "1:\n"
                ".section .text.lock,\"ax\"\n"
                "2:\tcall __down_failed\n\t"
                "jmp 1b\n"
                ".previous"
                :"=m" (sem->count)
                :"c" (sem)
                :"memory");
}
static inline void up(struct semaphore * sem)
{
        __asm__ __volatile__(
                "# atomic up operation\n\t"
                LOCK "incl %0\n\t"      /* ++sem->count */
                "jle 2f\n"
                "1:\n"
                ".section .text.lock,\"ax\"\n"
                "2:\tcall __up_wakeup\n\t"
                "jmp 1b\n"
                ".previous"
                :"=m" (sem->count)
                :"c" (sem)
                :"memory");
}
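For reference, if I remember the 2.4 tree correctly, the structure behind this and the slow-path entry points look roughly like the following (a sketch from memory, not a verbatim quote):

Code:
struct semaphore {
        atomic_t count;         /* > 0: tickets left; <= 0: taken, maybe waiters */
        int sleepers;
        wait_queue_head_t wait; /* tasks sleeping in the slow path */
};

/* __down_failed / __up_wakeup are tiny out-of-line asm stubs that save
 * the caller-clobbered registers (the inline fast path tells gcc that
 * almost nothing is clobbered) and then call the real C slow paths: */
asmlinkage void __down(struct semaphore * sem);  /* sleeps on sem->wait */
asmlinkage void __up(struct semaphore * sem);    /* wakes a sleeper */

Note also the ".section .text.lock,\"ax\"" trick: it moves the "call __down_failed" out of the hot instruction stream into a separate section, so the fall-through case runs straight-line code with no taken branch.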
The newer generic implementation (kernel/semaphore.c) replaces them with out-of-line functions that take a spinlock on every call:

Code:
void down(struct semaphore *sem)
{
        unsigned long flags;

        spin_lock_irqsave(&sem->lock, flags);
        if (likely(sem->count > 0))
                sem->count--;
        else
                __down(sem);
        spin_unlock_irqrestore(&sem->lock, flags);
}

void up(struct semaphore *sem)
{
        unsigned long flags;

        spin_lock_irqsave(&sem->lock, flags);
        if (likely(list_empty(&sem->wait_list)))
                sem->count++;
        else
                __up(sem);
        spin_unlock_irqrestore(&sem->lock, flags);
}
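The structure and slow paths that go with this version also live in kernel/semaphore.c. Simplified (the real __down() goes through a common helper that also handles signals and timeouts, which I'm omitting here):

Code:
struct semaphore {
        spinlock_t lock;
        unsigned int count;
        struct list_head wait_list;
};

struct semaphore_waiter {
        struct list_head list;
        struct task_struct *task;
        int up;
};

/* called with sem->lock held and irqs off */
static void __down(struct semaphore *sem)
{
        struct semaphore_waiter waiter;

        /* queue ourselves, then sleep until __up() hands us the ticket */
        list_add_tail(&waiter.list, &sem->wait_list);
        waiter.task = current;
        waiter.up = 0;

        for (;;) {
                __set_task_state(current, TASK_UNINTERRUPTIBLE);
                spin_unlock_irq(&sem->lock);
                schedule();
                spin_lock_irq(&sem->lock);
                if (waiter.up)
                        return;
        }
}

/* called with sem->lock held and irqs off */
static void __up(struct semaphore *sem)
{
        struct semaphore_waiter *waiter =
                list_first_entry(&sem->wait_list, struct semaphore_waiter, list);

        /* wake the first waiter and pass it the ticket directly,
         * instead of bumping count and letting tasks race for it */
        list_del(&waiter->list);
        waiter->up = 1;
        wake_up_process(waiter->task);
}

Notice that up() never increments count when there are waiters; the ticket is handed straight to the first task on wait_list.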
The inline version is very fast when the code 'falls through', namely, when down() gets a ticket immediately, or up() returns a ticket while there are no waiters.
In that case the CPU only has to execute a single 'lock decl' or 'lock incl' instruction, and that is by far the common case; after all, contention doesn't occur very often.
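To make that fast-path idea concrete outside the kernel, here is a small userspace analogue in C11 (my own illustration, not kernel code; sema_down/sema_up and the wakeups field are made-up names). It uses the same pattern: one atomic RMW in the common case, falling into a slow path only under contention:

Code:
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
        atomic_int count;       /* > 0: free tickets; < 0: -count waiters */
        pthread_mutex_t mtx;    /* slow path only */
        pthread_cond_t cond;
        int wakeups;            /* wakeups posted to the slow path */
} sema_t;

void sema_init(sema_t *s, int n)
{
        atomic_init(&s->count, n);
        pthread_mutex_init(&s->mtx, NULL);
        pthread_cond_init(&s->cond, NULL);
        s->wakeups = 0;
}

void sema_down(sema_t *s)
{
        /* fast path: one atomic RMW, like the kernel's "lock decl" */
        if (atomic_fetch_sub_explicit(&s->count, 1, memory_order_acquire) > 0)
                return;
        /* slow path: block until sema_up() posts a wakeup */
        pthread_mutex_lock(&s->mtx);
        while (s->wakeups == 0)
                pthread_cond_wait(&s->cond, &s->mtx);
        s->wakeups--;
        pthread_mutex_unlock(&s->mtx);
}

void sema_up(sema_t *s)
{
        /* fast path: one atomic RMW, like the kernel's "lock incl" */
        if (atomic_fetch_add_explicit(&s->count, 1, memory_order_release) >= 0)
                return;
        /* slow path: a waiter exists (or is about to block); wake one */
        pthread_mutex_lock(&s->mtx);
        s->wakeups++;
        pthread_cond_signal(&s->cond);
        pthread_mutex_unlock(&s->mtx);
}

In the uncontended case both operations retire a single locked RMW, which is exactly the property the inline kernel version has and the spinlock version gives up.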
What I want to know is: why were the kernel developers willing to tolerate the overhead of the latter (a spin_lock_irqsave/spin_unlock_irqrestore pair even on the uncontended path)?