Linux kernel: why are up() and down() no longer inlined?
Posted: Tue Oct 25, 2016 12:52 am
Hi, friends. I am reading the source code of Linux 2.4.
In the semaphore code, up() and down() are implemented as inline functions.
In Linux 2.6.32, however, they are no longer inlined.
The new code is much more elegant, but it is no longer as fast as the 2.4 version.
The inline version is very fast on the fall-through path, i.e. when down() gets a ticket immediately, or when up() gives the ticket back and there are no waiters.
In that case the CPU only needs to execute a single 'lock dec' or 'lock inc' instruction, and that is by far the more common case: after all, contention doesn't occur very often.
So my question is: why were the kernel developers willing to tolerate the extra overhead of the 2.6.32 version (an out-of-line function call, plus taking and releasing a spinlock with local interrupts disabled, even when there is no contention)?
Linux 2.4, i386 (include/asm-i386/semaphore.h):
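static inline void down(struct semaphore * sem)
{
	__asm__ __volatile__(
		"# atomic down operation\n\t"
		LOCK "decl %0\n\t" /* --sem->count */
		"js 2f\n" /* result < 0: no ticket left, take the slow path */
		"1:\n"
		".section .text.lock,\"ax\"\n" /* slow path lives out of line */
		"2:\tcall __down_failed\n\t" /* sleep until a ticket is available */
		"jmp 1b\n" /* then resume right after the fast path */
		".previous"
		:"=m" (sem->count)
		:"c" (sem)
		:"memory");
}

static inline void up(struct semaphore * sem)
{
	__asm__ __volatile__(
		"# atomic up operation\n\t"
		LOCK "incl %0\n\t" /* ++sem->count */
		"jle 2f\n" /* result <= 0: there are sleepers to wake */
		"1:\n"
		".section .text.lock,\"ax\"\n" /* slow path lives out of line */
		"2:\tcall __up_wakeup\n\t" /* wake a sleeping task */
		"jmp 1b\n" /* then resume right after the fast path */
		".previous"
		:"=m" (sem->count)
		:"c" (sem)
		:"memory");
}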
Linux 2.6.32, generic (kernel/semaphore.c):
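void down(struct semaphore *sem)
{
	unsigned long flags;

	spin_lock_irqsave(&sem->lock, flags);
	if (likely(sem->count > 0))
		sem->count--; /* fast path: a ticket is available */
	else
		__down(sem); /* queue on wait_list and sleep; drops and retakes sem->lock */
	spin_unlock_irqrestore(&sem->lock, flags);
}

void up(struct semaphore *sem)
{
	unsigned long flags;

	spin_lock_irqsave(&sem->lock, flags);
	if (likely(list_empty(&sem->wait_list)))
		sem->count++; /* no waiters: just give the ticket back */
	else
		__up(sem); /* hand the semaphore to the first waiter and wake it */
	spin_unlock_irqrestore(&sem->lock, flags);
}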