Stack corruption on SMP with semaphore

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
FlashBurn

Stack corruption on SMP with semaphore

Post by FlashBurn »

I've been debugging my code for weeks now, but I've had no success :(

I have 2 threads which are printing a number. If I run this code in a 1-CPU or 2-CPU Bochs it works! But if I run it in an 8-CPU Bochs the stack gets corrupted and a wrong eip is popped from the stack. This only occurs if I use my semaphore code. When I'm using my spinlock code it works perfectly!

The problem occurs after a little while, then Bochs tells me ">>PANIC<< fetch_raw_descriptor: LDTR.valid=0".

Maybe someone could give me a hint about what I should check?! If you need any code to look at, just ask!
User avatar
Pype.Clicker
Member
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away
Contact:

Re:Stack corruption on SMP with semaphore

Post by Pype.Clicker »

That sounds as if some task were overwriting another task's stack or something similar ...

Posting the semaphore code and roughly sketching the memory organization could probably help.

Hint 1: printing the stack pointer at each iteration might help. If it changes, you might have something odd with your suspend/resume code.

Hint 2: are you correctly spinlocking before you modify the semaphore contents?

Re:Stack corruption on SMP with semaphore

Post by FlashBurn »

Ok, this is my semaphore code:

Code:

;----------------------------
PROC semaphore_acquire_smp, sem
;----------------------------
BEGIN
;----------------------------
;   make sure we are the only one to work on the semaphore
   cli

   CALL spinlock_acquire, dword[sem]

   lock sub dword[esi+semaphore_t.count],1
   jz .end
;----------------------------
;   look if we have to wait or if we can go right away
   cmp dword[esi+semaphore_t.count],0
   jg .end
;----------------------------
;   we have to wait
   APIC_GET_ID eax
   mov ebx,[cpu_ptr+4*eax]
   mov edi,[esi+semaphore_t.threads]
   mov eax,[ebx+cpu_t.schdl_act_thread]

   test edi,edi
   jz .first

   mov ebx,[edi+thread_t.prev]
   xor ecx,ecx
   mov [eax+thread_t.prev],ebx
   mov [eax+thread_t.next],ecx
   mov [edi+thread_t.prev],eax
   mov [ebx+thread_t.next],eax

   jmp .scheduler
;----------------------------
;   we are the first thread
align 4
.first:
   mov [esi+semaphore_t.threads],eax

   mov [eax+thread_t.prev],eax
   mov [eax+thread_t.next],edi
;----------------------------
;   the scheduler has to know that this thread wants to wait
align 4
.scheduler:
   or dword[eax+thread_t.flags],THREAD_WAIT or THREAD_RESCHEDULE

   CALL spinlock_release, dword[sem]

   CALLINT scheduler_reschedule_smp

   sti

.end_wait:
   RETURN
;----------------------------
align 4
.end:
   CALL spinlock_release, dword[sem]

   sti

   RETURN
ENDP
;----------------------------

;----------------------------
PROC semaphore_release, sem
;----------------------------
BEGIN
;----------------------------
;   make sure we are the only one to work on the semaphore
   cli

   CALL spinlock_acquire, dword[sem]

   lock add dword[esi+semaphore_t.count],1
;----------------------------
;   look if we need to awake a thread
   cmp dword[esi+semaphore_t.count],0
   jg .end
;----------------------------
;   we have to awake the thread on the top of the queue
   mov eax,[esi+semaphore_t.threads]
   mov ebx,[eax+thread_t.next]
   mov ecx,[eax+thread_t.prev]

   test ebx,ebx
   jz .last
;----------------------------
;   put the 2nd thread at the top of the queue and store the last thread in the 2nd thread's prev ptr
   mov [ebx+thread_t.prev],ecx
   mov [esi+semaphore_t.threads],ebx

   jmp .scheduler
;----------------------------
;   there is no more thread on the queue
align 4
.last:
   mov [esi+semaphore_t.threads],ebx
;----------------------------
;   scheduler needs to awaken the thread in eax
.scheduler:
   and dword[eax+thread_t.flags],not THREAD_WAIT

   push eax

   CALL spinlock_release, dword[sem]

   sti

   CALL scheduler_add_scheduler         ;par is in pushed eax
;----------------------------
.end_awaken:
   RETURN
;----------------------------
align 4
.end:
   CALL spinlock_release, dword[sem]

   sti

   RETURN
ENDP
;----------------------------
This is the CALLINT macro:

Code:

;----------------------------
; build an interrupt-style stack frame (EFLAGS, CS, return EIP)
; so that proc can return with iretd like a real interrupt handler
macro CALLINT proc
{
   pushfd
   push cs

   call proc
}
;----------------------------
I don't know which piece of code could cause this failure.

Re:Stack corruption on SMP with semaphore

Post by FlashBurn »

Ok, maybe it's also necessary to have a look at my scheduler code! I've attached it, because it is too much to post inline.

This is my thread struc:

Code:

struc thread_t
{
   .queue_prev      rd 1
   .queue_next      rd 1
   .prev         rd 1
   .next         rd 1
   .owner         rd 1
   .pd            rd 1
   .esp0         rd 1
   .esp3         rd 1
   .flags         rd 1
   .priority      rd 1
   .dyn_prio      rd 1
   .time2run      rd 1
   .deadline      rd 1
   .free_start      rd 1
   .free_end      rd 1
   .name         rb 32
   .ptr2fpu      rd 1
   .fpu         rb 524
   .size_t         rb 0
}
And this is the cpu struc:

Code:

struc cpu_t
{
   .timer            rd 2
   .schdl_act_thread   rd 1
   .scheduler_flags    rd 1
   .scheduler_esp      rd 1
   .fs               rd 1
   .gs               rd 1
   .tss            rd 1
   .idle_thread      rd 1
   .size_t            rb 0
}

Re:Stack corruption on SMP with semaphore

Post by Pype.Clicker »

I'm not 100% sure it could occur, but what I fear is that there is some race condition between the "spinlock release" and the call to the scheduler. It will depend on how you actually use the semaphore (I lack the time to make full sense of your ASM stuff, sorry), but let's say:

- thread A acquires the spinlock and sets itself to "I'm going to sleep on the semaphore"
- thread A releases the spinlock and enters the scheduler
- thread B now grabs the spinlock, accesses the semaphore, and sees it should wake up thread A before the scheduler is done putting thread A to sleep

Probably each thread itself should be protected by a spinlock that the scheduler releases only _after_ it's done switching off, and that it will take before it switches to that thread.
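
[Editor's note] In C-with-pthreads terms, the per-thread lock pattern suggested above could be sketched roughly like this. This is a hypothetical user-space analogy, not FlashBurn's kernel code: the names `kthread_t`, `switch_off`, and `wake` are invented for illustration, and a pthread mutex stands in for the per-thread spinlock.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical sketch: every thread owns a lock that the "scheduler"
 * releases only after the switch-off is complete, and that a waker
 * must take before it may touch the sleeping thread. */
typedef struct kthread {
    pthread_mutex_t lock;   /* per-thread lock guarding suspend/resume */
    atomic_int asleep;      /* set once the context switch has finished */
    atomic_int woken;       /* set by wake() once it is safe to do so */
} kthread_t;

/* Waiter side: take the per-thread lock BEFORE dropping the semaphore
 * spinlock; release it only when the context save is done. */
void switch_off(kthread_t *self) {
    pthread_mutex_lock(&self->lock);
    /* ... semaphore spinlock would be released here; context saved ... */
    atomic_store(&self->asleep, 1);
    pthread_mutex_unlock(&self->lock);  /* only now may wakers proceed */
}

/* Waker side: blocks on the per-thread lock until the switch-off above
 * has completed, so it can never see a thread half-way to sleep. */
void wake(kthread_t *t) {
    pthread_mutex_lock(&t->lock);
    atomic_store(&t->woken, 1);
    /* ... put t back on a ready queue here ... */
    pthread_mutex_unlock(&t->lock);
}
```

The key ordering is that the waker serializes behind the switch-off, which is exactly the race in the three-step scenario above.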
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:Stack corruption on SMP with semaphore

Post by Candy »

Pype.Clicker wrote: Probably each thread itself should be protected by a spinlock that the scheduler releases only _after_ it's done switching off and that it will take before it switches to that thread.
You could also store "suspended wakeups": a counter that you increment whenever a wakeup is done while the thread is still awake. When it falls asleep you instantly wake it up and decrement the counter.

I think that should work against such a race condition.
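
[Editor's note] A rough sketch of that counter idea, with hypothetical names and pthreads standing in for the kernel primitives: a wakeup that arrives while the thread is still awake is counted rather than lost, and the sleep path consumes a pending wakeup and returns immediately.

```c
#include <pthread.h>

/* "Suspended wakeups" sketch: wakeups delivered while the thread is
 * still awake are remembered in a counter instead of being dropped. */
typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  c;
    int pending;            /* wakeups that arrived while still awake */
} wakeup_t;

void wakeup_post(wakeup_t *w) {
    pthread_mutex_lock(&w->m);
    w->pending++;           /* remember the wakeup even if nobody sleeps yet */
    pthread_cond_signal(&w->c);
    pthread_mutex_unlock(&w->m);
}

void wakeup_wait(wakeup_t *w) {
    pthread_mutex_lock(&w->m);
    while (w->pending == 0)             /* nothing pending: really sleep */
        pthread_cond_wait(&w->c, &w->m);
    w->pending--;                       /* consume one pending wakeup */
    pthread_mutex_unlock(&w->m);
}
```

With this, the wake-before-sleep interleaving from Pype's scenario becomes harmless: the early wakeup is simply consumed when the thread tries to go to sleep.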

Re:Stack corruption on SMP with semaphore

Post by FlashBurn »

Thanks for your replies! This is a point I didn't think of and I will test it this evening. Maybe I should rethink how I handle this.

Re:Stack corruption on SMP with semaphore

Post by FlashBurn »

Ok, it took some days until I found time to do some coding :( But I have a solution (it's not the best): I set a THREAD_SEMAPHORE flag and write the address of the spinlock into the wait field of the thread struc, and when the scheduler sees the flag it releases the lock!

I will also test Pype's variant with the spinlock, but that will have to wait.
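
[Editor's note] A hypothetical C sketch of that workaround (names such as `semaphore_block` and `wait_lock` are invented for illustration; the real code is FASM): the blocking thread tags itself and hands the still-held spinlock to the scheduler, which releases it only after the switch-off.

```c
#include <stdatomic.h>

#define THREAD_WAIT      0x1
#define THREAD_SEMAPHORE 0x2

typedef struct { atomic_flag held; } spinlock_t;

static void spinlock_release(spinlock_t *l) { atomic_flag_clear(&l->held); }

typedef struct {
    unsigned    flags;
    spinlock_t *wait_lock;  /* lock the scheduler must release for us */
} kthread_t;

/* Called while still holding the semaphore's spinlock: instead of
 * releasing it here (which opens the race), pass it to the scheduler. */
void semaphore_block(kthread_t *self, spinlock_t *sem_lock) {
    self->flags |= THREAD_WAIT | THREAD_SEMAPHORE;
    self->wait_lock = sem_lock;
    /* ... reschedule; the lock stays held until the context is saved ... */
}

/* Scheduler side, after the old thread's context has been saved. */
void scheduler_after_switch(kthread_t *old) {
    if (old->flags & THREAD_SEMAPHORE) {
        old->flags &= ~THREAD_SEMAPHORE;
        spinlock_release(old->wait_lock);  /* wakers may now proceed */
    }
}
```

This closes the same window as Pype's per-thread lock: no other CPU can get past the semaphore's spinlock until the sleeping thread's context save is finished.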

Re:Stack corruption on SMP with semaphore

Post by FlashBurn »

I used Pype's solution, because I didn't like mine. Now everything with semaphores is working and I can go on coding more important things, like the slab allocator.