Stack corruption on SMP with semaphore
I've been debugging my code for weeks now, without success.
I have 2 threads, each printing a number. If I run this code in Bochs with 1 or 2 CPUs, it works. But if I run it in Bochs with 8 CPUs, the stack gets corrupted and a wrong EIP is popped from the stack. This only occurs if I use my semaphore code. When I use my spinlock code it works perfectly!
The problem occurs after a short time; then Bochs reports ">>PANIC<< fetch_raw_descriptor: LDTR.valid=0".
Maybe someone could give me a hint about what I should check? If you need any code to look at, just ask!
- Pype.Clicker
Re:Stack corruption on SMP with semaphore
That sounds as if some task were overwriting another task's stack, or something similar.
Posting the semaphore code and roughly sketching the memory organization would probably help.
Hint 1: printing the stack pointer at each iteration might help. If it changes, something is probably wrong in your suspend/resume code.
Hint 2: are you correctly taking the spinlock before you modify the semaphore contents?
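Hint 1 can be turned into an automatic check. The following is only a minimal sketch in C, not code from this thread: `stack_info`, its fields and both function names are hypothetical, and it assumes the kernel records the lowest address and the size of each thread's stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record: lowest address of the stack and its
   size in bytes. Both fields are assumptions made for this sketch. */
typedef struct {
    uintptr_t stack_base;
    size_t    stack_size;
} stack_info;

/* True if sp lies inside [stack_base, stack_base + stack_size].
   The upper bound is inclusive because an empty x86 stack points
   just past its last byte. */
bool stack_ptr_in_bounds(const stack_info *s, uintptr_t sp)
{
    return sp >= s->stack_base && sp - s->stack_base <= s->stack_size;
}

/* Debug helper: call with the saved stack pointer on every switch. */
void assert_stack_ok(const stack_info *s, uintptr_t sp)
{
    assert(stack_ptr_in_bounds(s, sp));
}
```

Calling such a check each time a thread is suspended or resumed tends to catch a stray stack pointer close to the point where it goes wrong, rather than later when a bad EIP is popped.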
Re:Stack corruption on SMP with semaphore
Ok, this is my semaphore code, followed by the CALLINT macro. I don't know which piece of code could cause this failure.
;----------------------------
PROC semaphore_acquire_smp, sem
;----------------------------
BEGIN
;----------------------------
; make sure we are the only one to work on the semaphore
cli
CALL spinlock_acquire, dword[sem]
lock sub dword[esi+semaphore_t.count],1
jz .end
;----------------------------
; look if we have to wait or if we can go right away
cmp dword[esi+semaphore_t.count],0
jg .end
;----------------------------
; we have to wait
APIC_GET_ID eax
mov ebx,[cpu_ptr+4*eax]
mov edi,[esi+semaphore_t.threads]
mov eax,[ebx+cpu_t.schdl_act_thread]
test edi,edi
jz .first
mov ebx,[edi+thread_t.prev]
xor ecx,ecx
mov [eax+thread_t.prev],ebx
mov [eax+thread_t.next],ecx
mov [edi+thread_t.prev],eax
mov [ebx+thread_t.next],eax
jmp .scheduler
;----------------------------
; we are the first thread
align 4
.first:
mov [esi+semaphore_t.threads],eax
mov [eax+thread_t.prev],eax
mov [eax+thread_t.next],edi
;----------------------------
; the scheduler has to know that this thread wants to wait
align 4
.scheduler:
or dword[eax+thread_t.flags],THREAD_WAIT or THREAD_RESCHEDULE
CALL spinlock_release, dword[sem]
CALLINT scheduler_reschedule_smp
sti
.end_wait:
RETURN
;----------------------------
align 4
.end:
CALL spinlock_release, dword[sem]
sti
RETURN
ENDP
;----------------------------
;----------------------------
PROC semaphore_release, sem
;----------------------------
BEGIN
;----------------------------
; make sure we are the only one to work on the semaphore
cli
CALL spinlock_acquire, dword[sem]
lock add dword[esi+semaphore_t.count],1
;----------------------------
; look if we need to awake a thread
cmp dword[esi+semaphore_t.count],0
jg .end
;----------------------------
; we have to awake the thread on the top of the queue
mov eax,[esi+semaphore_t.threads]
mov ebx,[eax+thread_t.next]
mov ecx,[eax+thread_t.prev]
test ebx,ebx
jz .last
;----------------------------
; put the 2nd thread onto the top of the queue and put the last thread onto the 2nd threads prev ptr
mov [ebx+thread_t.prev],ecx
mov [esi+semaphore_t.threads],ebx
jmp .scheduler
;----------------------------
; there is no more thread on the queue
align 4
.last:
mov [esi+semaphore_t.threads],ebx
;----------------------------
; scheduler needs to awaken the thread in eax
.scheduler:
and dword[eax+thread_t.flags],not THREAD_WAIT
push eax
CALL spinlock_release, dword[sem]
sti
CALL scheduler_add_scheduler ;par is in pushed eax
;----------------------------
.end_awaken:
RETURN
;----------------------------
align 4
.end:
CALL spinlock_release, dword[sem]
sti
RETURN
ENDP
;----------------------------
This is the CALLINT macro:
;----------------------------
macro CALLINT proc
{
pushfd
push cs
call proc
}
;----------------------------
Re:Stack corruption on SMP with semaphore
Ok, maybe it is also necessary to have a look at my scheduler code! I have attached it, because it is too much to post inline.
This is my thread struc:
struc thread_t
{
.queue_prev rd 1
.queue_next rd 1
.prev rd 1
.next rd 1
.owner rd 1
.pd rd 1
.esp0 rd 1
.esp3 rd 1
.flags rd 1
.priority rd 1
.dyn_prio rd 1
.time2run rd 1
.deadline rd 1
.free_start rd 1
.free_end rd 1
.name rb 32
.ptr2fpu rd 1
.fpu rb 524
.size_t rb 0
}
And this is the cpu struc:
struc cpu_t
{
.timer rd 2
.schdl_act_thread rd 1
.scheduler_flags rd 1
.scheduler_esp rd 1
.fs rd 1
.gs rd 1
.tss rd 1
.idle_thread rd 1
.size_t rb 0
}
- Pype.Clicker
Re:Stack corruption on SMP with semaphore
I'm not 100% sure it can occur, but what I fear is a race condition between the spinlock release and the call to the scheduler. It will depend on how you actually use the semaphore (I lack the time to make full sense of your ASM, sorry), but let's say:
- thread A acquires the spinlock and marks itself as "going to sleep on the semaphore"
- thread A releases the spinlock and enters the scheduler
- thread B now grabs the spinlock, accesses the semaphore and sees that it should wake up thread A, before the scheduler has finished putting thread A to sleep
Thread A could then be resumed on another CPU while its context is still being saved, so two CPUs would end up using the same stack. Probably each thread itself should be protected by a spinlock that the scheduler releases only _after_ it is done switching the thread off, and that it takes before it switches to that thread.
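The per-thread lock idea can be modelled in a few lines of C11. This is a hedged sketch, not the kernel code from this thread: `thread_rec`, `on_cpu` and all function names are made up for illustration, and `state` is assumed to be protected by the semaphore or run-queue spinlock (not shown).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum thread_state { THREAD_RUNNING, THREAD_WAITING, THREAD_READY };

typedef struct {
    atomic_flag on_cpu;        /* set while a CPU runs or still saves this thread */
    enum thread_state state;   /* assumed protected by another lock (not shown) */
} thread_rec;

/* Set up a record for a thread that is currently running on some CPU. */
void thread_init_running(thread_rec *t)
{
    assert(t != NULL);
    atomic_flag_clear(&t->on_cpu);
    (void)atomic_flag_test_and_set(&t->on_cpu);
    t->state = THREAD_RUNNING;
}

/* Thread A, under the semaphore spinlock: only marks itself as waiting.
   on_cpu stays set because A is still executing on its own stack. */
void mark_waiting(thread_rec *t) { t->state = THREAD_WAITING; }

/* Thread B, under the semaphore spinlock: makes A runnable again. */
void mark_ready(thread_rec *t)
{
    if (t->state == THREAD_WAITING)
        t->state = THREAD_READY;
}

/* Old CPU's scheduler: call only after A's registers and stack pointer
   are saved and the CPU no longer uses A's stack. */
void finish_switch_out(thread_rec *t)
{
    atomic_flag_clear_explicit(&t->on_cpu, memory_order_release);
}

/* Any CPU's scheduler: try to claim t. Fails while t is not ready or
   while another CPU is still switching it out. */
bool try_switch_in(thread_rec *t)
{
    if (t->state != THREAD_READY)
        return false;
    if (atomic_flag_test_and_set_explicit(&t->on_cpu, memory_order_acquire))
        return false;
    t->state = THREAD_RUNNING;
    return true;
}
```

In this model, thread B may mark A as ready as early as it likes; a second CPU still cannot start running A until `finish_switch_out` has been called, which is the ordering the race above needs.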
Re:Stack corruption on SMP with semaphore
Pype.Clicker wrote: Probably each thread itself should be protected by a spinlock that the scheduler releases only _after_ it's done switching off and that it will take before it switches to that thread.
You could also store "suspended wakeups": a counter that you increment whenever a wakeup arrives while the thread is still awake. When the thread then tries to go to sleep, you wake it up immediately and decrement the counter.
I think that should work against such a race condition.
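A small C model of the "suspended wakeups" counter might look like the following. It is a sketch under assumptions: the `sleeper` type and its functions are invented for illustration, and `sleeper_try_sleep` is meant to be called by the scheduler once the thread's context has been saved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag lock;             /* spinlock protecting the fields below */
    int         pending_wakeups;  /* wakeups that arrived while awake */
    bool        asleep;
} sleeper;

void sleeper_init(sleeper *s)
{
    atomic_flag_clear(&s->lock);
    s->pending_wakeups = 0;
    s->asleep = false;
}

static void sleeper_lock(sleeper *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
}

static void sleeper_unlock(sleeper *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Returns true if the thread really went to sleep, false if a stored
   wakeup cancelled the sleep. */
bool sleeper_try_sleep(sleeper *s)
{
    bool sleeps;
    sleeper_lock(s);
    if (s->pending_wakeups > 0) {
        s->pending_wakeups--;     /* consume the early wakeup */
        sleeps = false;
    } else {
        s->asleep = true;
        sleeps = true;
    }
    sleeper_unlock(s);
    return sleeps;
}

/* Wake the thread, or remember the wakeup if it is still awake. */
void sleeper_wake(sleeper *s)
{
    sleeper_lock(s);
    assert(s->pending_wakeups >= 0);
    if (s->asleep)
        s->asleep = false;
    else
        s->pending_wakeups++;
    sleeper_unlock(s);
}
```

A wakeup that arrives too early is not lost here: it is stored and cancels the next attempt to sleep.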
Re:Stack corruption on SMP with semaphore
Thanks for your replies! This is a point I hadn't thought of, and I will test it this evening. Maybe I should rethink how I handle this.
Re:Stack corruption on SMP with semaphore
Ok, it took some days until I found time to do some coding. But I have a solution, though not the best one: I set a flag "THREAD_SEMAPHORE" and write the address of the spinlock into the wait field of the thread struc, and when the scheduler sees the flag it releases the lock!
I will also test Pype's variant with the per-thread spinlock, but that will have to wait until I have time.
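The flag-based approach described in this post can be sketched in C as follows. This is a model only: the value of `THREAD_SEMAPHORE`, the `thread_rec` layout and both function names are assumptions, and the real scheduler is FASM code that is not shown in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define THREAD_SEMAPHORE 0x4u    /* hypothetical bit value */

typedef struct {
    unsigned     flags;
    atomic_flag *wait;           /* spinlock the scheduler must release */
} thread_rec;

/* Called by the sleeping thread INSTEAD of releasing the semaphore
   spinlock itself: it hands the still-held lock to the scheduler. */
void defer_unlock(thread_rec *t, atomic_flag *held_lock)
{
    t->wait = held_lock;
    t->flags |= THREAD_SEMAPHORE;
}

/* Called by the scheduler once the previous thread's context is saved
   and the CPU has left that thread's stack. */
void scheduler_after_switch(thread_rec *prev)
{
    if (prev->flags & THREAD_SEMAPHORE) {
        assert(prev->wait != NULL);
        atomic_flag *lock = prev->wait;
        prev->flags &= ~THREAD_SEMAPHORE;
        prev->wait = NULL;
        atomic_flag_clear_explicit(lock, memory_order_release);
    }
}
```

Because the semaphore stays locked until the switch is complete, a waker on another CPU cannot reach the thread while it is half switched out.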
Re:Stack corruption on SMP with semaphore
I used Pype's solution, because I didn't like mine. Now everything with semaphores is working and I can go on coding more important things, like the slab allocator.