
syscall getting interrupted by timer!!! why???

Posted: Mon Sep 07, 2020 4:39 am
by ITchimp
OK, I moved my fork into a syscall by creating a syscall_fork stub...
the syscall_fork is treated as a normal interrupt call (as opposed to a trap call)

but while debugging weird page faults... I discovered that my 0x80 syscall is interrupted
by timer_handler. I have removed the cli/sti inline assembly calls because cli and sti were in the
stub as shown below... yet the timer handler (and the subsequent context switch) is still invoked.
Why? I thought the isr128 stub has cli and sti in it already!
isr128:
    cli
    push $0
    push $128
    jmp isr_common_stub

Re: syscall getting interrupted by timer!!! why???

Posted: Mon Sep 07, 2020 7:31 am
by bzt
ITchimp wrote: OK, I moved my fork into a syscall by creating a syscall_fork stub...
the syscall_fork is treated as a normal interrupt call (as opposed to a trap call)
That means you no longer need cli.
ITchimp wrote: but while debugging weird page faults... I discovered that my 0x80 syscall is interrupted
by timer_handler. I have removed the cli/sti inline assembly calls because cli and sti were in the
stub as shown below...
Issuing cli as early as possible is a wise choice, but it shouldn't be needed with an interrupt gate, because the CPU already clears the interrupt flag when it enters the handler through one.
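
To make the gate distinction concrete, here is a minimal C sketch of how the type/attribute byte of a 32-bit IDT entry could be built. The names (gate_type, idt_gate_flags, gate_clears_if) are made up for illustration and are not taken from ITchimp's kernel. The type values are the architectural ones: 0xE for a 32-bit interrupt gate (the CPU clears IF on entry) and 0xF for a 32-bit trap gate (IF is left unchanged).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum gate_type {
    GATE_INTERRUPT32 = 0xE, /* CPU clears IF on entry   */
    GATE_TRAP32      = 0xF  /* IF is left as it was     */
};

/* Build the type/attribute byte of a 32-bit IDT entry:
 * bit 7 = present, bits 6-5 = DPL, bit 4 = 0, bits 3-0 = gate type. */
uint8_t idt_gate_flags(enum gate_type type, unsigned dpl)
{
    assert(dpl <= 3u); /* privilege levels are 0..3 */
    return (uint8_t)(0x80u | (dpl << 5) | ((unsigned)type & 0xFu));
}

/* True if entering through a gate with these flags masks maskable IRQs. */
bool gate_clears_if(uint8_t flags)
{
    return (flags & 0x0Fu) == GATE_INTERRUPT32;
}
```

With a helper like this, vector 0x80 would get idt_gate_flags(GATE_INTERRUPT32, 3): DPL 3 so that ring 3 code may issue int $0x80, and an interrupt gate so that the timer IRQ stays masked while the syscall runs.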
ITchimp wrote: yet the timer handler (and the subsequent context switch) is still invoked.
Why? I thought the isr128 stub has cli and sti in it already!
Actually, you shouldn't have sti at all. When you return from an ISR with iret, it pops the flags (including the interrupt flag). I assume a timer IRQ arrived while the interrupt flag was cleared, so it stayed pending, and as soon as the code reached sti, the interrupt fired. That's why you shouldn't use sti at all. Let iret handle it: the interrupt will then fire after the code has returned, so the new interrupt stack frame will store the user-space position.
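
The point about iret can be sketched in C. The struct below is an assumed layout of the frame the CPU pushes for an interrupt taken from ring 3 (EIP, CS, EFLAGS, ESP, SS, lowest address first); the struct and function names are hypothetical. IF is bit 9 of EFLAGS (mask 0x200), so whatever that bit holds in the saved EFLAGS is what iret restores.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF 0x200u /* interrupt enable flag, bit 9 of EFLAGS */

/* Frame pushed by the CPU on an interrupt from ring 3 (hypothetical name). */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t user_esp;
    uint32_t user_ss;
};

/* Will iret re-enable maskable interrupts when it pops this frame? */
bool iret_enables_interrupts(const struct iret_frame *f)
{
    return (f->eflags & EFLAGS_IF) != 0;
}

/* For a frame built by hand (for example for a forked task), make sure
 * the task resumes with interrupts enabled without needing any sti. */
void frame_enable_interrupts(struct iret_frame *f)
{
    f->eflags |= EFLAGS_IF;
}
```

A frame saved from running user code normally already has IF set, so the handler does not need sti on the way out. The second helper only matters when the kernel constructs a frame itself.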

Cheers,
bzt

Re: syscall getting interrupted by timer!!! why???

Posted: Wed Sep 09, 2020 3:41 am
by ITchimp
I have got some really weird bugs... the base pointer mysteriously changes to garbage on the execution path...
from c00f6f4c to 001048e8, when executing leave...

and leave is not supposed to change ebp, it should copy the ebp value into esp and then pop ebp... what could go wrong!!!


<pre>(0) [0x0000000000104917] 0008:0000000000104917 (unk. ctxt): mov ebx, dword ptr ss:[ebp-4] ; 8b5dfc
<bochs:28>
Next at t=867530415
rax: 0x00000000_00000000 rcx: 0x00000000_001048e8
rdx: 0x00000000_001bb000 rbx: 0x00000000_001045e7
rsp: 0x00000000_c00f6f24 rbp: 0x00000000_c00f6f4c
rsi: 0x00000000_00067ec4 rdi: 0x00000000_00053c9e
r8 : 0x00000000_00000000 r9 : 0x00000000_00000000
r10: 0x00000000_00000000 r11: 0x00000000_00000000
r12: 0x00000000_00000000 r13: 0x00000000_00000000
r14: 0x00000000_00000000 r15: 0x00000000_00000000
rip: 0x00000000_0010491a
eflags 0x00000006: id vip vif ac vm rf nt IOPL=0 of df if tf sf zf af PF cf
(0) [0x000000000010491a] 0008:000000000010491a (unk. ctxt): leave ; c9
<bochs:29>
Next at t=867530416
rax: 0x00000000_00000000 rcx: 0x00000000_001048e8
rdx: 0x00000000_001bb000 rbx: 0x00000000_001045e7
rsp: 0x00000000_c00f6f50 rbp: 0x00000000_001048e8
rsi: 0x00000000_00067ec4 rdi: 0x00000000_00053c9e
r8 : 0x00000000_00000000 r9 : 0x00000000_00000000
r10: 0x00000000_00000000 r11: 0x00000000_00000000
r12: 0x00000000_00000000 r13: 0x00000000_00000000
r14: 0x00000000_00000000 r15: 0x00000000_00000000
rip: 0x00000000_0010491b
eflags 0x00000006: id vip vif ac vm rf nt IOPL=0 of df if tf sf zf af PF cf
(0) [0x000000000010491b] 0008:000000000010491b (unk. ctxt): ret ; c3
<bochs:30>
Next at t=867530417
rax: 0x00000000_00000000 rcx: 0x00000000_001048e8
rdx: 0x00000000_001bb000 rbx: 0x00000000_001045e7
rsp: 0x00000000_c00f6f54 rbp: 0x00000000_001048e8
rsi: 0x00000000_00067ec4 rdi: 0x00000000_00053c9e
r8 : 0x00000000_00000000 r9 : 0x00000000_00000000
r10: 0x00000000_00000000 r11: 0x00000000_00000000
r12: 0x00000000_00000000 r13: 0x00000000_00000000
r14: 0x00000000_00000000 r15: 0x00000000_00000000
rip: 0x00000000_c00f6f24
eflags 0x00000006: id vip vif ac vm rf nt IOPL=0 of df if tf sf zf af PF cf
(0) [0x000000000020cf24] 0008:00000000c00f6f24 (unk. ctxt): inc esp ; 44
<bochs:31> s
Next at t=867530418
rax: 0x00000000_00000000 rcx: 0x00000000_001048e8
rdx: 0x00000000_001bb000 rbx: 0x00000000_001045e7
rsp: 0x00000000_c00f6f55 rbp: 0x00000000_001048e8
rsi: 0x00000000_00067ec4 rdi: 0x00000000_00053c9e
r8 : 0x00000000_00000000 r9 : 0x00000000_00000000
r10: 0x00000000_00000000 r11: 0x00000000_00000000
r12: 0x00000000_00000000 r13: 0x00000000_00000000
r14: 0x00000000_00000000 r15: 0x00000000_00000000
rip: 0x00000000_c00f6f25
</pre>

Re: syscall getting interrupted by timer!!! why???

Posted: Wed Sep 09, 2020 5:31 am
by bzt
ITchimp wrote: I have got some really weird bugs... the base pointer mysteriously changes to garbage on the execution path...
from c00f6f4c to 001048e8, when executing leave...
You have a stack corruption problem.
ITchimp wrote: and leave is not supposed to change ebp
Yes, it's supposed to change ebp! This is how it works: in each function, ebp should point to the place on the stack where that function's local variables start. So when you return from a function, you must restore ebp so that it points to where the caller's local variables start.

In the function prologue, ebp is pushed onto the stack and then set to esp. This way ebp records the top of the stack as it was at function entry. This is called "creating a stack frame".

In the function epilogue, all local variables are removed from the stack (by setting esp to the stack top stored in ebp), then the previous ebp is popped. This is called "leaving a stack frame". Because the ebp value that was pushed in the prologue has been popped, the top of the stack must now hold the return address into the caller (which the "ret" instruction pops).

You can do this with a pair of push+mov and mov+pop instructions, but some architectures (like x86_32) have special instructions for it, such as your "leave" instruction. If this doesn't work, that means only one thing: your stack is corrupted!
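
To see what leave and ret do to the registers, here is a small C model. It is only a sketch: the "stack" is an array indexed from an assumed base address, and all names are invented for the example. The two words written to the stack are the ones implied by the Bochs trace above (leave loaded ebp with 001048e8, then ret jumped to c00f6f24).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BASE  0xC00F6F00u /* lowest address modelled (assumption) */
#define STACK_WORDS 64u

struct cpu {
    uint32_t esp, ebp, eip;
    uint32_t stack[STACK_WORDS]; /* stack[i] models the dword at STACK_BASE + 4*i */
};

/* Map a stack address to its slot in the model. */
static uint32_t *slot(struct cpu *c, uint32_t addr)
{
    assert(addr >= STACK_BASE && addr < STACK_BASE + 4u * STACK_WORDS);
    assert((addr & 3u) == 0);
    return &c->stack[(addr - STACK_BASE) / 4u];
}

static uint32_t pop(struct cpu *c)
{
    uint32_t v = *slot(c, c->esp);
    c->esp += 4;
    return v;
}

/* leave == mov %ebp, %esp ; pop %ebp */
void do_leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* ret == pop the return address into eip */
void do_ret(struct cpu *c)
{
    c->eip = pop(c);
}

/* Reproduce the register state seen in the trace just before leave. */
void load_trace_state(struct cpu *c)
{
    c->esp = 0xC00F6F24u;
    c->ebp = 0xC00F6F4Cu;
    c->eip = 0x0010491Au;
    *slot(c, 0xC00F6F4Cu) = 0x001048E8u; /* slot that should hold the caller's ebp    */
    *slot(c, 0xC00F6F50u) = 0xC00F6F24u; /* slot that should hold the return address  */
}
```

Running do_leave and then do_ret on that state gives ebp = 001048e8 and eip = c00f6f24, the same values as in the trace. So leave behaved as designed; the slot at ebp simply did not contain a frame pointer, and the slot above it did not contain a code address, which fits the diagnosis of a corrupted stack.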

To debug, dump the stack AFTER the function prologue, and BEFORE the function epilogue (in other words, before the "leave" instruction). The saved ebp and the return address must be identical in the two dumps for the return to work. (FYI: the bochs debugger has a "print-stack" command.) If you change the stack in an interrupt handler (for a context switch, say), then make sure that the new stack contains a saved frame pointer and a return address as well, otherwise you can't return from that function.
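
The same check can be automated as a rough heuristic. The following C sketch assumes you know the bounds of the current kernel stack and of the kernel's code; the struct, function, and parameter names are hypothetical, and the ranges used when calling it are whatever your own kernel uses. It flags a frame whose saved ebp does not point further up the same stack, or whose return address lies outside the code range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address ranges supplied by the caller; all values are assumptions. */
struct frame_limits {
    uint32_t stack_lo, stack_hi; /* kernel stack occupies [stack_lo, stack_hi) */
    uint32_t text_lo, text_hi;   /* kernel code occupies  [text_lo, text_hi)   */
};

/* ebp:       current frame pointer
 * saved_ebp: dword stored at [ebp]     (caller's frame pointer)
 * ret_addr:  dword stored at [ebp + 4] (return address)
 *
 * Heuristic only: an outermost frame that stores 0 as its saved ebp
 * would be reported as not sane and needs separate handling. */
bool frame_looks_sane(const struct frame_limits *lim,
                      uint32_t ebp, uint32_t saved_ebp, uint32_t ret_addr)
{
    bool ebp_on_stack = ebp >= lim->stack_lo && ebp < lim->stack_hi;
    /* The stack grows down, so the caller's frame sits at a higher address. */
    bool caller_above = saved_ebp > ebp && saved_ebp < lim->stack_hi;
    bool ret_in_text  = ret_addr >= lim->text_lo && ret_addr < lim->text_hi;
    return ebp_on_stack && caller_above && ret_in_text;
}
```

Fed with the values from the trace (ebp c00f6f4c, saved ebp 001048e8, return address c00f6f24) and plausible ranges, a check like this fails on both the saved ebp and the return address.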

Cheers,
bzt

Re: syscall getting interrupted by timer!!! why???

Posted: Wed Sep 09, 2020 5:29 pm
by ITchimp
That is awesome!!! BZT can I get a direct brain download from you????

kernel stack location anomaly

Posted: Sun Sep 13, 2020 2:58 am
by ITchimp
I am having an issue with task switching...

when I am in timer_handler under bochs, I type info tss

and ss0:esp0 is 0x10:0xc0119000...

but when I type print-stack

the bottom of the stack is 0xc00f7000... why isn't the stack starting from 0xc0119000, as I set it to be?