OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Re: Confused about context switch
PostPosted: Thu Oct 19, 2023 4:12 pm 
Joined: Wed Oct 27, 2004 11:00 pm
Posts: 874
Location: WA
KrotovOSdev wrote:
I think the problem is a stack overflow. When the interrupt handler is called, it calls resched(), which creates a stack frame. If I'm right (though I may not be), I have to bypass that somehow. How can I do this?

1) Why are you using CLI? This is dangerous and should never be done. Instead, use an interrupt gate, which automatically disables interrupts for you. Otherwise an interrupt can arrive after your handler has been entered but before its first instruction (the CLI) has executed, and this can lead to stack overflow.

2) Why do you have an STI instruction? The STI instruction should never be used. Ever. There is no place (beyond your bootloader) where an STI instruction should be used. Again, you should be using an interrupt gate, which automatically disables interrupts when the handler is called (and the IRET automatically restores them). An STI inside the handler can cause stack overflow.



 Post subject: Re: Confused about context switch
PostPosted: Thu Oct 19, 2023 7:36 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
Code:
sched_time_handler:

Interrupt handlers should not start with CLI. If you need interrupts disabled, use an interrupt gate. If you're not using an interrupt gate, interrupts may arrive before the CLI and overflow your stack.

You push EBP but there's no corresponding pop, so the IRET does not pop the correct return address.

You call a C function without saving the registers that may be clobbered by a C function and without clearing the direction flag.

STI before IRET does nothing. IRET pops EFLAGS from the stack, and the stored EFLAGS value determines whether interrupts will be enabled.
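
To illustrate the point about IRET, here is a minimal C sketch that inspects the EFLAGS image the CPU pushed on interrupt entry. The helper name is hypothetical; the bit positions (IF is bit 9, DF is bit 10) are the architectural ones. Whatever STI or CLI did just before the IRET is overwritten by this saved value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions on x86. */
#define EFLAGS_IF (1u << 9)   /* interrupt enable flag */
#define EFLAGS_DF (1u << 10)  /* direction flag */

/* Hypothetical helper: returns true if interrupts will be enabled
 * once IRET has restored the EFLAGS image saved on interrupt entry. */
static bool iret_enables_interrupts(uint32_t saved_eflags)
{
    return (saved_eflags & EFLAGS_IF) != 0;
}
```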

Code:
kernel_pf_handler:

Exception handlers should not start with CLI. If you need interrupts disabled, use an interrupt gate. If you're not using an interrupt gate, interrupts may arrive before the CLI and overflow your stack.

You pop the error code into EBX, overwriting the interrupted program's state.

You push EBX without a corresponding pop, causing POPAD and IRET to pop the wrong values from the stack.

You call a C function without clearing the direction flag.

STI before IRET does nothing.

KrotovOSdev wrote:
I think the problem is a stack overflow.

Your code has several problems. It's hard to say which one is causing the crash without debugging.

KrotovOSdev wrote:
When the interrupt handler is called, it calls resched(), which creates a stack frame.

Rescheduling should not involve creating anything, just switching from one task to another. (And you probably should not switch tasks on every timer interrupt!)

KrotovOSdev wrote:
If I'm right (though I may not be), I have to bypass that somehow. How can I do this?

If you don't want to switch tasks on every timer interrupt, keep track of how many timer interrupts have arrived since the last time you've switched tasks.
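
A minimal sketch of that tick counting, assuming a periodic timer and a slice of 5 ticks (the 50 ms figure mentioned later in the thread). The names `TICKS_PER_SLICE` and `timer_tick_expired` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed slice length: 5 timer ticks (50 ms at a 10 ms tick). */
#define TICKS_PER_SLICE 5u

/* Timer interrupts seen since the last task switch. */
static uint32_t ticks_since_switch;

/* Call once per timer interrupt. Returns true when the current
 * time slice has expired and the caller should reschedule. */
static bool timer_tick_expired(void)
{
    if (++ticks_since_switch < TICKS_PER_SLICE)
        return false;
    ticks_since_switch = 0;  /* start counting the next slice */
    return true;
}
```

The timer handler would call this on every tick and invoke the scheduler only when it returns true.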


 Post subject: Re: Confused about context switch
PostPosted: Fri Oct 20, 2023 1:05 pm 
Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3195
JAAman wrote:
KrotovOSdev wrote:
I think the problem is a stack overflow. When the interrupt handler is called, it calls resched(), which creates a stack frame. If I'm right (though I may not be), I have to bypass that somehow. How can I do this?

1) Why are you using CLI? This is dangerous and should never be done. Instead, use an interrupt gate, which automatically disables interrupts for you. Otherwise an interrupt can arrive after your handler has been entered but before its first instruction (the CLI) has executed, and this can lead to stack overflow.

2) Why do you have an STI instruction? The STI instruction should never be used. Ever. There is no place (beyond your bootloader) where an STI instruction should be used. Again, you should be using an interrupt gate, which automatically disables interrupts when the handler is called (and the IRET automatically restores them). An STI inside the handler can cause stack overflow.


Disagree. OSes that do the whole scheduling process with interrupts disabled end up with poor interrupt latency. A better design is to disable interrupts only when really necessary, primarily in parts of the IRQ handlers and in spinlocks. Other than that, interrupts should be enabled. That means the scheduler needs locks, and task switches must be postponed until all IRQs have been handled.

OTOH, in a multicore OS, sti/cli should only be used in spinlocks and in parts of IRQ handlers. They are no good for protecting code from reentrancy problems.
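
One way to postpone task switches until all IRQs are handled is a nesting counter plus a pending flag. The sketch below is only an illustration with hypothetical names. A real kernel would keep these variables per CPU, update them with interrupts disabled, and call its scheduler where the counter is bumped here.

```c
#include <stdbool.h>

static int  irq_nesting;      /* depth of IRQ handlers currently running */
static bool resched_pending;  /* some handler asked for a task switch */
static int  switches_done;    /* stand-in for the real switch (illustration) */

/* Called at the start of every IRQ handler. */
static void irq_enter(void) { irq_nesting++; }

/* Called by a handler (e.g. the timer) that wants a task switch. */
static void request_resched(void) { resched_pending = true; }

/* Called at the end of every IRQ handler. Only the outermost
 * handler acts on a pending reschedule request. */
static void irq_exit(void)
{
    if (--irq_nesting == 0 && resched_pending) {
        resched_pending = false;
        switches_done++;  /* a real kernel would invoke the scheduler here */
    }
}
```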


 Post subject: Re: Confused about context switch
PostPosted: Fri Oct 20, 2023 1:09 pm 
Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3195
Octocontrabass wrote:
If you don't want to switch tasks on every timer interrupt, keep track of how many timer interrupts have arrived since the last time you've switched tasks.


Timers should not be built by counting ticks in IRQs. The PIT can be programmed in one-shot mode and so can be used as a timer. Timers are what you should use for preemption. Alternative timers are the APIC timer and the HPET.
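
As a rough sketch of the one-shot approach: the PIT counts down at about 1.193182 MHz, so a delay has to be converted into a 16-bit count. The function name and the use of 0 as an error value are assumptions made for this example. The command byte 0x30 selects channel 0, lobyte/hibyte access and mode 0 (interrupt on terminal count). A kernel would write that byte to port 0x43 and then the low and high bytes of the count to port 0x40.

```c
#include <stdint.h>

#define PIT_HZ            1193182u  /* PIT input clock in Hz */
#define PIT_CMD_CH0_MODE0 0x30u     /* channel 0, lobyte/hibyte, mode 0, binary */

/* Convert a delay in microseconds into a 16-bit PIT count.
 * Returns 0 if the delay is too short or too long for one shot
 * (this sketch only accepts counts from 1 to 65535). */
static uint16_t pit_count_for_us(uint32_t us)
{
    uint64_t count = (uint64_t)PIT_HZ * us / 1000000u;
    if (count == 0 || count > 0xFFFFu)
        return 0;
    return (uint16_t)count;
}
```

With this clock a single shot covers at most roughly 55 ms, so longer delays need a different timer or chained shots.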


 Post subject: Re: Confused about context switch
PostPosted: Fri Oct 20, 2023 2:35 pm 
Joined: Sat Aug 12, 2023 1:48 am
Posts: 40
Location: Nizhny Novgorod, Russia
Octocontrabass wrote:
Code:
sched_time_handler:

Interrupt handlers should not start with CLI. If you need interrupts disabled, use an interrupt gate. If you're not using an interrupt gate, interrupts may arrive before the CLI and overflow your stack.

You push EBP but there's no corresponding pop, so the IRET does not pop the correct return address.

You call a C function without saving the registers that may be clobbered by a C function and without clearing the direction flag.

STI before IRET does nothing. IRET pops EFLAGS from the stack, and the stored EFLAGS value determines whether interrupts will be enabled.

I tried pushing EBP to pass it as a function parameter so that I could free the stack frame manually, but it didn't help.
Of course, I've deleted all the STI and CLI instructions; it looks like I got the interrupt gate idea wrong.
Octocontrabass wrote:
Code:
kernel_pf_handler:

You pop the error code into EBX, overwriting the interrupted program's state.

You push EBX without a corresponding pop, causing POPAD and IRET to pop the wrong values from the stack.

How can I pass the error code as a function parameter?
Octocontrabass wrote:
You call a C function without clearing the direction flag.

Do I have to clear DF, or what? Or should I just push EFLAGS (I've just added this)?
Octocontrabass wrote:
Your code has several problems. It's hard to say which one is causing the crash without debugging.

Rescheduling should not involve creating anything, just switching from one task to another. (And you probably should not switch tasks on every timer interrupt!)

If you don't want to switch tasks on every timer interrupt, keep track of how many timer interrupts have arrived since the last time you've switched tasks.

It does not crash, but it throws an exception. How can I avoid creating anything if it calls C functions and, when an interrupt happens, the CPU puts some values on the stack? I switch tasks every 50 ms (every 5 timer IRQs).
Maybe the problem is with the second task. I have only the kernel and idle tasks now, so maybe I should load drivers first?


 Post subject: Re: Confused about context switch
PostPosted: Fri Oct 20, 2023 3:00 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
KrotovOSdev wrote:
I tried pushing EBP to pass it as a function parameter so that I could free the stack frame manually, but it didn't help.

Why are you trying to free the stack frame? You need that stack frame to resume the thread.

KrotovOSdev wrote:
How can I pass the error code as a function parameter?

You can do something like this:
Code:
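; assumes PUSHAD ran just before: 8 registers * 4 bytes = 32, so the error code is at ESP+32
push dword [esp+32] ; NASM needs an explicit operand size for a memory push
call function_name_here ; void function_name_here( uint32_t error_code );
add esp, 4 ; discard the argument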

But then you only have the error code and nothing else. Usually you want access to most of the values on the stack, so you'd push a pointer to the stack and define a struct that matches your stack layout:
Code:
push esp
call function_name_here ; void function_name_here( struct stack_frame * context );
add esp, 4
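
A struct matching that layout could look like the sketch below. It assumes the stub executed PUSHAD immediately before `push esp`, that the exception pushed an error code, and that no privilege change occurred (otherwise the CPU also pushes the user ESP and SS after EFLAGS). The field names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Memory seen through the pointer pushed by "push esp",
 * lowest address first. */
struct stack_frame {
    /* Saved by PUSHAD (EDI was pushed last, so it comes first). */
    uint32_t edi, esi, ebp, esp_dummy, ebx, edx, ecx, eax;
    /* Pushed by the CPU for exceptions that carry an error code. */
    uint32_t error_code;
    /* Pushed by the CPU on every interrupt or exception. */
    uint32_t eip, cs, eflags;
};

/* The error code must sit 32 bytes above the PUSHAD block. */
_Static_assert(offsetof(struct stack_frame, error_code) == 32,
               "error code expected at ESP+32 after PUSHAD");
```

For interrupts without an error code, a common trick is to push a dummy value in the stub so that one struct fits every handler.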


KrotovOSdev wrote:
Do I have to clear DF, or what? Or should I just push EFLAGS (I've just added this)?

You need to clear DF. You don't need to push EFLAGS, the CPU automatically pushes it when entering an interrupt handler, and IRET pops it.

KrotovOSdev wrote:
It does not crash, but it throws an exception.

Close enough. You can use the values on the stack to examine what the CPU was doing when the exception occurred.

KrotovOSdev wrote:
How can I avoid creating anything if it calls C functions and, when an interrupt happens, the CPU puts some values on the stack?

I don't understand. When an interrupt happens, the CPU puts some values on the stack, and then you use IRET to remove those values and return to the program. That happens with or without task switching.

KrotovOSdev wrote:
Maybe the problem is with the second task. I have only the kernel and idle tasks now, so maybe I should load drivers first?

You don't need drivers. As long as both tasks can run individually without causing exceptions, you should be able to switch between them without exceptions.


 Post subject: Re: Confused about context switch
PostPosted: Sun Oct 22, 2023 1:42 pm 
Joined: Sat Aug 12, 2023 1:48 am
Posts: 40
Location: Nizhny Novgorod, Russia
Octocontrabass wrote:
Why are you trying to free the stack frame? You need that stack frame to resume the thread.
OK, I do need it. I thought it could overflow the stack.
Octocontrabass wrote:
Code:
push esp
call function_name_here ; void function_name_here( struct stack_frame * context );
add esp, 4

I've added this and now I can see all registers. Nice! Thank you

Done
Octocontrabass wrote:
KrotovOSdev wrote:
Maybe the problem is with the second task. I have only the kernel and idle tasks now, so maybe I should load drivers first?

You don't need drivers. As long as both tasks can run individually without causing exceptions, you should be able to switch between them without exceptions.

Looks like I got it wrong. My "best" scheduling algorithm is priority-based, so it never preempts the kernel task until that task yields the CPU. It turns out that the kernel task switches to itself. Maybe that causes the stack overflow.


 Post subject: Re: Confused about context switch
PostPosted: Sun Oct 22, 2023 4:07 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
KrotovOSdev wrote:
It turns out that the kernel task switches to itself. Maybe that causes the stack overflow.

Performing a task switch that doesn't switch to a different task is a waste of time, but it shouldn't overflow the stack.
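
Avoiding the wasted switch is usually a one-line check before the real work. A hedged sketch follows, where `struct task`, `current` and `switch_to` are hypothetical names rather than anything from the poster's kernel:

```c
#include <stddef.h>

struct task { int id; };   /* hypothetical task descriptor */

static struct task *current;  /* task that owns the CPU right now */
static int switch_count;      /* counts real switches (illustration only) */

/* Switch to next unless it is missing or already running. */
static void switch_to(struct task *next)
{
    if (next == NULL || next == current)
        return;            /* nothing to do, skip the switch */
    current = next;        /* a real kernel would swap stacks and registers here */
    switch_count++;
}
```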


 Post subject: Re: Confused about context switch
PostPosted: Fri Oct 27, 2023 10:22 am 
Joined: Sat Aug 12, 2023 1:48 am
Posts: 40
Location: Nizhny Novgorod, Russia
Octocontrabass wrote:
KrotovOSdev wrote:
It turns out that the kernel task switches to itself. Maybe that causes the stack overflow.

Performing a task switch that doesn't switch to a different task is a waste of time, but it shouldn't overflow the stack.

I know, but my main goal is to get context switching working and to load some drivers.
Trying to fix stack overflow day (unknown)...


 Post subject: Re: Confused about context switch
PostPosted: Fri Oct 27, 2023 6:37 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
How do you know it's a stack overflow and not something else?


 Post subject: Re: Confused about context switch
PostPosted: Sat Oct 28, 2023 11:07 am 
Joined: Sat Aug 12, 2023 1:48 am
Posts: 40
Location: Nizhny Novgorod, Russia
Octocontrabass wrote:
How do you know it's a stack overflow and not something else?

If I run QEMU with the "-d int" argument, I can see that my OS causes many, many page faults.
This was happening before I rewrote the ISR entry points in assembly. Now it throws a GPF exception on the second timer interrupt. I guess the problem may be in how I save registers. I'm working on it.


 Post subject: Re: Confused about context switch
PostPosted: Sat Oct 28, 2023 2:36 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
KrotovOSdev wrote:
If I run QEMU with the "-d int" argument, I can see that my OS causes many, many page faults.

How do you know the page faults are caused by a stack overflow? What does your page fault handler do in response to the page faults?
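
One way to answer that question in code is to compare the faulting address (the value the CPU leaves in CR2) with the location of the stack. The sketch below assumes the kernel deliberately leaves one unmapped guard page directly below each stack. The function name and that layout are assumptions, not something stated in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Returns true if fault_addr (read from CR2) lies in the guard page
 * directly below stack_bottom, the lowest valid stack address.
 * Any other fault address points at a different kind of bug. */
static bool fault_is_stack_overflow(uint32_t fault_addr, uint32_t stack_bottom)
{
    return fault_addr < stack_bottom &&
           stack_bottom - fault_addr <= PAGE_SIZE;
}
```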


 Post subject: Re: Confused about context switch
PostPosted: Sun Oct 29, 2023 11:16 am 
Joined: Sat Aug 12, 2023 1:48 am
Posts: 40
Location: Nizhny Novgorod, Russia
Octocontrabass wrote:
How do you know the page faults are caused by a stack overflow? What does your page fault handler do in response to the page faults?

I reserve some memory for kernel needs and, of course, for the stack. A page fault may notify the kernel that it needs more memory. Or it may also signal a wrong EIP, but I think a wrong EIP causes a different exception, doesn't it?


 Post subject: Re: Confused about context switch
PostPosted: Sun Oct 29, 2023 1:39 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
KrotovOSdev wrote:
A page fault may notify the kernel that it needs more memory.

Or it means there is a bug that causes your kernel to access an invalid address.

KrotovOSdev wrote:
Or it may also signal a wrong EIP, but I think a wrong EIP causes a different exception, doesn't it?

There's no exception for wrong EIP because the CPU doesn't know when EIP is wrong. You might get a page fault when EIP is wrong, if EIP points to a page that isn't present or isn't executable.
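
The error code that the CPU pushes for a page fault helps tell these cases apart. Below is a small decoding sketch using the architectural bit positions. The struct and function names are invented for the example, and the instruction-fetch bit is only reported when execute-disable or a similar paging feature is enabled, so it may always read as 0 on a plain 32-bit setup.

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_PRESENT (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1)  /* 0: read access, 1: write access */
#define PF_USER    (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_RSVD    (1u << 3)  /* reserved bit set in a paging structure */
#define PF_FETCH   (1u << 4)  /* fault caused by an instruction fetch */

struct pf_info {
    bool present, write, user, reserved_bit, fetch;
};

/* Split a page-fault error code into its individual flags. */
static struct pf_info decode_pf_error(uint32_t error_code)
{
    struct pf_info info = {
        .present      = (error_code & PF_PRESENT) != 0,
        .write        = (error_code & PF_WRITE) != 0,
        .user         = (error_code & PF_USER) != 0,
        .reserved_bit = (error_code & PF_RSVD) != 0,
        .fetch        = (error_code & PF_FETCH) != 0,
    };
    return info;
}
```

Printing these flags together with CR2 and the saved EIP from the page fault handler usually narrows down whether the fault is a stray data access or a jump to a bad address.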


 Post subject: Re: Confused about context switch
PostPosted: Sun Oct 29, 2023 2:19 pm 
Joined: Sat Aug 12, 2023 1:48 am
Posts: 40
Location: Nizhny Novgorod, Russia
Octocontrabass wrote:
Or it means there is a bug that causes your kernel to access an invalid address.

There's no exception for wrong EIP because the CPU doesn't know when EIP is wrong. You might get a page fault when EIP is wrong, if EIP points to a page that isn't executable.


It may be a bug, I know. I am trying to understand the problem.
The faulting address is near 1 MB, so it should be mapped. But now it just causes a GPF.

Even gdb doesn't work with my kernel. When I add the -g option for GCC and NASM and use GDB remote debugging with QEMU, it says that there is no symbol table, so I can't set breakpoints...

