Confused about context switch

thewrongchristian
Member
Posts: 426
Joined: Tue Apr 03, 2018 2:44 am

Re: Confused about context switch

Post by thewrongchristian »

KrotovOSdev wrote: Even gdb doesn't work with my kernel. When I add the -g option for GCC and NASM and use the GDB remote debugger with QEMU, it says that there is no symbol table, so I can't set breakpoints...
You need to load the symbol table from your kernel image.

Say you've started QEMU like this:

Code: Select all

$ qemu-system-i386 -s -S -kernel mykernel
When connecting with gdb, you need to give gdb the same kernel image so it can read the symbol table that matches your running kernel:

Code: Select all

(gdb) target remote localhost:1234
(gdb) symbol-file mykernel
Once this is done, you can set breakpoints symbolically:

Code: Select all

(gdb) break _start
(gdb) continue
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

thewrongchristian wrote:
You need to load the symbol table from your kernel image.
Ok, thank you. Now it sounds like I can debug it.
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

I have this code for ISR entry points:

Code: Select all

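global sched_time_handler
extern do_timer_int
align 4

sched_time_handler:
    pushad                  ; save EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
    push esp                ; argument: pointer to the saved registers
    cld                     ; C code expects the direction flag to be clear
    call do_timer_int
    add esp, 4              ; drop the pointer argument
    popad                   ; restore the general-purpose registers
    iret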
And this is the ISR handler itself:

Code: Select all

void do_timer_int(interrupt_frame* context) {
    timer_time_passed += 10;
    printf("%d ", timer_time_passed);

    if (timer_time_passed >= STD_TIMESLICE) {
        timer_time_passed = 0;
        resched();
    }

    send_eoi(0);
}
On the second interrupt it throws a GPF. What could be wrong?
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

You need to send the EOI before you switch tasks, but I don't see any problems that would cause #GP. Why do you think the problem is in this part of your code? What is the CPU state when the exception occurs? (QEMU with "-d int" will log the CPU state.)
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote:You need to send the EOI before you switch tasks, but I don't see any problems that would cause #GP. Why do you think the problem is in this part of your code? What is the CPU state when the exception occurs? (QEMU with "-d int" will log the CPU state.)
I send EOI after the if statement because I don't switch tasks on every INT0.
It's not the timer ISR that throws the GPF but the page fault handler. The "iret" instruction at the end of ISR throws PF. I think it tries to return to the wrong address, but I don't know why it happens after the second timer interrupt.
nullplan
Member
Posts: 1789
Joined: Wed Aug 30, 2017 8:24 am

Re: Confused about context switch

Post by nullplan »

KrotovOSdev wrote:I send EOI after the if statement because I don't switch tasks on every INT0.
You know, you can just send the EOI before the if statement. It doesn't change anything in the CPU, only in the PIC. Also, and this is getting into advanced territory here, you can program the PIT to interrupt you when you actually want to do something. Rather than counting interrupts, just get the interrupt when you need it. Unless you are bumping up against the frequency limit of the PIT, but the lowest the PIT can go is ca. 18Hz, and if you were dividing this by 10, you'd get to 1.8Hz, which would feel real sluggish.
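To put numbers on the PIT point, here is a rough sketch of how a reload value could be computed for a chosen interrupt rate. The helper name `pit_reload_for_hz` is made up for illustration, and the 20 Hz figure in the note below is only an example rate, not something taken from the posted code.

```c
#include <assert.h>
#include <stdint.h>

/* Input clock of the PIT on PC-compatible hardware, in Hz. */
#define PIT_BASE_HZ 1193182u

/*
 * Hypothetical helper: 16-bit reload value for a requested interrupt rate.
 * The PIT treats a reload value of 0 as 65536, its slowest setting
 * (about 18.2 Hz), so anything slower is clamped to that.
 */
static uint16_t pit_reload_for_hz(uint32_t hz)
{
    assert(hz != 0);
    uint32_t divisor = (PIT_BASE_HZ + hz / 2) / hz;   /* round to nearest */
    if (divisor < 2)
        divisor = 2;          /* keep the count valid for the periodic modes */
    if (divisor > 65535)
        return 0;             /* 0 encodes 65536 */
    return (uint16_t)divisor;
}
```

With this, `pit_reload_for_hz(20)` gives 59659, i.e. one interrupt roughly every 50 ms, so no software tick counting would be needed for a slice of that length.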
KrotovOSdev wrote:The "iret" instruction at the end of ISR throws PF.
Weird. The code looks alright. Try logging ESP on entry and exit.
Carpe diem!
Octocontrabass
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

KrotovOSdev wrote:I send EOI after the if statement because I don't switch tasks on every INT0.
You still need to send the EOI before you switch tasks. The new task will not send an EOI, so you will not receive any more timer interrupts.
KrotovOSdev wrote:It's not the timer ISR that throws the GPF but the page fault handler. The "iret" instruction at the end of ISR throws PF. I think it tries to return to the wrong address, but I don't know why it happens after the second timer interrupt.
What is the #PF error code? What address is in CR2? What are the contents of the stack?
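For illustration, the ordering described above could look roughly like this. It is a hosted-C sketch, not the kernel's real code: `send_eoi` and `resched` are stand-ins that only record the call order, and `TICK_MS` and the value of `STD_TIMESLICE` are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TICK_MS        10   /* assumed tick period in ms */
#define STD_TIMESLICE  50   /* assumed slice length in ms */

struct interrupt_frame;     /* layout not needed for this sketch */

static unsigned timer_time_passed;
static char call_log[64];   /* records call order: 'E' = EOI, 'S' = switch */
static size_t call_log_len;

static void log_call(char c)
{
    if (call_log_len < sizeof(call_log) - 1)
        call_log[call_log_len++] = c;   /* buffer stays NUL-terminated */
}

/* Stand-ins for the kernel's real functions. */
static void send_eoi(int irq) { (void)irq; log_call('E'); }
static void resched(void)     { log_call('S'); }

void do_timer_int(struct interrupt_frame *context)
{
    (void)context;
    timer_time_passed += TICK_MS;

    /* Acknowledge the interrupt first: if resched() switches away, */
    /* the next task will not send this EOI on our behalf.           */
    send_eoi(0);

    if (timer_time_passed >= STD_TIMESLICE) {
        timer_time_passed = 0;
        resched();
    }
}
```

After five calls the recorded order is five EOIs followed by a single switch, so the PIC is always acknowledged before any task switch can happen.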
KrotovOSdev
Member
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote:
KrotovOSdev wrote:I send EOI after the if statement because I don't switch tasks on every INT0.
You still need to send the EOI before you switch tasks. The new task will not send an EOI, so you will not receive any more timer interrupts.

What is the #PF error code? What address is in CR2? What are the contents of the stack?
Now I send the EOI before the if statement. The #PF error code is 0, and the CR2 value is 0x2badb0f7, which is a bit strange. Looks like something is wrong with ESP.

Code: Select all

kernel_pf_handler:
    pushad
    push esp
    cld
    call do_page_fault
    add esp, 4
    popad
    iret
Is this the right #PF handler entry point? I have a similar one for #INT0, but with different function names.
nullplan
Member
Member
Posts: 1789
Joined: Wed Aug 30, 2017 8:24 am

Re: Confused about context switch

Post by nullplan »

KrotovOSdev wrote:Now I send the EOI before the if statement. The #PF error code is 0, and the CR2 value is 0x2badb0f7, which is a bit strange. Looks like something is wrong with ESP.
So error code 0 means the page fault was caused by a memory read in kernel mode. If you are not using the NX feature, the memory read could also be an instruction fetch. The CR2 address looks suspiciously close to the multiboot magic value of 0x2badb002, which is in EAX when the bootloader passes control to your kernel. Is it possible that you only ever overwrote AL in your kernel, and your ESP at the IRET instruction ends up pointing to the EAX value rather than EIP?

Again, I would print ESP on entry and exit of the handler.
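As a rough aid for reading these values, the sketch below decodes the low bits of the x86 #PF error code and checks whether an address differs from the multiboot magic only in its low byte. The struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code. */
#define PF_PRESENT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE    (1u << 1)  /* 0: read, 1: write */
#define PF_USER     (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_RSVD     (1u << 3)  /* reserved bit set in a paging structure */
#define PF_IFETCH   (1u << 4)  /* instruction fetch (only reported with NX etc.) */

#define MULTIBOOT_BOOTLOADER_MAGIC 0x2badb002u

/* Hypothetical decoded form, for logging from a page-fault handler. */
struct pf_info {
    bool protection, write, user, reserved_bit, instr_fetch;
};

static struct pf_info pf_decode(uint32_t err)
{
    struct pf_info info = {
        .protection   = (err & PF_PRESENT) != 0,
        .write        = (err & PF_WRITE)   != 0,
        .user         = (err & PF_USER)    != 0,
        .reserved_bit = (err & PF_RSVD)    != 0,
        .instr_fetch  = (err & PF_IFETCH)  != 0,
    };
    return info;
}

/* True if a and b are equal everywhere except (possibly) the low byte. */
static bool only_low_byte_differs(uint32_t a, uint32_t b)
{
    return ((a ^ b) & ~0xFFu) == 0;
}
```

For error code 0 every field comes out false (a supervisor-mode read of a non-present page), and 0x2badb0f7 does match the magic value in all but the low byte.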
Carpe diem!
Octocontrabass
Member
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

KrotovOSdev wrote:Looks like something is wrong with ESP.
Okay. What is ESP? Is the #PF happening before or after you switch tasks?
KrotovOSdev wrote:Is this the right #PF handler entry point?
You need to remove the error code from the stack before IRET.
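To make the stack picture concrete, here is a sketch of what the pointer pushed by the stub would refer to for a same-privilege #PF, lowest address first. The struct name `pf_frame` and the helper are hypothetical, and the vector list covers only the classic exceptions that push an error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the stack after pushad in a #PF stub. */
struct pf_frame {
    /* Saved by pushad; EDI is pushed last, so it sits lowest. */
    uint32_t edi, esi, ebp, esp_at_entry, ebx, edx, ecx, eax;
    /* Pushed by the CPU before entering the handler. */
    uint32_t error_code;
    uint32_t eip, cs, eflags;
};

/* After popad, ESP points at error_code, not at EIP. */
static_assert(offsetof(struct pf_frame, error_code) == 32,
              "pushad saves eight 32-bit registers");
static_assert(offsetof(struct pf_frame, eip) == 36,
              "EIP sits one dword above the error code");

/* Classic exception vectors for which the CPU pushes an error code. */
static bool exception_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8:   /* #DF */
    case 10:  /* #TS */
    case 11:  /* #NP */
    case 12:  /* #SS */
    case 13:  /* #GP */
    case 14:  /* #PF */
    case 17:  /* #AC */
        return true;
    default:
        return false;
    }
}
```

A stub for one of these vectors needs an extra `add esp, 4` between `popad` and `iret`; hardware IRQ stubs such as the timer one do not, because no error code is pushed for them.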
KrotovOSdev
Member
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

Octocontrabass wrote: Okay. What is ESP? Is the #PF happening before or after you switch tasks?

You need to remove the error code from the stack before IRET.
As I expected, I have to remove the error code.

Code: Select all

kernel_pf_handler:
    pushad
    push esp
    cld
    call do_page_fault
    add esp, 4              ; drop the pointer argument
    popad
    add esp, 4              ; discard the error code the CPU pushed for #PF
    iret
The #PF happens without switching tasks. I want to switch tasks every 50 ms (every 5 timer interrupts). Is it good practice not to switch tasks on every single timer interrupt? The #PF happens after the second timer interrupt.
The ESP is 0x44FC when the #PF happens which is okay in my OS. The previous interrupt's (timer interrupt) ESP is 0x44F8. Looks like I'm missing something important :)
KrotovOSdev
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

nullplan wrote:So error code 0 means the page fault was caused by a memory read in kernel mode. If you are not using the NX feature, the memory read could also be an instruction fetch. The CR2 address looks suspiciously close to the multiboot magic value of 0x2badb002, which is in EAX when the bootloader passes control to your kernel. Is it possible that you only ever overwrote AL in your kernel, and your ESP at the IRET instruction ends up pointing to the EAX value rather than EIP?

Again, I would print ESP on entry and exit of the handler.
I was thinking about it. Maybe I'm using the stack wrong.
The next strange thing is that if I add while(1) at the end of the kmain() function, timer interrupts work fine. But I still can't switch tasks.
I think I may know how to fix that.
thewrongchristian
Member
Posts: 426
Joined: Tue Apr 03, 2018 2:44 am

Re: Confused about context switch

Post by thewrongchristian »

KrotovOSdev wrote:
nullplan wrote:So error code 0 means the page fault was caused by a memory read in kernel mode. If you are not using the NX feature, the memory read could also be an instruction fetch. The CR2 address looks suspiciously close to the multiboot magic value of 0x2badb002, which is in EAX when the bootloader passes control to your kernel. Is it possible that you only ever overwrote AL in your kernel, and your ESP at the IRET instruction ends up pointing to the EAX value rather than EIP?

Again, I would print ESP on entry and exit of the handler.
I was thinking about it. Maybe I'm using the stack wrong.
The next strange thing is that if I add while(1) at the end of the kmain() function, timer interrupts work fine. But I still can't switch tasks.
I think I may know how to fix that.
I would suggest starting small: create multiple tasks and switch between them purely cooperatively, without involving timer interrupts. You need to make sure your task switching is solid before trying to add any pre-emption.

In fact, start with interrupts disabled, and handle only faults such as GPF and page faults, which you can explicitly control.

Once you're confident your exception handling code is working correctly, then you can install interrupt handlers. Perhaps start with a keyboard handler, so you can control when the interrupts happen as well as providing some input.
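To show the shape of purely cooperative switching, here is a user-space analogy, not kernel code. It assumes a hosted POSIX system that provides `<ucontext.h>`; in a kernel the `swapcontext` call would be replaced by your own register and stack switch routine. All names here are made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define NTASKS      2
#define STACK_SIZE  (64 * 1024)
#define ROUNDS      3

static ucontext_t main_ctx;
static ucontext_t task_ctx[NTASKS];
static char task_stack[NTASKS][STACK_SIZE];
static int current;                       /* index of the running task */
static char trace[NTASKS * ROUNDS + 1];   /* which task ran, in order */
static size_t trace_len;

/* Cooperative yield: save the running task and resume the next one. */
static void task_yield(void)
{
    int prev = current;
    current = (current + 1) % NTASKS;
    swapcontext(&task_ctx[prev], &task_ctx[current]);
}

static void task_body(int id)
{
    for (int i = 0; i < ROUNDS; i++) {
        if (trace_len < sizeof(trace) - 1)
            trace[trace_len++] = (char)('A' + id);
        task_yield();                     /* switches only when asked to */
    }
    /* Returning here resumes main_ctx through uc_link. */
}

const char *run_cooperative_demo(void)
{
    trace_len = 0;
    memset(trace, 0, sizeof(trace));
    for (int i = 0; i < NTASKS; i++) {
        getcontext(&task_ctx[i]);
        task_ctx[i].uc_stack.ss_sp = task_stack[i];
        task_ctx[i].uc_stack.ss_size = STACK_SIZE;
        task_ctx[i].uc_link = &main_ctx;
        makecontext(&task_ctx[i], (void (*)(void))task_body, 1, i);
    }
    current = 0;
    swapcontext(&main_ctx, &task_ctx[0]); /* run until a task returns */
    return trace;
}
```

Each task gets its own stack and only loses the CPU when it calls `task_yield()`, which keeps every switch deterministic and easy to step through before any timer interrupt is involved.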
Octocontrabass
Member
Member
Posts: 5560
Joined: Mon Mar 25, 2013 7:01 pm

Re: Confused about context switch

Post by Octocontrabass »

KrotovOSdev wrote:The ESP is 0x44FC when the #PF happens which is okay in my OS. The previous interrupt's (timer interrupt) ESP is 0x44F8.
That's a very unusual place to put your stack.
KrotovOSdev wrote:The next strange thing is that if I add while(1) at the end of the kmain() function, timer interrupts work fine.
It sounds like the problem is in the code that calls kmain().
KrotovOSdev
Member
Member
Posts: 40
Joined: Sat Aug 12, 2023 1:48 am
Location: Nizhny Novgorod, Russia

Re: Confused about context switch

Post by KrotovOSdev »

thewrongchristian wrote: Once you're confident your exception handling code is working correctly, then you can install interrupt handlers. Perhaps start with a keyboard handler, so you can control when the interrupts happen as well as providing some input.
My keyboard handler works fine. Hm
Octocontrabass wrote: That's a very unusual place to put your stack.

It sounds like the problem is in the code that calls kmain().
I set the stack to 0x4500 during the init process, before paging is enabled. Should I move it to another address?

If the kmain() function returns, my code just executes "cli; hlt".