Re: Confused about context switch

Posted: Sun Oct 29, 2023 3:53 pm
by thewrongchristian
KrotovOSdev wrote: Even gdb doesn't work with my kernel. When I add -g option for GCC and NASM and use GCC remote debugger with qemu, it says that there is no symbol table so I can't set breakpoints...
You need to load the symbol table from your kernel image.

Say you've started QEMU like this:

Code: Select all

$ qemu-system-i386 -s -S -kernel mykernel
When connecting with gdb, you need to give gdb the same kernel image so it can read the symbol table that matches your running kernel:

Code: Select all

(gdb) target remote localhost:1234
(gdb) symbol-file mykernel
Once this is done, you can set breakpoints symbolically:

Code: Select all

(gdb) break _start
(gdb) continue

Re: Confused about context switch

Posted: Mon Oct 30, 2023 2:05 pm
by KrotovOSdev
thewrongchristian wrote:
You need to load the symbol table from your kernel image.
OK, thank you. Now it sounds like I can debug it.

Re: Confused about context switch

Posted: Wed Nov 01, 2023 2:18 pm
by KrotovOSdev
I have this code for ISR entry points:

Code: Select all

global sched_time_handler
extern do_timer_int
align 4

sched_time_handler:
    pushad
    push esp
    cld
    call do_timer_int
    add esp, 4
    popad
    iret
And this is the ISR handler itself:

Code: Select all

void do_timer_int(interrupt_frame* context) {
    timer_time_passed += 10;
    printf("%d ", timer_time_passed);

    if (timer_time_passed >= STD_TIMESLICE) {
        timer_time_passed = 0;
        resched();
    }

    send_eoi(0);
}
On the second interrupt it throws a GPF. What can be wrong?

Re: Confused about context switch

Posted: Wed Nov 01, 2023 7:35 pm
by Octocontrabass
You need to send the EOI before you switch tasks, but I don't see any problems that would cause #GP. Why do you think the problem is in this part of your code? What is the CPU state when the exception occurs? (QEMU with "-d int" will log the CPU state.)

Re: Confused about context switch

Posted: Thu Nov 02, 2023 7:22 am
by KrotovOSdev
Octocontrabass wrote:You need to send the EOI before you switch tasks, but I don't see any problems that would cause #GP. Why do you think the problem is in this part of your code? What is the CPU state when the exception occurs? (QEMU with "-d int" will log the CPU state.)
I send the EOI after the if statement because I don't switch tasks on every INT0.
It's not the timer ISR that throws the GPF but the page fault handler. The "iret" instruction at the end of the ISR throws a #PF. I think it tries to return to the wrong address, but I don't know why it happens after the second timer interrupt.

Re: Confused about context switch

Posted: Thu Nov 02, 2023 10:37 am
by nullplan
KrotovOSdev wrote:I send EOI after the if statement because I don't switch tasks on every INT0.
You know, you can just send the EOI before the if statement. It doesn't change anything in the CPU, only in the PIC. Also, and this is getting into advanced territory here, you can program the PIT to interrupt you when you actually want to do something. Rather than counting interrupts, just get the interrupt when you need it. Unless you are bumping up against the frequency limit of the PIT, but the lowest the PIT can go is ca. 18Hz, and if you were dividing this by 10, you'd get to 1.8Hz, which would feel real sluggish.
KrotovOSdev wrote:The "iret" instruction at the end of ISR throws PF.
Weird. The code looks alright. Try logging ESP on entry and exit.
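
For reference, the arithmetic behind picking a PIT rate can be sketched in C. This is only a sketch, not code from this thread: `pit_divisor` is a hypothetical helper, and the minimum reload value of 2 is an assumption chosen to stay valid in the PIT's periodic modes. The only hardware facts used are the PIT input clock of 1193182 Hz and the 16-bit counter, which together give the floor of roughly 18 Hz mentioned above.

```c
#include <stdint.h>

/* Input clock of the PIT in Hz. */
#define PIT_BASE_HZ 1193182u

/* Hypothetical helper: reload value for a requested interrupt rate.
 * The counter is 16 bits wide and a programmed value of 0 counts as
 * 65536, so the slowest rate is 1193182 / 65536, about 18.2 Hz. */
static uint32_t pit_divisor(uint32_t hz)
{
    if (hz == 0)
        return 65536u;                        /* treat 0 as "as slow as possible" */

    uint32_t div = (PIT_BASE_HZ + hz / 2) / hz;   /* round to nearest */

    if (div < 2u)
        div = 2u;                             /* assumed lower bound for periodic modes */
    if (div > 65536u)
        div = 65536u;                         /* clamp to the 16-bit counter range */
    return div;
}
```

For a 10 ms tick, `pit_divisor(100)` gives the reload value to program; when writing it to the device, only the low 16 bits are sent, so 65536 goes out as 0.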

Re: Confused about context switch

Posted: Thu Nov 02, 2023 7:34 pm
by Octocontrabass
KrotovOSdev wrote:I send EOI after the if statement because I don't switch tasks on every INT0.
You still need to send the EOI before you switch tasks. The new task will not send an EOI, so you will not receive any more timer interrupts.
KrotovOSdev wrote:It's not the timer ISR that throws the GPF but the page fault handler. The "iret" instruction at the end of the ISR throws a #PF. I think it tries to return to the wrong address, but I don't know why it happens after the second timer interrupt.
What is the #PF error code? What address is in CR2? What are the contents of the stack?
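
The EOI ordering described above can be sketched as follows. This is a hedged illustration rather than the poster's actual kernel code: `send_eoi` and `resched` are replaced by stand-ins that only count calls so the block compiles on its own, and `TICK_MS` plus the value of `STD_TIMESLICE` are assumptions based on the 10 ms increment in the posted handler and a 50 ms timeslice.

```c
#include <stdint.h>

#define TICK_MS        10u   /* assumed: one timer interrupt every 10 ms */
#define STD_TIMESLICE  50u   /* assumed: 50 ms timeslice, i.e. five ticks */

/* Stand-ins so the sketch is self-contained. A real kernel would write
 * to the PIC in send_eoi() and switch stacks in resched(). */
static int eoi_count;
static int resched_count;
static int eoi_seen_at_resched;

static void send_eoi(int irq) { (void)irq; eoi_count++; }
static void resched(void)
{
    resched_count++;
    eoi_seen_at_resched = eoi_count;   /* record that the EOI came first */
}

static uint32_t timer_time_passed;

void do_timer_int(void *context)
{
    (void)context;

    /* Acknowledge the interrupt before anything that might not return
     * here: if resched() switches to another task, code placed after
     * it does not run until this task is scheduled again. */
    send_eoi(0);

    timer_time_passed += TICK_MS;
    if (timer_time_passed >= STD_TIMESLICE) {
        timer_time_passed = 0;
        resched();
    }
}
```

With these assumed values, `resched()` is reached on every fifth tick, and the EOI for that tick has already been sent by then.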

Re: Confused about context switch

Posted: Sat Nov 04, 2023 8:33 am
by KrotovOSdev
Octocontrabass wrote:
KrotovOSdev wrote:I send EOI after the if statement because I don't switch tasks on every INT0.
You still need to send the EOI before you switch tasks. The new task will not send an EOI, so you will not receive any more timer interrupts.

What is the #PF error code? What address is in CR2? What are the contents of the stack?
Now I send the EOI before the if statement. The #PF error code is 0 and the CR2 value is 0x2badb0f7, which is a bit strange. It looks like something is wrong with ESP.

Code: Select all

kernel_pf_handler:
    pushad
    push esp
    cld
    call do_page_fault
    add esp, 4
    popad
    iret
Is this the right #PF handler entry point? I have a similar one for #INT0, but with different function names.

Re: Confused about context switch

Posted: Sat Nov 04, 2023 1:38 pm
by nullplan
KrotovOSdev wrote:Now I send the EOI before the if statement. The #PF error code is 0 and the CR2 value is 0x2badb0f7, which is a bit strange. It looks like something is wrong with ESP.
So error code 0 means the page fault was caused by a memory read in kernel mode. If you are not using the NX feature, the memory read could also be an instruction fetch. The CR2 address looks suspiciously close to the multiboot magic value of 0x2badb002, which is in EAX when the bootloader passes control to your kernel. Is it possible that you only ever overwrote AL in your kernel, and your ESP at the IRET instruction ends up pointing to the EAX value rather than EIP?

Again, I would print ESP on entry and exit of the handler.
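
Both observations above can be written down as small checks. The following is a minimal sketch: the three flag bits are the architectural low bits of the x86 page-fault error code and 0x2BADB002 is the multiboot bootloader magic, while the helper names `pf_is_kernel_read_not_present` and `looks_like_multiboot_eax` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low three bits of the x86 page-fault error code. */
#define PF_PRESENT 0x1u   /* 0: page not present, 1: protection violation */
#define PF_WRITE   0x2u   /* 0: read (or instruction fetch), 1: write      */
#define PF_USER    0x4u   /* 0: supervisor mode, 1: user mode              */

/* Value a multiboot bootloader leaves in EAX on entry to the kernel. */
#define MULTIBOOT_BOOTLOADER_MAGIC 0x2BADB002u

/* Hypothetical helper: true when all three bits are clear, which is the
 * case for error code 0: a kernel-mode read of a page that is not present. */
static bool pf_is_kernel_read_not_present(uint32_t err)
{
    return (err & (PF_PRESENT | PF_WRITE | PF_USER)) == 0;
}

/* Hypothetical helper: true when addr differs from the multiboot magic
 * only in its lowest byte, as it would if only AL had been overwritten. */
static bool looks_like_multiboot_eax(uint32_t addr)
{
    return (addr & 0xFFFFFF00u) == (MULTIBOOT_BOOTLOADER_MAGIC & 0xFFFFFF00u);
}
```

Feeding in the reported values, error code 0 and CR2 = 0x2badb0f7, makes both helpers return true, which fits the theory that a stale EAX value is being used as an address.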

Re: Confused about context switch

Posted: Sat Nov 04, 2023 2:00 pm
by Octocontrabass
KrotovOSdev wrote:It looks like something is wrong with ESP.
Okay. What is ESP? Is the #PF happening before or after you switch tasks?
KrotovOSdev wrote:Is this the right #PF handler entry point?
You need to remove the error code from the stack before IRET.
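
To see why the error code gets in the way, it can help to write out what the stack holds when the posted entry code calls `do_page_fault`. The struct below is a sketch under assumptions: `pf_frame` and its field names are hypothetical, and it covers only a fault taken without a privilege change (kernel mode to kernel mode), so no SS:ESP pair is pushed by the CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the stack as seen through the pointer that
 * "push esp" passes to the C handler, lowest address first. */
struct pf_frame {
    /* Saved by pushad; the last register pushed (EDI) is at the lowest address. */
    uint32_t edi, esi, ebp, esp_at_pushad, ebx, edx, ecx, eax;

    /* Pushed by the CPU for #PF (ordinary IRQs have no such slot). */
    uint32_t err_code;

    /* Pushed by the CPU for every interrupt; this is what iret pops. */
    uint32_t eip, cs, eflags;
};

/* After popad, ESP points at err_code, 4 bytes below the EIP that iret
 * expects, hence the extra "add esp, 4" before iret. */
_Static_assert(offsetof(struct pf_frame, err_code) == 32, "pushad saves 8 dwords");
_Static_assert(offsetof(struct pf_frame, eip) == 36, "eip follows the error code");
```

Without skipping that slot, iret would take the error code as EIP, the saved EIP as CS, and the saved CS as EFLAGS.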

Re: Confused about context switch

Posted: Sun Nov 05, 2023 2:47 am
by KrotovOSdev
Octocontrabass wrote: Okay. What is ESP? Is the #PF happening before or after you switch tasks?

You need to remove the error code from the stack before IRET.
As I expected, I have to remove the error code.

Code: Select all

kernel_pf_handler:
    pushad
    push esp
    cld
    call do_page_fault
    add esp, 4
    popad
    add esp, 4
    iret
The #PF happens without switching tasks. I want to switch tasks every 50 ms (every 5 timer interrupts). Is it good practice not to switch tasks on every single timer interrupt? The #PF happens after the second timer interrupt.
The ESP is 0x44FC when the #PF happens, which is okay in my OS. The previous interrupt's (timer interrupt) ESP is 0x44F8. Looks like I'm missing something important :)

Re: Confused about context switch

Posted: Sun Nov 05, 2023 3:30 am
by KrotovOSdev
nullplan wrote: So error code 0 means the page fault was caused by a memory read in kernel mode. If you are not using the NX feature, the memory read could also be an instruction fetch. The CR2 address looks suspiciously close to the multiboot magic value of 0x2badb002, which is in EAX when the bootloader passes control to your kernel. Is it possible that you only ever overwrote AL in your kernel, and your ESP at the IRET instruction ends up pointing to the EAX value rather than EIP?

Again, I would print ESP on entry and exit of the handler.
I was thinking about it. Maybe I'm using the stack incorrectly.
The next strange thing is that if I add while(1) at the end of the kmain() function, timer interrupts work fine. But I still can't switch tasks.
I may know how to fix that.

Re: Confused about context switch

Posted: Sun Nov 05, 2023 12:37 pm
by thewrongchristian
KrotovOSdev wrote:
nullplan wrote: So error code 0 means the page fault was caused by a memory read in kernel mode. If you are not using the NX feature, the memory read could also be an instruction fetch. The CR2 address looks suspiciously close to the multiboot magic value of 0x2badb002, which is in EAX when the bootloader passes control to your kernel. Is it possible that you only ever overwrote AL in your kernel, and your ESP at the IRET instruction ends up pointing to the EAX value rather than EIP?

Again, I would print ESP on entry and exit of the handler.
I was thinking about it. Maybe I'm using the stack incorrectly.
The next strange thing is that if I add while(1) at the end of the kmain() function, timer interrupts work fine. But I still can't switch tasks.
I may know how to fix that.
I would suggest starting small: create multiple tasks and switch between them purely cooperatively, without involving timer interrupts. You need to make sure your task switching is solid before trying to add any pre-emption.

In fact, start with interrupts disabled, and handle only faults such as GPF and page faults, which you can explicitly control.

Once you're confident your exception handling code is working correctly, you can install interrupt handlers. Perhaps start with a keyboard handler, so you can control when the interrupts happen as well as provide some input.

Re: Confused about context switch

Posted: Sun Nov 05, 2023 1:29 pm
by Octocontrabass
KrotovOSdev wrote:The ESP is 0x44FC when the #PF happens which is okay in my OS. The previous interrupt's (timer interrupt) ESP is 0x44F8.
That's a very unusual place to put your stack.
KrotovOSdev wrote:The next strange thing is that if I add while(1) at the end of the kmain() function, timer interrupts work fine.
It sounds like the problem is in the code that calls kmain().

Re: Confused about context switch

Posted: Fri Nov 10, 2023 11:13 am
by KrotovOSdev
thewrongchristian wrote: Once you're confident your exception handling code is working correctly, you can install interrupt handlers. Perhaps start with a keyboard handler, so you can control when the interrupts happen as well as provide some input.
My keyboard handler works fine. Hmm.
Octocontrabass wrote: That's a very unusual place to put your stack.

It sounds like the problem is in the code that calls kmain().
I set the stack to 0x4500 during the init process, before paging is enabled. Should I move it to another address?

If the kmain() function returns, my code just executes "cli; hlt".