Strang Bugs in my OS

Posted: Thu Nov 01, 2018 9:06 am
by LIC
Hi all,

I am trying to develop a small kernel but have run into several very strange bugs that I really don't understand.
The first main one is with the ISR handler, which is supposed to print the exception error message but instead prints the string at memory address 0x00000000 (I checked the asm code).
Then there is some strange behavior with the keyboard: when a key is pressed it prints the corresponding character (everything normal up to here) but then runs into a GP fault, except for the 'L' key!!
And the strangest thing is that when I add/remove some files, or when I comment/uncomment some lines of code (even if these lines are executed after the buggy one), it turns the bugs I mentioned above on or off...

Here is a link to my complete kernel code: https://github.com/leonard-limon/osdev

Does anyone have an explanation or even a clue to these strange bugs?

Regards

Re: Strang Bugs in my OS

Posted: Thu Nov 01, 2018 10:35 am
by lkurusa
This looks like your code is overwriting something either via a stack overflow or a mis/unchecked pointer. Good luck finding the bug!

Re: Strang Bugs in my OS

Posted: Thu Nov 01, 2018 10:54 am
by LIC
Hi, and thanks for your reply.
I'm afraid this is not a stack overflow issue, because the stack pointer when the bug occurs is 0x1ff7c and the kernel code only goes up to roughly 0x8000...
Tell me if you think I am wrong.

Re: Strang Bugs in my OS

Posted: Thu Nov 01, 2018 11:08 am
by eryjus
I tend to agree with lkurusa. This sounds like a stack problem to me as well -- such as alignment or overwriting or structure packing discrepancy in asm vs C. For example are you pushing your segment registers in asm and expecting them to be 16 bits in a C structure?
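
To make the packing concern concrete: in 32-bit protected mode, `push ds` moves the stack pointer by 4 bytes, not 2, so a C structure that mirrors the stack needs a 32-bit slot per segment register. The sketch below is a hypothetical layout (`frame_t` and its field names are assumptions, not taken from the posted kernel) for a stub that pushes an interrupt number and error code, then `pusha`, then `ds`, `es`, `fs`, `gs`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame layout, lowest address first. The field order is the
 * reverse of the push order, because the stack grows downwards. */
typedef struct {
    uint32_t gs, fs, es, ds;                         /* push ds/es/fs/gs, 4 bytes each */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax; /* saved by pusha */
    uint32_t int_no, err_code;                       /* pushed by the stub */
    uint32_t eip, cs, eflags;                        /* pushed by the CPU */
} frame_t;

/* Compile-time checks: if a segment field were declared uint16_t, these
 * offsets would no longer match what the assembly stub pushed. */
_Static_assert(offsetof(frame_t, edi) == 16, "four 32-bit segment slots");
_Static_assert(offsetof(frame_t, int_no) == 48, "int_no follows pusha block");
```

Checks like these fail at build time instead of showing up as a fault inside an interrupt handler.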

I would get a copy of Bochs and use the internal debugger to step-check your code. If you still need help, some more specifics would help.

Re: Strang Bugs in my OS

Posted: Thu Nov 01, 2018 12:59 pm
by PoisonNinja
Hey, it looks like you’re passing the registers by value to the interrupt handler so when you return the values are corrupted since functions are allowed to modify their parameters.

Maybe try using a pointer to the registers you pushed onto the stack.

Re: Strang Bugs in my OS

Posted: Thu Nov 01, 2018 5:15 pm
by MichaelPetch
I'd start earlier in the process. Given that adding and removing files may make things work, I'd make sure the bootloader is actually reading the entire kernel into memory. One likely candidate is that your kernel image is larger than the number of 512-byte sectors you load.

This would explain why in your question you say the error message doesn't print. The .data section is placed after .text. If the .data section isn't fully loaded into memory it is probably reading 0x00 from memory which makes the strings appear to have nothing in them and thus not displayed. Printing numbers would work because those are likely printed without references to the .data section. Of course it is also possible that not all of your .text section is loaded into memory so that could cause functions to fail if the instructions aren't loaded.
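
As a rough sanity check, the sector count the loader needs can be computed from the image size. This is only a sketch: `sectors_needed` and `image_fully_loaded` are hypothetical helper names, and the 512-byte sector size is the one mentioned above.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Number of whole sectors needed to cover size_bytes, rounding up. */
uint32_t sectors_needed(uint32_t size_bytes)
{
    return size_bytes / SECTOR_SIZE + (size_bytes % SECTOR_SIZE != 0u);
}

/* Nonzero when reading loaded_sectors sectors covers the whole image. */
int image_fully_loaded(uint32_t size_bytes, uint32_t loaded_sectors)
{
    return loaded_sectors >= sectors_needed(size_bytes);
}
```

For instance, an image of 0x8000 bytes needs 64 sectors, so a loader that reads only 16 leaves most of it unloaded. Running the same arithmetic in the build script against the real image size catches the problem whenever the kernel grows.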

This is a very common problem in the questions asked on Stack Overflow when someone has developed their own bootloader instead of using something like GRUB/multiboot.

There may well be other serious issues in your code, but I think you need to start eliminating the large scale problems before tackling the smaller bugs (like interrupt handling etc)

Likely not related to your issues (but it is still an issue): if you make your own bootloader, you should also create a mechanism to zero out the .bss section, as memory isn't guaranteed to be filled with zeroes already. Usually you can create a linker script that sets one symbol at the beginning of the BSS section and another at the end. Your code can then iterate over that memory and zero it out. If you used GRUB/multiboot this is done for you when loading your ELF executable into memory.
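
A minimal sketch of that zeroing loop follows. The function name `zero_range` and the linker symbols `__bss_start` and `__bss_end` are assumptions for illustration; the actual symbol names depend on what the linker script defines.

```c
#include <stdint.h>

/* Zero every byte in [start, end). The volatile pointer discourages the
 * compiler from replacing the loop with a call to memset, which a
 * freestanding kernel may not provide. */
void zero_range(uint8_t *start, uint8_t *end)
{
    for (volatile uint8_t *p = start; p < end; ++p) {
        *p = 0;
    }
}

/* In the kernel entry code, assuming the linker script defines the
 * hypothetical symbols __bss_start and __bss_end around .bss:
 *
 *     extern uint8_t __bss_start[], __bss_end[];
 *     zero_range(__bss_start, __bss_end);
 *
 * This has to run before any C code relies on zero-initialised globals.
 */
```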

Re: Strang Bugs in my OS

Posted: Thu Nov 01, 2018 7:34 pm
by MichaelPetch
Your interrupt routines don't seem to be re-entrant so I think you should for the time being be sending the EOIs after you call them in your irq handler (not before). Your keyboard handler shouldn't be polling port 0x64 in a loop. When you get a keyboard interrupt you can read the keyboard byte right from port 0x60.

Re: Strang Bugs in my OS

Posted: Fri Nov 02, 2018 2:36 am
by nullplan
PoisonNinja wrote: Hey, it looks like you’re passing the registers by value to the interrupt handler so when you return the values are corrupted since functions are allowed to modify their parameters.

Maybe try using a pointer to the registers you pushed onto the stack.
I don't know the 32-bit ABI well enough to know how structure passing works, but in the 64-bit ABI, large structures are passed as their pointer. That is, the caller allocates space for a temporary copy, copies the argument there, then passes in a pointer to the copy. And cleans it up afterwards. Which doesn't happen here, so I assume the parameters are already passed in wrong. Your advice is good, though!
MichaelPetch wrote: Your interrupt routines don't seem to be re-entrant so I think you should for the time being be sending the EOIs after you call them in your irq handler (not before). Your keyboard handler shouldn't be polling port 0x64 in a loop. When you get a keyboard interrupt you can read the keyboard byte right from port 0x60.
Not a problem, as the IF remains at 0 the entire time. PIC can re-issue as many interrupts as it likes, the CPU won't recognize them until the IRET instruction.

Re: Strang Bugs in my OS

Posted: Fri Nov 02, 2018 5:25 am
by LIC
Thank you for all your replies!
Indeed, my loader was not loading enough blocks to load the whole kernel... I feel a bit dumb right now #-o . Now that the kernel is fully loaded the exception message is showing perfectly!
"If you make your own bootloader you should also create a mechanism where you can zero the .bss section out as memory isn't necessarily guaranteed to be filled with zeroes already"
I am not sure what you mean by the .bss section. Where is it located in memory?

I still have the keyboard issue though: depending on which character I type, it either triggers a General Protection Fault or not...

Re: Strang Bugs in my OS

Posted: Fri Nov 02, 2018 6:30 am
by LIC
Ok I looked at the assembler code of my kernel and here's what happens when I call the print or putc function inside my interrupt handler ...

Code:

extern void irq_handler(const registers_t r) {

	// if interrupt was raised by slave PIC send EOI to slave
	if (r.int_no >= 40) {
		outb(0xa0, 0x20);
	}

	// send EOI to master
	outb(0x20, 0x20);

	// if interrupt handler exists, run it
	//if (interrupt_handlers[r.int_no]) {
	//	interrupt_handlers[r.int_no](r);
	//}

	print("clk\n");

}

Code:

pusha
000000B6  1E                push ds
000000B7  06                push es
000000B8  0FA0              push fs
000000BA  0FA8              push gs
000000BC  66B81000          mov ax,0x10
000000C0  8ED8              mov ds,ax
000000C2  8EC0              mov es,ax
000000C4  8EE0              mov fs,ax
000000C6  8EE8              mov gs,ax
000000C8  8925E2100000      mov [0x10e2],esp
000000CE  E8E1140000        call 0x15b4
000000D3  0FA9              pop gs
000000D5  0FA1              pop fs
000000D7  07                pop es
000000D8  1F                pop ds
000000D9  61                popa
000000DA  81C408000000      add esp,0x8
000000E0  FB                sti
000000E1  CF                iret

Code:

000015B4  83EC0C            sub esp,byte +0xc
000015B7  837C244027        cmp dword [esp+0x40],byte +0x27
000015BC  7612              jna 0x15d0
000015BE  83EC08            sub esp,byte +0x8
000015C1  6A20              push byte +0x20
000015C3  68A0000000        push dword 0xa0
000015C8  E822EDFFFF        call 0x1000002ef
000015CD  83C410            add esp,byte +0x10
000015D0  83EC08            sub esp,byte +0x8
000015D3  6A20              push byte +0x20
000015D5  6A20              push byte +0x20
000015D7  E813EDFFFF        call 0x1000002ef
000015DC  C7442420CB2D0000  mov dword [esp+0x20],0x2dcb
000015E4  83C41C            add esp,byte +0x1c
000015E7  E9F0010000        jmp 0x17dc

The last instruction (jmp 0x17dc) jumps to the print function, but first it sets the argument (mov [esp+0x20], 0x2dcb). At esp+0x20, however, is the saved value of GS, which is popped before the iret instruction. So when 0x2dcb is popped into GS I obviously get a General Protection Fault.

Do you know how to tell the compiler to avoid that?

Re: Strang Bugs in my OS

Posted: Fri Nov 02, 2018 6:44 am
by Octocontrabass
Pass the registers_t struct by a pointer instead of by value.

Even though you've defined it as const, that just means you can't write any code that changes its value; the compiler is still free to reuse that stack space for something else. ("Why?" Because the System V ABI says so.)
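
A minimal sketch of the pointer-based version is below. The `registers_t` layout shown is an assumption (the real definition lives in the linked repository), and `irq_from_slave` is a hypothetical helper that mirrors the `int_no >= 40` check from the posted handler.

```c
#include <stdint.h>

/* Assumed layout, lowest address first; adjust to match the real stub. */
typedef struct {
    uint32_t gs, fs, es, ds;
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t int_no, err_code;
    uint32_t eip, cs, eflags;
} registers_t;

/* The handler receives the address of the saved frame. The only argument
 * slot the compiler may reuse is the 4-byte pointer itself, so the saved
 * registers stay intact until the stub pops them.
 *
 * Matching change in the assembly stub (sketch):
 *
 *     push esp            ; argument: address of the saved frame
 *     call irq_handler
 *     add  esp, 4         ; discard the pointer argument
 */
int irq_from_slave(const registers_t *r)
{
    /* Vectors 40 and up come from the slave PIC, as in the posted code. */
    return r->int_no >= 40;
}
```

With this signature, a tail call such as the `jmp` to `print` can at worst overwrite the pushed pointer, not the saved GS value.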

Re: Strang Bugs in my OS

Posted: Fri Nov 02, 2018 7:48 am
by LIC
Oook, that works now! Thank you for all your replies