Strange Bugs in my OS
Posted: Thu Nov 01, 2018 9:06 am
by LIC
Hi all,
I am trying to develop a small kernel but have run into several very strange bugs that I really don't understand.
The first main one is with the ISR handler, which is supposed to print the exception error message but instead prints the memory address 0x00000000 (I checked the asm code).
Then there is some strange behavior with the keyboard: when a key is pressed it prints the corresponding character (everything normal up to here) but then runs into a GP fault, except for the 'L' key!
And the strangest thing is that when I add or remove files, or comment or uncomment lines of code (even lines that are executed after the buggy one), it turns the bugs I mentioned above on or off...
Here is a link to my complete kernel code:
https://github.com/leonard-limon/osdev
Does anyone have an explanation or even a clue about these strange bugs?
Regards
Re: Strange Bugs in my OS
Posted: Thu Nov 01, 2018 10:35 am
by lkurusa
This looks like your code is overwriting something either via a stack overflow or a mis/unchecked pointer. Good luck finding the bug!
Re: Strange Bugs in my OS
Posted: Thu Nov 01, 2018 10:54 am
by LIC
Hi, and thanks for your reply.
I'm afraid this is not a stack overflow issue, because the stack pointer when the bug occurs is 0x1ff7c and the kernel code only goes up to roughly 0x8000...
Tell me if you think I am wrong.
Re: Strange Bugs in my OS
Posted: Thu Nov 01, 2018 11:08 am
by eryjus
I tend to agree with lkurusa. This sounds like a stack problem to me as well, such as alignment, overwriting, or a structure packing discrepancy between asm and C. For example, are you pushing your segment registers in asm and expecting them to be 16 bits in a C structure?
I would get a copy of Bochs and use the internal debugger to step-check your code. If you still need help, some more specifics would help.
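One way to catch the kind of asm/C layout mismatch described above is to let the compiler check the offsets. The following is only a sketch: the field order is an assumption, based on a stub that pushes an interrupt number and an error code, then `pusha`, then `ds`, `es`, `fs`, `gs`. In 32-bit protected mode each segment register push takes 4 bytes of stack, so the fields are declared as 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (not taken from the original repository): lowest stack
   address first, so the last register pushed by the stub comes first. */
typedef struct {
    uint32_t gs, fs, es, ds;                          /* each segment push occupies 4 bytes */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;  /* pusha, last pushed (edi) first */
    uint32_t int_no, err_code;                        /* pushed by the stub */
    uint32_t eip, cs, eflags;                         /* pushed by the CPU */
} registers_t;

/* Fail the build if the C view drifts away from what the stub pushes. */
_Static_assert(offsetof(registers_t, edi) == 16, "segment registers must take 16 bytes");
_Static_assert(offsetof(registers_t, int_no) == 48, "int_no must follow the pusha block");
_Static_assert(sizeof(registers_t) == 68, "unexpected padding in registers_t");
```

If the segment fields were declared as `uint16_t` instead, the assertions would fail at compile time rather than at run time inside an interrupt.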
Re: Strange Bugs in my OS
Posted: Thu Nov 01, 2018 12:59 pm
by PoisonNinja
Hey, it looks like you’re passing the registers by value to the interrupt handler so when you return the values are corrupted since functions are allowed to modify their parameters.
Maybe try using a pointer to the registers you pushed onto the stack.
Re: Strange Bugs in my OS
Posted: Thu Nov 01, 2018 5:15 pm
by MichaelPetch
I'd start earlier in the process. Given that adding and removing files may make things work, I'd make sure the bootloader is actually reading the entire kernel into memory. One likely candidate is that your kernel image is larger than the number of 512-byte sectors you load.
This would explain why in your question you say the error message doesn't print. The .data section is placed after .text. If the .data section isn't fully loaded into memory it is probably reading 0x00 from memory which makes the strings appear to have nothing in them and thus not displayed. Printing numbers would work because those are likely printed without references to the .data section. Of course it is also possible that not all of your .text section is loaded into memory so that could cause functions to fail if the instructions aren't loaded.
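As a rough illustration of the size check, the number of sectors the loader must read is the image size rounded up to a whole sector. The function names here are hypothetical, and the sketch assumes the kernel is a flat binary stored in consecutive 512-byte sectors.

```c
#include <stddef.h>

#define SECTOR_SIZE 512u

/* Round up, so a partially filled last sector is still counted. */
unsigned sectors_needed(size_t image_bytes) {
    return (unsigned)((image_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE);
}

/* Returns 1 if reading `sectors_loaded` sectors covers the whole image. */
int image_fully_loaded(size_t image_bytes, unsigned sectors_loaded) {
    return sectors_loaded >= sectors_needed(image_bytes);
}
```

A check like this could run in the build, comparing the size of the kernel binary against the sector count hard-coded in the bootloader, so the mismatch shows up before the image is ever booted.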
This is a very common problem in the questions that get asked on Stack Overflow when someone has developed their own bootloader instead of using something like GRUB/multiboot.
There may well be other serious issues in your code, but I think you need to start eliminating the large-scale problems before tackling the smaller bugs (like interrupt handling, etc.).
Likely not related to your issues (but still an issue): if you make your own bootloader, you should also create a mechanism to zero out the .bss section, as memory isn't guaranteed to be filled with zeroes already. Usually you can create a linker script that sets one symbol at the beginning of the BSS section and another at the end. Your code can then iterate over that memory and zero it out. If you used GRUB/multiboot, this is done for you when your ELF executable is loaded into memory.
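A minimal sketch of the zeroing loop might look like the following. The helper name is made up for illustration. In a kernel, the two arguments would be the addresses of the linker-script symbols (hypothetical names such as `__bss_start` and `__bss_end`), and the call would happen before any C code relies on zero-initialized globals.

```c
#include <stdint.h>

/* Zero every byte in the half-open range [start, end).
   In a kernel, start and end would come from linker-script symbols
   (assumed names: __bss_start and __bss_end). */
void zero_range(uint8_t *start, uint8_t *end) {
    for (uint8_t *p = start; p < end; ++p) {
        *p = 0;
    }
}
```

With symbols declared as `extern uint8_t __bss_start[], __bss_end[];`, the early startup code would simply call `zero_range(__bss_start, __bss_end)`.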
Re: Strange Bugs in my OS
Posted: Thu Nov 01, 2018 7:34 pm
by MichaelPetch
Your interrupt routines don't seem to be re-entrant so I think you should for the time being be sending the EOIs after you call them in your irq handler (not before). Your keyboard handler shouldn't be polling port 0x64 in a loop. When you get a keyboard interrupt you can read the keyboard byte right from port 0x60.
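To illustrate the suggestion above, here is a sketch of a keyboard handler that reads one byte straight from port 0x60 and only then sends the EOI to the master PIC. Port I/O is simulated with hypothetical `sim_inb`/`sim_outb` helpers so the example can run in user space; in a kernel they would wrap the x86 `in` and `out` instructions.

```c
#include <stdint.h>

/* Simulated port space (an assumption for illustration only). */
static uint8_t sim_ports[0x100];
static int eoi_sent;

uint8_t sim_inb(uint16_t port) {
    return sim_ports[port & 0xff];
}

void sim_outb(uint16_t port, uint8_t value) {
    if (port == 0x20 && value == 0x20) {
        eoi_sent = 1;               /* record the EOI to the master PIC */
    }
    sim_ports[port & 0xff] = value;
}

static uint8_t last_scancode;

/* IRQ1 handler body: no polling of port 0x64, the interrupt itself
   signals that a byte is waiting in the data port. */
void keyboard_irq(void) {
    last_scancode = sim_inb(0x60);  /* read the scancode first */
    sim_outb(0x20, 0x20);           /* acknowledge only after handling */
}
```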
Re: Strange Bugs in my OS
Posted: Fri Nov 02, 2018 2:36 am
by nullplan
PoisonNinja wrote:Hey, it looks like you’re passing the registers by value to the interrupt handler so when you return the values are corrupted since functions are allowed to modify their parameters.
Maybe try using a pointer to the registers you pushed onto the stack.
I don't know the 32-bit ABI well enough to know how structure passing works, but in the 64-bit ABI, large structures are passed as their pointer. That is, the caller allocates space for a temporary copy, copies the argument there, passes in a pointer to the copy, and cleans it up afterwards. That doesn't happen here, so I assume the parameters are already being passed in wrong. Your advice is good, though!
MichaelPetch wrote:Your interrupt routines don't seem to be re-entrant so I think you should for the time being be sending the EOIs after you call them in your irq handler (not before). Your keyboard handler shouldn't be polling port 0x64 in a loop. When you get a keyboard interrupt you can read the keyboard byte right from port 0x60.
Not a problem, as the IF remains at 0 the entire time. PIC can re-issue as many interrupts as it likes, the CPU won't recognize them until the IRET instruction.
Re: Strange Bugs in my OS
Posted: Fri Nov 02, 2018 5:25 am
by LIC
Thank you for all your replies!
Indeed, my loader was not loading enough blocks to load the whole kernel... I feel a bit dumb right now. Now that the kernel is fully loaded, the exception message shows up perfectly!
MichaelPetch wrote:If you make your own bootloader you should also create a mechanism where you can zero the .bss section out as memory isn't necessarily guaranteed to be filled with zeroes already
I am not sure what you mean by the .bss section. Where is it located in memory?
I still have the keyboard issue though: depending on what character I type, it goes into General Protection Fault or not...
Re: Strange Bugs in my OS
Posted: Fri Nov 02, 2018 6:30 am
by LIC
OK, I looked at the disassembly of my kernel, and here's what happens when I call the print or putc function inside my interrupt handler...
Code:
extern void irq_handler(const registers_t r) {
    // if interrupt was raised by slave PIC send EOI to slave
    if (r.int_no >= 40) {
        outb(0xa0, 0x20);
    }
    // send EOI to master
    outb(0x20, 0x20);
    // if interrupt handler exists, run it
    //if (interrupt_handlers[r.int_no]) {
    //    interrupt_handlers[r.int_no](r);
    //}
    print("clk\n");
}
Code:
pusha
000000B6 1E push ds
000000B7 06 push es
000000B8 0FA0 push fs
000000BA 0FA8 push gs
000000BC 66B81000 mov ax,0x10
000000C0 8ED8 mov ds,ax
000000C2 8EC0 mov es,ax
000000C4 8EE0 mov fs,ax
000000C6 8EE8 mov gs,ax
000000C8 8925E2100000 mov [0x10e2],esp
000000CE E8E1140000 call 0x15b4
000000D3 0FA9 pop gs
000000D5 0FA1 pop fs
000000D7 07 pop es
000000D8 1F pop ds
000000D9 61 popa
000000DA 81C408000000 add esp,0x8
000000E0 FB sti
000000E1 CF iret
Code:
000015B4 83EC0C sub esp,byte +0xc
000015B7 837C244027 cmp dword [esp+0x40],byte +0x27
000015BC 7612 jna 0x15d0
000015BE 83EC08 sub esp,byte +0x8
000015C1 6A20 push byte +0x20
000015C3 68A0000000 push dword 0xa0
000015C8 E822EDFFFF call 0x1000002ef
000015CD 83C410 add esp,byte +0x10
000015D0 83EC08 sub esp,byte +0x8
000015D3 6A20 push byte +0x20
000015D5 6A20 push byte +0x20
000015D7 E813EDFFFF call 0x1000002ef
000015DC C7442420CB2D0000 mov dword [esp+0x20],0x2dcb
000015E4 83C41C add esp,byte +0x1c
000015E7 E9F0010000 jmp 0x17dc
The last instruction (jmp 0x17dc) jumps to the print function, but first it sets up the argument (mov dword [esp+0x20], 0x2dcb). The problem is that esp+0x20 is where the saved value of GS sits, and that value is popped before the iret instruction. So when 0x2dcb is popped into GS, I obviously get a General Protection Fault.
Do you know how to tell the compiler to avoid that?
Re: Strange Bugs in my OS
Posted: Fri Nov 02, 2018 6:44 am
by Octocontrabass
Pass the registers_t struct by a pointer instead of by value.
Even though you've defined it as const, that just means you can't write any code that changes its value; the compiler is still free to reuse that stack space for something else. ("Why?" Because the System V ABI says so.)
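A sketch of the pointer-based variant follows. The struct layout and the helper name are assumptions made for illustration, not code from the original repository. With this signature, the only stack slot the callee owns is the 4-byte pointer argument, so the saved registers themselves stay intact for the later `pop` and `iret`. On the assembly side, the stub would `push esp` before the `call` and drop that one argument afterwards.

```c
#include <stdint.h>

/* Assumed layout of the frame built by the interrupt stub. */
typedef struct {
    uint32_t gs, fs, es, ds;
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t int_no, err_code;
    uint32_t eip, cs, eflags;
} registers_t;

/* The handler receives the address of the saved frame instead of a copy.
   Returns 1 when the IRQ came from the slave PIC (vector 40 or above),
   which is when the slave also needs an EOI. */
int irq_needs_slave_eoi(const registers_t *r) {
    return r->int_no >= 40;
}
```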
Re: Strange Bugs in my OS
Posted: Fri Nov 02, 2018 7:48 am
by LIC
Oook, that works now! Thank you for all your replies.