#GF triggered by Keypress on IRET (IDT, IRQ)

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
travisjayday
Posts: 4
Joined: Sun Nov 26, 2017 2:06 pm

#GF triggered by Keypress on IRET (IDT, IRQ)

Post by travisjayday »

I'm in quite the dilemma, for this is beginning to drive me crazy.

As indicated by the title, I've installed an IDT with CPU exception handlers and a keyboard interrupt service routine. The CPU exceptions (e.g. int $0x00) work like a charm but I'm having issues with the keyboard interrupt. At this point, I've removed all the code in the keyboard interrupt service routine except for the iret instruction.

In emulators (Virtualbox, Qemu, Bochs) everything works as expected. OS boots, installs IDT, and on a keypress nothing happens (as expected because the ISR for 0x21 consists only of the iret instruction). HOWEVER, on real hardware, this iret instruction triggers a #GF (General Protection Fault)... Only on real hardware...

Because the Protection Fault only gets triggered on real hardware, it's really hard to debug (I'm not very experienced in OSdev). I've deduced / tracked down the following symptoms:

- #GF gets triggerd when the iret instruction occurs of the 0x21 keyboard handler service routine (I've remapped the PIT such that keyboard interrupt is the 0x21 entry in the IDT)
- I have reason to believe the stack gets corrupted for some reason. That would be the only thing that could make sense. But the question is: why?
- If the stack gets corrupted, the iret will pop invalid values and jump to invalid memory, thus causing a #GF
- Here's the kicker: this behaviour is not consistent. Sometimes (after a few reboots), everything works as expected (no #GF). How is that possible?

This is driving me crazy because I can't find any good information on this. It's hard to debug for me because it's happening on real machines only. Tips or help or general advice is greatly appreciated (at this point, anything is appreciated :lol: )

Here is the (relavent) code:

Installing keyboard IRQ handler in IDT (C compiled with GCC on Linux)

Code: Select all

void init_keyboard()
{
	register_isr_gate(0x21, (void*) isr_keyboard_handler); 
}
Keyboard Interrupt Service Routine (x86 Intel-style Assembly)

Code: Select all

global isr_keyboard_handler
isr_keyboard_handler: 
        ; everything left blank 
	iretd
Enabling Hardware Interrupts

Code: Select all

void enable_hardware_interrupts()
{
	asm volatile(
	"movb	$0b11111101, %al # // 0 = enabled, 1 = disabled, irq 0 = lsb, irq 8 = msb\n"\
	"out	%al, $0x21 \n"\
	"out	%al, $0xA1 \n"); 

}
Jumped here after installing GDT and initializing protected mode (cr0 register)

Code: Select all

protect_mode: 
	mov	ax, DATA_DESC - NULL_DESC       ; 0x10
	mov	ds, ax	; update data segment
	mov	ss, ax
	mov es, ax
	mov	fs, ax
	
        ; stack will grow downard
	mov	ebp, 0x7C00
	mov	esp, 0x7C00

	; setup interrupts
	call 	reprogram_pic	; resets bios defaults for hardware interrupts, mapping them to IRD > 32	
	lidt	[idt_descr]	; remembers the base adres and size of idt

	call	kernel_main       ; start C kernel
	hlt
Reprogramming the PIC so that keyboard interrupt maps to 0x21

Code: Select all

; reprograms the PIC such that it remaps old bios interrupts
; Taken from an early Linux Kernel
reprogram_pic: 
	mov	al,0x11		; initialization sequence
	out	0x20,al		; send it to 8259A-1
	dw	0x00eb,0x00eb		; jmp $+2, jmp $+2
	out	0xA0,al		; and to 8259A-2
	dw	0x00eb,0x00eb
	mov	al,0x20		; start of hardware int's master pic (0x20)
	out	0x21,al
	dw	0x00eb,0x00eb
	mov	al,0x28		; start of hardware int's 2 slave pic (0x28)
	out	0xA1,al
	dw	0x00eb,0x00eb
	mov	al,0x04		; 8259-1 is master
	out	0x21,al
	dw	0x00eb,0x00eb
	mov	al,0x02		; 8259-2 is slave
	out	0xA1,al
	dw	0x00eb,0x00eb
	mov	al,0x01		; 8086 mode for both
	out	0x21,al
	dw	0x00eb,0x00eb
	out	0xA1,al
	dw	0x00eb,0x00eb
	mov	al,0xFF		; mask off all interrupts for now
	out	0x21,al
	dw	0x00eb,0x00eb
	out	0xA1,al

	ret
Finally, the beginning of the C kernel

Code: Select all

extern void kernel_main()
{
.......
	isr_traps_init();     // install CPU exceptions (like divide by 0, #GF, etc)
	init_keyboard();    // install keyboard ISR
	enable_hardware_interrupts();   // enable hardware keyboard IRQ
	asm ("sti");      // enable interrupts
......

Any help or insight is greatly appreciated. I cannot find out why the iret causes a #GF only on real hardware. I'm thinking it may have something to do with the stack, but I can't see what I'm doing wrong.


Best regards
Octocontrabass
Member
Member
Posts: 5586
Joined: Mon Mar 25, 2013 7:01 pm

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by Octocontrabass »

Are you sure you're receiving an IRQ in the emulators you've tried? I notice you haven't mentioned anything about initializing the keyboard controller.
travisjayday wrote:- #GF gets triggerd when the iret instruction occurs of the 0x21 keyboard handler service routine (I've remapped the PIT such that keyboard interrupt is the 0x21 entry in the IDT)
How were you able to determine this? Is there any way you could get more information about the CPU state at the time of the #GP?
travisjayday wrote:- Here's the kicker: this behaviour is not consistent. Sometimes (after a few reboots), everything works as expected (no #GF). How is that possible?
This is why I suspect you're not actually receiving IRQ1 the times when it doesn't crash. The keyboard controller can get stuck waiting for you to acknowledge a lost IRQ, and race conditions tend to be more consistent on emulators than on real hardware.

If you actually are receiving IRQ1 then I can't make any guesses as to why it only crashes sometimes. You'll have to provide more information if that's the case.
travisjayday wrote:(C compiled with GCC on Linux)
Using a cross-compiler, right?
travisjayday wrote:

Code: Select all

register_isr_gate(0x21, (void*) isr_keyboard_handler);
Any particular reason you cast to void* instead of rewriting your register_isr_gate function to directly take an appropriate function pointer?
travisjayday wrote:

Code: Select all

	"movb	$0b11111101, %al # // 0 = enabled, 1 = disabled, irq 0 = lsb, irq 8 = msb\n"\
	"out	%al, $0x21 \n"\
	"out	%al, $0xA1 \n"); 
Any particular reason you're also unmasking IRQ9? (To be fair, it won't actually do anything as long as IRQ2 is masked.)
travisjayday wrote:

Code: Select all

	hlt
This is not enough to stop the CPU if your kernel_main function returns.
MichaelPetch
Member
Member
Posts: 799
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by MichaelPetch »

Your name nic looked familiar and I realized I recently helped you out with an SO answer regarding an int 13h/ah=2 problem and a boot drive issue on real hardware. I know you have a Github page. It is possible there are other factors outside the code you are showing that may be a problem. If you made your entire project available on Github it would be easier to look at all possible issues.

This likely isn't a problem here but this code is problematic:

Code: Select all

void enable_hardware_interrupts()
{
   asm volatile(
   "movb   $0b11111101, %al # // 0 = enabled, 1 = disabled, irq 0 = lsb, irq 8 = msb\n"\
   "out   %al, $0x21 \n"\
   "out   %al, $0xA1 \n"); 
}
You clobber the EAX register by writing to AL but don't have a clobber list to say your inline assembly clobbered the value. this would be better:

Code: Select all

asm volatile(
   "movb   $0b11111101, %al # // 0 = enabled, 1 = disabled, irq 0 = lsb, irq 8 = msb\n"\
   "out   %al, $0x21 \n"\
   "out   %al, $0xA1 \n" :::"eax"); 
I'd recommend writing a function that otuputs to a port and write more of the code using C:

Code: Select all

void outb(uint16_t port, uint8_t val)
{
    asm volatile ( "outb %0, %1" : : "a"(val), "Nd"(port) );
}
Then you can do something like

Code: Select all

outb(0x21, 0xFD);
outb(0xA1, 0xFF);
You may say that you overwrote AX but it is used for the return parameter so it is safe. That is not the case. If you turn on optimizations and a function becomes inlined that may no longer be the case. GCC's inline assembly is powerful but it is also hard to get right. Best thing to do is to limit its usage, and if you don't understand it well enough write a complete function in assembler.
travisjayday
Posts: 4
Joined: Sun Nov 26, 2017 2:06 pm

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by travisjayday »

Hello Michael--what a pleasant surprise! You've indeed helped me recently on SO! but after discovering this forum, I think my OS questions are more appropriate here.

In any case, here is the Github Project: https://github.com/travisjayday/LinearOS. It's my first hobby OS, so there's still a lot to improve , surely.

Michael --
Ok, I'll rewrite more code in C and turn away from too much inline assembly (which I'm not too good at, admittedly).

Octocontrabass --
I'm sure I'm receiving IRQs in the emulators. Before commenting out / removing all the keyboard handling code, the emulators functioned as expected (interrupts firing, ascii codes logging, etc).

However you make a valid point in that I don't have any keyboard initialization code.... I read this http://wiki.osdev.org/PS/2_Keyboard today and I'm assuming I should send a "0xF6" to "Set default parameters"? Not sure, if that's what you mean with "keyboard initialization code". It would be great if you could elaborate on that.

Regarding the casting (void*) irq_keyboard... -- I've changed it to cast to (uint32_t*) now because that's the parameter I use for the register_isr function uses. If you want to take a look at it, it's in os/init/src/interrupts.c

Ah yes, sorry, I meant to only umask IRQ1. I changed that too.

The last `hlt` instruction is more decorative than anything else. The kernel.c loops indefinitely so it should never return. I'll remove it the hlt anyway.


Note: relavent files in the github project

os/boot/bootstrapper.asm // this is where the jump to protected mode, and initial IDT setup occur
os/boot/src/interrupts.asm // this is the code that the bootsrapper uses (reprogram pic, etc)
os/init/* // this is where all initialization things will be (including interrupts, keyboard, PIT timer, etc)
os/kernel.c // the main c kernel that calls all the interrupt initialization functions defined in os/init/interrupt.h

Here is the "blue screen" with (relevant?) debug info. I've run it in bochs and the EIP address (0x89F1) points to an IRET instruction...

https://photos.google.com/share/AF1QipP ... AwZFRfMURB


Another interesting thing:

- Shutdown Linux
- Enter BIOS
- Boot from USB Drive into my OS
- Keypress => #GF
- Reboot
- Boot from USB Drive into my OS again
- Keypress => NO #GF, nothing happens
- Reboot into Linux
- Go to step 1.

It's weird that it works after a reboot.
MichaelPetch
Member
Member
Posts: 799
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by MichaelPetch »

An observation in that screenshot. I find it rather curious that CS and SS are using the same segment selector. Both appear to be 0x0008. At least that's what I see - that is one God awful text mode font on the laptop lol.

EDIT: After reviewing the code I realized you blindly print User ESP and User SS even if they weren't pushed by the interrupt. They are pushed if there was a privilege level change, but so far everything you have is running at ring 0 so this won't be the case. The value you are printing for SS is meaningless at present. I'll also apologize in a way. I thought your laptop had some bizarre 40 character text mode. I see you switch into a low res graphics mode and render the fonts. I didn't mean to take a swipe at you in that regard. I hadn't seen the code. It just so happens the numbers and letters of the hex values are a bit hard to read.
Last edited by MichaelPetch on Tue Nov 28, 2017 1:37 pm, edited 1 time in total.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by Brendan »

Hi,
travisjayday wrote:- Here's the kicker: this behaviour is not consistent. Sometimes (after a few reboots), everything works as expected (no #GF). How is that possible?
I suspect that you don't fill your ".bss" section with zeros anywhere (I looked but didn't find it), which means that maybe the behaviour of something somewhere depends on whatever random garbage happened to be in memory before the kernel was started.

For code like this:

Code: Select all

    draw_string(10, ++i*7, strcat("EIP: \0", int2hex(r->eip)), VGA_COLOR_WHITE); 
The compiler stores the string "EIP: \0" as 6 bytes in the ".data" section. Then you do "strcat()" to concatenate another 8+ bytes onto the end of those 6 bytes, to get a 14+ byte string which won't fit in the original 6 bytes of space and should overwrite/corrupt other data.

In some places (e.g. "os/init/asm/isr_keyboard.asm") you're mixing data and code in the same section (probably the ".text" section). In general this is a bad idea because it causes performance problems (CPU having to invalidate entries in "instruction cache" because of writes to data), and increases security risks (e.g. imagine a "buffer overflow bug" that allows an attacker to overwrite code that is after the buffer) and prevents you from using paging properly to mark pages as "executable, not writable" or "writable, not executable".

It's also a bad idea to include code files from within header files (e.g. "#include "src/keyboard.c" at line 49 of "os/init/keyboard.h").


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
MichaelPetch
Member
Member
Posts: 799
Joined: Fri Aug 26, 2016 1:41 pm
Libera.chat IRC: mpetch

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by MichaelPetch »

Brendan caught onto the things I did. The strcat problem can cause issues. As importantly the BSS section does need to be zeroed. At present your Widget code relies on the data being zero especially WidgetCount. If it isn't then your code could be updating and using memory that as intended to be overwritten or read from. I've written SO answer the topic of zeroing BSS. If you had used a multiboot compliant bootloader the zeroing is done for you since GRUB has an ELF loader that will parse out headers including BSS and zero the region automatically. With your own home grown code using raw binary images this process isn't done for you so you need to do it yourself.As well before you launch GCC code functions you need to make sure the direction flag for string instructions is in the forward direction. use the CLD instruction for that. This should be done in the assembly bootstrap code before calling kernel_main.

To fix the CLD issue and the zeroing of the BSS you can modify your linker script to place the BSS section after your text/data/rodata sections (you can save space by discarding unneeded sections as well). An example of a revised linker.ld would be:

Code: Select all

ENTRY(bootstrapper);
SECTIONS
{
        . = 0x7E00;
        .text : AT(0x7E00)
        {
                *(.text);
        }
        .data : ALIGN(0x8)
        {
                *(.rodata);
                *(.data);
        }
        /* BSS section defnition after the text and data */
        .bss : ALIGN(0x8)
        {
                __bss_start = .;
                *(COMMON);
                *(.bss)
                . = ALIGN(4);
                __bss_end = .;
        }
        __bss_sizeb = SIZEOF(.bss);

        /* Remove sections that won't be relevant to us */
        /DISCARD/ : {
                *(.eh_frame);
                *(.comment);
                *(.note.gnu.build-id);
        }
}
We create a couple of BSS symbols to find the beginning, end, and the size. This information can be used so that bootstrapper.asm can zero BSS. That can be done by adding a couple extern at the top of bootstrapper.asm:

Code: Select all

extern __bss_sizeb
extern __bss_start
. Then modify the code before the call to kernel_main to be this:

Code: Select all

        ; Zero fill the BSS section
        cld                     ; Explicitly set Forward direction
        xor al, al              ; Fill memory with 0 (AL)
        mov cx, __bss_sizeb     ; Size of BSS in bytes computed in linker script
        mov di, __bss_start     ; Start of BSS defined in linker script
        rep stosb               ; Fill memory with zero a byte at a time

        ; Start kernel
        call    kernel_main
travisjayday
Posts: 4
Joined: Sun Nov 26, 2017 2:06 pm

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by travisjayday »

Thanks for noticing my custom 5x5 high quality font :D

Brendan --
Regarding the strcat issue:
I see what you're saying and I've already accounted for that in the implementation of strcat. It allocates a memory on the heap and copies both strings into that separate space, so that shouldn't be an issue.

Regarding the .text and .data section issues:
I'll work on those by labeling sections more often and carefully, thanks

Regarding the including code files in the header files:
Why is that a bad idea? What would be a better approach? I'm assuming I'll have to make big changes to my Makefile if I changed the header/src setup.

Michael--
I've written my reasoning about the strcat issue above.
I'll definitely follow your linker / .bss suggestions add the cld. I'll also read through your .bss post on SO and report back the results.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by Brendan »

Hi,
travisjayday wrote:Brendan --
Regarding the strcat issue:
I see what you're saying and I've already accounted for that in the implementation of strcat. It allocates a memory on the heap and copies both strings into that separate space, so that shouldn't be an issue.
In that case, it's probably not a good idea to call it "strcat()" - functions that have the same name as things in the standard C library should behave the same as they do in the standard C library.
travisjayday wrote:Regarding the including code files in the header files:
Why is that a bad idea? What would be a better approach? I'm assuming I'll have to make big changes to my Makefile if I changed the header/src setup.
The normal usage is to split the project up into pieces and have a code file for each piece, and have an object file for each code file; where the makefile does "for each code file { if this code file changed, update its object file }" (usually by using pattern matching in the makefile to avoid the hassle of writing lots of rules) and then does "if an object file changed, do linking to update the executable". This means that if you have 100 code files and change one of them, you don't recompile 99 code files.

The only reason header files exist is to allow things in one code file to be used by other code files. All code and data belongs in the code file; and the header file contains things (like function declarations, macros, enums, etc) that don't create code or data, and only if other code files need them. If code or data is put into a header file then that code or data gets duplicated in every object file for whatever included the header. If 50 code files include the header, then you get 50 copies of any code or data in that header; which wastes time compiling the same code/data multiple times and wastes space for no reason.

The other problem is that when you change a header everything that used the header has to be recompiled; and typically it's too much hassle to keep track of that properly so a makefile will do "if any header file changed, recompile all code files". You want to have the least possible in header files to reduce the chance that something in the header file will need to be changed (to reduce the chance of spending ages waiting for everything to be recompiled).

Including a code file in the header file is the same as having code or data in the header file (and is bad for the same reasons); but it also makes a mess of the makefile (e.g. you can't use a pattern matching rule to compile "*.c" files into object files without ending up with another duplication of code and data).

The other problem is "common usage" - there's a set of things that almost all C programmers consider "normal" which makes it easier/faster for them to become familiar with a project. If you deviate from "common usage" it makes things annoying for everyone else because they can't rely on assumptions and have to double check everything.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
travisjayday
Posts: 4
Joined: Sun Nov 26, 2017 2:06 pm

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by travisjayday »

I got it to work with Brendan's remark about initializing the keyboard. T'was indeed the keyboard controller that didn't properly trigger interrupts and that caused the #GF. After adding an initialization function in which the keyboard controller gets reset and interrupt 1 gets enabled in the status register, all works as expected. My guess is that after a couple reboots the keyboard controller resets itself and that's why my kernel worked only sometimes.

Thanks for all your help.
Korona
Member
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm
Contact:

Re: #GF triggered by Keypress on IRET (IDT, IRQ)

Post by Korona »

External hardware (i.e. anything that is not the CPU) cannot cause #GP (or any other fault, except for MCE). Not initializing the keyboard controller cannot cause #GP. The reason for the fault must have been somewhere in your code and only got accidentally resolved.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Post Reply