Stack (and global variables) corruption in kernel


Post by Ycep »

Hi,
I have had a problem for a while that is really frustrating me.
For some reason it affects not only the stack but also the global variables. Maybe something corrupts the registers, but I don't think it is the interrupts, since I save all registers used in each interrupt handler.
I have disabled compiler optimization, and I really don't know where I should even look.
I don't know what to do, and stepping through every instruction in the Bochs debugger would be insane.

I would really appreciate it if you could find something wrong.
Attachment: Quartz-master.rar (entire repository, 40.7 KiB)

Post by iansjack »

If you use gdb in conjunction with qemu you can set watches on variables or memory locations. These will break into the program when the item being watched changes. Set a watch on one of the variables that is being corrupted; when it changes unexpectedly you have isolated the problem to a particular line of code.

Post by Ycep »

I... have never used GDB before... I have found something [url]here[/url], but from what I can see it is for ELF files, and my kernel is a PE file. Are there any Windows (or just PE) alternatives? Porting the code to GCC, adding ELF support to the bootloader, etc. would take a while. Thanks anyway.

Have you (the reader) had similar problems in the past? Our problems may well be similar, since the "tree" of OS development doesn't have many "branches" at the beginning.

Post by iansjack »

I have no experience with PE files, but gdb does support them. http://www.delorie.com/gnu/docs/gdb/gdb_145.html

I suspect that everyone who develops an OS has problems with stack or variable corruption at some time. It could be caused by just about anything.

Post by Velko »

Judging by the symptoms, the memory holding your global variables (.data, .bss, or their PE equivalents) and the stack may be overlapping.
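
One way to test the overlap theory is to compare the address ranges directly. The following is a minimal hosted C sketch, not code from the Quartz repository: `region_t`, `regions_overlap` and `stack_region` are hypothetical names, and the stack size and image addresses a caller passes in are assumptions (only the stack top of 0x7c00 comes from the code quoted in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end). Hypothetical helper type. */
typedef struct {
    uint32_t start;
    uint32_t end;
} region_t;

/* Two half-open ranges overlap when each one starts before the other ends. */
static bool regions_overlap(region_t a, region_t b)
{
    return a.start < b.end && b.start < a.end;
}

/* A downward-growing stack with top `top` occupies [top - size, top). */
static region_t stack_region(uint32_t top, uint32_t size)
{
    assert(size <= top); /* otherwise the range would wrap below address 0 */
    region_t r = { top - size, top };
    return r;
}
```

For example, with an assumed 4 KiB stack below 0x7c00, `regions_overlap(stack_region(0x7c00, 0x1000), image)` reports whether a kernel image loaded at `image` shares memory with the stack. The real image range has to come from the linker map or the bootloader's load address.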

I took a quick look at your code:

Code: Select all

void main()
{
	_asm
	{
		mov ax, 0x08
		mov ds, ax
		mov es, ax
		mov fs, ax
		mov gs, ax
		mov ss, ax
		mov esp, 0x7c00
		mov ebp, esp
	}
This is a very fragile way to initialize the stack. I can't even begin to speculate what happens when you change both ESP and EBP in the middle of a C function: the compiler has typically already set up its own stack frame by then, and its generated code still assumes the old values. You should never do that. The kernel's entry point should be in assembly code; set things up there and then call the C code.

Post by Ycep »

I was not able to work on Quartz for some time, but here I am now ;)
@Velko:
Thanks for the tip. I could just add the "naked" attribute to the function, but I didn't. Instead I moved the setup into an assembly function, "_entry".

But sadly that did not fix the problem. What I noticed is that when I disable (i.e. mask) the keyboard interrupt, so that its handler never executes, no stack corruption occurs.
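
For readers who want to repeat this experiment: on a PC-compatible 8259A setup, the keyboard is IRQ 1 on the master PIC, and setting a bit in the mask register at port 0x21 disables that line. Below is a minimal sketch of the bit manipulation only, assuming that standard layout; `pic_mask_irq` and `pic_unmask_irq` are hypothetical names, not functions from Quartz.

```c
#include <assert.h>
#include <stdint.h>

#define PIC1_DATA    0x21u /* master PIC interrupt mask register (standard PC port) */
#define IRQ_KEYBOARD 1u    /* keyboard line on the master PIC */

/* Return `mask` with the bit for `irq` (0..7) set, i.e. that line disabled. */
static uint8_t pic_mask_irq(uint8_t mask, unsigned irq)
{
    assert(irq < 8);
    return (uint8_t)(mask | (1u << irq));
}

/* Return `mask` with the bit for `irq` (0..7) cleared, i.e. that line enabled. */
static uint8_t pic_unmask_irq(uint8_t mask, unsigned irq)
{
    assert(irq < 8);
    return (uint8_t)(mask & ~(1u << irq));
}
```

In a kernel this would be used roughly as `outb(PIC1_DATA, pic_mask_irq(inb(PIC1_DATA), IRQ_KEYBOARD));`, where an `outb(port, value)` counterpart to the `inb` seen in this thread is assumed to exist.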

I'm not even sure whether this really is stack corruption, but I know it has something to do with the keyboard interrupt handler.
The fact that it runs correctly in bochsdbg but not in bochs, and the fact that each boot produces a different random corruption, make me think this may be timing-related, as there is no randomness in computers. (Ironically, the current time is the key input to random number generation on the computers we use at home today.)
...and then there is the fact that I cannot debug the stack without bochsdbg!

Code: Select all

uint8 _val;
interrupt keyb_irqHandler()
{
	_asm pusha
	if (inb(KEYBC)&STATUS_READ)
	{
		_val = inb(KEYBE);
		switch (_val)
		{
		case EXTENDED_SCANCODE1:
			prev_ext= k_ext1; break;
		case EXTENDED_SCANCODE2:
			prev_ext= k_ext2; break;
		default:
			if (_val & 0x80)//Release
			{
				_val ^= 0x80;
				keyb_down[_val] ^= prev_ext;
				keyb_queue[kq_end].ext = prev_ext;
				keyb_queue[kq_end++].val = _val;
				if (kq_end == kq_max)
				{
					kq_end ^= kq_end; //Clearing to zero (This way looks familiar, doesn't it?)
				}
				kq_count++;
				if (kq_count == kq_max)
				{
					//Buffer overflow
				}
			}
			else
			{
				keyb_down[_val] |= prev_ext;
				// TODO : LED update
				// Forced
			}
			prev_ext &= 0;
		}
	}
	intend(1);
	_asm
	{
		popa
		iretd
	}
}
All of my previous interrupt handlers for Quartz were written in assembly because, as far as I remember, every time I wrote them in C (and before that, C++) some kind of corruption happened.

"interrupt" at the beginning of the function declaration is just a "#define" for "__declspec(naked)" (i.e. the "naked" function attribute), so that no compiler-generated code executes between the start of the function body and the first assembly block (and likewise at the end of the function).

Although I could write this handler in assembly, that's not really a solution... What am I doing wrong (or what is the compiler doing wrong, with its optimizations) that I can fix?

Post by nullplan »

Don't do this. ASM snippets that manipulate the stack are liable to confuse the compiler into generating broken code. Instead of surrounding your code with ASM snippets, use an ASM stub function to do the register saving and the "iretd", and have it call a normal C function. If you don't want to get an assembler involved, try using ASM at file level to generate the stub. As in:

Code: Select all

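_asm {
keyb_irqHandler:
    pusha                      ; save all general-purpose registers
    call _keyb_irqHandler_c    ; 32-bit cdecl symbols get a leading underscore
    popa                       ; restore them
    iretd                      ; return from the interrupt
}

void keyb_irqHandler_c(void) {
   /* foo */
}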
This approach also works with other compilers. And, if you ever want to go multi-platform, it is easier to separate out the arch-dependent code this way.
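
To illustrate the separation, here is a rough sketch of what the arch-independent part could look like once it is a plain C function. It is not the Quartz code: `keyb_process_scancode`, `keyb_pop`, `KQ_MAX` and `kq_dropped` are hypothetical, the prefix values 0xE0 and 0xE1 are assumed from PS/2 scancode set 1, and the port read is assumed to happen in `keyb_irqHandler_c` before this function is called. Like the handler posted above, it queues an event on key release, but it drops events when the queue is full instead of overwriting unread ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXTENDED_SCANCODE1 0xE0u /* assumed: set 1 prefix byte */
#define EXTENDED_SCANCODE2 0xE1u /* assumed: set 1 prefix byte */
#define KQ_MAX 16u               /* hypothetical queue capacity */

typedef struct {
    uint8_t ext; /* prefix byte seen before this key, or 0 */
    uint8_t val; /* scancode with the release bit cleared */
} key_event_t;

static key_event_t keyb_queue[KQ_MAX];
static unsigned kq_start, kq_end, kq_count, kq_dropped;
static uint8_t prev_ext;

/* Process one byte that the caller has already read from the data port. */
static void keyb_process_scancode(uint8_t code)
{
    if (code == EXTENDED_SCANCODE1 || code == EXTENDED_SCANCODE2) {
        prev_ext = code; /* remember the prefix for the next byte */
        return;
    }
    if (code & 0x80u) { /* release */
        if (kq_count == KQ_MAX) {
            kq_dropped++; /* queue full: drop rather than overwrite */
        } else {
            keyb_queue[kq_end].ext = prev_ext;
            keyb_queue[kq_end].val = (uint8_t)(code & 0x7Fu);
            kq_end = (kq_end + 1u) % KQ_MAX;
            kq_count++;
        }
    }
    prev_ext = 0; /* a prefix applies to one key only */
}

/* Remove the oldest event; returns false when the queue is empty. */
static bool keyb_pop(key_event_t *out)
{
    if (kq_count == 0)
        return false;
    *out = keyb_queue[kq_start];
    kq_start = (kq_start + 1u) % KQ_MAX;
    kq_count--;
    return true;
}
```

Because this function touches no ports and no registers, it can be compiled and exercised on the host by feeding it scancode bytes. In the kernel, `keyb_pop` would presumably need to run with interrupts disabled, since the handler modifies the same counters.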

Post by Ycep »

And the stack is no longer corrupted, thanks to @nullplan! Thanks, brother!

Thanks to Velko for showing me a better and less fragile way of entering the kernel from the bootloader, and thanks to iansjack for showing me other (and probably better) ways of debugging C in an emulator.

Now back to the FDC...