Stack (and global variables) corruption in kernel
Posted: Wed Jun 27, 2018 12:39 pm
by Ycep
Hi,
I have had this problem for a while and it is really frustrating me.
For some reason it affects not only the stack but also the global variables. Maybe something corrupts the registers, but it should not be the interrupts, since I save all registers used in each interrupt.
I have disabled compiler optimization, and I really don't know where I should even look.
I don't know what to do, and single-stepping every instruction in the Bochs debugger would be insane.
I would really appreciate it if you could find something wrong.
Re: Stack (and global variables) corruption in kernel
Posted: Wed Jun 27, 2018 1:31 pm
by iansjack
If you use gdb in conjunction with qemu you can set watches on variables or memory locations. These will break into the program when the item being watched changes. Set a watch on one of the variables that is being corrupted; when it changes unexpectedly you have isolated the problem to a particular line of code.
Re: Stack (and global variables) corruption in kernel
Posted: Wed Jun 27, 2018 3:26 pm
by Ycep
I... have never used GDB before... I found something [url]here[/url], but from what I see it is for ELF files, and my kernel is a PE file. Are there any Windows or PE alternatives? Porting the code to GCC, adding ELF support to the bootloader, etc. would take a while. Thanks anyway.
Have you (the reader) had similar problems in the past? Our problems may well be similar, as the "tree" of OS development doesn't have many "branches" in the beginning.
Re: Stack (and global variables) corruption in kernel
Posted: Wed Jun 27, 2018 11:43 pm
by iansjack
I have no experience with PE files, but gdb does support them.
http://www.delorie.com/gnu/docs/gdb/gdb_145.html
I suspect that everyone who develops an OS has problems with stack or variable corruption at some time. It could be caused by just about anything.
Re: Stack (and global variables) corruption in kernel
Posted: Thu Jun 28, 2018 1:38 am
by Velko
Judging from the symptoms, the memory of your global variables (.data, .bss, or their PE equivalents) and your stack may be overlapping.
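To make the overlap idea concrete, here is a minimal C sketch of the check, assuming half-open address ranges. The names mem_range, ranges_overlap and stack_range are hypothetical, as are the stack size and data ranges in the usage note; only the stack top 0x7c00 is taken from the kernel code in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end). Hypothetical helper type. */
typedef struct {
    uint32_t start;
    uint32_t end;
} mem_range;

/* Two half-open ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(mem_range a, mem_range b)
{
    return a.start < b.end && b.start < a.end;
}

/* Range used by a stack that grows down from `top` and may use up to
 * `size` bytes (the size is an assumption the caller has to supply). */
static mem_range stack_range(uint32_t top, uint32_t size)
{
    mem_range r = { top - size, top };
    return r;
}
```

For example, with an assumed 4 KiB stack below 0x7c00, ranges_overlap(stack_range(0x7c00, 0x1000), data) is true for a data range of [0x7000, 0x9000) and false for [0x100000, 0x110000). The real section addresses would have to come from the linker map or the PE headers.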
I took a quick look at your code:
Code: Select all
void main()
{
_asm
{
mov ax, 0x08
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
mov esp, 0x7c00
mov ebp, esp
}
This is a very fragile way to initialize the stack. I can't even begin to speculate what happens when you change both ESP and EBP in the middle of a C function. You should never do that. The kernel's entry point should be in assembly code: set things up there and then call the C code.
Re: Stack (and global variables) corruption in kernel
Posted: Mon Jul 02, 2018 6:04 pm
by Ycep
I was not able to work on Quartz for some time, but here I am now.
@Velko:
Thanks for the tip. I could just add the "naked" attribute to the function, but I did not. Instead I put it in an assembly function "_entry".
But sadly that did not fix the problem. What I noticed is that when I disable (i.e. mask) the keyboard interrupt, so that its handler never executes, no stack corruption occurs.
I'm not even sure whether this really is stack corruption, but I know it has something to do with the keyboard interrupt handler.
The fact that it works correctly in bochsdbg but not in bochs, and that each boot gives a different random corruption, makes me think this may be timing-related, as there is no randomness in computers. (Ironically, the current time is a key ingredient of random number generation on the computers we use at home today.)
...and the fact that I can not debug stack without bochsdbg!
Code: Select all
uint8 _val;
interrupt keyb_irqHandler()
{
    _asm pusha
    if (inb(KEYBC)&STATUS_READ)
    {
        _val = inb(KEYBE);
        switch (_val)
        {
        case EXTENDED_SCANCODE1:
            prev_ext= k_ext1; break;
        case EXTENDED_SCANCODE2:
            prev_ext= k_ext2; break;
        default:
            if (_val & 0x80)//Release
            {
                _val ^= 0x80;
                keyb_down[_val] ^= prev_ext;
                keyb_queue[kq_end].ext = prev_ext;
                keyb_queue[kq_end++].val = _val;
                if (kq_end == kq_max)
                {
                    kq_end ^= kq_end; //Clearing to zero (This way looks familiar, doesn't it?)
                }
                kq_count++;
                if (kq_count == kq_max)
                {
                    //Buffer overflow
                }
            }
            else
            {
                keyb_down[_val] |= prev_ext;
                // TODO : LED update
                // Forced
            }
            prev_ext &= 0;
        }
    }
    intend(1);
    _asm
    {
        popa
        iretd
    }
}
I wrote all of my previous interrupt handlers for Quartz in assembly because, as far as I remember, every time I wrote them in C (and before that C++) some kind of corruption happened.
"interrupt" at the beginning of the function declaration is just a "#define" for "__declspec(naked)" (i.e. the "naked" function attribute), so that the compiler emits no code between the start of the function block and the assembly block (and likewise at the end of the function).
Although I could write this handler in assembly, that's not really a solution... What am I doing wrong (or what is the compiler doing wrong with its optimization) that I can fix?
Re: Stack (and global variables) corruption in kernel
Posted: Mon Jul 02, 2018 10:19 pm
by nullplan
Don't do this. ASM snippets that manipulate the stack are liable to confuse the compiler into generating broken code. Instead of surrounding your code with ASM snippets, use an ASM stub function to do the register saving and the "iretd", and just have it call a normal C function. If you don't want to get an assembler involved, try to use ASM at file level to generate the stub. As in:
Code: Select all
_asm {
keyb_irqHandler:
    pusha
    call _keyb_irqHandler_c
    popa
    iretd
}
void keyb_irqHandler_c(void) {
    /* foo */
}
This approach also works with other compilers. And, if you ever want to go multi-platform, it is easier to separate out the arch-dependent code this way.
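As a rough illustration of what the plain C side could then look like, here is a minimal sketch that keeps only the queueing logic from the handler posted above. It is an assumption-laden simplification, not the original code: keyb_handle_scancode, KQ_MAX, SCANCODE_RELEASE and key_event are hypothetical names, the port reads and the end-of-interrupt call are left to the caller, and a full queue drops the event rather than overwriting older ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define KQ_MAX 16u              /* hypothetical queue capacity */
#define SCANCODE_RELEASE 0x80u  /* bit 7 set means "key released" */

typedef struct {
    uint8_t ext;  /* extended-prefix state at the time of the release */
    uint8_t val;  /* scancode with the release bit cleared */
} key_event;

static key_event keyb_queue[KQ_MAX];
static unsigned kq_end;    /* index of the next write slot */
static unsigned kq_count;  /* number of events currently queued */

/* Ordinary C function with a normal prologue and epilogue. The assembly
 * stub saves and restores registers around the call, so nothing in here
 * touches the stack pointer or issues iretd.
 * Returns true when a release event was queued. */
static bool keyb_handle_scancode(uint8_t code, uint8_t ext)
{
    if (!(code & SCANCODE_RELEASE))
        return false;                /* key press: nothing queued in this sketch */
    if (kq_count == KQ_MAX)
        return false;                /* queue full: drop instead of overwriting */
    keyb_queue[kq_end].ext = ext;
    keyb_queue[kq_end].val = (uint8_t)(code & ~SCANCODE_RELEASE);
    kq_end = (kq_end + 1) % KQ_MAX;  /* wrap the write index */
    kq_count++;
    return true;
}
```

Because the function has no inline assembly and no hardware access, it can also be compiled and exercised on the host, which is one way to rule out the queue logic when hunting for corruption.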
Re: Stack (and global variables) corruption in kernel
Posted: Tue Jul 03, 2018 9:57 am
by Ycep
And the stack corrupts no more, thanks to @nullplan! Thanks brother!
Thanks to Velko for showing me a better and less fragile way of entering the kernel from the bootloader, and thanks to iansjack for showing me other (and probably better) ways of debugging C in an emulator.
Now back to the FDC...