[SOLVED] My memcpy and memzero destroy the stack???

Posted: Thu Mar 20, 2014 3:34 pm
by theesemtheesem
Hi All,

This is my first post so sorry if I do something stupid.
Anyway, to the heart of the matter.

I am developing a kernel in Free Pascal. I have these two procedures:

Code: Select all

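procedure kernel_memzero(dest : pointer; numbytes : dword); cdecl; [public, alias: 'kernel_memzero'];
begin
  asm
        mov edi, dest        { EDI = destination pointer }
        mov ecx, numbytes    { ECX = number of bytes to clear }
        test ecx, ecx
        jz @done             { a count of 0 would otherwise wrap and loop 2^32 times }
        xor al, al           { AL = 0, the fill byte }

@loop1:
        mov [edi], al

        add edi, 1
        dec ecx
        jnz @loop1
@done:
  end ['EAX','ECX','EDI'];
end;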


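procedure kernel_memcpy(source, dest : pointer; numbytes : dword); cdecl; [public, alias: 'kernel_memcpy'];
begin
  asm
        mov esi, source      { ESI = source pointer }
        mov edi, dest        { EDI = destination pointer }
        mov ecx, numbytes    { ECX = number of bytes to copy }
        test ecx, ecx
        jz @done             { a count of 0 would otherwise wrap and loop 2^32 times }

@loop1:
        mov al, [esi]        { copy one byte through AL }
        mov [edi], al

        add esi, 1
        add edi, 1
        dec ecx
        jnz @loop1
@done:
  end ['EAX','ECX','EDI','ESI'];
end;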
Please don't mind that these are slow and move only 1 byte at a time; at this point that is not important.
The problem is that these DO work when called from another procedure, i.e. my screen handling
procedures use memzero and memcpy to scroll the screen up, all variables are initialized with memzero,
large variables or blocks are moved around, and it all works. However, if I try to use these inside my kmain,
they cause the system to go bananas. The screen gets trashed, and the system reboots. Looking into
the Bochs logs is not very useful - there's info about accessing null descriptors in the GDT, and the cause
of the reboot is always a triple fault. CR2 has a value of FFFFFFFC or something similar, and ESP is 00000000.
Please note that my kernel is not higher half, so addresses like FFFFFFFC should normally invoke the page fault handler.
This does not happen, so it seems that the IDT gets trashed as well. Please note that EIP seems to be more or less where it should be.
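
One possible reading of that register dump (a guess, nothing in the logs quoted here confirms it): a 32-bit push first subtracts 4 from ESP and then writes at the new address, so a push attempted with ESP = 00000000 would touch FFFFFFFC, which matches the CR2 value. The hosted Free Pascal snippet below only shows the arithmetic; push_target is a made-up name and the program is meant to run under a normal OS, not inside the kernel.

```pascal
program esp_wrap_sketch;
{$mode objfpc}

uses
  SysUtils;

{ Hypothetical helper: the address a 32-bit PUSH would write to,
  given the value of ESP before the push. The mask keeps the
  result inside 32 bits, which is what the CPU does in hardware. }
function push_target(esp_before: dword): dword;
begin
  Result := (Int64(esp_before) - 4) and $FFFFFFFF;
end;

begin
  { ESP = 0 before the push wraps around to the top of the address space. }
  writeln(IntToHex(Int64(push_target(0)), 8));   { prints FFFFFFFC }
end.
```

If that is what happens, the page fault handler itself cannot push anything either, which would fit with the fault escalating to a triple fault.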

So my guess is that somehow the stack gets trashed? Maybe the return address gets trashed? Changing the
size of the stack seems to change the behavior a bit, I mean if I make the kernel stack ridiculously large, say 128KB
or thereabouts, it is often possible to use memzero once or twice in kmain, as long as the number of bytes to zero/copy
is small. Copying/zeroing 1000 bytes (1000, not 0x1000) always crashes the kernel.

Another funny issue is that even disabling interrupts with cli before calling memzero/memcpy does not help.

Does anyone have any ideas?

Cheers,
Andrew

Re: My memcpy and memzero destroy the stack???

Posted: Thu Mar 20, 2014 8:31 pm
by eryjus
I cannot speak to the inner workings of Pascal, but I know in C the calling function "owns" the ESI and EDI registers, while the called function "owns" the ECX and EAX registers. Since you are manipulating registers in asm directly, I would compile to asm and review what is really happening under the covers.

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 6:04 am
by theesemtheesem
Hi,

Yes, I thought about that, and disassembled the kernel to see what is going on. It turns out that the registers are saved by the compiler before the asm block starts (it pushes them to the stack and pops them back). That's what the ['EAX','ECX','EDI'] at the end of the asm block is for. Again, the funny thing is that it works fine when called from anywhere other than kmain (which is the equivalent of C's main). Now I tried to rewrite these functions in Pascal (without any asm, so to speak), and the problem is still there. Honestly I am losing my sanity here, because this just makes no sense...
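
For reference, a pure-Pascal rewrite could look roughly like the sketch below. This is not the code from the kernel: the names sketch_memzero and sketch_memcpy are invented here, and the program is written to run hosted under a normal OS so it can be tried in isolation. It copies one byte at a time like the asm versions, and a count of 0 simply does nothing.

```pascal
program mem_sketch;
{$mode objfpc}

{ Hypothetical stand-in for kernel_memzero: clears numbytes bytes at dest. }
procedure sketch_memzero(dest: pointer; numbytes: dword);
var
  p: PByte;
  i: dword;
begin
  p := PByte(dest);
  i := 0;
  while i < numbytes do      { a count of 0 never enters the loop }
  begin
    p[i] := 0;
    inc(i);
  end;
end;

{ Hypothetical stand-in for kernel_memcpy: copies numbytes bytes,
  lowest address first. Overlapping regions are not handled. }
procedure sketch_memcpy(source, dest: pointer; numbytes: dword);
var
  s, d: PByte;
  i: dword;
begin
  s := PByte(source);
  d := PByte(dest);
  i := 0;
  while i < numbytes do
  begin
    d[i] := s[i];
    inc(i);
  end;
end;

var
  a, b: array[0..15] of byte;
  i: integer;
begin
  for i := 0 to 15 do
    a[i] := i + 1;                    { a = 1, 2, ..., 16 }
  sketch_memcpy(@a[0], @b[0], 16);    { b is now a copy of a }
  sketch_memzero(@a[0], 8);           { clear the first half of a }
  sketch_memzero(@a[8], 0);           { count 0: nothing is touched }
  writeln(a[0], ' ', a[8], ' ', b[15]);   { prints 0 9 16 }
end.
```

Since the asm and the Pascal versions fail in the same way, routines like these are unlikely to be the cause by themselves; they only rule out the inline assembly as a suspect.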

Cheers,
Andrew

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 6:15 am
by Bender
Hi,

Code: Select all

        mov al, [esi]
        mov [edi], al

        add esi, 1
        add edi, 1
        dec ecx
I don't see any problem with the above code, but since I guess you have tried everything else, try this:

Code: Select all

;; Clear DF (the direction flag) to make sure we are going UP^ in memory
cld
@loop:
lodsb
stosb
loop @loop
Well, I am assuming that DS:ESI points to your source and ES:EDI is your destination. (Notice the segments)
I did have some problems with mov al, [esi] before. I have no idea what caused it, but now I use
rep movsb in my memcpy function.
Can you post what output you get? Dump? Bochs?
-Bender

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 7:04 am
by theesemtheesem
Hi,

Thanks for Your reply. Your code is more elegant and I think it will be faster, as it uses fewer instructions.
Unfortunately, no change - both still work when called from anywhere else besides kmain, and crash the kernel
with lots of interesting garbage on the screen when called from kmain :(

This is the code in kmain:

Code: Select all

        //kmem_kmalloc(size,owner : dword)   size in bytes, owner - tag for kmem_debug_dumpheapalloc
        //so we can see who allocated each block
        tmpp1:=kmem_kmalloc((9 shl 20),$ABCD0001);  // alloc a 9MB block
        tmpp2:=kmem_kmalloc((1 shl 20),$ABCD0002);  // alloc a 1MB block
        tmpp3:=kmem_kmalloc((2 shl 20),$ABCD0003);  // alloc a 2MB block
        kconsole_putstr('zero tmpp1');
        kernel_memzero(tmpp1,5000); // zero just the first 5000 bytes
        kconsole_putstr('zero tmpp2');
        kernel_memzero(tmpp2,5000);
        kconsole_putstr('zero tmpp3');
        kernel_memzero(tmpp3,5000);
As You can see I am working on kernel malloc at this moment. I am sure that the addresses returned by kmalloc
are OK - they fall within the range that I have put aside for the kernel heap. Furthermore, when I do:

Code: Select all

        var p : pchar;
(...)
        p:=kmem_kmalloc(10,$ABCD4444);
        p^:='TESTING!'#0;
        kconsole_putstr(p);
everything works. But obviously doing

Code: Select all

        kernel_memzero(p,10);
causes a crash.

It is crazy, since kernel_memzero is, as I explained, used all over the place - it clears the screen, it is used to clear the
bottom of the screen when it needs to be scrolled (and memcpy moves it up).

I am attaching two Bochs log files - one with memzero commented out (bochs_ok.txt), one with memzero actually used (bochs_err.txt). I've been trying to tackle this problem for a week now and it's driving me insane.

Funny note - different behavior in bochs and qemu - bochs reboots after showing garbage on the screen, qemu just keeps
on going and going and going and spews trash to the screen, but does not reboot.

Any ideas are welcome, at this point I will try just about anything.

Cheers,
Andrew

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 7:40 am
by Bender
Is your stack sane? Can you try it without involving PUSH/POP?

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 7:52 am
by jnc100
I note you use the cdecl calling convention in the memcpy routine. Are you also using this at the call site (in kmain)? If not and the calling function is using stdcall I can see how your stack would be trashed.
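
As a side note on how this plays out in Free Pascal: on 32-bit x86, cdecl leaves the stack cleanup to the caller while stdcall has the callee pop its own arguments, and the compiler emits each call according to the convention declared on the routine being called, not the convention of the routine the call sits in. A mismatch can therefore only arise when the declaration the caller sees differs from the real implementation (an external declaration or a procedural variable of the wrong type, for instance). The hosted sketch below uses invented names (add_cdecl, add_stdcall, caller_stdcall) to show a stdcall routine calling both kinds directly.

```pascal
program callconv_sketch;
{$mode objfpc}

{ cdecl: the caller removes the arguments from the stack after the call. }
function add_cdecl(a, b: longint): longint; cdecl;
begin
  Result := a + b;
end;

{ stdcall: the callee removes its own arguments before returning. }
function add_stdcall(a, b: longint): longint; stdcall;
begin
  Result := a + b;
end;

{ The caller is stdcall itself. That does not affect how it calls the
  two functions above: each call follows the callee's declaration. }
function caller_stdcall(x: longint): longint; stdcall;
begin
  Result := add_cdecl(x, 1) + add_stdcall(x, 2);
end;

begin
  writeln(caller_stdcall(10));   { (10 + 1) + (10 + 2), prints 23 }
end.
```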

Regards,
John.

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 8:19 am
by theesemtheesem
Hi,
Yes, as a matter of fact kmain is stdcall.

Code: Select all


procedure kmain(mbinfo: Pmultiboot_info_t; mbmagic: DWORD); stdcall;

All the functions/procedures are cdecl in my kernel, just in case I ever needed to interface some C code with it...
Is that what's causing the problem?

Anyway, I changed kmain to cdecl just now and still no joy :( Or should I change everything else to stdcall?

As to the stack - it seems that there is something there, because as I said if I increase the stack size
the kernel will survive one or two calls to memzero from kmain. Still lots and lots of other functions are called (and all of these are cdecl too) - see attached source file, and it only fails precisely at memzero.

If anyone is willing to have an in-depth look I can put the complete sources somewhere.

Best Regards,
Andrew

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 11:40 am
by Gigasoft
You can single step through your code in Bochs to see exactly where it fails. Then you don't have to guess.

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 1:47 pm
by theesemtheesem
Hi,

OK, now this is really weird. When I single step through it, it works. Always.
Am I correct in my assumption that Bochs does not issue interrupts during single stepping?
Because if it does not, then the problem must be in my ISRs.

Cheers
Andrew

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 6:03 pm
by theesemtheesem
Hi,

For anyone who follows this - I have found what triggers the problem, but I still do not understand why this happens.
memzero and memcpy break when they are inside a repeat...until loop. They work from kmain when called "on their own", but die miserably inside a loop.
So, to illustrate:

Code: Select all

        tmppchar:=kmem_kmalloc(1000,$AABBCCDD);
        kernel_memzero(tmppchar,1000);
        kstrings_const2pchar('AAA'#66#77#88#0#0#0,tmppchar);
        tmpp1:=kmem_kmalloc((9 shl 20),$ABCD0001);
        tmpp2:=kmem_kmalloc((1 shl 20),$ABCD0002);
        tmpp3:=kmem_kmalloc((2 shl 20),$ABCD0003);
        kconsole_putstr('zero tmpp1');
        kernel_memzero(tmpp1,1000);
        kconsole_putstr('zero tmpp2');
        kernel_memzero(tmpp2,1000);
        kconsole_putstr('zero tmpp3');
        kernel_memzero(tmpp3,1000);

        kconsole_putstr(tmppchar);
        kmem_debug_blockinfo(tmppchar);
        kmem_debug_dumpmem(tmppchar-8,4);
works fine.

But:

Code: Select all

        repeat
        tmppchar:=kmem_kmalloc(1000,$AABBCCDD);
        kernel_memzero(tmppchar,1000);
        kstrings_const2pchar('AAA'#66#77#88#0#0#0,tmppchar);
        tmpp1:=kmem_kmalloc((9 shl 20),$ABCD0001);
        tmpp2:=kmem_kmalloc((1 shl 20),$ABCD0002);
        tmpp3:=kmem_kmalloc((2 shl 20),$ABCD0003);
        kconsole_putstr('zero tmpp1');
        kernel_memzero(tmpp1,1000);
        kconsole_putstr('zero tmpp2');
        kernel_memzero(tmpp2,1000);
        kconsole_putstr('zero tmpp3');
        kernel_memzero(tmpp3,1000);

        kconsole_putstr(tmppchar);
        kmem_debug_blockinfo(tmppchar);
        kmem_debug_dumpmem(tmppchar-8,4);
        until 1=1; //loop once
dies as soon as the first memzero is called.

Now I have no idea why.

Cheers,
Andrew

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 6:09 pm
by theesemtheesem
Hi,

Nope :( still no joy

If it's outside a loop it only works for small numbers of bytes.
So zeroing a 1000 bytes - now ok,
but zeroing 2000 bytes - crash.

DARN!

Cheers,
Andrew

Re: My memcpy and memzero destroy the stack???

Posted: Fri Mar 21, 2014 10:06 pm
by eryjus
Andrew,

A couple of observations:
1. You are allocating memory inside the loop. Are you sure you are not consuming all your memory, and the kernel_memzero() function is just demonstrating the issue (yeah, I know it is a single iteration...)?
2. You are not checking for successful memory allocation. You should check the return value of kmem_kmalloc() before trying to manipulate the contents of the memory pointer.
3. Can you trivialize the test scenarios? For example:

Code: Select all

tmpp1:=kmem_kmalloc((9 shl 20),$ABCD0001);
kernel_memzero(tmpp1,1000);
When that is successful:

Code: Select all

tmpp1:=kmem_kmalloc((9 shl 20),$ABCD0001);
{ notice the allocation is outside the loop }
repeat
    kernel_memzero(tmpp1,1000);
until 1=1;
And, when that is successful:

Code: Select all

tmpp1:=kmem_kmalloc((9 shl 20),$ABCD0001);
i := 0;
repeat
    kernel_memzero(tmpp1,1000);
    i := i + 1;
until i=100;
Is it just repeat...until loops? Or do all loop constructs have this issue? The challenge here is to narrow down the specific circumstances under which you have an issue. So, we go back to identifying and verifying assumptions...
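
For point 2, the check could be as small as the sketch below. It assumes that kmem_kmalloc() reports failure by returning nil, which is not confirmed anywhere in this thread, and it uses the hosted FillChar as a stand-in for kernel_memzero so it can be run outside the kernel. checked_memzero is a hypothetical name.

```pascal
program checked_zero_sketch;
{$mode objfpc}

{ Hypothetical wrapper: refuses to touch memory behind a nil pointer.
  FillChar stands in for kernel_memzero in this hosted sketch. }
function checked_memzero(p: pointer; numbytes: dword): boolean;
begin
  if p = nil then
    exit(false);             { allocation failed: report it, do not write }
  FillChar(p^, numbytes, 0);
  Result := true;
end;

var
  buf: array[0..7] of byte;
begin
  buf[0] := 255;
  if not checked_memzero(nil, 8) then
    writeln('nil pointer rejected');
  if checked_memzero(@buf[0], 8) then
    writeln('buffer zeroed, buf[0] = ', buf[0]);
end.
```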

Disclaimer: my Pascal is rusty -- it's been a while.

Re: My memcpy and memzero destroy the stack???

Posted: Sat Mar 22, 2014 5:08 am
by theesemtheesem
Hi,

Thanks to everyone for Your comments and input, I appreciate it very much.
It turns out that there is a problem somewhere in my paging code.

In desperation I started to turn sections of the kernel off to see what happens.
I disabled the keyboard and mouse handlers - no joy.
I disabled the rtc handler - no joy.
Next in line went the pit, and finally I even disabled initializing the pic.
But as soon as I disabled paging (my kernel is not higher half, so at this stage
it just sits above 1mb) everything started to run smoothly. I will go back to examine
the paging code - must be a stupid mistake in there, something must be mapped wrong.
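
When hunting for a wrong mapping it can help to work out which page directory and page table entries a given address goes through. The sketch below assumes plain 32-bit paging with 4 KB pages and no PAE, which this thread does not actually state; the helper names are invented, and the program is meant to run hosted rather than in the kernel.

```pascal
program page_index_sketch;
{$mode objfpc}

uses
  SysUtils;

{ Assumes 32-bit non-PAE paging with 4 KB pages:
  bits 31..22 select the page directory entry,
  bits 21..12 select the page table entry,
  bits 11..0  are the offset inside the page. }
function pd_index(addr: dword): dword;
begin
  Result := addr shr 22;
end;

function pt_index(addr: dword): dword;
begin
  Result := (addr shr 12) and $3FF;
end;

function page_offset(addr: dword): dword;
begin
  Result := addr and $FFF;
end;

procedure show(addr: dword);
begin
  writeln(IntToHex(Int64(addr), 8),
          ': PDE ', pd_index(addr),
          ', PTE ', pt_index(addr),
          ', offset ', IntToHex(Int64(page_offset(addr)), 3));
end;

begin
  show($00100000);   { 1 MB mark:      PDE 0,    PTE 256,  offset 000 }
  show($FFFFFFFC);   { CR2 from above: PDE 1023, PTE 1023, offset FFC }
end.
```

Running the same calculation over the first and last address of the kernel image, the stack and the heap range shows exactly which entries have to be present for each of them.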

Once again thanks to everyone for Your help and support.

Cheers,
Andrew