[Newlib] Bug with malloc_r()

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
fiveayem
Member
Member
Posts: 51
Joined: Sun Aug 14, 2011 8:01 am

[Newlib] Bug with malloc_r()

Post by fiveayem »

Hello,

Having succeeded in making a toolchain and porting Newlib for my OS, I am currently facing quite a great problem : there seems to be a bug with malloc_r() function. Precisely, I get a Page Fault caused by bad virtual address 0x00000004. I disassembled the binary and I found that the bug occured at this line :

40001900: 89 41 04 mov %eax,0x4(%ecx)

So I guessed that ECX had been previously set to 0. I then looked for the place where it may have happened :

Code: Select all

400018bb:       89 55 d0                mov    %edx,-0x30(%ebp)
400018be:       89 4d cc                mov    %ecx,-0x34(%ebp)
400018c1:       e8 3a 06 00 00          call   40001f00 <_sbrk_r>
    if (new_brk == (char*)(MORECORE_FAILURE))
400018c6:       83 c4 10                add    $0x10,%esp
400018c9:       8b 55 d0                mov    -0x30(%ebp),%edx
400018cc:       83 f8 ff                cmp    $0xffffffff,%eax
400018cf:       8b 4d cc                mov    -0x34(%ebp),%ecx[color=#FF0000] // Here ?[/color]
400018d2:       0f 84 ba 03 00 00       je     40001c92 <_malloc_r+0x792>
(the ASM code is here intermixed with the source)

My sbrk() function works, and I checked that the program really made the syscall to sbrk() and that the return value was correct (by printing debug strings in my kernel).

Where does the problem come from, according to you ?

Thanks for your help.

EDIT : Maybe the memory should be "formatted" in a particular way by sbrk() ? I say that because I use my own format (1 "used" bit and 31 size bits), and not that of standard lib for memory chunk headers.
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: [Newlib] Bug with malloc_r()

Post by JamesM »

There's no bug in malloc_r. I use it in several projects and it works fine.

What exactly is your sbrk() call returning? That code appears to set %ecx to the value of a local.

I take it you *have* initialised newlib? There's a call you must make on process startup to initialise the memory allocator (it calls sbrk(0) to find its current memory breakpoint).

If you haven't called that, it is likely there will be null pointers around.

Cheers,

James
fiveayem
Member
Member
Posts: 51
Joined: Sun Aug 14, 2011 8:01 am

Re: [Newlib] Bug with malloc_r()

Post by fiveayem »

So the sbrk(0) should be made by the kernel ?

EDIT : It is OK now. The directory where I put my sbrk syscall code was just the wrong one.
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: [Newlib] Bug with malloc_r()

Post by JamesM »

fiveayem wrote: It is OK now. The directory where I put my sbrk syscall code was just the wrong one.
... ](*,) ...

I knew there was a reason I didn't frequent this forum any more.
blanham
Posts: 10
Joined: Sat Apr 28, 2012 5:16 pm

Re: [Newlib] Bug with malloc_r()

Post by blanham »

Normally I wouldn't revive dead threads, but this is the exact problem I'm having. Well, this problem and crashes with C++ constructors which I assume is also related to memory allocation. JamesM mentioned above that you are supposed to call a function to initialize malloc, but I can't find any mention of that in the documentation, nor in any of the various examples of porting newlib that I have managed to find on Google.
gerryg400
Member
Member
Posts: 1801
Joined: Thu Mar 25, 2010 11:26 pm
Location: Melbourne, Australia

Re: [Newlib] Bug with malloc_r()

Post by gerryg400 »

blanham wrote:Normally I wouldn't revive dead threads, but this is the exact problem I'm having. Well, this problem and crashes with C++ constructors which I assume is also related to memory allocation. JamesM mentioned above that you are supposed to call a function to initialize malloc, but I can't find any mention of that in the documentation, nor in any of the various examples of porting newlib that I have managed to find on Google.
You don't need to call a function to initialise malloc. When you call malloc() it will in turn call the sbrk() that you supply to get memory from your OS.
If a trainstation is where trains stop, what is a workstation ?
blanham
Posts: 10
Joined: Sat Apr 28, 2012 5:16 pm

Re: [Newlib] Bug with malloc_r()

Post by blanham »

Also, I should have mentioned, theses crashes happen before any call to sbrk.
User avatar
bluemoon
Member
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: [Newlib] Bug with malloc_r()

Post by bluemoon »

Did you zero BSS?
blanham
Posts: 10
Joined: Sat Apr 28, 2012 5:16 pm

Re: [Newlib] Bug with malloc_r()

Post by blanham »

Yes, I do zero the BSS by using __bss_start and _end.
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: [Newlib] Bug with malloc_r()

Post by JamesM »

The function I was thinking of was actually to initialise the signal subsystem, and was _init_signals().
User avatar
bluemoon
Member
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: [Newlib] Bug with malloc_r()

Post by bluemoon »

The default build for newlib disable signal (it use user-space internal signal that cannot cross process boundary), so you don't need to call _init_signals unless you enabled it.

There may be other bug that cause the crash, you know, a lot of code related to running user-space program(loader, parser, linker, syscall, etc).
I'd just list my procedure and syscall sequence caused by newlib, for your reference and see if you missed anything:

The test program is:

Code: Select all

#include <stdint.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

class FOO {
public:
    int foo;    
    FOO() {
        foo = 123;
        printf ("FOO::FOO(%p);\n", this);
    }
    ~FOO() {
    }
};

FOO g_foo;

int main ( int argc, char *argv[] ) {
    FOO* foo = new FOO;    
    int pid = (int)getpid();
    for (int i=0;i<pid+3;i++) {
        printf ("  PID[%d]: Hello again! counter=%d, foo=%d\n", pid, i, g_foo.foo++);
        usleep((useconds_t) (1000000));
    }
    delete foo;
    return (int)0xBADC0DE0+ (int)pid;
}
The basic environment I initialized are:
1. zero BSS
2. jump to crt0 with ring3

The syscall generated are:

Code: Select all

SYSCALL : fstat(1, 7FBFF7D0)
SYSCALL : sbrk(00000430)
SYSCALL : sbrk(00000BD0)
SYSCALL : write(1, 01000010, 20)
FOO::FOO(0x10d008);      <--- Note this is global
SYSCALL : write(1, 01000010, 21)
FOO::FOO(0x1000420);    <--- Note this is new
SYSCALL : getpid() -> FFFFFFFF:801210D0: 1
SYSCALL : write(1, 01000010, 42)
  PID[1]: Hello again! counter=0, foo=123
SYSCALL : usleep(000F4240)
SYSCALL : write(1, 01000010, 42)
  PID[1]: Hello again! counter=1, foo=124
SYSCALL : usleep(000F4240)
SYSCALL : write(1, 01000010, 42)
  PID[1]: Hello again! counter=2, foo=125
SYSCALL : usleep(000F4240)
SYSCALL : write(1, 01000010, 42)
  PID[1]: Hello again! counter=3, foo=126
SYSCALL : usleep(000F4240)
SYSCALL : exit(BADC0DE1)
  KMAIN : Clean zombie process: FFFFFFFF:801210D0
Hope this help. And by the way, since you said crash, what's the crash point and error code?
blanham
Posts: 10
Joined: Sat Apr 28, 2012 5:16 pm

Re: [Newlib] Bug with malloc_r()

Post by blanham »

It's crashing at line 2214 in mallocr:

Code: Select all

set_head(top, top_size | PREV_INUSE);
By stepping through the code the offending line is the same as in the first post in this thread.

I also ran your example (though I had to remove usleep since I have not implemented it yet). The call to getpid functions fine, but then it crashes in mallocr (a page fault), I assume while trying to malloc space for printf's output. The faulting address is 4.

My only guess is that somehow the starting sbrk value isn't 0, so mallocr doesn't call sbrk, and tries to use this value as the beginning of the heap.

I really thought this was an issue with either my elf loader or maybe something in the filesystem, but I can't see anything obviously wrong.

Code: Select all


#include <common.h>
#include <stdio.h>
#include <string.h>
#include <memory.h>
#include <mm/liballoc.h>
#include <thread.h>
#include <fs/vfs.h>

#define PT_LOAD 0x1
typedef struct elf_header {
    uint8_t magic[4];
    uint8_t class;
    uint8_t byteorder;
    uint8_t hversion;
    uint8_t pad[9];
    uint16_t filetype;
    uint16_t archtype;
    uint32_t fversion;
    uint32_t entry;
    uint32_t phdrpos;
    uint32_t shdrpos;
    uint32_t flags;
    uint16_t hdrsize;
    uint16_t phdrent;
    uint16_t phdrcnt;
    uint16_t shdrent;
    uint16_t shdrcnt;
    uint16_t strsec;
} __attribute__((packed)) elf_header_t;
typedef struct elf_section {
    uint32_t sh_name;
    uint32_t sh_type;
    uint32_t sh_flags;
    uint32_t sh_addr;
    uint32_t sh_offset;
    uint32_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint32_t sh_align;
    uint32_t sh_entsize;
}  __attribute__((packed)) elf_section_t;
typedef struct elf_program_header {
    uint32_t ph_type;
    uint32_t ph_offset;
    uint32_t ph_virtaddr;
    uint32_t ph_physaddr;
    uint32_t ph_filesize;
    uint32_t ph_memsize;
    uint32_t ph_flags;
    uint32_t ph_align;
} __attribute__((packed)) elf_program_header_t;

void elf_print_sections(elf_section_t *sections)
{
    printf("type %i addr %X offset %i size %X link %X info %X align %X entsize %x\n",
        sections->sh_type, sections->sh_addr, sections->sh_offset, sections->sh_size,
        sections->sh_link, sections->sh_info, sections->sh_align, sections->sh_entsize);
}

void elf_print_programs(elf_program_header_t *program)
{
    printf("type %i offset %X virtaddt %X filesize %X memsize %X align %X\n",
        program->ph_type, program->ph_offset, program->ph_virtaddr,
        program->ph_filesize, program->ph_memsize, program->ph_align);
}

static void elf_load_program(elf_header_t *header, int fd)
{
    elf_program_header_t *program = kmalloc(sizeof(*program)*header->phdrcnt);
    void *code;
    pagedir_t pd = thread_current()->pd;
    int pages = 0;
    sys_lseek(fd, header->phdrpos, SEEK_SET);
    sys_read(fd, program, sizeof(*program)*header->phdrcnt);
//  elf_print_programs(program);
    for(int i = 0; i < header->phdrcnt; i++)
    {
        if(program->ph_type == PT_LOAD)
        {
            sys_lseek(fd, program->ph_offset, SEEK_SET);
            pages = program->ph_memsize/PAGE_SIZE + 1;
            code = pallocn(pages);
            sys_read(fd, code, program->ph_filesize);
            pagedir_insert_pagen(pd, (uintptr_t)code, program->ph_virtaddr, 0x7, pages);
        //  printf("insert %i pages\n",pages);
        }
        program++;
    }

    pagedir_install(pd);
}
int load_elf(const char *path, uintptr_t *eip)
{
    int fd;
    elf_header_t *header;
    header = kmalloc(sizeof(*header));

    if((fd = sys_open(path, 0)) < 0)
    {
        printf("Error opening ELF executable\n");
        return -1;
    }

    if(sys_read(fd, header, sizeof(*header)) != sizeof(*header))
    {
        printf("Error reading ELF header\n");
        return -1;
    }

    int ret = memcmp(header->magic, ELF_MAGIC, 4);

    if(ret != 0)
    {
        printf("Missing or invalid ELF magic number!\n");
        return -1;
    }

    elf_load_program(header, fd);

    *eip = header->entry;
    kfree(header);
    sys_close(fd);

    return 0;
}
User avatar
bluemoon
Member
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: [Newlib] Bug with malloc_r()

Post by bluemoon »

In the PT_LOAD, I do this one more thing:

Code: Select all

 if ( elf_phdr->p_filesz < elf_phdr->p_memsz ) {
   memset ( (void*)(seg->base + elf_phdr->p_filesz), 0, elf_phdr->p_memsz - elf_phdr->p_filesz);
 }
This may make the difference.
blanham
Posts: 10
Joined: Sat Apr 28, 2012 5:16 pm

Re: [Newlib] Bug with malloc_r()

Post by blanham »

The actual problem seems to have been here:

Code: Select all

  sys_read(fd, code, program->ph_filesize);
  pagedir_insert_pagen(pd, (uintptr_t)code, program->ph_virtaddr, 0x7, pages);
I noticed that in some cases ph_virtaddr wasn't on a page boundary, so say when the virtaddr was 0x400020, I was actually loading at 0x400000, since code is page aligned.

C++ is still giving me trouble, but I built a decently complicated C program and haven't seen any issues yet. I figured I'd followup in case anyone has a similar issue.
User avatar
bluemoon
Member
Member
Posts: 1761
Joined: Wed Dec 01, 2010 3:41 am
Location: Hong Kong

Re: [Newlib] Bug with malloc_r()

Post by bluemoon »

blanham wrote:I noticed that in some cases ph_virtaddr wasn't on a page boundary
That sound strange, I never saw that. Did you put align for every section in the linker script? Can you confirm there is nothing on SHT_RELA?
I am interested to know why, since the same problem may also arise in mine kernel.
Post Reply