OSDev.org

Posted: **Sat Dec 24, 2011 4:44 am**

Hello,

Having succeeded in making a toolchain and porting Newlib for my OS, I am currently facing quite a great problem : there seems to be a bug with malloc_r() function. Precisely, I get a Page Fault caused by bad virtual address 0x00000004. I disassembled the binary and I found that the bug occured at this line :

40001900: 89 41 04 mov %eax,0x4(%ecx)

So I guessed that ECX had been previously set to 0. I then looked for the place where it may have happened :

Code: Select all

400018bb:       89 55 d0                mov    %edx,-0x30(%ebp)
400018be:       89 4d cc                mov    %ecx,-0x34(%ebp)
400018c1:       e8 3a 06 00 00          call   40001f00 <_sbrk_r>
    if (new_brk == (char*)(MORECORE_FAILURE))
400018c6:       83 c4 10                add    $0x10,%esp
400018c9:       8b 55 d0                mov    -0x30(%ebp),%edx
400018cc:       83 f8 ff                cmp    $0xffffffff,%eax
400018cf:       8b 4d cc                mov    -0x34(%ebp),%ecx[color=#FF0000] // Here ?[/color]
400018d2:       0f 84 ba 03 00 00       je     40001c92 <_malloc_r+0x792>

(the ASM code is here intermixed with the source)

My sbrk() function works, and I checked that the program really made the syscall to sbrk() and that the return value was correct (by printing debug strings in my kernel).

Where does the problem come from, according to you ?

Thanks for your help.

EDIT : Maybe the memory should be "formatted" in a particular way by sbrk() ? I say that because I use my own format (1 "used" bit and 31 size bits), and not that of standard lib for memory chunk headers.

Posted: **Sat Dec 24, 2011 4:59 am**

There's no bug in malloc_r. I use it in several projects and it works fine.

What exactly is your sbrk() call returning? That code appears to set %ecx to the value of a local.

I take it you *have* initialised newlib? There's a call you must make on process startup to initialise the memory allocator (it calls sbrk(0) to find its current memory breakpoint).

If you haven't called that, it is likely there will be null pointers around.

Cheers,

James

Posted: **Sat Dec 24, 2011 5:29 am**

So the sbrk(0) should be made by the kernel ?

EDIT : It is OK now. The directory where I put my sbrk syscall code was just the wrong one.

Posted: **Sat Dec 24, 2011 1:07 pm**

fiveayem wrote: It is OK now. The directory where I put my sbrk syscall code was just the wrong one.

...

...

I knew there was a reason I didn't frequent this forum any more.

Posted: **Thu Sep 06, 2012 6:57 am**

Normally I wouldn't revive dead threads, but this is the exact problem I'm having. Well, this problem and crashes with C++ constructors which I assume is also related to memory allocation. JamesM mentioned above that you are supposed to call a function to initialize malloc, but I can't find any mention of that in the documentation, nor in any of the various examples of porting newlib that I have managed to find on Google.

Posted: **Thu Sep 06, 2012 7:05 am**

blanham wrote:Normally I wouldn't revive dead threads, but this is the exact problem I'm having. Well, this problem and crashes with C++ constructors which I assume is also related to memory allocation. JamesM mentioned above that you are supposed to call a function to initialize malloc, but I can't find any mention of that in the documentation, nor in any of the various examples of porting newlib that I have managed to find on Google.

You don't need to call a function to initialise malloc. When you call malloc() it will in turn call the sbrk() that you supply to get memory from your OS.

Posted: **Thu Sep 06, 2012 7:14 am**

Also, I should have mentioned, theses crashes happen before any call to sbrk.

Posted: **Thu Sep 06, 2012 7:25 am**

Did you zero BSS?

Posted: **Thu Sep 06, 2012 7:30 am**

Yes, I do zero the BSS by using __bss_start and _end.

Posted: **Thu Sep 06, 2012 8:03 am**

The function I was thinking of was actually to initialise the signal subsystem, and was _init_signals().

Posted: **Thu Sep 06, 2012 10:24 am**

The default build for newlib disable signal (it use user-space internal signal that cannot cross process boundary), so you don't need to call _init_signals unless you enabled it.

There may be other bug that cause the crash, you know, a lot of code related to running user-space program(loader, parser, linker, syscall, etc).
I'd just list my procedure and syscall sequence caused by newlib, for your reference and see if you missed anything:

The test program is:

Code: Select all

#include <stdint.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

class FOO {
public:
    int foo;    
    FOO() {
        foo = 123;
        printf ("FOO::FOO(%p);\n", this);
    }
    ~FOO() {
    }
};

FOO g_foo;

int main ( int argc, char *argv[] ) {
    FOO* foo = new FOO;    
    int pid = (int)getpid();
    for (int i=0;i<pid+3;i++) {
        printf ("  PID[%d]: Hello again! counter=%d, foo=%d\n", pid, i, g_foo.foo++);
        usleep((useconds_t) (1000000));
    }
    delete foo;
    return (int)0xBADC0DE0+ (int)pid;
}

The basic environment I initialized are:
1. zero BSS
2. jump to crt0 with ring3

The syscall generated are:

Code: Select all

SYSCALL : fstat(1, 7FBFF7D0)
SYSCALL : sbrk(00000430)
SYSCALL : sbrk(00000BD0)
SYSCALL : write(1, 01000010, 20)
FOO::FOO(0x10d008);      <--- Note this is global
SYSCALL : write(1, 01000010, 21)
FOO::FOO(0x1000420);    <--- Note this is new
SYSCALL : getpid() -> FFFFFFFF:801210D0: 1
SYSCALL : write(1, 01000010, 42)
  PID[1]: Hello again! counter=0, foo=123
SYSCALL : usleep(000F4240)
SYSCALL : write(1, 01000010, 42)
  PID[1]: Hello again! counter=1, foo=124
SYSCALL : usleep(000F4240)
SYSCALL : write(1, 01000010, 42)
  PID[1]: Hello again! counter=2, foo=125
SYSCALL : usleep(000F4240)
SYSCALL : write(1, 01000010, 42)
  PID[1]: Hello again! counter=3, foo=126
SYSCALL : usleep(000F4240)
SYSCALL : exit(BADC0DE1)
  KMAIN : Clean zombie process: FFFFFFFF:801210D0

Hope this help. And by the way, since you said crash, what's the crash point and error code?

Posted: **Thu Sep 06, 2012 9:53 pm**

It's crashing at line 2214 in mallocr:

Code: Select all

set_head(top, top_size | PREV_INUSE);

By stepping through the code the offending line is the same as in the first post in this thread.

I also ran your example (though I had to remove usleep since I have not implemented it yet). The call to getpid functions fine, but then it crashes in mallocr (a page fault), I assume while trying to malloc space for printf's output. The faulting address is 4.

My only guess is that somehow the starting sbrk value isn't 0, so mallocr doesn't call sbrk, and tries to use this value as the beginning of the heap.

I really thought this was an issue with either my elf loader or maybe something in the filesystem, but I can't see anything obviously wrong.

Code: Select all


#include <common.h>
#include <stdio.h>
#include <string.h>
#include <memory.h>
#include <mm/liballoc.h>
#include <thread.h>
#include <fs/vfs.h>

#define PT_LOAD 0x1
typedef struct elf_header {
    uint8_t magic[4];
    uint8_t class;
    uint8_t byteorder;
    uint8_t hversion;
    uint8_t pad[9];
    uint16_t filetype;
    uint16_t archtype;
    uint32_t fversion;
    uint32_t entry;
    uint32_t phdrpos;
    uint32_t shdrpos;
    uint32_t flags;
    uint16_t hdrsize;
    uint16_t phdrent;
    uint16_t phdrcnt;
    uint16_t shdrent;
    uint16_t shdrcnt;
    uint16_t strsec;
} __attribute__((packed)) elf_header_t;
typedef struct elf_section {
    uint32_t sh_name;
    uint32_t sh_type;
    uint32_t sh_flags;
    uint32_t sh_addr;
    uint32_t sh_offset;
    uint32_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint32_t sh_align;
    uint32_t sh_entsize;
}  __attribute__((packed)) elf_section_t;
typedef struct elf_program_header {
    uint32_t ph_type;
    uint32_t ph_offset;
    uint32_t ph_virtaddr;
    uint32_t ph_physaddr;
    uint32_t ph_filesize;
    uint32_t ph_memsize;
    uint32_t ph_flags;
    uint32_t ph_align;
} __attribute__((packed)) elf_program_header_t;

void elf_print_sections(elf_section_t *sections)
{
    printf("type %i addr %X offset %i size %X link %X info %X align %X entsize %x\n",
        sections->sh_type, sections->sh_addr, sections->sh_offset, sections->sh_size,
        sections->sh_link, sections->sh_info, sections->sh_align, sections->sh_entsize);
}

void elf_print_programs(elf_program_header_t *program)
{
    printf("type %i offset %X virtaddt %X filesize %X memsize %X align %X\n",
        program->ph_type, program->ph_offset, program->ph_virtaddr,
        program->ph_filesize, program->ph_memsize, program->ph_align);
}

static void elf_load_program(elf_header_t *header, int fd)
{
    elf_program_header_t *program = kmalloc(sizeof(*program)*header->phdrcnt);
    void *code;
    pagedir_t pd = thread_current()->pd;
    int pages = 0;
    sys_lseek(fd, header->phdrpos, SEEK_SET);
    sys_read(fd, program, sizeof(*program)*header->phdrcnt);
//  elf_print_programs(program);
    for(int i = 0; i < header->phdrcnt; i++)
    {
        if(program->ph_type == PT_LOAD)
        {
            sys_lseek(fd, program->ph_offset, SEEK_SET);
            pages = program->ph_memsize/PAGE_SIZE + 1;
            code = pallocn(pages);
            sys_read(fd, code, program->ph_filesize);
            pagedir_insert_pagen(pd, (uintptr_t)code, program->ph_virtaddr, 0x7, pages);
        //  printf("insert %i pages\n",pages);
        }
        program++;
    }

    pagedir_install(pd);
}
int load_elf(const char *path, uintptr_t *eip)
{
    int fd;
    elf_header_t *header;
    header = kmalloc(sizeof(*header));

    if((fd = sys_open(path, 0)) < 0)
    {
        printf("Error opening ELF executable\n");
        return -1;
    }

    if(sys_read(fd, header, sizeof(*header)) != sizeof(*header))
    {
        printf("Error reading ELF header\n");
        return -1;
    }

    int ret = memcmp(header->magic, ELF_MAGIC, 4);

    if(ret != 0)
    {
        printf("Missing or invalid ELF magic number!\n");
        return -1;
    }

    elf_load_program(header, fd);

    *eip = header->entry;
    kfree(header);
    sys_close(fd);

    return 0;
}

Posted: **Thu Sep 06, 2012 10:45 pm**

In the PT_LOAD, I do this one more thing:

Code: Select all

 if ( elf_phdr->p_filesz < elf_phdr->p_memsz ) {
   memset ( (void*)(seg->base + elf_phdr->p_filesz), 0, elf_phdr->p_memsz - elf_phdr->p_filesz);
 }

This may make the difference.

Posted: **Fri Sep 07, 2012 7:29 pm**

The actual problem seems to have been here:

Code: Select all

  sys_read(fd, code, program->ph_filesize);
  pagedir_insert_pagen(pd, (uintptr_t)code, program->ph_virtaddr, 0x7, pages);

I noticed that in some cases ph_virtaddr wasn't on a page boundary, so say when the virtaddr was 0x400020, I was actually loading at 0x400000, since code is page aligned.

C++ is still giving me trouble, but I built a decently complicated C program and haven't seen any issues yet. I figured I'd followup in case anyone has a similar issue.

Posted: **Sat Sep 08, 2012 2:48 am**

blanham wrote:I noticed that in some cases ph_virtaddr wasn't on a page boundary

That sound strange, I never saw that. Did you put align for every section in the linker script? Can you confirm there is nothing on SHT_RELA?
I am interested to know why, since the same problem may also arise in mine kernel.

OSDev.org

[Newlib] Bug with malloc_r()

[Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()

Re: [Newlib] Bug with malloc_r()