Am I using this memory correctly?

iamnoob
Posts: 13
Joined: Mon Oct 19, 2015 12:36 pm

Am I using this memory correctly?

Post by iamnoob »

I have the following output when iterating the memory map:

Code:

size = 14, base_addr = 0, length = 9fc00, type = available
size = 14, base_addr = 9fc00, length = 400, type = reserved
size = 14, base_addr = f0000, length = 10000, type = reserved
size = 14, base_addr = 100000, length = 7ee0000, type = available
size = 14, base_addr = 7fe0000, length = 20000, type = reserved
size = 14, base_addr = fffc0000, length = 40000, type = reserved
Since I don't have any sort of memory management, I wanted to see if I could use the memory directly.

Code:

    auto free_mem_start = (uint8_t*) 0x100000;
    auto free_mem_end   = (uint8_t*) (0x100000 + 0x7ee0000); // base_addr + length = 0x7fe0000
    auto dist           = free_mem_end - free_mem_start;
    memset(free_mem_start, 0, dist);
    auto pos   = 0 * vbe->LinBytesPerScanLine + 0 * (vbe->bpp / 8); // pixel (0, 0)
    auto color = 0xFF0000;
    free_mem_start[pos] = color & 0xFF;
    free_mem_start[pos + 1] = (color >> 8) & 0xFF;
    free_mem_start[pos + 2] = (color >> 16) & 0xFF;
    // mem is the linear framebuffer (vbe->physbase)
    mem[pos] = free_mem_start[pos];
    mem[pos + 1] = free_mem_start[pos + 1];
    mem[pos + 2] = free_mem_start[pos + 2];
This just gives me a blank screen though. Is there something special I need to do before I can start using the memory?


EDIT:

OK, looks like it was just slow. I changed the distance to 0x40. Maybe I need to optimize my mem* functions?

Code:

#include <stddef.h>  /* size_t */

void *memset(void *s, int c, size_t n)
{
    unsigned char* p= (unsigned char*)s;
    while(n--)
        *p++ = (unsigned char)c;
    return s;
}

void *memcpy(void *dest, const void *src, size_t n)
{
    char *dp = (char *)dest;
    const char *sp = (const char *)src;
    while (n--)
        *dp++ = *sp++;
    return dest;
}
The issue is that I've heard two things: GCC's built-in mem* functions are very fast, but since GCC doesn't always use them, I still need to provide my own mem*. If copying and setting memory is always going to be this slow, I can't afford to do anything other than write directly to video memory. Any tips?
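One common way to make the byte loop faster is to store a whole 32-bit word at a time once the pointer is aligned. The sketch below assumes GCC or Clang (the `may_alias` attribute is a compiler extension); `memset_fast` and `u32_alias` are hypothetical names, and in a kernel you would name it `memset`. Be aware that GCC expects even a freestanding environment to provide memcpy, memmove, memset and memcmp, and at -O2 it can recognize loops like these and turn them back into calls to memset; one way to avoid that recursion is to build the kernel's own memset with -fno-tree-loop-distribute-patterns.

```cpp
#include <cstddef>
#include <cstdint>

// A uint32_t that may alias any object, so word stores into a byte
// buffer are well-defined (GCC/Clang extension).
typedef uint32_t u32_alias __attribute__((may_alias));

// Hypothetical word-at-a-time memset: byte stores until aligned,
// then 4 bytes per store, then the 0-3 byte tail.
void *memset_fast(void *s, int c, std::size_t n) {
    unsigned char *p = static_cast<unsigned char *>(s);
    unsigned char b = static_cast<unsigned char>(c);

    // Byte stores until p is 4-byte aligned.
    while (n > 0 && (reinterpret_cast<uintptr_t>(p) & 3) != 0) {
        *p++ = b;
        --n;
    }

    // Replicate the byte into every lane: 0xAB -> 0xABABABAB.
    uint32_t word = b * 0x01010101u;
    u32_alias *w = reinterpret_cast<u32_alias *>(p);
    for (; n >= 4; n -= 4)
        *w++ = word;

    // Remaining 0-3 bytes.
    p = reinterpret_cast<unsigned char *>(w);
    while (n--)
        *p++ = b;
    return s;
}
```

The same idea (align, then move whole words) applies to memcpy, as long as both pointers can be brought to the same alignment.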
BASICFreak
Member
Posts: 284
Joined: Fri Jan 16, 2009 8:34 pm
Location: Louisiana, USA

Re: Am I using this memory correctly?

Post by BASICFreak »

I would not trust hard-coding the memory size.

Especially since the memory map you see is a bit strange: it's missing the holes at 15M and at 0xA0000.

memcpy should (IMO) be done this way:

Code:

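/* rep movs advances ESI/EDI and counts ECX down, so they must be "+" (in/out) operands; l counts bytes (b), words (w) or dwords (d). */
#define REP_MOVS(insn, d, s, l) do { void *d_ = (d); const void *s_ = (s); size_t l_ = (l); \
    __asm__ __volatile__ ("rep " insn : "+D" (d_), "+S" (s_), "+c" (l_) : : "memory"); } while (0)
#define memcpy(d, s, l)  REP_MOVS("movsb", d, s, l)
#define memcpyw(d, s, l) REP_MOVS("movsw", d, s, l)
#define memcpyd(d, s, l) REP_MOVS("movsd", d, s, l)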
If you have the memory map, why not just use it to determine the free space (as everyone here recommends)? But yes, you can use the memory without any initialization (after POST, that is).
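A minimal sketch of what using the memory map could look like, assuming the Multiboot (version 1) entry layout: a 32-bit `size` that does not count itself, a 64-bit base address, a 64-bit length and a 32-bit type, where type 1 means available RAM. That layout is why every entry above reports size = 14 (hex, i.e. 20 bytes). `MmapEntry`, `Region` and `largest_free_region` are hypothetical names. The function picks the largest available region at or above a given address; note that a region's end is base + length, not the length itself.

```cpp
#include <cstdint>
#include <cstring>

// Assumed Multiboot v1 memory map entry; `size` excludes its own 4 bytes.
struct __attribute__((packed)) MmapEntry {
    uint32_t size;
    uint64_t addr;
    uint64_t len;
    uint32_t type;  // 1 = available RAM
};

struct Region {
    uint64_t base;
    uint64_t len;
};

// Returns the largest available region starting at or above min_base
// (e.g. 1 MiB to skip low memory), or {0, 0} if there is none.
Region largest_free_region(const uint8_t *mmap, uint32_t mmap_length,
                           uint64_t min_base) {
    Region best{0, 0};
    const uint8_t *p = mmap;
    const uint8_t *end = mmap + mmap_length;
    while (p < end) {
        MmapEntry e;
        std::memcpy(&e, p, sizeof e);  // copy out: entries may be unaligned
        if (e.type == 1 && e.addr >= min_base && e.len > best.len)
            best = {e.addr, e.len};
        p += e.size + sizeof(e.size);  // advance to the next entry
    }
    return best;
}
```

With the map from the first post and min_base = 0x100000, this returns base 0x100000 and length 0x7ee0000, so the region ends at 0x7fe0000. A real allocator would also have to exclude the kernel image itself, which kernels following the common OSDev examples are linked to load at 1 MiB.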

EDIT:
Also, with auto pos = 0 * vbe->LinBytesPerScanLine + 0 * (vbe->bpp / 8);
you get pos = 0 * x + 0 * y = 0. No matter what values x and y have, it's always 0.
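For comparison, the offset only becomes non-zero once real coordinates go in. A small sketch, assuming a 24- or 32-bpp mode whose pixels are stored blue, green, red in memory (the same byte order as the original code); `pixel_offset` and `put_pixel` are hypothetical helpers. For example, in a 1024x768 mode at 24 bpp with a pitch of 3072 bytes, pixel (10, 2) lands at 2 * 3072 + 10 * 3 = 6174. The pitch (LinBytesPerScanLine) is used rather than width * bytes-per-pixel because a mode's scan lines may be padded.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset of pixel (x, y) in a linear framebuffer.
// pitch = bytes per scan line (VBE LinBytesPerScanLine), bpp = bits per pixel.
std::size_t pixel_offset(unsigned x, unsigned y, unsigned pitch, unsigned bpp) {
    return static_cast<std::size_t>(y) * pitch +
           static_cast<std::size_t>(x) * (bpp / 8);
}

// Writes a 0xRRGGBB color into a 24- or 32-bpp buffer, low byte (blue) first.
void put_pixel(uint8_t *fb, unsigned x, unsigned y, unsigned pitch,
               unsigned bpp, uint32_t color) {
    std::size_t pos = pixel_offset(x, y, pitch, bpp);
    fb[pos]     = color & 0xFF;          // blue
    fb[pos + 1] = (color >> 8) & 0xFF;   // green
    fb[pos + 2] = (color >> 16) & 0xFF;  // red
}
```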
BOS Source Thanks to GitHub
BOS Expanded Commentary
Both under active development!
Sortie wrote:
  • Don't play the role of an operating systems developer, be one.
  • Be truly afraid of undefined [behavior].
  • Your operating system should be itself, not fight what it is.
iamnoob

Re: Am I using this memory correctly?

Post by iamnoob »

BASICFreak wrote: EDIT:
Also, auto pos = 0 * vbe->LinBytesPerScanLine + 0 * (vbe->bpp / 8);
so pos = 0 * x + 0 * y = 0 + 0 * y = 0 * y = 0??? no matter what value you have its always 0.
As a test, I was just plotting one pixel. Anyway, I've made some changes to my program. All drawing now goes to free_mem_start instead of the video memory, and then I copy it all over using your memcpy. I've moved the start of the buffer to 0x1000000 and reduced its size to 0x1000000 bytes (16 MiB each). No issues so far.

There is certainly a speed improvement, but it's hard to tell how much, because I don't see any visual feedback until the buffer is copied over.

Code:

uint8_t *mem;
uint8_t *free_mem_start;

auto xy_to_offset(int x, int y)
{
    return y * vbe->LinBytesPerScanLine + x * (vbe->bpp / 8);
}

void draw_pixel(int x, int y, uint32_t color)
{
    auto pos = xy_to_offset(x, y);
    free_mem_start[pos] = color & 0xFF;
    free_mem_start[pos + 1] = (color >> 8) & 0xFF;
    free_mem_start[pos + 2] = (color >> 16) & 0xFF;
}

void draw_rectangle(int x, int y, int width, int height, uint32_t color)
{
    for (int i = x; i < x + width; ++i)
    {
        for (int j = y; j < y + height; ++j)
        {
            draw_pixel(i, j, color);
        }
    }
}

extern bitmap_font font;

void draw_char(wchar_t c, int x, int y, int color)
{
    auto index  = index_of(font.Index, font.Index + font.Chars, c);
    auto height = font.Height;
    auto width  = font.Widths[index];

    auto bitmap = font.Bitmap + 
        index
        * (height * 2);

    for(auto cy = 0u; cy < height; ++cy)
    {
        for(auto cx = 0u; cx <= width; ++cx)
        {
            auto mask = (1 << (width - cx));
            if(bitmap[cy * 2] & mask)
            {
                draw_pixel(x + cx, y + cy, color);
            }
        }
    }
}

void copy_pixel(int x, int y, int x2, int y2)
{
    auto pos  = xy_to_offset(x, y);
    auto pos2 = xy_to_offset(x2, y2);
    free_mem_start[pos2]     = free_mem_start[pos];
    free_mem_start[pos2 + 1] = free_mem_start[pos + 1];
    free_mem_start[pos2 + 2] = free_mem_start[pos + 2];
}

void copy_glyph(int x, int y, int x2, int y2)
{
    for(auto cy = 0u; cy < font.Height; ++cy)
    {
        for(auto cx = 0u; cx < font.Width; ++cx)
        {
            copy_pixel(x + cx, y + cy, x2 + cx, y2 + cy);
        }
    }
}

size_t row    = 0;
size_t column = 0;
size_t max_rows = 0;
size_t max_columns = 0;
void scroll()
{
    if (row >= max_rows)
    {
        auto start = 0;
        auto end   = max_rows - 1;
        for (auto it = start; it < end; ++it)
        {
            for (auto i = 0u; i < max_columns; ++i)
                copy_glyph(i * font.Width, (it + 1) * font.Height, 
                           i * font.Width, it * font.Height);
        }

        draw_rectangle(0, (max_rows - 1) * font.Height,
                       max_columns * font.Width, font.Height,
                       0);

        row = max_rows - 1;
    }
}

void draw_string(const wchar_t* s, int color = 0xFFFFFF)
{
    auto len = strlen(s);

    for (auto i = 0u; i < len; ++i)
    {
        if (s[i] == '\n')
        {
            ++row;
            column = 0;
        }
        else
        {
            auto index  = index_of(font.Index, font.Index + font.Chars, s[i]);
            auto width  = font.Widths[index];            
            draw_char(s[i], column * width, row * font.Height, color);
            ++column;
        }

        scroll();
    }
}

void print_memory_map()
{
    wchar_t buf[100];

    snprintf (buf, sizeof(buf) / sizeof(buf[0]), // element count, not bytes (buf holds wchar_t)
        "lower_mem = 0x%x, upper_mem   = 0x%x\n"
        "mmap_addr = 0x%x, mmap_length = 0x%x\n",
        mbi->mem_lower, mbi->mem_upper,
        mbi->mmap_addr, mbi->mmap_length);

    write_serial_string(buf);    
    wchar_t buf2[100];

    if (CHECK_BIT(mbi->flags, 6))
    {
        auto mmap = (multiboot_memory_map_t *) mbi->mmap_addr;
        auto end  = mbi->mmap_addr + mbi->mmap_length;
        while((uint32_t) mmap < end) 
        {
            snprintf (buf2, sizeof(buf2) / sizeof(buf2[0]),
                "size = %x, base_addr = %x, length = %x, type = %s\n",
                (uint16_t) mmap->size,
                mmap->addr_low,
                mmap->len_low,
                (mmap->type == MULTIBOOT_MEMORY_AVAILABLE
                    ? L"available"
                    : L"reserved"));
            
            write_serial_string(buf2);
            mmap = (multiboot_memory_map_t *) 
                ((uint32_t) mmap + mmap->size + sizeof(mmap->size));
        }
    } else write_serial_string(L"No memory map.\n");    
}

extern "C" void early_main(uint32_t m, uint32_t a)
{
    mbi = (multiboot_info_t *)a;
    vbe = (vbe_mode_info_t *)mbi->vbe_mode_info;
    mem = (uint8_t *)vbe->physbase;
    free_mem_start = (uint8_t*) 0x1000000;
    memset(free_mem_start, 0, 0x1000000);
    auto xres = vbe->Xres;
    auto yres = vbe->Yres;
    max_rows   = yres / font.Height;
    max_columns = xres / font.Width;
}

extern "C" void kernel_main(uint32_t magic, uint32_t addr)
{
    wchar_t buf[10];
    for (int i = 0; i < 100; ++i)
    {
        snprintf(buf, 10, "#%d\n", i);
        draw_string(buf);
    }
    memcpy(mem, free_mem_start, vbe->Yres * vbe->LinBytesPerScanLine); // copy only the visible frame
}
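A back buffer also makes scrolling cheap: instead of copying glyph by glyph through copy_pixel, a single memmove can shift every scan line up by one text line, and presenting only needs to copy the visible frame (height * pitch bytes). A sketch under a few assumptions: the back buffer has the same pitch (LinBytesPerScanLine) as the framebuffer, `height` is the height of the text area in pixels (for example max_rows * font.Height), and std::memmove/std::memcpy stand in for whatever the kernel provides. `scroll_back_buffer` and `present` are hypothetical names. A freestanding kernel needs its own memmove here, because the source and destination overlap.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scrolls the back buffer up by one text line of `line_height` pixels and
// clears the freed line at the bottom.
void scroll_back_buffer(uint8_t *back, std::size_t pitch, std::size_t height,
                        std::size_t line_height) {
    if (line_height >= height) {  // nothing left to keep: clear everything
        std::memset(back, 0, pitch * height);
        return;
    }
    std::size_t line_bytes = pitch * line_height;
    std::size_t keep_bytes = pitch * (height - line_height);
    std::memmove(back, back + line_bytes, keep_bytes);  // overlapping copy
    std::memset(back + keep_bytes, 0, line_bytes);      // blank bottom line
}

// Copies only the visible part of the back buffer to the framebuffer.
void present(uint8_t *front, const uint8_t *back, std::size_t pitch,
             std::size_t height) {
    std::memcpy(front, back, pitch * height);
}
```

In the code above, scroll() could call scroll_back_buffer(free_mem_start, vbe->LinBytesPerScanLine, max_rows * font.Height, font.Height), and kernel_main could finish with present(mem, free_mem_start, vbe->LinBytesPerScanLine, vbe->Yres).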
BASICFreak
Member

Re: Am I using this memory correctly?

Post by BASICFreak »

Just a note, feel free to disregard this if you wish.

For performance on large copies, I would recommend using SSE, with something like the following:

Code:

refreshScreen:
	pusha

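	; MOVAPS faults unless VBE_FB and VBE_LFB are 16-byte aligned.
	; VBE_FBSize is assumed to be a multiple of 128 (any remainder is not copied).
	mov ecx, DWORD [VBE_FBSize]
	shr ecx, 7		; ecx = number of 128-byte blocks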
	mov esi, DWORD [VBE_FB]
	mov edi, DWORD [VBE_LFB]
	.copyloop:
		MOVAPS xmm0, [es:esi]
		MOVAPS xmm1, [es:esi + 0x10]
		MOVAPS xmm2, [es:esi + 0x20]
		MOVAPS xmm3, [es:esi + 0x30]
		MOVAPS xmm4, [es:esi + 0x40]
		MOVAPS xmm5, [es:esi + 0x50]
		MOVAPS xmm6, [es:esi + 0x60]
		MOVAPS xmm7, [es:esi + 0x70]
		
		MOVAPS [es:edi], xmm0
		MOVAPS [es:edi + 0x10], xmm1
		MOVAPS [es:edi + 0x20], xmm2
		MOVAPS [es:edi + 0x30], xmm3
		MOVAPS [es:edi + 0x40], xmm4
		MOVAPS [es:edi + 0x50], xmm5
		MOVAPS [es:edi + 0x60], xmm6
		MOVAPS [es:edi + 0x70], xmm7
		
		add esi, 0x80
		add edi, 0x80
		dec ecx
		jnz .copyloop

	popa
	ret
This is extracted directly from my (old) SVGA driver. I have tried a 32-bit memcpy, and it is WAY (WAY) slower than eight 128-bit copies per iteration (that's 1 Kbit, or 128 bytes, at a time in 16 instructions).

Again, just a suggestion if you are that concerned about speed.
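The same eight-register loop can be written in C++ with SSE2 intrinsics, which leaves register allocation to the compiler. This sketch uses _mm_load_si128/_mm_store_si128 (MOVDQA) instead of MOVAPS; both require 16-byte-aligned addresses and fault otherwise. It assumes the byte count is a multiple of 128, and `sse_copy128` is a hypothetical name. Kernels are often compiled with -mno-sse, in which case this will not build, and the usual caveats about saving SSE state in a kernel still apply.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Copies `bytes` bytes, 128 at a time, through eight 16-byte XMM registers.
// Not checked: both pointers must be 16-byte aligned and `bytes` must be a
// multiple of 128 (any remainder is not copied).
void sse_copy128(void *dst, const void *src, std::size_t bytes) {
    const __m128i *s = static_cast<const __m128i *>(src);
    __m128i *d = static_cast<__m128i *>(dst);
    for (std::size_t blocks = bytes / 128; blocks != 0; --blocks) {
        __m128i r0 = _mm_load_si128(s + 0), r1 = _mm_load_si128(s + 1);
        __m128i r2 = _mm_load_si128(s + 2), r3 = _mm_load_si128(s + 3);
        __m128i r4 = _mm_load_si128(s + 4), r5 = _mm_load_si128(s + 5);
        __m128i r6 = _mm_load_si128(s + 6), r7 = _mm_load_si128(s + 7);
        _mm_store_si128(d + 0, r0); _mm_store_si128(d + 1, r1);
        _mm_store_si128(d + 2, r2); _mm_store_si128(d + 3, r3);
        _mm_store_si128(d + 4, r4); _mm_store_si128(d + 5, r5);
        _mm_store_si128(d + 6, r6); _mm_store_si128(d + 7, r7);
        s += 8;  // 8 * 16 = 128 bytes
        d += 8;
    }
}
```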
alexfru
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Am I using this memory correctly?

Post by alexfru »

Pushad/popad?
BASICFreak
Member

Re: Am I using this memory correctly?

Post by BASICFreak »

alexfru wrote: Pushad/popad?
That, or push ESI and EDI; otherwise it breaks the calling C code (that was a fun bug to find). Saving everything also meant it could be called from anywhere.

Either way, I was pointing to the SSE functionality, with an excerpt from old test code, as a quicker way to move data around, since the OP kept bringing up memory copy speed. And since it was in the context of a video frame buffer, I left the surrounding code in instead of just the copy loop.

I guess if I were to redo it, it might look more like:

Code:

refreshScreen:
	mov ecx, DWORD [VBE_FBSize]
	shr ecx, 7
	mov eax, [VBE_FB]
	mov edx, [VBE_LFB]
	.copyloop:
		MOVAPS xmm0, [eax]
		MOVAPS xmm1, [eax + 0x10]
		MOVAPS xmm2, [eax + 0x20]
		MOVAPS xmm3, [eax + 0x30]
		MOVAPS xmm4, [eax + 0x40]
		MOVAPS xmm5, [eax + 0x50]
		MOVAPS xmm6, [eax + 0x60]
		MOVAPS xmm7, [eax + 0x70]
		
		MOVAPS [edx], xmm0
		MOVAPS [edx + 0x10], xmm1
		MOVAPS [edx + 0x20], xmm2
		MOVAPS [edx + 0x30], xmm3
		MOVAPS [edx + 0x40], xmm4
		MOVAPS [edx + 0x50], xmm5
		MOVAPS [edx + 0x60], xmm6
		MOVAPS [edx + 0x70], xmm7
		
		add eax, 0x80
		add edx, 0x80
		dec ecx
		jnz .copyloop
	ret
That would be compliant with the C calling convention without using the stack (EAX, ECX, and EDX are caller-saved)... but even then, compiler optimization can break this if the compiler assumes EAX, ECX, and EDX didn't change...
pcmattman
Member
Posts: 2566
Joined: Sun Jan 14, 2007 9:15 pm
Libera.chat IRC: miselin
Location: Sydney, Australia (I come from a land down under!)

Re: Am I using this memory correctly?

Post by pcmattman »

In terms of memcpy/memset, note that rep movsb/rep stosb can use larger block sizes "behind the scenes" (for example, on CPUs with fast-string operations or Enhanced REP MOVSB/STOSB).
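A memset built on rep stosb, sketched with GCC extended inline assembly. It assumes x86 or x86-64 (the same constraints select EDI/ECX or RDI/RCX) and a clear direction flag, which the System V ABIs guarantee at function calls. Because the instruction advances EDI and counts ECX down, both are "+" (in/out) operands; `memset_rep` is a hypothetical name. Whether it beats a plain loop depends on the CPU.

```cpp
#include <cstddef>

// memset via `rep stosb`: stores AL to [EDI] ECX times (RDI/RCX on x86-64).
// Assumes the direction flag is clear.
inline void *memset_rep(void *dest, int c, std::size_t n) {
    void *d = dest;
    __asm__ __volatile__("rep stosb"
                         : "+D"(d), "+c"(n)  // both modified by the instruction
                         : "a"(c)            // fill byte comes from AL
                         : "memory");
    return dest;
}
```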

It's worth noting that using SSE in a kernel is problematic. You typically need to save the floating-point state, both because an interrupt can arrive mid-copy (potentially taking you into another memcpy) and because userspace programs use the floating-point registers too. See kernel_fpu_begin/kernel_fpu_end in the Linux kernel for a real-world example of this.

Once you do a full SSE state save, copy with preemption disabled, and full SSE state restore, you tend to lose the advantages you might have otherwise gained.