Insane fun.
Taking a few hours tonight to look at optimization.
Looked at my memset function.
Changed the for loop into rep stos.
GCC is brilliant at optimizing out unwanted code.
But is it possible to tell GCC to use a specific code path when the size is hard-coded (a compile-time constant)?
I have yet to align the destination to an 8-byte boundary, but here is what I have so far.
Code:
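extern "C" FIL void* memset(void* s, int c, DWORD count)
{
    QWORD d0;                          // dummy output: RCX is consumed by rep
    void* d1;                          // dummy output: RDI is advanced by stos (64-bit, so not an int)
    DWORD AND7 = count & 7;            // tail bytes left after the qword stores (was (QWORD)s & 7)
    QWORD SHR3 = count >> 3;           // number of whole qwords; 64-bit because rep uses all of RCX
    QWORD RAX = (QWORD)0x0101010101010101;
    QWORD MOVZBL = (unsigned char)c;   // zero-extend the fill byte, as memset only uses the low 8 bits
    QWORD IMULQ = RAX * MOVZBL;        // fill byte replicated into all 8 bytes
    if (AND7)
    {
        // Store the whole qwords, then the 1..7 remaining bytes.
        asm volatile(
            "rep stosq;"
            "movl %5, %%ecx;"
            "rep stosb;"
            : "=&c" (d0), "=&D" (d1)
            : "a"(IMULQ), "1"(s), "0"(SHR3), "g"(AND7) : "memory");
    }
    else
    {
        // count is a multiple of 8: qword stores only.
        asm volatile(
            "rep stosq;"
            : "=&c" (d0), "=&D" (d1)
            : "a"(IMULQ), "1"(s), "0"(SHR3) : "memory");
    }
    return s;
}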
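To sanity-check the arithmetic on a hosted build, here is a rough portable stand-in for the routine above. It is only a sketch: `broadcast_byte` and `memset_sketch` are names I made up, plain loops replace `rep stosq` / `rep stosb`, and it assumes the tail length is `count & 7`. It is not meant to be the kernel's `memset` itself (GCC may turn such loops back into a `memset` call).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: replicate the low byte of c into all 8 bytes of a qword.
// Only the low 8 bits of c are used, so 0x1FF behaves like 0xFF.
static inline std::uint64_t broadcast_byte(int c)
{
    return 0x0101010101010101ULL * static_cast<unsigned char>(c);
}

// Hypothetical portable stand-in for the rep stosq / rep stosb pair.
void* memset_sketch(void* s, int c, std::size_t count)
{
    unsigned char* p = static_cast<unsigned char*>(s);
    const std::uint64_t pattern = broadcast_byte(c);

    std::size_t qwords = count >> 3;  // whole 8-byte stores (the rep stosq part)
    std::size_t tail = count & 7;     // leftover bytes (the rep stosb part)

    for (; qwords != 0; --qwords) {
        std::memcpy(p, &pattern, 8);  // memcpy avoids unaligned-access issues
        p += 8;
    }
    for (; tail != 0; --tail)
        *p++ = static_cast<unsigned char>(c);

    return s;
}
```

For a count of 17 this does two qword stores followed by one byte store, and it leaves the bytes on either side of the range untouched.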
Code:
memset((void*)pVGADisplayScreen, 0, 80 * 25 * 2);
Or the opposite: can I get it to check if (AND7) only when the size is not known until run time?
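One GCC feature that looks relevant is `__builtin_constant_p`, which reports whether the compiler can prove its argument is a compile-time constant. Below is a minimal sketch of dispatching on it. All three function names are hypothetical, and the two fill routines are plain loops standing in for the asm paths. As far as I understand, the builtin only sees through an inline function when optimization is enabled. Without it, the call quietly takes the general path, which is still correct.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fast path: caller guarantees count is a multiple of 8.
static inline void* fill_qwords_only(void* s, int c, std::size_t count)
{
    const std::uint64_t pattern =
        0x0101010101010101ULL * static_cast<unsigned char>(c);
    unsigned char* p = static_cast<unsigned char*>(s);
    for (std::size_t i = 0; i < (count >> 3); ++i)
        std::memcpy(p + i * 8, &pattern, 8);
    return s;
}

// Hypothetical general path: works for any count, one byte at a time.
static inline void* fill_general(void* s, int c, std::size_t count)
{
    unsigned char* p = static_cast<unsigned char*>(s);
    for (std::size_t i = 0; i < count; ++i)
        p[i] = static_cast<unsigned char>(c);
    return s;
}

// Hypothetical dispatcher: take the qword-only path when the compiler can
// prove count is a constant multiple of 8, otherwise use the general path.
static inline __attribute__((always_inline))
void* memset_dispatch(void* s, int c, std::size_t count)
{
    if (__builtin_constant_p(count) && (count & 7) == 0)
        return fill_qwords_only(s, c, count);
    return fill_general(s, c, count);
}
```

With a call like `memset_dispatch(screen, 0, 80 * 25 * 2)`, the size is 4000, a multiple of 8, so an optimized build should keep only the qword path. A size read at run time goes through the general path instead.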