How to get GCC to use code if compile-time value?

tsdnz
Member
Posts: 333
Joined: Sun Jun 16, 2013 4:09 am

How to get GCC to use code if compile-time value?

Post by tsdnz »

WOW, I love OS programming!!
Insane fun.

Taking a few hours tonight to look at optimization.
Looked at my memset function.

Changed the for loop into rep stos.

GCC is brilliant at optimizing out unwanted code.

But is it possible to tell GCC to use code if the size is hard coded?

I still have to align to a boundary, but here is what I have.

Code:

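// FIL, DWORD and QWORD are project-specific definitions not shown here.
extern "C" FIL void* memset(void* s, int c, DWORD count)
{
	QWORD d0, d1;	// dummy outputs: tell GCC that RCX and RDI are modified

	DWORD AND7 = count & 7;			// tail bytes left over after the 8-byte stores
	QWORD SHR3 = count >> 3;		// number of 8-byte stores
	QWORD RAX = (QWORD)0x0101010101010101;
	QWORD MOVZBL = (unsigned char)c;	// memset only uses the low byte of c
	QWORD IMULQ = RAX * MOVZBL;		// fill byte replicated into all 8 bytes

	// Destination alignment is not handled yet.
	if (AND7)
	{
		// Store SHR3 qwords, then reload ECX with the tail count and store bytes.
		asm volatile( \
			"rep stosq;" \
			"movl %5, %%ecx;" \
			"rep stosb;"
			: "=&c" (d0), "=&D" (d1)
			: "a"(IMULQ), "1"((QWORD)s), "0"(SHR3), "g"(AND7) : "memory");
	}
	else
	{
		// count is a multiple of 8, so the qword stores cover everything.
		asm volatile( \
			"rep stosq;" \
			: "=&c" (d0), "=&D" (d1)
			: "a"(IMULQ), "1"((QWORD)s), "0"(SHR3) : "memory");
	}

	return s;
}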
This code works great when called like this:

Code:

memset((void*)pVGADisplayScreen, 0, 80 * 25 * 2);
If I call memset with a size that is not known at compile time, can I get GCC to use the code below without checking the if (AND7)?
Or the opposite: can I get it to check the if (AND7) when the size is only known at run time?
Octocontrabass
Member
Posts: 5588
Joined: Mon Mar 25, 2013 7:01 pm

Re: How to get GCC to use code if compile-time value?

Post by Octocontrabass »

Just as a reminder, GCC has a built-in memset, so there's no guarantee it'll call your version. (And it may call memset when you don't expect it!)

Some CPUs have optimized handling of rep movsb and rep stosb; on those CPUs, a simple rep stosb for memset will be faster than what you have now.
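As a rough illustration of the "simple rep stosb" variant mentioned above, here is a minimal sketch. The name memset_stosb is made up for this example, the portable fallback is only there so it builds on non-x86-64 targets, and it assumes the direction flag is clear, as the usual ABIs require on function entry.

```cpp
#include <cstddef>

// Hypothetical name: a real kernel would export this as memset itself.
static inline void* memset_stosb(void* s, int c, std::size_t count)
{
#if defined(__x86_64__)
	void* dst = s;
	// RDI = destination, RCX = byte count, AL = fill byte.
	// "+D" and "+c" tell GCC that RDI and RCX are both read and modified.
	asm volatile("rep stosb"
		: "+D"(dst), "+c"(count)
		: "a"(c)
		: "memory");
#else
	// Portable fallback so the sketch still builds on other targets.
	unsigned char* p = static_cast<unsigned char*>(s);
	while (count--)
		*p++ = static_cast<unsigned char>(c);
#endif
	return s;
}
```

There is no alignment or tail handling here: on CPUs with the fast string feature the microcode deals with both, which is why the single instruction can be enough.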
tsdnz wrote:But is it possible to tell GCC to use code if the size is hard coded?
I'm not sure I understand this question.

If you hardcode the inputs to a function, GCC will remove the unnecessary code paths. (GCC follows the C standard's definition of "necessary", which might not be what you expect.)

If you don't want GCC to do that, use -O0 to disable optimizations.
tsdnz wrote:If I call memset with a size that is not known at compile time, can I get GCC to use the code below without checking the if (AND7)?
GCC can remove code paths only when it's able to determine that the code is unreachable.
tsdnz wrote:Or the opposite: can I get it to check the if (AND7) when the size is only known at run time?
If GCC's optimizations have removed a code path that you are trying to use, there is a bug in your code.
tsdnz
Member
Posts: 333
Joined: Sun Jun 16, 2013 4:09 am

Re: How to get GCC to use code if compile-time value?

Post by tsdnz »

Octocontrabass wrote:Some CPUs have optimized handling of rep movsb and rep stosb; on those CPUs, a simple rep stosb for memset will be faster than what you have now.
Right, well in that case the question is moot, thank you.

I was trying to ask if it was possible to tell GCC to use code if a parameter was constant, i.e. known at compile time.
Like:
#IfKnown (xxyy)
#else
#endif

But for my example this does not matter, I will change to rep stosb.

Thanks again.
Octocontrabass
Member
Posts: 5588
Joined: Mon Mar 25, 2013 7:01 pm

Re: How to get GCC to use code if compile-time value?

Post by Octocontrabass »

tsdnz wrote:I was trying to ask if it was possible to tell GCC to use code if a parameter was constant, i.e. known at compile time.
If GCC can inline the function, it can optimize it for constant inputs. I'm not sure if it can optimize a function like that if it can't inline the function.
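For what it's worth, GCC also has __builtin_constant_p, which is probably the closest thing to the "#IfKnown" idea. A minimal sketch follows, assuming the function gets inlined and optimization is enabled; the helper names are hypothetical, and std::memcpy is used only to keep the example short and hosted. Without optimization the builtin typically evaluates to 0 for a parameter, so the general path runs and the result is still correct.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fast path (hypothetical helper): count is a multiple of 8, so no tail bytes.
static inline void* fill_multiple_of_8(void* s, int c, std::size_t count)
{
	const std::uint64_t pattern =
		0x0101010101010101ULL * static_cast<unsigned char>(c);
	unsigned char* p = static_cast<unsigned char*>(s);
	for (std::size_t i = 0; i < count; i += 8)
		std::memcpy(p + i, &pattern, 8);	// 8-byte store, no alignment assumption
	return s;
}

// General path (hypothetical helper): any count, one byte at a time.
static inline void* fill_any(void* s, int c, std::size_t count)
{
	unsigned char* p = static_cast<unsigned char*>(s);
	for (std::size_t i = 0; i < count; ++i)
		p[i] = static_cast<unsigned char>(c);
	return s;
}

// Hypothetical wrapper: picks the fast path only when the compiler can prove,
// after inlining, that count is a constant multiple of 8.
__attribute__((always_inline))
static inline void* my_memset(void* s, int c, std::size_t count)
{
	if (__builtin_constant_p(count) && (count % 8) == 0)
		return fill_multiple_of_8(s, c, count);
	return fill_any(s, c, count);
}
```

A call such as my_memset(buf, 0, 80 * 25 * 2) can then collapse to the fast path with no run-time check, while a call with a run-time size keeps only the general path.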
tsdnz wrote:But for my example this does not matter, I will change to rep stosb.
On CPUs that don't support ERMSB, rep stosb is very slow.
tsdnz
Member
Posts: 333
Joined: Sun Jun 16, 2013 4:09 am

Re: How to get GCC to use code if compile-time value?

Post by tsdnz »

Thanks, ERMSB pointed me to some great information.
Examples, etc., great, thank you.

64-ia-32-architectures-optimization-manual.pdf
GENERAL OPTIMIZATION GUIDELINES
3.7.7 Enhanced REP MOVSB and STOSB operation (ERMSB)
Beginning with processors based on Intel microarchitecture code named Ivy Bridge,
REP string operation using MOVSB and STOSB can provide both flexible and high-performance
REP string operations for software in common situations like memory
copy and set operations. Processors that provide enhanced MOVSB/STOSB operations
are enumerated by the CPUID feature flag: CPUID:(EAX=7H,
ECX=0H):EBX.ERMSB[bit 9] = 1.
3.7.7.1 Memcpy Considerations
The interface for the standard library function memcpy introduces several factors
(e.g. length, alignment of the source buffer and destination) that interact with
microarchitecture to determine the performance characteristics of the implementation
of the library function. Two of the common approaches to implement memcpy
are driven from small code size vs. maximum throughput. The former generally uses
REP MOVSD+B (see Section 3.7.6), while the latter uses SIMD instruction sets and
has to deal with additional data alignment restrictions.
For processors supporting enhanced REP MOVSB/STOSB, implementing memcpy
with REP MOVSB will provide even more compact benefits in code size and better
throughput than using the combination of REP MOVSD+B. For processors based on
Intel microarchitecture code named Ivy Bridge, implementing memcpy using ERMSB
might not reach the same level of throughput as using 256-bit or 128-bit AVX alternatives,
depending on length and alignment factors.
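
The CPUID flag quoted above could be tested along these lines. This is only a sketch: it assumes a GCC or Clang toolchain whose <cpuid.h> provides __get_cpuid_count, and cpu_has_ermsb is a made-up name. On non-x86 targets it simply reports false.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Hypothetical helper: true if CPUID.(EAX=7H, ECX=0H):EBX bit 9 (ERMSB) is set.
static inline bool cpu_has_ermsb()
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
	// __get_cpuid_count returns 0 when leaf 7 is not supported by the CPU.
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ((ebx >> 9) & 1u) != 0;
#else
	return false;
#endif
}
```

A memset could run this check once at boot and then choose between a plain rep stosb and a rep stosq based routine.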