Posted: Tue Dec 11, 2007 7:14 am
by exkor
http://www.wasm.ru/forum/viewtopic.php?id=20503
posts # 8,15,...
post # is at the top right corner of post header
bewing, you really need to get rid of those calls; I hope they are not far calls.
Far call instruction:
http://www.agner.org/optimize/ (10+ times slower, plus call itself is not a fast instruction)
Posted: Tue Dec 11, 2007 7:41 am
by JamesM
What a useful post! If only I spoke Russian...
Posted: Tue Dec 11, 2007 7:44 am
by exkor
Russian is not required; each piece of code is less than 40 bytes in size. But if you don't know assembly, there is probably nothing there for you.
And the 2nd link is extremely important. Everyone should read it.
http://www.asmcommunity.net/board/ : search for "hex" and you'll find lots of hex-related functions. Not every one of them is state of the art, but some are better than C/C++.
Posted: Tue Dec 11, 2007 9:17 am
by JamesM
exkor wrote:
Russian is not required; each piece of code is less than 40 bytes in size. But if you don't know assembly, there is probably nothing there for you.
And the 2nd link is extremely important. Everyone should read it.
http://www.asmcommunity.net/board/ : search for "hex" and you'll find lots of hex-related functions. Not every one of them is state of the art, but some are better than C/C++.
I do know assembly, I just found all the explanations being in an eastern bloc language a little confusing!!!
Posted: Tue Dec 11, 2007 2:37 pm
by bewing
exkor, they are near calls, of course -- there has been no good reason for anyone to ever use a far call for the last 10 years. But, you are right, I had never gotten around to optimizing this pair of routines. So, better code:
Code:
dhex_lng:
    ror ecx, 16                 ; swap the high & low words of ecx
dhex_s:
    push edx
    mov edx, ecx                ; save the input value
    movzx ecx, ch               ; convert high byte to 2 hex digits + 2 attrib bytes
    mov cx, [hex_lkup + ecx*2]  ; get 2 bytes from lookup table
    mov al, cl                  ; high digit
    ; Note: the high digit is supposed to be written 1st, so it ends up in cl
    stosw
    mov al, ch                  ; low digit
    stosw                       ; attrib was still in ah
    movzx ecx, dl               ; now the low byte of input ecx
    mov cx, [hex_lkup + ecx*2]  ; get 2 bytes from lookup table
    mov al, cl                  ; high digit
    stosw
    mov al, ch                  ; low digit
    stosw                       ; attrib still in ah
    mov ecx, edx                ; restore the input value
    pop edx
    ret
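For readers who do not read assembly, the table-lookup idea in the routine above can be sketched in C. This is only a sketch under assumptions: the layout of `hex_lkup` (256 entries, two ASCII digits per byte value, high digit first) is inferred from the comments, uppercase digits are a guess, and the names `hex_lkup_init`, `hex16` and `hex32` are made up for illustration. It writes plain characters to a buffer and leaves out the attribute bytes that `stosw` stores alongside each digit.

```c
#include <stdint.h>

/* Assumed stand-in for the hex_lkup table used by the assembly:
   one 2-character entry per byte value, high digit first. */
static char hex_lkup[256][2];

/* Fill the table once before the first conversion. */
static void hex_lkup_init(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        hex_lkup[i][0] = digits[i >> 4];    /* high nibble */
        hex_lkup[i][1] = digits[i & 0x0F];  /* low nibble */
    }
}

/* Write the 4 hex digits of a 16-bit value to out (no terminator).
   Returns a pointer just past the last digit written. */
static char *hex16(uint16_t value, char *out)
{
    const char *hi = hex_lkup[value >> 8];    /* digits of the high byte */
    const char *lo = hex_lkup[value & 0xFF];  /* digits of the low byte */
    *out++ = hi[0];
    *out++ = hi[1];
    *out++ = lo[0];
    *out++ = lo[1];
    return out;
}

/* Write the 8 hex digits of a 32-bit value: high word first, then low word. */
static char *hex32(uint32_t value, char *out)
{
    out = hex16((uint16_t)(value >> 16), out);
    return hex16((uint16_t)(value & 0xFFFF), out);
}
```

Call `hex_lkup_init()` once, then pass `hex32` a buffer with room for at least 8 characters. One table load per byte replaces the per-nibble compare-and-add, at the cost of a 512-byte table.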
Posted: Tue Dec 11, 2007 3:56 pm
by XCHG
I am not a C programmer but I did two macros that simulate ROL and ROR in C:
Code:
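/* Value must be an unsigned lvalue; Times must be greater than 0 and less than the bit width of Value */
#define ROL(Value, Times) ((Value) = (((Value) << (Times)) | ((Value) >> ((sizeof(Value) << 0x03) - (Times)))))
#define ROR(Value, Times) ((Value) = (((Value) >> (Times)) | ((Value) << ((sizeof(Value) << 0x03) - (Times)))))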
Posted: Wed Dec 12, 2007 2:36 am
by JamesM
XCHG wrote:
I am not a C programmer but I did two macros that simulate ROL and ROR in C:
Code:
/* Value must be an unsigned lvalue; Times must be greater than 0 and less than the bit width of Value */
#define ROL(Value, Times) ((Value) = (((Value) << (Times)) | ((Value) >> ((sizeof(Value) << 0x03) - (Times)))))
#define ROR(Value, Times) ((Value) = (((Value) >> (Times)) | ((Value) << ((sizeof(Value) << 0x03) - (Times)))))
A better version:
Code:
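/* GCC inline assembly for x86, assuming the default AT&T syntax:
   the rotate count must be in cl, and the destination operand comes last. */
int rol(int value, int times)
{
    asm volatile("roll %b1, %0" : "+r" (value) : "c" (times) : "cc");
    return value;
}
int ror(int value, int times)
{
    asm volatile("rorl %b1, %0" : "+r" (value) : "c" (times) : "cc");
    return value;
}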
Ph33r! And with code inlining the calls get taken out. Much faster than three shifts, an OR and a subtract!
Posted: Wed Dec 12, 2007 2:52 am
by XCHG
JamesM,
Of course that's better. But then again, that's not C. Oh, by the way, this is another hexadecimal printing function that I wrote today in VC++. With this one, you can specify the number of nibbles to print, and the value is adjusted so that only that many of its lowest nibbles are printed.
Code:
#include <stdio.h>

void WriteHex (unsigned int InValue, unsigned int NumberofNibblesToPrint = (sizeof(unsigned int) << 0x01),
               bool IncludeHexPrefix = true) {
    /* Exit the function if the number of nibbles to print is zero */
    if (NumberofNibblesToPrint == 0)
        return;
    /* Print the "0x" hexadecimal value prefix if requested */
    if (IncludeHexPrefix == true)
        printf("0x");
    /* If the number of nibbles requested to be printed is more than the number of nibbles available
       in the [InValue] parameter, then set the maximum number of nibbles to the number of nibbles
       available in this parameter */
    if (NumberofNibblesToPrint > (sizeof(InValue) << 0x01))
        NumberofNibblesToPrint = (sizeof(InValue) << 0x01);
    /* Now get the requested nibble and the nibbles to its right (if any) to the leftmost nibble of
       the value so that they can be easily extracted */
    InValue = InValue << ((sizeof(InValue) << 0x03) - (NumberofNibblesToPrint << 0x02));
    /* A character variable that holds one nibble at a time for printing */
    char CurrentChar;
    /* While there are still nibbles to be printed ... do these */
    while (NumberofNibblesToPrint-- > 0) {
        /* Get the leftmost nibble in the [CurrentChar] variable (assumes a 32-bit unsigned int) */
        CurrentChar = (char) ((InValue & 0xF0000000) >> 28) & 0x0F;
        /* If the nibble is more than 9, then it should be a character between 'A' to 'F' */
        if (CurrentChar > 9)
            CurrentChar += 7;
        /* Convert the current nibble to its equivalent character */
        CurrentChar += 48;
        /* Print the current nibble */
        printf("%c", CurrentChar);
        /* Get the next nibble to the leftmost nibble in the value */
        InValue = InValue << (sizeof(char) << 0x02);
    } /* while (NumberofNibblesToPrint-- > 0) { */
}
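The same nibble-count idea can also be written against a buffer, which makes the result easy to check. The following is a rough sketch in plain C, not XCHG's code: `format_hex` and its parameters are hypothetical names, it indexes a digit string instead of adding 7 and 48, and it shifts right by the nibble position so it does not depend on the width of `unsigned int`.

```c
#include <stddef.h>

/* Hypothetical helper: format the lowest `nibbles` hex digits of `value`
   into `out` as a NUL-terminated string, optionally prefixed with "0x".
   `out` must hold at least 2 + 2 * sizeof(unsigned int) + 1 characters.
   Returns the number of characters written, not counting the terminator. */
static size_t format_hex(unsigned int value, unsigned int nibbles,
                         int prefix, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    const unsigned int max_nibbles = (unsigned int)(sizeof(value) * 2);
    size_t n = 0;

    /* Nothing to print: produce an empty string, without the prefix */
    if (nibbles == 0) {
        out[0] = '\0';
        return 0;
    }
    /* Clamp the request to the nibbles actually available in value */
    if (nibbles > max_nibbles)
        nibbles = max_nibbles;
    if (prefix) {
        out[n++] = '0';
        out[n++] = 'x';
    }
    /* Emit from the most significant requested nibble down to nibble 0 */
    while (nibbles-- > 0)
        out[n++] = digits[(value >> (nibbles * 4)) & 0x0F];
    out[n] = '\0';
    return n;
}
```

For example, `format_hex(0xBEEF, 4, 1, buf)` leaves the string 0xBEEF in `buf`, and asking for 2 nibbles of 0x1234ABCD without the prefix leaves CD.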
Posted: Wed Dec 12, 2007 3:23 am
by JamesM
XCHG wrote:
JamesM,
Of course that's better. But then again, that's not C.
It doesn't matter what it's implemented in, it can be called from C and that's the main point. How do you think inb() and outb() are implemented normally?
Posted: Wed Dec 12, 2007 3:41 am
by os64dev
JamesM wrote:
XCHG wrote:
JamesM,
Of course that's better. But then again, that's not C.
It doesn't matter what it's implemented in, it can be called from C and that's the main point. How do you think inb() and outb() are implemented normally?
As memory-mapped I/O on a MIPS board?
I would use the C function (maybe a templated inline version) anytime, because at -O3 gcc translates it to a single rol or ror. And it would still be portable across platforms.
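A minimal sketch of what such a portable C function could look like, assuming a fixed 32-bit width. The names `rol32` and `ror32` are illustrative, not from any post in this thread. Masking the count keeps a rotate by 0 or by 32 well defined, which the plain shift-and-OR form is not, and compilers such as GCC typically recognise this pattern as a rotate when optimising.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by `times` bits (count taken modulo 32). */
static inline uint32_t rol32(uint32_t value, unsigned int times)
{
    times &= 31;
    /* (32 - times) & 31 is 0 when times is 0, so no shift by 32 occurs */
    return (value << times) | (value >> ((32 - times) & 31));
}

/* Rotate a 32-bit value right by `times` bits (count taken modulo 32). */
static inline uint32_t ror32(uint32_t value, unsigned int times)
{
    times &= 31;
    return (value >> times) | (value << ((32 - times) & 31));
}
```

Unlike the macros, these return the rotated value instead of assigning to their argument, so a call looks like `x = rol32(x, 8);`.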
Posted: Wed Dec 12, 2007 3:54 am
by JamesM
os64dev: After a quick test I discovered you're right. I had no idea GCC could optimise that well! I retract my statement.