
Posted: Tue Dec 11, 2007 7:14 am
by exkor
http://www.wasm.ru/forum/viewtopic.php?id=20503
posts # 8,15,...
post # is at the top right corner of post header

bewing, you really need to get rid of those calls, I hope they are not far calls.
far call instruction: http://www.agner.org/optimize/ (10+ times slower, plus the call itself is not a fast instruction)

Posted: Tue Dec 11, 2007 7:41 am
by JamesM
What a useful post! If only I spoke Russian...

Posted: Tue Dec 11, 2007 7:44 am
by exkor
Russian is not required, each piece of code is less than 40 bytes in size, but if you don't know assembly there is probably nothing there for you.
And the 2nd link is extremely important. Everyone should read it.

http://www.asmcommunity.net/board/ , search for "hex" and you'll find lots of hex-related functions; not all of them are state of the art, but some are better than C/C++

Posted: Tue Dec 11, 2007 9:17 am
by JamesM
exkor wrote:Russian is not required, each piece of code is less than 40 bytes in size, but if you don't know assembly there is probably nothing there for you.
And the 2nd link is extremely important. Everyone should read it.

http://www.asmcommunity.net/board/ , search for "hex" and you'll find lots of hex-related functions; not all of them are state of the art, but some are better than C/C++
I do know assembly, I just found all the explanations being in an eastern bloc language a little confusing!!! ;)

Posted: Tue Dec 11, 2007 2:37 pm
by bewing
exkor, they are near calls, of course -- there has been no good reason for anyone to ever use a far call for the last 10 years. But, you are right, I had never gotten around to optimizing this pair of routines. So, better code:

Code: Select all

dhex_lng:
	ror ecx, 16			; swap the high & low words of ecx
dhex_s:
	push edx
	mov edx, ecx
	movzx ecx, ch			; convert high byte to 2 hex digits + 2 attrib bytes
	mov cx, [hex_lkup + ecx*2]		; get 2 bytes from lookup table
	mov al, cl				; high digit
	; Note: the high digit is supposed to be written 1st, so the table stores it in the low byte (cl)
	stosw
	mov al, ch				; low digit
	stosw					; attrib was still in ah
	movzx ecx, dl			; now the low byte of input ecx
	mov cx, [hex_lkup + ecx*2]		; get 2 bytes from lookup table
	mov al, cl				; high digit
	stosw
	mov al, ch				; low digit
	stosw					; attrib still in ah
	mov ecx, edx
	pop edx
	ret
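
For readers who don't read x86 assembly, the same table-lookup idea can be sketched in C. This is only an illustration under assumptions: the contents of hex_lkup are not shown in the thread, so the table below (256 entries, two ASCII digits each, high digit first) is a guess at its layout, and init_hex_lkup and the C signature of dhex_s are made up for the example. It also assumes text-mode style cells with the character in the low byte and the attribute in the high byte.

```c
#include <stdint.h>

/* Hypothetical stand-in for hex_lkup: one entry per byte value,
   holding its two ASCII hex digits, high digit first. */
static char hex_lkup[256][2];

/* Fill the table once at startup (assumed; the original table is not shown). */
static void init_hex_lkup(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        hex_lkup[i][0] = digits[i >> 4];    /* high nibble */
        hex_lkup[i][1] = digits[i & 0x0F];  /* low nibble */
    }
}

/* Write the 4 hex digits of a 16-bit value as character/attribute cells,
   high byte first, and return the advanced destination pointer. */
static uint16_t *dhex_s(uint16_t *dst, uint16_t value, uint8_t attrib)
{
    const uint8_t bytes[2] = { (uint8_t)(value >> 8), (uint8_t)(value & 0xFF) };

    for (int b = 0; b < 2; b++) {
        for (int d = 0; d < 2; d++) {
            /* One table lookup per byte gives both digits; no arithmetic per digit. */
            *dst++ = (uint16_t)((attrib << 8) | (uint8_t)hex_lkup[bytes[b]][d]);
        }
    }
    return dst;
}
```

The trade-off is the usual one for lookup tables: 512 bytes of table in exchange for no compare or add per digit.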

Posted: Tue Dec 11, 2007 3:56 pm
by XCHG
I am not a C programmer but I did two macros that simulate ROL and ROR in C:

Code: Select all

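/* Valid for unsigned types with 0 < Times < bit width; Times == 0 would shift by the full width, which is undefined. */
#define ROL(Value, Times) ((Value) = (((Value) << (Times)) | ((Value) >> ((sizeof(Value) << 0x03) - (Times)))))
#define ROR(Value, Times) ((Value) = (((Value) >> (Times)) | ((Value) << ((sizeof(Value) << 0x03) - (Times)))))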

Posted: Wed Dec 12, 2007 2:36 am
by JamesM
XCHG wrote:I am not a C programmer but I did two macros that simulate ROL and ROR in C:

Code: Select all

#define ROL(Value, Times) ((Value) = (((Value) << (Times)) | ((Value) >> ((sizeof(Value) << 0x03) - (Times)))))
#define ROR(Value, Times) ((Value) = (((Value) >> (Times)) | ((Value) << ((sizeof(Value) << 0x03) - (Times)))))
A better version:

Code: Select all

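int rol(int value, int times)
{
  /* GCC's default AT&T syntax puts the count first; a variable count must be in %cl. */
  asm volatile("roll %%cl, %0" : "+r" (value) : "c" (times) : "cc");
  return value;
}
int ror(int value, int times)
{
  /* Same constraints as rol: value is read and written in place, count goes in %cl. */
  asm volatile("rorl %%cl, %0" : "+r" (value) : "c" (times) : "cc");
  return value;
}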
Ph33r! And with code inlining the calls get taken out. Much faster than three shifts, an OR and a subtract!

Posted: Wed Dec 12, 2007 2:52 am
by XCHG
JamesM,

Of course that's better. But then again, that's not C. Oh by the way, this is another Hexadecimal printing function that I wrote today in VC++. With this one, you can specify the number of nibbles that have to be printed and the value will be adjusted according to the number of nibbles that have to be printed.

Code: Select all

void WriteHex (unsigned int InValue, unsigned int NumberofNibblesToPrint = (sizeof(unsigned int) << 0x01), 
               bool IncludeHexPrefix = true) {
  
  /* Exit the function if the number of nibbles to print is zero */
  if (NumberofNibblesToPrint == 0)
    return;

  /* Print the "0x" hexadecimal value prefix if requested */
  if (IncludeHexPrefix == true)
    printf("0x");
  
  /* If the number of  nibbles requested to be printed is more than the number of nibbles available
     in the [InValue] parameter, then set the maximum number of nibbles to the number of nibbles
     available in this parameter */
  if (NumberofNibblesToPrint >  (sizeof(InValue) << 0x01))
    NumberofNibblesToPrint = (sizeof(InValue) << 0x01);

  /* Now get the requested nibble and the nibbles to its right (if any) to the leftmost nibble of
     the value so that they can be easily extracted */
  InValue = InValue << ((sizeof(InValue) << 0x03) - (NumberofNibblesToPrint << 0x02));
  
  /* Create a character buffer to take care of printing a nibble at a time */
  char CurrentChar;

  /* While there are still nibbles to be printed ... do these */
  while (NumberofNibblesToPrint-- > 0) {

    /* Get the leftmost nibble in the [CurrentChar] variable */
    CurrentChar = (char) ((InValue & 0xF0000000) >> 28) & 0x0F;

    /* If the nibble is more than 9, then it should be a character between 'A' to 'F' */
    if (CurrentChar > 9)
      CurrentChar += 7;

    /* Convert the current nibble to its equivalent character */
    CurrentChar += 48;
    
    /* Print the current nibble */
    printf("%c", CurrentChar);

    /* Get the next nibble to the leftmost nibble in the value */
    InValue = InValue << (sizeof(char) << 0x02);
  } /* while (NumberofNibblesToPrint-- > 0) { */

}
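
To check the nibble logic without going through printf, here is a rough plain-C variant that writes into a caller-supplied buffer instead. The name format_hex and its signature are made up for this sketch, and it uses a digit string rather than the "+7 / +48" arithmetic above; like WriteHex, it keeps the lowest requested nibbles and clamps the count to what fits in an unsigned int.

```c
#include <stddef.h>

/* Hypothetical buffer-writing variant of WriteHex: stores the lowest
   `nibbles` hex digits of `value` in `out` (NUL-terminated) and returns
   the number of digits written. `out` must have room for
   2 * sizeof(unsigned int) + 1 chars. */
static size_t format_hex(unsigned int value, size_t nibbles, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    const size_t max_nibbles = sizeof(value) * 2;
    size_t i;

    /* Clamp the request to the number of nibbles actually available. */
    if (nibbles > max_nibbles)
        nibbles = max_nibbles;

    for (i = 0; i < nibbles; i++) {
        /* Shift so that the wanted nibble lands in the low 4 bits. */
        size_t shift = (nibbles - 1 - i) * 4;
        out[i] = digits[(value >> shift) & 0x0F];
    }
    out[nibbles] = '\0';
    return nibbles;
}
```

Called as format_hex(0x1234ABCD, 3, buf), this leaves "BCD" in buf, which is what WriteHex would print for 3 nibbles with the prefix turned off.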

Posted: Wed Dec 12, 2007 3:23 am
by JamesM
XCHG wrote:JamesM,

Of course that's better. But then again, that's not C.
It doesn't matter what it's implemented in, it can be called from C and that's the main point. How do you think inb() and outb() are implemented normally? ;)

Posted: Wed Dec 12, 2007 3:41 am
by os64dev
JamesM wrote:
XCHG wrote:JamesM,

Of course that's better. But then again, that's not C.
It doesn't matter what it's implemented in, it can be called from C and that's the main point. How do you think inb() and outb() are implemented normally? ;)
As Memory Mapped IO on a MIPSboard?

I would use the C function (maybe a templated inline version) anytime, because at -O3 gcc translates it to a single rol or ror. And it would still be portable across platforms.
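
A minimal sketch of what such a C function could look like, assuming 32-bit values (rotl32 and rotr32 are names picked for this example, not anything from the thread). Masking the count keeps both shifts in the range 0..31, so a count of 0 does not shift by the full width. Whether this actually compiles down to a single rol/ror depends on the compiler and optimisation level, so check the generated assembly rather than taking it on faith.

```c
#include <stdint.h>

/* Rotate left by `count` bits; the count is taken modulo 32. */
static inline uint32_t rotl32(uint32_t value, unsigned int count)
{
    count &= 31;
    /* (32 - count) & 31 is 0 when count is 0, avoiding an undefined 32-bit shift. */
    return (value << count) | (value >> ((32 - count) & 31));
}

/* Rotate right by `count` bits; the count is taken modulo 32. */
static inline uint32_t rotr32(uint32_t value, unsigned int count)
{
    count &= 31;
    return (value >> count) | (value << ((32 - count) & 31));
}
```

Unlike the macros, these return the result instead of assigning to their argument, so they can be used inside expressions.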

Posted: Wed Dec 12, 2007 3:54 am
by JamesM
os64dev: After a quick test I discovered you're right. I had no idea GCC could optimise that well! I retract my statement.