Outstripping the GCC Inline Compiler: inb/outb functions.

Post by stonedzealot »

So, I got curious about how well GCC handles inline ASM, so I had it compile two simple inline ASM functions: outportb and inportb. They look like this in C:

Code: Select all

void outportb(unsigned port, unsigned val)
{
   __asm__ __volatile__("outb %b0,%w1"
      :
      : "a"(val), "d"(port));
}
and

Code: Select all

unsigned inportb(unsigned short port)
{
   unsigned char ret_val;

   __asm__ __volatile__("inb %1,%0"
      : "=a"(ret_val)
      : "d"(port));
   return ret_val;
}
Basic enough. Then I disassembled them into their GNU ASM counterparts and tried to translate them. First, outportb. The disassembled code is pretty straightforward, as is the NASM translation next to it:

Code: Select all

push %ebp                 push ebp
mov %esp,%ebp             mov ebp, esp
mov 0x8(%ebp),%edx        mov edx, [ebp + 8]
mov 0xc(%ebp),%eax        mov eax, [ebp + 12]
out %al,(%dx)             out dx, al
pop %ebp                  pop ebp
ret                       ret
The code for the inportb function is quite different though, and I'd like to ask some questions. Here's the disassembled code:

Code: Select all

push %ebp
mov %esp, %ebp
sub $0x4,%esp                 <--- (1)
mov 0x8(%ebp),%eax
mov %ax,0xfffffffe(%ebp)      <--- (2)
movzwl 0xfffffffe(%ebp),%edx
in (%dx),%al
mov %al,0xfffffffd(%ebp)      <--- (3)
movzbl 0xfffffffd(%ebp),%eax
leave
ret
(1) Flat out, why in the hell is this here? I could see if it was the addition of four to ESP, but not subtraction...

(2) This line and the next... don't they just simplify into movzx edx, ax (or xor edx, edx; mov dx, ax)?

(3) Same problem with the last one: doesn't this just expand al into eax (the same thing that would be achieved if you cleared eax right before the in instruction)?

Re:Outstripping the GCC Inline Compiler: inb/outb functions.

Post by Solar »

Answer (1): Stack grows downwards.

Re:Outstripping the GCC Inline Compiler: inb/outb functions.

Post by Pype.Clicker »

Well, from the code you obtained, I guess you did not turn the optimizer on... try recompiling with -O3 if you want GCC to output something smart.

(1.) The sub $0x4,%esp allocates stack space for the local variable ret_val.

(2.) As you have "%1" : "d"(port), GCC believes the *whole* value of edx must be prepared, so it zeroes the high part. Using "%w1" should fix this.

(3.) As you told GCC that the function returns an unsigned and al is just a byte, it goes through an additional movzbl, which zeroes the high part of eax. Declaring inportb as returning an unsigned char should fix this.
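
To make the offsets in the disassembly easier to read, here is a minimal sketch in C that models what the three puzzling instructions do. The helper names (signed_disp, zero_extend_byte, zero_extend_word) are hypothetical and exist only for illustration; they are not part of GCC or of the original code.

```c
#include <stdint.h>

/* Convert a displacement as printed by the disassembler (an unsigned
   32-bit pattern) into the signed offset that is actually applied.
   Hypothetical helper; avoids implementation-defined casts. */
static int32_t signed_disp(uint32_t raw)
{
    if (raw & 0x80000000u)
        return -(int32_t)(~raw) - 1;   /* two's-complement negative */
    return (int32_t)raw;
}

/* Model of movzbl: widen a byte to 32 bits, filling with zeros. */
static uint32_t zero_extend_byte(uint8_t al)
{
    return (uint32_t)al;
}

/* Model of movzwl: widen a 16-bit word to 32 bits, filling with zeros. */
static uint32_t zero_extend_word(uint16_t ax)
{
    return (uint32_t)ax;
}
```

Read this way, 0xfffffffe(%ebp) is -2(%ebp), where the unoptimized code keeps its copy of port, and 0xfffffffd(%ebp) is -3(%ebp), where it keeps ret_val. Both slots lie inside the 4 bytes reserved by sub $0x4,%esp, below the frame pointer, because the stack grows downwards.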

Re:Outstripping the GCC Inline Compiler: inb/outb functions.

Post by stonedzealot »

wow. look at that. All that wasted CPU time in a seemingly simple function. This is why I'm not doing this inline...it's dangerous (and a little weird). Anyway, thanks again Pype.

Solar: Yes...but your response still doesn't tell why it's allocating space...oh well. It's all handled now.

Re:Outstripping the GCC Inline Compiler: inb/outb functions.

Post by Pype.Clicker »

Well, if the purpose of your code was to have an inline function that just emits an "outb %al, %dx", looking at the code generated for the standalone function will not really help.

1. You must declare inb and outb as static inline, e.g. "static inline void outb(word port, byte val)".
2. You must enable the optimizer, so that inline functions are actually inlined.
3. You may use "Nd"(port) rather than "d"(port), which allows GCC to emit "outb %al, $0x20" as well when the port is a compile-time constant that fits in 8 bits.

Now, as Tim would say, it will not really speed up your code (because in/out have huuuge latencies compared to other instructions), but it can make your code smaller...
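
The three points above could be put together roughly as follows. This is only a sketch, assuming an x86 target and GCC (or a compatible compiler such as Clang); uint16_t and uint8_t stand in for the "word" and "byte" typedefs mentioned in the post, which are not defined anywhere in this thread, and the "memory" clobber is an extra precaution that nobody in the thread asked for.

```c
#include <stdint.h>

/* Write one byte to an I/O port. "Nd" lets the compiler use an
   immediate port number (0..255) when it is a compile-time constant,
   and falls back to %dx otherwise. */
static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ __volatile__("outb %b0, %w1"
        :
        : "a"(val), "Nd"(port)
        : "memory");
}

/* Read one byte from an I/O port. Returning uint8_t instead of
   unsigned avoids forcing a zero-extension inside the function. */
static inline uint8_t inb(uint16_t port)
{
    uint8_t ret_val;

    __asm__ __volatile__("inb %w1, %b0"
        : "=a"(ret_val)
        : "Nd"(port)
        : "memory");
    return ret_val;
}
```

With the optimizer enabled, a call such as outb(0x20, 0x20) should then collapse to a load of al and a single out instruction at the call site. Note that in and out are privileged: these functions are meant for kernel code (or a process that has been granted I/O permission), and calling them from an ordinary user-mode program will fault.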