C Language

vibhory2j

C Language

Post by vibhory2j »

hello,

can anybody give a guide or a reference on "Assembly in C Language" or "Linking C and Assembly"?

Thanks in advance.
cheers
kernel_journeyman

Re:C Language

Post by kernel_journeyman »

The basics are that, under the usual 32-bit x86 calling convention (cdecl), C pushes arguments on the stack from right to left, and return values are given in the accumulator (eax). Many toolchains also prepend an underscore to C symbol names (data, functions), so assembly language has to use the underscored name if you're not using ELF executables.

Let's say you have an assembly language function, add3, in a file called func.asm:

[BITS 32]
[SECTION .text]

global _add3      ; export the symbol so the linker can see it

_add3:
   enter 0, 0
   xor eax, eax   ; zero out accumulator
   mov eax, [ebp + 8]
   add eax, [ebp + 12]
   add eax, [ebp + 16]
   leave
   ret

And in file testfunc.c:

#include <stdio.h>

extern add3(int, int, int);

int main()
{
   int i, j, k;
   printf("Enter three integers: ");
   scanf("%i %i %i", &i, &j, &k);
   printf("Result is %i", add3(i, j, k));
}


(By the way, don't enter negative integers, or the result won't be what you expect.)

The C compiler automatically prepends underscores to C symbols, so while in the C file you refer to the function as add3, its real name is _add3. And to compile:

nasm -f your_exe_format -o func.obj func.asm
gcc -o testfunc.exe -O2 testfunc.c func.obj

On Windows of course object and executable files are COFF files and unlinked object files normally have the extension ".obj".

In the assembly language function, enter 0, 0 is shorthand for:

   push ebp
   mov ebp, esp

... which saves the calling function's stack base pointer by pushing it on the stack, and sets up its own stack base pointer by copying the stack pointer into the ebp register. The two zero arguments to enter mean, respectively, that no stack space is allocated for local variables and that the nesting level is 0. The nesting level refers to the number of stack frames you want the procedure to see.

leave is shorthand for:

   mov esp, ebp   ; destroy our stack frame, put back caller's
   pop ebp      ; restore caller's base pointer

The [ebp + 8] refers to the first argument, which corresponds to "i" in the C function when called as add3(i, j, k). [ebp + 12] is the second argument and [ebp + 16] is the third. Integers on 32-bit platforms are 32 bits (4 bytes) in size, which is why the arguments sit at offsets from the base pointer that are multiples of 4. Note that the first two doublewords at the base pointer are not arguments: [ebp] holds the caller's base pointer (saved by enter) and [ebp + 4] holds the return address, which ret uses when you return from the function.
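To make those offsets concrete, here is a rough model in plain C rather than real assembly. It treats the frame as an array of 32-bit words, with index 0 being the word ebp points at. The names frame, FIRST_ARG and add3_from_frame are made up for this sketch, and it assumes the 32-bit cdecl layout discussed in this post.

```c
#include <stdint.h>

/* Hypothetical model of add3's stack frame (32-bit cdecl assumed).
   Index 0 is the doubleword that ebp points at. */
enum {
    SAVED_EBP   = 0,  /* [ebp + 0]  caller's base pointer */
    RETURN_ADDR = 1,  /* [ebp + 4]  return address used by ret */
    FIRST_ARG   = 2   /* [ebp + 8]  first argument */
};

/* Mirrors the load and the two adds in func.asm. */
int32_t add3_from_frame(const int32_t *frame)
{
    int32_t eax = frame[FIRST_ARG];   /* mov eax, [ebp + 8]  */
    eax += frame[FIRST_ARG + 1];      /* add eax, [ebp + 12] */
    eax += frame[FIRST_ARG + 2];      /* add eax, [ebp + 16] */
    return eax;                       /* result is left in eax */
}
```

Each array index is one doubleword, so index n corresponds to [ebp + 4*n]; the two slots before FIRST_ARG are the ones the function must skip over.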

For inline assembler, most compilers do not parse the assembler and simply emit it in their own assembler verbatim, and assume that you know what you're doing. Hence you must preserve registers, because the compiler doesn't check. For example:

int func(int num1)
{
   int num2 = 10;
   __asm {
      push eax
      mov eax, _num1
      add eax, _num2
      mov _num1, eax
      pop eax
   }
   return num1;
}

Note again the underscores when referring to C symbol names. You can call a C function from within assembler thus:

   /* printf(message); */
   push _message
   call _printf
   add esp, 4

The "add esp, 4" undoes the pushing of the argument on the stack (one argument, a pointer, and thus 4 bytes = 32 bits.) You could "pop something" but we don't want to put it in a register so we discard it by simply readjusting the stack pointer to put it back where it was before we called the function.

Check out your compiler's manual for the exact syntax of inline assembler. Most compilers accept

_asm mov eax, _something
or
__asm mov eax, _something

for one-liners, otherwise curly brackets or parentheses like this:

__asm {

}

__asm (

)

(one or two underscores for the asm keyword, or maybe no underscores.)

Hope this helps.
kernel_journeyman

Re:C Language

Post by kernel_journeyman »

Correction:

extern add3(int, int, int);

should be:

extern int add3(int, int, int);

The accumulator for returning values in assembler is, of course, the eax register.

Also:

int func(int num1)
{
int num2 = 10;
__asm {
push eax
mov eax, _num1
add eax, _num2
mov _num1, eax
pop eax
}
return num1;
}

Is a really bad example, sorry about that. (Can you spot the error?)

Answer: arguments are passed by value. The assembler is attempting to add num2 to num1 and return num1. num1 won't be modifying the original num1, and it trashes the return value by restoring eax. A better example:

int func(int num1)
{
int num2 = 10;
__asm {
mov eax, _num1
add eax, _num2
}
/* return statement redundant, return value already in eax */
}

Because eax is used to return values, functions do not have to preserve the eax register (the caller expects it to be used for the return value). So only other registers need to be preserved.
kernel_journeyman

Re:C Language

Post by kernel_journeyman »

Also, when doing assembler inside C functions, the compiler generates the necessary boilerplate code:

   push ebp
   mov ebp, esp

   and

   mov esp, ebp
   pop ebp
   ret

And hence you don't need this in your assembler inside a C function:

int func() {
   /* No boilerplate needed in asm below */
   __asm {

   }
}
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:C Language

Post by Candy »

kernel_journeyman wrote: int func(int num1)
{
int num2 = 10;
__asm {
push eax
mov eax, _num1
add eax, _num2
mov _num1, eax
pop eax
}
return num1;
}

Is a really bad example, sorry about that. (Can you spot the error?)

Answer: arguments are passed by value. The assembler is attempting to add num2 to num1 and return num1. num1 won't be modifying the original num1, and it trashes the return value by restoring eax.
Did you try this code?

On most common computers (X86) the parameters are passed on the stack, not in registers, and most certainly not in eax. Saving eax is useless, but not wrong. Restoring it doesn't make a difference, you just restore a register with an unknown value which the compiler then overwrites with the num1 value again...


PS: the above posts assume you do not use GCC. GCC doesn't allow this simple inline assembly, but requires you to specify what it does (wrecks, gets, puts back). It then copies the contents verbatim (bit-for-bit) and assumes what you told is the truth. It also uses AT&T syntax, whereas the above is Intel-style.
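For comparison, here is a rough sketch of what the same add-10 function could look like in GCC's extended inline assembly. The name func_gcc is made up for this example, and it assumes GCC or Clang; the asm path is only compiled on x86 targets, with a plain C fallback elsewhere.

```c
/* Sketch only: assumes GCC or Clang. func_gcc is a hypothetical name. */
int func_gcc(int num1)
{
    int num2 = 10;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0"   /* AT&T order: source, destination */
             : "+r"(num1)    /* %0: read and written, any register */
             : "r"(num2)     /* %1: input only, any register */
             : "cc");        /* addl changes the flags */
#else
    num1 += num2;            /* portable fallback for other targets */
#endif
    return num1;
}
```

There is no push/pop of eax and no underscore on the variable names: the operand lists tell the compiler which C variables are read and written, and it picks the registers itself.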
kernel_journeyman

Re:C Language

Post by kernel_journeyman »

On most common computers (X86) the parameters are passed on the stack, not in registers, and most certainly not in eax. Saving eax is useless, but not wrong. Restoring it doesn't make a difference, you just restore a register with an unknown value which the compiler then overwrites with the num1 value again...
I think I mentioned that parameters are passed on the stack. But yes, I should have pointed out that this inline assembler is not for gcc. It would work with Borland C++ or Visual C++ however. Also, the __fastcall calling style passes parameters in registers, not on the stack (Borland, maybe called something else for Visual C++.)

The above code is wrong, not merely useless. And I think I also said that the return value is in eax, not a parameter. The corrected version fixes the problem of the return value in eax being overwritten.
Candy

Re:C Language

Post by Candy »

kernel_journeyman wrote: I think I mentioned that parameters are passed on the stack. But yes, I should have pointed out that this inline assembler is not for gcc. It would work with Borland C++ or Visual C++ however. Also, the __fastcall calling style passes parameters in registers, not on the stack (Borland, maybe called something else for Visual C++.)
fastcall uses only ecx and edx for the first two parameters and uses the stack for the rest, just like usual.
The above code is wrong, not merely useless. And I think I also said that the return value is in eax, not a parameter. The corrected version fixes the problem of the return value in eax being overwritten.
The return statement comes after the pop eax, so it's not overwritten. It's just a little ... excessive. You save the old value (not used), then do calculations, then store your result in _num1 (probably at ebp+8), then restore the old value, then the compiler inserts another mov eax, _num1. It does work, point one. The second version doesn't put the result in num1, so the compiler-generated return code will make the function malfunction. I don't have a Windows compiler handy to try it out with.

PS: all unix compilers also leave out that excessive & confusing _.
vibhory2j

Re:C Language

Post by vibhory2j »

hi ,

thanks for your replies, but I am new to all this. I just want a tutorial or a reference link that could give me basic knowledge of using assembly in C or linking C and assembly.

cheers
kernel_journeyman

Re:C Language

Post by kernel_journeyman »

vibhory2j wrote: hi ,

thanks for your replies, but I am new to all this. I just want a tutorial or a reference link that could give me basic knowledge of using assembly in C or linking C and assembly.

cheers
How about this:

gcc -o exename cfile.c asmfile.S

There. C and assembler linking. ;D
Candy

Re:C Language

Post by Candy »

quick connection between assembly and C:

This overview assumes that you use a form of C compiler not prepending underscores and that spits out ELFs by default (say, unix-gcc)

assembly file:

GLOBAL something
something:
mov eax, 01h
ret

C file:

#include <stdio.h>

int something(void);   /* defined in the assembly file */

int main(void) {
    printf("%d\n", something());
    return 0;
}


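compile using (the final link goes through cc rather than bare ld, so that the C library and startup code needed by printf and main are pulled in; on a 64-bit host, nasm -f elf produces 32-bit objects, so add -m32 to both cc commands):

cc -c -o file.o file.c
nasm -f elf -o asmfile.o asmfile.asm
cc -o test file.o asmfile.o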


That's the end of the simple overview. This idea can of course be extended in any direction, but every extension makes it more complex, often without good reason.
Schol-R-LEA

Re:C Language

Post by Schol-R-LEA »

Perhaps we need to explore the question a bit more.
  • How well do you know C, and how well do you know assembly language? Do you need any help on the languages themselves?
  • What is your overall goal? Do you have a particular program that you are writing, or is this for general programming knowledge? If you have a particular program, why does it require you to interface C and assembly, and are there any alternatives?
  • What combination of platform, OS, compiler, assembler, and linker are you using?
  • Which of these are you specifically trying to do:
    • use inline assembly in C
    • call an assembly language routine in C
    • call a C function in assembly
    • write an assembly-language glue function to interface C with another high-level language
  • Do you know if it requires a special-case calling convention (i.e., fastcall)?
vibhory2j

Re:C Language

Post by vibhory2j »

thanks for your reply,

*it has been more than 3 years that I have been doing C/C++, and I am learning assembly language and will soon be good at it. No, I don't think I need much help with these.

*my overall goal? I want to develop a small operating system... that's it.
*what combination of platform, OS, compiler, assembler...
I have both Linux and Windows, gcc, and NASM.

*Which of these are you specifically trying to do:
well, I want to do all of these.

*do you know if it requires a special-case calling convention?
can't say anything about this...
Schol-R-LEA

Re:C Language

Post by Schol-R-LEA »

OK, that gives us something to work with. From this I'll assume GCC and GAS for inline assembly (NASM isn't an option with gcc inline, I'm afraid), and base GCC parameter passing and NASM for the externally linked code, with no OS-specific factors (at least not until you define any).

Here are some references to start with:

GAS assembly syntax
OS-FAQ: How At&t Syntax Differs From Nasm
Linux Assembly HOWTO

inline assembly
OS_FAQ: Inline Assembly in GCC
GCC-Inline-Assembly-HOWTO
Linux Assembly Language HOWTO chapter 2
DJGPP Inline Assembly HOWTO - while it is mostly for DJGPP, it should mostly apply to gcc in general
Inline assembly for x86 in Linux - again, most of this applies to gcc inline code for the x86 in general

gcc calling conventions

DJGPP FAQ: GCC calling conventions
Linux Assembly Language HOWTO chapter 5
NASM ASSEMBLER & COMPILE WITH GCC (PowerPoint; see here for the HTML version)

I would admonish you to work on your research-fu. Not only were two of these issues addressed in the OS FAQ (I was surprised to find that the GCC calling convention wasn't covered, and will see to redressing that oversight RSN), but all of the external links can be trivially found with a few Google searches. You may want to search the message board archives as well, as there are several places where this is discussed. Furthermore, your original question was poorly worded and your response to the answers petulant and unhelpful. The sages have spoken extensively on this subject, especially Master Esr of the Linux school; for your own sake and that of the forum, please meditate upon the wisdom of their words. ;)
Schol-R-LEA

Re:C Language

Post by Schol-R-LEA »

And in case those aren't enough, here's a review:

For inline assembly code, the first thing to keep in mind is that, by default, gcc expects what is called 'AT&T syntax', which is quite different from the 'Intel syntax' that NASM and most other x86 assemblers use. The most important differences here are that all arguments are "source, destination" instead of "destination, source" as in the Intel syntax, that register names are prefixed with a percent sign, and that the indirection syntax is rather different; see the OS-FAQ for more details.

For inline assembly, gcc supports two forms. The basic form is

asm("<one line of GAS assembly code here>"
    "<another line of assembly code>"
    ... );
for example:

asm("movl %ecx, %eax");

 __asm__ ("movl %eax, %ebx\n\t"
          "movl $56, %esi\n\t"
          "movl %ecx, label(%edx,%ebx,4)\n\t"
          "movb %ah, (%ebx)");
(examples taken from GCC-Inline-Assembly-HOWTO)

The '__asm__' version is an alternate form to avoid naming conflicts.

GCC also has an extended form which can pass arguments to the assembly code,

asm ("<one or more instructions with parameters>"
         : "<optional list of output parameters>"
         : "<optional list of input parameters>"
         : "<optional list of clobbered registers>");
Again, an example would be

       int a=10, b;
        asm ("movl %1, %%eax\n\t"
             "movl %%eax, %0"
             :"=r"(b)        /* output */
             :"r"(a)         /* input */
             :"%eax"         /* clobbered register */
             );
in which the '"=r"(b)' means 'replace %0 with any available register, and put its final value into the variable b', while the '"r"(a)' says, 'replace %1 with any available register, and initialize it with the value of the variable a'. The last line says, 'I am going to use the %eax register, so save it if you're going to need its current value later on'. Note also that the register names are prefixed by two percent signs, to differentiate them from the parameters.
Schol-R-LEA

Re:C Language

Post by Schol-R-LEA »

For both calling C from assembly and calling assembly from C, the main issue is that the assembly code has to honor the C compiler's sacred registers, and use the compiler's parameter passing rules. In gcc, the sacred registers are EBX, ESI, EDI, EBP, and the segment registers DS, ES, and SS. If you use them, you must save them first and restore them afterwards. Conversely, EAX and EDX are used for return values, and thus are assumed to be 'profane'. The other registers do not have to be saved, but usually should be.

GCC passes parameters to a function on the stack, pushing them in reverse order from their order in the argument list; thus, if a function has a C prototype of

int foo(int bar, char baz, double quux);

then the value of quux (a 64-bit FP value) is pushed first, as two doublewords, with the low doubleword ending up at the lower address; the value of baz is pushed next, as the low byte of a doubleword; and finally bar is pushed as a doubleword. Note that the stack operations in protected mode always operate on doubleword (32-bit) values, even if the actual value is less than a full doubleword.

The first thing any assembly function that is to be called by a C function must do is push the value of EBP (the frame pointer of the calling function), then copy the value of ESP to EBP. This sets the frame pointer, which is used to track both the arguments and (in C, or in any properly reentrant assembly code) the local variables. To access the arguments, you use EBP plus an offset equal to 4 * (the zero-based position of the argument in the argument list + 2), as long as every earlier argument occupies a single doubleword. The +2 is an added offset for the saved EBP and the caller's return pointer. Thus, to move 'bar' into EAX and 'baz' into BL (saving EBX first, since it is one of the sacred registers), you would write (in NASM):

mov eax, [ebp + 8]  ; bar
mov bl, [ebp + 12] ; baz
As stated earlier, return values in GCC are passed using EAX and EDX: integer values of up to 32 bits come back in EAX, and 64-bit integer values in the EDX:EAX pair. If a value exceeds 64 bits, it must be passed as a pointer.
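The offset rule can also be written down as a small C helper, which may be handy for checking offsets by hand when arguments are not all one doubleword wide. This is only a sketch: cdecl_arg_offset is a made-up name, and it assumes the 32-bit layout described here (8 bytes for the saved EBP and return address, each argument rounded up to a whole doubleword).

```c
#include <stddef.h>

/* Hypothetical helper: offset from EBP of argument number `index`
   (0-based), given the size in bytes of each argument in order.
   Assumes 32-bit cdecl as described in the post. */
size_t cdecl_arg_offset(const size_t *sizes, size_t index)
{
    size_t offset = 8;                     /* saved EBP + return address */
    for (size_t i = 0; i < index; i++)
        offset += (sizes[i] + 3) / 4 * 4;  /* round up to a multiple of 4 */
    return offset;
}
```

For foo(int bar, char baz, double quux), with sizes taken as 4, 1 and 8 bytes, this gives 8 for bar, 12 for baz and 16 for quux, matching the [ebp + 8] and [ebp + 12] used above.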

As with any case where you link separate modules, you must use the appropriate 'global' and 'external' ('extern' in C) declarations and function prototypes in order for the linker to join the modules properly.