OSDev.org

Posted: **Mon Mar 30, 2015 1:57 pm**

Hi,

I would like to analyze and understand the Linux kernel source code. I've downloaded the source code of the early Linux versions, and a lot of the code is more or less straightforward, but I got stuck trying to understand the pieces involving blocks of assembly instructions. One such piece is _set_gate (it seems to be one of the crucial kernel routines):

#define _set_gate(gate_addr,type,dpl,addr) \
__asm__ ("movw %%dx,%%ax\n\t" \
	"movw %0,%%dx\n\t" \
	"movl %%eax,%1\n\t" \
	"movl %%edx,%2" \
	: \
	: "i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
	"o" (*((char *) (gate_addr))), \
	"o" (*(4+(char *) (gate_addr))), \
	"d" ((char *) (addr)),"a" (0x00080000))

So could you explain what is going on here?
What I understand (not sure if correctly) from above:
movw transfers a word from a register to a register (I'm not sure which one is the source register and which one is the target).
movl transfers a dword from a register to a register (the same doubt as above).
Why something is transferred from dx to ax (or the reverse?) I have no idea. I don't understand the purpose of this operation, as I can't tell what is in either of these registers before the transfer is executed.
What % and %% mean, I also don't know. What %1 and %2 are, I have no idea at all.
And as to the fragment starting with : \ , I don't understand at all what is done there.
If you could make these points clearer, it would be much appreciated.

Posted: **Mon Mar 30, 2015 3:22 pm**

Hi,

michaelbarrett wrote:

#define _set_gate(gate_addr,type,dpl,addr) \
__asm__ ("movw %%dx,%%ax\n\t" \
	"movw %0,%%dx\n\t" \
	"movl %%eax,%1\n\t" \
	"movl %%edx,%2" \
	: \
	: "i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
	"o" (*((char *) (gate_addr))), \
	"o" (*(4+(char *) (gate_addr))), \
	"d" ((char *) (addr)),"a" (0x00080000))

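On entry, EAX = 0x00080000 and EDX = the address for the gate (that is, addr, the handler address that goes into the gate). The first instruction copies the lowest 16 bits of that address into AX, the lowest 16 bits of EAX, so EAX now holds 0x0008 in its upper half and the low half of the address in its lower half. The second instruction moves "(0x8000+(dpl<<13)+(type<<8))" into the lowest 16 bits of EDX, leaving the high half of the address in the upper 16 bits.

The next 2 instructions just store EAX in the first 4 bytes of the gate (at gate_addr) and EDX in the next 4 bytes (at gate_addr+4).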

To understand this, you probably need to understand the "messy" layout of a gate descriptor in the CPU's IDT. You can find the format for this in the Intel manual, or online in various places (e.g. here). You'll see how the 32-bit address/offset is split into 2 different 16-bit fields.
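To make the layout concrete, here is a minimal sketch of the same operation in plain C. It is not code from the kernel: the function name set_gate_c and the byte-array representation of the descriptor are assumptions made for illustration, and it relies on a little-endian host (as on x86) for the stored bytes to match what the macro writes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the kernel): fills one 8-byte gate
 * descriptor the way the _set_gate macro does, but in plain C. */
void set_gate_c(uint8_t gate[8], unsigned type, unsigned dpl, uint32_t handler)
{
    assert(type <= 15 && dpl <= 3);

    /* Low dword: selector 0x0008 in bits 31..16, handler bits 15..0 below it. */
    uint32_t low = 0x00080000u | (handler & 0x0000FFFFu);

    /* High dword: handler bits 31..16, then the present bit, DPL and type. */
    uint32_t high = (handler & 0xFFFF0000u)
                  | (0x8000u + (dpl << 13) + (type << 8));

    memcpy(gate, &low, 4);      /* corresponds to: movl %%eax,%1 */
    memcpy(gate + 4, &high, 4); /* corresponds to: movl %%edx,%2 */
}
```

For example, type 14 (a 32-bit interrupt gate) with dpl 0 and a handler at 0x12345678 gives the low dword 0x00085678 and the high dword 0x12348E00, which shows how the handler address ends up split across the two halves of the descriptor.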

Cheers,

Brendan

Posted: **Mon Mar 30, 2015 3:38 pm**

First of all, you have to be familiar with x86 assembly language, with how the processor works, and with what the various instructions do. The Intel Programmer's Manuals are a must-read.

Secondly, you need to read about inline assembly: http://www.ibiblio.org/gferg/ldp/GCC-In ... HOWTO.html

Also note that there are two syntaxes commonly used for x86 assembly language: Intel and AT&T. The example that you quote, like almost all (if not all) GCC inline assembly, uses the AT&T syntax. In this syntax, when an instruction takes two operands, the first is the source and the second is the destination. So:

mov %eax, %edx

means "copy the value currently in register eax to register edx".
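A small sketch of that operand order using GCC inline assembly follows. The function names copy_with_mov and copy_with_mov_demo are made up for illustration, and the plain C fallback is only there so the file still compiles on non-x86 targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: copies src into dst with a single AT&T-syntax mov.
 * In AT&T syntax the source comes first and the destination second. */
uint32_t copy_with_mov(uint32_t src)
{
    uint32_t dst = 0;
#if defined(__i386__) || defined(__x86_64__)
    /* %0 is the output operand (dst), %1 is the input operand (src);
     * the compiler picks a general-purpose register for each ("r"). */
    __asm__("movl %1, %0" : "=r"(dst) : "r"(src));
#else
    dst = src; /* portable fallback, same effect */
#endif
    return dst;
}

/* Usage: the value passed in comes back unchanged. */
void copy_with_mov_demo(void)
{
    assert(copy_with_mov(0x1234u) == 0x1234u);
}
```

Because the registers here are chosen by the compiler, the template refers to them as %0 and %1 instead of naming them directly.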

Posted: **Tue Mar 31, 2015 12:16 am**

Still, it's IMHO a bit strange that they use this hand-crafted assembly stuff instead of just doing this in pure C. Well, the Linux kernel has lots of hand-crafted optimizations and maybe this is actually faster than doing it in pure C, but it's really hard to see.

Posted: **Tue Mar 31, 2015 2:02 am**

XenOS wrote: Still, it's IMHO a bit strange that they use this hand-crafted assembly stuff instead of just doing this in pure C. Well, the Linux kernel has lots of hand-crafted optimizations and maybe this is actually faster than doing it in pure C, but it's really hard to see.

The clue is in the OP:

I've downloaded the source code of the early Linux versions

Posted: **Tue Mar 31, 2015 1:01 pm**

Ok, thanks to all of you, I'm starting to make sense out of it.
However, I'm still in need of help.

http://stackoverflow.com/questions/1474 ... e-assembly

"GCC inline assembly uses %0, %1, %2, etc. to refer to input and output operands. That means you need to use two %% for real registers."

So as far as I understand, the double percent signs are there in order to refer to the processor's registers (or to make sure that we refer to them, I am not sure about that). So it goes like this:

movw %%dx,%%ax\n\t - copy the word currently in the register dx to the register ax.
movw %0,%%dx\n\t - copy the word currently in the operand 0 to the register dx.
movl %%eax,%1\n\t - copy the dword currently in the register eax to the operand 1.
movl %%edx,%2 - copy the dword currently in the register edx to the operand 2.

I'm not yet sure what exactly these operands are about but I've found the following site:

http://locklessinc.com/articles/gcc_asm/

I haven't managed to go through the whole document but from what follows:

"A simple function using inline-assembly might look like:

static __attribute__((used)) int var1;
int func1(void)
{
    int out;
    asm("mov var1, %0" : "=r" (out));
    return out;
}

The above shows several features of gcc's interface. Firstly, the asm code is a compile-time C constant string. You can put anything you like within that string. GCC doesn't parse the assembly language itself. What it does do is use escape sequences (i.e. %0 in the above) to reference the interface described by the programmer. In this case %0 corresponds to the zeroth constraint, which in turn is described after the colon."

I conclude that the operands (%0, %1, %2) in the _set_gate macro are defined, in order, by what follows the colons. Operand zero has the value 0x8000+(dpl<<13)+(type<<8), which would fit Brendan's explanation. I am a little confused about what comes next: why does "o" appear twice? Is %2 the first "o", the second "o", or "d"?

And one more thing confuses me:

Brendan wrote:On entry, EAX = 0x00080000 and EDX = the address for the gate

Do you mean that this is the state before set_gate is called? How do you know that?

Posted: **Tue Mar 31, 2015 1:31 pm**

Hi,

michaelbarrett wrote: And one more thing confuses me:

Brendan wrote:On entry, EAX = 0x00080000 and EDX = the address for the gate
Do you mean that this is the state before set_gate is called? How do you know that?

The input operand list includes "a" (0x00080000), which basically means that EAX is an input operand whose value is the constant 0x00080000 (and it forces the compiler to make sure EAX is set to 0x00080000 before the assembly begins). In the same way, "d" ((char *) (addr)) forces EDX to hold addr.
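The other operands can be read the same way. Operands are numbered in the order they are listed, outputs first and then inputs. _set_gate has no outputs, so %0 is the "i" constant, %1 is the first "o" (the memory at gate_addr) and %2 is the second "o" (the memory at gate_addr+4); the "d" and "a" operands would be %3 and %4, but the template only ever names those registers directly. Below is a rough sketch of the first two instructions with register constraints. It is not the kernel's code: the function name gate_dwords is made up, the constant 0x8E00 assumes type 14 and dpl 0, and EAX and EDX are declared as read-write ("+a", "+d") because current GCC expects registers modified by the template to be declared as outputs, which the original macro does not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes the two dwords _set_gate would store,
 * assuming type 14 and dpl 0 (flags word 0x8E00). */
void gate_dwords(uint32_t handler, uint32_t *low, uint32_t *high)
{
    uint32_t eax = 0x00080000u; /* plays the role of "a" (0x00080000)     */
    uint32_t edx = handler;     /* plays the role of "d" ((char *)(addr)) */

    assert(low != 0 && high != 0);
#if defined(__i386__) || defined(__x86_64__)
    /* Operand numbering: %0 = eax, %1 = edx, %2 = the immediate. */
    __asm__("movw %%dx,%%ax\n\t" /* AX = low 16 bits of the handler */
            "movw %2,%%dx"       /* DX = flags word                 */
            : "+a"(eax), "+d"(edx)
            : "i"(0x8E00));
#else
    /* Portable fallback with the same effect, for non-x86 targets. */
    eax = (eax & 0xFFFF0000u) | (edx & 0x0000FFFFu);
    edx = (edx & 0xFFFF0000u) | 0x8E00u;
#endif
    *low = eax;  /* the macro stores this at gate_addr     */
    *high = edx; /* the macro stores this at gate_addr + 4 */
}
```

With a handler at 0x12345678 this yields 0x00085678 and 0x12348E00, the same two values the macro would write into the gate.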

Cheers,

Brendan

Low level routines (like set_gate) in Linux kernel
