
16-bit segment:offset addressing with AT&T syntax (gas)

Posted: Wed Nov 12, 2008 4:39 pm
by JGDross
Hi there,

Sorry if this has been covered before - a search hasn't revealed anything...

Anyway, I'm using the GNU Assembler (2.18.39, if it makes any difference...) and am writing a basic bootloader. The code I'm writing at the moment is still in real mode, so (obviously...) I'm using 16-bit addressing. Instructions like "movw (%bx), %ax" work fine, but gas complains when I try and use 16-bit registers for more complicated addressing, such as:

Code:

movb  (%bx,%cx,1), %al
When I try this, it refuses to assemble and spits out:

Code:

boot.s:74: Error: `(%bx,%cx,1)' is not a valid base/index expression
Instead, it forces me to use the 32-bit registers (i.e. I can hack around it that way), so the following:

Code:

movb  (%ebx,%ecx,1), %al
works as expected. The start of my file looks like:

Code:

.file "boot.s"

.code16
.arch i386

.section .text

.globl _start
_start:
# actual code...


Can anyone help me out? Sorry if this is a total n00b question, but I just can't see how this makes sense!!!

Re: 16-bit segment:offset addressing with AT&T syntax (gas)

Posted: Wed Nov 12, 2008 5:02 pm
by Love4Boobies
I am only familiar with NASM and Intel assembly syntax. What exactly do you expect "movb (%bx,%cx,1), %al" to do?

Re: 16-bit segment:offset addressing with AT&T syntax (gas)

Posted: Wed Nov 12, 2008 5:48 pm
by JGDross
Love4Boobies wrote:I am only familiar with NASM and Intel assembly syntax. What exactly do you expect "movb (%bx,%cx,1), %al" to do?
I want it to move the value at the address (bx register + (cx register * 1)) into the al register.

Just for clarity, a gas-to-Intel translator gives that instruction as:

Code:

    mov   al, [bx+cx*1]
Which looks like the same thing to me...?

Re: 16-bit segment:offset addressing with AT&T syntax (gas)

Posted: Wed Nov 12, 2008 5:54 pm
by CodeCat
Indirect addressing is severely limited with 16-bit addressing, which is the default in real mode. Given the format disp(base, index, scale), the only registers you can use are BX, BP, SI and DI. Any one of the four can be used on its own; when you combine two, the base must be BX or BP and the index must be SI or DI. There is no scale, so it is always 1. This also explains why BX and BP are called 'base register' and 'base pointer', and SI and DI 'source index' and 'destination index'.

What this means is that only the following combinations are possible:

disp(%si)
disp(%di)
disp(%bx)
disp(%bp)
disp(%bx, %si)
disp(%bx, %di)
disp(%bp, %si)
disp(%bp, %di)
disp (no registers at all)

Another (obvious) requirement is that disp must fit in 16 bits. The disp part is optional, and so is the base: a bare disp with no registers is simply an absolute offset.
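
As a rough illustration of the rule, here is a small Python sketch that checks whether a register combination can be encoded with 16-bit addressing. The helper name `encodable_16bit` is made up for this example and has nothing to do with gas itself. It treats a pair as unordered, because the encoding only records which two registers are added, and it rejects any scale other than 1 because 16-bit addressing has no scale field.

```python
# Hypothetical helper for illustration only; not part of gas or any library.
BASES = {"bx", "bp"}      # registers usable as the base of a pair
INDEXES = {"si", "di"}    # registers usable as the index of a pair


def encodable_16bit(*regs, scale=1):
    """Return True if the given registers form a valid 16-bit address."""
    if scale != 1:
        return False                      # no scale field without a SIB byte
    if len(regs) == 0:
        return True                       # bare displacement
    if len(regs) == 1:
        return regs[0] in BASES | INDEXES
    if len(regs) == 2:
        a, b = regs
        # One register from each group, in either order.
        return (a in BASES and b in INDEXES) or (a in INDEXES and b in BASES)
    return False                          # at most two registers


for regs in [("bx", "cx"), ("bx", "si"), ("bp", "di"), ("si", "di")]:
    print(regs, encodable_16bit(*regs))
```

Under these assumptions `(%bx,%cx,1)` from the first post is rejected because CX is not one of the four usable registers, while `(%bx,%si)` passes.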

Re: 16-bit segment:offset addressing with AT&T syntax (gas)

Posted: Wed Nov 12, 2008 6:12 pm
by JGDross
If you're saying what I think you're saying, and you're correct, then the following code:

Code:

    movb  $0x0a, %gs:1(,%ecx,2)
surely shouldn't work in real mode with the %ecx register as the index. But it does exactly what you'd expect (I just cut it from my bootloader, and it definitely works). Or am I misunderstanding you?

Re: 16-bit segment:offset addressing with AT&T syntax (gas)

Posted: Wed Nov 12, 2008 6:19 pm
by CodeCat
It's possible that the assembler just emits an address-size prefix (0x67), which switches that one instruction from 16-bit to 32-bit addressing. That would let you use the protected mode addressing format for the instruction. It would also help explain how GAS can assemble GCC's 32 bit output as 16 bit code without complications.

However, another possibility could be that GAS simply doesn't know how to handle 'real' 16 bit stuff at all. It goes nuts when it sees that you're using a 16 bit register as a base, which I believe would not be allowed in 32 bit mode. Just guessing though.

Re: 16-bit segment:offset addressing with AT&T syntax (gas)

Posted: Wed Nov 12, 2008 7:22 pm
by CodeCat
Ok, I've looked into it further and I wasn't far off. Here's an explanation for those interested (possible wiki material?):

When encoding operands, normally the instruction itself decides what kind of data must follow it. In many cases, the instruction specifies that a byte called the ModR/M byte is used, which is used for both register and memory access (hence the R/M). This byte is split up into three fields:

MMSSSRRR (M = mod, R = r/m field, S = spare register)

Register codes for the r/m and spare register fields:
000 = al, ax, eax, mm0, xmm0
001 = cl, cx, ecx, mm1, xmm1
010 = dl, dx, edx, mm2, xmm2
011 = bl, bx, ebx, mm3, xmm3
100 = ah, sp, esp, mm4, xmm4
101 = ch, bp, ebp, mm5, xmm5
110 = dh, si, esi, mm6, xmm6
111 = bh, di, edi, mm7, xmm7

The use of the r/m field and the spare register depends on the instruction opcode itself. For example, one form of the MOV instruction uses the r/m field as the destination and the spare as the source, while another form uses the opposite. Which of the 5 in each row is used is also determined by the instruction opcode, together with the operand size in effect when choosing between ax and eax. So if the instruction says to use 8-bit registers, then the first of each row is used.

So far, all of this only applies when mod=11. If it is 00, 01 or 10, it specifies that memory addressing is used, and the r/m field encodes the addressing mode to use. The spare always indicates a register, never a memory addressing mode, which also explains why it's not possible to use a memory reference as both the source and destination operands. (I believe it's also a hardware limitation because the CPU can't do two fetches from memory at the same time.)
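
To make the field layout concrete, here is a minimal Python sketch that splits a ModR/M byte into its three fields and, for the mod=11 case, looks up the register names from the table above. The function names are invented for this example, and only the 8, 16 and 32-bit general-purpose columns are included.

```python
# Illustrative sketch; the names split_modrm and register_pair are made up.
REGS = {
    8:  ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
}


def split_modrm(byte):
    """Split MMSSSRRR into (mod, spare, rm)."""
    mod = (byte >> 6) & 0b11
    spare = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, spare, rm


def register_pair(byte, size):
    """For mod=11, return (spare register, r/m register) at the given size."""
    mod, spare, rm = split_modrm(byte)
    if mod != 0b11:
        raise ValueError("mod != 11: the r/m field names a memory operand")
    return REGS[size][spare], REGS[size][rm]


print(split_modrm(0xD8))        # 0xD8 = 11 011 000
print(register_pair(0xD8, 16))  # spare = bx, r/m = ax
```

For instance, opcode 0x89 takes the spare register as the source and the r/m field as the destination, so the bytes 89 D8 in 16-bit code are `movw %bx, %ax`.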

When mod is not 11, the following possibilities for the r/m field exist, for 32 bit and 16 bit mode respectively:

000 = [eax], [bx+si]
001 = [ecx], [bx+di]
010 = [edx], [bp+si]
011 = [ebx], [bp+di]
100 = SIB byte, [si]
101 = [ebp], [di]
110 = [esi], [bp]
111 = [edi], [bx]

The mod field then determines the size of an additional displacement field following the ModR/M byte. Mod=00 means no displacement field, mod=01 means a byte displacement, mod=10 means a native-size displacement (16 bits in 16 bit mode, 32 bits in 32 bit mode). A special exception is that when mod=00, then the codes for [ebp] and [bp] do not encode these registers, but instead specify that a native-size displacement should be used without a register at all. A consequence is that if [ebp] or [bp] are really needed, then a byte-size or native-size displacement must always be added, even if it's zero.
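
The table and the displacement rules can be put together in a short Python sketch. This is only an illustration under the rules described here: `decode_memory_operand` is a made-up name, the SIB byte is reported but not decoded, and the extra special cases that the SIB byte brings with it are left out.

```python
# Illustrative sketch; decode_memory_operand is a made-up name.
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]
RM32 = ["eax", "ecx", "edx", "ebx", "SIB", "ebp", "esi", "edi"]


def decode_memory_operand(modrm, addr_bits):
    """Return (register part, displacement size in bytes) for mod != 11."""
    if addr_bits not in (16, 32):
        raise ValueError("addr_bits must be 16 or 32")
    mod = (modrm >> 6) & 0b11
    rm = modrm & 0b111
    if mod == 0b11:
        raise ValueError("mod = 11 is a register operand, not memory")

    native = addr_bits // 8                       # 2 or 4 bytes
    table = RM16 if addr_bits == 16 else RM32
    bp_code = 0b110 if addr_bits == 16 else 0b101  # slot of [bp] / [ebp]

    # Special case: mod=00 with the [bp]/[ebp] code means "displacement only".
    if mod == 0b00 and rm == bp_code:
        return "disp only", native

    disp_bytes = {0b00: 0, 0b01: 1, 0b10: native}[mod]
    return table[rm], disp_bytes


print(decode_memory_operand(0x00, 16))  # [bx+si], no displacement
print(decode_memory_operand(0x06, 16))  # 16-bit absolute address
print(decode_memory_operand(0x46, 16))  # [bp + disp8]
```

The last line shows why a plain `(%bp)` costs an extra byte: it has to be encoded as mod=01 with a zero byte displacement.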

The SIB byte is an additional byte that encodes Scale, Index register and Base register, and is only available with 32-bit addressing. It is made up as follows:

SSIIIBBB (S = scale, I = index register, B = base register)

The registers are specified as listed above, and the scale is specified as a power of 2, so that scale=3 (11) means 2^3 = 8.
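
A minimal Python sketch of that layout follows. The name `decode_sib` is made up, and two details go beyond what is described above: index code 100 is treated as "no index" (ESP cannot be used as an index), and the case where base code 101 with mod=00 means a bare 32-bit displacement is not handled.

```python
# Illustrative sketch; decode_sib is a made-up name.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_sib(sib):
    """Split SSIIIBBB into (scale factor, index register, base register)."""
    scale = 1 << ((sib >> 6) & 0b11)      # 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    # Assumption beyond the post: index code 100 means "no index register".
    index_name = None if index == 0b100 else REG32[index]
    # Not handled here: base code 101 with mod=00 means disp32 with no base.
    return scale, index_name, REG32[base]


print(decode_sib(0x0B))  # 00 001 011: ebx + ecx*1
print(decode_sib(0xC8))  # 11 001 000: eax + ecx*8
```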

The SIB byte allows almost total freedom along with the existing combinations in 32 bit mode, but since it does not exist in 16 bit mode, only a limited number of combinations are possible there (in particular, no scale). However, an address-size prefix (0x67) can override the default address size, and this makes SIB bytes possible even when running in 16-bit mode. The catch is of course that this only works on a 386 or later, since the 286 did not have 32 bit mode at all and therefore also lacks the address-size prefix. You must also use 32 bit registers; SIB with 16-bit registers is not possible.
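
To tie this back to the instruction from the first post, here is a Python sketch that builds the bytes for `movb (%ebx,%ecx,1), %al`. It assumes opcode 0x8A (MOV of an r/m byte into an 8-bit register) and the address-size prefix 0x67; the function name is made up, and the sketch only covers the simple base + index*scale form with no displacement.

```python
# Illustrative sketch; encode_mov_r8_from_sib is a made-up name.
ADDRESS_SIZE_PREFIX = 0x67   # selects 32-bit addressing inside 16-bit code
MOV_R8_RM8 = 0x8A            # mov r8, r/m8
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_mov_r8_from_sib(dest8, base, index, scale, code_bits=16):
    """Encode a byte load from [base + index*scale] with no displacement."""
    if index == "esp":
        raise ValueError("esp cannot be an index register")
    if base == "ebp":
        raise ValueError("ebp as base needs a displacement; not handled here")
    modrm = (0b00 << 6) | (REG8[dest8] << 3) | 0b100   # r/m = 100: SIB follows
    sib = (SCALE_BITS[scale] << 6) | (REG32[index] << 3) | REG32[base]
    prefix = [ADDRESS_SIZE_PREFIX] if code_bits == 16 else []
    return bytes(prefix + [MOV_R8_RM8, modrm, sib])


print(encode_mov_r8_from_sib("al", "ebx", "ecx", 1).hex())                # 678a040b
print(encode_mov_r8_from_sib("al", "ebx", "ecx", 1, code_bits=32).hex())  # 8a040b
```

The only difference between the 16-bit and 32-bit code is the leading 0x67 byte, which is what lets the 32-bit registers work under `.code16`.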