Note also that most ISAs' immediate addressing modes can only load a value that fits into a single general register, and the limit is often significantly smaller. On 32-bit MIPS, for example, immediate values cannot be larger than two bytes (half of one 32-bit instruction), because the immediate has to fit into the same instruction as the opcode proper. The usual workaround on MIPS, where 'load immediate' is actually a pseudo-instruction for 'OR immediate against the Zero register', is to load the upper half of a larger value shifted left by 16 bits (lui does both in one step) and then OR in the rest of the large immediate value; the pseudo-instruction handling in the assembler is smart enough to emit this whenever 'li' is used with a larger value. I believe it is similar on ARM, which optimizes it a little: data-processing instructions take an optional shift on the second operand, which means that the load and the shift can be done in one instruction.
Just a note: I don't think ARM is better here. MIPS manages to encode a 16-bit immediate directly in one instruction without stupid trickery, at the same 4-byte instruction length, while still having twice as many registers! What ARM does is just pfff, it's brain damaging. It has only 12 bits for the encoded value and rotation (8 + 4). All those rotations mean it can only load a subset of values (with redundancy in the encoding), and this makes assembly programming less enjoyable than one might expect. So on ARM you often still need a movw/movt pair (or literal pool trickery, which is even more "fun") to load immediates and addresses. The interesting thing is that you only find out which one you need either by working out the rotations by hand every time or by wading through the assembler's complaints.
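To make the "8 + 4" point concrete, here is a small Python sketch (my own illustration, not anything from the ARM tools) that checks whether a 32-bit constant fits the classic A32 data-processing immediate, i.e. an 8-bit value rotated right by an even amount. The function name arm_modified_immediate is made up for this example.

```python
def arm_modified_immediate(value):
    """Return (rot, imm8) if value fits the A32 data-processing immediate, else None.

    The encoded constant is imm8 rotated right by 2*rot, with rot in 0..15.
    Hypothetical helper for illustration only.
    """
    value &= 0xFFFFFFFF
    for rot in range(16):
        shift = 2 * rot
        # Undo the rotate-right by rotating left by the same amount.
        imm8 = ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF
        if imm8 <= 0xFF:
            return rot, imm8
    return None


print(arm_modified_immediate(0xFF000000))  # encodable: 0xFF rotated right by 8
print(arm_modified_immediate(0x00000101))  # None: set bits are more than 8 apart
print(arm_modified_immediate(0x000001FE))  # None: would need an odd rotation
```

Anything that comes back as None is a constant you cannot put directly into a MOV and have to build some other way.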
Next, ARM has a shorter offset field, which results in an even more brain-damaging thing: those infamous "literal pools". I was amazed how much easier all of this is on MIPS compared to ARM, even though the two are almost twins.
Here is what I mean about offsets. Suppose you have a symbol gMySymbol somewhere and you want to read from it into a register. The same applies to loading immediates into a register. On MIPS you just do:
/* for loading an address, for further ordinary variable manipulation */
la $t0, gMySymbol /* this might result in just one instruction if the address of gMySymbol is 64 KiB aligned, i.e. its low 16 bits are zero */
lw $t1, 0($t0)
/* for loading an immediate */
li $t0, MY_IMMEDIATE
That's the best; it's all you need on a load/store RISC architecture. You have a chance to load a 32-bit address in a single instruction if you are lucky enough to have its low 16 bits zero, or an immediate if it fits into 16 bits (taking possible sign extension into account, so, for example, -1 fits into 16 bits). Otherwise it takes two instructions at most. The la pseudo-instruction expands into either a single lui $t0, %hi(gMySymbol)
or
lui $t0, %hi(gMySymbol)
ori $t0, $t0, %lo(gMySymbol)
li differs only in its treatment of sign extension: since it is intended for immediates, it can expand into sign-aware instructions where possible, to minimize the number of instructions needed.
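As a rough sketch of how such an expansion could be chosen, here is a little Python model of li. This is only my assumption of a sensible selection order (addiu for sign-extended 16-bit values, ori for zero-extended ones, a lone lui when the low half is zero, lui/ori otherwise); a real assembler may pick differently, and expand_li is a made-up name.

```python
def expand_li(reg, value):
    """Expand 'li reg, value' into real MIPS32 instructions (illustrative only)."""
    v = value & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed number.
    signed = v - (1 << 32) if v & 0x80000000 else v
    if -0x8000 <= signed <= 0x7FFF:
        # Fits a sign-extended 16-bit immediate, e.g. -1.
        return [f"addiu {reg}, $zero, {signed}"]
    if v <= 0xFFFF:
        # Fits a zero-extended 16-bit immediate.
        return [f"ori {reg}, $zero, 0x{v:x}"]
    hi, lo = v >> 16, v & 0xFFFF
    if lo == 0:
        # lui zeroes the low half, so one instruction is enough.
        return [f"lui {reg}, 0x{hi:x}"]
    return [f"lui {reg}, 0x{hi:x}", f"ori {reg}, {reg}, 0x{lo:x}"]


for value in (-1, 0xFFFF, 0x12340000, 0x12345678):
    print(expand_li("$t0", value))
```

Either way the worst case is two instructions, and none of them touches memory.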
But on ARM, with its PC-relative addressing and very limited offsets, you get something like the following. First, your symbol obviously lives somewhere outside your code section, since it's data. So at the end of your code section (or not even at the end), you or your compiler put an indirect pointer to your symbol into a "literal pool". This is a distinguishing ARM "feature" that not only conceptually messes things up by adding a level of indirection but, as an interesting side effect, can also mislead the CPU's branch predictor, because the predictor thinks the pool contains instructions when it does not. Obviously, it's not that easy to follow ARM's recommendation not to make the data in a literal pool "look like jumps".
So you put this:
LITERAL_POOL_ITEM: .long gMySymbol @ that's right, a local, near pointer to gMySymbol, for the cpu to reach it with the limited pc-addressing
and then do:
LDR r0, [pc, #(LITERAL_POOL_ITEM - . - 8)] @ the arithmetic inside is yet another fun bit of ARM
LDR r1, [r0]
The immediate here, which ARM calls a "label" even though it is an offset from the current instruction to the literal pool label, is limited to 12 bits plus an add/subtract bit. So the symbol can be placed only within ±4095 bytes of the current location (measured from the PC value, which reads as the instruction address plus 8), hence the need for a literal pool. Indeed, the size limit is not even the real problem; the whole idea is. There is no way to know where the symbol will end up in the resulting image, because it lives in a different section. With this approach you have to put indirection pointers in the code section, as that is the only way to know the offset (offsets to symbols in other sections are resolved only at link time and most probably will not fit into the 12-bit limit).
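For the arithmetic in that LDR, a minimal Python sketch of what the assembler has to compute might look like this. It assumes ARM (A32) state, where reading pc gives the instruction address plus 8, and a 12-bit offset magnitude with a separate add/subtract bit; the function name and the addresses are made up for illustration.

```python
def ldr_literal_offset(insn_addr, literal_addr):
    """Return the #offset for 'LDR rX, [pc, #offset]' in ARM (A32) state."""
    pc = insn_addr + 8  # pc reads two instructions ahead in ARM state
    offset = literal_addr - pc
    # 12-bit magnitude plus an add/subtract bit gives -4095..+4095.
    if not -4095 <= offset <= 4095:
        raise ValueError(f"literal pool entry out of range: {offset}")
    return offset


# LDR at 0x8000, pool entry four instructions further on (hypothetical addresses).
print(ldr_literal_offset(0x8000, 0x8010))
```

That is the same "LITERAL_POOL_ITEM - . - 8" expression as above, just with the range check spelled out.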
With immediates, you end up with one instruction only if the immediate can be encoded with rotations. Otherwise you need either a movw/movt pair or literal pool indirection.
To me, the ARM approach sucks compared to the elegant MIPS one.
Of course, it is possible to recreate the MIPS-like behavior on ARM manually: in this example you just use a movw/movt pair to load the address of the variable you want to reach (or an immediate). But that's you doing it manually; what the compiler does is up to it. And judging by the ARM documentation, they think placing a literal pool item is the better choice. They have an LDR pseudo-instruction (yes, LDR can be both a pseudo-instruction and an instruction) that deals with all this hassle for the ordinary assembly writer (above, I did it manually to show how it works; there, LDR is an instruction), and this is what they write about its preferences:
armasm doc wrote:
When using the LDR pseudo-instruction:
• If the value of expr can be loaded with a valid MOV or MVN instruction, the assembler uses that
instruction.
• If a valid MOV or MVN instruction cannot be used, or if the label_expr syntax is used, the assembler
places the constant in a literal pool and generates a PC-relative LDR instruction that reads the constant
from the literal pool.
Note
— An address loaded in this way is fixed at link time, so the code is not position-independent.
— The address holding the constant remains valid regardless of where the linker places the ELF
section containing the LDR instruction.
The assembler places the value of label_expr in a literal pool and generates a PC-relative LDR
instruction that loads the value from the literal pool.
If label_expr is an external expression, or is not contained in the current section, the assembler places a
linker relocation directive in the object file. The linker generates the address at link time.
Only immediate loading can take advantage of the rotational encoding, resulting in one instruction. Loading the address of a variable in order to read or write it always results in an additional level of indirection through a literal pool item, because ARM dislikes using the movw/movt pair. Which is better to have?
1 or 2 instructions that don't touch memory ("lui" or "lui/ori" on MIPS; "movw/movt" on ARM). Artificially, though, in the ARM case there is no way to pick just a "movt" when a single 16-bit half would be enough: the assembler follows ARM's preference for literal pools and doesn't care about a nicely behaving "la/li" analog. I forgot, there is the "mov32" pseudo-instruction, but it always generates a movw/movt pair, unlike MIPS's la/li, because lui zeroes the lower 16 bits of the destination register and movt doesn't.
or
1 memory-touching instruction (ldr rX, [pc, #offset]) on ARM.
At least at the ideological level, and the gastroenterological one as well, the MIPS approach seems cleaner.
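To illustrate the lui versus movt difference mentioned above, here is a tiny Python model of what each instruction does to a 32-bit register. The function names simply mirror the mnemonics, and the "stale" register value is made up; treat it as a sketch of the architectural behavior as I understand it, not as an emulator.

```python
MASK = 0xFFFFFFFF


def lui(imm16):
    """MIPS lui: upper half = imm16, lower half forced to zero."""
    return ((imm16 & 0xFFFF) << 16) & MASK


def ori(reg, imm16):
    """MIPS ori: OR a zero-extended 16-bit immediate into the register."""
    return reg | (imm16 & 0xFFFF)


def movw(imm16):
    """ARM movw: lower half = imm16, upper half forced to zero."""
    return imm16 & 0xFFFF


def movt(reg, imm16):
    """ARM movt: upper half = imm16, lower half of reg is kept as-is."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


stale = 0xDEADBEEF  # whatever happened to be in the register before
print(hex(lui(0x1234)))                  # 0x12340000
print(hex(movt(stale, 0x1234)))          # 0x1234beef
print(hex(movt(movw(0x5678), 0x1234)))   # 0x12345678
```

So a lone lui is a complete load of 0x12340000, while a lone movt leaves the old low half in place, which is why mov32 has to emit both instructions.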