Endianness and instruction encoding
Posted: Tue Feb 17, 2015 11:22 pm
Hi all, long time lurker. I'm mucking around with an OS-less FORTH for ARM. While making the internal assembler I started playing around with ARM's variable endianness (call it perversity on my part), and I'm getting stuck conceptually here. The chip manual says that the endianness settings "do not affect instruction fetches" and that "the mapping of instruction memory is always little-endian", but it's not clear to me what that means exactly.
To take a simple instruction example, BKPT (breakpoint) is 11100001001000000000000001110000. Now, as an instruction/bit pattern, that doesn't really have any "endian-ness" to speak of (endianness being how you interpret a word as an integer, and an instruction isn't interpreting a word as an integer -- the intra-instruction "immediate number" formats are all defined already). (Also, I know the Thumb instruction set is a weird half-word invariant setup, but I'm not using T or TEE so we can ignore it for now.)
So, if for whatever reason I had to write a BKPT instruction byte by byte starting at A, is the point that I would need to write
01110000 at A
00000000 at A + 1
00100000 at A + 2
11100001 at A + 3?
And that if I stored 0b11100001001000000000000001110000 or 0xE1200070 at A in the little-endian regime it will "just work", whereas if I'm big-endian at the time of the store I will need to byte-reverse the word (so that I would literally be writing 0x700020E1?)
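To make the question concrete, here is a small Python sketch of what I think is going on. It only models byte layout on the host, it is not ARM code, and the helper names (store_word, byte_reverse) are made up for illustration. It assumes the manual's wording means an instruction fetch always reads the four bytes at A as a little-endian word, which is exactly the part I'm unsure about.

```python
import struct

BKPT = 0xE1200070  # the BKPT bit pattern from above, as a 32-bit word


def store_word(value, big_endian):
    """Model a 32-bit data store: return the bytes landing at A, A+1, A+2, A+3."""
    return struct.pack(">I" if big_endian else "<I", value)


def byte_reverse(value):
    """Swap the byte order of a 32-bit word."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


# Byte layout I believe an always-little-endian instruction fetch expects
# (assumption: this is what the manual's statement means).
wanted = bytes([0b01110000, 0b00000000, 0b00100000, 0b11100001])

le_store = store_word(BKPT, big_endian=False)
be_store_plain = store_word(BKPT, big_endian=True)
be_store_swapped = store_word(byte_reverse(BKPT), big_endian=True)

print("little-endian store of 0xE1200070:", le_store.hex())
print("big-endian store of 0xE1200070:   ", be_store_plain.hex())
print("big-endian store of 0x%08X:   " % byte_reverse(BKPT), be_store_swapped.hex())
```

If that model is right, only the first and third stores leave the bytes in the order listed above, which is why I think the big-endian case needs the reversed word.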