Direct memory addressing VS. variable addressing in C.

01000101 · Post by **01000101** » Mon Mar 03, 2008 10:31 pm

I have been working on a way to compress my driver data into memory areas I call apartments & buildings. There is one building per device type, but there can be multiple apartments within the building (multiple devices of the same type). Each 'type' of driver allocates the amount of memory space I have calculated it needs, and if multiple devices of the same 'type' are found, there are then two apartments, but within one building, per se.

I have also specified the location of variables for each apartment aligned with the amount of space allocated per apartment.

say each device requires 16k of memory, then the second device would start initializing variables at offset 16k, and then allocate another 16k for its needs.
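
The offset arithmetic described here could be sketched as follows. This is only an illustration under stated assumptions: the 16k size comes from the example above, while `APARTMENT_SIZE`, `MAX_APARTMENTS` and `apartment_base` are made-up names and the limit of 8 is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every apartment (per-device block) is 16 KiB, as in the example. */
#define APARTMENT_SIZE ((uintptr_t)16 * 1024)

/* Assumption: an arbitrary cap on devices of one type, for illustration only. */
#define MAX_APARTMENTS 8u

/* Hypothetical helper: address at which apartment `index` starts inside the
 * building that begins at `building_base`. */
static uintptr_t apartment_base(uintptr_t building_base, unsigned index)
{
    assert(index < MAX_APARTMENTS);
    return building_base + (uintptr_t)index * APARTMENT_SIZE;
}
```

With a building at 0x500000, the second device (index 1) would start at 0x504000, which is 16k past the first.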

my main question is: with this kind of variable assignment that must be aligned and precise, should I just access that memory via something like *(unsigned char*)0x12345678, or would it be just as efficient to declare a variable, then re-assign its memory position to suit the memory map?

I personally think cutting out the variables would cut out some overhead.

speal · Post by **speal** » Tue Mar 04, 2008 12:50 am

Barring optimization and other goofy things I can't address without seeing all the code:

*(unsigned char*)0xABCDEF = 12345;

will result in the same general machine instructions as:

unsigned char* address = (unsigned char*)0xABCDEF;
*address = 12345;

You may save yourself some headaches if you stick things in variables. A compiler, even with optimizations turned off, will handle both of these in very similar (or identical) ways.

I hope I got the gist of the problem..

Edit: kind of a silly mistake:
mov [0xABCDEF], %rax
vs.
lea %rax, [address wrt %rip]
mov [%rax], 12345

The first line would be possible (and 1 cycle faster I believe), but fixing the address of everything in your kernel seems like a bad idea. You'd end up with the second example if you want to support a variable number of devices (like an array), and I assume you do.

JackScott · Post by **JackScott** » Tue Mar 04, 2008 1:20 am

Using the preprocessor may also be possible in this situation. Save the cycle, save a variable, and save the day?
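
A minimal sketch of what the preprocessor route might look like. `MEM8`, `DEVICE_FLAG_ADDR` and `set_flag` are hypothetical names, and the address is only borrowed from this thread as a placeholder. The `volatile` qualifier is an addition not discussed in the thread: it tells the compiler that each access must really happen, which is normally what you want for device memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: treat a numeric address as a byte-wide lvalue.
 * volatile keeps the compiler from dropping or merging the accesses. */
#define MEM8(addr) (*(volatile uint8_t *)(uintptr_t)(addr))

/* Assumed address, used purely as an illustration. */
#define DEVICE_FLAG_ADDR 0x00500000u

/* Writes 0xFF through the macro to whatever address is passed in. */
static void set_flag(uintptr_t addr)
{
    MEM8(addr) = 0xFF;
}
```

In a kernel the call would be something like `MEM8(DEVICE_FLAG_ADDR) = 0xFF;`. No pointer variable appears in the source, and the address is a compile-time constant that the compiler can fold straight into the store.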

exkor · Post by **exkor** » Tue Mar 04, 2008 1:49 am

speal wrote: mov [0xABCDEF], %rax
vs.
lea %rax, [address wrt %rip]
mov [%rax], 12345

in LongMode

 mov [imm], r0-r15 ;7 bytes, write to mem on static addr

 ;you'll probably need to declare an additional variable? :
 ;15 bytes, 11bytes for dword
 var dq 0
 mov [var], rax
 
 ;7bytes also
 lea rax, [rcx+127]
 mov [rax], rax

 ;6 bytes
 lea eax, [rcx+127]  ;if rcx replaced with r8-r15 then +1 byte
 mov [rax], rax

using offset higher than +/-127 will add 3 more bytes
using r8-r15 in "mov [rax], reg64" instead of rax will add 1 more byte
so 11 bytes lea+mov in the worst case scenario

I think doing such optimization in C is pointless.

01000101 · Post by **01000101** » Wed Mar 05, 2008 4:53 pm

Also, this may be a very basic ASM question, but when performing a mov such as:

mov byte [0x00500000], byte 0xFF;

no registers are altered correct?
wouldn't that be another advantage over the other method using variables or registers to store the value?

exkor · Post by **exkor** » Wed Mar 05, 2008 5:40 pm

01000101 wrote: no registers are altered correct?

Yes, that's an advantage.
The only disadvantage of this method is that you can't relocate your data block dynamically, other than by using paging I guess.

JamesM · Post by **JamesM** » Thu Mar 06, 2008 4:13 am

01000101 wrote: I personally think, cutting out the variables would cutout some overhead.

Let the compiler deal with it. Internally the compiler creates temporary variables all over the shop, for example:

a = *(unsigned int*)0x1000;

Would internally turn into:

unsigned int *tmp = (unsigned int*)0x1000;
a = *tmp;

Similarly for arithmetic operations:

unsigned int a = b + c - (d+e);

->

unsigned int a = b + c;
unsigned int tmp = d + e;
a = a - tmp;

There is a name for the form the compiler generates, but it escapes me for the moment. The idea is there is only one operation per line, which makes assembly code easier to generate.

Once that form has been created, assembly code is created and optimisations take place - where to store each temporary - register? stack? can two instructions be merged because of complex addressing modes? etc.

So really the variable declarations in your C code have absolutely no correlation with what the compiler outputs (on anything over -O0).
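
The one-operation-per-line form described above is commonly called three-address code. As a rough sketch (the function names are invented, and a real compiler's intermediate form will look different), both versions below compute the same value, so writing the temporaries out by hand neither adds nor removes work once the optimiser has run:

```c
#include <assert.h>

/* The expression as it would be written in ordinary C. */
static unsigned int single_expression(unsigned int b, unsigned int c,
                                      unsigned int d, unsigned int e)
{
    return b + c - (d + e);
}

/* The same computation with one operation per statement, in the spirit of
 * three-address code. t1 and t2 stand in for compiler temporaries. */
static unsigned int three_address(unsigned int b, unsigned int c,
                                  unsigned int d, unsigned int e)
{
    unsigned int t1 = b + c;
    unsigned int t2 = d + e;
    unsigned int a = t1 - t2;
    return a;
}

/* Spot check that the two forms agree. */
static void check_equivalence(void)
{
    assert(single_expression(10, 5, 3, 2) == three_address(10, 5, 3, 2));
}
```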

01000101 wrote: Also, this may be a very basic ASM question, but when performing a mov such as:
mov byte [0x00500000], byte 0xFF;
no registers are altered correct?
wouldn't that be another advantage over the other method using variables or registers to store the value?

Whether the code you give is better or worse than two instructions that achieve the same result depends wholly on the processor in question.

On the one hand, the entire operation is achieved in one instruction. Which is good.

On the other hand, that instruction might be heavily microcoded, which slows things down (remember that not all instructions in CISC architectures are as heavily optimised - ones that compilers use get optimised more).

The instruction you give (a store immediate) is so common that I would personally consider it more efficient than a register move followed by a register store. However, also bear in mind that a store immediate instruction takes up more space than a register store, so if you're using the same constant over again I would recommend storing it temporarily somewhere.

bewing · Post by **bewing** » Thu Mar 06, 2008 1:01 pm

Yes, basically you are turning variables into #define or EQU statements. It is certainly best to let the compiler/linker deal with the details. This only works in virtual memory, of course, and doesn't work well at all in physical mem. But when you put things at known memory locations, it DOES create more opportunities for tightening up the assembler code. Heck, the entire *concept* of assembler SIB byte addressing [base + offset + index*size] assumes that either "base" or "offset" is a known *fixed constant* memory address. Without known fixed constant memory addresses, that entire CPU feature becomes much less useful, and much less of an enhancement to your code.
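
To make the [base + offset + index*size] shape concrete, here is a small sketch in C. The struct and function are hypothetical. The element size is 8 because x86 index scaling only supports factors of 1, 2, 4 and 8. Whether a compiler actually emits a single scaled-index instruction for `table[i].second` depends on the compiler and its settings, so treat this only as an illustration of the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte record. */
struct pair {
    uint32_t first;   /* offset 0 */
    uint32_t second;  /* offset 4 */
};

/* Address of table[index].second computed by hand as
 * base + offset + index * size. */
static uintptr_t second_addr(const struct pair *table, size_t index)
{
    assert(table != NULL);
    return (uintptr_t)table
         + offsetof(struct pair, second)
         + index * sizeof(struct pair);
}
```

If `table` sits at a fixed, known address, base and offset collapse into one constant and only the index needs a register.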

exkor · Post by **exkor** » Thu Mar 06, 2008 4:24 pm

JamesM wrote: However also bear in mind that a store immediate instruction takes up more space than a register store, so if you're using the same constant over again I would recommend storing it temporarily somewhere.

In ProtectedMode x86: savings start when you "mov" the same constant (a byte, like 01000101 wants) 6 or more times.
'mov cl, 3' is not considered because of its slight performance hit in most cases.
Same goes for LongMode x86-64 if r8-r15 are not used.

;29bytes
mov ecx, 3
mov [0723872h], cl
mov [0723872h], cl
mov [0723872h], cl
mov [0723872h], cl

;28 bytes
mov byte [0723872h], 3
mov byte [0723872h], 3
mov byte [0723872h], 3
mov byte [0723872h], 3

However x86 is optimized for eax reg:
;25 bytes in ProtectedMode, same 29byte in LongMode
mov eax, 5
mov [0723872h], al
mov [0723872h], al
mov [0723872h], al
mov [0723872h], al
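
The break-even point can be checked with a little arithmetic. The sketch below assumes the 32-bit ProtectedMode sizes implied by the listings above: 5 bytes to load the register, 6 bytes per store from cl, 5 bytes per store from al, and 7 bytes per immediate byte store. The function names are made up for illustration.

```c
#include <assert.h>

/* n immediate byte stores: 7 bytes each. */
static unsigned bytes_immediate(unsigned n) { return 7u * n; }

/* One 5-byte register load, then n stores from cl at 6 bytes each. */
static unsigned bytes_via_cl(unsigned n) { return 5u + 6u * n; }

/* One 5-byte register load, then n stores from al at 5 bytes each. */
static unsigned bytes_via_al(unsigned n) { return 5u + 5u * n; }

/* Smallest n at which going through cl is strictly smaller than
 * using immediate stores. */
static unsigned break_even_cl(void)
{
    unsigned n = 1;
    while (bytes_via_cl(n) >= bytes_immediate(n))
        n++;
    return n;
}
```

For four stores this reproduces the 28, 29 and 25 byte totals from the listings, and `break_even_cl()` lands on 6, matching the "6 or more times" figure.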

~ · Post by ~ » Fri Mar 07, 2008 11:05 am

exkor wrote:
JamesM wrote: However also bear in mind that a store immediate instruction takes up more space than a register store, so if you're using the same constant over again I would recommend storing it temporarily somewhere.
In ProtectedMode x86: savings start when you "mov" the same constant (a byte, like 01000101 wants) 6 or more times.
'mov cl, 3' is not considered because of its slight performance hit in most cases.
Same goes for LongMode x86-64 if r8-r15 are not used.

;29bytes
mov ecx, 3
mov [0723872h], cl
mov [0723872h], cl
mov [0723872h], cl
mov [0723872h], cl

;28 bytes
mov byte [0723872h], 3
mov byte [0723872h], 3
mov byte [0723872h], 3
mov byte [0723872h], 3

However x86 is optimized for eax reg:
;25 bytes in ProtectedMode, same 29byte in LongMode
mov eax, 5
mov [0723872h], al
mov [0723872h], al
mov [0723872h], al
mov [0723872h], al

Wouldn't it be better to do:

mov dword[0x723872],0x05050505

That will use 12 bytes in 16-bit mode (Unreal Mode), 11 bytes in 64-bit mode and 10 bytes in 32-bit mode.

Also, why copy the same value several times to the same memory location?

exkor · Post by **exkor** » Fri Mar 07, 2008 11:01 pm

01000101 wants 1 byte, not a dword. Optimizations must be precise. You must know exactly what you want to optimize. If you want general optimization, leave it to your high-level compiler unless you are optimizing algorithms.
You can change addresses, size of instruction will not change.
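
The byte versus dword distinction matters because the two stores do not touch the same memory. A hedged C sketch, using an ordinary buffer in place of the real address and invented function names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store one byte at buf[0]; the neighbouring bytes are left alone. */
static void store_byte(uint8_t *buf, uint8_t value)
{
    assert(buf != NULL);
    buf[0] = value;
}

/* Store a 32-bit value starting at buf[0]; this overwrites buf[0..3].
 * memcpy is used to avoid alignment and aliasing problems in portable C. */
static void store_dword(uint8_t *buf, uint32_t value)
{
    assert(buf != NULL);
    memcpy(buf, &value, sizeof value);
}
```

So a single dword store of 0x05050505 is not a drop-in replacement for repeated byte stores to one address: it also changes the three bytes that follow.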

01000101 · Post by **01000101** » Fri Mar 07, 2008 11:43 pm

Even though I was talking about moving bytes of data, if what is said above is true, wouldn't moving dwords be more efficient and optimised as far as machine instructions per asm line?

JamesM · Post by **JamesM** » Sat Mar 08, 2008 8:03 am

01000101 wrote: wouldn't moving dwords be more efficient and optimised as far as machine instructions per asm line?

Moving around an architecture's native word size is always efficient - it's what the vast majority of mov's are.

mrvn · Post by **mrvn** » Tue Mar 11, 2008 8:35 am

speal wrote: Barring optimization and other goofy things I can't address without seeing all the code:

*(unsigned char*)0xABCDEF = 12345;

will result in the same general machine instructions as:

unsigned char* address = (unsigned char*)0xABCDEF;
*address = 12345;

I think you mean

/* const goes after the * so the pointer is constant, not the byte it points at */
static unsigned char* const address = (unsigned char*)0xCAFEBABE;
*address = 12345;

The static const makes this a global variable with a non changing value. For any optimizing compiler this will be just like a #define.

That said, what about the following?

struct Memory_Mapped_Device {
  uint32_t reg_foo;
  uint32_t reg_bla;
  uint32_t reg_blub;
 ...
} *device = (struct Memory_Mapped_Device *)0xCAFEBAB0;

device->reg_foo = 17;
device->reg_bla = 23;

Isn't that much more readable than using addresses like below?

*(uint32_t*)0xCAFEBAB0 = 17;
*(uint32_t*)0xCAFEBAB4 = 23;

I'm pretty certain the two will result in the same or speedwise equivalent code. The former might even be better as the compiler can put the address into a register and access it with an offset. The latter might not see 0xCAFEBAB4 as being 0xCAFEBAB0 + 4.
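
For anyone who wants to try the struct approach without real hardware, here is a self-contained sketch. The register names follow the example above; `struct mmio_regs` and `init_device` are hypothetical, and the `volatile` qualifiers are an addition that device registers usually need so the compiler keeps every access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block with three 32-bit registers. */
struct mmio_regs {
    volatile uint32_t reg_foo;   /* base + 0 */
    volatile uint32_t reg_bla;   /* base + 4 */
    volatile uint32_t reg_blub;  /* base + 8 */
};

/* The layout must match the raw addresses used in the cast version. */
static_assert(offsetof(struct mmio_regs, reg_bla) == 4, "reg_bla at +4");

/* Program two registers through the struct view of the block at `base`. */
static void init_device(uintptr_t base)
{
    struct mmio_regs *dev = (struct mmio_regs *)base;
    dev->reg_foo = 17;
    dev->reg_bla = 23;
}
```

Passing the address of an ordinary `uint32_t` array as `base` lets the same function be exercised on a host machine. In a kernel, `base` would be the fixed device address.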

MfG
Goswin

01000101 · Post by **01000101** » Tue Mar 11, 2008 2:08 pm

I posted a few code snippets in an earlier post about relocating a struct. I think that would also be a fair approach to make things more readable, all while controlling the memory allocation process for variables.

Basically, fill a struct with variables, then move the entire struct, and once it is placed, the variables stack up from the base of the struct.
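
One way that idea might look in C, as a sketch only. The 16k size is taken from the first post. The field names, `union apartment` and `apartment_at` are invented for illustration and are not the snippets from the earlier post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APARTMENT_SIZE 16384u  /* 16k per device, as in the first post */

/* Hypothetical per-device state. The raw member pads every apartment to
 * exactly APARTMENT_SIZE bytes so consecutive apartments line up. */
union apartment {
    struct {
        uint32_t io_base;
        uint32_t irq;
        uint8_t  rx_buffer[2048];
    } vars;
    uint8_t raw[APARTMENT_SIZE];
};

static_assert(sizeof(union apartment) == APARTMENT_SIZE,
              "apartment must be exactly 16k");

/* View apartment `index` of the building that starts at `building`. */
static union apartment *apartment_at(void *building, unsigned index)
{
    return (union apartment *)building + index;
}
```

Moving a building then means changing one base pointer. Every variable keeps its offset from the start of its apartment.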