Memory Address Handling in C - Updated - OSDev.org

sparky: Posts: 19; Joined: Sun Apr 29, 2007 5:48 pm

Post by sparky » Sun Apr 29, 2007 5:59 pm

Hi all,

I am having problems using memory addresses within C code. I've seen code snippets with "unsigned long *address = 0x10000" and "uchar *addr = 0x10000" and am wondering which is correct, or when to use each. OK, let me try to clarify with a few short questions...

1. What is the following doing? How can we cast an int to an unsigned char*?

Code: Select all

 
g_pmm_map = (unsigned char*)page; // page is an unsigned int

2. What is the difference (pointer/mem_address-wise ignore the shift operator) between...

Code: Select all

  g_pmm_map = (unsigned char*)page;

and

Code: Select all

 g_pmm_map[(page>>12)+x] = 0;

3. In the following are the declared types and casts fine?

Code: Select all

unsigned char *p = (unsigned char *)0x100000;
do {                                     
    while (*p!=02) p++;
    if (*(p+1)==0xB0 && *(p+2)==0xAD && *(p+3)==0x1B) break;
}
while (p != (unsigned char *)0x100100);

So basically I'm trying to get an understanding of how to assign/compare/find both the contents of memory and the actual addresses.

Thanks for the help.

Last edited by sparky on Mon Apr 30, 2007 8:20 am, edited 3 times in total.

Kevin McGuire: Member; Posts: 843; Joined: Tue Nov 09, 2004 12:00 am; Location: United States; Contact:

Post by Kevin McGuire » Sun Apr 29, 2007 6:07 pm

C:

Code: Select all

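unsigned int *a = (unsigned int*)500;
*a = 4;                // stores 4 in the unsigned int at address 500
a = 4;                 // changes the pointer itself; can generate an error or warning depending on the compiler.

unsigned short *b = (unsigned short*)300;
*b = 2;

unsigned int ***c = (unsigned int***)300;   // pointer to a pointer to a pointer
***c = 200;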

ASM:

Code: Select all

mov eax, 500
mov dword [eax], 4
mov eax, 4

mov eax, 300
mov word [eax], 2

mov eax, 300
mov eax, dword [eax]
mov eax, dword [eax]
mov dword [eax], 200

C:

typedef unsigned char uchar;
typedef unsigned char* puchar;

puchar a;
uchar *b;
a = b;

PS: I have no idea what you are asking. Just use them. =)

Candy: Member; Posts: 3882; Joined: Tue Oct 17, 2006 11:33 pm; Location: Eindhoven

Post by Candy » Mon Apr 30, 2007 11:38 am

Kevin McGuire wrote: typedef unsigned char uchar;
typedef unsigned char* puchar;

puchar a;
uchar *b;
a = b;
PS: I have no idea what you are asking. Just use them. =)

Please, for the love of what's holy (or what's good, if you're nonreligious) don't typedef a pointer type. If it's a pointer, expose it as such.

Kevin McGuire: Member; Posts: 843; Joined: Tue Nov 09, 2004 12:00 am; Location: United States; Contact:

Post by Kevin McGuire » Mon Apr 30, 2007 12:27 pm

Whoa that guy edited his complete post. I should have used a quote.

I always thought it was better not to expose something as a pointer type, a kind of magic curtain: what happens behind the scenes is the behind-the-scenes code's business.

I was wondering, what would be some problems with putting a pointer in a typedef?

Candy: Member; Posts: 3882; Joined: Tue Oct 17, 2006 11:33 pm; Location: Eindhoven

Post by Candy » Mon Apr 30, 2007 12:44 pm

Kevin McGuire wrote:Whoa that guy edited his complete post. I should have used a quote.

I always thought it was better not to expose something as a pointer type, a kind of magic curtain: what happens behind the scenes is the behind-the-scenes code's business.

I was wondering, what would be some problems with putting a pointer in a typedef?

You conceal the fact that it's a pointer, so you're confusing people that try to make a deep copy of an object.

Code: Select all

struct {
   puchar x;
   uchar *y;
   uchar z;
};

Without knowing the types in question, I should be able to rely on x and z being objects I can simply assign by value, and on y being something that needs a deep copy. If puchar is secretly a pointer, that logic fails completely.

On the other hand, it's better than using a define for the same purpose, since that also breaks normally valid code:

Code: Select all

#define PUCHAR uchar *
PUCHAR a = 0, b = 0;
a = b;
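A minimal sketch of why the #define version breaks (the names PUCHAR_MACRO, puchar_t and both helper functions are made up for this illustration): the macro is substituted textually, so the `*` binds only to the first declarator, while a typedef applies to every declarator.

```c
#include <stddef.h>

typedef unsigned char uchar;

#define PUCHAR_MACRO uchar *      /* textual substitution (hypothetical name) */
typedef uchar *puchar_t;          /* a real type alias (hypothetical name) */

/* Size of the second variable when both are declared through the macro. */
size_t second_size_with_macro(void)
{
    PUCHAR_MACRO a = 0, b = 0;    /* expands to: uchar *a = 0, b = 0; */
    (void)a;
    return sizeof b;              /* b is a plain uchar, so this is 1 */
}

/* Size of the second variable when both are declared through the typedef. */
size_t second_size_with_typedef(void)
{
    puchar_t a = 0, b = 0;        /* both a and b are uchar * */
    (void)a;
    return sizeof b;              /* size of a pointer */
}
```

With the macro, `a = b;` assigns an unsigned char to a pointer, which is why the snippet above no longer compiles cleanly.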

Kevin McGuire: Member; Posts: 843; Joined: Tue Nov 09, 2004 12:00 am; Location: United States; Contact:

Post by Kevin McGuire » Mon Apr 30, 2007 12:58 pm

sparky wrote:Hi all,

I am having problems using memory addresses within c code. I've seen code snippets with "unsigned long *address = 0x10000" and "uchar *addr = 0x10000" and am wondering which is correct, or when to use each example. Ok let me try to clarify with a few short questions...

1. What is the following doing? How can we cast an int to a unsigned char*?
Code: Select all
 
g_pmm_map = (unsigned char*)page; // page is an unsigned int

Well. An unsigned char* is the exact size of an int, so for one thing no data would be lost in the conversion.

unsigned char* = 4 bytes
char* = 4 bytes
int* = 4 bytes
void* = 4 bytes
... and so on ..

Any pointer on a thirty-two bit machine is thirty-two bits in size. However, what the pointer actually points to may be of a certain type.

So unsigned char *g_pmm_map is really four bytes in size. The unsigned char part is ignored until needed. Only the * is used to tell the compiler that the variable g_pmm_map is a pointer, and it is four bytes.

Once you try to access something with the pointer then the type rules apply.

*g_pmm_map = 4;
g_pmm_map[100] = 4;

Here the compiler sees that you are using a pointer to access some data and it will load the thirty-two bit value represented by g_pmm_map into a machine register, and then use that value as a memory address to access one byte of data since it was declared unsigned char.
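A small sketch of the point that the pointed-to type only matters once the pointer is used (the two function names are invented for this illustration): stepping a pointer by one moves it by the size of whatever it points to.

```c
#include <stddef.h>

/* Byte distance between p and p + 1 when p points to unsigned char. */
size_t stride_of_uchar_ptr(void)
{
    unsigned char bytes[8] = {0};
    unsigned char *p = bytes;
    return (size_t)((char *)(p + 1) - (char *)p);   /* 1 */
}

/* Byte distance between p and p + 1 when p points to unsigned int. */
size_t stride_of_uint_ptr(void)
{
    unsigned int words[2] = {0};
    unsigned int *p = words;
    return (size_t)((char *)(p + 1) - (char *)p);   /* sizeof(unsigned int) */
}
```

The same rule applies to indexing: g_pmm_map[100] on an unsigned char * touches the byte 100 bytes past the base address.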

2. What is the difference (pointer/mem_address-wise ignore the shift operator) between...
Code: Select all
  g_pmm_map = (unsigned char*)page; 
and
Code: Select all
 g_pmm_map[(page>>12)+x] = 0; 

unsigned char *g_pmm_map;
g_pmm_map = 4;
g_pmm_map[100] = 4;

The first uses g_pmm_map as a pointer value (for lack of knowing exactly what to call this relationship): it changes the address held in the pointer. If you used *g_pmm_map = 4; instead, you would have used g_pmm_map as a pointer to a value, hence the star. Likewise, **g_pmm_map would treat it as a pointer to a pointer to a value, just as an example.

g_pmm_map[100] is also a pointer-to-value access: the array brackets "[x]" dereference the pointer at an offset of x elements.

Code: Select all

mov edi, [g_pmm_map]         ; load the address stored in the pointer
mov ecx, 100
mov bl, byte [edi+ecx]       ; see the byte [...] as the pointer dereference; unsigned char elements are one byte each

3. In the following are the declared types and casts fine?

Code: Select all

unsigned char *p = (unsigned char *)0x100000;
do {                                     
    while (*p!=02) p++;
    if (*(p+1)==0xB0 && *(p+2)==0xAD && *(p+3)==0x1B) break;
}
while (p != (unsigned char *)0x100100);

Does it work? What does it not do that you would like it to do? It appears that it would cause no compiler warning or errors. It looks ok to me, except I did not try to assume what it is supposed to be doing.
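One thing worth checking in that loop: the inner while has no upper bound, and p is not advanced after a failed match, so it can run past 0x100100 or spin forever on a lone 0x02 byte. Below is a bounded sketch of the same search. The helper name find_magic is made up, and the four bytes are assumed to be the Multiboot header magic 0x1BADB002 stored little-endian, which is what the bytes in the question look like.

```c
#include <stddef.h>

/* The bytes 02 B0 AD 1B from the question, i.e. 0x1BADB002 in little-endian order. */
static const unsigned char magic_bytes[4] = { 0x02, 0xB0, 0xAD, 0x1B };

/*
 * Hypothetical helper: return a pointer to the first occurrence of the
 * four magic bytes inside [start, start + len), or NULL if there is none.
 * It never reads outside that range.
 */
const unsigned char *find_magic(const unsigned char *start, size_t len)
{
    if (len < sizeof magic_bytes)
        return NULL;

    for (size_t i = 0; i + sizeof magic_bytes <= len; i++) {
        size_t k = 0;
        while (k < sizeof magic_bytes && start[i + k] == magic_bytes[k])
            k++;
        if (k == sizeof magic_bytes)
            return start + i;
    }
    return NULL;
}
```

In a kernel the call might look like find_magic((const unsigned char *)0x100000, 0x100), assuming that range is actually mapped and readable.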

So basically I'm trying to get an understanding of how to assign/compare/find both the contents of memory and the actual address's.

I am starting to understand that you are actually asking two questions:

1. How to use pointers.
2. What is the correct way to use pointers.

I can explain number one, and you apparently have already figured out how to partially do number two if you wrote that last function. But you are correct in thinking that there are certain ways to do things that make your source code remain portable and compilable for many years.

So for number two you need someone else to tell you about the pitfalls of:

<and I barely know what the hell I am talking about on number two>
1. Using an unsigned int or int for a pointer on other platforms.
I think a 'long', 'long long', or some weird type becomes the size of a pointer on a sixty-four bit machine...*shrug*.
2. Aliasing Rules. (GCC at least)
3. And anything I have forgotten.

I learned everything from the school of hard knocks. The number two question has never bothered me, because if I ever needed to know I could almost instantly find the information from "google", which in this case I will let someone else do.

Kevin McGuire: Member; Posts: 843; Joined: Tue Nov 09, 2004 12:00 am; Location: United States; Contact:

Post by Kevin McGuire » Mon Apr 30, 2007 1:13 pm

Candy wrote: You conceal the fact that it's a pointer, so you're confusing people that try to make a deep copy of an object.

I did some searching in google, and found some examples of:

1. Deep Copying
2. Shallow Copying
3. Bitwise Copying

http://www.fredosaurus.com/notes-cpp/oop-condestructors/shallowdeepcopy.html wrote: A shallow copy of an object copies all of the member field values. This works well if the fields are values, but may not be what you want for fields that point to dynamically allocated memory. The pointer will be copied. but the memory it points to will not be copied -- the field in both the original object and the copy will then point to the same dynamically allocated memory, which is not usually what you want. The default copy constructor and assignment operator make shallow copies.

A deep copy copies all fields, and makes copies of dynamically allocated memory pointed to by the fields. To make a deep copy, you must write a copy constructor and overload the assignment operator, otherwise the copy will point to the original, with disastrous consequences.
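The quoted text is about C++ copy constructors. In plain C the same distinction can be sketched as follows, with a made-up struct buffer and helper names chosen only for this illustration: struct assignment is the shallow copy, and a deep copy has to be written by hand.

```c
#include <stdlib.h>
#include <string.h>

struct buffer {            /* hypothetical type for illustration */
    unsigned char *data;   /* storage the struct refers to */
    size_t len;
};

/* Shallow copy: plain assignment copies the pointer, not the bytes. */
struct buffer buffer_shallow_copy(const struct buffer *src)
{
    return *src;
}

/* Deep copy: allocate new storage and copy the bytes into it.
 * Returns 0 on success and -1 if the allocation fails. */
int buffer_deep_copy(struct buffer *dst, const struct buffer *src)
{
    unsigned char *copy = malloc(src->len ? src->len : 1);
    if (copy == NULL)
        return -1;
    if (src->len != 0)
        memcpy(copy, src->data, src->len);
    dst->data = copy;
    dst->len = src->len;
    return 0;
}
```

After a shallow copy both structs share the same bytes; after a deep copy the caller owns a second block and has to free() it.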

@Candy:

So I was figuring that if you are using code that you did not write in the form of:
1. copy and paste
2. library with included headers

Should you not read the documentation for that C&P or library, if any exists, to know whether you would even want to perform a "copy" per se? I mean, if I was using a library like so:

Code: Select all

#include <okefonoke.h>

int main(int argc, char *argv[]){
     OKEFOK a, b;
     a = b;
}

How do I even know it is a good idea to deep copy b to a?
What is OKEFOK?

Candy wrote: You conceal the fact that it's a pointer, so you're confusing people that try to make a deep copy of an object.

So what I am asking is whether there is a good example that could show me why the documentation alone would not answer the question of whether an object is deep copyable, and why some simple tests would not reveal an object as being a pointer or a value?

As in, maybe I do not want them to deep copy an object or something? What if the entire purpose was to trick them into thinking they copied an object but really just keep referencing the same object?

And no, I am not actually taking a defensive stance, but more of a half and half looking for an answer.

Kevin McGuire: Member; Posts: 843; Joined: Tue Nov 09, 2004 12:00 am; Location: United States; Contact:

Post by Kevin McGuire » Mon Apr 30, 2007 1:31 pm

@Candy:

Oh. I think I just figured out what you mean.

You mean the way I used it (above) was a poor usage, not that any usage of a pointer in a typedef is bad. Right?

I am thinking you mean never use a pointer in a typedef.

Candy: Member; Posts: 3882; Joined: Tue Oct 17, 2006 11:33 pm; Location: Eindhoven

Post by Candy » Mon Apr 30, 2007 1:59 pm

<mode=language lawyer>

Kevin McGuire wrote: Well. An unsigned char* is the exact size of an int, so for one thing no data would be lost in the conversion.

unsigned char* = 4 bytes
char* = 4 bytes
int* = 4 bytes
void* = 4 bytes
... and so on ..

That's bull. Only intptr_t (or uintptr_t) is defined to be able to hold a pointer; the other integer types are not always large enough. Try amd64 on gcc for a counterexample: int is 4 bytes there and pointers are 8.

Any pointer on a thirty-two bit machine is thirty-two bits in size. However, what the pointer actually points to may be of a certain type.

That again is a lie. Did you consider, for example, Watcom, which supported segmented code, in which you can have far pointers (which are still pointers) that take 48 bits of space? Yes, that's in 32-bit mode.

1. Using an unsigned int or int for a pointer on other platforms.
I think a 'long' , 'long long', or some weird type becomes the size of a pointer on a sixty-four bit machine...*shrug*.

Evil. Use intptr_t wherever available, make a typedef on the rest that makes your code portable so you'll at most have to change the typedef once.
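A minimal sketch of what that looks like in practice, assuming <stdint.h> provides uintptr_t (it is optional in the standard but present on mainstream compilers) and assuming 4 KiB pages as in the page>>12 from the original question. PAGE_SHIFT and both function names are invented for this example.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumed 4 KiB pages, matching the page>>12 in the question */

/* Page number of the page that contains address p. */
uintptr_t page_index_of(const void *p)
{
    return (uintptr_t)p >> PAGE_SHIFT;
}

/* Turn a numeric address into a byte pointer with an explicit cast. */
unsigned char *byte_ptr_from_addr(uintptr_t addr)
{
    return (unsigned char *)addr;
}
```

Keeping addresses in uintptr_t and casting only at the point of use means the same source builds on 32-bit and 64-bit targets without truncating anything.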

2. Aliasing Rules. (GCC at least)

Again, use intptr_t. If you need to typedef it yourself, you'll have to figure out which native type doesn't violate the aliasing rules for the compiler and platform you target.

3. And anything I have forgotten.

Well... most I can think of is alignment, but you'll learn that pretty quickly after working on a sparc or such.

Kevin McGuire: Member; Posts: 843; Joined: Tue Nov 09, 2004 12:00 am; Location: United States; Contact:

Post by Kevin McGuire » Mon Apr 30, 2007 2:07 pm

I learned something! It feels good. Do it again!

Candy: Member; Posts: 3882; Joined: Tue Oct 17, 2006 11:33 pm; Location: Eindhoven

Post by Candy » Mon Apr 30, 2007 2:07 pm

Kevin McGuire wrote: As in maybe I do not want them to deep copy a object or something? What if entire purpose was to trick them into thinking they copied a object but really just keep referencing the same object?

I'll send the copy to China then, let's see how your pointer holds up

If you hide whether it's a pointer or not, you make the library confusing for users. When I pass the "thing" to a new function, how should I do that? Can I do that on the stack or should I pass a pointer to it? Does it have to be writable?

Developers write defensive code that defends against anything you can throw at it, as far as they can tell, and the more experience developers get the more defensive their code gets. Most commercial code I've seen was completely filled with pointless checks and test variables and so on. I expect that if you compile that code with a compiler that does global constant propagation on the entire program that 50% of the code would instantly disappear.

Consider for example Allegro (www.allegro.cc). They use a bitmap object that you need a handle of sorts for. They give you a "BITMAP *" as such, so you know it's a type, you know it's a pointer but the rest you don't know. It could be a union, struct, class or char[] for all I care, but you at least know it's a pointer.

Never mind most of the warning, typedeffing is a lot safer than defining in any case.

Kevin McGuire: Member; Posts: 843; Joined: Tue Nov 09, 2004 12:00 am; Location: United States; Contact:

Post by Kevin McGuire » Mon Apr 30, 2007 2:11 pm

I like it. It makes sense. I really do appreciate the time you took to explain this to me. This, I feel, will be a good practice to keep in mind when writing code, instead of my usual way of thinking lately, "keep pointers unexposed".

Thanks.

sparky: Posts: 19; Joined: Sun Apr 29, 2007 5:48 pm

Post by sparky » Mon Apr 30, 2007 5:54 pm

Thanks for hijacking the thread...haaha only kidding, some really nice points made. I always like to learn the ideal and best ways to code and more importantly why they are the best ways.

Kevin: Quick question regarding your informative reply..

Code: Select all

unsigned char *g_pmm_map;
g_pmm_map = 4;

Won't this be flagged as a compiler error for assigning an int to a pointer?

sparky: Posts: 19; Joined: Sun Apr 29, 2007 5:48 pm

Post by sparky » Mon Apr 30, 2007 6:00 pm

Sorry, that was lazy of me. I've just tried it and it does indeed get flagged, as "warning: assignment makes pointer from integer without a cast". It's issues exactly like this that I want to understand: whether there's any way to clean the code of warnings and errors, and why a variable needs casting or such.

Cheers.

Kevin McGuire: Member; Posts: 843; Joined: Tue Nov 09, 2004 12:00 am; Location: United States; Contact:

Post by Kevin McGuire » Mon Apr 30, 2007 7:45 pm

I wish Solar was around and would answer this, because I think he was really big on the things you are asking. I really only write software for the x86, and have hardly ever dealt with portability or even cases of extreme optimization, which are basically the two points I can think of that would be very important to your question.

The only point I could think of would be to reduce bugs by using a good coding style in C (I rarely use C++).

However, borrowing a bunch of compiler flags for GCC from Solar's Makefile Tutorial:

http://www.osdev.org/wiki/Tutorial:Makefile wrote: -Wall -Wextra -pedantic -Wshadow -Wpointer-arith -Wcast-align \
-Wwrite-strings -Wmissing-prototypes -Wmissing-declarations \
-Wredundant-decls -Wnested-externs -Winline -Wno-long-long \
-Wconversion -Wstrict-prototypes

Try turning all these on and then writing something, and see what it flags as an error. Once you see an error, try to determine what it is talking about and do a Google search about it. For example:

Code: Select all

#include <stdio.h>

int main(int argc, char *argv[]){
	unsigned int a = 2;
	int b = -5;
	if(a > b){
		printf("a>b\n");
		return 2;
	}
	printf("b>a\n");
	return 1;
}

Will say, "b>a". Do you think this is correct?

I think it is not. However it is rooted in the way the x86 does addition and subtraction.

Thus, a circuit designed for addition can handle negative operands without also including a circuit capable of subtraction (and a circuit which switches between the two based on the sign).

In this case the circuit in a processor in a generic case here does not care if a number is signed(negative and positive) or unsigned(just positive). It can add and subtract unsigned and signed numbers using the same circuit.

You should already know that if the processor has the number 0xFFFFFFFF in one of its registers and it adds one, the number will wrap around back to 0x0 due to the way the circuit works, which can be shown at this site: (also a great resource)
http://www.play-hookey.com/digital/adder.html

To do this it uses the two's complement system, in which negative numbers count backward from zero.

0100 = 4
0011 = 3
0010 = 2
0001 = 1
0000 = 0
1111 = -1
1110 = -2
1101 = -3
1100 = -4

http://en.wikipedia.org/wiki/Two's_complement

You prolly wonder why? Of course I wondered the same thing until I tried it.

0011 + 1111 = 0010
3 + 15 = 18 = 10010

The one is dropped or on the x86 the carry flag in the EFLAGS is set. This carry flag is used in compare instruction on the x86.

Nevertheless, addition of a positive and a negative number was done using the exact same circuit and method that would also work for a positive number added to another positive number.
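The 4-bit arithmetic above can be tried out with a short sketch. The helper names add4 and as_signed4 are made up for this example, and the masking only imitates a 4-bit adder in software.

```c
#include <stddef.h>

/* Add two 4-bit values the way a 4-bit adder would: keep the low four
 * bits of the sum and report the bit that falls off the top as the carry. */
unsigned add4(unsigned x, unsigned y, unsigned *carry_out)
{
    unsigned sum = (x & 0xFu) + (y & 0xFu);
    if (carry_out != NULL)
        *carry_out = (sum >> 4) & 1u;
    return sum & 0xFu;
}

/* Interpret a 4-bit pattern as a signed two's complement number (-8..7). */
int as_signed4(unsigned bits)
{
    bits &= 0xFu;
    return (bits & 0x8u) ? (int)bits - 16 : (int)bits;
}
```

add4(0x3, 0xF, &c) gives 0x2 with the carry set, which is the 0011 + 1111 case: the adder neither knows nor cares whether 1111 was meant as 15 or as -1.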

To do subtraction the circuit might contain thirty-three bits for each operand that is loaded into it. Then it would simply set the thirty-third bit of the second operand to subtract it from the first.

Okay. You might still be wondering: so what is wrong?

Well. The problem is rooted in the fact that we lose a bit when making an integer signed. If we take that one bit and make it denote a positive or negative number, then we halve the maximum value:

0 to 2^32 - 1 (unsigned int)
-(2^31) to 2^31 - 1 (signed int)

So if we did some calculation that pushed towards the limits of the x86's maximum register value and we did not need to represent negative numbers then we could simply use unsigned int instead of cutting our maximum value in half by using a signed int or also the equivalent in C being int.

The signed portion is default unless you specify unsigned to a primitive type in C.

So why does this fail?

Code: Select all

int main(int argc, char *argv[]){
	unsigned int a = 2;
	int b = -5;
	if(a > b){
		printf("a>b\n");
		return 2;
	}
	printf("b>a\n");
	return 1;
}

The compiler expects that you might put a value between 0 and 2^32 - 1 in a. It also expects that you might put a value between -(2^31) and 2^31 - 1 in b.

The problem occurs because of the compare instruction. There is no way for the operation to compare the signed and unsigned values without converting one to the other. It either has to compare them as unsigned or as signed. If it treats a as signed, and a holds a value of 2^31 or more, then that value will be treated as a negative number, possibly causing a bug since this might not be intentional.

Of course the opposite happens in the case above: when an int meets an unsigned int in an operation, C converts the signed operand to unsigned (the "usual arithmetic conversions"), and more generally it converts a smaller data type to the larger one.

So what it did was treat b as an unsigned int instead of signed, and this caused b to become the value 0xFFFFFFFB. If you remember, the negative numbers sit at the top of the unsigned range, counting backward from zero (0000b).

So it really did if(0x2 > 0xFFFFFFFB).
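A minimal sketch of the difference, with made-up function names: naive_greater spells out the conversion the compiler applies to a > b, and safe_greater handles the negative case first so that the cast can no longer change the value.

```c
#include <stdbool.h>

/* What `a > b` really evaluates when a is unsigned int and b is int:
 * b is converted to unsigned int before the comparison. */
bool naive_greater(unsigned int a, int b)
{
    return a > (unsigned int)b;
}

/* Mathematically correct comparison: a negative b is smaller than any
 * unsigned a, and a non-negative b survives the cast unchanged. */
bool safe_greater(unsigned int a, int b)
{
    if (b < 0)
        return true;
    return a > (unsigned int)b;
}
```

With a = 2 and b = -5, naive_greater returns false (the "b>a" result from the program above) while safe_greater returns true.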
--

Since C is designed to be portable too, as far as I can tell the people who came up with the C standard made signed and unsigned part of the data type so that the compiler can keep track of these things for you.

And the warning is a result of this. It is telling you, "Hey. Something might go wrong with this. You need to take a really close look and make sure that for the target platform(s) this is going to be okay to do."

There could exist an instruction that might compare a signed with an unsigned integer, so I suppose doing things this way supports that case.

I dunno. I had to research this all up tonight just to present it in this messy fashion. I knew a good bit about it, but it still takes a little to wrap my small brain around and hopefully you can understand it quicker than I.

There are most likely a lot more in depth and complex reasons why you have certain warnings in C. You will most likely end up doing a lot of research and reading to find the answers just like I had to do.

http://en.wikipedia.org/wiki/Aliasing_(computing)
The aliasing deal has something to do with the compiler trying to optimize your code as much as possible.

Here is something related to your casting question.
http://209.5.165.104/search?q=cache:r3h ... -bin/info2\
www%3F(gcc.info)Warning%2520Options+C+standard+comparison+between+signed+\
and+unsigned&hl=en&ct=clnk&cd=9&gl=us

(above) wrote: `-Wcast-align'
Warn whenever a pointer is cast such that the required alignment
of the target is increased. For example, warn if a `char *' is
cast to an `int *' on machines where integers can only be accessed
at two- or four-byte boundaries.

I have never run into this, but someone has and that is why it is there.
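For what it is worth, one common way to avoid the alignment problem that -Wcast-align warns about is to copy the bytes out with memcpy instead of casting the char * to an int *. A minimal sketch, with load_u32 as a made-up helper name:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a byte address that may not be 4-byte aligned.
 * memcpy has no alignment requirement, unlike dereferencing (uint32_t *)p. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

Compilers typically turn this into a single load on machines that allow unaligned access, so the portable form usually costs nothing on x86.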

Code: Select all

unsigned int c = 4;
unsigned int *a = &c;
unsigned int b;
b = a;

What if you forget that a was a pointer to an unsigned int in a long and complex function? It could take a while to see the bug. So the compiler generates a warning that something looks wrong.

To tell it this is exactly what you want you might do:
b = (unsigned int)a

Since that is exactly what the compiler was going to do. It just lets the compiler know that you and it are on the same page of thinking.

Code: Select all

unsigned int *a = 0;
unsigned char *c = 0;
c = a;

This will generate a warning, since the compiler is thinking: does he really want to do this, or is he about to create a bug in his program? So:
c = (unsigned char*)a

Tell it this is what you want to do, and the warning will disappear.

The warnings, I feel, are there more to help you than to tell you that what you are doing is wrong. Of course, if you are new, or maybe trying to make something portable, they will also help you know what you might want to check in the source code to make sure it will work where you want it to work.

There are some people who feel you should not do some things. Of course there are a lot of those situations, such as the one earlier where Candy did not like the typedef hiding the fact that something was a pointer, about which he made a valid point. Although using something like that would depend on the situation and how much time you have. You might want to write something quickly; then again, you might want to write something portable and understandable to many people.

And the answer with casting.

int main(int argc, char *argv[]){
	unsigned int a = 2;
	int b = -5;
	if(a > b){
		printf("a>b\n");
		return 2;
	}
	printf("b>a\n");
	return 1;
}

You might do instead:

Code: Select all

...
if(b >= 0 && (unsigned int)b > a){   // only cast once b is known to be non-negative
   printf("b > a\n");
}else{
   printf("a >= b\n");
}
....

The cast (above) suppresses the warning.

[edit mod="candy"]Sorry for the line splits, can't get the url tag to work[/edit]

Last edited by Kevin McGuire on Tue May 01, 2007 5:25 pm, edited 1 time in total.
