Really stuck on seperating my buffer into segments

kmcguire
Member
Posts: 120
Joined: Tue Nov 09, 2004 12:00 am
Location: United States

Re: Really stuck on seperating my buffer into segments

Post by kmcguire »

That's most likely the problem. What is happening is that:

Code: Select all

char *data_1;
char *data_2;
Local Function Variables
These two pointers are local variables of the function, and local variables are normally allocated on the stack. Let's talk a little about variables in C. You have quite a few basic ones like: char, short, and int.

Signed/Unsigned And Space Used On i386+
Then you can prepend unsigned or signed onto them, like: unsigned short, signed int, or unsigned char. short and int are signed by default. Whether a plain char is signed depends on the compiler; with the usual i386+ compilers it is signed.

So what do signed and unsigned mean? Well, they have to do with whether the integer can handle negative numbers. In mathematics you have something called a sign that tells if a number is negative, like: +4, -5, or 6, where 6 is by default positive. The i386 processor has instructions to deal with both signed and unsigned numbers. Why have both? Well, representing a number as negative or positive requires some space to store the sign. For a signed number the most significant bit indicates the sign (the i386 uses two's complement encoding). If you remember that a byte is eight bits, then you should understand that a signed byte has only seven bits left to store the magnitude, since the eighth bit indicates the sign.

This changes the range of types like char, short, and int. The range is the pair of numbers giving the smallest and largest values the integer can hold. For example:

unsigned char 0 to 255
signed char -128 to 127
unsigned short 0 to 65535
signed short -32768 to 32767
unsigned int 0 to 4294967295
signed int -2147483648 to 2147483647

Also, let's go over how much space each takes on the i386+.

unsigned char and signed char take 1 byte
unsigned short and signed short take 2 bytes
unsigned int and signed int take 4 bytes

More Portable Types
You can also include the header stdint.h, like: #include <stdint.h>
It defines types such as:
uint8_t, int8_t, uint16_t, int16_t, uint32_t, and int32_t, which are guaranteed to have exactly the stated number of bits (and therefore the same number of bytes) no matter what machine you are targeting when compiling.
http://en.wikipedia.org/wiki/Stdint.h

Local Function Variables On Stack
When your function is compiled, the compiler emits machine instructions that reserve some space on the stack. The space it reserves is enough to hold your local function variables. Let's say you have:

Code: Select all

char myVariable1;
char myVariable2;
If you remember the char type is one byte in size. You can check this by doing:

Code: Select all

printf("The size of a char is %u.\n", (unsigned)sizeof(char));
So if each variable takes one byte and we have two, the compiler will emit instructions to reserve two bytes on the stack for local function variables. The way the compiler reserves space on the stack is by using the stack pointer. The stack pointer is a register in the CPU (ESP) that holds a thirty-two bit address when you are in protected mode on an i386+ processor. Stacks also grow downward, not upward, so reserving space means subtracting from the stack pointer. So the compiler will emit something like this:

Code: Select all

sub $2, %esp
This moves ESP below the two bytes so that nothing pushed later can overwrite them. We store myVariable1 at EBP-1 and myVariable2 at EBP-2, where EBP holds the value ESP had before the subtraction. If we then perform a call instruction, it pushes the current EIP value onto the stack below ESP, leaving our two bytes untouched, so that a RET instruction can later be used to come back. (A real compiler usually reserves more than two bytes to keep the stack aligned.) Take a look at this:

Code: Select all

myFunction:
push %ebp                # save the caller's frame pointer
mov %esp, %ebp           # EBP now marks the top of our frame
sub $2, %esp             # reserve two bytes for the locals
movb $0, -1(%ebp)        # myVariable1 = 0
movb $1, -2(%ebp)        # myVariable2 = 1
call myOtherFunction
leave                    # restore ESP and EBP
ret
Which would be the equivalent of:

Code: Select all

void myFunction(void)
{
    char myVariable1;
    char myVariable2;
    myVariable1 = 0;
    myVariable2= 1;
    myOtherFunction();
    return;
}
Even if you do not understand everything, you should understand that two bytes were reserved on the stack for the local function variables myVariable1 and myVariable2, using ESP, the stack pointer.

Local Function Variable Arrays
Let's say you want to use arrays as local function variables. For instance:

Code: Select all

char myVariable[25];
This will cause the compiler to emit instructions to reserve 25 times the size of a char on the stack. printf("myVariable[25] will take %u bytes!\n", (unsigned)(sizeof(char) * 25));
The actual amount of space on an i386+ is 25 bytes, because each char is one byte in size and 1 times 25 gives 25. If we used a short it would reserve 50 bytes instead of 25, since a short takes two bytes. An array of unsigned short likewise takes 50 bytes, just like a signed short. The only difference between signed and unsigned is which instructions the compiler generates to operate on the data.

Pointers
Let's take a close look at pointers, since they seem to be the source of much confusion. They are actually part of the reason why both of your buffers always contain the same data. Take a look at:

Code: Select all

char *pchar;
short *pshort;
int *pint;
A pointer is always the width of a data address. The width on an i386+ is 32 bits. A byte is eight bits and 32 divided by 8 is four. This is exactly the same size as an unsigned int, signed int, or int. Even though it is declared as char *pchar, it is still stored as four bytes since it is a pointer! So the space reserved on the stack for these three variables is 12 bytes (since 3 times 4 is 12).

Let's take a look at the difference in instructions when using a pointer and a basic data type.

Code: Select all

---- C/C++ Code ---
char *pchar;
short *pshort;
pchar = 0;
pchar = (char*)5;
pshort = 0;
pshort = (short*)3;
--- GAS Assembly ---
mov %esp, %ebp
subl $8, %esp            # reserve 8 bytes: two 4-byte pointers
movl $0, -4(%ebp)        # pchar = 0
movl $5, -4(%ebp)        # pchar = (char*)5
movl $0, -8(%ebp)        # pshort = 0
movl $3, -8(%ebp)        # pshort = (short*)3
Now let's take a look at not using pointers.

Code: Select all

---- C/C++ Code ---
char a;
short b;
a = 0;
a = 5;
b = 0;
b = 3;
--- GAS Assembly ---
movl %esp, %ebp
subl $3, %esp            # reserve 3 bytes: one char and one short
movb $0, -1(%ebp)        # a = 0
movb $5, -1(%ebp)        # a = 5
movw $0, -3(%ebp)        # b = 0
movw $3, -3(%ebp)        # b = 3
It looks almost the same, right? Only the operand sizes (movb and movw instead of movl), the offsets, and the variable names changed. Let's try actually using a pointer and see the difference:

Code: Select all

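---- C/C++ Code ---
char *pchar;
short *pshort;
pchar = 0;
*pchar = 5;
pshort = 0;
*pshort = 3;
--- GAS Assembly ---
movl %esp, %ebp
subl $8, %esp            # reserve 8 bytes: two 4-byte pointers
movl $0, -4(%ebp)        # pchar = 0
movl -4(%ebp), %eax      # load the address stored in pchar
movb $5, (%eax)          # *pchar = 5 (one byte written at that address)
movl $0, -8(%ebp)        # pshort = 0
movl -8(%ebp), %eax      # load the address stored in pshort
movw $3, (%eax)          # *pshort = 3 (two bytes written at that address)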
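(The listing stores address 0 in the pointers only to stay short; really writing through address 0 would normally crash the program.) It may look a little confusing, but what is actually happening is that pchar is a pointer to type char. So when you access the variable pchar itself you are accessing something just like an unsigned int, except that you have to specifically cast your values to char*, like: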

Code: Select all

pchar = (char*)3392;
This lets the compiler know that this _is_ what you want to do. The compiler enforces the cast to help keep you from making mistakes, and filling a pointer with a wrong value is a big mistake!

So since pchar is a pointer to char type we must use it like a pointer. There are two ways:

Code: Select all

*pchar = 5;
pchar[0] = 5;
*(pchar + 1) = 22;
pchar[1] = 22;
Both ways do the exact same thing; one is just shorthand for the other. *(pchar + 0) = 5 is the same as pchar[0] = 5. The star in front of the pointer tells the compiler to dereference the pointer, which means it will use the pointer's value as the address of data of type char. Take a look at:

Code: Select all

char ***a;
***a = 5;
This will cause the compiler to emit instructions to tell the processor to:

Code: Select all

movl -4(%ebp), %eax      # load a
movl (%eax), %eax        # load *a
movl (%eax), %eax        # load **a
movb $5, (%eax)          # ***a = 5
It works like a pointer to a pointer to a pointer to type char. So it expects three different locations of four bytes each, where each location contains the address of the next, and the last one contains the address of the actual data of type char.

Just Using A Pointer
Let's say you do something like:

Code: Select all

char *buffer_1;
char *buffer_2;
And you start using these. What actually happens? Well, memory in a computer is constantly being used, filled, erased, and all sorts of things. So there is no way to tell what the initial values of these pointer-to-char local variables are. You remember that a pointer is four bytes on the i386+. Well, the compiler reserved eight bytes on the stack for them, and there is no telling what values are in these eight bytes. It may just so happen that the variable buffer_1 has the exact same value as buffer_2, in which case both pointers point to the same location. Or they may not.

You have to initialize a pointer to point somewhere valid before using it; otherwise using it could cause the program to terminate from an exception. It could also overwrite data or code in your program, and that's not good unless you do it on purpose, and in this case you have no control. So we have two ways to initialize pointers under normal circumstances.

Malloc (C/C++ Language)

Code: Select all

buffer_1 = malloc(sizeof(char) * 100);
buffer_2 = malloc(sizeof(char) * 100);
New (C++ Language)

Code: Select all

buffer_1 = new char[100];
buffer_2 = new char[100];
Casting

Code: Select all

buffer_1 = (char*)0x91838;
buffer_2 = (char*)0x99382;
The casting really does us no good, but I wanted to show it to you. The reason it does us no good is that I just used random numbers, and you would be doing the same. We have no idea where that is in memory or whether we can even write something there. So the best way is to use malloc or new. Under C (not C++) you cannot use the new operator, as it does not exist. You can however use malloc in C. The new operator works with any type, and with classes it does a lot of important things, like calling the class's constructor, which malloc will not do. You can use malloc from C++, but you cannot use new in C.

Once you make these calls they will fill the pointer with the address of an area of memory. Both new char[100] and malloc(sizeof(char) * 100) allocate 100 bytes, since a char is one byte in size and 100 times 1 equals 100.

Once this is done you can use your pointers. When you are finished with the memory, release it with free (for malloc) or delete[] (for new[]).

To Fix Your Problem
Try something like:

Code: Select all

char *buffer_1 = (char*)malloc(sizeof(char) * 50);
char *buffer_2 = (char*)malloc(sizeof(char) * 50);
cjhawley001
Member
Posts: 29
Joined: Mon Jun 30, 2008 9:51 am

Re: Really stuck on seperating my buffer into segments

Post by cjhawley001 »

ok, thank you for that lesson, it really sorted a lot of things out, and taught me a lot that i didnt know!

now everything works like it should!

thank you so much for all of your help, and for writing that whole lesson on variables and pointers...it helped so much!

-chris
AJ
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Re: Really stuck on seperating my buffer into segments

Post by AJ »

@kmcguire: Wow - I know all of that is fairly basic C stuff, but I wonder if your nice succinct guide would be useful on the wiki?

Cheers,
Adam