Splitting a Integer

srg

Splitting a Integer

Post by srg »

Hi

What is the best way to split an integer into two shorts? Or to split an integer into 4 chars? This is in C.

Thanks
srg
Nychold

Re:Splitting a Integer

Post by Nychold »

By using pointers or bitmasking and bitshifting. Exactly how to split them up would really depend on the machine you're working with (big endian/little endian). For instance:

Code: Select all

// Assuming:
// int is defined as a 32-bit integer
// short is defined as a 16-bit integer
// char is defined as an 8-bit integer
int someInteger=0x12345678;
short *someShort=(short *)&someInteger;
char *someChar=(char *)&someInteger;

// Big Endian:
//  someShort[0] = 0x1234
//  someShort[1] = 0x5678
//  someChar[0] = 0x12
//  someChar[1] = 0x34
//  someChar[2] = 0x56
//  someChar[3] = 0x78
//
// Little Endian:
//  someShort[0] = 0x5678
//  someShort[1] = 0x1234
//  someChar[0] = 0x78
//  someChar[1] = 0x56
//  someChar[2] = 0x34
//  someChar[3] = 0x12

short shortValue[2];
char charValue[4];

shortValue[0]=(short)(someInteger & 0xFFFF);
shortValue[1]=(short)((someInteger >> 16) & 0xFFFF);
charValue[0]=(char)(someInteger & 0xFF);
charValue[1]=(char)((someInteger >> 8) & 0xFF);
charValue[2]=(char)((someInteger >> 16) & 0xFF);
charValue[3]=(char)((someInteger >> 24 ) & 0xFF);

// Both Big Endian and Little Endian:
//  shortValue[0] = 0x5678
//  shortValue[1] = 0x1234
//  charValue[0] = 0x78
//  charValue[1] = 0x56
//  charValue[2] = 0x34
//  charValue[3] = 0x12
It really all depends on exactly what you want to do with it.
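
As a rough sketch of the shift-and-mask route, assuming a C99 compiler with <stdint.h>, the same split can be written with unsigned fixed-width types so that neither the size of int nor sign extension gets in the way. The helper names low_half, high_half and join_halves are made up for this illustration and do not come from the code above:

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Lower 16 bits of v, regardless of host byte order. */
static uint16_t low_half(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}

/* Upper 16 bits of v, regardless of host byte order. */
static uint16_t high_half(uint32_t v)
{
    return (uint16_t)(v >> 16);
}

/* Inverse operation: rebuild the 32-bit value from its two halves. */
static uint32_t join_halves(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | (uint32_t)low;
}
```

With 0x12345678 as input, high_half gives 0x1234 and low_half gives 0x5678 on both big-endian and little-endian machines, because shifts work on the value and not on its layout in memory.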
srg

Re:Splitting a Integer

Post by srg »

Oh well S** it, I'm going to rewrite it in NASM

srg
Schol-R-LEA

Re:Splitting a Integer

Post by Schol-R-LEA »

Actually, there is a simpler way to do this, which works in both C and assembly: just overlap the variables. In C, you can do this with a union, like this:

Code: Select all

#define BYTE_SEX 1   /* 0 == big-endian, 1 == little-endian */

#if BYTE_SEX == 0
#define LSW(x)  ((x).half[1])
#define MSW(x)  ((x).half[0])
/* the most significant word is the upper part of the integer, and the least significant word is the lower part */
#else
#define LSW(x)  ((x).half[0])
#define MSW(x)  ((x).half[1])
#endif

union 
{
    int whole;
    short half[2];
} split_int;

short upper, lower; 

split_int.whole = 0x12345678;

upper = MSW(split_int);
lower = LSW(split_int);
Note that Intel x86 systems are little-endian, which is why I have BYTE_SEX set to 1 in this example. If you know that the code will only run on an x86 system (a bad assumption unless it is system code), you can do away with the macros and just use

Code: Select all

upper = split_int.half[1];
lower = split_int.half[0];
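
If setting BYTE_SEX by hand is a worry, one possible variation is to probe the byte order at run time and pick the union index from that. This is only a sketch, assuming C99 and <stdint.h>; the names host_is_little_endian, split32, upper_half and lower_half are invented here and are not part of the code above:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise.
   It looks at the first byte in memory of a 16-bit value holding 1. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Same overlap idea as the union above, with fixed-width members. */
union split32 {
    uint32_t whole;
    uint16_t half[2];
};

/* Most significant 16 bits of v, read through the union. */
static uint16_t upper_half(uint32_t v)
{
    union split32 s;
    s.whole = v;
    return host_is_little_endian() ? s.half[1] : s.half[0];
}

/* Least significant 16 bits of v, read through the union. */
static uint16_t lower_half(uint32_t v)
{
    union split32 s;
    s.whole = v;
    return host_is_little_endian() ? s.half[0] : s.half[1];
}
```

The probe costs a branch per call unless the compiler folds it away, so the compile-time macro is still the leaner choice when the target is known. Reading a union member other than the one last written is fine in C, but not in C++.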
In x86 assembly it is even easier: just define two adjacent labelled words, then save the whole dword value at the first of them. You can then access the LSW with the first label and the MSW with the second. Again in NASM, this code will have the same result as the code above:

Code: Select all

[section .code]
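mov ebx, [x]     ; load the dword value (assuming x labels a dword in memory)
mov [LSW], ebx   ; the 32-bit store fills both LSW and MSW
mov ax, [MSW]
mov bx, [LSW]  ; not strictly necessary here, 
; but demonstrates the way you could do it otherwise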

[section .data]
LSW dw 0
MSW dw 0
which leaves you with BX == LSW(x) and AX == MSW(x).

An even easier, and probably faster (I haven't tested it) way to get the same result is to load the variable into a 32-bit register, copy the lower half (which is the least significant word) into a different 16-bit half-register, then shift the upper half into the lower half of the first register. In NASM, this would be:

Code: Select all

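   mov eax, [x]    ; load the dword value (assuming x labels a dword in memory)
   mov bx, ax      ; BX = least significant word
   shr eax, 16     ; AX = most significant word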
The downside of this is that if you need to save the value to memory, you need to take two extra steps, whereas with the former approach, once it is saved you can access the two half values or the whole value directly from memory whenever you wish.

Any of these approaches can be used to extract individual bytes as well, just as long as you keep the byte order in mind.
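
To make the byte-order caveat concrete, here is a small C sketch (assuming C99 and <stdint.h>; the function names are made up for this example) that extracts the four bytes in two ways. The first copies them exactly as they sit in memory, so the order depends on the host. The second uses shifts, so out[0] is always the least significant byte:

```c
#include <stdint.h>
#include <string.h>

/* Copies the bytes as they are stored in memory.
   Little-endian hosts put the least significant byte in out[0],
   big-endian hosts put the most significant byte there. */
static void bytes_in_memory_order(uint32_t v, unsigned char out[4])
{
    memcpy(out, &v, sizeof v);
}

/* Extracts bytes by value, so the result is the same on any host:
   out[0] is the least significant byte, out[3] the most significant. */
static void bytes_by_significance(uint32_t v, unsigned char out[4])
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (8 * i)) & 0xFFu);
}
```

For 0x12345678 the shift version always yields 78 56 34 12, while the memory-order version yields 78 56 34 12 on a little-endian machine and 12 34 56 78 on a big-endian one.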
srg

Re:Splitting a Integer

Post by srg »

I basically used the latter approach in a function I wrote in nasm.