C: Unsigned vs signed, short vs int

suslik · Post by **suslik** » Thu Aug 23, 2012 6:08 am

In the 3rd edition of C Programming Language, Mr Stroustrup said: "Prefer a plain int over a short int or a long int" and "Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea". Why? I did some investigating and found the following:

1) Do not mix unsigned and signed operands in arithmetic expressions: if the result is used in a comparison, you can get a bug. For example:

Code: Select all

#include <stdio.h>

void foo(void)
{
    unsigned int a = 6;
    int b = -20;
    /* b is converted to unsigned int before the addition */
    (a + b > 6) ? puts(">6") : puts("<=6");
}

This function always prints ">6": b is converted to unsigned int, then added to a (the result is a huge value because it is treated as unsigned), and finally compared with 6, which is also converted to unsigned.

To fix this, replace "a + b > 6" with "(int)a + b > 6". But to guard yourself against this annoying bug, always use a signed type for variables that you want to use as numbers (and do all the math you need in the obvious way), and use an unsigned type only if you intend to use the variable as a bitmask.
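
The difference between the two expressions can be sketched as a pair of small functions. This is only an illustration: both helper names are hypothetical, and the second one assumes long long can represent every unsigned int value (true on common 32-bit and 64-bit targets, but not guaranteed by the standard).

```c
/* Hypothetical helper: the comparison exactly as written in the post.
 * b is converted to unsigned int, so a + b wraps modulo UINT_MAX + 1. */
static int mixed_compare(unsigned int a, int b)
{
    return a + b > 6;
}

/* Hypothetical helper: do the arithmetic in a wider signed type.
 * Assumes long long can represent every unsigned int value. */
static int widened_compare(unsigned int a, int b)
{
    return (long long)a + b > 6;
}
```

With a = 6 and b = -20, mixed_compare returns 1 and widened_compare returns 0.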

OK, that is clear to me now.

2) Choose the smallest integer type that can hold all possible values of your variable, i.e. if my variable will be in 0..300 then I should use short; if it will be in 0..65535 then I should use int (not unsigned short! See item 1). There is one exception to this rule: use int (which represents the word size of the CPU) for loop counters and similar (i.e. for variables that are used in loops).
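
One detail that interacts with item 1: operands narrower than int are promoted before any arithmetic happens. The sketch below assumes int can hold every unsigned short value (common, but not required by the standard); the function name is made up for illustration.

```c
/* Hypothetical helper. Where int can hold every unsigned short value,
 * a is promoted to (signed) int, so the sum is computed as a signed
 * int and can be negative. */
static int ushort_sum_exceeds_6(unsigned short a, int b)
{
    return a + b > 6;
}
```

Under that assumption, ushort_sum_exceeds_6(6, -20) yields 0, whereas the same expression with an unsigned int operand yields 1.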

I agree with this. But what about putc(int ch)? In the docs I see "ch will be converted to unsigned char". Why not "putc(unsigned char ch)"? And I often see "outportb(int port, unsigned val)" instead of "outportb(int port, unsigned char val)". The second variant looks more natural to me. Maybe they want to improve performance (to load an unsigned char argument from the stack into a 32-bit register, the compiler needs to generate movzx instead of mov)?

pauldinhqd · Post by **pauldinhqd** » Thu Aug 23, 2012 7:20 am

(1) is an interesting example which may cause a bug. I believe the compiler should raise a warning whenever an operation mixes variables of different types, instead of just implicitly converting the values. However, as far as I know, the CPU doesn't know about the sign; it uses two's complement.

(2) putc(int ch) uses a 32-bit number instead of a byte, possibly for extensibility? For Unicode? (<-- just joking).

And I have another question here:

(3) Why use "char*" to represent a string, instead of "unsigned char*"? Can a character be negative?

turdus · Post by **turdus** » Thu Aug 23, 2012 7:25 am

I think Mr Stroustrup is wrong. Use a signed integer if the represented value can be negative, and use unsigned if only non-negative numbers make sense.
The advice "always use signed int for numbers" leads to funny errors, like negative memory addresses in the E820 map, only 2G of storage capacity instead of 4G, and so on. Bad habit. You should always consider what the variable is for, what it represents, and choose signed/unsigned accordingly.

Owen · Post by **Owen** » Thu Aug 23, 2012 7:32 am

The use of int for characters throughout the C standard library is a matter of history. Remember, K&R C didn't have prototypes.

JamesM · Post by **JamesM** » Thu Aug 23, 2012 8:59 am

turdus wrote:The advice "always use signed int for numbers" leads to funny errors,

Also, signed integers cause more optimisation problems. Zero-extending is more efficient in many cases (can be short-circuited) than sign extending, for example.
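
To make the two terms concrete, here is a small sketch of both widening conversions using the fixed-width types from <stdint.h>. It only shows what each conversion computes; whether one is cheaper than the other depends on the compiler and the target. The function names are hypothetical.

```c
#include <stdint.h>

/* Zero extension: the upper 24 bits of the result are always 0. */
static uint32_t widen_unsigned(uint8_t v)
{
    return v;
}

/* Sign extension: the upper 24 bits are copies of the sign bit of v. */
static int32_t widen_signed(int8_t v)
{
    return v;
}
```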

suslik · Post by **suslik** » Thu Aug 23, 2012 9:22 am

pauldinhqd wrote:Why use "char*" to represent a string, instead of "unsigned char*"? Can a character be negative?

- Because "char - a single byte, capable of holding one character in the local character set" (K&R C). So char *str is the natural way to declare a string.

turdus wrote:Use a signed integer if the represented value can be negative, and use unsigned if only non-negative numbers make sense.

- In this case you can miss a difficult-to-locate error when mixing signed with unsigned in one expression.

Owen wrote:The use of int for characters throughout the C standard library is a matter of history. Remember, K&R C didn't have prototypes.

- I suspect the same, but I have no clear explanation.

Griwes · Post by **Griwes** » Thu Aug 23, 2012 10:08 am

pauldinhqd wrote:Why use "char*" to represent a string, instead of "unsigned char*"? Can a character be negative?

IDK about C at all, but in C++, char, signed char and unsigned char are three distinct types. Standard streams print a char as the character it encodes; signed char and unsigned char are printed as characters too, so convert to int if you want the number.

Love4Boobies · Post by **Love4Boobies** » Thu Aug 23, 2012 10:19 am

First of all, I see no one has made an extremely important point: This thread is about C++, not C (I took the liberty to change the thread's title but the OP still talks about C). They are different languages; C isn't a subset of C++. If your C code even compiles with a C++ compiler, it will likely not mean the same thing because there are many semantic differences between the two languages. That said, my answers will be about C, not C++, because I know the former much better.

suslik wrote:To fix this replace "a + b > 6" with "(int)a + b > 6".

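You can't rely on this in C in general: if a is greater than INT_MAX, the result of the conversion to int is implementation-defined, and if the signed addition overflows, the behavior is undefined.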
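
One way to keep the cast-based fix within defined behavior is to check the ranges before converting and adding. The following is only a sketch; checked_add_ui is a hypothetical name, not a standard function.

```c
#include <limits.h>

/* Hypothetical helper: computes (int)a + b only when a itself fits in
 * int and the signed addition cannot overflow. Returns 1 and stores the
 * sum in *out on success; returns 0 and leaves *out untouched otherwise. */
static int checked_add_ui(unsigned int a, int b, int *out)
{
    if (a > (unsigned int)INT_MAX)
        return 0;                      /* (int)a would be out of range */
    if (b > 0 && (int)a > INT_MAX - b)
        return 0;                      /* signed addition would overflow */
    *out = (int)a + b;
    return 1;
}
```

Because a is non-negative after the first check, a negative b can never push the sum below INT_MIN, so only the upper bound needs testing.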

suslik wrote:2) Choose the smallest integer type that can hold all possible values of your variable, i.e. if my variable will be in 0..300 then I should use short; if it will be in 0..65535 then I should use int (not unsigned short! See item 1). There is one exception to this rule: use int (which represents the word size of the CPU) for loop counters and similar (i.e. for variables that are used in loops).

This generalization is nonsense; it will lead to all sorts of problems. The thing about the loop counter is just as terrible---use whatever you need and don't worry about performance (let the compiler optimize it away; C itself makes no performance requirements). For instance, your loop may require size_t because you're using the counter to index an array, possibly for string processing.
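
As a sketch of the size_t case mentioned above, here are two loops that index a character buffer. The function names are hypothetical; the second loop shows one common way to count down with an unsigned index.

```c
#include <stddef.h>

/* Hypothetical helper: counts occurrences of ch in the first n bytes of s.
 * size_t is the type sizeof yields, so it can index any object. */
static size_t count_byte(const char *s, size_t n, char ch)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == ch)
            count++;
    }
    return count;
}

/* Hypothetical helper: index of the last occurrence of ch, or n if absent.
 * Counting down with an unsigned index: test i-- > 0, because i >= 0
 * would always be true for an unsigned type. */
static size_t last_index_of(const char *s, size_t n, char ch)
{
    for (size_t i = n; i-- > 0; ) {
        if (s[i] == ch)
            return i;
    }
    return n;
}
```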

suslik wrote:I agree with this. But what about putc (int ch)? In docs I see "ch will be converted to unsigned char" Why not "putc (unsigned char ch)"?

The rationale is that putchar/putc/fputc's argument was made to match getchar/getc/fgetc's return value, which is int due to the fact that it may also return EOF. Note that putc does not have the prototype you wrote there.
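
For reference, the standard declarations are int getc(FILE *stream) and int putc(int c, FILE *stream). The sketch below shows why the int type matters on the reading side: the variable has to hold every unsigned char value plus the distinct value EOF. copy_stream is a hypothetical name used only for this example.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out one character at a time.
 * Returns the number of characters copied, or -1 on a write error. */
static long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int c;  /* int, not char: must distinguish EOF from the byte 0xFF */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        n++;
    }
    return n;
}
```

If c were declared as char, a byte with value 0xFF could compare equal to EOF (when char is signed), or EOF could never be detected (when char is unsigned).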

pauldinhqd wrote:(3) Why use "char*" to represent a string, instead of "unsigned char*"? Can a character be negative?

Yes. Also, note that char is not the same as signed char. It can refer to either a signed or an unsigned type, depending on the C implementation (e.g., GCC allows for both). This is not true for other types, like int.
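
A short sketch of what this means in practice: CHAR_MIN from <limits.h> tells you which choice the implementation made, and code that passes a plain char to the <ctype.h> functions should convert it to unsigned char first. The function names are hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* 1 if plain char is signed in this implementation, 0 if it is unsigned. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: upper-cases a string in place.
 * The cast matters: toupper() expects a value representable as
 * unsigned char (or EOF), and a plain char may be negative. */
static void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```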

turdus wrote:The advice "always use signed int for numbers" leads to funny errors, like negative memory addresses in E820 map

That's true, but the example isn't a good one because you should be using fixed-width types there to begin with, such as uint32_t.
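
As a sketch of that suggestion: the commonly documented E820 entry has a 64-bit base, a 64-bit length and a 32-bit type, so fixed-width unsigned types describe it directly. The struct and function names below are hypothetical, and the layout is an assumption to check against your firmware documentation.

```c
#include <stdint.h>

/* Assumed layout of one E820 memory map entry (20 bytes as reported by
 * the BIOS; the C struct may carry trailing padding unless packed). */
struct e820_entry {
    uint64_t base;    /* physical start address */
    uint64_t length;  /* size of the region in bytes */
    uint32_t type;    /* 1 = usable RAM */
};

/* Hypothetical helper: one-past-the-end address of the region,
 * saturating at UINT64_MAX if base + length would wrap around. */
static uint64_t e820_end(const struct e820_entry *e)
{
    if (e->length > UINT64_MAX - e->base)
        return UINT64_MAX;
    return e->base + e->length;
}
```

A region starting at 2 GiB with a length of 2 GiB ends at 0x100000000, a value that neither a 32-bit signed nor a 32-bit unsigned type can hold.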

suslik wrote:- I suspect the same, but I've no clear explanation.

This is possibly (it was never officially explained) the reason why there are no char literals in C (e.g., unlike in C++, 'a' is an int literal). Something like toupper('c') in traditional C (i.e., pre-C89) caused an int to be passed. So as not to break older code, they did not make 'c' a char because, without prototypes, converting the value to the argument's expected type can be problematic.
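
The type of a character constant is easy to observe with sizeof. A minimal sketch follows, with hypothetical function names; it is meant to be compiled as C, since the first function returns a different answer in C++.

```c
/* In C, a character constant such as 'a' has type int. */
static int char_constant_has_int_size(void)
{
    return sizeof 'a' == sizeof(int);
}

/* A char object, by contrast, always has size 1. */
static int char_object_has_size_one(void)
{
    char c = 'a';
    return sizeof c == 1;
}
```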

EDIT: Damn, Griwes made the comment about char's default signedness as I was typing this response.

OSwhatever · Post by **OSwhatever** » Thu Aug 23, 2012 6:25 pm

C++ is evidence enough to show that Mr Stroustrup isn't always right. In low level programming, using unsigned integer is usually important as indexes, offsets and addresses cannot be negative and therefore I tend to use unsigned integers quite often. Basically, where it is natural to use unsigned. Now because C and C++ default constants in expressions as signed unless you explicitly define them as unsigned, you have to look out. Often it works without paying attention but sometimes you can get into trouble because of it.
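
A typical place where the signed default bites is bit manipulation. The sketch below uses UINT32_C from <stdint.h> to force an unsigned constant; bit32 is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: mask with only bit n set, for n in 0..31.
 * UINT32_C(1) makes the constant unsigned; a plain 1 is a signed int,
 * and 1 << 31 overflows a 32-bit int (undefined behavior). */
static uint32_t bit32(unsigned int n)
{
    return UINT32_C(1) << n;
}
```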

I think Mr Stroustrup means mostly that you should use signed in application programming just as Java removed unsigned completely because we mortals cannot deal with the signed/unsigned dilemma.

Love4Boobies · Post by **Love4Boobies** » Fri Aug 24, 2012 1:04 am

In C, you can use negative indices as long as you don't go out of bounds. I don't know about C++, but I suspect it's the same.
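
A minimal sketch of a negative index that stays in bounds; element_before is a hypothetical name.

```c
/* Hypothetical helper: returns the element just before *p.
 * Valid only if p points past the first element of an array,
 * since p[-1] means *(p - 1). */
static int element_before(const int *p)
{
    return p[-1];
}
```

Calling it with a pointer to the first element of an array would read out of bounds, which is undefined behavior.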

cxzuk · Post by **cxzuk** » Fri Aug 24, 2012 5:22 am

If C++ is a superset of C, why is C not considered a subset?

Love4Boobies · Post by **Love4Boobies** » Fri Aug 24, 2012 5:35 am

C++ is not a superset of C. There is a subset that is common to both but most C code won't compile with a C++ compiler or it won't have the same meaning. Here is a program that illustrates some of the problems (some will result in compilation errors, some in different output) but there are plenty of other examples:

Code: Select all

#include <stdlib.h>

#define FOO 4   /* placeholder array size */

int x[FOO];

int main(void)
{
    struct x { char bar; } *y = malloc(sizeof *y); // Illegal without a cast in C++; good practice in C.

    printf(             // Implicit declaration (accepted by C89, removed in C99); won't work in C++.
        "%zu %zu\n",    // The appropriate format specifier doesn't exist in C++.
        sizeof 'a',     // This is sizeof (int) in C and sizeof (char) in C++.
        sizeof (x)      // This is the size of the array in C and the size of the struct type in C++.
    );

    free(y);
    return 0;
}

Owen · Post by **Owen** » Fri Aug 24, 2012 6:44 am

Code: Select all

        "%zu %zu\n",    // The appropriate format specifier doesn't exist in C++.

That is true in C++98/C++03.

C++11 doesn't define printf itself, but specifies that its semantics are those of C99's printf.
