C: Unsigned vs signed, short vs int

suslik
Member
Posts: 45
Joined: Sun May 27, 2012 1:00 am
Location: Russia

C: Unsigned vs signed, short vs int

Post by suslik »

In the 3rd edition of C Programming Language Mr Stroustrup said: "Prefer a plain int over a short int or a long int" and "Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea". Why? I did some investigating and found the following:

1) Do not mix unsigned and signed operands in arithmetic expressions: if the result is then compared, you can get a bug. For example:

Code: Select all

#include <stdio.h>

void foo(void)
{
    unsigned int a = 6;
    int b = -20;
    /* b is converted to unsigned int here, so a + b wraps to a huge value */
    (a + b > 6) ? puts(">6") : puts("<=6");
}
This function always prints ">6", since b is converted to unsigned and then added to a (the result is a huge value since it is treated as unsigned), and the sum is finally compared with unsigned 6.

To fix this, replace "a + b > 6" with "(int)a + b > 6". But to guard yourself against this annoying bug, always use a signed type for variables that you want to use as numbers (and do all the math you need in the obvious way), and use an unsigned type only if you intend to use the variable as a bitmask.

OK, it is clear to me now.

2) Choose the smallest integer type that can hold all possible values of your variable, e.g. if my variable will be in 0..300 then I should use short; if 0..65535 then I should use int (not unsigned short! See item 1). There is one exception to this rule: use int (which represents the word size of the CPU) for loop counters and similar (i.e. for variables that are used in loops).

I agree with this. But what about putc(int ch)? In the docs I see "ch will be converted to unsigned char". Why not "putc(unsigned char ch)"? And I often see "outportb(int port, unsigned val)" instead of "outportb(int port, unsigned char val)". The second variant looks more natural to me. Maybe they want to improve performance (to get an unsigned char argument from the stack into a 32-bit register, the compiler needs to generate movzx instead of mov)?
pauldinhqd
Member
Posts: 37
Joined: Tue Jul 12, 2011 9:14 am
Location: Hanoi

Re: C: Unsigned vs signed, short vs int

Post by pauldinhqd »

(1) is an interesting example which may cause a bug. I believe the compiler should raise a warning whenever an operation mixes variables of different types, instead of just implicitly converting the values. However, as far as I know, the CPU doesn't know about the sign; it uses two's complement.

(2) putc(int ch) is using a 32-bit number instead of a byte, possibly for extensibility? For Unicode? (<-- just joking).

And I have another point for discussion here:

(3) Why use "char*" to represent a string, instead of "unsigned char*"? Can a character be negative?
Last edited by pauldinhqd on Thu Aug 23, 2012 9:42 am, edited 1 time in total.
AMD Sempron 140
nVidia GTS 450
Transcend DDR2 2x1
LG Flatron L1742SE
turdus
Member
Posts: 496
Joined: Tue Feb 08, 2011 1:58 pm

Re: C: Unsigned vs signed, short vs int

Post by turdus »

I think Mr Stroustrup is wrong. Use a signed integer if the represented value can be negative, and use unsigned if only positive numbers make sense.
The advice "always use signed int for numbers" leads to funny errors, like negative memory addresses in the E820 map, only 2G of storage capacity instead of 4G, and so on. Bad habit. You should always consider what the variable is for, what it represents, and choose signed/unsigned accordingly.
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: C: Unsigned vs signed, short vs int

Post by Owen »

The use of int for characters throughout the C standard library is a matter of history - remember, K&R C didn't have prototypes, so a char argument was promoted to int anyway.
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: C: Unsigned vs signed, short vs int

Post by JamesM »

turdus wrote:The advice "always use signed int for numbers" leads to funny errors,
Also, signed integers cause more optimisation problems. For example, zero-extending is more efficient in many cases (it can be short-circuited) than sign-extending.
suslik
Member
Posts: 45
Joined: Sun May 27, 2012 1:00 am
Location: Russia

Re: C: Unsigned vs signed, short vs int

Post by suslik »

pauldinhqd wrote:why use "char*" to represent a string, instead of "unsigned char*"? can a character be negative?
- Because "char - a single byte, capable of holding one character in the local character set" (K&R C). So char *str is the natural way to declare a string.
turdus wrote:Use signed integer if the represented value can be negative, and use unsigned if only positive numbers make sense.
- In this case you can miss a difficult-to-locate error when mixing signed with unsigned in one expression.
Owen wrote:The use of int for characters throughout the C standard library is a matter of history - remember, K&R C didn't have prototypes
- I suspect the same, but I have no clear explanation.
Griwes
Member
Posts: 374
Joined: Sat Jul 30, 2011 10:07 am
Libera.chat IRC: Griwes
Location: Wrocław/Racibórz, Poland

Re: C: Unsigned vs signed, short vs int

Post by Griwes »

pauldinhqd wrote:why use "char*" to represent a string, instead of "unsigned char*"? can a character be negative?
IDK about C at all, but in C++, char, signed char and unsigned char are three distinct types; the default way to print a "char" in standard streams is to print the character encoded by it; (un)signed char is always treated as a number.
Reaver Project :: Repository :: Ohloh project page
<klange> This is a horror story about what happens when you need a hammer and all you have is the skulls of the damned.
<drake1> as long as the lock is read and modified by atomic operations
Love4Boobies
Member
Posts: 2111
Joined: Fri Mar 07, 2008 5:36 pm
Location: Bucharest, Romania

Re: C++ Unsigned vs signed, short vs int

Post by Love4Boobies »

First of all, I see no one has made an extremely important point: This thread is about C++, not C (I took the liberty to change the thread's title but the OP still talks about C). They are different languages; C isn't a subset of C++. If your C code even compiles with a C++ compiler, it will likely not mean the same thing because there are many semantic differences between the two languages. That said, my answers will be about C, not C++, because I know the former much better.
suslik wrote:To fix this replace "a + b > 6" with "(int)a + b > 6".
You can never do this in C; that overflow has undefined behavior.
suslik wrote:2) Choose the smallest integer type that can hold all possible values of your variable, e.g. if my variable will be in 0..300 then I should use short; if 0..65535 then I should use int (not unsigned short! See item 1). There is one exception to this rule: use int (which represents the word size of the CPU) for loop counters and similar (i.e. for variables that are used in loops).
This generalization is nonsense; it will lead to all sorts of problems. The thing about the loop counter is just as terrible---use whatever you need and don't worry about performance (let the compiler optimize it away; C itself makes no performance requirements). For instance, your loop may require size_t because you're using the counter to index an array, possibly for string processing.
suslik wrote:I agree with this. But what about putc (int ch)? In docs I see "ch will be converted to unsigned char" Why not "putc (unsigned char ch)"?
The rationale is that putchar/putc/fputc's argument was made to match getchar/getc/fgetc's return value, which is int due to the fact that it may also return EOF. Note that putc does not have the prototype you wrote there.
pauldinhqd wrote:(3) Why use "char*" to represent a string, instead of "unsigned char*"? Can a character be negative?
Yes. Also, note that char is not the same as signed char. It can refer to either a signed or an unsigned type, depending on the C implementation (e.g., GCC allows for both). This is not true for other types, like int.
turdus wrote:The advice "always use signed int for numbers" leads to funny errors, like negative memory addresses in E820 map
That's true, but the example isn't a good one because you should be using fixed-width types there to begin with, such as uint32_t.
suslik wrote:- I suspect the same, but I've no clear explanation.
This is possibly (it was never officially explained) the reason why there are no char literals in C (i.e., unlike in C++, 'a' is an int literal). Something like toupper('c') in traditional C (i.e., pre-C89) caused an int to be passed. So as not to break older code, they did not make 'c' a char because, without prototypes, converting the value to the argument's expected type can be problematic.

EDIT: Damn, Griwes made the comment about char's default signedness as I was typing this response.
"Computers in the future may weigh no more than 1.5 tons.", Popular Mechanics (1949)
[ Project UDI ]
OSwhatever
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: C: Unsigned vs signed, short vs int

Post by OSwhatever »

C++ is evidence enough to show that Mr Stroustrup isn't always right. In low-level programming, using unsigned integers is usually important, as indexes, offsets and addresses cannot be negative, so I tend to use unsigned integers quite often, basically wherever it is natural to use unsigned. Now, because C and C++ treat integer constants in expressions as signed unless you explicitly mark them as unsigned, you have to look out. Often it works without paying attention, but sometimes you can get into trouble because of it.

I think Mr Stroustrup mostly means that you should use signed in application programming, just as Java removed unsigned completely because we mortals cannot deal with the signed/unsigned dilemma.
Love4Boobies
Member
Posts: 2111
Joined: Fri Mar 07, 2008 5:36 pm
Location: Bucharest, Romania

Re: C: Unsigned vs signed, short vs int

Post by Love4Boobies »

In C, you can use negative indices as long as you don't go out of bounds. I don't know about C++, but I suspect it's the same.
cxzuk
Member
Posts: 164
Joined: Mon Dec 21, 2009 6:03 pm

Re: C: Unsigned vs signed, short vs int

Post by cxzuk »

If C++ is a superset of C, why is C not considered a subset?
Love4Boobies
Member
Posts: 2111
Joined: Fri Mar 07, 2008 5:36 pm
Location: Bucharest, Romania

Re: C: Unsigned vs signed, short vs int

Post by Love4Boobies »

C++ is not a superset of C. There is a subset that is common to both but most C code won't compile with a C++ compiler or it won't have the same meaning. Here is a program that illustrates some of the problems (some will result in compilation errors, some in different output) but there are plenty of other examples:

Code: Select all

int x[FOO];

int main(void)
{
    struct x { char bar; } *y = malloc(sizeof *y); // Illegal without a cast in C++; good practice in C.

    printf(             // Implicit declaration; won't work in C++.
        "%zu %zu\n",    // The appropriate format specifier doesn't exist in C++.
        sizeof 'a',     // This is sizeof (int) in C and sizeof (char) in C++.
        sizeof (x)      // This is the size of the array in C and the size of the struct type in C++.
    );

    return 0;
}
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: C: Unsigned vs signed, short vs int

Post by Owen »

Code: Select all

        "%zu %zu\n",    // The appropriate format specifier doesn't exist in C++.
In C++98/C++03 ;) C++11 doesn't define printf, but does define its semantics to be equal to those of C99's.