Re: GCC UTF16

Posted: Fri May 07, 2010 11:58 am
by cyr1x
Great to hear it worked :). I was in your position once, too. I was looking for that damned thing for like 2 weeks.
lemonyii wrote:but why don't you tell me earlier?
I didn't notice this thread earlier :(

Re: GCC UTF16

Posted: Fri May 07, 2010 12:35 pm
by Owen
Important note: short wchar is not conforming to the C standard

Re: GCC UTF16

Posted: Fri May 07, 2010 12:58 pm
by Combuster
Owen wrote:Important note: short wchar is not conforming to the C standard
I'm sorry but that's just plain wrong
ISO C99 standard wrote:wchar_t
which is an integer type whose range of values can represent distinct codes for all
members of the largest extended character set specified among the supported locales; the
null character shall have the code value zero.

Re: GCC UTF16

Posted: Fri May 07, 2010 2:26 pm
by Owen
OK, so you're declaring your supported locales to not include Unicode?

Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)

Re: GCC UTF16

Posted: Fri May 07, 2010 3:31 pm
by Combuster
typedef short int wchar_t implies no unicode support? :roll:

Re: GCC UTF16

Posted: Fri May 07, 2010 3:32 pm
by Owen
ISO C99 standard wrote:wchar_t
which is an integer type whose range of values can represent distinct codes for all
members of the largest extended character set specified among the supported locales; the
null character shall have the code value zero.
16 bits cannot represent all Unicode characters: code points run up to U+10FFFF, so the minimum integer size capable of holding any of them is 21 bits (and the minimum practical size is therefore 32 bits).
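A quick way to check the bit-count arithmetic, as a minimal sketch. The helper name bits_needed is made up for illustration and is not from any library; the only fact assumed is that the largest Unicode code point is U+10FFFF.

```c
#include <stdint.h>

/* Largest Unicode code point. */
#define UNICODE_MAX 0x10FFFFu

/* Hypothetical helper: number of bits needed to store `value`
 * as an unsigned integer (0 needs 0 bits by this definition). */
unsigned bits_needed(uint32_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}
```

Calling bits_needed(UNICODE_MAX) gives the width a single integer needs in order to hold any code point; since no common integer type has that width, 32 bits is the practical choice.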

Re: GCC UTF16

Posted: Fri May 07, 2010 3:42 pm
by Combuster
Never heard of Unicode 3.0?

The point here is that 16-bit Unicode supports more than you'll find being used in your OS (at least for those who lack Asian influences). And I think you might want to be a bit more careful about stating popular belief as fact. The world isn't that black and white.

Re: GCC UTF16

Posted: Fri May 07, 2010 11:49 pm
by qw
Owen wrote:16 bits cannot represent all Unicode characters: code points run up to U+10FFFF, so the minimum integer size capable of holding any of them is 21 bits (and the minimum practical size is therefore 32 bits).
UTF-16 can, though I understand that UTF-16 does not conform to the C standard.
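To make the "UTF-16 can" part concrete, here is a minimal sketch of how a code point above U+FFFF is split into two 16-bit code units (a surrogate pair). The function name utf16_encode is an assumption for illustration, not a standard library call. It also shows why a 16-bit wchar_t is contested: one such wchar_t then holds a code unit, not necessarily a whole character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-16.
 * Writes 1 or 2 code units to out[] and returns the count,
 * or 0 if cp is not a valid scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Out of range, or inside the surrogate block itself. */
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return 0;

    /* Basic Multilingual Plane: one code unit is enough. */
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: 20 payload bits split 10/10. */
    cp -= 0x10000u;
    out[0] = (uint16_t)(0xD800u | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00u | (cp & 0x3FFu)); /* low surrogate  */
    return 2;
}
```

For example, U+20000 (the first code point of the Supplementary Ideographic Plane) comes out as the pair 0xD840 0xDC00, while U+4E2D fits in a single unit.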

Re: GCC UTF16

Posted: Sat May 08, 2010 1:33 am
by lemonyii
qw wrote:UTF-16 can, though I understand that UTF-16 does not conform to the C standard.
I agree with the first part of this sentence, but not the latter part.
What is a standard?
UNIX? But Windows does not follow it.
Windows? But the X world never appreciated it.
A standard is whatever satisfies our needs and is really used, like TCP/IP, not a technical paper like OSI.
Standards usually solve many problems but also cause a lot of new ones.
To me, the standard is whatever I like best and consider applicable.

Re: GCC UTF16

Posted: Sat May 08, 2010 4:10 am
by qw
I mean the ISO/IEC 9899:1999 C standard.

Re: GCC UTF16

Posted: Sat May 08, 2010 4:22 am
by Owen
Combuster wrote:Never heard of Unicode 3.0?

The point here is that 16-bit Unicode supports more than you'll find being used in your OS (at least for those who lack Asian influences). And I think you might want to be a bit more careful about stating popular belief as fact. The world isn't that black and white.
The OP is Chinese. A significant number of CJK characters (roughly 40,000) are in the Supplementary Ideographic Plane. These characters, while individually rare, do crop up regularly (particularly in names).

Secondly: Unicode 2.0 introduced the supplementary planes. Unicode 1.0 is positively ancient, and implementing such a limited subset of Unicode is positively stupid.

And this all gets even worse when one considers that C1X:
  • Includes UTF-16 and UCS-4 support in the form of char16_t and char32_t
  • Requires that these be convertible to wchar_t
  • Requires that wchar_t be able to represent any single Unicode scalar value as a single character
In other words: the only sane way to do things is to make wchar_t 32 bits wide.
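Whether a given toolchain meets the last requirement in the list above can be checked in a few lines. This is only a sketch: wchar_holds_any_scalar is a made-up name, and the check relies solely on WCHAR_MAX from <wchar.h> and on U+10FFFF being the largest scalar value.

```c
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if a single wchar_t can hold every
 * Unicode scalar value (up to U+10FFFF), 0 otherwise. */
int wchar_holds_any_scalar(void)
{
    /* WCHAR_MAX is positive, so widening it to uintmax_t is safe
     * whether wchar_t is signed or unsigned. */
    return (uintmax_t)WCHAR_MAX >= 0x10FFFFu;
}
```

With a 32-bit wchar_t this returns 1; with a 16-bit wchar_t it returns 0, which is the situation being argued about in this thread.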

To the OP: It's a shame that C's internationalization handling is a mess; I empathize with you. My suggestion for Unicode handling is ICU, which works very well and is highly competent. I will admit, though, that it works much better from C++ than from C.

Re: GCC UTF16

Posted: Sat May 08, 2010 8:31 pm
by lemonyii
I browsed ICU. I will read the documentation later.
Thanks.

Re: GCC UTF16

Posted: Wed May 12, 2010 3:50 am
by Solar
Owen wrote:OK, so you're declaring your supported locales to not include Unicode?

Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)
The onboard toolbox of C99 does not allow for proper and full support of locales, no matter what you define wchar_t as or whether you support Unicode.

Case in point:

Code:

toupper( 'ß' );
The correct answer would be "SS", but toupper() (as well as its wide counterpart towupper()) can only return one character.
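To illustrate what a one-to-many mapping needs from an interface, here is a minimal sketch of a function that can return more than one code point. upper_full is a hypothetical helper, not part of C99 or any library; it only knows ASCII and U+00DF, and it assumes an ASCII-compatible execution character set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the uppercase form of code point cp
 * into out[] (room for 2 code points) and returns how many were
 * written. Only ASCII letters and U+00DF are mapped; every other
 * code point is passed through unchanged. */
size_t upper_full(uint32_t cp, uint32_t out[2])
{
    if (cp == 0x00DFu) {            /* sharp s expands to "SS" */
        out[0] = 'S';
        out[1] = 'S';
        return 2;
    }
    if (cp >= 'a' && cp <= 'z') {   /* ASCII: one-to-one mapping */
        out[0] = cp - ('a' - 'A');
        return 1;
    }
    out[0] = cp;                    /* no mapping known: unchanged */
    return 1;
}
```

The point is the signature: because the result goes into a buffer with a length, the sharp s case can yield two code points, which an int-returning toupper() or a wint_t-returning towupper() cannot express.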

I am sure there are more examples in other languages.

I'm looking forward to C1X introducing Unicode support into the standard, but you can only get so close with C99.

Re: GCC UTF16

Posted: Wed May 12, 2010 8:25 am
by Brynet-Inc
I recommend a nice warm cup of ASCII, okey doke?

Re: GCC UTF16

Posted: Wed May 12, 2010 9:20 am
by Owen
Solar wrote:
Owen wrote:OK, so you're declaring your supported locales to not include Unicode?

Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)
The onboard toolbox of C99 does not allow for proper and full support of locales, no matter what you define wchar_t as or whether you support Unicode.

Case in point:

Code:

toupper( 'ß' );
The correct answer would be "SS", but toupper() (as well as its wide counterpart towupper()) can only return one character.

I am sure there are more examples in other languages.

I'm looking forward to C1X introducing Unicode support into the standard, but you can only get so close with C99.
I never said that C's Unicode handling was sane; quite the contrary. There is also the issue that for many locales "upper" and "lower" case make no sense!

However, as rudimentary and broken as C's Unicode support is, implementing it in non-compliant ways is not going to help you if/when it becomes decent.

And until then (and, I expect, for many purposes long after that), the best option for Unicode is ICU.