GCC UTF16

cyr1x
Member
Posts: 207
Joined: Tue Aug 21, 2007 1:41 am
Location: Germany

Re: GCC UTF16

Post by cyr1x »

Great to hear it worked :). I was in your position once, too. I was looking for that damned thing for like 2 weeks.
lemonyii wrote: But why didn't you tell me earlier?
I didn't notice this thread earlier :(
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: GCC UTF16

Post by Owen »

Important note: a short (16-bit) wchar_t does not conform to the C standard.
Combuster
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance

Re: GCC UTF16

Post by Combuster »

Owen wrote: Important note: a short (16-bit) wchar_t does not conform to the C standard.
I'm sorry, but that's just plain wrong.
ISO C99 standard wrote: wchar_t which is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales; the null character shall have the code value zero.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]

Re: GCC UTF16

Post by Owen »

OK, so you're declaring your supported locales to not include Unicode?

Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)

Re: GCC UTF16

Post by Combuster »

typedef short int wchar_t implies no Unicode support? :roll:

Re: GCC UTF16

Post by Owen »

ISO C99 standard wrote: wchar_t which is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales; the null character shall have the code value zero.
16 bits cannot represent all Unicode characters - the minimum integer size capable of doing so is 22-bit (and therefore minimum practical size is 32-bit)
Contact:

Re: GCC UTF16

Post by Combuster »

Never heard of Unicode 3.0?

Point here is, 16-bit Unicode supports more than you'll find being used in your OS (at least for those who lack Asian influences). And I think you might want to be a bit more careful about stating popular belief as fact. The world isn't that black and white.
qw
Member
Posts: 792
Joined: Mon Jan 26, 2009 2:48 am

Re: GCC UTF16

Post by qw »

Owen wrote: 16 bits cannot represent all Unicode characters - the minimum integer size capable of doing so is 22-bit (and therefore minimum practical size is 32-bit)
UTF-16 can, though I understand that UTF-16 does not conform to the C standard.
lemonyii
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: GCC UTF16

Post by lemonyii »

qw wrote: UTF-16 can, though I understand that UTF-16 does not conform to the C standard.
I agree with the first part of this sentence, but not the latter part.
What is a standard?
UNIX? But Windows does not follow it.
Windows? But the X world never appreciated it.
A standard is what satisfies our needs and is really used, like TCP/IP, not technical papers like OSI.
Standards usually solve many problems, but they also cause a lot of new ones.
To me, the standard is whatever I like best and consider applicable.
Enjoy my life!------A fish with a tattooed retina

Re: GCC UTF16

Post by qw »

I mean the ISO/IEC 9899:1999 C standard.

Re: GCC UTF16

Post by Owen »

Combuster wrote: Never heard of Unicode 3.0?

Point here is, 16-bit Unicode supports more than you'll find being used in your OS (at least for those who lack Asian influences). And I think you might want to be a bit more careful about stating popular belief as fact. The world isn't that black and white.
The OP is Chinese. A significant number of CJK characters (around 40,000) are in the Supplementary Ideographic Plane. These characters, while not common, do crop up regularly (particularly in names).

Secondly: Unicode 2.0 introduced the supplementary planes. Unicode 1.0 is positively ancient, and implementing such a limited subset of Unicode is positively stupid.

And this all gets even worse when one considers that C1X
  • Includes UTF-16 and UCS-4 support in the form of char16_t and char32_t
  • Requires these be convertible to wchar_t
  • Requires that wchar_t be able to represent any single Unicode scalar value as a single character
In other words: The only way to sanely do things is to make wchar_t 32-bits.

To the OP: It's a shame C's internationalization handling is a mess; I empathize with you. My suggestion for Unicode handling is ICU, which works very well and is highly competent. I will admit though that it works much better from C++ than from C.

Re: GCC UTF16

Post by lemonyii »

I browsed ICU. I will read the documentation later.
thx
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: GCC UTF16

Post by Solar »

Owen wrote: OK, so you're declaring your supported locales to not include Unicode?

Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)
The onboard toolbox of C99 does not allow for proper and full support of locales, no matter how you define wchar_t or whether you support Unicode.

Case in point:

toupper( 'ß' );
The correct answer would be "SS", but toupper() (as well as its wide counterpart towupper()) can only return one character.

I am sure there are more examples in other languages.

I'm looking forward to C1X introducing Unicode support into the standard, but you can only get so close with C99.
Every good solution is obvious once you've found it.
Brynet-Inc
Member
Posts: 2426
Joined: Tue Oct 17, 2006 9:29 pm
Libera.chat IRC: brynet
Location: Canada

Re: GCC UTF16

Post by Brynet-Inc »

I recommend a nice warm cup of ASCII, okey doke?
Twitter: @canadianbryan. Award by smcerm, I stole it. Original was larger.

Re: GCC UTF16

Post by Owen »

Solar wrote: The onboard toolbox of C99 does not allow for proper and full support of locales, no matter how you define wchar_t or whether you support Unicode.
I never said that C's Unicode handling was sane; quite the contrary. There is also the issue that for many locales "upper" and "lower" case make no sense!

However, as rudimentary and broken as C's Unicode support is, implementing it in non-compliant ways is not going to help you if/when it becomes decent.

And until then (And I expect, for many purposes, for long after that), for Unicode the best option is ICU.