lemonyii wrote:but why don't you tell me earlier?

I didn't notice this thread earlier.
GCC UTF16
Re: GCC UTF16
Great to hear it worked. I was in your position once, too. I was looking for that damned thing for like 2 weeks.
- Owen
- Member
- Posts: 1700
- Joined: Fri Jun 13, 2008 3:21 pm
- Location: Cambridge, United Kingdom
- Contact:
Re: GCC UTF16
Important note: short wchar is not conforming to the C standard
- Combuster
- Member
- Posts: 9301
- Joined: Wed Oct 18, 2006 3:45 am
- Libera.chat IRC: [com]buster
- Location: On the balcony, where I can actually keep 1½m distance
- Contact:
Re: GCC UTF16
Owen wrote:Important note: short wchar is not conforming to the C standard

I'm sorry but that's just plain wrong.
ISO C99 standard wrote:wchar_t
which is an integer type whose range of values can represent distinct codes for all
members of the largest extended character set specified among the supported locales; the
null character shall have the code value zero.
- Owen
Re: GCC UTF16
OK, so you're declaring your supported locales to not include Unicode?
Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)
- Combuster
Re: GCC UTF16
typedef short int wchar_t implies no unicode support?
- Owen
Re: GCC UTF16
ISO C99 standard wrote:wchar_t
which is an integer type whose range of values can represent distinct codes for all
members of the largest extended character set specified among the supported locales; the
null character shall have the code value zero.

16 bits cannot represent all Unicode characters - the minimum integer size capable of doing so is 21-bit (and therefore minimum practical size is 32-bit)
- Combuster
Re: GCC UTF16
Never heard of Unicode 3.0?
Point here is, 16-bit unicode supports more than you'll find being used in your OS (at least, for those who lack asian influences). And I think you might want to be a bit more careful with stating popular belief as fact. The world isn't that black and white.
Re: GCC UTF16
Owen wrote:16 bits cannot represent all Unicode characters - the minimum integer size capable of doing so is 21-bit (and therefore minimum practical size is 32-bit)

UTF-16 can, though I understand that UTF-16 does not conform to the C standard.
Re: GCC UTF16
"UTF-16 can, though I understand that UTF-16 does not conform to the C standard."

I support the first part of this sentence, but not the latter part.
What is "standard"?
UNIX? But Windows does not follow it.
Windows? But the X world never appreciated it.
A standard is whatever satisfies our needs and is really used, like TCP/IP, not technical papers like OSI.
Standards usually solve many problems but cause a lot of problems in turn.
To me, the standard is what I like best and consider applicable.
Enjoy my life!------A fish with a tattooed retina
Re: GCC UTF16
I mean the ISO/IEC 9899:1999 C standard.
- Owen
Re: GCC UTF16
Combuster wrote:Never heard of Unicode 3.0?
Point here is, 16-bit unicode supports more than you'll find being used in your OS (at least, for those who lack asian influences). And I think you might want to be a bit more careful with stating popular belief as fact. The world isn't that black and white.

The OP is Chinese. A significant quantity of CJK characters (40,000) are in the Supplementary Ideographic Plane. These characters, while not common, do crop up often (particularly in names).
Secondly: Unicode 2.0 introduced the supplementary planes. Unicode 1.0 is positively ancient, and implementing such a limited subset of Unicode is positively stupid.
And this all gets even worse when one considers that C1X:
- Includes UTF-16 and UCS-4 support in the forms of char16_t and char32_t
- Requires these be convertible to wchar_t
- Requires that wchar_t be able to represent any single Unicode scalar value as a single character
To the OP: It's a shame C's internationalization handling is a mess; I empathize with you. My suggestion for Unicode handling is ICU, which works very well and is highly competent. I will admit though that it works much better from C++ than from C.
Re: GCC UTF16
I browsed ICU. I will read the documents later.
thx
Re: GCC UTF16
Owen wrote:OK, so you're declaring your supported locales to not include Unicode?
Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)

The onboard toolbox of C99 does not allow for proper and full support of locales, no matter what you define wchar_t to be or whether you support Unicode.
Case in point:
Code:
toupper( 'ß' );
Correct answer would be "SS", but toupper() (as well as its wide counterpart towupper()) can only return one character.
I am sure there are more examples in other languages.
I'm looking forward to C1X introducing Unicode support into the standard, but you can only get so close with C99.
Every good solution is obvious once you've found it.
- Brynet-Inc
- Member
- Posts: 2426
- Joined: Tue Oct 17, 2006 9:29 pm
- Libera.chat IRC: brynet
- Location: Canada
- Contact:
Re: GCC UTF16
I recommend a nice warm cup of ASCII, okey doke?
- Owen
Re: GCC UTF16
Solar wrote:
Owen wrote:OK, so you're declaring your supported locales to not include Unicode?
Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)
The onboard toolbox of C99 does not allow for proper and full support of locales, no matter what you define wchar_t to or whether you support unicode.
Point in case:
Code:
toupper( 'ß' );
Correct answer would be "SS", but toupper() (as well as its wide counterpart towupper()) can only return one character.
I am sure there are more examples in other languages.
I'm looking forward to C1X introducing Unicode support into the standard, but you can only get that close with C99.

I never said that C's Unicode handling was sane; much to the contrary. There is also the issue that for many locales "upper" and "lower" cases make no sense!
However, as rudimentary and broken as C's Unicode support is, implementing it in non-compliant ways is not going to help you if/when it becomes decent.
And until then (And I expect, for many purposes, for long after that), for Unicode the best option is ICU.