GCC UTF16

cyr1x · Post by **cyr1x** » Fri May 07, 2010 11:58 am

Great to hear it worked. I was in your position once, too. I was looking for that damned thing for like 2 weeks.

lemonyii wrote:but why don't you tell me earlier?

I didn't notice this thread earlier

Owen · Post by **Owen** » Fri May 07, 2010 12:35 pm

Important note: short wchar is not conforming to the C standard

Combuster · Post by **Combuster** » Fri May 07, 2010 12:58 pm

Owen wrote:Important note: short wchar is not conforming to the C standard

I'm sorry but that's just plain wrong

ISO C99 standard wrote:wchar_t
which is an integer type whose range of values can represent distinct codes for all
members of the largest extended character set specified among the supported locales; the
null character shall have the code value zero.
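A minimal sketch of how one might check what a given build actually provides, assuming "short wchar" refers to GCC's `-fshort-wchar` option (which makes `wchar_t` a 16-bit type). The helper names `wchar_bits` and `wchar_can_hold` are made up for illustration; only `WCHAR_MAX`, `CHAR_BIT` and `wchar_t` come from the standard headers.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

/* Width of wchar_t in bits on this build (hypothetical helper).
 * The value is expected to change when building with -fshort-wchar. */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* True if a single wchar_t can hold every code value up to
 * max_code, i.e. the largest character of the "largest extended
 * character set" the implementation claims to support. */
int wchar_can_hold(uint32_t max_code)
{
    return (uintmax_t)WCHAR_MAX >= (uintmax_t)max_code;
}
```

Calling `wchar_can_hold(0xFFFFu)` versus `wchar_can_hold(0x10FFFFu)` shows the difference the two sides are arguing about: whether the supported character set stops at the 16-bit range or covers all of Unicode.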

Owen · Post by **Owen** » Fri May 07, 2010 2:26 pm

OK, so you're declaring your supported locales to not include Unicode?

Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)

Combuster · Post by **Combuster** » Fri May 07, 2010 3:31 pm

typedef short int wchar_t implies no unicode support?

Owen · Post by **Owen** » Fri May 07, 2010 3:32 pm

ISO C99 standard wrote:wchar_t
which is an integer type whose range of values can represent distinct codes for all
members of the largest extended character set specified among the supported locales; the
null character shall have the code value zero.

16 bits cannot represent all Unicode characters - the minimum integer size capable of doing so is 21-bit, since the highest code point is U+10FFFF (and therefore minimum practical size is 32-bit)
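The arithmetic behind that size argument can be sketched as follows. The highest code point Unicode defines is U+10FFFF, which needs 21 bits; the function names here are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Highest code point Unicode defines (end of plane 16). */
#define UNICODE_MAX 0x10FFFFu

/* Number of bits needed to store v as an unsigned value (0 needs 0 bits). */
unsigned bits_needed(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* True if an unsigned integer of `width` bits can hold every code point. */
int holds_all_code_points(unsigned width)
{
    assert(width > 0);
    return width >= bits_needed(UNICODE_MAX);
}
```

`bits_needed(UNICODE_MAX)` evaluates to 21, so a 16-bit unit falls short and the next common integer width that fits is 32 bits.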

Combuster · Post by **Combuster** » Fri May 07, 2010 3:42 pm

Never heard of Unicode 3.0?

Point here is, 16-bit unicode supports more than you'll find being used in your OS (at least, for those who lack asian influences). And I think you might want to be a bit more careful with stating popular belief as fact. The world isn't that black and white.

qw · Post by qw » Fri May 07, 2010 11:49 pm

Owen wrote:16 bits cannot represent all Unicode characters - the minimum integer size capable of doing so is 21-bit (and therefore minimum practical size is 32-bit)

UTF-16 can, though I understand that UTF-16 does not conform to the C standard.
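For reference, UTF-16 reaches code points above U+FFFF by splitting them across two 16-bit code units (a surrogate pair). A minimal sketch of the encoding step, with `utf16_encode` as a hypothetical name rather than any library's API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode scalar value as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written,
 * or 0 if cp is not a scalar value (a surrogate or above U+10FFFF).
 */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return 0;
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;                    /* fits in one unit */
        return 1;
    }
    cp -= 0x10000u;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800u | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00u | (cp & 0x3FFu)); /* low surrogate: low 10 bits */
    return 2;
}
```

For U+20000, the first code point of the Supplementary Ideographic Plane, this yields the pair 0xD840 0xDC00. That is why a 16-bit `wchar_t` holding UTF-16 can no longer treat one array element as one character.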

lemonyii · Post by **lemonyii** » Sat May 08, 2010 1:33 am

qw wrote:UTF-16 can, though I understand that UTF-16 does not conform to the C standard.

I agree with the first part of this sentence, but not the latter part.
What is a standard?
UNIX? But Windows does not follow it.
Windows? But the X world never appreciated it.
A standard is whatever satisfies our needs and is really used, like TCP/IP, not technical papers like OSI.
Standards usually solve many problems but also cause a lot of new ones.
To me, a standard is whatever I like best and consider applicable.

qw · Post by qw » Sat May 08, 2010 4:10 am

I mean the ISO/IEC 9899:1999 C standard.

Owen · Post by **Owen** » Sat May 08, 2010 4:22 am

Combuster wrote:Never heard of Unicode 3.0?

Point here is, 16-bit unicode supports more than you'll find being used in your OS (at least, for those who lack asian influences). And I think you might want to be a bit more careful with stating popular belief as fact. The world isn't that black and white.

The OP is Chinese. A significant quantity of CJK characters (40,000) are in the Supplementary Ideographic Plane. These characters, while not common, do crop up often (Particularly in names)

Secondly: Unicode 2.0 introduced the supplementary planes. Unicode 1.0 is positively ancient, and implementing such a limited subset of Unicode is positively stupid.

And this all gets even worse when one considers that C1X:

- Includes UTF-16 and UCS-4 support in the forms of char16_t and char32_t
- Requires these be convertible to wchar_t
- Requires that wchar_t be able to represent any single Unicode scalar value as a single character
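A small sketch of what the char16_t and char32_t types look like in use, assuming a C11-era compiler that ships `<uchar.h>` and encodes `u"..."` literals as UTF-16 and `U"..."` literals as UTF-32 (the standard signals this with `__STDC_UTF_16__` and `__STDC_UTF_32__`). The counting helpers are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Count char16_t code units before the terminating zero. */
size_t u16_units(const char16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Count char32_t code units before the terminating zero.
 * With UTF-32 each unit is exactly one code point. */
size_t u32_units(const char32_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Under those assumptions, `u16_units(u"\U00020000")` is 2 while `u32_units(U"\U00020000")` is 1: the same supplementary-plane character takes a surrogate pair in char16_t but a single element in char32_t.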

In other words: The only way to sanely do things is to make wchar_t 32-bits.

To the OP: It's a shame C's internationalization handling is a mess; I empathize with you. My suggestion for Unicode handling is ICU, which works very well and is highly competent. I will admit though that it works much better from C++ than from C.

lemonyii · Post by **lemonyii** » Sat May 08, 2010 8:31 pm

i browsed ICU. i will read documents later.
thx

Solar · Post by **Solar** » Wed May 12, 2010 3:50 am

Owen wrote:OK, so you're declaring your supported locales to not include Unicode?

Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)

The onboard toolbox of C99 does not allow for proper and full support of locales, no matter what you define wchar_t to or whether you support unicode.

Case in point:

toupper( 'ß' );

The correct answer would be "SS", but toupper() (as well as its wide counterpart towupper()) can only return one character.
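To illustrate the shape of the problem: a case-mapping interface that can express ß to "SS" has to return a sequence, not a single character. The sketch below uses a hypothetical `upper_full` function that covers only ASCII a-z and U+00DF; a real implementation would need the full Unicode special-casing data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write the full uppercase mapping of cp into out
 * and return the number of code points written (at most 2 here).
 * Only U+00DF and ASCII a-z are handled in this sketch.
 */
size_t upper_full(uint32_t cp, uint32_t out[2])
{
    assert(out != NULL);
    if (cp == 0x00DFu) {              /* LATIN SMALL LETTER SHARP S */
        out[0] = 0x53u;               /* 'S' */
        out[1] = 0x53u;               /* 'S' */
        return 2;
    }
    if (cp >= 0x61u && cp <= 0x7Au) { /* ASCII 'a'..'z' */
        out[0] = cp - 0x20u;
        return 1;
    }
    out[0] = cp;                      /* everything else passes through */
    return 1;
}
```

The one-in, one-out signature of toupper() and towupper() has no room for the second 'S', which is the limitation described above.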

I am sure there are more examples in other languages.

I'm looking forward to C1X introducing Unicode support into the standard, but you can only get so close with C99.

Brynet-Inc · Post by **Brynet-Inc** » Wed May 12, 2010 8:25 am

I recommend a nice warm cup of ASCII, okey doke?

Owen · Post by **Owen** » Wed May 12, 2010 9:20 am

Solar wrote:
Owen wrote:OK, so you're declaring your supported locales to not include Unicode?

Have fun with that. (Particularly when UTF-16/32 are most likely going to be incorporated into C1X)
The onboard toolbox of C99 does not allow for proper and full support of locales, no matter what you define wchar_t to or whether you support unicode.

Case in point:
toupper( 'ß' );
The correct answer would be "SS", but toupper() (as well as its wide counterpart towupper()) can only return one character.

I am sure there are more examples in other languages.

I'm looking forward to C1X introducing Unicode support into the standard, but you can only get so close with C99.

I never said that C's Unicode handling was sane; much to the contrary. There is also the issue that for many locales "upper" and "lower" cases make no sense!

However, as rudimentary and broken as C's Unicode support is, implementing it in non-compliant ways is not going to help you if/when it becomes decent.

And until then (And I expect, for many purposes, for long after that), for Unicode the best option is ICU.

OSDev.org
