GCC UTF16

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt, post here.
User avatar
lemonyii
Member
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

GCC UTF16

Post by lemonyii »

Hello!
I'm thinking about using UTF-16 as the internal encoding of my OS, but after a long time I found that GCC doesn't like it, and I hate GCC for this :?
In VS we can use the prefix 'L' to declare that a string is UTF-16, e.g. L"abc", but not in GCC.
I found that GCC has a 'u' prefix that acts like 'L', e.g. u"abc".
But I still can't get it to work, even when I add -std=gnu99. What's more, I don't like that flag, so I hope to write a macro to replace 'u'.
I'd like to know how 'u' or 'L' works, and whether it is handled at compile time (then 'u' is worth considering) or at run time (forget it, I don't want such a function running in my kernel). I'm not good at the preprocessor, so I hope someone can help.
I use GCC 4.4.1.
Problem: how does 'u' work? How do I write a macro converting ASCII to UTF-16?
thx
Enjoy my life!------A fish with a tattooed retina
User avatar
qw
Member
Member
Posts: 792
Joined: Mon Jan 26, 2009 2:48 am

Re: GCC UTF16

Post by qw »

Tried it, no problem.

DJGPP and MinGW both accept wide characters L'X' (even in non-C99 mode). A wide character is 2 bytes there, so it may be UCS-2 or UTF-16; I haven't tested that.

It's probably dependent on the target environment your compiler is built for.
User avatar
lemonyii
Member
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: GCC UTF16

Post by lemonyii »

During these hours I read and tried some more.
Yes, L"XXX" really is supported, and in a Windows environment it is 16 bits, but 32 bits on Linux.
Now I'm trying to make it 16 bits on Linux.
It seems that GCC has supported UTF-16 since 4.4, but I don't know how to use it without changing too much.
It also seems that u"xxx" is designed for exactly this, but it doesn't work on my machine :|
Still trying.
thx
Enjoy my life!------A fish with a tattooed retina
User avatar
Combuster
Member
Member
Posts: 9301
Joined: Wed Oct 18, 2006 3:45 am
Libera.chat IRC: [com]buster
Location: On the balcony, where I can actually keep 1½m distance
Contact:

Re: GCC UTF16

Post by Combuster »

I've tried out a bunch of cross-compilers; u"..." is unsupported in 3.4.4 and 4.2.2 as well, and wide strings seem to default to 32 bits in all cases. I even tried the FreeBASIC compiler, and it can only output one wide-character encoding per run (it can be changed by passing -target windows or -target linux, which in turn has a bunch of other unwanted side effects :( ).

I suggest you stick to wchar and wctype - they are meant to cover for the variation in encodings compilers use. If you really need UTF-16, you'd have to convert to it. There's no reason to expect GCC or any other compiler to adopt that encoding in the next release.
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]
User avatar
lemonyii
Member
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: GCC UTF16

Post by lemonyii »

Yeah...
But there is no reason to store so many zero bytes in every string, or to use UTF-8 and slow down my system, right?
How I wish I could get u"xxx" to work!
Combuster wrote:but it can be changed by passing -target windows or -target linux, which in turn has a bunch of other unwanted side effects :(
Yeah, that's what I care about most.
I've been searching for the L"xxx" macro prototype for a long time. I think I can write one myself or copy it from somewhere.
But the problem is, searching for 'L' as the keyword gives so many results!
UTF16, UTF16!
thx!
Enjoy my life!------A fish with a tattooed retina
User avatar
qw
Member
Member
Posts: 792
Joined: Mon Jan 26, 2009 2:48 am

Re: GCC UTF16

Post by qw »

I'm afraid you have to configure your own GCC build. That's not something I can help you with, unfortunately.

EDIT: Strike this, I see you already found your answer.
Last edited by qw on Mon May 03, 2010 3:40 am, edited 1 time in total.
User avatar
lemonyii
Member
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: GCC UTF16

Post by lemonyii »

http://blogs.oracle.com/ezannoni/2008/0 ... n_gcc.html
This URL is why I have been searching all day.
The reason why I use UTF-16 and not UTF-8 is that English is not my native language (and that's also why I search so slowly). UTF-16 will greatly improve the speed of indexing the font set in my future GUI and limit memory usage. Obviously, if not for historical reasons, UNIX and Linux would also have chosen UTF-16 like Windows NT, right?
So I really hope I can use UTF-16.
The URL above will keep me searching.
help
thx
merci [French: thanks]
谢谢! [Chinese: thanks!]
Enjoy my life!------A fish with a tattooed retina
User avatar
lemonyii
Member
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: GCC UTF16

Post by lemonyii »

Hobbes wrote:I'm afraid you have to configure your own GCC build.
I think... unnecessarily!
The URL above proves that GCC is not so weak.
And rebuilding is a really tough thing, which I have never done.
I will keep searching.
thanks!
Enjoy my life!------A fish with a tattooed retina
User avatar
Owen
Member
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom
Contact:

Re: GCC UTF16

Post by Owen »

Combuster wrote:I suggest you stick to wchar and wctype - they are meant to cover for the variation in encodings compilers use. If you really need UTF-16, you'd have to convert to it. There's no reason to expect GCC or any other compiler to adopt that encoding in the next release.
Please. C's Unicode/l10n/i18n support is about the worst implementation of it I have ever seen. It is fundamentally broken.

And all the decent internationalization libraries use UTF-16.
User avatar
lemonyii
Member
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: GCC UTF16

Post by lemonyii »

So, do you have some good advice?
Say, how to use UTF-16 in GCC?
thx
Enjoy my life!------A fish with a tattooed retina
User avatar
JamesM
Member
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom
Contact:

Re: GCC UTF16

Post by JamesM »

Owen wrote:
Combuster wrote:I suggest you stick to wchar and wctype - they are meant to cover for the variation in encodings compilers use. If you really need UTF-16, you'd have to convert to it. There's no reason to expect GCC or any other compiler to adopt that encoding in the next release.
Please. C's Unicode/l10n/i18n support is about the worst implementation of it I have ever seen. It is fundamentally broken.

And all the decent internationalization libraries use UTF-16.
Unicode characters are 20 bits long. For future compatibility I'd use UTF32 as my internal representation. Fixed width AND can store the entire unicode character set.

That's one thumbs-up from me!

lemonyii: GCC is designed for UNIX environments that do not use UTF-16. The entire userspace was built around ASCII, and UTF-8 is fully backwards compatible with ASCII. That's not going to change any time soon - which is why I'm not surprised there is no UTF-16 support.

To be more helpful, I would create a small script in the language of your choice that recognises a u"..." sequence and transforms it into the byte-ified equivalent in UTF-16. For example, in little-endian UTF-16:

u"ab" -> "\x61\x00\x62\x00"

('a' is U+0061 and 'b' is U+0062, so each ASCII character becomes one 16-bit code unit.)
User avatar
lemonyii
Member
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: GCC UTF16

Post by lemonyii »

Maybe... :idea: UTF-32 is a good choice!
Think about it: it will waste about 1 MB in my kernel when it grows to about 4 MB (several years later), and about 20 MB in my system at about 100 MB.
That's really nothing for a 64-bit system with more than 4 GB of memory. [-o<
But it will really take some time to accept this.
I am hesitating...
Thank you all!

Enjoy my life!------A fish with a tattooed retina
User avatar
Owen
Member
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom
Contact:

Re: GCC UTF16

Post by Owen »

JamesM wrote: Unicode characters are 20 bits long. For future compatibility I'd use UTF32 as my internal representation. Fixed width AND can store the entire unicode character set.
Unicode characters are of variable (potentially unbounded) length, in the form of a base character plus combining marks. In UCS-4 they are still variable length. Additionally, Unicode warrants that no character will ever be introduced which cannot be represented in UTF-16.

Unicode scalar values, however, are actually 21 bits: one Basic Multilingual Plane plus 16 "astral planes", each plane comprising 65536 scalar values, some of which are reserved for special purposes.

As I see it, you are going to be doing one of the following:
  • Binary string equality checks. These ignore character sets.
  • String sorts and linguistic/uniform collations. These need to work with full characters; whatever UTF you use, you are going to have to interpret it.
  • Rendering text to a graphic. Whether using UCS-4 or UTF-16, you're going to have a big state machine here.
For the first, UTF-16 uses less memory bandwidth on average (astral-plane characters are very rare), which compensates for the extra instructions required and in general produces a speedup. For the second, everything is about equal, because you're going to have to reference a mass of tables either way.
cyr1x
Member
Member
Posts: 207
Joined: Tue Aug 21, 2007 1:41 am
Location: Germany

Re: GCC UTF16

Post by cyr1x »

Maybe you're looking for the '-fshort-wchar' compiler switch?
User avatar
lemonyii
Member
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: GCC UTF16

Post by lemonyii »

Thank you so much, but why didn't you tell me earlier?
I had been using UTF-32 for some time, but I turned it all into UTF-16 just now.
Luckily it was not too much work. It is so easy, and it will spare me a lot of memory =D>
thx
Enjoy my life!------A fish with a tattooed retina