GCC UTF16
hello!
i'm thinking about using UTF-16 as the internal encoding of my OS, but after a long time i found that gcc doesn't like it, and i hate gcc for this.
in VS we can use the prefix 'L' to declare that a string is UTF-16, e.g. L"abc", but not in gcc.
and i found there is a 'u' prefix that acts like 'L' in gcc, e.g. u"abc".
but i still can't get it to work, even when i add -std=gnu99. what's more, i don't like this argument, so i hope to write a macro to replace 'u'.
i'd like to know how 'u' or 'L' works, and whether it happens at compile time (then 'u' is worth considering) or at run time (forget it, i don't want such a function running in my kernel). i'm not good at the preprocessor, so i hope someone can help.
i use gcc 4.4.1.
problem: how does 'u' work? and how do i write a macro converting ascii to UTF-16?
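to make the question concrete, this is the kind of check i'm doing (plain C; the numbers in the comments are just what i see on each platform, not a promise of any standard):

Code:

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* string literals are encoded by the compiler, at compile time;
       the question is only how wide the units come out */
    const wchar_t w[] = L"abc";

    printf("sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t)); /* 2 on VS/MinGW, 4 on linux gcc */
    printf("sizeof(L\"abc\") = %u\n", (unsigned)sizeof w);       /* (len + 1) * sizeof(wchar_t) */
    return 0;
}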
thx
Re: GCC UTF16
Tried it, no problem.
DJGPP and MinGW both accept wide characters like L'X' (even in non-C99 mode). A wide character is 2 bytes there, so it may be UCS-2 or UTF-16; I haven't tested which.
It's probably dependent on the target environment your compiler is built for.
Re: GCC UTF16
during the last few hours i read and tried some more.
yes, L"XXX" really is supported, and in a windows environment it is 16 bits, but 32 bits on linux.
now i'm trying to make it 16 bits on linux.
it seems gcc has supported utf16 since 4.4, but i don't know how to use it without changing too much.
and it seems the u"xxx" prefix is designed for exactly this, but it doesn't work on my machine.
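here is what i'm trying, going by that (a sketch - i'm only assuming gcc 4.4 defines __CHAR16_TYPE__, since my glibc has no <uchar.h>):

Code:

/* build with: gcc -std=gnu99 -c try_u16.c */
#ifdef __CHAR16_TYPE__
typedef __CHAR16_TYPE__ char16_t;  /* gcc's internal name for the new type */
const char16_t s[] = u"abc";       /* 16-bit units - if u"" is accepted at all */
#else
#error "this gcc does not define __CHAR16_TYPE__"
#endif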
trying.
thx
yes,L"XXX" is really supported.and in windows environment it is 16bits, but 32bits in linux.
now i'm trying to make it 16bits in linux.
it seems that gcc has supported utf16 since 4.4 but i don't know how to use it without change too much.
and it seems that the u"xxx" is designed for this but not work on my machine
trying.
thx
Combuster
Re: GCC UTF16
I've tried out a bunch of cross-compilers: u"..." is unsupported in 3.4.4 and 4.2.2 as well, and L"..." seems to default to 32 bits in all cases. I even tried the FreeBASIC compiler, and it can only output one wide-character width per run (it can be changed by passing -target windows or -target linux, which in turn has a bunch of other unwanted side effects).
I suggest you stick to wchar and wctype - they are meant to cover for the variation in standards between compilers. If you really need UTF-16, you'd have to convert it. There's no reason gcc or any compiler would switch to that standard in the next release.
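If you do convert, the encoder itself is tiny - a minimal sketch, assuming the input is Unicode scalar values (which is what wchar_t holds on Linux):

Code:

#include <stddef.h>

/* Encode one Unicode scalar value as UTF-16 code units.
   Returns the number of units written (1 or 2), or 0 on invalid input. */
static size_t utf16_encode(unsigned long cp, unsigned short out[2])
{
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                 /* surrogate range: not a scalar value */
        out[0] = (unsigned short)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                     /* beyond the Unicode range */
    cp -= 0x10000;                    /* 20 bits remain */
    out[0] = (unsigned short)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (unsigned short)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}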
Re: GCC UTF16
yeah...
but there is no reason to store so many 0s in a string, or to use utf8 and slow my system down, right?
how i hope i can get u"xxx" to work!
Combuster wrote: it can be changed by passing -target windows or -target linux, which in turn has a bunch of other unwanted side effects
yeah, that's what i care about most.
i've been searching for the L"xxx" macro prototype for a long time. i think i can write one myself or copy it from somewhere.
but the problem is, searching with L as the keyword gives far too many results!
UTF16, UTF16!
thx!
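PS: for what it's worth, the VC TEXT()/_T() "macro" i keep running into is just token pasting - the compiler still does the real encoding, so this can't replace 'u' on a compiler that lacks the prefix:

Code:

#include <wchar.h>

#define WIDE(s)  L ## s   /* WIDE("abc") expands to L"abc" */
#define U16(s)   u ## s   /* U16("abc") expands to u"abc" - only if u"" works */

const wchar_t *w = WIDE("abc");

so a pure preprocessor macro can't do the ascii-to-utf16 conversion by itself.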
Re: GCC UTF16
I'm afraid you have to configure your own GCC build. That's not something I can help you with, unfortunately.
EDIT: Strike this, I see you already found your answer.
Re: GCC UTF16
http://blogs.oracle.com/ezannoni/2008/0 ... n_gcc.html
this url is why i have been searching all day.
the reason i use utf16 rather than utf8 is that English is not my native language (and that's also why my searching is so slow). utf16 will greatly improve the speed of indexing the font set in my future GUI and limit the memory it occupies. obviously, if not for historical reasons, UNIX and Linux would also have chosen utf16 like winNT, right?
so, i really hope i can use utf16.
the url above will keep me searching.
help
thx
merci
谢谢! (thank you!)
Re: GCC UTF16
Hobbes wrote: I'm afraid you have to configure your own GCC build.
i think... unnecessarily! the url above proves that gcc is not so weak.
and rebuilding is really a tough thing, which i have never done.
i will keep searching.
thanks!
Owen
Re: GCC UTF16
Combuster wrote: I suggest you stick to wchar and wctype - they are meant to cover for the variation in standards the compiler uses. If you really need UTF-16, you'd have to convert it. There's no reason gcc or any compiler should use that standard in the next release.
Please. C's Unicode/l10n/i18n support is about the worst implementation of it I have ever seen. It is fundamentally broken.
And all the decent internationalization libraries use UTF-16.
Re: GCC UTF16
so, do you have some good advice?
say, how to use UTF16 in gcc?
thx
Re: GCC UTF16
Owen wrote: Please. C's Unicode/l10n/i18n support is about the worst implementation of it I have ever seen. It is fundamentally broken. And all the decent internationalization libraries use UTF-16.
That's one thumbs-up from me!
Unicode characters are 20 bits long. For future compatibility I'd use UTF32 as my internal representation: fixed width AND able to store the entire unicode character set.
lemonyii: GCC is designed for UNIX environments, which do not use UTF-16. The entire userspace was built around ASCII, and UTF-8 is fully backwards compatible with ASCII. That's not going to change any time soon - which is why I'm not surprised there is no UTF16 support.
To be more helpful, I would create a small script in the language of your choice that recognises a u" ... " sequence and transforms it into the byte-ified equivalent in UTF-16. For example:
u"ab" -> "\00\65\00\66"
(my calculations are guesswork as to the UTF-16 representation, but you get the idea)
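A rough cut of such a tool as a plain C filter (assumptions: ASCII-only input on stdin, little-endian UTF-16 output, emitted as a C string literal):

Code:

#include <stdio.h>

/* Reads one line of ASCII and prints a C string literal whose
   bytes are the UTF-16LE encoding, e.g. ab -> "\x61\x00\x62\x00". */
int main(void)
{
    int c;
    putchar('"');
    while ((c = getchar()) != EOF && c != '\n')
        printf("\\x%02x\\x00", (unsigned)c);  /* low byte, then zero high byte */
    puts("\"");
    return 0;
}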
Re: GCC UTF16
maybe... UTF32 is a good choice!
think about it: compared with utf16 it will waste about 1M in my kernel when it grows to about 4M (several years later), and about 20M in my system at about 100M.
that's really nothing for a 64-bit system with more than 4G of memory.
but it will really take some time to accept this.
i am hesitating...
thank you all!
Owen
Re: GCC UTF16
JamesM wrote: Unicode characters are 20 bits long. For future compatibility I'd use UTF32 as my internal representation: fixed width AND able to store the entire unicode character set.
Unicode characters are of variable (potentially infinite) length: a base character plus any number of combining marks. In UCS-4 they are still variable length. Additionally, Unicode warrants that no character will ever be introduced which cannot be represented in UTF-16.
Unicode scalar values, however, are actually 21 bits: one Basic Multilingual Plane plus 16 "astral planes", each plane comprising 65536 scalar values, some of which are reserved for special purposes.
As I see it, you are going to be doing one of the following:
- Binary string equality checks. Ignores character sets.
- String sorts and linguistic/uniform collations. These need to work with full characters: whatever UTF you use, you are going to have to interpret it (see the sketch below).
- Render text to a graphic. Whether using UCS-4 or UTF-16, you're going to have a big state machine here.
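Even plain iteration already has to understand surrogates - a minimal decoding step, assuming well-formed input (a hypothetical helper, not from any library):

Code:

#include <stddef.h>

/* Read one Unicode scalar value from UTF-16 code units, advancing *i.
   Assumes well-formed input (no unpaired surrogates). */
static unsigned long utf16_next(const unsigned short *s, size_t *i)
{
    unsigned long u = s[(*i)++];
    if (u >= 0xD800 && u <= 0xDBFF) {        /* high surrogate */
        unsigned long lo = s[(*i)++];        /* low surrogate must follow */
        u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
    }
    return u;
}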
Re: GCC UTF16
Maybe you're looking for the '-fshort-wchar' compiler switch?
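With that switch, wchar_t and L"..." literals drop to 16 bits. A quick check (note that the switch changes the ABI of every wchar_t interface, which is fine in a freestanding kernel but risky when linking against existing libraries):

Code:

/* build: gcc -fshort-wchar check.c */
#include <stdio.h>

int main(void)
{
    /* prints 2 with -fshort-wchar, 4 with plain gcc on linux */
    printf("%u\n", (unsigned)sizeof(L'x'));
    return 0;
}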
Re: GCC UTF16
thank you so much, but why didn't you tell me earlier?
i had been using UTF32 for some time, but i turned it all into UTF16 just now.
luckily it was not too much work. it is so easy, and it will spare much memory for me.
thx