rdos wrote: The issue is that UTF-16 and UCS-2 are bulky representations of characters that are not compatible with simple text-editors.
Any text editor that cannot handle at least UTF-8 and UTF-16 LE/BE should be ditched. ("Simple"? As in, "even VIM can do Unicode, but I cannot be arsed to bother with it"?)
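It really isn't much to ask, either. As a rough sketch (a hypothetical helper, not taken from any particular editor), telling those encodings apart when a BOM is present is a handful of byte comparisons:

[code]
#include <cstddef>
#include <string>

// Hypothetical helper: guess a text buffer's encoding from its BOM.
// BOM-less UTF-8 is common too, so "unknown" only means "no BOM",
// not "not Unicode".
std::string sniff_bom(const unsigned char* buf, std::size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return "unknown";
}
[/code]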
rdos wrote: Most strings in a program can be represented with a single byte...
No, they can't. You've got a customer in IJmuiden? No? You're lucky. A Danish person sees quite a difference between "gǿr" (barks) and "gør" (does), and good luck finding the former in an 8-bit encoding. (Danes have become quite used to not using ǿ because of this, but that's no excuse.)
I won't even get into Chinese, Japanese, or any other language with more than 256 glyphs.
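To put the Danish example in concrete terms: ǿ is code point U+01FF, which has no Latin-1 byte at all. A minimal sketch (assuming a UTF-8 terminal) of what it looks like at the byte level:

[code]
#include <cstdio>
#include <cstring>

int main()
{
    // "gǿr" spelled out as UTF-8 bytes: ǿ (U+01FF) encodes as 0xC7 0xBF.
    // There is no single-byte (Latin-1) code for ǿ at all.
    const char gor_barks[] = "g\xC7\xBFr";   // gǿr (barks)
    const char gor_does[]  = "g\xC3\xB8r";   // gør (does), ø (U+00F8) -> 0xC3 0xB8

    std::printf("%s: %zu bytes, 3 characters\n", gor_barks, std::strlen(gor_barks));
    std::printf("%s: %zu bytes, 3 characters\n", gor_does,  std::strlen(gor_does));
    return 0;
}
[/code]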
rdos wrote: Switching to 16-bit wide characters or supporting both types, requires changes in many instances.
No, because today any software engineer worth his paycheck writes his software Unicode-aware from the beginning. There is no "changing" involved, other than patching stupid, brain-dead software that still hasn't been patched twenty years after Unicode 1.0.
rdos wrote: I like to use "char" type for character strings, and to assume char is a byte.
Uh-huh. Easy for you, hard for others. Good design is the other way around.
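It is "easy" right up until the first non-ASCII character shows up, and then every caller has to compensate. A minimal sketch (hypothetical helper, assuming valid UTF-8 input) of what "how long is this string" has to look like once char no longer means "one character":

[code]
#include <cstddef>

// Counts Unicode code points in a NUL-terminated UTF-8 buffer by skipping
// continuation bytes (10xxxxxx). Illustrative only: assumes valid UTF-8,
// no error handling.
std::size_t utf8_codepoints(const char* s)
{
    std::size_t n = 0;
    for (; *s; ++s)
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++n;
    return n;
}
[/code]

For the "gǿr" above it returns 3 where strlen() reports 4; an API that pretends char == character just pushes that discrepancy onto everyone else.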
rdos wrote: I do not like custom defined types for strings.
Most languages, including C++, Java, C#, Perl and Python, have native string types that are quite capable of handling UTF-16/UCS-2.
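Picking C++ from that list as an example, here's a minimal C++11 sketch with no custom-defined type in sight:

[code]
#include <cassert>
#include <string>

int main()
{
    // char16_t / std::u16string have been in the standard since C++11:
    // UTF-16 storage out of the box.
    std::u16string barks = u"g\u01FFr";   // gǿr
    std::u16string does  = u"g\u00F8r";   // gør

    assert(barks.size() == 3);   // three UTF-16 code units; both fit in the BMP
    assert(barks != does);       // and they compare as distinct strings
    return 0;
}
[/code]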
Sorry for the aggressive tone, but I'm fed up with stuff like this being impressed on the next generation of developers. Unicode is here to stay, has been around for most (if not all) of our development lives, and even half-baked software should be able to handle it. There shouldn't be a "Hello World" or I/O how-to on the web that does not handle wide strings.