internationalisation
Posted: Fri Sep 05, 2003 2:49 pm
Has anyone thought about internationalisation in their kernel? I've looked at UTF-32, and it doesn't fix any of the problems of UTF-16 and UTF-8.
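UTF-8 is great if you're only using English, since ASCII characters take one byte each. As soon as you use another language, though, the encoding blows right out to the point of being worthless (most Chinese, Japanese and Korean characters take three bytes each in UTF-8). UTF-16/32 still can't handle Chinese properly either... and they can blow the encoding out too (UTF-32 spends four bytes on every character, even plain ASCII).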
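To put numbers on the size argument, here is a small C sketch that counts how many bytes one Unicode code point needs in UTF-8, UTF-16 and UTF-32. The byte counts follow the standard encoding rules; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: bytes needed for one Unicode code point in each encoding form.
   Functions return 0 for values that are not Unicode scalar values
   (surrogates D800-DFFF, or anything above 10FFFF). */
static int is_scalar(uint32_t cp) {
    return cp < 0x110000 && !(cp >= 0xD800 && cp <= 0xDFFF);
}

static int utf8_len(uint32_t cp) {
    if (!is_scalar(cp)) return 0;
    if (cp < 0x80) return 1;    /* ASCII */
    if (cp < 0x800) return 2;   /* e.g. accented Latin, Greek, Cyrillic */
    if (cp < 0x10000) return 3; /* rest of the BMP, incl. most CJK */
    return 4;                   /* supplementary planes */
}

static int utf16_len(uint32_t cp) {
    if (!is_scalar(cp)) return 0;
    return cp < 0x10000 ? 2 : 4; /* one 16-bit unit, or a surrogate pair */
}

static int utf32_len(uint32_t cp) {
    return is_scalar(cp) ? 4 : 0; /* always one 32-bit unit */
}
```

E.g. `utf8_len(0x4E2D)` (a common Chinese character) gives 3 and `utf16_len(0x4E2D)` gives 2, while `utf32_len` gives 4 for everything, plain ASCII included.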
I've got my own internal handling in my OS, and I've only implemented what I need (i.e. problems still exist! haha).
Basically, it encodes each character as a 32-bit number and uses a set of tags to define the language of the text...
But like anything, it breaks down outside of itself. E.g. the BIOS only supports an 8-bit character set (extended ASCII), so you need your own fonts and display/output system.
Then there's mapping the internal representation onto font glyphs... mmm... (I haven't got that far yet.)
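One possible way to do the glyph mapping, sketched in C under assumptions: runs of 32-bit character codes map to consecutive glyph indices in a font, and a binary search finds the run. This is similar in spirit to the segment table in a TrueType 'cmap' (format 4), but it is not that format; `glyph_range`, `glyph_lookup` and `GLYPH_MISSING` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical glyph map: each run of character codes maps to consecutive
   glyph indices. Runs must be sorted by code and must not overlap. */
typedef struct {
    uint32_t first, last;   /* inclusive character-code range */
    uint32_t glyph_base;    /* glyph index of 'first' */
} glyph_range;

/* Glyph 0 is used here as the "missing character" glyph. */
#define GLYPH_MISSING 0u

/* Binary search over the sorted runs; unmapped codes give GLYPH_MISSING. */
static uint32_t glyph_lookup(const glyph_range *map, size_t n, uint32_t ch) {
    size_t lo = 0, hi = n;  /* search window is [lo, hi) */
    assert(map != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ch < map[mid].first)
            hi = mid;
        else if (ch > map[mid].last)
            lo = mid + 1;
        else
            return map[mid].glyph_base + (ch - map[mid].first);
    }
    return GLYPH_MISSING;
}
```

Returning glyph 0 for unmapped characters mirrors TrueType, where glyph index 0 is reserved for the .notdef ("missing character") glyph; it is an alternative to substituting a space.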
My keyboard handler encodes keystrokes directly into this format at the lowest level: (key, keyboard flags (caps/shift/ctrl/etc.), lang ID, sub-lang ID).
It's a lot of work, though: each language needs its own sort routine. I disregarded language-to-language mapping (since you can't map English to Chinese, why even try!).
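Since each language needs its own sort routine, one way to organise that is a table of comparison functions indexed by language ID, falling back to plain code order when a language has no routine yet. A minimal C sketch with hypothetical names; the fallback comparison is a placeholder, not real collation for any language.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A comparison routine for two 32-bit characters: <0, 0 or >0. */
typedef int (*df32_cmp_fn)(uint32_t a, uint32_t b);

/* Fallback: plain numeric code order (placeholder, not real collation). */
static int df32_cmp_code(uint32_t a, uint32_t b) {
    return (a > b) - (a < b);
}

#define DF32_MAX_LANG 16                            /* hypothetical table size */
static df32_cmp_fn df32_sort_table[DF32_MAX_LANG];  /* indexed by language ID */

/* qsort's comparator gets no context pointer, so the active routine lives here. */
static df32_cmp_fn df32_active_cmp = df32_cmp_code;

static int df32_qsort_cmp(const void *pa, const void *pb) {
    return df32_active_cmp(*(const uint32_t *)pa, *(const uint32_t *)pb);
}

/* Register the sort routine for one language ID; returns 0 on success. */
static int df32_register_sort(uint32_t lang, df32_cmp_fn fn) {
    if (lang >= DF32_MAX_LANG) return -1;
    df32_sort_table[lang] = fn;
    return 0;
}

/* Sort n characters with the language's routine, or code order if none. */
static void df32_sort(uint32_t lang, uint32_t *chars, size_t n) {
    df32_cmp_fn fn = (lang < DF32_MAX_LANG) ? df32_sort_table[lang] : NULL;
    assert(chars != NULL || n == 0);
    if (n < 2) return;
    df32_active_cmp = fn ? fn : df32_cmp_code;
    qsort(chars, n, sizeof chars[0], df32_qsort_cmp);
}
```

Keeping the active routine in a static variable is not thread-safe, which matters once more than one task can sort at the same time.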
Anyway, it's probably just another useless thing I'm wasting my time on... but here's an example:
Code:
// (a chunk of my header file)
// A string must be enclosed in start/end tags.
// Each start tag must be followed by a set-language tag carrying a
// (language ID, sub-language ID) pair.
// A sub-language ID of zero is the language's default.
// The language tag is a PAIR, not a single tag.
// Each character is 32 bits wide.
// e.g.: <start><eng,australian>......<eng,uk>?<eng,us>aluminum<end>
// All tags have values from 0 to 255.
// All characters start at 256 and go up to 2^32 - 1.
// That gives a working dictionary range of 4294967040 individual characters.
// Undefined behaviour: when the input stream contains an unknown character
// that is not in the dictionary, replace it with a space?
// (The current implementation does this.)
// Map the character set onto font glyphs?? TrueType?
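// Stream tags (values 0-255)
#define DF32_TAG_END 0x00000000
#define DF32_TAG_START 0x00000001
#define DF32_TAG_SETLANG 0x00000002
// English language ID and its sub-language IDs (0 = default)
#define DF32_LANG_EN 0x00000001
#define DF32_LANG_EN_UK 0x00000000
#define DF32_LANG_EN_AU 0x00000001
#define DF32_LANG_EN_US 0x00000002
// Japanese language ID and its sub-language IDs (0 = default)
#define DF32_LANG_JP 0x00000002
#define DF32_LANG_JP_KANA 0x00000000
#define DF32_LANG_JP_HIRAGANA 0x00000001
#define DF32_LANG_JP_KATAKANA 0x00000002
// First character code; everything below it is a tag
#define DF32_BASE 0x100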
Code:
0x00000001 // start
0x00000002 // set lang
0x00000001 // ENG
0x00000000 // UK (default)
....       // string (32-bit characters)
0x00000000 // end
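A minimal C sketch of walking a stream in this layout. It assumes the tag values from the header above, that the set-language tag may also appear mid-string (as in the `<eng,uk>`/`<eng,us>` example), and that unknown characters are replaced with a space. `DF32_CHAR_SPACE` and `DF32_DICT_LIMIT` are made-up placeholders, since the header does not define the space character or the extent of the dictionary; `df32_decode` and `df32_collect` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values copied from the header chunk above. */
#define DF32_TAG_END     0x00000000u
#define DF32_TAG_START   0x00000001u
#define DF32_TAG_SETLANG 0x00000002u
#define DF32_BASE        0x100u

/* Placeholders (not in the header): the code used for "space", and the
   first code treated as outside the dictionary. */
#define DF32_CHAR_SPACE  (DF32_BASE + 0x20u)
#define DF32_DICT_LIMIT  0x10000u

typedef struct { uint32_t lang, sublang; } df32_state;

/* Called once per decoded character, with the language active at that point. */
typedef void (*df32_emit_fn)(uint32_t ch, df32_state st, void *ctx);

/* Walk one tagged string. Returns the number of 32-bit words consumed
   (including the end tag), or -1 if the stream is malformed. */
static long df32_decode(const uint32_t *s, size_t n, df32_emit_fn emit, void *ctx) {
    df32_state st;
    size_t i = 4;
    assert(emit != NULL);
    /* start tag, then a set-language tag carrying (lang, sub-lang) */
    if (n < 4 || s[0] != DF32_TAG_START || s[1] != DF32_TAG_SETLANG) return -1;
    st.lang = s[2];
    st.sublang = s[3];
    while (i < n) {
        uint32_t w = s[i++];
        if (w == DF32_TAG_END) return (long)i;
        if (w == DF32_TAG_SETLANG) {           /* language switch mid-string */
            if (n - i < 2) return -1;
            st.lang = s[i++];
            st.sublang = s[i++];
        } else if (w < DF32_BASE) {
            return -1;                          /* unknown or misplaced tag */
        } else {
            /* character: unknown ones become a space */
            emit(w < DF32_DICT_LIMIT ? w : DF32_CHAR_SPACE, st, ctx);
        }
    }
    return -1;                                  /* ran out before the end tag */
}

/* Example sink that stores up to 64 characters and their languages. */
typedef struct {
    uint32_t ch[64];
    df32_state st[64];
    size_t count;
} df32_buf;

static void df32_collect(uint32_t ch, df32_state st, void *ctx) {
    df32_buf *b = ctx;
    if (b->count < 64) {
        b->ch[b->count] = ch;
        b->st[b->count] = st;
        b->count++;
    }
}
```

Reporting the language with every character keeps the decoder ignorant of sorting or rendering; whatever consumes the characters can pick the right routine per language.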
A language mapping file (character set) also maps localisation settings (currency sign, clock/date display, etc.)... mmm...
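A sketch of what one entry of such a mapping file might look like once loaded, keyed by (language ID, sub-language ID) and falling back to sub-language 0 as the default. Everything here is hypothetical: the struct, the ISO 4217 codes standing in for the currency signs (to keep the source ASCII-only), and the small date-order enum. The day/month/year orders are the usual ones for the UK, Australia, the US and Japan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Order in which day, month and year are displayed. */
typedef enum { DATE_DMY, DATE_MDY, DATE_YMD } date_order;

/* Hypothetical localisation record for one (language, sub-language) pair. */
typedef struct {
    uint32_t lang, sublang;
    const char *currency;   /* ISO 4217 code standing in for the currency sign */
    date_order order;
} df32_locale;

/* IDs as in the header: 1 = English (0 UK, 1 AU, 2 US), 2 = Japanese. */
static const df32_locale locales[] = {
    { 1, 0, "GBP", DATE_DMY },
    { 1, 1, "AUD", DATE_DMY },
    { 1, 2, "USD", DATE_MDY },
    { 2, 0, "JPY", DATE_YMD },
};

/* Exact match first; otherwise fall back to the language's sub-language 0. */
static const df32_locale *df32_find_locale(uint32_t lang, uint32_t sublang) {
    const df32_locale *fallback = NULL;
    for (size_t i = 0; i < sizeof locales / sizeof locales[0]; i++) {
        if (locales[i].lang != lang) continue;
        if (locales[i].sublang == sublang) return &locales[i];
        if (locales[i].sublang == 0) fallback = &locales[i];
    }
    return fallback;  /* NULL if the language is unknown */
}

/* Write a numeric date in the locale's order; returns snprintf's result. */
static int df32_format_date(const df32_locale *loc, int d, int m, int y,
                            char *out, size_t len) {
    assert(loc != NULL);
    switch (loc->order) {
    case DATE_DMY: return snprintf(out, len, "%02d/%02d/%04d", d, m, y);
    case DATE_MDY: return snprintf(out, len, "%02d/%02d/%04d", m, d, y);
    default:       return snprintf(out, len, "%04d/%02d/%02d", y, m, d);
    }
}
```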
How has everyone else done internationalisation?