Re: Loading Arabic Fonts
Posted: Thu Jun 30, 2005 12:36 am
Hi,
Probably one of the most effective techniques for displaying Unicode code points in text modes is to dynamically re-define the font data used.
The basic idea would be to count how many times each code point occurs and then create font data that contains the 256 (or 512) most frequently used code points. With this method, if there are 600 unique code points needed you'd still be able to display most of them - the remaining code points would need to be replaced by an "undisplayable" character (typically a question mark or a square).
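As a rough sketch of that counting step (in Python just to keep it short, a real kernel would do this in C or assembly): the names `build_slot_map`, `encode` and `FALLBACK_SLOT` are made up for illustration, and reserving slot 0 for the "undisplayable" glyph is my assumption, not a requirement of the hardware.

```python
from collections import Counter

# Assumption: slot 0 of the re-defined font holds the "undisplayable" glyph.
FALLBACK_SLOT = 0


def build_slot_map(text, slots=256):
    """Assign the most frequently used code points to font slots 1..slots-1."""
    counts = Counter(ord(ch) for ch in text)
    # most_common() orders by descending count; one slot is kept for the fallback.
    chosen = [cp for cp, _ in counts.most_common(slots - 1)]
    return {cp: slot for slot, cp in enumerate(chosen, start=1)}


def encode(text, slot_map):
    """Translate text into font slot numbers, using the fallback for the rest."""
    return [slot_map.get(ord(ch), FALLBACK_SLOT) for ch in text]
```

The glyph bitmaps for the chosen code points would then be uploaded into the text mode font, and the slot numbers from `encode` written to display memory in place of character codes.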
Despite this, using a graphical video mode would be far easier and lead to much better results, as graphics modes allow for anti-aliased fonts (better curves), proportional fonts (where some characters are wider than others - e.g. 'W' and 'i'), and allow any number of code points to be displayed (plus windows, menus, icons, etc).
For Arabic in particular, Unicode has 227 code points of which 45 are "combining". AFAIK this means that to display Arabic correctly you'd need to be able to display 182 unique characters. This rough calculation is most likely completely wrong (I know nothing of Arabic). Some notes:
a) There are different dialects of Arabic, and I'm not too sure how many of the Unicode code points are actually needed for a specific dialect. It might be possible to reduce the number of code points needed by only supporting one main dialect.
b) There are "subtending marks", which (I guess) are meant to underline a group of code points, e.g. a number may consist of a group of numerical digits that are collectively underlined via the Arabic number sign. With Latin characters this might look like "1234," where the underline and comma are meant to represent the Arabic number sign. This would be incredibly difficult to do in text mode. There are actually 4 of these subtending marks: the number sign, one for footnotes, and the remaining 2 called "sanah" and "safha" (not sure where they'd be used). These subtending marks are not straight lines but are curved - you can't just use normal underlining.
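If you want to redo the rough calculation above rather than trust my figures, something like the following would work. It's only a sketch: it looks at the basic Arabic block (U+0600 to U+06FF) alone, treats every "mark" category as combining, and the function names are mine. The totals depend on the Unicode version your Python ships with, so they won't necessarily match the numbers I quoted.

```python
import unicodedata


def arabic_block_counts(start=0x0600, end=0x06FF):
    """Return (assigned, combining) code point counts for the given range."""
    # "Cn" is the general category of unassigned code points.
    assigned = [cp for cp in range(start, end + 1)
                if unicodedata.category(chr(cp)) != "Cn"]
    # Categories starting with "M" (Mn, Mc, Me) are combining marks.
    combining = [cp for cp in assigned
                 if unicodedata.category(chr(cp)).startswith("M")]
    return len(assigned), len(combining)


def subtending_mark_names():
    """Names of U+0600..U+0603, the four subtending marks mentioned in (b)."""
    return [unicodedata.name(chr(cp)) for cp in range(0x0600, 0x0604)]


if __name__ == "__main__":
    total, marks = arabic_block_counts()
    print("assigned:", total, "combining:", marks, "remaining:", total - marks)
    print(subtending_mark_names())
```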
Cheers,
Brendan