Implementing non-English language in OS

Posted: Thu Mar 03, 2016 8:41 am
by osdever
My friend tells me that my OS is mostly of interest to Russians, so I need to implement the Russian language in it. I don't agree: who needs Russian in an alpha-stage OS? So the question is, who should I listen to: myself or my friend?

Re: Implementing non-English language in OS

Posted: Thu Mar 03, 2016 9:12 am
by iansjack
Listen to yourself.

Re: Implementing non-English language in OS

Posted: Thu Mar 03, 2016 9:24 am
by osdever
OK, thanks.

Re: Implementing non-English language in OS

Posted: Thu Mar 03, 2016 10:17 am
by Combuster
It can be a pain to introduce internationalisation later in development if precautions aren't taken early, and the alpha stage might be a good moment to try it out. But in the end it's just as iansjack says: Your project, your rules.

Psst, you can also look at it differently: it's worth several invisible tokens to beat the master on a niche subject and be really bilingual - who here can say they have better user support than for instance Brendan's OS? ; ) - of course, only if you're interested

Re: Implementing non-English language in OS

Posted: Thu Mar 03, 2016 12:33 pm
by osdever
I have UTF-8 support and Russian fonts now, but the fonts aren't used yet, and the UTF-8 support is very glitchy.
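
A common cause of glitchy UTF-8 handling is a decoder that accepts truncated, overlong, or surrogate sequences. Below is a minimal sketch of a strict decoder, written in Python for brevity (a kernel would port the same logic to C). The name `decode_utf8` and the choice to emit U+FFFD and advance one byte after each error are assumptions made for illustration, not code from the OS discussed here.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER, emitted for each rejected byte


def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into code points, rejecting malformed sequences."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:                      # single-byte (ASCII)
            out.append(b0)
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:             # 2-byte lead (0xC0/0xC1 are always overlong)
            need, cp, lowest = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:           # 3-byte lead
            need, cp, lowest = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:           # 4-byte lead
            need, cp, lowest = 3, b0 & 0x07, 0x10000
        else:                              # stray continuation or invalid lead byte
            out.append(REPLACEMENT)
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) < need or any(b & 0xC0 != 0x80 for b in tail):
            out.append(REPLACEMENT)        # truncated or broken sequence
            i += 1
            continue
        for b in tail:
            cp = (cp << 6) | (b & 0x3F)
        if cp < lowest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)        # overlong, out of range, or surrogate
            i += 1
            continue
        out.append(cp)
        i += 1 + need
    return out
```

Feeding it the bytes of a Cyrillic string should give back the same code points that the font lookup expects, and bad input degrades to replacement characters instead of garbage glyphs.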

Re: Implementing non-English language in OS

Posted: Thu Mar 03, 2016 3:00 pm
by dseller
I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then just only write an English implementation for now? This way you can always implement Russian when you feel like it.

Re: Implementing non-English language in OS

Posted: Fri Mar 04, 2016 4:07 am
by Kevin
dseller wrote:I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then just only write an English implementation for now? This way you can always implement Russian when you feel like it.
If you're not really careful, the problem with that is that you'll only notice the problems in your framework when it's too late. Implementing only one language means that you're prone to write your code as if other languages were just English with different words. But they aren't.
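
One concrete case: English has two noun forms after a number, while Russian has three, chosen by the last digits of the number. A rough sketch in Python, using the usual integer rule for Russian plurals; the function names and the `FILES` table are hypothetical and only illustrate why the framework needs a per-language rule rather than an "add s" shortcut.

```python
def english_plural_index(n: int) -> int:
    """0 = singular, 1 = plural."""
    return 0 if n == 1 else 1


def russian_plural_index(n: int) -> int:
    """0 = 'one' form, 1 = 'few' form, 2 = 'many' form (non-negative integers)."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


# Hypothetical message table: (plural rule, forms ordered by rule index).
FILES = {
    "en": (english_plural_index, ["{n} file", "{n} files"]),
    "ru": (russian_plural_index, ["{n} файл", "{n} файла", "{n} файлов"]),
}


def format_files(lang: str, n: int) -> str:
    """Pick the right plural form for n in the given language."""
    rule, forms = FILES[lang]
    return forms[rule(n)].format(n=n)
```

A framework written with only the English pair in mind has nowhere to put the third Russian form, which is the kind of problem that shows up too late.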

Re: Implementing non-English language in OS

Posted: Fri Mar 04, 2016 7:20 am
by osdever
dseller wrote:I think if you intend to support foreign character sets and localization, you should at least keep it in mind while designing your code. Maybe make a framework to support this stuff and then just only write an English implementation for now? This way you can always implement Russian when you feel like it.
It's done a long time ago.

Re: Implementing non-English language in OS

Posted: Thu Mar 10, 2016 8:26 am
by Candy
I picked a name for my OS project that would be impossible to write until UTF-8 support was properly added: Rødvin. That said, I've since implemented UTF-8 decoding and drawing of the characters, so everything will support any non-English language encoded in UTF-8.

Re: Implementing non-English language in OS

Posted: Thu Mar 10, 2016 1:35 pm
by Rusky
There is more to non-English languages than just rendering UTF-8 glyphs: layout and shaping can get pretty complex.

Re: Implementing non-English language in OS

Posted: Fri Mar 11, 2016 1:50 am
by Solar
Right-to-left and bidirectional writing. Characters that need to be larger than Latin glyphs. Different digit separators, date string formats, currency formats. Glyphs being digits but not being in the 0-9 range. Strings taking much more space than in English. Combining characters. The list goes on.
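
Two of these points can be demonstrated quickly with Python's standard `unicodedata` module. This is only a sketch: the helper names are made up, and counting non-combining code points is a rough approximation, not full grapheme-cluster segmentation.

```python
import unicodedata


def digit_value(ch: str):
    """Return the decimal value of any Unicode decimal digit, or None."""
    return unicodedata.decimal(ch, None)


def count_base_characters(text: str) -> int:
    """Count code points whose canonical combining class is 0.

    Combining marks such as U+0301 attach to the previous character,
    so they should not get a glyph cell of their own.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)
```

For example, `digit_value("\u0663")` (ARABIC-INDIC DIGIT THREE) yields 3 even though the character is nowhere near the ASCII range, and `"e\u0301"` is two code points but only one base character.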

Re: Implementing non-English language in OS

Posted: Sat Mar 12, 2016 9:48 am
by onlyonemac
Rusky wrote:There is more to non-English languages than just rendering UTF-8 glyphs: layout and shaping can get pretty complex.
And of course with multi-language support your OS also needs to have a proper localisation framework.

Re: Implementing non-English language in OS

Posted: Fri May 13, 2016 3:07 am
by bellezzasolo
In my experience, using UTF-16 as the internal character format eased many of the issues. I used a simple, freely available Unicode bitmap font (I converted it to C format with a utility). Then you want to load language packs. My approach was to place one in the initrd for the kernel to use. The format doesn't have to be complex: to begin with, you could just have a list of strings separated by newlines (you will probably want to move on to something like XML later, though).
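
A rough sketch of such a newline-separated pack, in Python for brevity. It assumes the pack is stored as UTF-8 text and that a string's ID is simply its line number; both are assumptions for illustration, as are the function names and the fallback to a default pack.

```python
def parse_language_pack(raw: bytes) -> list[str]:
    """Split a UTF-8 language pack into strings; the index is the 0-based line number."""
    text = raw.decode("utf-8")
    if text.startswith("\ufeff"):          # drop a byte order mark if an editor added one
        text = text[1:]
    lines = text.split("\n")
    if lines and lines[-1] == "":          # ignore the empty piece after the final newline
        lines.pop()
    return [line.rstrip("\r") for line in lines]


def lookup(pack: list[str], fallback: list[str], msg_id: int) -> str:
    """Return the translated string, or the fallback pack's string if it is missing."""
    if msg_id < len(pack) and pack[msg_id]:
        return pack[msg_id]
    return fallback[msg_id]
```

Keeping an English pack as the fallback means a half-translated pack still produces readable output.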

If you're mainly aiming at supporting Russian, RTL isn't so important. But, since I was attempting Hebrew, this was the nightmare bit. Enjoy!
Needless to say, all of this needs a graphics mode.

Of course, there is a difference between a language pack and a good language pack. Generally, you will want the strings to be complete sentences so you don't end up with a complete grammatical screw-up.
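
To illustrate the complete-sentence point, here is a small Python sketch with named placeholders, so each translation can put the arguments wherever its own grammar needs them. The `TEMPLATES` table and `render` are hypothetical, and the Russian sentence is only an illustrative translation.

```python
# One whole sentence per language; never glue fragments like "Copied " + n + " files".
TEMPLATES = {
    "en": "Copied {count} files to {target}.",
    "ru": "В {target} скопировано файлов: {count}.",
}


def render(lang: str, **args) -> str:
    """Fill a full-sentence template; argument order is decided by the translation."""
    return TEMPLATES[lang].format(**args)
```

Note how the Russian sentence puts the target first and the count last, which string concatenation in the code could not express.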

Hope this helps.

Re: Implementing non-English language in OS

Posted: Sat May 14, 2016 9:47 pm
by ~
You can encode all languages efficiently in UTF-8, and even SQLite3 supports it fully (that's why there are so many end-user programs that make internal use of SQLite3).

You could port SQLite3 to your OS to add indexing, query and even "registry" capabilities for installed programs, configuration values, etc.

I prefer to use UTF-8. It's widely supported on the Web, and that's why I should learn to encode it with my own code. The same goes for UTF-16.
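
Encoding UTF-8 by hand is short enough to sketch. The following is an illustration in Python, not code from any existing project; `encode_utf8` is a made-up name, and raising `ValueError` for surrogates and out-of-range values is one possible policy.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as 1 to 4 UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                           # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                          # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```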


I would recommend using a database engine like SQLite3 to build a database that contains all words in all languages. Then put all synonyms, antonyms, paronyms, combinations, etc., in all languages in the same row. This will help greatly with searching and indexing (it can help you create automatic translations, even for the GUI of your programs, and search documentation and code more efficiently). It makes it possible to search for a word in one language and find matches in all languages, in all related variants of the word, and among all of its synonyms, antonyms, etc.:

This is a simple table definition for that:

Code:

PRAGMA encoding = 'UTF-8'; -- has to run before the first table is created, otherwise it has no effect
CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT '', dictionary_definition TEXT DEFAULT '', rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL);

.mode tabs
.import multilanguage_words.txt multilanguage_words



This is sample text showing how to index all words in all languages in the same row (one text line per concept). Each word is preceded by a header in {} that contains the classification of the word (synonym, antonym, name, etc.) and the language ID. Numbers such as 50 are a default percentage of positive or negative emotion for the word, as I felt it when I wrote the text; they could be updated with A.I. The last field, such as "skill", is an attempt to classify words from the most basic human concepts to the most complex and emotive/subjective ones, but it is optional. All parameters after the language ID are optional and are parsed only when present:

Code:

{:synonym:es:50:skill}programador+{:synonym:en}programmer+{:synonym:fr}programmeur+{:synonym:it}programmatore+{:synonym:eo}programisto	{:synonym:es}Persona que diseña los procedimientos a seguir por un dispositivo automatizado.
{:synonym:es:80:quality}listo+{:synonym:en:70}smart+{:homonym:es}listo,{:synonym:es}sagaz,{:synonym:es}astuto,{:synonym:es}ducho,{:synonym:es}despabilado,{:synonym:es}avivado,{:synonym:es}avezado+{:typo:es}avesado+{:synonym:es}avispado,{:synonym:es}perspicaz,{:synonym:es:75}vivo	{:synonym:es}Persona con gran agudeza y agilidad mental y práctica.
{:synonym:es:60:status}listo+{:synonym:en}ready,{:synonym:es}preparado+{:synonym:en}prepared,{:synonym:es}dispuesto+{:synonym:en}willing,{:synonym:es}complaciente	{:synonym:es}Estado de espera y disposición para llevar a cabo una tarea.
{:synonym:es:50:concept}palabra,{:synonym:es}fonema,{:synonym:es}vocablo,{:synonym:es}término,{:synonym:es}verbo,{:synonym:es}dicción,{:synonym:es}expresión,{:synonym:es}lengua,{:synonym:es}lenguaje,{:synonym:es}habla,{:synonym:es}promesa,{:synonym:es}pacto,{:synonym:es}oferta,{:synonym:es}juramento,{:synonym:es}ofrecimiento,{:synonym:es}compromiso	{:synonym:es}Elemento de todo lenguaje que comunica ideas, intenciones y acciones.
{:synonym:es:-60:action}desaparecer+{:synonym:es:-35:action}desaparecerse,{:synonym:es}esfumar+{:synonym:es}esfumarse,{:synonym:es}retirar+{:synonym:es}retirarse	{:synonym:es}Alejar algo de nuestra percepción de modo que no se pueda encontrar.
{:name:es:0:male}Rodolfo+{:name:en:0:male}Rudolph
{:synonym:*:medication}Panadol+{:synonym:*:medication}Paracetamol
{:synonym:en:65}conversely+{:synonym:es:65}al contrario de+{:synonym:es:65}a diferencia de
{:synonym:es}amalgama+{:synonym:en}amalgam+{:synonym:en}amalgamation
{:synonym:es}natural+{:synonym:en}natural,{:synonym:es}sincero+{:synonym:en}sincere,{:synonym:es}espontáneo+{:synonym:en}spontaneous,{:synonym:es}genuino+{:synonym:en}genuine
{:synonym:es}calibración+{:synonym:en}calibration+{:synonym:es}calibrar+{:synonym:en}calibrate,{:synonym:es}equilibrio+{:synonym:es}equilibrar+{:synonym:en}equilibrium+{:synonym:en}equilibrate,{:synonym:es}balance+{:synonym:es}balancear+{:synonym:en}balance
{:surname:en}Sonnenreich+{:typo:en}Sonnereich
{:synonym:es}correspondiente+{:synonym:es}coincidente+{:synonym:es}concordante+{:synonym:pt}concorda+{:synonym:pt}concorde+{:synonym:es}que concuerde+{:synonym:es}acierto+{:synonym:es}coincidencias+{:synonym:es}concordancias+{:synonym:es}acierto+{:synonym:es}aciertos+{:synonym:pt}de acordo+{:synonym:pt}concerta+{:synonym:pt}concertar+{:synonym:pt}concerte
{:synonym:es}endurar+{:synonym:es:0:verb}endurecer+{:synonym:es:0:verb}endurezco+{:synonym:es:0:verb}endureces+{:synonym:es:0:verb}endurece+{:synonym:es:0:verb}endurecemos+{:synonym:es:0:verb}endurecéis+{:typo:es:0:verb}endureceis+{:synonym:es:0:verb}endurecen+{:synonym:en:0:verb}harden+{:synonym:en}hard+{:synonym:en:0:verb}make hard+{:synonym:en:0:verb}to make hard+{:synonym:en:0:verb}make it hard+{:synonym:en:0:verb}making it hard+{:synonym:es}endura+{:synonym:es}endurece+{:synonym:es}durar
{:name:*:0:font-face}Calibri
{:synonym:en}keep+{:synonym:en}keeping+{:synonym:en}kept+{:synonym:es:0:verb}mantener+{:synonym:es}mantén+{:typo:es}manten+{:synonym:es}mantengo+{:synonym:es}mantienes+{:synonym:es}mantiene+{:synonym:es:0:verb}mantienen+{:synonym:es}mantenemos+{:synonym:es}mantenéis+{:typo:es:verb}manteneis+{:synonym:es}mantén
{:word:es}tu+{:word-plural:es}tus+{:word:en}your
{:name:en:50:organism}eye+{:name-plural:en:50:organism}eyes+{:name:es:50:organism}ojo+{:name-plural:es:50:organism}ojos
{:name:es:100:math}Álgebra+{:name:en:100:math}Algebra
{:name:es:100:math}Aritmética+{:name:en:100:math}Arithmetic
{:name:es:100:math}Cálculo+{:name:en:100:math}Calculus
{:name:en:100:artificial-intelligence}Situation Calculus+{:name:es:100:artificial-intelligence}Cálculo Situacional
{:name:en}dynamical domain+{:name:es}dominio dinámico+{:name:en}dynamical domains+{:name:es}dominios dinámicos
{:name:en}vedic math+{:name:es}matemática védica
{:name:en}Pizza Hut
{:name:en}Toto's Pizza
{:word:es}tan+{:word:en}as+{:word:es}tanto
{:word:es}también+{:typo:es}tambien+{:chat:es}tmb+{:word:en}as well+{:word:en}as well as
{:synonym:en:0:verb}close+{:antonym:en:0:verb}open+{:synonym:en}closed+{:synonym:es}cerrado+{:antonym:es}abierto+{:synonym-plural:es}cerrados+{:antonym-plural:es}abiertos
{:word:en}and+{:word:es}y+{:word:pt}e+{:word:fr}et
{:word:en}of+{:word:es}de
{:word:en}you+{:word:es}tú
{:word:en}state+{:word:es}estado+{:word-plural:en}states+{:word-plural:es}estados
{:name:en}day+{:name:es}día+{:name-plural:en}days+{:name-plural:es}días
{:synonym:en}ensure+{:synonym:en}make sure+{:synonym:en}making sure+{:synonym:es}asegurándose+{:synonym:es}asegurar+{:synonym:es}asegurarse
{:synonym:en}high+{:synonym-male:es}alto+{:synonym-female:es}alta+{:synonym-plural-male:es}altos+{:synonym-plural-female:es}altas
{:synonym:en}level+{:synonym-plural-male:en}levels+{:synonym-male:es}nivel+{:synonym-plural:es}niveles+{:synonym:es}nivelación+{:synonym-plural:es}nivelaciones
{:synonym-male:en}channel+{:synonym-plural-male:en}channels+{:synonym-male:es}canal+{:synonym-plural-male:es}canales
{:word:en}the+{:word-male:es}el+{:word-female:es}la+{:word-female:es}las+{:word:es}lo+{:word-plural:es}los
{:word:en}to+{:word:es}para+{:word:es}a
{:synonym:en:0:verb}deepen+{:synonym:es:0:verb}profundizar
{:synonym:en}least+{:synonym:es}menos+{:synonym:es}menor
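
As a rough illustration of how the headers in the sample above could be parsed, here is a Python sketch. It assumes the layout `{:class:lang[:score][:category]}word`, treats any purely numeric optional field as the score and any other optional field as the category, and treats `+` and `,` only as separators between entries. The names `Entry` and `parse_wordlist` are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    kind: str                  # synonym, antonym, name, typo, ...
    lang: str                  # language ID, or "*" for any language
    score: Optional[int]       # optional emotion percentage, may be negative
    category: Optional[str]    # optional classification such as "skill"
    word: str


# A header in braces that starts with ":", followed by text up to the next "{".
ENTRY_RE = re.compile(r"\{:([^}]*)\}([^{]*)")


def parse_wordlist(wordlist: str) -> list[Entry]:
    """Parse the wordlist column of one row into a list of entries."""
    entries = []
    for header, word in ENTRY_RE.findall(wordlist):
        fields = header.split(":")
        if len(fields) < 2:
            continue                        # header without a language ID, skip it
        score = category = None
        for extra in fields[2:]:            # optional fields, detected by their shape
            if re.fullmatch(r"-?\d+", extra):
                score = int(extra)
            else:
                category = extra
        entries.append(Entry(fields[0], fields[1], score, category, word.strip("+, ")))
    return entries
```

Headers that lack the leading colon do not match the pattern and are skipped silently, so a stricter importer would want to report them instead.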


Remember that relating a word and its synonyms, antonyms, etc. in the same row/record across all existing languages (including typos, abbreviations and phrases) lets you search and process text in terms of the core concepts behind the words, rather than in terms of one specific word.

It's a very good basic A.I. filter for understanding and processing natural language, but it needs a massive database containing ALL words known to humankind, related to each other (the same word in all languages = one database record).

Why hasn't even Google released such a vital language database?

Re: Implementing non-English language in OS

Posted: Sun May 15, 2016 2:28 am
by max
~ wrote:Then put all synonyms, antonyms, paronyms, combinations, etc., in one same row in all languages. It will help you greatly in searches and indexing (will help you create automatic translations even for the GUI of your programs, to search more efficiently the documentation and code, etc.). It will make possible to search in one language, or search a word, and find what you searched in all languages and in all related variants of the word, and for all of its synonyms, antonyms, etc:

This is a simple table definition for that:

Code:

PRAGMA encoding = 'UTF-8'; -- has to run before the first table is created, otherwise it has no effect
CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT '', dictionary_definition TEXT DEFAULT '', rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL);

.mode tabs
.import multilanguage_words.txt multilanguage_words

[...]

Why hasn't even Google released such a vital language database?
This database structure is terrible. Putting all translations of a word in one row is exactly how you should not do it. A database must be properly normalized so that you can work with it, index it and search it effectively.
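
A minimal sketch of what a normalized layout could look like, using Python's standard `sqlite3` module: one row per concept, one row per word, and a foreign key linking them. The table names, column names and the helper `translations` are assumptions made for illustration; the demo rows are taken from the sample data quoted above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE concept (
    id         INTEGER PRIMARY KEY,
    definition TEXT NOT NULL DEFAULT ''
);
CREATE TABLE word (
    id         INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    lang       TEXT NOT NULL,   -- e.g. 'es', 'en'
    relation   TEXT NOT NULL,   -- e.g. 'synonym', 'antonym', 'typo'
    spelling   TEXT NOT NULL
);
CREATE INDEX word_spelling_idx ON word(spelling);
CREATE INDEX word_concept_lang_idx ON word(concept_id, lang);
"""

DEMO_WORDS = [
    (1, "es", "synonym", "programador"),
    (1, "en", "synonym", "programmer"),
    (1, "fr", "synonym", "programmeur"),
    (2, "es", "synonym", "listo"),   # "listo" as in smart
    (2, "en", "synonym", "smart"),
    (3, "es", "synonym", "listo"),   # "listo" as in ready
    (3, "en", "synonym", "ready"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema and a few demo rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO concept(id) VALUES (?)", [(1,), (2,), (3,)])
    db.executemany(
        "INSERT INTO word(concept_id, lang, relation, spelling) VALUES (?, ?, ?, ?)",
        DEMO_WORDS,
    )
    return db


def translations(db: sqlite3.Connection, spelling: str, target_lang: str) -> list[str]:
    """Find words in target_lang that share a concept with the given spelling."""
    rows = db.execute(
        """
        SELECT DISTINCT t.spelling
        FROM word AS s
        JOIN word AS t ON t.concept_id = s.concept_id
        WHERE s.spelling = ? AND t.lang = ?
        ORDER BY t.spelling
        """,
        (spelling, target_lang),
    )
    return [row[0] for row in rows]
```

With this layout a lookup is an indexed join instead of a text scan over packed rows, and a word with two meanings, like "listo", simply belongs to two concepts.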

You need more than a massive database to do natural language processing. Why should Google release a giant file that is basically only a dictionary? Google has its AI, which properly translates from and to a lot of languages and keeps learning new things. Natural languages are very complex, and so are the algorithms that process them.