You can encode all languages efficiently in UTF-8, and SQLite3 supports it fully (one reason so many end-user programs use SQLite3 internally).
You could port SQLite3 to your OS to add indexing, querying and even "registry" capabilities for installed programs, configuration values, etc.
I prefer to use UTF-8. It's widely supported on the Web, so it's worth learning to encode and decode it with my own code. The same goes for UTF-16.
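As a rough idea of what encoding UTF-8 "with my own code" involves, here is a minimal sketch in Python that packs code points into bytes by hand instead of calling the built-in encoder. The function names `encode_utf8` and `encode_text` are made up for this example.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 bytes, without str.encode."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (plain ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def encode_text(text: str) -> bytes:
    """Encode a whole string, one code point at a time."""
    return b"".join(encode_utf8(ord(ch)) for ch in text)
```

The same bit layout carries over directly to C or assembly if you need it inside an OS kernel or a boot-time console.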
I would recommend using a database engine like SQLite3 and building a database that contains all words in all languages. Put all synonyms, antonyms, paronyms, combinations, etc., of a word in the same row, across all languages. This helps greatly with searching and indexing: it lets you generate automatic translations (even for the GUI of your programs), search documentation and code more efficiently, and so on. You can search for a word in one language and find it in all languages, together with all related variants of the word and all of its synonyms, antonyms, etc.
This is a simple table definition for that:
Code:
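-- Set the encoding before creating any table; it cannot be changed afterwards.
PRAGMA encoding='UTF-8';
CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT '', dictionary_definition TEXT DEFAULT '', rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL);
-- Import the tab-separated text file (one record per line).
.mode tabs
.import multilanguage_words.txt multilanguage_words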
This is a sample of how to index all words in all languages in the same row (one text line per record). Note how each word has a header between { and } that contains the classification of the word (synonym, antonym, name, etc.) and the language ID. Numbers like 50 are the percentage of positive or negative emotion I felt for each word by default when I wrote the text; they could be updated with A.I. The last field, like "skill", is an attempt to classify the words from the most basic existing human concept to the most complex and emotive/subjective one. All parameters after the language ID are optional, and a parser should check whether each one is present:
Code:
{:synonym:es:50:skill}programador+{:synonym:en}programmer+{:synonym:fr}programmeur+{:synonym:it}programmatore+{:synonym:eo}programisto {:synonym:es}Persona que diseña los procedimientos a seguir por un dispositivo automatizado.
{:synonym:es:80:quality}listo+{:synonym:en:70}smart+{:homonym:es}listo,{:synonym:es}sagaz,{:synonym:es}astuto,{:synonym:es}ducho,{:synonym:es}despabilado,{:synonym:es}avivado,{:synonym:es}avezado+{:typo:es}avesado+{:synonym:es}avispado,{:synonym:es}perspicaz,{:synonym:es:75}vivo {:synonym:es}Persona con gran agudeza y agilidad mental y práctica.
{:synonym:es:60:status}listo+{:synonym:en}ready,{:synonym:es}preparado+{:synonym:en}prepared,{:synonym:es}dispuesto+{:synonym:en}willing,{:synonym:es}complaciente {:synonym:es}Estado de espera y disposición para llevar a cabo una tarea.
{:synonym:es:50:concept}palabra,{:synonym:es}fonema,{:synonym:es}vocablo,{:synonym:es}término,{:synonym:es}verbo,{:synonym:es}dicción,{:synonym:es}expresión,{:synonym:es}lengua,{:synonym:es}lenguaje,{:synonym:es}habla,{:synonym:es}promesa,{:synonym:es}pacto,{:synonym:es}oferta,{:synonym:es}juramento,{:synonym:es}ofrecimiento,{:synonym:es}compromiso {:synonym:es}Elemento de todo lenguaje que comunica ideas, intenciones y acciones.
{:synonym:es:-60:action}desaparecer+{:synonym:es:-35:action}desaparecerse,{:synonym:es}esfumar+{:synonym:es}esfumarse,{:synonym:es}retirar+{:synonym:es}retirarse {:synonym:es}Alejar algo de nuestra percepción de modo que no se pueda encontrar.
{:name:es:0:male}Rodolfo+{:name:en:0:male}Rudolph
{:synonym:*:medication}Panadol+{:synonym:*:medication}Paracetamol
{:synonym:en:65}conversely+{:synonym:es:65}al contrario de+{:synonym:es:65}a diferencia de
{:synonym:es}amalgama+{:synonym:en}amalgam+{:synonym:en}amalgamation
{:synonym:es}natural+{:synonym:en}natural,{:synonym:es}sincero+{:synonym:en}sincere,{:synonym:es}espontáneo+{:synonym:en}spontaneous,{:synonym:es}genuino+{:synonym:en}genuine
{:synonym:es}calibración+{:synonym:en}calibration+{:synonym:es}calibrar+{:synonym:en}calibrate,{:synonym:es}equilibrio+{:synonym:es}equilibrar+{:synonym:en}equilibrium+{:synonym:en}equilibrate,{:synonym:es}balance+{:synonym:es}balancear+{:synonym:en}balance
{:surname:en}Sonnenreich+{:typo:en}Sonnereich
{:synonym:es}correspondiente+{:synonym:es}coincidente+{:synonym:es}concordante+{:synonym:pt}concorda+{:synonym:pt}concorde+{:synonym:es}que concuerde+{:synonym:es}acierto+{:synonym:es}coincidencias+{:synonym:es}concordancias+{:synonym:es}acierto+{:synonym:es}aciertos+{:synonym:pt}de acordo+{:synonym:pt}concerta+{:synonym:pt}concertar+{:synonym:pt}concerte
{:synonym:es}endurar+{:synonym:es:0:verb}endurecer+{:synonym:es:0:verb}endurezco+{:synonym:es:0:verb}endureces+{:synonym:es:0:verb}endurece+{:synonym:es:0:verb}endurecemos+{:synonym:es:0:verb}endurecéis+{:typo:es:0:verb}endureceis+{:synonym:es:0:verb}endurecen+{:synonym:en:0:verb}harden+{:synonym:en}hard+{:synonym:en:0:verb}make hard+{:synonym:en:0:verb}to make hard+{:synonym:en:0:verb}make it hard+{:synonym:en:0:verb}making it hard+{:synonym:es}endura+{:synonym:es}endurece+{:synonym:es}durar
{:name:*:0:font-face}Calibri
{:synonym:en}keep+{:synonym:en}keeping+{:synonym:en}kept+{:synonym:es:0:verb}mantener+{:synonym:es}mantén+{:typo:es}manten+{:synonym:es}mantengo+{:synonym:es}mantienes+{:synonym:es}mantiene+{:synonym:es:0:verb}mantienen+{:synonym:es}mantenemos+{:synonym:es}mantenéis+{:typo:es:verb}manteneis+{:synonym:es}mantén
{:word:es}tu+{:word-plural:es}tus+{:word:en}your
{:name:en:50:organism}eye+{:name-plural:en:50:organism}eyes+{:name:es:50:organism}ojo+{:name-plural:es:50:organism}ojos
{:name:es:100:math}Álgebra+{:name:en:100:math}Algebra
{:name:es:100:math}Aritmética+{:name:en:100:math}Arithmetic
{:name:es:100:math}Cálculo+{:name:en:100:math}Calculus
{:name:en:100:artificial-intelligence}Situation Calculus+{:name:es:100:artificial-intelligence}Cálculo Situacional
{:name:en}dynamical domain+{:name:es}dominio dinámico+{:name:en}dynamical domains+{:name:es}dominios dinámicos
{:name:en}vedic math+{:name:es}matemática védica
{:name:en}Pizza Hut
{:name:en}Toto's Pizza
{:word:es}tan+{:word:en}as+{:word:es}tanto
{:word:es}también+{:typo:es}tambien+{:chat:es}tmb+{:word:en}as well+{:word:en}as well as
{:synonym:en:0:verb}close+{:antonym:en:0:verb}open+{:synonym:en}closed+{:synonym:es}cerrado+{:antonym:es}abierto+{:synonym-plural:es}cerrados+{:antonym-plural:es}abiertos
{:word:en}and+{:word:es}y+{:word:pt}e+{:word:fr}et
{:word:en}of+{:word:es}de
{:word:en}you+{:word:es}tú
{:word:en}state+{:word:es}estado+{:word-plural:en}states+{:word-plural:es}estados
{:name:en}day+{:name:es}día+{:name-plural:en}days+{:name-plural:es}días
{:synonym:en}ensure+{:synonym:en}make sure+{:synonym:en}making sure+{:synonym:es}asegurándose+{:synonym:es}asegurar+{:synonym:es}asegurarse
{:synonym:en}high+{:synonym-male:es}alto+{:synonym-female:es}alta+{:synonym-plural-male:es}altos+{:synonym-plural-female:es}altas
{:synonym:en}level+{:synonym-plural-male:en}levels+{:synonym-male:es}nivel+{:synonym-plural:es}niveles+{:synonym:es}nivelación+{:synonym-plural:es}nivelaciones
{:synonym-male:en}channel+{:synonym-plural-male:en}channels+{:synonym-male:es}canal+{:synonym-plural-male:es}canales
{:word:en}the+{:word-male:es}el+{:word-female:es}la+{:word-plural-female:es}las+{:word:es}lo+{:word-plural:es}los
{:word:en}to+{:word:es}para+{:word:es}a
{:synonym:en:0:verb}deepen+{:synonym:es:0:verb}profundizar
{:synonym:en}least+{:synonym:es}menos+{:synonym:es}menor
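To show how the headers in the sample above could be read by a program, here is a minimal parser sketch in Python. It assumes that "+" and "," are both plain separators between entries, that a numeric field after the language ID is the emotion percentage, and that any other extra field is the concept class. The names `Entry` and `parse_wordlist` are made up for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One entry looks like {:kind:lang[:score][:category]}text
# The leading ':' is optional here so that small typos in the data still parse.
ENTRY_RE = re.compile(r"\{:?([^{}]*)\}([^{}]*)")


@dataclass
class Entry:
    kind: str                 # synonym, antonym, name, typo, ...
    lang: str                 # language ID such as "es", "en", or "*"
    score: Optional[int]      # emotion percentage, if present
    category: Optional[str]   # concept class such as "skill", if present
    text: str                 # the word or phrase itself


def parse_wordlist(wordlist: str) -> List[Entry]:
    """Split the wordlist column of one record into its entries."""
    entries = []
    for header, text in ENTRY_RE.findall(wordlist):
        fields = header.split(":")
        if len(fields) < 2:
            continue  # no language ID in the header, skip it
        kind, lang = fields[0], fields[1]
        score = None
        category = None
        # Everything after the language ID is optional: detect it by shape.
        for extra in fields[2:]:
            if re.fullmatch(r"-?\d+", extra):
                score = int(extra)
            elif extra:
                category = extra
        word = text.strip(" \t+,")  # drop the separators around the word
        if word:
            entries.append(Entry(kind, lang, score, category, word))
    return entries
```

For example, `parse_wordlist("{:synonym:es:-60:action}desaparecer+{:synonym:es}esfumar")` yields two entries, the first one with score -60 and category "action", the second with neither.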
Remember that relating a word and its synonyms, antonyms, etc., in the same row (record) across all existing languages, including typos, abbreviations and phrases, means you search and find in terms of the core concept behind the words rather than a single specific word.
It's a very good basic A.I. filter for understanding and processing natural language, but it needs a massive database containing ALL existing words of humankind, related to each other (one same word in all languages === one database record).
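A search by concept over such records could look roughly like the following Python sketch, which uses the standard sqlite3 module and the multilanguage_words table from above. The helper names `build_demo_db`, `words_by_language` and `find_concepts` are made up for this example, the three demo rows are copied from the sample, and the LIKE query is only a coarse prefilter (it assumes the search word contains no % or _ characters).

```python
import re
import sqlite3

# Captures the language ID and the text of each {:kind:lang...}text entry.
ENTRY_RE = re.compile(r"\{:?[^:{}]*:([^:{}]*)[^{}]*\}([^{}]*)")

DEMO_ROWS = [
    "{:synonym:es:50:skill}programador+{:synonym:en}programmer+{:synonym:fr}programmeur",
    "{:word:en}and+{:word:es}y+{:word:pt}e+{:word:fr}et",
    "{:synonym:en:0:verb}deepen+{:synonym:es:0:verb}profundizar",
]


def build_demo_db():
    """Create an in-memory database holding a few sample records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE multilanguage_words(wordlist TEXT DEFAULT '', "
        "dictionary_definition TEXT DEFAULT '', "
        "rowid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO multilanguage_words(wordlist) VALUES (?)",
        [(row,) for row in DEMO_ROWS],
    )
    return conn


def words_by_language(wordlist):
    """Group the words of one record by language ID."""
    result = {}
    for lang, text in ENTRY_RE.findall(wordlist):
        word = text.strip(" \t+,")
        if word:
            result.setdefault(lang, []).append(word)
    return result


def find_concepts(conn, word):
    """Return every record containing `word`, grouped by language."""
    # '}' closes a header, so this pattern only hits the start of a word.
    pattern = "%}" + word + "%"
    rows = conn.execute(
        "SELECT wordlist FROM multilanguage_words WHERE wordlist LIKE ?",
        (pattern,),
    )
    matches = []
    for (wordlist,) in rows:
        by_lang = words_by_language(wordlist)
        # Exact comparison here, because LIKE also matches longer words.
        if any(word.lower() == w.lower() for ws in by_lang.values() for w in ws):
            matches.append(by_lang)
    return matches


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_concepts(conn, "programmer"))
```

Searching for "programmer" returns the whole record, so the Spanish and French words come back together with the English one. With a real database of this size you would probably want a separate index table with one word per row instead of scanning with LIKE.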
Why hasn't even Google released such a vital language database?