I am guessing that the posts about Unicode here will get Jeffed out to a new thread by the mods (please do), but I wanted to mention this discussion on the Daily WTF's fora (warning, NSFW) from shortly before I stopped posting there in order to safeguard my sanity.
This was actually one of the less violent and flamey threads on that group, with plenty of well-considered opinions. Go figure. In light of the amount of trollery that did occur in that thread, this should tell you a lot about why I left (though it also raises the question of why I was there in the first place). Thoughtful (about technical problems, that is), knowledgeable, and witty jerks are still jerks, which I suppose could be applied to myself as well (assuming one considers me thoughtful, knowledgeable, or witty, which I hope is the case, though I wouldn't be surprised if most don't).
Note that I agree that Unicode is an improvement over ASCII, and indeed the point of that thread was that someone whose opinion I respect (Ted Nelson) had stated in email that Unicode was a cultural disaster (the exact quote was, "Unicode has demolished cultures"), and I was asking for ideas as to why he said this (he hadn't given me much of any answer at that point). I knew that he'd been in Japan in the late 1990s, and I got the impression (which others seemed to corroborate) that his view was colored by resentment over the move from the existing national encodings to one which was seen as culturally insensitive, so I do think he placed far too much weight on that experience.
(Ted never did give me a direct answer on his opinions, but just pointed me to one of his books, Geeks Bearing Gifts, which I have yet to read. To be fair, it isn't as if he has any reason to see me as anything other than a stranger asking odd questions. While we did meet a few times, it's been about fifteen years since we last spoke.)
One point I made that is possibly important (for those who don't want to wade through that quagmire) is that the idea of a single, linear, and stateless encoding doesn't fit a lot of languages well. For example:
- Hanzi character ordering is based on the radicals which make up the ideograms, and even if we treat Traditional and Simplified characters as separate sets¹, there is no single, linear ordering that is universally accepted. Also, most of these orderings are explicitly tabular, with relations in (at least) two dimensions.
- Japanese kanji - which is based on Traditional Hanzi, but has drifted and is not always the same as the modern versions of the 'traditional Hanzi' - has its own approach to ordering the characters, and while I don't know the details, trying to use a Chinese ordering with Japanese, or vice versa, is going to cause problems.
- The Japanese Kana syllabary is also explicitly tabular in ordering, and forcing it into a linear form loses significant information about the relationships between the characters.
- A number of scripts - most notably Arabic - are stateful, in that different letters have different letter forms depending on their position in the word. I don't offhand know how this is solved in Unicode, so how much of a problem it is isn't clear to me. Perhaps someone else could speak up about it?
- Ligatures and other kinds of merged digraphs are often a problem, though I gather it is mostly a solved one currently. I would expect Solar to be familiar with this problem (assuming it still is one) regarding the sharp-s character ('ß') in German.
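On the Arabic, sharp-s, and kana points in the list above, here is a small sketch using nothing but Python's standard `unicodedata` module and the built-in string methods. As far as I can tell, Unicode text normally stores one code point per Arabic letter and leaves the positional shaping to the renderer; the four positional shapes also exist as legacy 'presentation form' code points, which compatibility normalization folds back to the base letter. The names `BEH`, `BEH_FORMS`, and `folds_to_base` are my own, picked for illustration, and beh is just one letter I chose as an example.

```python
import unicodedata

# ARABIC LETTER BEH: a single code point in stored text, whatever its position.
BEH = "\u0628"

# Legacy presentation-form code points for the four positional shapes of beh.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def folds_to_base(form: str, base: str) -> bool:
    """Return True if NFKC normalization maps `form` back to `base`."""
    return unicodedata.normalize("NFKC", form) == base


for position, form in BEH_FORMS.items():
    print(position, unicodedata.name(form), folds_to_base(form, BEH))

# Sharp s: one lowercase letter whose default uppercase is two letters.
sharp_s = "\u00DF"
print(sharp_s.upper())              # uppercase mapping
print(sharp_s.casefold())           # caseless-comparison form
print("\u1E9E".lower() == sharp_s)  # capital sharp s lowercases to sharp s

# The 'fi' ligature is a compatibility character; NFKC splits it into f + i.
print(unicodedata.normalize("NFKC", "\uFB01"))

# Kana: plain code point order puts the voiced GA between KA and KI,
# so the table's rows and columns are flattened into one line.
kana = ["\u304B", "\u304D", "\u304C"]  # ka, ki, ga
print([unicodedata.name(c) for c in sorted(kana)])
```

None of this does real shaping or collation; it only shows what the encoding itself stores. Sorting by code point is not the same as a language-aware ordering, which needs a proper collation library and locale data.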
Furthermore, from what this blog post is claiming, as of at least 2015 there has never been anyone from anywhere other than the US, UK, Canada, and Europe on the Unicode committee, which is insane. Even the Oman Ministry of Religious Affairs - the only governmental agency with permanent representation on the group - is represented by someone from the Netherlands who has only a rudimentary knowledge of Arabic. That's simply ludicrous.
As some on the thread pointed out, technological issues with characters aren't a new thing; the examples given include two from English, namely the discarding of the 'thorn' ('þ') and eth ('ð') due to them not being available in the metal typefaces imported from Europe in the 16th century², and the similar discarding of the medial 's' ('ſ') in the late 18th century³.
One could look even further back, to the reasons why some languages have left-to-right, top-to-bottom ordering, others have right-to-left/top-to-bottom, still others have boustrophedonic orderings⁴ (either consistently or depending on the writing) and yet others have top-to-bottom ordering (which could then go either right-to-left or left-to-right). For example, Chinese top-to-bottom, right-to-left ordering appears to have originated from the use of bamboo slips strung into books which were sewn into rolls or folded codices, whereas in cuneiform, the original top-to-bottom ordering became right-to-left as a way of avoiding smudging of the clay tablets. While it is by no means certain that this carried over to proto-Hebrew and proto-Arabic writing forms, the theory that it did is fairly widespread (though it seems unlikely, given that they both came more from the unrelated Phoenician abjad, and Phoenician was right-to-left too). How Greek and Latin, which both were based on Phoenician, came to reverse the order isn't known, IIUC.
I seem to recall some arguing that the Phoenician ordering relates to Egyptian, where it was used because of the manner in which their mural art and hieroglyphs work, but that seems unlikely given that there is no direct connection between the Phoenician and Egyptian writing forms. In any case, the claim regarding Egyptian is incorrect - the order of reading in Egyptian varied by context (with some words read in one direction and some in another), medium (papyrus writing was sometimes, but not always, top to bottom, while inscriptions and murals depended on the layout of the wall or monument), and era.
footnotes
1. Somewhat artificially, from what I've heard, but that also goes towards the political reasons for which first the Nationalist government, and then the Communist government, each introduced their own 'simplified' scripts - while the main reason was to promote literacy, without having to submit to outside cultural influences while doing so, they also wanted to make it clear that they were in charge of everything, including the language.
2. Hence the use of 'Ye' for 'the' in some old documents, which is famously retained by the storefront signs of some traditional pubs (and tony boutiques).
3. In part because it was used inconsistently, in part because it was hard to distinguish from lowercase 'f' in most typefaces, but mostly because it was just too annoying for the typesetters to bother with, being a distinct exception in English - especially at a time when the rules about capitalization were still taking shape - which generally doesn't have positional letter form variations based on the position in the word after the first letter.
4. Alternating left-to-right and right-to-left per line. The term can be translated from Greek roughly as 'in the manner of an ox plow'.