Posted: Tue Oct 30, 2007 8:41 am
by Solar
I know promises are cheap, but to follow up on this:
Solar wrote:Dismissing very real requirements because you don't want to be bothered with the added complexity does not give an OS designer high marks in my book.
I know I am taking ages to get <stdio.h> out of the door with PDCLib. I know now where and why I failed, am making good progress, but will take some more time as I am relocating ATM, which makes me quite busy.
But once that is done, locales and wide-character I/O are the next step. This will most certainly include Unicode.
Posted: Tue Oct 30, 2007 8:54 pm
by mystran
If you want to store Unicode in some array-cell-is-a-codepoint encoding, then 16 bits won't do it. So your choices are:
- UTF-8, which is quite easy to handle
- UTF-16, which is not much simpler, and only wins space against UTF-8 when you have lots of characters that UTF-8 can't represent in 2 bytes
- UTF-32 (UCS-4), which is easier to handle, but wastes space practically all the time
Besides, even with UTF-32, one can't assume array-cell-is-a-character (only that it's a codepoint), because one can compose characters using composition codepoints, which means one logical character can be practically any number of codepoints.
This means that you generally have to work with a variable width encoding anyway, so adding another layer of encoding/decoding for UTF-8 is more or less trivial.
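As a rough illustration of how small that extra UTF-8 layer is, here is a minimal Python sketch of encoding and decoding a single code point by hand. The function names `utf8_encode` and `utf8_decode_one` are made up for this example, and the decoder deliberately skips some checks a real one needs (overlong forms, surrogates, out-of-range results).

```python
def utf8_encode(cp):
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_decode_one(data, i=0):
    """Decode the code point starting at data[i].

    Returns (code point, index of the next code point).
    Sketch only: overlong encodings and surrogates are not rejected.
    """
    b0 = data[i]
    if b0 < 0x80:
        return b0, i + 1
    if b0 >> 5 == 0b110:
        n, cp = 1, b0 & 0x1F
    elif b0 >> 4 == 0b1110:
        n, cp = 2, b0 & 0x0F
    elif b0 >> 3 == 0b11110:
        n, cp = 3, b0 & 0x07
    else:
        raise ValueError("invalid lead byte")
    if i + 1 + n > len(data):
        raise ValueError("truncated sequence")
    for b in data[i + 1:i + 1 + n]:
        if b >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp, i + 1 + n
```

The lead byte alone tells the decoder how many continuation bytes follow, which is what keeps the layer simple.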
Having written a string library which could handle both UTF-8 and UTF-16 properly, and deal with most scripts, I'd say it's not that hard. The primary disadvantage is that you need rather heavyweight tables to decode codepoint types, which you need in order to determine where characters and words end, so for a floppy distribution, full Unicode is kinda impractical to handle. You need even more if you want to do proper string matching, since you need to normalize characters that can be represented in several ways (say ä can be the single character ä as in Latin-1, or an a with a combining diaeresis on it, but they are logically equivalent for purposes of string matching). But I'd say it's still worth the trouble to write code that can handle the stuff, even if you cut the tables in small distributions to only handle something like the western scripts.
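The ä example can be reproduced with Python's standard `unicodedata` module, which ships the kind of tables mentioned above. This is only a sketch of normalization-before-comparison; the helper name `same_text` is invented here, and NFC is just one of several normalization forms one might pick.

```python
import unicodedata

precomposed = "\u00e4"   # ä as one code point (the Latin-1 character)
decomposed = "a\u0308"   # a followed by COMBINING DIAERESIS


def same_text(a, b):
    """Compare two strings after bringing both to the same normal form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw code point sequences differ, the normalized ones do not.
print(precomposed == decomposed)
print(same_text(precomposed, decomposed))
```

A naive code-point-by-code-point comparison treats the two spellings as different strings, which is exactly the matching problem described in the post.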
Posted: Tue Oct 30, 2007 11:02 pm
by 01000101
Whoa, heated discussion here.
I agree with Solar on this one. If you are trying to hit the market with any sort of UI, you will need international support and compatibility. It may be slower, but it can't be THAT slow seeing as how all the major OSes out there use it and they aren't crawling along because of their keyboard libs.
Posted: Wed Oct 31, 2007 5:55 am
by Solar
mystran wrote:Besides, even with UTF-32, one can't assume array-cell-is-a-character (only that it's a codepoint), because one can compose characters using composition codepoints, which means one logical character can be practically any number of codepoints.
To clarify:
Unicode defines a "code point" as an integer value representing an abstract Unicode "character".
A Unicode "character" can be "printing", i.e. have a graphical representation (called "glyph"). It can be "non-printing", i.e. modify the next character instead. (Think of "dead keys" on the keyboard that add accents etc. to the next character typed.) It can even have a graphical representation of more than one glyph, especially in the Asian alphabets. Or it can be not defined / reserved for later expansions of the standard.
That means, while we have a 1:1 relationship of Unicode "code points" and "characters", we do not have a 1:1 relationship of Unicode "characters" and glyphs (which are what the layman might call printing characters). In fact, we have an m:n relationship here.
It gets worse. A non-printing Unicode "character" can induce a "shift state", which might not end with the next character - i.e., a "dead key" influencing more than one following character. (Not 100% sure if there are not even characters influencing preceding characters...)
Furthermore, as mystran stated above, several (combinations of) Unicode "characters" might result in the same glyph, yet still be very distinct (combinations of) "characters".
And finally, in encodings like UTF-8, one Unicode "code point" can be encoded by anywhere from 1 to 4 bytes.
And before you flame Unicode for being so complex, remember that we don't have any other way for representing all alphabets of the world in one encoding system. If you have ever fiddled with a system where the kernel messages are in one encoding, the console in another, and the file system in a third, you will know why there simply is no way around Unicode if you want your system to go anywhere beyond Montana.
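The three levels discussed in this post (bytes, code points, what a user would call a character) can be counted separately. The Python sketch below uses a deliberately crude grouping rule: a code point with a non-zero combining class is attached to the code point before it, since Unicode combining marks follow the base character they modify. Real segmentation follows the much longer rules of Unicode Standard Annex #29; the function name `clusters` is made up for this example.

```python
import unicodedata


def clusters(text):
    """Group each base code point with the combining marks that follow it.

    Crude approximation of user-perceived characters, not full UAX #29.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch          # attach mark to the preceding base
        else:
            out.append(ch)         # start a new cluster
    return out


text = "a\u0308\u20ac"   # a + COMBINING DIAERESIS, then the euro sign

n_bytes = len(text.encode("utf-8"))   # storage units in UTF-8
n_codepoints = len(text)              # Unicode code points
n_clusters = len(clusters(text))      # rough user-perceived characters
print(n_bytes, n_codepoints, n_clusters)
```

Three different answers to "how long is this string?" come out of the same data, which is why code that indexes text needs to state which level it is working on.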
Posted: Wed Oct 31, 2007 10:51 am
by mystran
I'd like to add that an application that deals with UTF-8 internally, and uses indirections for finding next/previous codepoint/user-character/word-break and for mapping user-characters to glyphs (not always 1:1 either), even if not really implementing all the exotic scripts properly, is a LOT better than something that simply deals with ASCII, because it's a LOT easier to extend it to support more exotic scripts later.
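A minimal sketch of the lowest of those indirections, stepping forward and backward by code point over a UTF-8 byte buffer, might look like this in Python. The names `next_codepoint` and `prev_codepoint` are hypothetical, and the code assumes the buffer is valid UTF-8 and that `i` points at the start of a code point.

```python
def next_codepoint(buf, i):
    """Index of the code point that follows the one starting at buf[i]."""
    i += 1
    # Continuation bytes have the bit pattern 10xxxxxx; skip over them.
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i


def prev_codepoint(buf, i):
    """Index of the code point that precedes position i (0 if none)."""
    i = max(i - 1, 0)
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


buf = "a\u00e4\u20ac".encode("utf-8")   # 1-byte, 2-byte and 3-byte code points
```

Callers that only ever move through text via functions like these never need to know how many bytes a code point takes, so user-character and word-break stepping can be layered on later without touching them.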
Posted: Wed Oct 31, 2007 12:14 pm
by Brendan
Hi,
bewing wrote:Yes, if I ever create an internationalized version (which I probably won't), it will certainly have a much more generalized, slower, uglier, more bloated keyboard driver written by some wageslave computer science grad, with all the features you speak of, and more. I simply think it is a mistake to design internationalization into an OS as a primary goal, unless your primary market is going to be the EU.
There may be a fundamental flaw in the way you approach OS design, and it's a flaw that (IMHO) other OS developers might share.
When designing an OS, you should/must try to take into account *everything*. If you don't you'll either end up with "hack on a hack on a hack" or "rewrite after rewrite after rewrite" as you try to add things you didn't originally consider.
For example, imagine you design your OS for "ASCII only" where the keyboard driver sends 8-bit characters. Then (eventually) you want to add Unicode and internationalization and need to change the interface between the keyboard driver and the rest of the system to handle 32-bit characters. Because you've changed the interface, you need to update everything that uses the interface - the kernel, the GUI and all applications. It can add up to years of rewriting to do it properly (especially if there are many applications, etc).
Of course in this case you could just write a hack. For example, you could make applications default to "ASCII mode" and allow newer applications to enable "Unicode mode", where no-one can remember which applications support internationalization and which don't, and where your application programmers start thinking about writing their own OS because your interfaces aren't "clean" (e.g. a messy collection of crud for backwards compatibility purposes).
Please note: I'm not talking about implementation, I'm talking about design. Just because your OS (API/s, protocols, interfaces, etc) is designed to handle Unicode and internationalization, doesn't mean that your initial implementation actually needs to support Unicode and internationalization.
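To make the design-versus-implementation distinction concrete, here is a hypothetical Python sketch of a keyboard interface that carries full code points while the first implementation only knows a few ASCII keys. Everything in it (`KeyEvent`, `translate`, the keymap tables and the scancode values) is invented for illustration and is not taken from any OS discussed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    codepoint: int   # wide enough for any Unicode code point (0..0x10FFFF)
    pressed: bool


# Initial implementation: a tiny US ASCII keymap (scancode -> code point).
# The scancode values are illustrative assumptions.
US_ASCII_KEYMAP = {0x1E: ord("a"), 0x30: ord("b"), 0x2E: ord("c")}

# Later addition: a German keymap. Only the table changes, not the interface.
GERMAN_KEYMAP = dict(US_ASCII_KEYMAP)
GERMAN_KEYMAP[0x28] = 0x00E4   # ä


def translate(scancode, pressed, keymap=US_ASCII_KEYMAP):
    """Turn a raw scancode into a KeyEvent, or None if the key is unmapped."""
    cp = keymap.get(scancode)
    if cp is None:
        return None
    return KeyEvent(cp, pressed)
```

Consumers of `KeyEvent` written against the ASCII-only keymap keep working unchanged once a new table is added, because the event format never had to grow.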
If you're only going to support one language then it'd make the most sense to support Mandarin. Here's a table showing the percentage of the world's population that use each language as their primary language (in order from largest to smallest, not necessarily accurate):
- 32% Mandarin
- 11% Hindi/Urdu
- 11% Spanish
- 11% English
- 6% Bengali
- 6% Arabic
- 6% Portuguese
- 6% Russian
- 4% Japanese
- 3% German
- 2% French
Now, see if you can answer these questions:
- what happened on the 9th of November in 2001 (9/11/2001)?
- how is "colour" spelt?
- is 1.501 slightly more than one and a half, or slightly more than fifteen hundred?
- did Microsoft release DOS about 20 years before the year 2000, a few thousand years after the year 2000, or several hundred years before the year 2000?
- is 14:00 before or after midday?
- when does daylight savings start?
- if both of our computers send a fax to the same fax machine in Italy, would both computers dial the same phone number?
- if you lend me $1000 and I repay a quarter each week for 5 weeks, would you make a profit or a loss?
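Two of the questions above (the date and the number) can be turned into a small Python sketch showing that the same input text yields different values depending on which convention the reader assumes. The helper names `parse_date` and `parse_number` and the "DMY"/"MDY" labels are assumptions made for this example; they stand in for whatever locale data a real system would consult.

```python
from datetime import datetime


def parse_date(text, order):
    """Parse a numeric date as day/month/year ("DMY") or month/day/year ("MDY")."""
    fmt = {"DMY": "%d/%m/%Y", "MDY": "%m/%d/%Y"}[order]
    return datetime.strptime(text, fmt).date()


def parse_number(text, decimal_sep, group_sep):
    """Parse a number given the decimal and digit-grouping separators in use."""
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


# Same characters, two readings each.
print(parse_date("9/11/2001", "DMY"), parse_date("9/11/2001", "MDY"))
print(parse_number("1.501", ".", ","), parse_number("1.501", ",", "."))
```

Neither reading is wrong on its own; the input is simply ambiguous until the interface says which convention applies.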
Cheers,
Brendan
Posted: Wed Oct 31, 2007 2:03 pm
by LordMage
Well, I am designing my OS with a very specific purpose of working for the dumb white American male. I am doing this because I don't want to have to worry about what people other than me want.
I do have a question though. Show of hands, who thinks they will actually make money with their hobby OS?? You all talk like you expect to be millionaires. I think the days of fighting for your spot in the OS market are at a slow right now. Microsoft and Mac pretty much have a corner on the getting paid market and Linux is the main one on the free market. I would love it if someday someone other than me uses my OS, or even if I could just use it for a practical purpose. Aren't we mostly doing this just to learn and feel like we've accomplished something?
Posted: Wed Oct 31, 2007 3:43 pm
by Solar
LordMage wrote:Well, I am designing my OS with a very specific purpose of working for the dumb white American male.
I am really, really sorry to see that someone who has enough brains to make it into the field of software can still be dumb enough to make such a remark and probably didn't even think twice about it.
So it's the dumb white American male your OS is designed for. Probably because most others (black Americans, Hispanic Americans, Asian Americans, Native Americans, and - gosh - even females) are not dumb enough to think they live in a cosy little nutshell where they can simply ignore the existence of other cultures.
If you don't want to bother with theory because you are just in it for the hacking fun, fine. It's a valid point of view. Others (me included) are in it because we think that OS design is a field that could still be much improved on, and because we actually find it enlightening to read about stuff like Unicode because it widens our horizons. Spare us your flames.
Oh, and by the way: Non-English locales were introduced to the "all-american" MS-DOS in v2.11, March 1984...
Posted: Wed Oct 31, 2007 4:45 pm
by LordMage
No flame intended, sorry if I offended you. I was not ignoring the "Minorities", I was merely adopting the true spirit of EO and not leaving out little ole me that has to constantly worry about what the "African-Americans", the "Mexican-Americans", the "Hispanics" and the "Asian Pacific-Americans", ohh and let's not forget the females, women's rights and all, think about my little OS that probably no one other than me will ever use. I mean is it a crime to be proud that I am white, wait I will be politically correct "White Nonhispanic", well that is what it says on all the questionnaires and applications. I actually prefer plain old "American" not to dog any other countries, I have been to a few of them and I think that each of the ones I have been to have every reason to be proud of their nation. I just dislike the fact that a "Melting pot" which is what America is supposed to be should have to label everyone based on race. We should just be Americans. I doubt that the Chinese OS programmers are worried about whether their OS supports German. I know a Greek that made a distro of Linux that is only in Greek. He had/has no reason to adopt any other character map for his version of the OS because he knows who his intended market is. I have an intended market of just one person and that is me. I doubt I am the only one. So I am only worried about what I think about my OS. If you were a French man would you care what I thought of your OS? Probably not, I am pretty sure that whatever your nationality or background that at this point in time you could care less what I think of your OS. I would not fault you for that. There is a certain nobility in catering to the masses but it is much more rewarding personally to develop for a small group. Instead of making a large group of people marginally happy and have to hear about bugs and flaws all day, Microsoft, I choose to write for what I want and be happy with my version of an OS, Linux from scratch.
So, what is wrong with that?
I don't see Black, Asian, Hispanic or White. I see RGB.
Again, I don't feel that I am a "dumb white American", I was just poking fun at my race and nationality. By the way, excuse any misspellings and nonsensical sentences. I'm from Kentucky, and if that don't explain any questions just look at our history.
@Solar - Lighten up a little. Life is no fun with a chip on your shoulder. I have noticed in some of your other posts that you seem a little sensitive when people make what I would consider harmless comments.
Posted: Wed Oct 31, 2007 10:03 pm
by Alboin
LordMage wrote:I mean is it a crime to be proud that I am white
Yeah, kinda. I can see being proud of being of German, Polish, or Chinese, etc. heritage, but being proud of your race is somewhat different. It almost encourages racism, which isn't a nice thing.
The thing is, the rest of your post really doesn't agree with that one sentence. Maybe there is a miscommunication somewhere here...
LordMage wrote:@Solar - Lighten up a little. Life is no fun with a chip on your shoulder. I have noticed in some of your other posts that you seem a little sensitive when people make what I would consider harmless comments.
Ooohhh...Really? I wouldn't advise saying such comments to people who have ~3500 posts to your 43.
Posted: Wed Oct 31, 2007 11:58 pm
by Brendan
Hi,
LordMage wrote:Well, I am designing my OS with a very specific purpose of working for the dumb white American male. I am doing this because I don't want to have to worry about what people other than me want.
I'll assume that means white American males who happen to use English, don't travel to different time zones and are willing to change their computers' clocks for daylight savings, don't have a multimedia keyboard with unusual keys (or a dvorak keyboard), aren't involved with a business that imports or exports, use words like "resume" when applying for jobs and "cafe latte" when asking for coffee, don't do any mathematics or physics, and don't want to use speech recognition or handwriting recognition instead of (or in addition to) their keyboard.
If your OS is for educational purposes only (and not intended to ever be used), then I guess you could decide to learn how to do an OS properly (or not).
LordMage wrote:I do have a question though. Show of hands, who thinks they will actually make money with their hobby OS?? You all talk like you expect to be millionaires. I think the days of fighting for your spot in the OS market are at a slow right now. Microsoft and Mac pretty much have a corner on the getting paid market and Linux is the main one on the free market. I would love it if someday someone other than me uses my OS, or even if I could just use it for a practical purpose. Aren't we mostly doing this just to learn and feel like we've accomplished something?
For me, it's about maximizing the chance that my OS will be used by someone (which includes removing limitations that reduce the chance that the OS will be used).
However, if each year one person in every million people paid me $5 for my OS, then that'd add up to $30000 per year - more than enough for me to work on my OS full-time. How many people would pay $5 for something like Singularity? How many people would donate cash to SkyOS's "code ransom" scheme, or pay to become a SkyOS beta tester? How many people have bought the book containing MMURTL or the book containing MINIX?
I'm sure no-one here expects to become a millionaire, but making a living from your OS isn't necessarily impossible.
Cheers,
Brendan
Posted: Thu Nov 01, 2007 12:59 am
by Solar
LordMage wrote:I just dislike the fact that a "Melting pot" which is what America is supposed to be should have to label everyone based on race.
I dislike the notion that someone who is apparently proud of his "melting pot" origins doesn't show much respect for all those cultures who contributed to this pot.
We should just be Americans.
Who might happen to be named José and living on the other coast, a couple of time zones away...
I doubt that the Chinese OS programmers are worried about whether their OS supports German.
I doubt that a Chinese OS programmer would settle for ASCII-7, and he would at least add the option for other locales.
If you were a French man would you care what I thought of your OS? Probably not, I am pretty sure that whatever your nationality or background that at this point in time you could care less what I think of your OS.
I'm a German, and yes, I value the feedback of people who have more insight on a subject than I have, which is why I hang around here, because when people here tell you you're doing things wrong, they're usually right.
There is a certain nobility in catering to the masses but it is much more rewarding personally to develop for a small group.
I miss something like "I think" or "IMHO" here.
I think it is very rewarding to work on something useful for a wide range of people.
Which is why I design PDCLib in the way I do: Modifiable, extensible, and complete. I could whip up a "C locale only" implementation of <locale.h> and <ctype.h> in an hour or so, but I would only betray myself, and everyone looking for a "real" C library.
Instead of making a large group of people marginally happy and have to hear about bugs and flaws all day, Microsoft, I choose to write for what I want and be happy with my version of an OS, Linux from scratch. So, what is wrong with that?
That missing localization, at least as an option, is a flaw. I don't have a problem if that is your decision, I have a problem with the way you are attacking those who want to do things differently.
I'm from Kentucky, and if that don't explain any questions just look at our history.
I'm from that area where they annihilated ~20,000 Roman legionaries in 9 A.D. - that doesn't mean I'm killing Italians with a longsword these days.
@Solar - Lighten up a little. Life is no fun with a chip on your shoulder. I have noticed in some of your other posts that you seem a little sensitive when people make what I would consider harmless comments.
I'm on a short fuse when I read ignorant comments. I know that, and as long as I don't go overboard verbally (which unfortunately happens from time to time), I don't consider it a character flaw to speak up. (See my signature.)
Posted: Thu Nov 01, 2007 3:51 am
by JamesM
LordMage: Reading your posts, you seem to me to be ignorant and racist, however poorly you try to cover it up. You seem to be the stereotype of an inbred, loud American that everybody (and I do mean everybody) hates.
Posted: Thu Nov 01, 2007 4:10 am
by Solar
Ahem. I'd say comparative ancestry is a bit OT, even in this case.
Posted: Thu Nov 01, 2007 5:10 am
by JamesM
Meh, I'm OK with it. I detest people like him, and have had the misfortune to meet several of the like in my life so far.