Regional Settings

Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Regional Settings

Post by Brendan »

Hi,

I'm creating a database containing regional settings, so that my OS (and yours) can use the database to support users in different time zones, countries, etc (internationalization).

Unfortunately I need details for every location in the world! These details are fairly comprehensive, and take into account different time zones (including daylight savings), currencies, numerical formats, calendars, time and date formats, etc. The resulting database (machine readable ASCII source files and binary files), details of the format for the source and binary files, utilities to convert the source files into binary and all source code from my OS that uses the database will be freely available to anyone that wants it without any restrictions (public domain?).

To gather information for the database I've created a form. Please fill this form out and email it to me (email address is in the form). I will post the URL for the web page that the resulting freely available material (listed above) can be downloaded from when I've actually created it (in the next few days).


Thanks,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Neuromancer

Re:Regional Settings (please fill out the form!)

Post by Neuromancer »

I think you should add time information to your form.
I have just sent you an e-mail with regional info for Italy; the time information you may need is:

Time format (HMS): HMS
Hour-minute separator: ':'
Minute-second separator: ':'
12 or 24-hour format: 24 hour

Note: H is hour, M is minute, S is second
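
To show how fields like these could drive formatting, here is a minimal sketch in C. The struct and function names are made up for illustration and are not taken from Brendan's form or database; the field order is fixed to HMS, and the AM/PM suffix for the 12-hour case is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for the time fields listed above; the names are
   illustrative only. */
struct time_format {
    char hour_minute_sep;    /* e.g. ':' */
    char minute_second_sep;  /* e.g. ':' */
    int  use_24_hour;        /* non-zero for a 24-hour clock */
};

/* Format hour/minute/second (hour in 0..23) into buf in HMS order.
   Returns what snprintf returns: the length the full string needs. */
int format_time(const struct time_format *fmt, int h, int m, int s,
                char *buf, size_t size)
{
    assert(fmt != NULL && buf != NULL);
    if (fmt->use_24_hour) {
        return snprintf(buf, size, "%02d%c%02d%c%02d",
                        h, fmt->hour_minute_sep, m,
                        fmt->minute_second_sep, s);
    }
    /* 12-hour clock: hour 0 is shown as 12 AM, hour 13 as 1 PM. */
    int h12 = h % 12;
    if (h12 == 0) {
        h12 = 12;
    }
    return snprintf(buf, size, "%d%c%02d%c%02d %s",
                    h12, fmt->hour_minute_sep, m,
                    fmt->minute_second_sep, s,
                    h < 12 ? "AM" : "PM");
}
```

With the Italian values above (`{ ':', ':', 1 }`), 14 hours 5 minutes 9 seconds comes out as `14:05:09`.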

Re:Regional Settings (please fill out the form!)

Post by Brendan »

Hi,

Thanks to everyone that's filled out the form so far!
Neuromancer wrote: I think you should add time information in your form.
You're right - I've added time formats, spots for people to show examples and national languages.

The updated form is attached.


Thanks,

Brendan
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re:Regional Settings (please fill out the form!)

Post by Solar »

Note that daylight saving rules have changed repeatedly in the past; if you do time calculations into the past, you'd have to take those changes into account.

Depending on your licensing, you could have a look at available C standard libraries (glibc); most of the information you're looking for is available from the locale.h implementation.
Every good solution is obvious once you've found it.

Re:Regional Settings (please fill out the form!)

Post by Brendan »

Hi,
Solar wrote: Note that daylight saving rules have changed repeatedly in the past; if you do time calculations into the past, you'd have to take those changes into account.
At the moment I only need current daylight savings information.

Accurate calculations into the past will need information that is not provided as part of the OS. Daylight savings time changes, leap seconds and changes to calendars account for most of the inaccuracies. The OS (networking, file system, scheduler, etc) uses a 64-bit count of milliseconds since K, where K is a time roughly 292 million years before Christ (defined such that 0x8000000000000000 equates to the start of the first millisecond of the year 2001 in UTC).
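
A minimal sketch of that timestamp arithmetic in C, assuming only the definition above (0x8000000000000000 is the start of 2001 in UTC) and ignoring leap seconds. The macro and function names are hypothetical and not taken from the actual OS code; 978307200 is the Unix time of 2001-01-01T00:00:00Z.

```c
#include <assert.h>
#include <stdint.h>

#define NATIVE_2001    UINT64_C(0x8000000000000000) /* start of 2001, UTC */
#define UNIX_2001_SEC  INT64_C(978307200)           /* same instant, Unix seconds */
#define MAX_DELTA_SEC  INT64_C(9223372036854775)    /* floor(2^63 / 1000) */

/* Native milliseconds -> whole Unix seconds, rounding toward the past so
   the result is the second that contains the instant. */
int64_t native_to_unix_seconds(uint64_t native_ms)
{
    if (native_ms >= NATIVE_2001) {
        return UNIX_2001_SEC + (int64_t)((native_ms - NATIVE_2001) / 1000);
    }
    uint64_t before = NATIVE_2001 - native_ms;      /* ms before 2001 */
    return UNIX_2001_SEC - (int64_t)((before + 999) / 1000);
}

/* Unix seconds -> native milliseconds. Unsigned arithmetic wraps modulo
   2^64, which is well defined in C, so no signed overflow can occur. */
uint64_t unix_seconds_to_native(int64_t unix_sec)
{
    /* The native format cannot hold instants further than about
       292 million years from 2001. */
    assert(unix_sec >= UNIX_2001_SEC - MAX_DELTA_SEC &&
           unix_sec <= UNIX_2001_SEC + MAX_DELTA_SEC);
    return NATIVE_2001 +
           ((uint64_t)unix_sec - (uint64_t)UNIX_2001_SEC) * UINT64_C(1000);
}
```

The 292 million year figure follows from 2^63 ms being roughly 9.22e15 seconds, divided by about 3.156e7 seconds per year.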
Solar wrote: Depending on your licensing, you could have a look at available C standard libraries (glibc); most of the information you're looking for is available from the locale.h implementation.
I've checked out locale.h, which defines a structure that would contain some of this information, but I can't find anything that contains the actual data for it. I assume the data that is returned by the C library originates from the OS's user settings (e.g. Windows regional settings are in the registry).
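
For reference, this is roughly how a program reads that structure through the standard interface. It is only a sketch of the accessor side (`setlocale()` and `localeconv()` from `<locale.h>`); it says nothing about where the library gets its data. The function name is made up, and which locale names work depends on what the host has installed.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Select a locale and print a few fields of its struct lconv.
   Returns 0 on success, -1 if the locale is not available. */
int print_locale_summary(const char *locale_name)
{
    assert(locale_name != NULL);
    /* setlocale() returns NULL when the request cannot be honoured. */
    if (setlocale(LC_ALL, locale_name) == NULL) {
        printf("%s: not available\n", locale_name);
        return -1;
    }
    const struct lconv *lc = localeconv();
    printf("%s: decimal_point=\"%s\" thousands_sep=\"%s\" "
           "currency_symbol=\"%s\"\n",
           locale_name, lc->decimal_point, lc->thousands_sep,
           lc->currency_symbol);
    return 0;
}
```

Calling `print_locale_summary("C")` always works; something like `print_locale_summary("it_IT.UTF-8")` only works if that locale happens to be installed.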

I've also got the source files for the *nix time zone package.


Cheers,

Brendan

Re:Regional Settings (please fill out the form!)

Post by Solar »

Brendan wrote: The OS (networking, file system, scheduler, etc) uses a 64-bit count of milliseconds since K, where K is a time roughly 292 million years before Christ (defined such that 0x8000000000000000 equates to the start of the first millisecond of the year 2001 in UTC).
Note that most language standards expect their time to be "since 1.1.1970", and this includes C. If you intend to provide a C library, you will have to provide some conversion function for your internal time stamp.
Brendan wrote: I've checked out locale.h, which defines a structure that would contain some of this information, but I can't find anything that contains the actual data for it. I assume the data that is returned by the C library originates from the OS's user settings (e.g. Windows regional settings are in the registry).
Since a C program is allowed to switch locales at runtime, the library should provide somewhat more than just "C locale" and "local locale". I admit I haven't fiddled with glibc to any extent, so I can't give you any valuable pointers though.

Re:Regional Settings (please fill out the form!)

Post by Brendan »

Hi,
Solar wrote: Note that most language standards expect their time to be "since 1.1.1970" - this including C. If you intend to provide a C library, you will have to provide some conversion function for your internal time stamp.
POSIX defines the time_t data type as seconds since 00:00:00 on 1/1/1970 (UTC), and most implementations use a 32 bit unsigned integer for it. Therefore it will turn to mush in the year 2106, and dates before 1970 can't be represented at all. IMHO this is brain-dead, and the brain-dead parts of a library end up becoming brain-dead parts of actual code. For example, the file system 'ext2' will have a very short year in 2106 (February won't happen) that will be followed by year 1970.
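
To make the rollover dates concrete, here is a small C sketch that converts a seconds count to a UTC calendar date without relying on the platform's own `time_t`. The struct and function names are made up for illustration; the conversion assumes the proleptic Gregorian calendar and no leap seconds.

```c
#include <assert.h>
#include <stdint.h>

struct utc_time { int year, month, day, hour, minute, second; };

/* Convert non-negative seconds since 1970-01-01T00:00:00Z to a UTC
   calendar time. */
struct utc_time utc_from_unix(int64_t sec)
{
    assert(sec >= 0);
    struct utc_time t;
    int64_t days = sec / 86400;
    int rem = (int)(sec % 86400);
    t.hour = rem / 3600;
    t.minute = (rem % 3600) / 60;
    t.second = rem % 60;

    /* Days-to-date conversion using 400-year eras that start on 1 March,
       which puts the leap day at the end of each "year". */
    int64_t z = days + 719468;          /* days since 0000-03-01 */
    int64_t era = z / 146097;           /* 146097 days per 400 years */
    int64_t doe = z - era * 146097;     /* day of era, 0..146096 */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    int64_t mp = (5 * doy + 2) / 153;   /* month index, 0 = March */
    t.day = (int)(doy - (153 * mp + 2) / 5 + 1);
    t.month = (int)(mp < 10 ? mp + 3 : mp - 9);
    t.year = (int)(yoe + era * 400 + (t.month <= 2 ? 1 : 0));
    return t;
}
```

Feeding it `UINT32_MAX` gives 2106-02-07 06:28:15 UTC, the last second an unsigned 32-bit counter can hold; `INT32_MAX` gives 2038-01-19 03:14:07 UTC, the limit for a signed one.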

In addition, seconds are not accurate enough for my file system/s. I could use signed 64-bit milliseconds since 1/1/1970, but I just don't like signed numbers :). Conversion between formats isn't hard...
Solar wrote: Since a C program is allowed to switch locales at runtime, the library should provide somewhat more than just "C locale" and "local locale". I admit I haven't fiddled with glibc to any extent, so I can't give you any valuable pointers though.
I found the source of the information on Linux - /usr/lib/locale/*
It contains binary data :-(

If anyone knows where I can get the source code for this data....


Cheers,

Brendan

Re:Regional Settings (please fill out the form!)

Post by Brendan »

Hi,

I've found a source for the /usr/lib/locale/* data (http://www-124.ibm.com/developerworks/projects/locale) and I'm writing a C program to convert it into my format. Then I'll need to add time zone information and "largest city in zone".

I can possibly do this with IBM's info, Linux's time zone data base and a decent ATLAS! :)

The added bonus is that my database will also support UTF8 from the start, and weekday names, month names, and a few other things.

It seems some Linux people started a regional info database, which is now being standardized (by ISO/IEC/POSIX/ICU). It's all in XML so my conversion utility should continue to work to build my database from theirs (and will be easy to modify when they actually finalize the standard/s). Unfortunately the standard/s involve over 150 separate files with no indexing, and so I will not be using it directly.

BTW - I still intend to make all related code and data freely accessible for everyone; it just might take weeks before anything is on the 'net, as collating, converting and correcting all the information will take some time.


Thanks everyone,

Brendan

Re:Regional Settings (please fill out the form!)

Post by Solar »

Brendan wrote: POSIX defines the time_t data type as seconds since 00:00:00 on 1/1/1970 (UTC)...
Unfortunately, that's not POSIX, but ISO/IEC 9899... which means, the C language itself:

* struct tm holds "years since 1900" in tm.tm_year (int);
* difftime() returns difference in seconds (double);
* mktime() and time() return time_t, which is required to be a signed integer type (-1 to represent an error value).

No matter what other, better functions your OS might provide, a C library implementation is *expected* to adhere to above rules, since otherwise perfectly good C programs could fail on your OS.
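
As a small illustration of the interface listed above, here is a sketch that uses only `struct tm`, `mktime()` and `difftime()` from `<time.h>`. The helper names are made up. Note that `mktime()` interprets the fields as local time, so a span that crosses a daylight saving change will not be a multiple of 86400 seconds.

```c
#include <assert.h>
#include <time.h>

/* Build a struct tm from ordinary calendar values. tm_year counts years
   since 1900 and tm_mon counts months from 0. */
struct tm make_tm(int year, int month, int day, int hour, int min, int sec)
{
    struct tm t = {0};
    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = day;
    t.tm_hour = hour;
    t.tm_min = min;
    t.tm_sec = sec;
    t.tm_isdst = -1;    /* let the library decide whether DST applies */
    return t;
}

/* Store the difference to - from, in seconds, in *out.
   Returns 0 on success, -1 if either time cannot be represented. */
int seconds_between(struct tm from, struct tm to, double *out)
{
    assert(out != NULL);
    time_t a = mktime(&from);
    time_t b = mktime(&to);
    if (a == (time_t)-1 || b == (time_t)-1) {
        return -1;
    }
    *out = difftime(b, a);
    return 0;
}
```

A portable program never looks inside `time_t`; it goes through these functions, which is why the library can pick whatever representation it likes as long as the functions behave.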

And yes, most implementations chose to typedef time_t as int, which fails in 2038 (not 2106, since time_t has to be signed).

Alas, the ghoul of backward compatibility. Make a sane choice for the type of time_t (something with 64 bit, long long should do the trick with most compilers), and provide the necessary conversions from OS specific to what <time.h> requires...

Re:Regional Settings (please fill out the form!)

Post by Brendan »

Hi,
Solar wrote: Unfortunately, that's not POSIX, but ISO/IEC 9899... which means, the C language itself:
* struct tm holds "years since 1900" in tm.tm_year (int);
* difftime() returns difference in seconds (double);
* mktime() and time() return time_t, which is required to be a signed integer type (-1 to represent an error value).

No matter what other, better functions your OS might provide, a C library implementation is *expected* to adhere to above rules, since otherwise perfectly good C programs could fail on your OS.

And yes, most implementations chose to typedef time_t as int, which fails in 2038 (not 2106, since time_t has to be signed).
Can C programs that use these standard libraries be considered "perfectly good"? While they may work fine now they won't sooner or later, which (IMHO) makes these "perfectly good" programs BROKEN.

If you spent years writing an OS that supports far more advanced features than those that currently exist, and people ported crusty old C programs to your OS, and users used these crusty old C programs on your OS, then what would be the point of writing an OS that supports far more advanced features in the first place?

The majority of programs written in C don't use multi-threading, don't bother with thread priorities, don't use non-blocking file IO, won't make use of my user interface features, won't maximise the use of idle time and will in general make my OS look like sludge. I do not want people to port code to my OS without making huge changes, and I have no interest in being compatible with any standard unless that standard is extremely similar to what I would have come up with anyway.


Cheers,

Brendan
Pype.Clicker
Member
Posts: 5964
Joined: Wed Oct 18, 2006 2:31 am
Location: In a galaxy, far, far away

Re:Regional Settings (please fill out the form!)

Post by Pype.Clicker »

Just in case you find it handy, here's the Belgian keyboard layout ...
DennisCGc

Re:Regional Settings (please fill out the form!)

Post by DennisCGc »

Brendan wrote:
I'm creating a database containing regional settings, so that my OS (and yours) can use the database to support users in different time zones, countries, etc (internationalization).
And you want to port it from the GLIBC ?
Sorry, my project is non-gpl, which means I can't use it.
So, my question is: is it GPL'ed or not ? ::)

Re:Regional Settings (please fill out the form!)

Post by Pype.Clicker »

Note that not being GPL doesn't mean that you cannot use glibc (which is LGPL). It simply means that you cannot cut and paste text straight from glibc into your non-GPL kernel.

But nothing prevents you from deriving an LGPL kglib from glibc and linking your kernel with the kglib. As long as things remain clearly separated, there's no problem.

Re:Regional Settings (please fill out the form!)

Post by Brendan »

Hi,
DennisCGc wrote: And you want to port it from the GLIBC ?
Sorry, my project is non-gpl, which means I can't use it.
So, my question is: is it GPL'ed or not ? ::)
It's not ported from GLIBC (I think GLIBC uses the information supplied by the OS it's running on).

Part of the information in my database will come from IBM. Linux people formed a group called OpenI18N (formerly called "LI18NUX2000") who developed a standard for this database where the data is represented in XML. IBM is collating all this information in XML form for anyone to use. Linux converts the XML data into its own, different binary format for "/usr/lib/locale/*" (which I assume GLIBC reads).

The copyright licence is not GPL - it's IBM's own licence, which appears less restrictive than GPL. As far as I can reasonably tell (I'm no lawyer) I can freely do what I like with IBM's database (as long as they can't be taken to court for warranty or anything).

What I am doing is converting it to intermediate files in my own format (and omitting some of it) via a crappy little utility I wrote, and then adding additional data to it to make my database's source files. Once this is done I will continue to maintain my own database, in its own format. I think this means that some of the files initially used to create my database are the output of a utility that is a "derivative work". If this is the case then my database isn't a "derivative work" itself, and IBM's licence has no effect. If I'm wrong and my database is considered a derivative work, then IBM's licence gives me "a non-exclusive, worldwide, royalty-free copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, distribute and sublicense the Contribution of such Contributor, if any, and such derivative works, in source code and object code form.".

In addition to IBM's information I will also be adding information from the Linux time zone package (and information from volunteers). This time zone data (ASCII text) does not contain any copyright information or licence at all. The Linux time zone data will be used as reference material in conjunction with other information on time zones that I've downloaded from the internet (that also lacks any copyrights).

At the end of the day I will be putting my own copyright on the resulting database, which will effectively be public domain - "if you want to use it for anything please do, but don't blame me if it's not perfect and don't expect any warranty". As far as I can tell IBM's copyright gives me permission to do this. By the time I'm done no-one will be able to tell where this information came from anyway, and as it's all public knowledge I doubt anyone will care :)

I no longer know where I downloaded the time zone database from (Google should be able to find it). I've attached a copy of IBM's licence, as your legal knowledge may be better than mine. IBM's web page for this is http://www-124.ibm.com/developerworks/projects/locale


Cheers,

Brendan

Re:Regional Settings (please fill out the form!)

Post by Solar »

Brendan wrote: Can C programs that use these standard libraries be considered "perfectly good"?
Yes, and that's the problem. A strictly compliant C program must be able to run. The programmer went to some lengths to make his/her program fully compliant so it can be compiled and run on the widest range of platforms. No matter how much "better" your C library might be, the second you stop fully supporting the standard, it is your environment that is broken.

Given how widely the C standard is used, you can rest assured that it will be extended in crucial areas before this becomes a problem. Until then, you'd better support the standard, or developers will get annoyed at you.
Brendan wrote: While they may work fine now they won't sooner or later, which (IMHO) makes these "perfectly good" programs BROKEN.
The standard requires extension in some places. But it's not up to an OS developer to do that extension; that's the job of the C standards committee. Remember how you flamed Microsoft whenever they "embraced and extended" a standard...
Brendan wrote: If you spent years writing an OS that supports far more advanced features than those that currently exist, and people ported crusty old C programs to your OS, and users used these crusty old C programs on your OS, then what would be the point of writing an OS that supports far more advanced features in the first place?
Supply those additional features as options. Microsoft supplies CString, but you can also use std::string. AmigaOS supplies AllocMem(), but you can also use malloc(). The standard is what people know, and fully expect. If your options actually are better, people will use them after a while. But an OS that can't run fully compliant C programs is simply broken.
Brendan wrote: The majority of programs written in C don't use multi-threading, don't bother with thread priorities, don't use non-blocking file IO, won't make use of my user interface features, won't maximise the use of idle time and will in general make my OS look like sludge.
Too bad, but you can't force developers. The alternative is to provide an environment that feels so "broken" that no-one ever bothers to check out the nifty alternatives you provide.
Brendan wrote: I do not want people to port code to my OS without making huge changes...
Expect to create an OS without software, except for what you wrote yourself.

People start learning about a programming language, usually from a book or tutorial. If they fail to compile their "Hello World" because your OS doesn't support printf() but only your far superior (but unknown) printme() function, you'll end up being flamed, ignored, or both.
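
A minimal sketch of the "both" approach in C: a printf-style front end layered over a native output call. printme() is only the made-up name from the paragraph above; its signature is an assumption, and the stub here just appends to a buffer so the sketch can run anywhere. A real C library would name the front end printf() and would not truncate at 127 characters.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the OS-native output call (hypothetical signature).
   This stub collects the text in a fixed buffer instead of a console. */
static char console[256];
static size_t console_len;

void printme(const char *text, size_t len)
{
    assert(text != NULL);
    size_t room = sizeof console - 1 - console_len;
    if (len > room) {
        len = room;                 /* drop whatever does not fit */
    }
    memcpy(console + console_len, text, len);
    console_len += len;
    console[console_len] = '\0';
}

/* Standard-looking front end: format with vsnprintf(), then hand the
   result to the native call. Returns the formatted length, like printf. */
int compat_printf(const char *fmt, ...)
{
    char buf[128];
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0) {
        return n;                   /* formatting error */
    }
    size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
    printme(buf, len);
    return n;
}
```

Programs that know about the native call can use it directly, while a textbook "Hello World" still compiles against the standard name.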
Brendan wrote: ...and I have no interest in being compatible with any standard unless that standard is extremely similar to what I would have come up with anyway.
Erm... like, TCP/IP, POP3, SMTP, HTTP, ...?