Unicode

Brendan

Post by Brendan »

Hi,
SpooK wrote: PS: As for UTF-8 being internally "appropriate" in an OS, at least you will have a strlen function that will be worth the function calling overhead ;)
For a modern OS, the strlen function correctly returns the length of a zero-terminated string in bytes (for both ASCII and UTF-8).

For a modern OS, the strlen function never correctly returns the width that the string will be on the screen, because modern OSs use proportional fonts: for example, the letter 'W' is a few pixels wider than the number '1'. You need to use a font engine to determine the width of the string on the screen, for ASCII and UTF-8 alike.

Therefore, the strlen function can be a simple "find the first zero" function and it won't matter if you're using ASCII or UTF-8.
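
To make the "find the first zero" point concrete, here is a minimal sketch of such a function. The name byte_length is made up for illustration; the standard strlen() behaves the same way. It never looks at what the bytes mean, so it works unchanged on ASCII and UTF-8 input.

```c
#include <stddef.h>

/* Hypothetical name for illustration; behaves like the standard strlen(). */
size_t byte_length(const char *s)
{
    const char *p = s;

    /* Walk forward until the terminating zero byte. */
    while (*p != '\0')
        p++;

    /* Number of bytes before the terminator, whatever the encoding. */
    return (size_t)(p - s);
}
```

For example, byte_length("hello") gives 5, while byte_length("h\xC3\xA9llo") (the UTF-8 bytes of "héllo") gives 6, because the 'é' takes two bytes.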

Note: I am *not* accusing anyone of writing a modern OS... ;)


Cheers,

Brendan
Solar

Post by Solar »

mbrlen(), wcslen(), both in <wchar.h>, which has been part of the standard C library since 1995 (Amendment 1)...
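
A rough sketch of how mbrlen() could be used to count characters in a multibyte string. The function name count_mb_chars is made up for illustration. What counts as one character depends on the current LC_CTYPE locale, so this assumes the caller has already selected a suitable locale (for instance a UTF-8 one via setlocale()); in the default "C" locale only plain ASCII input is safe to rely on.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts multibyte characters in s according to the
 * current locale. Returns (size_t)-1 on an invalid or truncated sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);   /* bytes left, terminator excluded */
    size_t count = 0;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    while (remaining > 0) {
        size_t n = mbrlen(s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or incomplete sequence */
        if (n == 0)
            break;                  /* null character; not expected here */

        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

wcslen() is the simpler of the two: it counts wchar_t elements up to the terminator, so wcslen(L"na\u00efve") is 5. Keep in mind that on platforms where wchar_t is 16 bits, one element is not always one full character.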

As for ASCII "not working", the problem with ASCII always was that you had to "guess" which ASCII variant a text was written in. When your guess was bad, your text was mangled.
jal

Post by jal »

Brendan wrote: For a modern OS, the strlen function correctly returns the length of a zero-terminated string in bytes (for both ASCII and UTF-8).
True, as that's defined by the C standard.
Brendan wrote: For a modern OS, the strlen function never correctly returns the width that the string will be on the screen
True as well, but I don't think anyone would expect that, as it depends on font face, point size and, as you mention, the actual characters. However, sometimes you'd like to know how many characters there are in a string. Whether or not to call that function strlen is not that important (unless you want to stay strictly within the C standard), but if the string is UTF-8, you'll have to traverse it.
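
A minimal sketch of that traversal, assuming the input is valid, zero-terminated UTF-8. The name utf8_codepoints is made up for illustration. It relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes that are not continuation bytes gives the number of code points.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a valid UTF-8 string.
 * No validation is done; malformed input gives a meaningless count. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)s; *p != '\0'; p++) {
        /* Skip continuation bytes (10xxxxxx); count everything else. */
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

So utf8_codepoints("h\xC3\xA9llo") is 5 even though the string is 6 bytes long. Note that this counts code points, which is still not always what a user would call a "character" (combining marks are separate code points).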


JAL