Unicode

Brendan · Post by **Brendan** » Sat Mar 01, 2008 10:41 pm

Hi,

SpooK wrote:PS: As for UTF-8 being internally "appropriate" in an OS, at least you will have a strlen function that will be worth the function calling overhead

For a modern OS, the strlen function correctly returns the length of a zero terminated string in bytes (for both ASCII and UTF8).

For a modern OS, the strlen function never correctly returns the width that the string will be on the screen because modern OSs use proportional fonts - for example, the letter 'W' is a few pixels wider than the number '1'. You need to use a font engine to determine the width of the string on the screen for ASCII and/or UTF8.

Therefore, the strlen function can be a simple "find the first zero" function and it won't matter if you're using ASCII or UTF8.
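As a minimal sketch of that "find the first zero" idea (the name `byte_length` is made up here so it doesn't clash with the library's `strlen`), the loop never looks at what the bytes mean, so it behaves the same for ASCII and UTF-8:

```c
#include <stddef.h>

/* Hypothetical helper: returns the length in bytes of a zero-terminated
 * string, not counting the terminator. It does not interpret the bytes,
 * so ASCII and UTF-8 input are handled identically. */
size_t byte_length(const char *s)
{
    const char *p = s;

    while (*p != '\0')      /* walk forward until the first zero byte */
        p++;

    return (size_t)(p - s); /* distance from start to terminator */
}
```

For example, `byte_length("h\xC3\xA9llo")` counts the two-byte UTF-8 encoding of 'é' as two bytes, which is the right answer when sizing a buffer and tells you nothing about on-screen width.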

Note: I am *not* accusing anyone of writing a modern OS...

Cheers,

Brendan

Solar · Post by **Solar** » Sun Mar 02, 2008 5:17 am

mbrlen(), wcslen(), both in <wchar.h>, which has been part of the standard C library since 1995 (Amendment 1)...
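A rough sketch of how those two could be used to count characters rather than bytes. The helper name `mb_count` is made up for illustration, and the result depends on the multibyte encoding of the current locale (whatever `setlocale` selected), so treat it as one possible approach rather than the way to do it:

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts multibyte characters in s according to the
 * current locale's encoding. Returns (size_t)-1 if s contains an invalid
 * or truncated sequence for that encoding. */
size_t mb_count(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);   /* bytes left, terminator excluded */
    size_t count = 0;

    memset(&state, 0, sizeof state); /* start in the initial shift state */

    while (remaining > 0) {
        size_t n = mbrlen(s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or incomplete character */
        if (n == 0)
            break;                  /* null character: stop counting */

        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

For text already held as `wchar_t`, `wcslen(ws)` returns the number of wide characters directly, with no locale lookup.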

As for ASCII "not working", the problem was always that you had to guess which "ASCII" variant (in practice, which 8-bit extension or code page) a text was written in. When your guess was bad, your text was mangled.

jal · Post by **jal** » Tue Mar 04, 2008 6:53 am

Brendan wrote:For a modern OS, the strlen function correctly returns the length of a zero terminated string in bytes (for both ASCII and UTF8).

True, as that's defined by the C standard.

Brendan wrote:For a modern OS, the strlen function never correctly returns the width that the string will be on the screen

True as well, but I don't think anyone would expect that, as it depends on font face, point size and, as you mention, the actual characters. However, sometimes you'd like to know how many characters there are in a string. Whether or not to call that function strlen is not that important (unless you want to stick strictly to the C standard), but if the string is UTF-8, you'll have to traverse it to count them.
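A minimal sketch of such a traversal, assuming the input is valid UTF-8 (the name `utf8_count` is made up for illustration). It counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx. Note that a code point is not always what a user would call a character: combining marks are counted separately.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a zero-terminated string,
 * assuming it is valid UTF-8. No validation is performed. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Continuation bytes are 10xxxxxx; any other byte starts
         * a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

With this, `utf8_count("h\xC3\xA9llo")` gives 5 while a plain byte count of the same string gives 6.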

JAL