Some projects for a C Beginner?

Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: Some projects for a C Beginner?

Post by Owen »

Solar wrote:Sorry, but this reads like Cthulhu-induced gibberish.
Owen wrote:For you see, Windows is the only OS in the world in which the console truly understands Unicode...
Last time I looked, Windows still defines wchar_t as 16 bits wide (UCS-2), a move that shot C++ wide strings in the foot so effectively that C++11 had to introduce another type of wide string to fix this mess, and to do so unambiguously enough that even Microsoft cannot screw it up (hopefully).

Besides, QNX has done full Unicode since the early '90s, just as an example.
At the time they defined wchar_t to be 2 bytes long, it was indeed sufficient to represent every character in Unicode, and the Unicode committee said it always would be. I can forgive Microsoft for that, and for not changing afterwards in the face of the mountain of backwards compatibility issues that would bring.

Practically, wide strings have never really been a "thing" in C++. Pretty much everyone regards UCS-4 as too inefficient to be a practical storage format - though I must admit the APIs for working with single characters can be useful.

Interestingly, the C standards committee seem to have almost abandoned wide character strings in C11, and the UTF-[16/32] support is half-baked at best. The final standard makes no provision for converting between any pair of [wchar_t/char16_t/char32_t], which is enormously annoying.
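
To show the sort of thing you end up writing by hand, here is a minimal sketch of a char16_t to char32_t conversion. The name utf16_to_utf32 and its signature are my own invention, not anything from the standard, and the error handling is deliberately crude (it just gives up on an unpaired surrogate).

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>   /* char16_t, char32_t (C11) */

/* Hypothetical helper, not a standard function: decodes n UTF-16 code
 * units from src into dst, which has room for cap code points.
 * Returns the number of code points written, or (size_t)-1 on an
 * unpaired surrogate or if dst is too small. */
size_t utf16_to_utf32(const char16_t *src, size_t n,
                      char32_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        char32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {           /* high surrogate */
            if (i + 1 >= n)
                return (size_t)-1;                    /* truncated pair */
            char32_t lo = src[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;                    /* not a low surrogate */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i++;                                      /* consumed two units */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                        /* stray low surrogate */
        }
        if (out >= cap)
            return (size_t)-1;                        /* destination full */
        dst[out++] = cp;
    }
    return out;
}
```

Feeding it the three code units 0x0041, 0xD83D, 0xDE00 should give back two code points, with the surrogate pair folded into a single value above the BMP.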
Solar wrote:
Owen wrote:...and has an API which works with Unicode characters, rather than Unix-style "The console system ignores the character set and requires the terminal and the application to negotiate it."
I am really not sure what you're talking about re console / terminal. I think you're confusing a couple of things, terminology-wise.

For one, what you get if you call cmd.exe on a Windows box is not a console. What you get when you open a terminal on Unix is not a console. A console is what you have on a physical Unix box when you press Ctrl-Alt-1, and it's meant to flash "login:" pretty much indefinitely unless there is an emergency that requires the admin to actually and physically address the box. Whether or not that supports Unicode isn't much of an issue either way, IMHO, somewhat akin to Windows' "rescue console" (or whatever it's called). Besides, I didn't have any issues with Unicode on my console...

Everything else is done via terminals, or rather, terminal emulators, either on the box itself or remotely. That, IMHO, Unix does beautifully, and if you find fault with it I would ask you to elaborate on it to quench my ignorance of any issues that might remain with the concept.
A case of varying terminology. Windows does, in fact, call that built-in terminal system a console. In this case, there is no "pseudo terminal" system like there is under POSIX. Console objects are directly related to a console window; when you call OpenConsole, you are dealing with the console host process (which is inextricably integrated with the Windows subsystem).

The difference here is that the Unix terminal layer doesn't understand Unicode; it is a simple layer through which bytes pass. The handling of text encoding is entirely up to the terminal emulator and the controlling process (as dictated by the various locale environment variables). This is an entirely reasonable approach, particularly given the backwards compatibility constraints involved.
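
As a rough sketch of what "negotiated via the locale environment variables" means for the application side: the program asks the C library which encoding the environment selected, then writes bytes in that encoding and hopes the terminal emulator agrees. The function names terminal_codeset and write_bytes are made up for illustration; setlocale and nl_langinfo are the real POSIX calls.

```c
#include <assert.h>
#include <langinfo.h>  /* nl_langinfo, CODESET (POSIX) */
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: returns the character encoding name selected by
 * LC_ALL / LC_CTYPE / LANG, e.g. "UTF-8". */
const char *terminal_codeset(void)
{
    /* "" means: take the locale from the environment. */
    if (setlocale(LC_ALL, "") == NULL)
        setlocale(LC_ALL, "C");   /* environment names an unknown locale */
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: writes s to stdout byte for byte. Nothing in this
 * path transcodes; whoever reads the other end must know the encoding.
 * Returns 0 on success, -1 on a write error. */
int write_bytes(const char *s)
{
    assert(s != NULL);
    return fputs(s, stdout) == EOF ? -1 : 0;
}
```

Note that nothing here checks what the terminal emulator actually expects. If the two sides disagree you simply get mojibake, which is the trade-off of the pass-through design.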

The Windows console subsystem behaves differently. Internally, it operates upon Unicode. It provides APIs for both writing and reading the console using Unicode (in this case UTF-16, but that's irrelevant) and the legacy "OEM" character set which is used for DOS compatibility. Whether this is a reasonable approach, or whether support for the OEM character set should have been dropped, is another matter altogether (one could probably make a convincing case that backwards compatibility support to enable the porting of old DOS apps should have been delegated to the compiler support library), but whatever. The important point is that the Windows console API supports, and natively works in, Unicode...
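
For reference, writing UTF-16 straight to the console looks roughly like the sketch below. WriteConsoleW and GetStdHandle are the real Win32 calls; the wrapper console_write_wide is a name I made up, and the non-Windows branch is only a stand-in so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: writes a wide string to the console.
 * Returns the number of wchar_t units written, or -1 on failure. */
long console_write_wide(const wchar_t *s)
{
    assert(s != NULL);
    size_t len = wcslen(s);
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;
    /* Fails if stdout is redirected to a file or pipe instead of a
     * console, so real code needs a fallback path for that case. */
    if (!WriteConsoleW(out, s, (DWORD)len, &written, NULL))
        return -1;
    return (long)written;
#else
    /* Stand-in for other systems: goes through wide stdio instead. */
    if (fputws(s, stdout) < 0)
        return -1;
    return (long)len;
#endif
}
```

On Windows a call such as console_write_wide(L"ok\n") hands the text to the console as UTF-16 with no code page conversion in between.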

...and yet, in spite of this fact, Unicode I/O in their own compiler's C runtime library, using any of the wide character stdio functions provided, is flat out broken. Instead of using the Unicode API to implement Unicode I/O, the runtime library simply returns "OEM" character set values cast to wchar_t.
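
To make the complaint concrete, here is a sketch of the behaviour I am describing. This is an illustration only, not the actual runtime source, and naive_getwc is a made-up name. It assumes the input byte is in code page 437, where 0x82 is "é".

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Illustration of the broken behaviour, NOT the real CRT code:
 * read one byte and widen it with no character set translation. */
wint_t naive_getwc(FILE *stream)
{
    assert(stream != NULL);
    int c = getc(stream);
    /* A CP437 byte 0x82 ("é") comes back as 0x0082, which in Unicode is
     * a C1 control character rather than U+00E9. */
    return c == EOF ? WEOF : (wint_t)c;
}
```

A correct implementation would have to translate the byte through the active code page, or better, read UTF-16 from the console in the first place.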
Solar wrote:
Owen wrote:What wgetc does is call getc, then cast the result to a wide character, then return that. What wputc does is just pass the character to putc.
Uh... wputc, wgetc? What in blazes are you talking about? libncursesw? What does that have to do with consoles, Unix or otherwise?

/me confused...
I knew I should have looked up those APIs. Of course, what I actually meant is putwc/getwc. The character API to the C stdio layer is one area where there is absolutely no naming consistency.