2 C runtime libraries

Posted: Sat Nov 14, 2020 4:06 am
by PeterX
Here's a blog entry comparing musl and glibc (with regard to isalnum()).
https://drewdevault.com/2020/09/25/A-st ... libcs.html

It's an interesting look at different coding "styles". Maybe there's a lesson to learn from that (about how to code and how not to code).

Greetings
Peter

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 4:59 am
by nullplan
I read the first bit, and immediately recalled that calling the ctype macros with values that are out of range for an unsigned char (and not equal to EOF) is undefined behavior. I have seen libcs that index arrays with the argument, assuming it cannot be 256 or larger and can be no lower than -1. And for the wide-character versions of these, musl will also use a giant array and index it with the argument.
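To make the rule concrete, here is a minimal sketch of the usual defensive pattern on the caller's side. The helper name count_alnum is made up for illustration and does not come from the blog post or from either libc.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the alphanumeric bytes in a string.
 * Plain char may be signed, so bytes >= 0x80 can turn into negative
 * ints; passing those to isalnum() is undefined behavior. The cast to
 * unsigned char keeps the argument within 0..UCHAR_MAX. */
size_t count_alnum(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isalnum((unsigned char)*s))
            n++;
    }
    return n;
}
```

Without the cast, the same loop would hand negative values to isalnum() on platforms where char is signed, which is exactly the case the table-indexing libcs do not guard against.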

That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into, and even I, who have worked on this stuff for half my life, cannot avoid all the pitfalls. But mostly I just look things up as needed. One example I always find intriguing is that it is undefined behavior to convert a floating-point number into any integer type if the integer part of that number is not in range of the destination type. I have been in situations where converting a float to a short is undefined behavior, but converting it from a float to long and then to a short is perfectly fine.
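A rough sketch of how one might sidestep that pitfall with an explicit range check. The helper float_to_short is hypothetical, and the bounds assume that short is narrow enough (at most 24 bits) for them to be exact in float. As far as I know, the two-step route through long swaps the undefined behavior for an implementation-defined result in the long-to-short step, so a check like this is still worth having when the value may not fit.

```c
#include <limits.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: converts f to short only if the truncated value fits.
 * Returns false and leaves *out untouched for NaN, infinities and
 * out-of-range values, where a direct cast would be undefined behavior.
 * Assumption: short has at most 24 value bits, so both bounds below are
 * exactly representable as float. */
bool float_to_short(float f, short *out)
{
    if (isnan(f))
        return false;
    /* The conversion truncates toward zero, so every value strictly
     * between SHRT_MIN - 1 and SHRT_MAX + 1 lands inside the range. */
    if (f <= (float)SHRT_MIN - 1.0f || f >= (float)SHRT_MAX + 1.0f)
        return false;
    *out = (short)f;
    return true;
}
```

The NaN test has to come first, because every comparison against NaN is false and would otherwise let it slip through the range check.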

EDIT after reading: Dear lord! I had dived into glibc code before, so I kind of knew what to expect, but that stuff is always ridiculous. I wonder if that stuff just naturally happens to longstanding codebases. Features get added over time and the gap between the old and new interfaces is spackled over with some macros, until the entire unholy mess is no longer readable. The dependency on the byte order is probably just because they made the locale definitions machine-independent binary files, and instead of properly unmarshalling the array of bytes, they just mess up their bit definitions instead. Because that is just how some people like to write their code. The whole thing needs to be locale-dependent, because glibc supports character sets other than UTF-8. And suddenly everything makes sense.

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 6:00 am
by nexos
Just another reason why to very much dislike GNU. The only thing useful they have made is GCC and Binutils, which I like better than Clang.

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 7:39 am
by iansjack
nexos wrote: The only thing useful they have made is GCC and Binutils
I'd say that bash has proved to be moderately useful.

Apart from that, what have the Romans ever done for us?

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 7:55 am
by Solar
It should be mentioned that the musl implementation the author is so impressed with is assuming ASCII-7, which I am very much not impressed with.

But locale awareness is not that hard to implement either, and should not result in the wicked mess that is GNU code.

If I might flaunt my own stuff here, this is PDCLib's take on the subject matter:

Code:

int isalnum( int c )
{
    return ( isdigit( c ) || isalpha( c ) );
}

int isalpha( int c )
{
    return ( _PDCLIB_lc_ctype->entry[c].flags & _PDCLIB_CTYPE_ALPHA );
}

int isdigit( int c )
{
    return ( c >= _PDCLIB_lc_ctype->digits_low && c <= _PDCLIB_lc_ctype->digits_high );
}
Here, _PDCLIB_lc_ctype is the current locale's lookup table. Yes, this segfaults when you go beyond UCHAR_MAX, but that's par for the course. (I prefer to have clients run into a segfault instead of "fixing" broken client code in PDCLib and have their code fail when compiled with a different lib.) And you shouldn't have difficulties finding out why your code bombed out.
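For readers who have not seen the table approach before, here is a stripped-down sketch of the general idea. It is not PDCLib's actual code: all demo_ names are invented, the table only covers ASCII, and it assumes EOF is -1 so that one extra leading slot makes EOF a valid index.

```c
#include <limits.h>
#include <stdio.h>

/* Assumption for this sketch: EOF is -1, as on common platforms. */
_Static_assert(EOF == -1, "sketch assumes EOF == -1");

enum { DEMO_ALPHA = 1, DEMO_DIGIT = 2 };  /* hypothetical flag bits */

/* One slot per unsigned char value, plus one leading slot for EOF. */
static unsigned char demo_flags[UCHAR_MAX + 2];

/* Fills the table for a plain ASCII "locale". */
void demo_init(void)
{
    const char *letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (const char *p = letters; *p != '\0'; ++p)
        demo_flags[(unsigned char)*p + 1] |= DEMO_ALPHA;
    for (int c = '0'; c <= '9'; ++c)
        demo_flags[c + 1] |= DEMO_DIGIT;
}

/* Valid for c == EOF or 0..UCHAR_MAX, like the standard functions.
 * Anything else indexes out of bounds, as in the snippet above. */
int demo_isalnum(int c)
{
    return demo_flags[c + 1] & (DEMO_ALPHA | DEMO_DIGIT);
}
```

Switching locales then amounts to pointing the lookup at a different table, which is why the classification functions themselves can stay this short.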

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 9:54 am
by bzt
iansjack wrote: Apart from that, what have the Romans ever done for us?
ROTFL, perfect quote! You can like GNU or not, and I also admit they have many really bad and terrible projects, but without the GNU movement we would all be locked into proprietary software by now, that's for sure. GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org ;-)
Solar wrote: It should be mentioned that the musl implementation the author is so impressed with is assuming ASCII-7, which I am very much not impressed with.
I absolutely agree. You simply can't avoid Unicode in the 21st century; a libc should be able to understand UTF-8, imho.
Solar wrote: But locale awareness is not that hard to implement either, and should not result in the wicked mess that is GNU code.
It could be done better, but I'm not sure it can be done without a mess. For example, printf() must behave differently depending on a previous setlocale() call, which cannot be done without global variables, which is a mess. So the standard itself forces a significant part of the mess on the implementations. I mean, POSIX defines errno as a TLS variable, but what happens if you set another (hidden) variable with setlocale() in one thread and use printf() in another thread?

Cheers,
bzt

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 9:55 am
by nexos
iansjack wrote:
nexos wrote: The only thing useful they have made is GCC and Binutils
I'd say that bash has proved to be moderately useful.

Apart from that, what have the Romans ever done for us?
Yes bash and make are two other nice things by GNU :)
bzt wrote: GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org
GNU did do a lot for starting open source. nongnu, that's an interesting name! I guess they like open source, but not the GNU classic bloat :)
EDIT - now that I look at nongnu.org, it looks like it is run by the FSF. Did rms turn against his own project :? ?

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 10:42 am
by Korona
Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.

Code:

src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print'	\
	  |grep 0x | perl -p						\
	    $(fs_normalize_perl_subst)					\
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END'			\
	  | $(ASSORT)							\
	  > $@-t && mv $@-t $@

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 10:47 am
by PeterX
nexos wrote:
bzt wrote: GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org
GNU did do a lot for starting open source. nongnu, that's an interesting name! I guess they like open source, but not the GNU classic bloat :)
EDIT - now that I look at nongnu.org, it looks like it is run by the FSF. Did rms turn against his own project :? ?
No, Savannah is FSF's hosting platform. They divide free software into GNU and not GNU.

BTW They have made things like bison and GNUstep and mono and much more. And some folks like Emacs, which is part of GNU, too. (No editor wars, please!)
But that doesn't mean they have to write that kind of code.

And thanks iansjack for the quote from one of my favorite movies :)

And regarding UTF-8: Should isalpha(c) return true if c is a Chinese or German or whatever non-ASCII character? I guess so, but am not sure right now.
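One thing that may help with this question: isalpha() only ever sees a single value in the range of unsigned char, while a non-ASCII character in UTF-8 occupies several bytes. A small sketch (the helper utf8_seq_len is made up for illustration) shows how long such sequences are, judging by the lead byte alone:

```c
/* Hypothetical helper: returns the length in bytes of the UTF-8 sequence
 * introduced by lead byte b, or 0 if b cannot start a sequence. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;                     /* ASCII, e.g. 'A' */
    if (b >= 0xC2 && b <= 0xDF)
        return 2;                     /* e.g. U+00E4, bytes C3 A4 */
    if (b >= 0xE0 && b <= 0xEF)
        return 3;                     /* e.g. U+4E2D, bytes E4 B8 AD */
    if (b >= 0xF0 && b <= 0xF4)
        return 4;
    return 0;                         /* continuation or invalid lead byte */
}
```

So with UTF-8, a German umlaut or a Chinese character never reaches isalpha() as one value. Classifying whole characters is the job of the wide-character interface, iswalpha() from <wctype.h>. With a single-byte encoding such as ISO 8859-1, the answer for isalpha() depends on the current locale.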

Greetings
Peter

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 10:58 am
by bzt
Solar wrote:

Code:

int isdigit( int c )
{
    return ( c >= _PDCLIB_lc_ctype->digits_low && c <= _PDCLIB_lc_ctype->digits_high );
}
This made me wonder. Does this really work for non-English locales? For a Chinese locale, for example, isdigit(L'一') should return true, but what about isdigit('1')? Shouldn't it return true for both? I mean, aren't there more intervals and separate code points for Chinese? There's 0x30-0x39 for sure, then U+4e00 (1), U+4e8c (2), U+4e09 (3), etc.? (Side note: I have absolutely no clue why Unicode hasn't defined the Chinese numerals in a row... mixing up letters and numbers just makes no sense to me.)
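For context, and as far as I read the C standard: isdigit() is one of the few ctype functions that is not affected by the locale. It accepts only the ten characters '0' to '9', and those are guaranteed to have consecutive values. Under that reading, a single range check is sufficient, and code like the following sketch (parse_decimal is a made-up helper) can rely on it:

```c
#include <ctype.h>

/* Hypothetical helper: parses the run of decimal digits at the start of s.
 * Relies on two guarantees: isdigit() accepts only '0'..'9', and those
 * ten characters have consecutive values, so c - '0' is the digit value.
 * Overflow simply wraps around, since the arithmetic is unsigned. */
unsigned parse_decimal(const char *s)
{
    unsigned value = 0;
    while (isdigit((unsigned char)*s)) {
        value = value * 10 + (unsigned)(*s - '0');
        ++s;
    }
    return value;
}
```

If isdigit() also returned true for locale-specific numerals, the subtraction of '0' would silently produce garbage for them.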

Cheers,
bzt

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 11:19 am
by PeterX
Korona wrote: Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.

Code:

src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print'	\
	  |grep 0x | perl -p						\
	    $(fs_normalize_perl_subst)					\
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END'			\
	  | $(ASSORT)							\
	  > $@-t && mv $@-t $@
Woah! I know people who can probably make sense out of this stuff.
For me that's the sheer horror!

Greetings
Peter

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 11:31 am
by nexos
Korona wrote: Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.

Code:

src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print'	\
	  |grep 0x | perl -p						\
	    $(fs_normalize_perl_subst)					\
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END'			\
	  | $(ASSORT)							\
	  > $@-t && mv $@-t $@
Wow, you must be smart to understand that! I'm going to have nightmares :D .

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 11:57 am
by Korona
I don't think that this is smart at all, it's a mess. It would probably be faster to just copy the values than to write the script to extract them.

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 1:19 pm
by alexfru
nullplan wrote: That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into
+1. One should learn (to correctly use) their tools. The discussed implementation simply embraces the nature of the language. It isn't user-friendly by today's standards but hey today we've got tons of informative resources on C. In the 90's it was hard to get all that if you didn't have internet access and had to use whatever C literature someone had translated into your native language. We have wonderful stuff available online today. Free drafts of the C standard are there as well (it's true, they aren't an easy read either, but with some persistence you could figure it out and learn the important stuff that many introductory C books have omitted).

Re: 2 C runtime libraries

Posted: Sat Nov 14, 2020 2:05 pm
by PeterX
alexfru wrote:
nullplan wrote: That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into
+1. One should learn (to correctly use) their tools. The discussed implementation simply embraces the nature of the language. It isn't user-friendly by today's standards but hey today we've got tons of informative resources on C. In the 90's it was hard to get all that if you didn't have internet access and had to use whatever C literature someone had translated into your native language. We have wonderful stuff available online today. Free drafts of the C standard are there as well (it's true, they aren't an easy read either, but with some persistence you could figure it out and learn the important stuff that many introductory C books have omitted).
But does that explain why the GNU folks write such code, full of preprocessor directives?

Greetings
Peter