2 C runtime libraries

PeterX
Member
Posts: 590
Joined: Fri Nov 22, 2019 5:46 am

2 C runtime libraries

Post by PeterX »

Here's a blog entry comparing musl and glibc (specifically their isalnum() implementations).
https://drewdevault.com/2020/09/25/A-st ... libcs.html

It's an interesting look at different coding "styles". Maybe there's a lesson to learn from it (about how to code and how not to code).

Greetings
Peter
nullplan
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: 2 C runtime libraries

Post by nullplan »

I read the first bit and immediately recalled that calling the ctype macros with values outside the range of unsigned char (and not equal to EOF) is undefined behavior. I have seen libcs that index arrays with the argument, assuming it cannot be 256 or larger and can be no lower than -1. And for the wide-character versions of these, musl will also use a giant array and index it with the argument.
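
The pitfall above is easy to hit with ordinary strings on platforms where plain char is signed. A minimal sketch of the usual defensive idiom, casting to unsigned char before calling a ctype function; the helper name count_alnum is made up for illustration:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts alphanumeric bytes in a NUL-terminated string.
 * The cast to unsigned char keeps the argument inside the range that
 * <ctype.h> functions accept, even if plain char is signed and the string
 * contains bytes >= 0x80. */
size_t count_alnum(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        if (isalnum((unsigned char)*s))
            ++n;
    }
    return n;
}
```

Without the cast, a byte such as 0xE4 in a signed char becomes a negative int, which is exactly the out-of-range argument described above.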

That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into, and even I, who have worked on this stuff for half my life, cannot avoid all the pitfalls. But mostly I just look things up as needed. One example I always find intriguing is that it is undefined behavior to convert a floating-point number to any integer type if the integer part of that number is not in range of the destination type. I have been in situations where converting a float to a short is undefined behavior, but converting it from a float to a long and then to a short is fine (the long-to-short step is merely implementation-defined).
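
One way to stay clear of the floating-point case is to check the range before converting. This is only a sketch; float_to_short is a hypothetical helper, and it assumes short is narrow enough that its limits are exactly representable in double (true for the usual 16-bit short):

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: converts f to short only when the truncated value
 * is representable. Returns false and leaves *out alone otherwise.
 * NaN is rejected as well, because every comparison with NaN is false. */
bool float_to_short(float f, short *out)
{
    /* Any value strictly between SHRT_MIN - 1 and SHRT_MAX + 1 truncates
     * toward zero to something inside [SHRT_MIN, SHRT_MAX]. */
    if (f > (double)SHRT_MIN - 1.0 && f < (double)SHRT_MAX + 1.0) {
        *out = (short)f;
        return true;
    }
    return false;
}
```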

EDIT after reading: Dear lord! I had dived into glibc code before, so I kind of knew what to expect, but that stuff is always ridiculous. I wonder if that stuff just naturally happens to longstanding codebases. Features get added over time and the gap between the old and new interfaces is spackled over with some macros, until the entire unholy mess is no longer readable. The dependency on the byte order is probably just because they made the locale definitions machine-independent binary files, and instead of properly unmarshalling the array of bytes, they just mess up their bit definitions instead. Because that is just how some people like to write their code. The whole thing needs to be locale-dependent, because glibc supports character sets other than UTF-8. And suddenly everything makes sense.
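
For comparison, "properly unmarshalling the array of bytes" can look like the sketch below. It assumes a made-up file format that stores 32-bit fields in little-endian order; this is not a description of glibc's actual locale files, and load_le32 is a hypothetical name:

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value from a byte buffer. The shifts work
 * on values rather than on memory layout, so the result is the same on
 * big-endian and little-endian hosts. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With a loader like this, the bit definitions used by the rest of the code would not need to change with the host's byte order.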
Carpe diem!
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: 2 C runtime libraries

Post by nexos »

Just another reason to very much dislike GNU. The only thing useful they have made is GCC and Binutils, which I like better than Clang.
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
iansjack
Member
Posts: 4703
Joined: Sat Mar 31, 2012 3:07 am
Location: Chichester, UK

Re: 2 C runtime libraries

Post by iansjack »

nexos wrote: The only thing useful they have made is GCC and Binutils
I'd say that bash has proved to be moderately useful.

Apart from that, what have the Romans ever done for us?
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: 2 C runtime libraries

Post by Solar »

It should be mentioned that the musl implementation the author is so impressed with is assuming ASCII-7, which I am very much not impressed with.

But locale awareness is not that hard to implement either, and should not result in the wicked mess that is GNU code.

If I might flaunt my own stuff here, this is PDCLib's take on the subject matter:

Code:

int isalnum( int c )
{
    return ( isdigit( c ) || isalpha( c ) );
}

int isalpha( int c )
{
    return ( _PDCLIB_lc_ctype->entry[c].flags & _PDCLIB_CTYPE_ALPHA );
}

int isdigit( int c )
{
    return ( c >= _PDCLIB_lc_ctype->digits_low && c <= _PDCLIB_lc_ctype->digits_high );
}
Here, _PDCLIB_lc_ctype is the current locale's lookup table. Yes, this can segfault when you go beyond UCHAR_MAX, but that's par for the course. (I prefer to have clients run into a segfault instead of "fixing" broken client code in PDCLib and have their code fail when compiled with a different lib.) And you shouldn't have difficulties finding out why your code bombed out.
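
To make the table idea concrete, here is a small stand-alone sketch of a flag table that also has a slot for EOF. It is not PDCLib's actual layout; every sketch_ name is hypothetical, and it assumes 8-bit bytes:

```c
#include <stdio.h>   /* EOF */

#define SKETCH_ALPHA 0x01u
#define SKETCH_DIGIT 0x02u

/* One slot for EOF plus one per unsigned char value. */
static unsigned char sketch_flags[257];

/* Maps EOF to slot 0 and 0..255 to slots 1..256. Any other argument is
 * still out of bounds, which the C standard allows to be undefined. */
static int sketch_slot(int c)
{
    return (c == EOF) ? 0 : c + 1;
}

/* Fills the table for a "C"-like locale: basic letters and digits only. */
void sketch_init(void)
{
    const char *letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    for (const char *p = letters; *p != '\0'; ++p)
        sketch_flags[sketch_slot((unsigned char)*p)] |= SKETCH_ALPHA;
    for (int c = '0'; c <= '9'; ++c)
        sketch_flags[sketch_slot(c)] |= SKETCH_DIGIT;
}

int sketch_isalpha(int c)
{
    return sketch_flags[sketch_slot(c)] & SKETCH_ALPHA;
}

int sketch_isalnum(int c)
{
    return sketch_flags[sketch_slot(c)] & (SKETCH_ALPHA | SKETCH_DIGIT);
}
```

A different locale would only need a different table, which is the point being made about locale awareness not requiring a mess.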
Every good solution is obvious once you've found it.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: 2 C runtime libraries

Post by bzt »

iansjack wrote:Apart from that, what have the Romans ever done for us?
ROTFL, perfect quote! You can like GNU or not, and I also admit they have many really bad and terrible projects, but without the GNU movement we would all be locked into proprietary software by now, that's for sure. GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org ;-)
Solar wrote:It should be mentioned that the musl implementation the author is so impressed with is assuming ASCII-7, which I am very much not impressed with.
I absolutely agree. You simply can't avoid UNICODE in the 21st century; a libc should be able to understand UTF-8 imho.
Solar wrote:But locale awareness is not that hard to implement either, and should not result in the wicked mess that is GNU code.
It could be done better, but I'm not sure it can be done without a mess. For example, printf() must behave differently depending on a previous setlocale() call, which cannot be done without global variables, which is a mess. So the standard itself forces a significant part of the mess on the implementations. I mean, POSIX defines errno as a TLS variable, but what happens if you set another (hidden) variable with setlocale() in one thread and use printf() in another thread?
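
On the threading question: setlocale() changes the process-wide locale, so other threads see the change. POSIX.1-2008 added newlocale() and uselocale() so that a thread can switch only its own locale. A minimal sketch, assuming a POSIX.1-2008 libc such as glibc or musl (macOS wants <xlocale.h> instead); format_double_c is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats v with two decimals using the "C" locale's
 * decimal point, regardless of what setlocale() was called with elsewhere.
 * Only the calling thread's locale is switched, and it is restored before
 * returning. Returns snprintf's result, or -1 if no locale object could be
 * created. */
int format_double_c(char *buf, size_t n, double v)
{
    locale_t c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    locale_t old = uselocale(c_loc);   /* affects this thread only */
    int r = snprintf(buf, n, "%.2f", v);
    uselocale(old);                    /* back to the previous setting */
    freelocale(c_loc);
    return r;
}
```

This does not remove the hidden state the post complains about; it only makes that state per-thread when the caller asks for it.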

Cheers,
bzt
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: 2 C runtime libraries

Post by nexos »

iansjack wrote:
nexos wrote: The only thing useful they have made is GCC and Binutils
I'd say that bash has proved to be moderately useful.

Apart from that, what have the Romans ever done for us?
Yes bash and make are two other nice things by GNU :)
bzt wrote:GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org
GNU did do a lot for starting open source. nongnu, that's an interesting name! I guess they like open source, but not the GNU classic bloat :)
EDIT - now that I look at nongnu.org, it looks like it is run by the FSF. Did rms turn against his own project :? ?
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: 2 C runtime libraries

Post by Korona »

Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.

Code:

src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print'	\
	  |grep 0x | perl -p						\
	    $(fs_normalize_perl_subst)					\
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END'			\
	  | $(ASSORT)							\
	  > $@-t && mv $@-t $@
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
PeterX
Member
Posts: 590
Joined: Fri Nov 22, 2019 5:46 am

Re: 2 C runtime libraries

Post by PeterX »

nexos wrote:
bzt wrote:GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org
GNU did do a lot for starting open source. nongnu, that's an interesting name! I guess they like open source, but not the GNU classic bloat :)
EDIT - now that I look at nongnu.org, it looks like it is run by the FSF. Did rms turn against his own project :? ?
No, Savannah is FSF's hosting platform. They divide free software into GNU and not GNU.

BTW, they have made things like bison and GNUstep and much more. And some folks like Emacs, which is part of GNU, too. (No editor wars, please!)
But that doesn't mean they have to write that kind of code.

And thanks iansjack for the quote from one of my favorite movies :)

And regarding UTF-8: Should isalpha(c) return true if it is a Chinese or German or whatever char? I guess so, but am not sure right now.
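
Part of the answer is that isalpha() only ever sees one unsigned char, so it cannot classify a multi-byte UTF-8 character at all; that job falls to the wide-character functions. A hedged sketch using mbrtowc() and iswalpha(): what it returns for non-ASCII input depends on the LC_CTYPE locale in effect, and first_char_is_alpha is a made-up name:

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: decodes the first multibyte character of s using
 * the current LC_CTYPE locale and asks iswalpha() about it.
 * Returns 1 if alphabetic, 0 if not, -1 if the bytes cannot be decoded
 * (including the case of an empty string). */
int first_char_is_alpha(const char *s)
{
    mbstate_t st;
    wchar_t wc;
    size_t len;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    len = mbrtowc(&wc, s, strlen(s), &st);
    if (len == (size_t)-1 || len == (size_t)-2)
        return -1;
    return iswalpha((wint_t)wc) != 0;
}
```

A program would normally call setlocale(LC_CTYPE, "") first so that the user's encoding is used; without that call the "C" locale applies.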

Greetings
Peter
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: 2 C runtime libraries

Post by bzt »

Solar wrote:

Code:

int isdigit( int c )
{
    return ( c >= _PDCLIB_lc_ctype->digits_low && c <= _PDCLIB_lc_ctype->digits_high );
}
This made me wonder. Does this really work for non-English locales? For a Chinese locale, for example, isdigit(L'一') should return true, but what about isdigit('1')? Shouldn't it return true for both? I mean, aren't there more intervals and separate code points for Chinese? There's 0x30-0x39 for sure, then U+4e00 (1), U+4e8c (2), U+4e09 (3) etc.? (Side note: I have absolutely no clue why UNICODE hasn't defined Chinese numbers in a row... mixing up letters and numbers makes just no sense to me.)
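
As far as I can tell from the C standard, isdigit() is not locale-dependent: it tests only for the ten decimal digits, so a single range check is enough there. Ideographic numerals would need a separate lookup, because their code points are scattered. A sketch of such a lookup; numeral_value is a hypothetical helper and not part of any libc:

```c
#include <stddef.h>

/* Hypothetical helper: returns the numeric value of a Unicode code point
 * if it is an ASCII digit (0..9) or one of the ideographic numerals for
 * one to nine, and -1 otherwise. */
int numeral_value(unsigned long cp)
{
    /* Code points of the ideographs for 1..9, in numeric order. They are
     * not contiguous, so a low/high range check cannot cover them. */
    static const unsigned long cjk[] = {
        0x4E00, 0x4E8C, 0x4E09, 0x56DB, 0x4E94,
        0x516D, 0x4E03, 0x516B, 0x4E5D
    };

    if (cp >= 0x30 && cp <= 0x39)
        return (int)(cp - 0x30);

    for (size_t i = 0; i < sizeof cjk / sizeof cjk[0]; ++i) {
        if (cjk[i] == cp)
            return (int)i + 1;
    }
    return -1;
}
```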

Cheers,
bzt
PeterX
Member
Posts: 590
Joined: Fri Nov 22, 2019 5:46 am

Re: 2 C runtime libraries

Post by PeterX »

Korona wrote:Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.

Code:

src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print'	\
	  |grep 0x | perl -p						\
	    $(fs_normalize_perl_subst)					\
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END'			\
	  | $(ASSORT)							\
	  > $@-t && mv $@-t $@
Woah! I know people who can probably make sense out of this stuff.
For me that's the sheer horror!

Greetings
Peter
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: 2 C runtime libraries

Post by nexos »

Korona wrote:Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.

Code:

src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print'	\
	  |grep 0x | perl -p						\
	    $(fs_normalize_perl_subst)					\
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END'			\
	  | $(ASSORT)							\
	  > $@-t && mv $@-t $@
Wow, you must be smart to understand that! I'm going to have nightmares :D .
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg
Korona
Member
Posts: 1000
Joined: Thu May 17, 2007 1:27 pm

Re: 2 C runtime libraries

Post by Korona »

I don't think that this is smart at all; it's a mess. It would probably be faster to just copy the values than to write the script to extract them.
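
"Just copying the values" could be as plain as the table below. The three magic numbers are the ones Linux publishes in <linux/magic.h>; the struct, the table, and fs_name are hypothetical names for this sketch and do not reflect how the quoted Makefile's project organizes its constants:

```c
#include <stddef.h>

/* Hypothetical table of file system magic numbers, copied by hand. */
struct fs_magic {
    unsigned long magic;
    const char   *name;
};

static const struct fs_magic known_fs[] = {
    { 0xEF53UL,     "ext2/ext3/ext4" },  /* EXT2/3/4 share one magic */
    { 0x01021994UL, "tmpfs" },
    { 0x6969UL,     "nfs" },
};

/* Returns a human-readable name for a statfs f_type value, or "unknown"
 * if the value is not in the table. */
const char *fs_name(unsigned long magic)
{
    for (size_t i = 0; i < sizeof known_fs / sizeof known_fs[0]; ++i) {
        if (known_fs[i].magic == magic)
            return known_fs[i].name;
    }
    return "unknown";
}
```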
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
alexfru
Member
Posts: 1111
Joined: Tue Mar 04, 2014 5:27 am

Re: 2 C runtime libraries

Post by alexfru »

nullplan wrote:That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into
+1. One should learn (to correctly use) their tools. The discussed implementation simply embraces the nature of the language. It isn't user-friendly by today's standards but hey today we've got tons of informative resources on C. In the 90's it was hard to get all that if you didn't have internet access and had to use whatever C literature someone had translated into your native language. We have wonderful stuff available online today. Free drafts of the C standard are there as well (it's true, they aren't an easy read either, but with some persistence you could figure it out and learn the important stuff that many introductory C books have omitted).
PeterX
Member
Posts: 590
Joined: Fri Nov 22, 2019 5:46 am

Re: 2 C runtime libraries

Post by PeterX »

alexfru wrote:
nullplan wrote:That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into
+1. One should learn (to correctly use) their tools. The discussed implementation simply embraces the nature of the language. It isn't user-friendly by today's standards but hey today we've got tons of informative resources on C. In the 90's it was hard to get all that if you didn't have internet access and had to use whatever C literature someone had translated into your native language. We have wonderful stuff available online today. Free drafts of the C standard are there as well (it's true, they aren't an easy read either, but with some persistence you could figure it out and learn the important stuff that many introductory C books have omitted).
But does that explain why the GNU folks write such code, full of preprocessor directives?

Greetings
Peter