2 C runtime libraries
Here's a blog entry comparing musl and glibc (with regard to isalnum()).
https://drewdevault.com/2020/09/25/A-st ... libcs.html
It's interesting with respect to different coding "styles". Maybe there's a lesson to be learned from it (about how to code and how not to code).
Greetings
Peter
Re: 2 C runtime libraries
I read the first bit, and immediately recall that calling the ctype macros with numbers not in range for an unsigned char (and unequal to EOF) is undefined behavior. I have seen libcs that access arrays with the argument, assuming it cannot be 256 or larger, and can at least be -1. And for the wide-character versions of these, musl will also use a giant array and index it with the argument.
That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into, and even I, who have worked on this stuff for half my life, cannot avoid all the pitfalls. But mostly I just look things up as needed. One example I always find intriguing is that it is undefined behavior to convert a floating-point number into any integer type if the integer part of that number is not in range of the destination type. I have been in situations where converting a float to a short is undefined behavior, but converting it from a float to long and then to a short is perfectly fine.
EDIT after reading: Dear lord! I had dived into glibc code before, so I kind of knew what to expect, but that stuff is always ridiculous. I wonder if that stuff just naturally happens to longstanding codebases. Features get added over time and the gap between the old and new interfaces is spackled over with some macros, until the entire unholy mess is no longer readable. The dependency on the byte order is probably just because they made the locale definitions machine-independent binary files, and instead of properly unmarshalling the array of bytes, they just mess up their bit definitions instead. Because that is just how some people like to write their code. The whole thing needs to be locale-dependent, because glibc supports character sets other than UTF-8. And suddenly everything makes sense.
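On the float-to-integer conversion mentioned above, a minimal sketch of a range check done before the cast might look like the following. The helper name float_to_short is made up for illustration, and the check assumes that every short value is exactly representable as a double, which holds for the usual 16-bit short.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not from any libc): stores the truncated value of f
   in *out and returns true only if that value fits into a short. */
bool float_to_short(float f, short *out)
{
    double d = f;

    /* The conversion truncates toward zero, so anything strictly between
       SHRT_MIN - 1 and SHRT_MAX + 1 is safe. NaN fails both comparisons
       and is rejected as well. */
    if (!(d > (double)SHRT_MIN - 1.0 && d < (double)SHRT_MAX + 1.0))
        return false;

    *out = (short)f;
    return true;
}
```

As far as the standard goes, the detour via long replaces the undefined behavior with an implementation-defined result (or signal) when the long value does not fit into a short; an explicit check like this sidesteps both.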
Carpe diem!
Re: 2 C runtime libraries
Just another reason to very much dislike GNU. The only thing useful they have made is GCC and Binutils, which I like better than Clang.
Re: 2 C runtime libraries
nexos wrote: The only thing useful they have made is GCC and Binutils
I'd say that bash has proved to be moderately useful.
Apart from that, what have the Romans ever done for us?
Re: 2 C runtime libraries
It should be mentioned that the musl implementation the author is so impressed with is assuming ASCII-7, which I am very much not impressed with.
But locale awareness is not that hard to implement either, and should not result in the wicked mess that is GNU code.
If I might flaunt my own stuff here, this is PDCLib's take on the subject matter:
Code: Select all
int isalnum( int c )
{
    return ( isdigit( c ) || isalpha( c ) );
}
int isalpha( int c )
{
    return ( _PDCLIB_lc_ctype->entry[c].flags & _PDCLIB_CTYPE_ALPHA );
}
int isdigit( int c )
{
    return ( c >= _PDCLIB_lc_ctype->digits_low && c <= _PDCLIB_lc_ctype->digits_high );
}
Here, _PDCLIB_lc_ctype is the current locale's lookup table. Yes, this segfaults when you go beyond UCHAR_MAX, but that's par for the course. (I prefer to have clients run into a segfault instead of "fixing" broken client code in PDCLib and have their code fail when compiled with a different lib.) And you shouldn't have difficulties finding out why your code bombed out.
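From the caller's side, the usual way to stay inside the range such a table lookup expects is to convert a plain char to unsigned char before passing it on, since plain char may be signed and hold negative values. A minimal sketch, with helper names (is_alnum_char, count_alnum) that are made up for illustration and are not part of PDCLib or any other libc:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical wrapper: the cast keeps the argument within 0..UCHAR_MAX,
   which is the range the <ctype.h> functions are defined for (besides EOF). */
int is_alnum_char(char ch)
{
    return isalnum((unsigned char)ch) != 0;
}

/* Counts the alphanumeric characters of a NUL-terminated string
   according to the current locale. */
size_t count_alnum(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s)
        if (is_alnum_char(*s))
            ++n;
    return n;
}
```

With a wrapper like this, bytes above 0x7F in a string no longer turn into negative array indices, whichever libc the code is compiled against.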
Every good solution is obvious once you've found it.
Re: 2 C runtime libraries
iansjack wrote: Apart from that, what have the Romans ever done for us?
ROTFL, perfect quote! You can like GNU or not, and I also admit they have many really bad and terrible projects, but without the GNU movement we would all be locked into proprietary software by now, that's for sure. GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org
Solar wrote: It should be mentioned that the musl implementation the author is so impressed with is assuming ASCII-7, which I am very much not impressed with.
I absolutely agree. You simply can't avoid Unicode in the 21st century; a libc should be able to understand UTF-8, imho.
Solar wrote: But locale awareness is not that hard to implement either, and should not result in the wicked mess that is GNU code.
It could be done better, but I'm not sure it can be done without a mess. For example, printf() must behave differently depending on a previous setlocale() call, which cannot be done without global variables, which is a mess. So the standard itself forces a significant part of the mess on the implementations. I mean, POSIX defines errno as a TLS variable, but what happens if you set another (hidden) variable with setlocale() in one thread, and use printf() in another thread?
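On the threading question: setlocale() changes the locale of the whole process, so a printf() in another thread is affected too. POSIX.1-2008 adds newlocale(), uselocale() and freelocale() for switching the locale of the calling thread only. The following is a minimal sketch assuming such a system; the helper name format_in_c_locale is made up for illustration, and some platforms need an extra header for these functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: formats value with the "C" locale active for the
   calling thread only. Returns 0 on success, -1 on failure. */
int format_in_c_locale(char *buf, size_t size, double value)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    locale_t previous = uselocale(c_loc);  /* affects this thread only */
    int n = snprintf(buf, size, "%.2f", value);
    uselocale(previous);                   /* restore before freeing */
    freelocale(c_loc);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The hidden state is still there, but it is per thread, so another thread's setlocale() call does not change the decimal point in the middle of this function.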
Cheers,
bzt
Re: 2 C runtime libraries
nexos wrote: The only thing useful they have made is GCC and Binutils
iansjack wrote: I'd say that bash has proved to be moderately useful. Apart from that, what have the Romans ever done for us?
Yes, bash and make are two other nice things by GNU.
bzt wrote: GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org
GNU did do a lot for starting open source. nongnu, that's an interesting name! I guess they like open source, but not the classic GNU bloat.
EDIT - now that I look at nongnu.org, it looks like it is run by the FSF. Did rms turn against his own project?
Re: 2 C runtime libraries
Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.
Code: Select all
src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print' \
	  |grep 0x | perl -p \
	    $(fs_normalize_perl_subst) \
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END' \
	  | $(ASSORT) \
	  > $@-t && mv $@-t $@
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: 2 C runtime libraries
bzt wrote: GNU is and always was one of the biggest forces behind Open Source and software freedom. Funny thing, the other force being nongnu.org
nexos wrote: GNU did do a lot for starting open source. nongnu, that's an interesting name! I guess they like open source, but not the classic GNU bloat. EDIT - now that I look at nongnu.org, it looks like it is run by the FSF. Did rms turn against his own project?
No, Savannah is FSF's hosting platform. They divide free software into GNU and not GNU.
BTW They have made things like bison and GNUstep and mono and much more. And some folks like Emacs, which is part of GNU, too. (No editor wars, please!)
But that doesn't mean they have to write that kind of code.
And thanks iansjack for the quote from one of my favorite movies
And regarding UTF-8: Should isalpha(c) return true if it is a Chinese or German or whatever char? I guess so, but am not sure right now.
Greetings
Peter
Re: 2 C runtime libraries
Solar wrote:
Code: Select all
int isdigit( int c )
{
    return ( c >= _PDCLIB_lc_ctype->digits_low && c <= _PDCLIB_lc_ctype->digits_high );
}
This made me wonder. Does this really work for non-English locales? For a Chinese locale, for example, isdigit(L'一') should return true, but what about isdigit('1')? Shouldn't it return true for both? I mean, aren't there more intervals and separate code points for Chinese? There's 0x30-0x39 for sure, then U+4e00 (1), U+4e8c (2), U+4e09 (3) etc.? (Side note: I have absolutely no clue why Unicode hasn't defined the Chinese numbers in a row... mixing up letters and numbers just makes no sense to me.)
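For reference, the C standard specifies isdigit() as testing only for the ten decimal digits '0' to '9', independent of the locale, and iswdigit() as testing for the wide characters corresponding to them. A minimal sketch to check this on a given libc follows; the wrapper names narrow_is_digit and wide_is_digit are made up for illustration.

```c
#include <ctype.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical wrappers that normalize the result to 0 or 1. */
int narrow_is_digit(char ch)
{
    return isdigit((unsigned char)ch) != 0;
}

int wide_is_digit(wchar_t wc)
{
    return iswdigit((wint_t)wc) != 0;
}
```

On a conforming implementation, narrow_is_digit('1') and wide_is_digit(L'1') give 1, while wide_is_digit(L'\u4e00') gives 0. Characters such as U+4E00 fall under the letter classes instead, so a single low/high range is enough for the digit test.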
Cheers,
bzt
Re: 2 C runtime libraries
Korona wrote: Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.
Code: Select all
src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print' \
	  |grep 0x | perl -p \
	    $(fs_normalize_perl_subst) \
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END' \
	  | $(ASSORT) \
	  > $@-t && mv $@-t $@
Woah! I know people who can probably make sense out of this stuff.
For me that's sheer horror!
Greetings
Peter
Re: 2 C runtime libraries
Korona wrote: Without GNU, we would not have invented the astonishing technique of extracting values of symbolic constants out of the host's man pages at compile time.
Code: Select all
src/fs-magic: Makefile
	@MANPAGER= man statfs \
	  |perl -ne '/File system types:/.../Nobody kno/ and print' \
	  |grep 0x | perl -p \
	    $(fs_normalize_perl_subst) \
	  | grep -Ev 'S_MAGIC_EXT[34]|STACK_END' \
	  | $(ASSORT) \
	  > $@-t && mv $@-t $@
Wow, you must be smart to understand that! I'm going to have nightmares.
Re: 2 C runtime libraries
I don't think that this is smart at all, it's a mess. It would probably be faster to just copy the values than to write the script to extract them.
Re: 2 C runtime libraries
nullplan wrote: That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into
+1. One should learn (to correctly use) their tools. The discussed implementation simply embraces the nature of the language. It isn't user-friendly by today's standards, but hey, today we've got tons of informative resources on C. In the 90's it was hard to get all that if you didn't have internet access and had to use whatever C literature someone had translated into your native language. We have wonderful stuff available online today. Free drafts of the C standard are there as well (it's true, they aren't an easy read either, but with some persistence you could figure it out and learn the important stuff that many introductory C books have omitted).
Re: 2 C runtime libraries
nullplan wrote: That's why you should know the programming language. C has so much undefined behavior that it is very easy to fall into
alexfru wrote: +1. One should learn (to correctly use) their tools. The discussed implementation simply embraces the nature of the language. It isn't user-friendly by today's standards, but hey, today we've got tons of informative resources on C. In the 90's it was hard to get all that if you didn't have internet access and had to use whatever C literature someone had translated into your native language. We have wonderful stuff available online today. Free drafts of the C standard are there as well (it's true, they aren't an easy read either, but with some persistence you could figure it out and learn the important stuff that many introductory C books have omitted).
But does that explain why the GNU folks use such code, full of preprocessor directives?
Greetings
Peter