C standard library redesign

nil · Post by **nil** » Mon Nov 25, 2013 1:20 am

I've decided that I don't care about compatibility with existing *nix operating systems. This gives me the opportunity to change the C standard library in any way I want without worrying about breaking existing code.

So, what I'm wondering is what improvements do you think could be made if one were to start from scratch? I'm interested both in minor, obvious things that most people would agree on and in more opinionated things. An example of the latter is storing lengths of strings rather than null-terminating, which is something I plan to do.

Brendan · Post by **Brendan** » Mon Nov 25, 2013 2:05 am

Hi,

nil wrote:I've decided that I don't care about compatibility with existing *nix operating systems. This gives me the opportunity to change the C standard library in any way I want without worrying about breaking existing code.

So, what I'm wondering is what improvements do you think could be made if one were to start from scratch? I'm interested both in minor, obvious things that most people would agree on and in more opinionated things. An example of the latter is storing lengths of strings rather than null-terminating, which is something I plan to do.

The first thing I'd do is get rid of "errno"; such that any/all functions that can return an error return an error code directly. The next thing is character and string handling - ASCII should be considered deprecated and "wchar" is a mess (better to use "uint8_t" for UTF-8 and "uint32_t" for UTF-32).

These 2 changes alone destroy most of the standard C library.
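
As a rough illustration of the first suggestion, a function can return its status directly and hand back the result through an out-parameter. This is only a sketch: the `lib_err` codes and `parse_uint` are hypothetical names invented for the example, not part of any existing library.

```c
#include <stddef.h>

/* Hypothetical error codes; the names are illustrative only. */
typedef enum {
    ERR_OK = 0,
    ERR_INVALID,   /* input was not a valid decimal number */
    ERR_RANGE      /* value does not fit in the result type */
} lib_err;

/* Parse an unsigned decimal number from the first len bytes of s.
 * The status is the return value and the result goes through *out,
 * so no global errno is involved. */
lib_err parse_uint(const char *s, size_t len, unsigned *out)
{
    unsigned value = 0;

    if (s == NULL || out == NULL || len == 0)
        return ERR_INVALID;

    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return ERR_INVALID;
        unsigned digit = (unsigned)(s[i] - '0');
        /* Check for overflow before multiplying. */
        if (value > ((unsigned)-1 - digit) / 10)
            return ERR_RANGE;
        value = value * 10 + digit;
    }

    *out = value;
    return ERR_OK;
}
```

The caller cannot forget which call set the error, because the error is the call's result; the cost is that the "real" result has to travel through a pointer.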

For strings, using a zero terminator is much more powerful, as a pointer to anywhere within the string is a valid zero terminated sub-string. Software can (and often should) cache "length" separately, but sadly programmers are lazy (which is the programmer's fault, not the language's fault).
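
A small sketch of the substring property described above; `skip_spaces` and `trimmed_length` are hypothetical helpers written for this example. The pointer returned into the middle of the buffer is itself a valid zero-terminated string, with no copy and no length bookkeeping.

```c
#include <stddef.h>
#include <string.h>

/* Skip leading spaces. The result points into the same buffer and is
 * still a valid zero-terminated string. */
const char *skip_spaces(const char *s)
{
    while (*s == ' ')
        s++;
    return s;
}

/* Any standard string function works on the interior pointer as-is. */
size_t trimmed_length(const char *s)
{
    return strlen(skip_spaces(s));
}
```

Note that this only works for suffixes: a prefix or a middle slice still needs a copy or a length, because the terminator has to follow the last byte.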

Cheers,

Brendan

nil · Post by **nil** » Mon Nov 25, 2013 2:37 am

Thanks for the suggestions, Brendan,

Brendan wrote:The first thing I'd do is get rid of "errno"; such that any/all functions that can return an error return an error code directly.

That sounds like a good idea, I think I'll definitely do that.

Brendan wrote:The next thing is character and string handling - ASCII should be considered deprecated and "wchar" is a mess (better to use "uint8_t" for UTF-8 and "uint32_t" for UTF-32).

Hmm... I do want to limit the scope of my project to avoid getting lost in feature creep, so I'm not sure if I'm planning to support Unicode. This would definitely be the way to go if I do support it though.

Brendan wrote:For strings, using a zero terminator is much more powerful, as a pointer to anywhere within the string is a valid zero terminated sub-string. Software can (and often should) cache "length" separately, but sadly programmers are lazy (which is the programmer's fault, not the language's fault).

This is true, but you can do a similar thing without too much more effort when storing the length, as you just have to set the length in your new string to (old_length - (new_ptr - old_ptr)). While you can cache the length, it's definitely a pain to do so when you're passing the string around to other functions, for example. There are definitely benefits to both sides; I'll have to consider it carefully before picking one. I don't want to make hasty decisions that sound good on paper and then regret them 50k lines later when it would take substantial effort to rework.

iansjack · Post by **iansjack** » Mon Nov 25, 2013 3:00 am

The string+length vs null-terminated argument is an old one. There is a third possibility that you could consider - store the length and use a null terminator.
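
One possible shape for that third option, as a sketch only: the `hstring` type and its functions are hypothetical names, and a real design would have to decide on ownership and growth policy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hybrid string: the length is stored, and the data is
 * also zero-terminated so it can be handed to terminator-based code. */
typedef struct {
    size_t length;  /* number of bytes, not counting the terminator */
    char  *data;    /* length + 1 bytes, with data[length] == '\0' */
} hstring;

/* Copy length bytes into a new hstring.
 * Returns 0 on success, -1 if the allocation cannot be made. */
int hstring_init(hstring *h, const char *bytes, size_t length)
{
    if (length == (size_t)-1)
        return -1;              /* length + 1 would overflow */

    char *buf = malloc(length + 1);
    if (buf == NULL)
        return -1;

    memcpy(buf, bytes, length);
    buf[length] = '\0';
    h->length = length;
    h->data = buf;
    return 0;
}

void hstring_free(hstring *h)
{
    free(h->data);
    h->data = NULL;
    h->length = 0;
}
```

The length lookup is O(1) and `data` can still be passed to code that expects a terminator, at the cost of one extra byte and of keeping the two in sync on every modification.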

Owen · Post by **Owen** » Mon Nov 25, 2013 3:48 am

There is a great distinction between the C standard library (as defined by the ISO C standard) and the POSIX standard library (which encompasses all of the former and is about two orders of magnitude larger). I'd note that difference before I tried to redesign the "C standard library".

I agree: if I were doing over the C standard library today, I'd eliminate errno. For the functionality contained within the C standard library, though, it's not that problematic (recent functionality ignores it, and the optional Annex K adds variants of most functions which return an error value).

nil wrote:
Brendan wrote:The next thing is character and string handling - ASCII should be considered deprecated and "wchar" is a mess (better to use "uint8_t" for UTF-8 and "uint32_t" for UTF-32).
Hmm... I do want to limit the scope of my project to avoid getting lost in feature creep, so I'm not sure if I'm planning to support Unicode. This would definitely be the way to go if I do support it though.

Supporting Unicode is easy. Certainly implementing a UTF-8 to UCS-4 codec is an order of magnitude less work than implementing, say, malloc or printf.
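
To give a feel for the size of the job, here is a minimal sketch of the decoding half, assuming input as `uint8_t` bytes and output as one `uint32_t` code point. The function name `utf8_decode` and its return convention are choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from at most len bytes at s.
 * Returns the number of bytes consumed (1 to 4), or 0 if the input is
 * truncated or not valid UTF-8. */
size_t utf8_decode(const uint8_t *s, size_t len, uint32_t *out)
{
    /* Smallest code point that legitimately needs n bytes. */
    static const uint32_t min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n;
    uint32_t cp;

    if (len == 0)
        return 0;

    if (s[0] < 0x80)                { n = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (len < n)
        return 0;                   /* truncated sequence */

    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min_value[n])
        return 0;                   /* overlong encoding */
    if (cp > 0x10FFFF)
        return 0;                   /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                   /* surrogates are not valid in UTF-8 */

    *out = cp;
    return n;
}
```

The encoder is about the same size. Normalization, collation and case mapping are separate and much larger problems, which this sketch does not touch.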

"wchar" is kind of a mess, mostly because there are two opinions of it (Windows: it's 16-bit, because if we change it the ABI for every damn function breaks; everybody else: it's 32-bit, because that's what the standard mandates if you're doing Unicode). However, for your platform you can just say "wchar_t is always UCS-4" (and be in no worse situation than you would be redesigning that portion of the standard library).

Note that C11 added char16_t and char32_t, which it recommends implementing using Unicode (and everybody does; there is a define you can use to verify this). Additionally, C++11 outright states that they're Unicode.
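
The defines in question are the C11 predefined macros `__STDC_UTF_16__` and `__STDC_UTF_32__`. A minimal sketch of checking them follows; the two wrapper function names are made up for this example.

```c
/* Returns 1 if the implementation states that char16_t values are
 * UTF-16 encoded (C11 macro __STDC_UTF_16__), otherwise 0. */
int char16_is_utf16(void)
{
#ifdef __STDC_UTF_16__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the implementation states that char32_t values are
 * UTF-32 encoded (C11 macro __STDC_UTF_32__), otherwise 0. */
int char32_is_utf32(void)
{
#ifdef __STDC_UTF_32__
    return 1;
#else
    return 0;
#endif
}
```

A library that defines its own platform could simply require both to be set and fail the build otherwise.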

Again, if you're redesigning the library for your OS... why not make them always Unicode first?

nil wrote:
Brendan wrote:For strings, using a zero terminator is much more powerful, as a pointer to anywhere within the string is a valid zero terminated sub-string. Software can (and often should) cache "length" separately, but sadly programmers are lazy (which is the programmer's fault, not the language's fault).
This is true, but you can do a similar thing without too much more effort when storing the length, as you just have to set the length in your new string to (old_length - (new_ptr - old_ptr)). While you can cache the length, it's definitely a pain to do so when you're passing the string around to other functions, for example. There are definitely benefits to both sides; I'll have to consider it carefully before picking one. I don't want to make hasty decisions that sound good on paper and then regret them 50k lines later when it would take substantial effort to rework.

The "most powerful" abstraction is to have your string type be struct { char *string; size_t length; }; (i.e. pointing to the byte buffer, but containing a length). Of course, when dealing in substrings one must be careful not to mutate a shared string unexpectedly (etc).
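
A sketch of that pointer-plus-length type, with the suffix arithmetic nil described (old_length - (new_ptr - old_ptr)). The type name `str` and the helper functions are hypothetical; the asserts stand in for whatever bounds policy a real library would pick.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char  *string;  /* points into a byte buffer; no terminator required */
    size_t length;  /* number of bytes in this view */
} str;

/* View of count bytes starting at offset start, sharing s's buffer. */
str str_sub(str s, size_t start, size_t count)
{
    assert(start <= s.length && count <= s.length - start);
    str r = { s.string + start, count };
    return r;
}

/* Suffix of s beginning at pointer p (which must lie inside s):
 * new length = old_length - (new_ptr - old_ptr). */
str str_from(str s, char *p)
{
    assert(p >= s.string && p <= s.string + s.length);
    str r = { p, s.length - (size_t)(p - s.string) };
    return r;
}

/* Byte-wise equality of two views. */
int str_equal(str a, str b)
{
    return a.length == b.length &&
           memcmp(a.string, b.string, a.length) == 0;
}
```

Unlike the zero-terminated case, arbitrary middle slices are free here. The sharing caveat is real, though: writing through a substring view changes the parent's bytes as well.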

bwat · Post by **bwat** » Mon Nov 25, 2013 5:07 am

Just rewriting the mathematics libraries with proper documentation should keep you busy for about 6 months full-time if you haven't implemented these sorts of things before. I'm not joking.

Combuster · Post by **Combuster** » Mon Nov 25, 2013 8:14 am

bwat wrote:Just rewriting the mathematics libraries with proper documentation should keep you busy for about 6 months full-time if you haven't implemented these sorts of things before. I'm not joking.

You must be terrible at math or programming. Probably programming since the needed equations are one google search away.

rdos · Post by **rdos** » Mon Nov 25, 2013 2:23 pm

I'd remove the error codes altogether, replacing them with success (TRUE) and failure (FALSE). With regard to Unicode, I would support neither wchar nor 32-bit chars. UTF-8 is enough for any practical purpose, and doesn't require special string APIs.

BMW · Post by **BMW** » Mon Nov 25, 2013 4:38 pm

rdos wrote:I'd remove the error-codes all together, replacing them with success (TRUE) and failure (FALSE).

I wouldn't. There's nothing worse than something simply failing with no error code.

bwat · Post by **bwat** » Tue Nov 26, 2013 12:04 am

Combuster wrote:
bwat wrote:Just rewriting the mathematics libraries with proper documentation should keep you busy for about 6 months full-time if you haven't implemented these sorts of things before. I'm not joking.
You must be terrible at math or programming. Probably programming since the needed equations are one google search away.

You just simply haven't got a clue. You obviously have never done this yourself.

For those who are interested in these things, here's a list of references that I used to create my maths library from the bottom up: arbitrary precision integer arithmetic library, arbitrary precision floating-point arithmetic library, elementary functions library, special functions library, and statistics library --- each library builds upon the previous one:

Computer Approximations, Hart et al.
Elementary Functions: Algorithms and Implementation, Muller.
Handbook of Mathematical Functions, Abramowitz and Stegun.
ISO/IEC 10967-1 Language Independent Arithmetic - Part 1: Integer and floating-point arithmetic, ISO.
ISO/IEC 10967-2 Language Independent Arithmetic - Part 2: Elementary numerical functions, ISO.
Modern Computer Arithmetic, Brent and Zimmermann.
Software Manual for the Elementary Functions, Cody and Waite.
The Art of Computer Programming, Vol. 2 Seminumerical Algorithms, Knuth.
TOPS-10/TOPS-20 Common Math Library Reference Manual, Digital Equipment Corporation.

Combuster · Post by **Combuster** » Tue Nov 26, 2013 4:32 am

Your argument already failed since you're suddenly requiring everything that's not in the standard library in an attempt to keep up appearances.

And -1 for the ad hominem.

bwat · Post by **bwat** » Tue Nov 26, 2013 5:06 am

Combuster wrote:Your argument already failed since you're suddenly requiring everything that's not in the standard library in an attempt to keep up appearances.

I stand by my claim about effort required. I also stand by my claim about your ignorance on the matter. You're free to make me look like a fool by giving us details about how long it took you to develop your first maths library from scratch.

Combuster wrote:And -1 for the ad hominem.

I claimed you "didn't have a clue" because it was obvious you were talking about something you have no practical experience of. There are 299 functions in libm on OpenSolaris: http://www.unix.com/man-page/opensolaris/3lib/libm/. Now, 6 months full time is 26 weeks, which gives 11.5 functions designed, implemented, tested and documented per week on average; that's over 2 per day working a 5-day week. My claim was 6 months "if you haven't implemented these sorts of things before", so we have to include background reading on computer approximations and floating-point arithmetic.

With this in mind, I stand by my claim, you don't have a clue.

Brendan · Post by **Brendan** » Tue Nov 26, 2013 6:05 am

Hi,

bwat wrote:
Combuster wrote:Your argument already failed since you're suddenly requiring everything that's not in the standard library in an attempt to keep up appearances.
I stand by my claim about effort required. I also stand by my claim about your ignorance on the matter. You're free to make me look like a fool by giving us details about how long it took you to develop your first maths library from scratch.

For 80x86; the majority of it is just inline assembly wrappers (where the CPU's FPU does it for you). The time consuming part would be making sure it complies with the relevant standard/s. If someone doesn't care about the standards (e.g. they're redesigning everything and therefore creating their own standards) then they don't have to care about whether it complies with any existing standard/s.

Of course someone could also simply not bother supporting any or all of it (e.g. decide that the maths library is a third-party thing that isn't part of their new standard). In this case, I'm sure it won't take anyone very long to implement nothing.

Cheers,

Brendan

Combuster · Post by **Combuster** » Tue Nov 26, 2013 6:35 am

bwat wrote:There are 299 functions in libm on OpenSolaris: http://www.unix.com/man-page/opensolaris/3lib/libm/. [...] With this in mind, I stand by my claim, you don't have a clue.

Actually, there are only some 100 distinct functions in there. Several of them are more costly to document than to write because they aren't anything more than basic identities over other functions. Basically, 65% is copy-paste-substitute-type-with-type and another 15% is so unsophisticated that I could hardly do that faster than an intern.

bwat · Post by **bwat** » Tue Nov 26, 2013 6:38 am

Brendan wrote:Hi,
For 80x86; the majority of it is just inline assembly wrappers (where the CPU's FPU does it for you). The time consuming part would be making sure it complies with the relevant standard/s. If someone doesn't care about the standards (e.g. they're redesigning everything and therefore creating their own standards) then they don't have to care about whether it complies with any existing standard/s.

With maths libraries you have to worry not just about standards you choose to adopt but accuracy as well. I've personally not seen a standard that enforces accuracy of elementary or special functions (easy to understand why as the accuracy can be platform specific). Even if no standard is chosen, i.e., we take or leave functions at our leisure, the issue of accuracy has to be dealt with for each function. This can be very tricky.

Regardless of quality of implementation, there's the issue of quality of documentation as well. Good documentation takes time. Here's an example of what I think is good documentation for a maths library: http://bitsavers.trailing-edge.com/pdf/ ... _Sep83.pdf
Every one of the functions has its maximum relative error and its average relative error reported. You don't see that too often unfortunately.
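
For readers wondering what producing such figures involves, here is a rough sketch of measuring maximum and average relative error for one function. Everything in it is an assumption for illustration: the function under test is a deliberately crude degree-5 Taylor polynomial for e^x, and the reference is the same series carried to 30 terms in long double, which is only adequate on a small interval near zero.

```c
#include <stddef.h>

/* Function under test: degree-5 Taylor polynomial for e^x in Horner
 * form. Deliberately crude so the error is visible. */
double exp_taylor5(double x)
{
    return 1.0 + x * (1.0 + x * (1.0 / 2 + x * (1.0 / 6 +
           x * (1.0 / 24 + x * (1.0 / 120)))));
}

/* Reference value: the same series with 30 terms in long double. */
long double exp_reference(long double x)
{
    long double term = 1.0L, sum = 1.0L;
    for (int k = 1; k <= 30; k++) {
        term *= x / k;
        sum += term;
    }
    return sum;
}

typedef struct {
    double max_rel;   /* largest relative error seen */
    double mean_rel;  /* average relative error over the samples */
} error_report;

/* Sample n evenly spaced points in [lo, hi] and report the relative
 * error of exp_taylor5 against exp_reference. Requires n >= 2. */
error_report measure_exp_error(double lo, double hi, size_t n)
{
    error_report r = { 0.0, 0.0 };
    long double total = 0.0L;

    if (n < 2)
        return r;

    for (size_t i = 0; i < n; i++) {
        double x = lo + (hi - lo) * (double)i / (double)(n - 1);
        long double ref = exp_reference(x);
        long double err = ((long double)exp_taylor5(x) - ref) / ref;
        if (err < 0)
            err = -err;
        if (err > r.max_rel)
            r.max_rel = (double)err;
        total += err;
    }
    r.mean_rel = (double)(total / n);
    return r;
}
```

Evenly spaced sampling can miss the true worst case, and a serious library would compare against a much higher-precision reference, so this only hints at why the testing and documentation take time.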

Brendan wrote: Of course someone could also simply not bother supporting any or all of it (e.g. decide that the maths library is a third-party thing that isn't part of their new standard). In this case, I'm sure it won't take anyone very long to implement nothing.

I wouldn't support it in a new OS. One of the reasons I wrote my own was because I found too many differences between the results of elementary functions across different platforms. Now I get the same result regardless of platform.

OSDev.org
