C standard library redesign
I've decided that I don't care about compatibility with existing *nix operating systems. This gives me the opportunity to change the C standard library in any way I want without worrying about breaking existing code.
So, what I'm wondering is what improvements do you think could be made if one were to start from scratch? I'm interested both in minor, obvious things that most people would agree on and in more opinionated things. An example of the latter is storing lengths of strings rather than null-terminating, which is something I plan to do.
Re: C standard library redesign
Hi,
nil wrote: I've decided that I don't care about compatibility with existing *nix operating systems. This gives me the opportunity to change the C standard library in any way I want without worrying about breaking existing code. So, what I'm wondering is what improvements do you think could be made if one were to start from scratch? I'm interested both in minor, obvious things that most people would agree on and in more opinionated things. An example of the latter is storing lengths of strings rather than null-terminating, which is something I plan to do.
The first thing I'd do is get rid of "errno"; such that any/all functions that can return an error return an error code directly. The next thing is character and string handling - ASCII should be considered deprecated and "wchar" is a mess (better to use "uint8_t" for UTF-8 and "uint32_t" for UTF-32).
These 2 changes alone destroy most of the standard C library.
For strings, using a zero terminator is much more powerful, as a pointer to anywhere within the string is a valid zero terminated sub-string. Software can (and often should) cache "length" separately, but sadly programmers are lazy (which is the programmer's fault, not the language's fault).
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: C standard library redesign
Thanks for the suggestions, Brendan,
Brendan wrote: The first thing I'd do is get rid of "errno"; such that any/all functions that can return an error return an error code directly.
That sounds like a good idea, I think I'll definitely do that.
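For illustration, here is a minimal sketch of what "return the error code directly" could look like. The names `status_t` and `parse_uint` are hypothetical and not taken from any existing library; the result travels through an out-parameter so the return value is free to carry the status, and no global state is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the names are illustrative only. */
typedef enum {
    STATUS_OK = 0,
    STATUS_INVALID,   /* input is empty or not a decimal number */
    STATUS_RANGE      /* value does not fit in uint32_t */
} status_t;

/* Parse an unsigned decimal number from the first `len` bytes of `s`.
 * *out is written only on success; nothing like errno is touched. */
status_t parse_uint(const char *s, size_t len, uint32_t *out)
{
    if (s == NULL || out == NULL || len == 0)
        return STATUS_INVALID;

    uint32_t value = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return STATUS_INVALID;
        uint32_t digit = (uint32_t)(s[i] - '0');
        /* value * 10 + digit would exceed UINT32_MAX */
        if (value > (UINT32_MAX - digit) / 10)
            return STATUS_RANGE;
        value = value * 10 + digit;
    }
    *out = value;
    return STATUS_OK;
}
```

The caller checks the returned status and only then reads the out-parameter, so there is no window in which a stale global error value can be misread.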
Brendan wrote: The next thing is character and string handling - ASCII should be considered deprecated and "wchar" is a mess (better to use "uint8_t" for UTF-8 and "uint32_t" for UTF-32).
Hmm... I do want to limit the scope of my project to avoid getting lost in feature creep, so I'm not sure if I'm planning to support Unicode. This would definitely be the way to go if I do support it though.
Brendan wrote: For strings, using a zero terminator is much more powerful, as a pointer to anywhere within the string is a valid zero terminated sub-string. Software can (and often should) cache "length" separately, but sadly programmers are lazy (which is the programmer's fault, not the language's fault).
This is true, but you can do a similar thing without too much more effort when storing the length, as you just have to set the length in your new string to (old_length - (new_ptr - old_ptr)). While you can cache the length, it's definitely a pain to do so when you're passing the string around to other functions, for example. There are definitely benefits to both sides, so I'll have to consider it carefully before picking one - I don't want to make hasty decisions that sound good on paper and then regret them 50k lines later when it would take substantial effort to rework.
Re: C standard library redesign
The string+length vs null-terminated argument is an old one. There is a third possibility that you could consider - store the length and use a null terminator.
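A rough sketch of that third possibility, assuming a hypothetical `lstr_t` type and `lstr_new` constructor (both invented here for illustration): the length is stored up front and the bytes are still followed by a terminator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hybrid string: explicit length plus a trailing '\0'. */
typedef struct {
    size_t length;   /* number of bytes, excluding the terminator */
    char   data[];   /* `length` bytes followed by '\0' */
} lstr_t;

/* Create a hybrid string from `len` bytes at `src`.
 * Returns NULL on allocation failure or size overflow.
 * The caller releases the result with free(). */
lstr_t *lstr_new(const char *src, size_t len)
{
    if (len > SIZE_MAX - sizeof(lstr_t) - 1)
        return NULL;
    lstr_t *s = malloc(sizeof(lstr_t) + len + 1);
    if (s == NULL)
        return NULL;
    s->length = len;
    if (len > 0)
        memcpy(s->data, src, len);
    s->data[len] = '\0';
    return s;
}
```

The length is available in constant time, and `data` can still be handed to code that expects a terminator. The catch is that a string containing an embedded zero byte will look truncated to such code.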
- Owen
Re: C standard library redesign
There is a great distinction between the C standard library (as defined by the ISO C standard) and the POSIX standard library (which encompasses all of the former and is about two orders of magnitude larger). I'd note that difference before trying to redesign the "C standard library".
I agree; if doing over the C standard library today, I'd eliminate errno. But for the functionality contained within the C standard library it's not that problematic (recent functionality ignores it, and the optional Annex K adds variants of most functions which return an error value).
"wchar" is kind of a mess, mostly because there are two opinions of it (Windows: it's 16-bit, because if we change it the ABI for every damn function breaks; everybody else: it's 32-bit, because that's what the standard mandates if you're doing Unicode). However, for your platform you can just say "wchar_t is always UCS-4" (and be in no worse situation than you would be redesigning that portion of the standard library).
Note that C11 added char16_t and char32_t, which it recommends implementing using Unicode (and everybody does; there is a define you can use to verify this). Additionally, C++11 outright states that they're Unicode.
Again, if you're redesigning the library for your OS... why not make them always Unicode first?
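The define in question is presumably the pair of C11 macros `__STDC_UTF_16__` and `__STDC_UTF_32__`. A small sketch of checking them follows; the function names are made up for this example, and it assumes a toolchain that ships the C11 header `<uchar.h>`.

```c
#include <limits.h>
#include <stddef.h>
#include <uchar.h>   /* char16_t, char32_t (C11) */

/* Returns 1 if the implementation promises that char16_t values
 * are UTF-16 encoded, 0 if it makes no such promise. */
int char16_is_utf16(void)
{
#ifdef __STDC_UTF_16__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the implementation promises that char32_t values
 * are UTF-32 encoded, 0 if it makes no such promise. */
int char32_is_utf32(void)
{
#ifdef __STDC_UTF_32__
    return 1;
#else
    return 0;
#endif
}

/* Width of char32_t in bits; C11 requires room for 32-bit values. */
size_t char32_bits(void)
{
    return sizeof(char32_t) * CHAR_BIT;
}
```

A new library that always uses Unicode could simply define both macros unconditionally and document that as part of its ABI.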
Brendan wrote: The next thing is character and string handling - ASCII should be considered deprecated and "wchar" is a mess (better to use "uint8_t" for UTF-8 and "uint32_t" for UTF-32).
nil wrote: Hmm... I do want to limit the scope of my project to avoid getting lost in feature creep, so I'm not sure if I'm planning to support Unicode. This would definitely be the way to go if I do support it though.
Supporting Unicode is easy. Certainly implementing a UTF-8 to UCS-4 codec is an order of magnitude less work than implementing, say, malloc or printf.
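To give a feel for how small the decoding half of a UTF-8 to UCS-4 codec is, here is a sketch. The function name `utf8_decode` and its return convention are assumptions made for this example; it rejects overlong encodings, surrogates and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the `len` bytes at `s` into *out.
 * Returns the number of bytes consumed (1 to 4), or 0 if the
 * input is empty, truncated or malformed. */
size_t utf8_decode(const uint8_t *s, size_t len, uint32_t *out)
{
    if (len == 0)
        return 0;

    uint8_t  b0 = s[0];
    size_t   need;   /* number of continuation bytes expected */
    uint32_t cp;     /* code point being assembled */
    uint32_t min;    /* smallest value allowed for this length */

    if (b0 < 0x80) {
        *out = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {
        need = 1; cp = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {
        need = 2; cp = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {
        need = 3; cp = b0 & 0x07; min = 0x10000;
    } else {
        return 0;    /* stray continuation byte or invalid lead byte */
    }

    if (len < need + 1)
        return 0;    /* truncated sequence */

    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;    /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, out-of-range values and surrogates. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return need + 1;
}
```

The encoder is about the same size, which supports the point that this is far less work than malloc or printf.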
Brendan wrote: For strings, using a zero terminator is much more powerful, as a pointer to anywhere within the string is a valid zero terminated sub-string. Software can (and often should) cache "length" separately, but sadly programmers are lazy (which is the programmer's fault, not the language's fault).
nil wrote: This is true, but you can do a similar thing without too much more effort when storing the length, as you just have to set the length in your new string to (old_length - (new_ptr - old_ptr)). While you can cache the length it's definitely a pain to do so when you're passing the string around to other functions, for example. There are definitely benefits to both sides, I'll have to consider it carefully before picking one - I don't want to make hasty decisions that sound good on paper and then regret them 50k lines later when it would take substantial effort to rework.
The "most powerful" abstraction is to have your string type be struct { char *string; size_t length; } (i.e. pointing to the byte buffer, but containing a length); of course, when dealing in substrings one must be careful to not mutate a shared string unexpectedly (etc).
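As a sketch of why the pointer-plus-length form is more general: any slice, not just a suffix, can be taken without copying. The names `str_t`, `str_sub` and `str_suffix` are hypothetical, and the pointer is declared `const` here (an assumption that differs from the struct as written above) so that a view cannot mutate the shared buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning string view: pointer into a byte buffer
 * plus an explicit length. No terminator is required. */
typedef struct {
    const char *string;
    size_t      length;
} str_t;

/* View of at most `count` bytes of `s` starting at `start`.
 * Out-of-range arguments are clamped. Nothing is copied; the
 * result shares the buffer of `s`. */
str_t str_sub(str_t s, size_t start, size_t count)
{
    if (start > s.length)
        start = s.length;
    if (count > s.length - start)
        count = s.length - start;
    str_t r = { s.string + start, count };
    return r;
}

/* Suffix starting at `start`. The resulting length is
 * old_length - (new_ptr - old_ptr). */
str_t str_suffix(str_t s, size_t start)
{
    return str_sub(s, start, s.length);
}
```

With a zero terminator only the suffix case is free; a slice that ends before the terminator needs a copy or a temporary write into the buffer.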
Re: C standard library redesign
Just rewriting the mathematics libraries with proper documentation should keep you busy for about 6 months full-time if you haven't implemented these sorts of things before. I'm not joking.
Every universe of discourse has its logical structure --- S. K. Langer.
- Combuster
Re: C standard library redesign
bwat wrote: Just rewriting the mathematics libraries with proper documentation should keep you busy for about 6 months full-time if you haven't implemented these sorts of things before. I'm not joking.
You must be terrible at math or programming. Probably programming since the needed equations are one google search away.
Re: C standard library redesign
I'd remove the error codes altogether, replacing them with success (TRUE) and failure (FALSE). With regard to Unicode, I would support neither wchar nor 32-bit chars. UTF-8 is enough for any practical purpose, and doesn't require special string APIs.
Re: C standard library redesign
rdos wrote: I'd remove the error-codes all together, replacing them with success (TRUE) and failure (FALSE).
I wouldn't. There's nothing worse than something simply failing with no error code.
Currently developing Lithium OS (LiOS).
Recursive paging saves lives.
"I want to change the world, but they won't give me the source code."
Re: C standard library redesign
bwat wrote: Just rewriting the mathematics libraries with proper documentation should keep you busy for about 6 months full-time if you haven't implemented these sorts of things before. I'm not joking.
Combuster wrote: You must be terrible at math or programming. Probably programming since the needed equations are one google search away.
You just simply haven't got a clue. You obviously have never done this yourself.
For those who are interested in these things, here's a list of references that I used to create my maths library from the bottom up: arbitrary precision integer arithmetic library, arbitrary precision floating-point arithmetic library, elementary functions library, special functions library, and statistics library --- each library builds upon the previous one:
Computer Approximations, Hart et al.
Elementary Functions, Muller.
Handbook of Mathematical Functions, Abramowitz and Stegun.
ISO/IEC 10967-1 Language Independent Arithmetic - Part 1: Integer and floating-point arithmetic, ISO.
ISO/IEC 10967-2 Language Independent Arithmetic - Part 2: Elementary numerical functions, ISO.
Modern Computer Arithmetic, Brent and Zimmermann.
Software Manual for the Elementary Functions, Cody and Waite.
The Art of Computer Programming, Vol. 2 Seminumerical Algorithms, Knuth.
TOPS-10/TOPS-20 Common Math Library Reference Manual, Digital Equipment Corporation.
- Combuster
Re: C standard library redesign
Your argument already failed since you're suddenly requiring everything that's not in the standard library in an attempt to keep up appearances.
And -1 for the ad hominem.
Re: C standard library redesign
Combuster wrote: Your argument already failed since you're suddenly requiring everything that's not in the standard library in an attempt to keep up appearances.
I stand by my claim about effort required. I also stand by my claim about your ignorance on the matter. You're free to make me look like a fool by giving us details about how long it took you to develop your first maths library from scratch.
Combuster wrote: And -1 for the ad hominem.
I claimed you "didn't have a clue" because it was obvious you were talking about something you have no practical experience of. There's 299 functions in libm on opensolaris (http://www.unix.com/man-page/opensolaris/3lib/libm/). Now, 6 months full time is 26 weeks, that gives 11.5 functions designed, implemented, tested and documented a week on average - that's over 2 per day working a 5 day week. My claim was 6 months "if you haven't implemented these sorts of things before" so we have to include background reading on computer approximations and floating-point arithmetic.
With this in mind, I stand by my claim, you don't have a clue.
Re: C standard library redesign
Hi,
Combuster wrote: Your argument already failed since you're suddenly requiring everything that's not in the standard library in an attempt to keep up appearances.
bwat wrote: I stand by my claim about effort required. I also stand by my claim about your ignorance on the matter. You're free to make me look like a fool by giving us details about how long it took you to develop your first maths library from scratch.
For 80x86; the majority of it is just inline assembly wrappers (where the CPU's FPU does it for you). The time consuming part would be making sure it complies with the relevant standard/s. If someone doesn't care about the standards (e.g. they're redesigning everything and therefore creating their own standards) then they don't have to care about whether it complies with any existing standard/s.
Of course someone could also simply not bother supporting any or all of it (e.g. decide that the maths library is a third-party thing that isn't part of their new standard). In this case, I'm sure it won't take anyone very long to implement nothing.
Cheers,
Brendan
- Combuster
Re: C standard library redesign
bwat wrote: There's 299 functions in libm on opensolaris (http://www.unix.com/man-page/opensolaris/3lib/libm/). With this in mind, I stand by my claim, you don't have a clue.
Actually, there are only some 100 distinct functions in there. Several of them are more costly to document than to write because they aren't anything more than basic identities over other functions. Basically, 65% is copy-paste-substitute-type-with-type and another 15% is so unsophisticated that I could hardly do that faster than an intern.
Re: C standard library redesign
Brendan wrote: Hi. For 80x86; the majority of it is just inline assembly wrappers (where the CPU's FPU does it for you). The time consuming part would be making sure it complies with the relevant standard/s. If someone doesn't care about the standards (e.g. they're redesigning everything and therefore creating their own standards) then they don't have to care about whether it complies with any existing standard/s.
With maths libraries you have to worry not just about standards you choose to adopt but accuracy as well. I've personally not seen a standard that enforces accuracy of elementary or special functions (easy to understand why as the accuracy can be platform specific). Even if no standard is chosen, i.e., we take or leave functions at our leisure, the issue of accuracy has to be dealt with for each function. This can be very tricky.
Regardless of quality of implementation, there's the issue of quality of documentation as well. Good documentation takes time. Here's an example of what I think is good documentation for a maths library: http://bitsavers.trailing-edge.com/pdf/ ... _Sep83.pdf
Every one of the functions has its maximum relative error and its average relative error reported. You don't see that too often unfortunately.
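As a rough illustration of how such figures can be produced, the sketch below samples an approximation against a reference and reports the maximum and average relative error. All names are hypothetical, evenly spaced sampling is a simplification, and a real harness would use a higher-precision reference; how the linked manual obtained its numbers is not stated here. The toy pair compares 1/(1-x) with its cubic Taylor polynomial, whose relative error is exactly x^4.

```c
#include <stddef.h>

typedef double (*fn1_t)(double);

typedef struct {
    double max_rel;   /* largest relative error seen */
    double avg_rel;   /* mean relative error over the samples used */
} err_report_t;

/* Compare `approx` with `ref` at `n` evenly spaced points in
 * [lo, hi]. Points where the reference is zero are skipped
 * because relative error is undefined there. */
err_report_t measure_error(fn1_t approx, fn1_t ref,
                           double lo, double hi, size_t n)
{
    err_report_t r = { 0.0, 0.0 };
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        double x = (n > 1)
            ? lo + (hi - lo) * (double)i / (double)(n - 1)
            : lo;
        double want = ref(x);
        if (want == 0.0)
            continue;
        double rel = (approx(x) - want) / want;
        if (rel < 0.0)
            rel = -rel;
        if (rel > r.max_rel)
            r.max_rel = rel;
        r.avg_rel += rel;
        used++;
    }
    if (used > 0)
        r.avg_rel /= (double)used;
    return r;
}

/* Toy reference: exact value of 1/(1-x), valid for x != 1. */
double geom_exact(double x) { return 1.0 / (1.0 - x); }

/* Toy approximation: cubic Taylor polynomial of 1/(1-x). */
double geom_cubic(double x) { return 1.0 + x + x * x + x * x * x; }
```

Calling `measure_error(geom_cubic, geom_exact, 0.0, 0.5, 101)` gives a maximum relative error of about 0.0625 (that is, 0.5^4 at the right endpoint) and a noticeably smaller average, which is the kind of pair a per-function accuracy table would list.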
Brendan wrote: Of course someone could also simply not bother supporting any or all of it (e.g. decide that the maths library is a third-party thing that isn't part of their new standard). In this case, I'm sure it won't take anyone very long to implement nothing.
I wouldn't support it in a new OS. One of the reasons I wrote my own was because I found too many differences between the results of elementary functions across different platforms. Now I get the same result regardless of platform.