[solved]newlib print issue?
Posted: Sun Feb 21, 2021 11:04 pm
by xeyes
I'm seeing some issues with prints when using newlib. One of them involves printing C strings with fprintf via stderr.
The test program is very simple:
Code:
#include <stdio.h>

int main(void)
{
    char string[] = "test";

    /* Same format and argument on each stream */
    fprintf(stdout, "fprintf stdout %s\n", string);
    fprintf(stderr, "fprintf stderr %s\n", string);
    printf("printf %s\n", string);
    return 0;
}
What actually gets printed:
fprintf stdout test
fprintf stderr printf test
As you can see, the 2nd print (fprintf via stderr) is missing both the "test" string and the \n.
Below is what newlib requested to write() for the 3 prints:
Code:
"fprintf stdout test\n" size 0x14
"fprintf stderr %s\n" size 0xf
"printf test\nut test\n" size 0xc
It looks like the stderr one simply called write() when it reached %s, ignoring the %s and everything after it in the format string. It also looks like it used a different internal buffer, as the 3rd printf re-uses the buffer used by the 1st one.
It probably has more to do with how I'm using newlib than how newlib works, so would appreciate some pointers, esp. if you've seen similar issues with it and know what might have been missed in porting/using it.
Re: newlib print issue?
Posted: Sun Feb 21, 2021 11:13 pm
by kzinti
Having a different buffer for stdout and stderr seems expected to me.
I ran into some weirdness with printf/newlib myself, but it was different from what you see. I worked around the issue by disabling buffering entirely with setbuf(stdout, 0). You might want to give that a try just in case.
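For reference, a minimal sketch of that workaround using only standard C. setbuf(stdout, 0) has the same effect as the setvbuf() call below, but setvbuf() also reports whether the request was honored. The helper name disable_stdout_buffering is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: switch stdout to unbuffered mode.
 * Call it before doing any other I/O on the stream.
 * Returns 0 on success, nonzero if the request cannot be honored. */
int disable_stdout_buffering(void)
{
    return setvbuf(stdout, NULL, _IONBF, 0);
}
```

With buffering off, every printf() reaches write() immediately, which makes it easier to see which call produced which write.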
Re: newlib print issue?
Posted: Sun Feb 21, 2021 11:47 pm
by xeyes
kzinti wrote:Having a different buffer for stdout and stderr seems expected to me.
I ran into some weirdness with printf/newlib myself, but it was different from what you see. I worked around the issue by disabling buffering entirely with setbuf(stdout, 0). You might want to give that a try just in case.
I played with setbuf and it didn't change the behavior. If you have time, could you try sending a %s to stderr and see what it prints?
What kind of weirdness did you see?
Re: newlib print issue?
Posted: Mon Feb 22, 2021 1:43 am
by xeyes
Solved: my console print function returns 0 on success, and that value was propagated as the return value of write().
So newlib thought the write of the fixed string before %s had failed and stopped printing the rest.
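A minimal sketch of the fix, assuming a console routine that reports status as 0/-1. The names console_print and console_write are hypothetical, not newlib APIs. The point is that whatever backs write() must return the number of bytes written, not a status code.

```c
#include <stddef.h>

/* Hypothetical kernel console routine: returns 0 on success, -1 on failure. */
static int console_print(const char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0; /* pretend the console accepted everything */
}

/* Hypothetical write() backend: translate the console's status code
 * into the POSIX convention (bytes written, or -1 on error). */
long console_write(int fd, const void *buf, size_t len)
{
    if (fd != 1 && fd != 2)
        return -1;               /* only stdout/stderr go to the console */
    if (console_print(buf, len) != 0)
        return -1;
    return (long)len;            /* report how many bytes were consumed */
}
```

For the 15-byte "fprintf stderr " prefix from the test program, this returns 15 rather than 0, so the library carries on with the rest of the format string.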
Re: newlib print issue?
Posted: Mon Feb 22, 2021 2:35 am
by kzinti
The weirdness I saw had to do with multiple processes/threads printing to the same console. It looked as if things were being printed out of order... but of course it was the buffering getting in the way and delaying some output.
Glad you found the problem... These things can get tedious to track down.
Re: newlib print issue?
Posted: Mon Feb 22, 2021 1:17 pm
by nullplan
kzinti wrote:Having a different buffer for stdout and stderr seems expected to me.
In fact, stderr is defined not to be fully buffered, and in practice it is usually unbuffered. Also, no connection between the standard streams is defined, so in the given program, there is no guarantee the outputs to stdout are going to show up before the ones to stderr. stdout certainly could be line-buffered, but it doesn't have to be.
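If the relative order of the two streams matters, one option is to flush stdout explicitly before writing to stderr. A small sketch in standard C (the helper name print_in_order is made up):

```c
#include <stdio.h>

/* Hypothetical helper: write `first` to `out`, then `second` to `err`,
 * making sure the first string has left the stdio buffer beforehand. */
int print_in_order(FILE *out, FILE *err, const char *first, const char *second)
{
    if (fputs(first, out) == EOF)
        return -1;
    if (fflush(out) == EOF)      /* push buffered data out before touching err */
        return -1;
    if (fputs(second, err) == EOF)
        return -1;
    return 0;
}
```

This only orders the calls to write(); what the console does with two file descriptors afterwards is up to the OS.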
Re: newlib print issue?
Posted: Mon Feb 22, 2021 8:17 pm
by xeyes
kzinti wrote: multiple processes/threads printing to the same console.
This sounds challenging. I'm not serializing console writes for user threads yet, so as not to slow them down. But that also means they print a random mix onto the console if they are lucky (unlucky?) enough to print around the same time.
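For what it's worth, a rough sketch of what serializing could look like, using a C11 atomic_flag as a spinlock. Everything here is hypothetical (console_write_locked, console_log, and the fixed-size array standing in for the console device); a real kernel would probably block or yield instead of spinning.

```c
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag console_lock = ATOMIC_FLAG_INIT;
static char console_log[256];     /* stand-in for the real console device */
static size_t console_used;

/* Hypothetical serialized console write: the whole buffer is emitted
 * while holding the lock, so output from two threads cannot interleave.
 * Returns the number of bytes actually stored. */
long console_write_locked(const char *buf, size_t len)
{
    while (atomic_flag_test_and_set_explicit(&console_lock, memory_order_acquire))
        ;   /* spin until the previous writer releases the lock */

    size_t room = sizeof console_log - console_used;
    size_t n = len < room ? len : room;
    for (size_t i = 0; i < n; i++)
        console_log[console_used++] = buf[i];

    atomic_flag_clear_explicit(&console_lock, memory_order_release);
    return (long)n;
}
```

The cost is one uncontended atomic operation per write in the common case, so the slowdown only shows up when threads really do print at the same time.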
kzinti wrote:The weirdness I saw had to do with multiple processes/threads printing to the same console. It looked as if things were being printed out of order... but of course it was the buffering getting in the way and delaying some output.
Glad you found the problem... These things can get tedious to track down.
True, not a big fan of stepping through library code. OTOH I'm grateful that these projects are available so I only need to deal with "a few" issues like this rather than having to write them myself.
After fixing this and a few other "posix compliance" issues, the system was able to produce its first program using as and ld.
Still a long way from getting gcc to even build, but without newlib and binutils it would not have been possible to reach this point until much later. Together they are more than 30x bigger than the kernel binary, which I take as a sign that at least 30x the time would be needed to implement them.
Re: [solved]newlib print issue?
Posted: Mon Feb 22, 2021 9:37 pm
by kzinti
I am with you... I'm still using newlib for my user space but slowly starting to think about ditching it... It is a nice piece of code to have around so that one can work on more interesting stuff.
Right now I basically have 2 libc (newlib and my own). This is because newlib doesn't have all the pthread functionality I needed to properly support C++ (exceptions and libstdc++). One step at a time.
Re: [solved]newlib print issue?
Posted: Thu Feb 25, 2021 1:50 am
by xeyes
kzinti wrote:I am with you... I'm still using newlib for my user space but slowly starting to think about ditching it... It is a nice piece of code to have around so that one can work on more interesting stuff.
Right now I basically have 2 libc (newlib and my own). This is because newlib doesn't have all the pthread functionality I needed to properly support C++ (exceptions and libstdc++). One step at a time.
One step at a time for sure. Too much interesting stuff, too little time.
As you can tell I'm not familiar with C libraries at all, but I do have a basic question here: what are the reasons for not porting a more complete libc, like glibc?
Re: [solved]newlib print issue?
Posted: Thu Feb 25, 2021 11:31 am
by kzinti
glibc is really tied to GNU/Linux. I want to stay as far away from both as possible.
I was considering musl seriously at some point. This is also for Linux, but it's smaller and cleaner.
In the end, a big chunk of your C library is tied to your OS. So it's not clear that using an existing one is the right thing.
Newlib is nice because there aren't many system calls to support... But it is missing a lot (crt0, pthread, TLS, ...), which is what I am focusing on for now. Eventually I suspect it will make less and less sense to keep newlib around.
Re: [solved]newlib print issue?
Posted: Thu Feb 25, 2021 12:28 pm
by nullplan
xeyes wrote:reasons for not porting a more complete libc, like glibc?
glibc is tied to POSIX. If your OS is not POSIX, glibc is probably not right for you. It also has a lot of features, some of which you might call unnecessary, and the complexity that goes into implementing these is eye-watering.
kzinti wrote:glibc is really tied to GNU/Linux
It's really not, it has tons of ports, for weird CPUs and weird OSes. It does depend on your OS being POSIX, though.
kzinti wrote:In the end, a big chunk of your C library is tied to your OS. So it's not clear that using an existing one is the right thing.
I'm a regular on the musl mailing list, and being there has opened my eyes to the sheer difficulty of writing a libc yourself. I don't think writing a libc is a simple thing, even if you are a kernel developer. Kernels are easy, in a way: Once the interfaces are taken care of, whatever you do in kernel space is up to you. libc is hard.[1] I was also on the dietlibc mailing list for a while, and that project certainly showed its weaknesses with the lackadaisical approach taken by its lead dev. dietlibc is likely what you'd get if you gave a seasoned C programmer with no prior experience in standards implementations the task of building a libc. musl is designed with far more care.
[1]That is to say, libc is actually easy, but doing libc right is hard.
Re: [solved]newlib print issue?
Posted: Thu Feb 25, 2021 1:12 pm
by kzinti
I must say I am not looking forward to implementing my own libc if I can get away with it. I might take another closer look at musl eventually.
Do you know of any other libc library that is mature and well built? What is/was your own approach for libc? I am under the impression that you might be using musl but I am not sure.
Re: [solved]newlib print issue?
Posted: Thu Feb 25, 2021 1:56 pm
by vvaltchev
I used dietlibc at the beginning, but then I switched to libmusl because dietlibc is not maintained. Overall, I like libmusl because it's compact, actively maintained, well-supported (e.g. there are pre-built gcc-musl toolchains) and it's not bloated: statically linked binaries with it are usually small (~20 KB) compared to uclibc-ng and glibc, of course.
BUT, from the kernel point of view, it required quite an effort to support: it's not configurable and requires plenty of kernel features. For example, TLS. Even though I have no multi-thread support, I had to implement full support for set_thread_area() just to make programs reach main(), no way to fake it.
And that was not even the biggest problem with libmusl: at some point, I built micropython for Tilck, but I realized that while the REPL works perfectly, it cannot run scripts because libmusl's realpath() implementation requires /proc to exist, and it has no fall-back. So, in the end, I forked and patched micropython with a realpath() implementation that does not require /proc, but I didn't like doing that at all. That's a limitation even on Linux, because in some cases (e.g. containers) it's preferable to avoid mounting procfs.
In conclusion, while I like libmusl overall, I'd say it's quite demanding from the kernel point of view and I'm not sure if it's the best choice for small and simple (in theory) kernel projects. Maybe uclibc-ng was a slightly better choice? I don't know.
Anyway, I never considered newlib, simply because I didn't know it. How does it compare to libmusl? (if anyone has experience with both of them)
Re: [solved]newlib print issue?
Posted: Thu Feb 25, 2021 2:30 pm
by nullplan
vvaltchev wrote:I used dietlibc at the beginning, but then I switched to libmusl because dietlibc is not maintained.
Aw, poor Fefe. There is some work still being done on it, the most recent commits are from a week ago. And I thank you for that invitation to a refresher on just how bad cvs is. Even finding out that much took a Google search. I still don't know what was changed, and I no longer care to know.
vvaltchev wrote:Even though I have no multi-thread support, I had to implement full support for set_thread_area() just to make programs reach main(), no way to fake it.
Rich Felker was of the opinion that a single-threaded program is merely a multi-threaded program in waiting. Anyway, set_thread_area() on AMD64 should boil down to setting FS.Base. Admittedly, on x86, it is more involved.
vvaltchev wrote:it cannot run scripts because libmusl's realpath() implementation requires /proc to exist, and it has no fall-back.
That was changed recently (committed Nov 30), the current implementation only requires getcwd() and readlink().
vvaltchev wrote:Anyway, I never considered newlib, simply because I didn't know it. How does it compare to libmusl? (if anyone has experience with both of them)
I have only read the source code. Based on that alone, musl wins by a landslide. Newlib has so many #ifdefs all over the place, and finding what you are looking for takes ages. Whereas musl has straightforward code, is not configurable as you said, and that means it has no bloody #ifdefs if it can possibly help it, which aids readability.
Also the choice of algorithms is better. newlib's sorting algorithm is a standard quicksort, whereas musl is implementing smoothsort. Newlib's malloc() is a bog-standard dlmalloc (requiring sbrk()) whereas musl has its own version. The old one I could read and understand on its own (the new one requires a bit more brain power than I have yet given it). Unfortunately, the old malloc has a tiny little race condition that can cause unconstrained heap growth in multi-threaded applications. But it could be fixed with a big malloc lock.
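To illustrate the sbrk() dependency: a dlmalloc-style allocator only needs a way to move the "program break". Below is a hedged sketch of an sbrk()-like function over a fixed static arena. The names arena_sbrk and ARENA_SIZE are invented for the example, and how such a function gets hooked into newlib depends on the port.

```c
#include <errno.h>
#include <stddef.h>

#define ARENA_SIZE 4096u           /* arbitrary size for the sketch */

static unsigned char arena[ARENA_SIZE];
static size_t brk_offset;          /* current break, as an offset into arena */

/* Hypothetical sbrk()-style hook: move the break by incr bytes and
 * return the previous break, or (void *)-1 with errno set to ENOMEM. */
void *arena_sbrk(ptrdiff_t incr)
{
    size_t old = brk_offset;

    if (incr < 0) {
        size_t shrink = (size_t)0 - (size_t)incr;
        if (shrink > old) {            /* cannot move below the arena start */
            errno = ENOMEM;
            return (void *)-1;
        }
        brk_offset = old - shrink;
    } else {
        if ((size_t)incr > ARENA_SIZE - old) {   /* arena exhausted */
            errno = ENOMEM;
            return (void *)-1;
        }
        brk_offset = old + (size_t)incr;
    }
    return arena + old;
}
```

A real port would grow the break by mapping pages rather than carving up a static array, but the contract seen by malloc() is the same.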
Re: [solved]newlib print issue?
Posted: Fri Feb 26, 2021 5:13 am
by vvaltchev
nullplan wrote:Aw, poor Fefe. There is some work still being done on it, the most recent commits are from a week ago. And I thank you for that invitation to a refresher on just how bad cvs is. Even finding out that much took a Google search. I still don't know what was changed, and I no longer care to know.
Ouch. By just looking at https://www.fefe.de/dietlibc/, I had no idea he was still working on it: I switched to libmusl in 2018, before the release of dietlibc 0.34. At that time, the last release was from 2013 (5 years old) and yeah, I didn't even consider checking out the CVS repo. I searched a little online and people complained about that library (bugs etc.) so I thought that it was the right choice to look for something else. Thanks for pointing out that he is still working on it. Libmusl is much better, but at least we know that dietlibc is not a completely abandoned project. Maybe we should suggest that he move dietlibc to GitHub?
nullplan wrote:Rich Felker was of the opinion that a single-threaded program is merely a multi-threaded program in waiting. Anyway, set_thread_area() on AMD64 should boil down to setting FS.Base. Admittedly, on x86, it is more involved.
I can't talk about AMD64, but on x86 it literally means allowing each process to own a limited set of GDT entries. The tricky part is that it also has to work in fork-ed children, because some libc functions use TLS variables after _start, obviously. That meant also implementing a ref-count mechanism for GDT entries, dynamic expansion of the GDT etc. A hell of a lot of work. I still wonder: why doesn't set_thread_area() use the LDT instead of the GDT? Actually, I wanted to implement it this way initially, but I quickly realized that's impossible because the LDT/GDT bit is in the selector itself, so there's nothing I can do. But why? It's an architecture-specific feature anyway, so why not just have a small LDT per process? Maybe there's some overhead in setting the LDT on every task switch?
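To make the selector point concrete, here is a small sketch of how a 32-bit x86 segment descriptor and selector are packed. The bit layout is the standard x86 one; the function names gdt_encode and make_selector are made up for this example.

```c
#include <stdint.h>

/* Pack base, 20-bit limit, access byte, and flags nibble into the
 * 8-byte descriptor format used by GDT and LDT entries. */
uint64_t gdt_encode(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* base bits 0..23   */
    d |= (uint64_t)access << 40;                 /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xF) << 52;          /* G, D/B, L, AVL    */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;  /* base bits 24..31  */
    return d;
}

/* Selector: table index in bits 3..15, table indicator in bit 2
 * (0 = GDT, 1 = LDT), requested privilege level in bits 0..1. */
uint16_t make_selector(uint16_t index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}
```

For example, entry 6 with RPL 3 gives selector 0x33 when it refers to the GDT and 0x37 when it refers to the LDT. Since user space loads that value into the segment register itself, the kernel cannot quietly redirect a GDT selector to an LDT entry.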
nullplan wrote:That was changed recently (committed Nov 30), the current implementation only requires getcwd() and readlink().
That's great news! Thanks for sharing. That means that the super-stable pre-built toolchains I use will have musl 1.2.2 at some point. But I'll have to wait, since version 1.2.2 was released on Jan 15, 2021, just a month ago.
nullplan wrote:I have only read the source code. Based on that alone, musl wins by a landslide. Newlib has so many #ifdefs all over the place, and finding what you are looking for takes ages. Whereas musl has straightforward code, is not configurable as you said, and that means it has no bloody #ifdefs if it can possibly help it, which aids readability.
Also the choice of algorithms is better. newlib's sorting algorithm is a standard quicksort, whereas musl is implementing smoothsort. Newlib's malloc() is a bog-standard dlmalloc (requiring sbrk()) whereas musl has its own version. The old one I could read and understand on its own (the new one requires a bit more brain power than I have yet given it). Unfortunately, the old malloc has a tiny little race condition that can cause unconstrained heap growth in multi-threaded applications. But it could be fixed with a big malloc lock.
I'm happy to hear that you believe libmusl is better. I don't regret my choice, in particular now that I know it also has a better realpath() implementation. And yeah, I totally understand the advantages of being non-configurable: it's easier to maintain and test. In one of their FAQs I read at the time that the decision not to be configurable is based on the fact that even simple binary options cause an exponential (2^N) growth in the number of possible configurations. There's no way to test all the 2^N configurations, and testing the N options independently is not the same thing.