[solved]newlib print issue?

nullplan
Member
Posts: 1790
Joined: Wed Aug 30, 2017 8:24 am

Re: [solved]newlib print issue?

Post by nullplan »

vvaltchev wrote:Maybe we should suggest that the guy move dietlibc to GitHub?
Not even that. He already has a web server (which is powerful enough to host one of Germany's most popular blogs; when he posts links to smaller sites, he regularly takes down those sites just by his readers clicking those links), so he could just install cgit and be done with it. Hell, even SVN would be an improvement over CVS.
vvaltchev wrote:I can't talk about AMD64, but on x86 it literally means allowing each process to own a limited set of GDT entries.
Yeah, I looked it up again. The kernel interface there is horrible. I suppose the Linux guys really dropped the ball on this, and Rich was only playing the hand he was dealt. set_thread_area() is defined to modify the GDT. It gets an augmented GDT entry as argument and returns a GDT index back, that then still has to be converted into a selector. This is such a leaky abstraction it isn't even funny.
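
To make the "converted into a selector" step concrete, here is a minimal sketch in C. It assumes only the standard x86 selector layout (index in bits 3 and up, table-indicator bit 2 clear for the GDT, requested privilege level in bits 0-1). The helper name gdt_index_to_selector is made up for illustration and is not taken from musl or the kernel.

```c
#include <stdint.h>

/* Build an x86 segment selector from a GDT entry index.
 * Bits 15..3: index, bit 2: table indicator (0 = GDT), bits 1..0: RPL.
 * Hypothetical helper, for illustration only. */
static inline uint16_t gdt_index_to_selector(unsigned entry_number, unsigned rpl)
{
    return (uint16_t)((entry_number << 3) | (rpl & 3u));
}
```

For example, index 6 with RPL 3 gives 0x33, which user space would then have to load into %gs itself on i386.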

By the way, on AMD64, the __set_thread_area() function doesn't even call the set_thread_area() syscall. It just calls arch_prctl(ARCH_SET_FS, arg). Way more sane.
vvaltchev wrote: The tricky part is it has to work also in fork-ed children,
Wouldn't you just copy the GDT on fork()? No ref-counting needed when each process has its own copy from the start right? And I can't imagine a GDT being big enough to worry about wasting space.
vvaltchev wrote:But why? It's an architecture-specific feature anyway, so why not just have a small LDT per process? Maybe there's some overhead in setting the LDT on every task switch?
Possibly, but reloading segment descriptor caches on every task switch can't be cheap, either. Anyway, the interface is designed to modify the GDT now, so now it is no longer possible to change. Binaries exist now that depend on set_thread_area() modifying the GDT instead of the LDT. So changing it would break ABI, and breaking ABI with userspace is the one thing Linus will not do.

Incidentally, there is a modify_ldt() that musl will fall back on if set_thread_area() fails, but then it is no longer capable of multi-threading. It assumes it will always get the first entry in the LDT, because that code is supposed to run only on very old kernels that lack set_thread_area() and therefore also lack clone().
Carpe diem!
vvaltchev
Member
Posts: 274
Joined: Fri May 11, 2018 6:51 am

Re: [solved]newlib print issue?

Post by vvaltchev »

nullplan wrote:By the way, on AMD64, the __set_thread_area() function doesn't even call the set_thread_area() syscall. It just calls arch_prctl(ARCH_SET_FS, arg). Way more sane.
Yep, there are plenty of advantages to use AMD64.
nullplan wrote:Wouldn't you just copy the GDT on fork()? No ref-counting needed when each process has its own copy from the start right? And I can't imagine a GDT being big enough to worry about wasting space.
Yes, it's possible, but it would mean allocating 3 GDT entries each time a process is created. Why 3? That's hard-coded in Linux, not mentioned anywhere in the man pages. And.. yeah, I don't like the idea of pre-allocating GDT entries that likely won't be used. Ideally, I'd like to have a GDT as small as possible. So, that's why I implemented a ref-counting mechanism.
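
Roughly, the ref-counting idea looks like the sketch below. This is a simplified illustration and not Tilck's actual code: the pool size, the struct, and the tls_slot_* names are all hypothetical, and it leaves out the case where a forked child later changes its entry and therefore needs a slot of its own.

```c
#include <stddef.h>

#define TLS_SLOTS 8   /* hypothetical pool size, not Tilck's or Linux's value */

struct tls_slot {
    unsigned refcount;   /* 0 means the slot is free */
};

static struct tls_slot tls_pool[TLS_SLOTS];

/* Allocate a free slot on first use (e.g. from set_thread_area()).
 * Returns the slot index, or -1 if the pool is exhausted. */
static int tls_slot_alloc(void)
{
    for (int i = 0; i < TLS_SLOTS; i++) {
        if (tls_pool[i].refcount == 0) {
            tls_pool[i].refcount = 1;
            return i;
        }
    }
    return -1;
}

/* On fork(): the child shares the parent's slot instead of getting a copy. */
static void tls_slot_get(int slot)
{
    tls_pool[slot].refcount++;
}

/* On process exit: drop one reference; returns 1 when the slot became free. */
static int tls_slot_put(int slot)
{
    return --tls_pool[slot].refcount == 0;
}
```

Nothing is allocated for processes that never call set_thread_area(), which is the whole point of not copying entries eagerly on fork().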
nullplan wrote:Possibly, but reloading segment descriptor caches on every task switch can't be cheap, either. Anyway, the interface is designed to modify the GDT now, so now it is no longer possible to change. Binaries exist now that depend on set_thread_area() modifying the GDT instead of the LDT. So changing it would break ABI, and breaking ABI with userspace is the one thing Linus will not do.
I totally agree, there's nothing we can do. I just wondered about the reasons for such a design.
nullplan wrote:Incidentally, there is a modify_ldt() that musl will fall back on if set_thread_area() fails, but then it is no longer capable of multi-threading. It assumes it will always get the first entry in the LDT, because that code is supposed to run only on very old kernels that lack set_thread_area() and therefore also lack clone().
Yeah, I remember something about modify_ldt(), but in the end I decided that supporting set_thread_area() was the right thing. Since you mentioned it and I don't remember much about that decision, I opened modify_ldt()'s man page, and look what I found:
modify_ldt() should not be used for thread-local storage, as it slows down context switches and only supports a limited number of threads. Threading libraries should use set_thread_area(2) or arch_prctl(2) instead, except on extremely old kernels that do not support those system calls.

The normal use for modify_ldt() is to run legacy 16-bit or segmented 32-bit code. Not all kernels allow 16-bit segments to be installed, however.

Even on 64-bit kernels, modify_ldt() cannot be used to create a long mode (i.e., 64-bit) code segment. The undocumented field "lm" in user_desc is not useful, and, despite its name, does not result in a long mode segment.
Probably, I read the same thing three years ago when I implemented set_thread_area().
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck
kzinti
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: [solved]newlib print issue?

Post by kzinti »

This is what I meant by musl being tied to Linux. I guess it is not strictly true, as it supports other systems... But if you want to add your own, you have to patch the source code and/or provide fake/stub function calls to emulate the Linux functionality. It is unreasonable to expect a libc implementation to be a simple drop-in thing, but the lack of configuration makes it somewhat more difficult.

When I implemented pthread and crt0, I got inspired by musl and basically used the same idea to set fs (or gs on ia32) with a system call that looks like this:

Code: Select all

__syscall1(SYSCALL_INIT_USER_TCB, (long)thread) 
This is basically my equivalent of arch_prctl(ARCH_SET_FS, arg) or set_thread_area().
xeyes
Member
Posts: 212
Joined: Mon Dec 07, 2020 8:09 am

Re: [solved]newlib print issue?

Post by xeyes »

vvaltchev wrote:I used dietlibc at the beginning, but then I switched to libmusl because dietlibc is not maintained. Overall, I like libmusl because it's compact, actively maintained, well-supported (e.g. there are pre-built gcc-musl toolchains) and it's not bloated: statically linked binaries with it are usually small (~20 KB) compared to uclibc-ng and glibc, of course.

BUT, from the kernel point of view, it required quite an effort to support: it's not configurable and requires plenty of kernel features. For example, TLS. Even though I have no multi-thread support, I had to implement full support for set_thread_area() just to let programs reach main(); there was no way to fake it.

And that was not even the biggest problem with libmusl: at some point, I built micropython for Tilck, but I realized that while the REPL works perfectly, it cannot run scripts because libmusl's realpath() implementation requires /proc to exist, and it has no fall-back. So, at the end, I forked and patched micropython with a realpath() impl. not requiring /proc, but I didn't like doing that, at all. That's a limitation even on Linux, because in some cases (e.g. containers) it's preferable to avoid mounting procfs.

In conclusion, while I like libmusl overall, I'd say it's quite demanding from the kernel point of view and I'm not sure if it's the best choice for small and simple (in theory) kernel projects. Maybe uclibc-ng was a slightly better choice? I don't know.

Anyway, I never considered newlib, simply because I didn't know about it. How does it compare to libmusl? (if anyone has experience with both of them)
Just curious how many different (in the sense that fstat and stat can share a lot of implementation so they aren't different) syscalls did you implement to get musl to be usable (not necessarily 100% functional but not throwing errors and warnings left and right either)?

I saw newlib on the wiki page (https://wiki.osdev.org/C_Library) and was like "implementing less than 20 syscalls gets me 400 library functions? Sounds like a good deal!" :lol:
xeyes
Member
Posts: 212
Joined: Mon Dec 07, 2020 8:09 am

Re: [solved]newlib print issue?

Post by xeyes »

kzinti wrote:glibc is really tied to GNU/Linux. I want to stay as far away from both as possible.

I was considering MUSL seriously at some point. This is also for Linux, but it's smaller and cleaner.

In the end, a big chunk of your C library is tied to your OS. So it's not clear that using an existing one is the right thing.

Newlib is nice because there aren't many system calls to support... But it is missing a lot (crt0, pthread, TLS, ...), which is what I am focusing on for now. Eventually I suspect it will make less and less sense to keep newlib around.
Yes that's a good point that libc is closely tied to the kernel.

Not a huge fan of GNU but I'm still trying to be POSIX-like, or "single unix like"?

One thing I'm sure I won't be able to do by myself is the user space apps.

The toolchain combo I built is over 1GB, running on top of the kernel which is just over 1MB.

What's more, this is just the first "real" user space app. There probably need to be 1000 more to make the OS usable in the typical sense.

Spend 1 million times the time spent on the kernel to implement the user space? A thought that I cringe at for sure :(

Reasoning down this path, I should probably bite the bullet someday and port glibc rather than try to extend newlib or write one myself and, in the process, implement more "POSIX compliance bugs".

But if you aren't aiming for POSIX I guess it makes a lot more sense to build a specific libc.
vvaltchev
Member
Posts: 274
Joined: Fri May 11, 2018 6:51 am

Re: [solved]newlib print issue?

Post by vvaltchev »

xeyes wrote:Just curious how many different (in the sense that fstat and stat can share a lot of implementation so they aren't different) syscalls did you implement to get musl to be usable (not necessarily 100% functional but not throwing errors and warnings left and right either)?
Good question! Well, it really depends on the complexity of the program that I had to run. Let's agree that being able to run the "hello world" program without any issues is the bare minimum. To run something more complex like the ASH shell, I had to implement many more syscalls, obviously.
So, back to the "hello world" program, I just traced it with my syscall tracer, and I'm copy-pasting what happens after the fork(), in the child process:

Code: Select all

00037.162 [0043] CALL gettid() -> 43
00037.162 [0043] CALL rt_sigprocmask(how: 2, set: 0xbffffbe0, oldset: NULL, sigsetsize: 8) -> 0
00037.162 [0043] CALL rt_sigaction(signum: 2, act: 0xbffffb18, oldact: NULL, sigsetsize: 8) -> 0
00037.162 [0043] CALL rt_sigaction(signum: 15, act: 0xbffffb18, oldact: NULL, sigsetsize: 8) -> 0
00037.162 [0043] CALL rt_sigaction(signum: 3, act: 0xbffffb18, oldact: NULL, sigsetsize: 8) -> 0
00037.162 [0043] ENTER execve(filename: "/initrd/hello", argv: 0x080a9268, envp: 0x080a9274)
00037.162 [0043] CALL set_thread_area(u_info: 0xbffffd58) -> 0
00037.162 [0043] CALL set_tid_address(tidptr: 0x0804a588) -> 0x0000002b
00037.162 [0043] CALL ioctl(fd: 1, request: 0x00005413, argp: 0xbffffe18) -> 0
00037.162 [0043] ENTER writev(fd: 1, iov: (struct iovec[2]) {
   {base: "hello world", len: 11}, 
   {base: "\n", len: 1}
}, iovcnt: 2)
00037.162 [0043] EXIT writev(fd: 1) -> 12
00037.162 [0043] ENTER exit_group(status: 0)
The bare-minimum is:
- fake support for signals with the latest rt_* interface
- gettid()
- execve(), of course
- fork() and wait4(), for the parent process
- set_thread_area() [full support]
- set_tid_address() [a fake impl. is enough]
- full support for vectored I/O (readv, writev)
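
For reference, the writev() call in the trace above corresponds roughly to user code like the following. This is only a sketch of what the C library's output path boils down to, not libmusl's actual source; the write_line name is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Write a line as two segments (text + newline) in a single writev() call.
 * Returns the total number of bytes written, or -1 on error. */
static ssize_t write_line(int fd, const char *text)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)text;
    iov[0].iov_len  = strlen(text);
    iov[1].iov_base = (void *)"\n";
    iov[1].iov_len  = 1;

    return writev(fd, iov, 2);
}
```

Calling write_line(1, "hello world") produces the same two-element iovec as in the trace, and a successful call returns 12.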

Of course, with that you cannot do much more than print hello world. You cannot use malloc().
For that, you need to implement not only sys_brk() but also sys_mmap_pgoff(), with all the problems that come with it, such as supporting un-mapping pages in the middle of a mapped range, finding an available `vaddr` for the mapping in user space, etc.
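
To illustrate the "hole in the middle" case, here is a minimal sketch of the range arithmetic only, assuming half-open page-aligned ranges. The vrange type and vrange_unmap() are hypothetical names for this example; a real kernel also has to fix up page tables and its own mapping list.

```c
#include <stdint.h>

struct vrange { uintptr_t start, end; };   /* half-open: [start, end) */

/* Remove [cut.start, cut.end) from map and store what is left in out[].
 * Returns the number of remaining pieces: 0, 1, or 2 (hole in the middle). */
static int vrange_unmap(struct vrange map, struct vrange cut, struct vrange out[2])
{
    int n = 0;

    if (cut.end <= map.start || cut.start >= map.end) {
        out[n++] = map;                    /* no overlap: mapping is untouched */
        return n;
    }
    if (cut.start > map.start) {           /* left piece survives */
        out[n].start = map.start;
        out[n].end   = cut.start;
        n++;
    }
    if (cut.end < map.end) {               /* right piece survives */
        out[n].start = cut.end;
        out[n].end   = map.end;
        n++;
    }
    return n;
}
```

Unmapping 0x2000-0x3000 out of a mapping at 0x1000-0x5000 leaves two mappings, which is why a single munmap() can increase the number of entries the kernel has to track.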

Also, while with older versions of libmusl implementing sys_clock_gettime() was enough, at some point in late 2018 or 2019 (don't remember) the Linux kernel guys started working on new interfaces for i686 to avoid the Y2038 problem. Therefore, libmusl was quickly updated to always use the new interfaces, which use 64-bit integers for time_t on i686 systems, renaming the old ones to sys_clock_gettime32 and similar. That's not particularly complex to support, but it still requires some extra effort if you want to support both the old and the new interface.
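
One way to support both is to keep a single 64-bit implementation inside the kernel and make the legacy syscall a thin wrapper that narrows the result. The sketch below shows only that narrowing step. The ts32/ts64 struct names are hypothetical, and failing with -EOVERFLOW instead of truncating is a design choice of this example, not something taken from Linux or Tilck.

```c
#include <errno.h>
#include <stdint.h>

struct ts32 { int32_t tv_sec; int32_t tv_nsec; };   /* legacy i686 layout */
struct ts64 { int64_t tv_sec; int64_t tv_nsec; };   /* 64-bit time layout */

/* Narrow a 64-bit timestamp for the legacy interface.
 * Returns 0 on success, -EOVERFLOW if tv_sec does not fit in 32 bits. */
static int ts64_to_ts32(const struct ts64 *in, struct ts32 *out)
{
    if (in->tv_sec > INT32_MAX || in->tv_sec < INT32_MIN)
        return -EOVERFLOW;

    out->tv_sec  = (int32_t)in->tv_sec;
    out->tv_nsec = (int32_t)in->tv_nsec;
    return 0;
}
```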

In conclusion, libmusl almost always uses the latest (better) syscalls that the Linux kernel has to offer. That's GREAT for serious software and that's why I stuck with libmusl, paying that price. BUT, it's not "cheap" to support. Supporting just read() and write() instead of scatter/gather I/O would have been simpler, but I couldn't do that because libmusl uses the most efficient and modern interface. It's a trade-off.

If I had to compare supporting libmusl vs. dietlibc, I'd say that while Tilck was already able to run several simple programs with dietlibc, the switch to libmusl required 1-2 weeks of work to catch up, without considering the corner cases I missed which I had to fix later.
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck