implementing a linux-based os from scratch, how to start?

bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: implementing a linux-based os from scratch, how to start

Post by bzt »

mmdmine wrote:This is what I wanted to hear. I don't need a library identical to glibc and the others; e.g. I can have memory_allocate() instead of malloc(), or a different string library. The only problem is that it won't be compatible with existing software, but that's not my goal,
Not that easy. That's why I recommended forking musl and tailoring its interface to your needs. There are functions which are required by the C standard. Some compilers, like gcc, can provide built-ins for those; others, like Clang, demand a user-space implementation.
mmdmine wrote:instead I can allow the user to have a 3rd party C library so that they can install that software.
Not really. With your final kernel, you'll have to provide your own C library which looks like any other C library from above, but uses your kernel interface under the hood (or alternatively you could use the same kernel API as Linux does, and then you could allow the user to freely choose musl / glibc etc.). Only libcs with exactly the same kernel interface are interchangeable.

The idea behind libc is to provide a standardized interface to user-space programs so that they don't have to worry about the actual kernel interface. For applications, printf() is printf(), regardless of whether the underlying kernel interface uses int 0x80 or syscall, or whether the console is implemented as a pipe write or something else. For example, you can't use musl with a vanilla BSD kernel either, because its kernel interface differs from Linux's (however, an installable compatibility layer exists). Of course, libc also has many low-level routines that don't use the kernel but are extremely useful, like qsort or strlen (those are the easiest to port). You could choose not to implement those in your libc, but then writing applications for your user space would be problematic.

So to summarize, there are three kinds of functions in a libc:
1. required by the C standard (like memset or the C runtime _start)
2. ones that hide the kernel interface behind a standardized API (like printf, malloc)
3. convenient and useful functions (like bsearch or atoi)

You can't avoid 1., you must implement those without modifying their API (except if you can add a new ABI to the compilers or if you compile ALL user-space applications in freestanding mode, not a good idea).
You might redefine 2., but you'll still be tied to a specific kernel interface implementation with those. It is certainly doable, though: Windows, for example, does not have opendir() but has FindFirstFileA() / FindNextFileA().
You can omit 3., but life is going to be miserable without those (but feel free to avoid or redefine those).
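
To make the three kinds concrete, here is a minimal sketch, assuming a Linux host so that kind 2 can be shown with the generic syscall() wrapper. The demo_ names are hypothetical and were chosen so they don't clash with a real libc; this is not how musl or glibc actually implement these functions.

```c
#define _GNU_SOURCE      /* makes syscall() visible under strict -std modes */
#include <stddef.h>      /* size_t */
#include <sys/syscall.h> /* SYS_write (Linux-specific assumption) */
#include <unistd.h>      /* syscall() */

/* Kind 1: required by the C standard and expected by the compiler.
 * In a real libc this would be called memset and keep this prototype. */
void *demo_memset(void *dst, int value, size_t count)
{
    unsigned char *p = dst;
    while (count--)
        *p++ = (unsigned char)value;
    return dst;
}

/* Kind 3: pure convenience, no kernel involved at all. */
size_t demo_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Kind 2: hides the kernel interface behind a stable API.
 * Only this function would change if the kernel interface changed.
 * Returns the number of bytes written to file descriptor 1, or -1. */
long demo_put_string(const char *s)
{
    return syscall(SYS_write, 1, s, demo_strlen(s));
}
```

An application that only calls demo_put_string() never sees SYS_write, which is the whole point of kind 2: port the libc to another kernel and the application source stays the same.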

NOTE: for the functions in 2. I deliberately used the phrase "standardized API" and not "POSIX", because it could be another standard as well. Windows has its own standard for example, calling malloc MemAlloc. EFI for example has another, where memcmp is called CompareMem. Except for the functions defined by the C standard, your OS (or Linux user space) may define its own standard.

Cheers,
bzt
mmdmine
Member
Posts: 47
Joined: Sat Dec 28, 2019 5:19 am
Location: Iran

Post by mmdmine »

@bzt thank you. looks right. surely I should add standards to have a working C Runtime.
iocoder
Member
Posts: 208
Joined: Sun Oct 18, 2009 5:47 pm
Libera.chat IRC: iocoder
Location: Alexandria, Egypt | Ottawa, Canada

Post by iocoder »

bzt wrote: You can't avoid 1., you must implement those without modifying their API (except if you can add a new ABI to the compilers or if you compile ALL user-space applications in freestanding mode, not a good idea).
You might redefine 2., but you'll still be tied to a specific kernel interface implementation with those. It is certainly doable, though: Windows, for example, does not have opendir() but has FindFirstFileA() / FindNextFileA().
You can omit 3., but life is going to be miserable without those (but feel free to avoid or redefine those).
Not at all. It all depends on how you want the end users to develop software for your operating system. You can have a development environment that is based on BASIC... You can have another one that is a derivative of the C language (same syntax, different APIs), or you can choose to support ANSI C, for instance.

Not every operating system has to ship with C support.

Concerning the "miserable life" part you've mentioned, the quality of my life (on physiological and psychological levels) has become much better since I tried to get out of the C prison and UNIX prison. The world has a lot of fancy stuff to explore. I am not an opponent of C, but I support FREE MIND.
bzt wrote: NOTE: for the functions in 2. I deliberately used the phrase "standardized API" and not "POSIX", because it could be another standard as well. Windows has its own standard for example, calling malloc MemAlloc. EFI for example has another, where memcmp is called CompareMem. Except for the functions defined by the C standard, your OS (or Linux user space) may define its own standard.
This will not hide the fact that C is closely tied to UNIX' philosophy :twisted:
mmdmine wrote: surely I should add standards to have a working C Runtime.
No. Neither is it a "should" rule nor is it a "must". It depends on whether you want to support the "standard" C language or NOT.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

iocoder wrote:You can have a development environment that is based on BASIC...
...then you don't want to use libc. Simple as that ;-)
iocoder wrote:Concerning the "miserable life" part you've mentioned, the quality of my life (on physiological and psychological levels) has become much better since I tried to get out of the C prison and UNIX prison. The world has a lot of fancy stuff to explore. I am not an opponent of C, but I support FREE MIND.
You misunderstood. Not having a function that converts strings into integers, or one that returns the length of a string, will without a doubt make developers' lives more difficult.
iocoder wrote:This will not hide the fact that C is closely tied to UNIX' philosophy :twisted:
Never said it isn't. What I said was you can compile native applications with C under Windows which is not UNIX, so it's doable.
iocoder wrote:
mmdmine wrote: surely I should add standards to have a working C Runtime.
No. Neither is it a "should" rule nor is it a "must". It depends on whether you want to support the "standard" C language or NOT.
Exactly. The OP was talking about libc, which implies C.

Cheers,
bzt
mmdmine
Member
Posts: 47
Joined: Sat Dec 28, 2019 5:19 am
Location: Iran

Post by mmdmine »

I called it a C library by mistake. As I mentioned before, my design has three layers: first a library that calls low-level kernel functions (say, a Kernel Library), then an API on top of it to make life easier (I would name it Pacify or something else, I'm still thinking about it), and on top of this API my applications will run. I also said that I can allow the user to install 3rd party C libraries to port existing C applications to my OS.
With this design, do I still need to define a new ABI or write the user space in freestanding mode? I found some linker scripts in my /usr/lib/ directory (it looks like they were installed with llvm); is adding a new ABI just a matter of writing a new linker script like those?
Writing a new C library would be pointless work because they have already been written. We even have multiple implementations.
Octocontrabass
Member
Posts: 5578
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

bzt wrote:There are functions which are required by the C standard. Some compilers, like gcc can provide built-ins for those, others, like Clang demand a user-space implementation.
Both compilers demand userspace implementations of the standard C library when using builtin functions. The builtins are optimized versions that only work in specific circumstances, and both compilers will generate ordinary library calls when those circumstances do not apply. (There is also no guarantee that the builtin version will generate a call to the same library function. For example, __builtin_printf() may emit a call to puts() if the arguments allow it.)
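
The two cases can be sketched as follows. The function names are hypothetical, and whether the first call is actually folded depends on the compiler version and optimization level, so treat this as an illustration rather than a guarantee.

```c
#include <stddef.h> /* size_t */

/* The argument is a string literal, so the length is known at compile
 * time and the compiler can usually replace the call with a constant. */
size_t length_of_literal(void)
{
    return __builtin_strlen("hello");
}

/* The argument is only known at run time. The builtin cannot be folded
 * here, so the compiler typically falls back to an ordinary call to
 * strlen(), which then has to exist somewhere at link time. */
size_t length_of_runtime_string(const char *s)
{
    return __builtin_strlen(s);
}
```

Compiling this with `-O2 -S` and looking for a `strlen` reference in the assembly is one way to see which case applied. On a hosted system both functions link fine because libc supplies strlen(); without a libc, the second one is where an undefined reference would be expected to show up.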
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

Octocontrabass wrote:The builtins are optimized versions that only work in specific circumstances, and both compilers will generate ordinary library calls when those circumstances do not apply.
Not for GCC, read this
With the exception of built-ins that have library equivalents such as the standard C library functions discussed below, or that expand to library calls, GCC built-in functions are always expanded inline and thus do not have corresponding entry points and their address cannot be obtained.

GCC includes built-in versions of many of the functions in the standard C library.
So it is not only the calls that are generated, but the function implementations too if there's no corresponding library (unless you also pass the "-fno-builtin" command line flag). Give it a try! Call strcmp in a minimal piece of C code, then compile it with Clang as well as with GCC (without libc). Clang will complain about a referenced but unimplemented function, while GCC will inline the implementation into the final executable automatically.
mmdmine wrote:And also I said I can allow user to install a 3rd party C Libraries to port existing C Applications to my OS.
Yes, but there's no known C library that depends on any kernel API library. They are usually stand alone libraries, with "hardcoded" kernel calls (simply put). If you separate your kernel API into a separate library, then you'll need a C library that relies on that. Your users won't be able to install any 3rd party C libraries just as-is (except when you use exactly the same kernel API interface those libraries expect in the "hardcoded" calls, but in that case there's no point in a separate kernel API library).
mmdmine wrote:By this, still I need to define a new ABI or write user space in freestanding?
OK, a little explanation is in order here:

ABI: the way a function is called. Are the arguments passed in registers? And if so, which ones? If it is a kernel call, is it made via int XX or via syscall? In both cases, how do you pass the function number to it, and in which register? And things like that. Because it's a low-level spec, an ABI is always architecture dependent. In order to create a new ABI, you'll have to change the compiler (there's simply no other way). To support multiple ABIs, compilers tend to use an intermediate representation (or IR in short). They compile the source to that, and then they convert that IR into a specific architecture's ABI in the final step. GCC for example uses several IRs, like GENERIC, GIMPLE and RTL.

API: the way your functions are named and which arguments they require. This is an abstract definition, independent of the architecture. For example strlen() will expect a string as its argument; that's the API, regardless of whether that argument is passed in the rdx, rcx or rsi register, or on the stack (that's the ABI part). To change the API, you just need to write a new library. Also note that the API is language independent: let's assume you have a library for strlen(); it will expect a string address regardless of whether you call it from C or Pascal (however, Pascal does not use the zero-terminated string representation, so you might get into trouble calling it directly from Pascal).
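
The Pascal caveat is about data representation rather than naming. Here is a minimal sketch of the conversion a caller would need, assuming the classic length-prefixed short string layout (one length byte followed by up to 255 characters, no terminator). The struct and function names are hypothetical.

```c
#include <stddef.h> /* size_t */
#include <string.h> /* memcpy */

/* Assumed layout of a classic Pascal short string. */
struct pascal_string {
    unsigned char length; /* number of valid characters in data */
    char data[255];       /* not zero-terminated */
};

/* Copies a Pascal-style string into a zero-terminated C buffer so that
 * functions like strlen() can safely be called on it.
 * Returns 0 on success, -1 if the output buffer is too small. */
int pascal_to_c(const struct pascal_string *in, char *out, size_t out_size)
{
    if (out_size < (size_t)in->length + 1)
        return -1;
    memcpy(out, in->data, in->length);
    out[in->length] = '\0';
    return 0;
}
```

Passing `in->data` straight to strlen() would read past the valid characters until it happened to hit a zero byte, which is the kind of trouble meant above.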

Freestanding: in theory your code will not generate any calls to libc functions unless you also implement those functions. I said in theory, because there are a few exceptions, like the aforementioned memset (which you must provide, or with GCC, it can provide that for you), but otherwise you are limited to the functions that you implement. In contrast, in hosted mode you can use functions which will be linked (either statically or dynamically) later, and some linking defaults are assumed (that's why you don't need a linker script in hosted mode, and why you'll sooner or later need one in freestanding mode). Now for the functions that the compiler generates calls to (like memset), you can't change the API unless you also change the ABI. For all the other functions, the API is up to you in freestanding mode. For hosted mode, you'll have to create your own "target" (which includes your own linker script for user-space programs among other things).
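
For GCC specifically, the manual states that even a freestanding environment must provide memcpy, memmove, memset and memcmp, because the compiler may emit calls to them on its own. A minimal, unoptimized sketch of those four follows. The fs_ prefix is only there so the sketch can be compiled and tested on a hosted system without clashing with libc; in a real freestanding build they would carry the standard names and prototypes.

```c
#include <stddef.h> /* size_t, available in freestanding mode */
#include <stdint.h> /* uintptr_t, available in freestanding mode */

void *fs_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Like memcpy, but safe when the two regions overlap. */
void *fs_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)          /* copy forwards */
            *d++ = *s++;
    } else {
        while (n--)          /* copy backwards */
            d[n] = s[n];
    }
    return dst;
}

void *fs_memset(void *dst, int value, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)value;
    return dst;
}

/* Returns <0, 0 or >0 like the standard memcmp. */
int fs_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *x = a;
    const unsigned char *y = b;
    for (size_t i = 0; i < n; i++) {
        if (x[i] != y[i])
            return x[i] < y[i] ? -1 : 1;
    }
    return 0;
}
```

Because the compiler itself relies on the exact prototypes, these are the functions whose API cannot be redesigned, whatever the rest of the library looks like.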

Cheers,
bzt
mmdmine
Member
Posts: 47
Joined: Sat Dec 28, 2019 5:19 am
Location: Iran

Post by mmdmine »

I already know what they are.
You said "you must implement those without modifying their API (except if you can add a new ABI to the compilers or if you compile ALL user-space applications in freestanding mode, not a good idea)."
Those linker scripts were named in "arch-os-format" style, like "aarch64-gnu-elf64", so I thought they were ABI definitions.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

mmdmine wrote:Those linker scripts were named in "arch-os-format" style, like "aarch64-gnu-elf64", so I thought they were ABI definitions.
It is a bit more than that, but defining the ABI is definitely part of it. Some formats (the last part of the triplet) specify which ABIs you can choose from, and some formats (like elf) allow multiple ABIs (see elf header byte 7, e_ident[EI_OSABI]). But this does not mean a new ABI, just selecting one from the list of ABIs available in the compiler. (Which is okay; I wouldn't recommend creating a new ABI in your first iteration, use the well-known SysV ABI.) Selecting the ABI has nothing to do with the default set of functions (as that's the API part), so you'll be able to create your own library with your own functions no matter which one you choose. The second part of the triplet is supposed to refer to the OS, and with that, which API to use; however, this is just informal (especially for the POSIX compatible systems, which share the same API anyway) and not as strict as the arch and format parts.
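
As an illustration of where that byte lives, here is a minimal sketch that reads the OS/ABI field from the first 16 bytes of an ELF file. The DEMO_ constants and the function name are hypothetical; on Linux the same offsets are available from <elf.h> as EI_NIDENT and EI_OSABI.

```c
#include <stddef.h> /* size_t */

enum {
    DEMO_EI_NIDENT = 16, /* size of the e_ident array */
    DEMO_EI_OSABI  = 7   /* index of the OS/ABI byte within e_ident */
};

/* Returns the OS/ABI byte of an ELF identification block, or -1 if the
 * buffer is too short or does not start with the ELF magic number. */
int demo_elf_osabi(const unsigned char *ident, size_t len)
{
    if (len < DEMO_EI_NIDENT)
        return -1;
    if (ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L'  || ident[3] != 'F')
        return -1;
    return ident[DEMO_EI_OSABI];
}
```

A value of 0 there means the generic System V ABI, which is what most toolchains emit by default.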
(Note that the triplet is just a name; it is the linker script of the same name that actually defines anything. Theoretically you could name a triplet -elf64 and define the PE output format in the script, and that would work, but never do that. You can also override the default libraries from the linker script, as well as the statically linked run-time part that provides _start, which is often called crt0. See INPUT, OUTPUT_ARCH, OUTPUT_FORMAT, STARTUP, ENTRY etc. Normally gcc generates that script for you using shell scripts and other configuration files (read more), but you can write that script by hand if you want to.)

Cheers,
bzt
Octocontrabass
Member
Posts: 5578
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

bzt wrote:Not for GCC, read this
With the exception of built-ins that have library equivalents such as the standard C library functions discussed below, or that expand to library calls, GCC built-in functions are always expanded inline and thus do not have corresponding entry points and their address cannot be obtained.

GCC includes built-in versions of many of the functions in the standard C library.
Hold on a minute. You stopped before the important part.
Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted.
In other words, you must still provide a standard C library even with GCC's built-in functions.

You suggested I try it with strcmp, so here are my results.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

Octocontrabass wrote:
Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted.
In other words, you must still provide a standard C library even with GCC's built-in functions.
Most definitely not. It just says it won't inline the function; however that quote tells absolutely nothing about where the function implementation is located at all. It could add exactly the same code that would be inlined as a separated function and emit a call to that. The compiler knows everything in order to do that.

I had remembered strcmp, but I actually used memcmp in my real code. It doesn't really matter, as both are libc functions and both are declared in string.h. So this code simply uses

Code: Select all

#define memcmp __builtin_memcmp
It is built with "-ffreestanding -nostdlib" and without any library specified (so no C library is provided), yet gcc compiles it without errors.

I've recompiled it with the latest gcc too just to be sure, and injecting built-in libc functions still works in gcc, see for yourself. On the other hand, Clang requires a function implementation, it won't compile the source without that hence the ifdef guard.

I'd like to point out that this is not the general case; only a few libc functions are treated this way, which is why I wrote that you can't change the prototypes of those functions. But with "-fno-builtin" there'll be neither inlining nor calls to those, so you can redefine everything (which is the default for Clang).

Cheers,
bzt
Octocontrabass
Member
Posts: 5578
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

bzt wrote:I had remembered strcmp, but I actually used memcmp in my real code. It doesn't really matter, as both are libc functions and both are declared in string.h. So this code simply uses

Code: Select all

#define memcmp __builtin_memcmp
It is built with "-ffreestanding -nostdlib" and without any library specified (so no C library is provided), yet gcc compiles it without errors.
Okay, here's an example using __builtin_memcmp() instead of strcmp.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

But if you have an example which does not work, how does that prove anything? If there's at least one example that does not need libc and works, then we have a PoC and we can conclude this feature exists.

Think about this: let's assume there's a supposed security hole. There are 100 exploits that don't work, and there's one which does. Would those 100 non-working examples prove that the security hole is not exploitable, when there's at least one PoC that can exploit it? Does it make sense to you now?

Cheers,
bzt
Octocontrabass
Member
Posts: 5578
Joined: Mon Mar 25, 2013 7:01 pm

Post by Octocontrabass »

bzt wrote:But if you have an example which does not work, how does that prove anything?
Here are some things you've claimed:
bzt wrote:There are functions which are required by the C standard. Some compilers, like gcc can provide built-ins for those, others, like Clang demand a user-space implementation.
bzt wrote:Clang will complain about referenced but unimplemented function, while GCC will inline the implementation into the final executable automatically.
bzt wrote:It just says it won't inline the function; however that quote tells absolutely nothing about where the function implementation is located at all. It could add exactly the same code that would be inlined as a separated function and emit a call to that.
Now, perhaps I'm just misreading things, but it sounds like you're claiming that GCC will always provide a built-in implementation for whichever function. My example shows that this is not true: the built-in version is only an optimization, and you must provide a proper implementation for the situations where the optimization does not apply.

Speaking of optimizations, try changing -O2 to -Os and see what happens.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Post by bzt »

Octocontrabass wrote:Now, perhaps I'm just misreading things
Probably.
Octocontrabass wrote:but it sounds like you're claiming that GCC will always provide a built-in implementation
Nope, my point was that for some libc functions gcc can inject the code, therefore there are functions for which you can't change the API. So POSIX or not, some libc functions (as well as their prototypes) are hardwired into the compiler. The emphasis is on defining the API, as the OP was interested in implementing a C library from scratch (which later turned out to be a not-so-much-C library :-)).

(FYI, it looks like Clang can be compiled to support builtins the same way as gcc does, using so called VARIANTs, read the comment on this test case for example. I haven't tried this feature yet, I just found it interesting. And Clang is also known to emit calls to undefined or unwanted functions, read this for example, although I think bcmp is not POSIX but BSD libc standard if I'm not mistaken.)

Cheers,
bzt