Newbie questions

Posted: Sun Sep 16, 2018 7:15 am
by Thpertic
I'm a total newbie. I've followed Bare Bones and I kind of studied Meaty Skeleton. Now I want to create my own, but I have a few questions. To compile libk.a, do I just have to add the -lk flag to the gcc command? If I make my own stdio library that differs from the original, will it fail to work with other pre-built applications? How are the .d files created? Thanks all.

Re: Newbie questions

Posted: Sun Sep 16, 2018 7:55 am
by TheCool1Kevin
Thpertic wrote:I'm a total newbie.
Welcome.
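Thpertic wrote:To compile libk.a, do I just have to add the -lk flag to the gcc command?
Not quite. -lk doesn't build anything: it tells GCC, acting as the linker driver, to link against an existing libk.a (found via -L or the default library search path). The archive itself is built from the library's object files with ar (for example "ar rcs libk.a *.o").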
Thpertic wrote:If I make my own stdio library that differs from the original, will it fail to work with other pre-built applications?
Correct, it will not work with most applications; how badly it breaks depends on how far you deviate from the standard library (stdlib).
Many applications also rely on Linux system calls (ioctl etc) that aren't implemented in off-the-shelf portable stdlibs (Newlib). It's best not to worry too much about the standard user library while developing the basic kernel.
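Thpertic wrote:How are the .d files created?
They are Makefile dependency files. GCC writes one alongside each object file when you compile with -MD (or -MMD), listing the headers that the source file includes; the Makefile then includes the .d files so that editing a header rebuilds every object that depends on it.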

Re: Newbie questions

Posted: Sun Sep 16, 2018 11:51 am
by Thpertic
So your advice is to keep working on my versions of the libraries and later think about the actual compatibility?

Re: Newbie questions

Posted: Sun Sep 16, 2018 12:02 pm
by Brendan
Hi,
Thpertic wrote:So your advice is to keep working on my versions of the libraries and later think about the actual compatibility?
For kernel; there's no need for any standard library - you're able to create functions that are actually good instead of blindly regurgitating standard library functions that have been crippled by historical baggage for decades.

For user-space; if you care about compatibility then you have to provide compatibility.

To understand what I mean here, consider "printf()". In user-space this would just use "fprintf(stdout, ..", which would probably use lower level functions (e.g. maybe "sprintf()" then "fwrite()"), which would use lower level functions ("write()"). In a kernel "write()" doesn't make any sense so the way "printf()" is implemented for user-space doesn't make any sense in the kernel. To work around that (if you had the same code for kernel and user-space) you'd have to make sure you never use "printf()" in the kernel and always use something else (e.g. "snprintf()").

However, for all of the formatted output functions, using run-time generated format strings (e.g. like "printf(myString);") is a potential security disaster, and if the format string is constant it's more efficient to explicitly break it up into smaller pieces (so that you avoid the unnecessary overhead of having to parse the format string, search for escapes, etc at run-time while also avoiding "variable args" nonsense); so you can just implement "add_string_to_kernel_log()", "add_integer_to_kernel_log()" and "flush_kernel_log()" style functions that are significantly easier to implement anyway.
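
A minimal sketch of the "add_string / add_integer / flush" style described above, in C. Only the three function names come from the post; the fixed 256-byte line buffer, the signatures and the "sink" callback are assumptions made for illustration, and the single shared buffer assumes early, single-threaded boot code.

```c
#include <stddef.h>
#include <stdint.h>

#define KLOG_LINE_MAX 256          /* hypothetical per-message limit */

static char   klog_line[KLOG_LINE_MAX];
static size_t klog_len;

/* Append a NUL-terminated string, truncating silently if the line is full. */
void add_string_to_kernel_log(const char *s)
{
    while (*s != '\0' && klog_len < KLOG_LINE_MAX - 1)
        klog_line[klog_len++] = *s++;
    klog_line[klog_len] = '\0';
}

/* Append a signed integer in decimal. */
void add_integer_to_kernel_log(int64_t value)
{
    char digits[20];               /* 2^63 has 19 decimal digits */
    size_t n = 0;
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t magnitude = value < 0 ? 0 - (uint64_t)value : (uint64_t)value;

    if (value < 0 && klog_len < KLOG_LINE_MAX - 1)
        klog_line[klog_len++] = '-';
    do {                           /* digits come out least significant first */
        digits[n++] = (char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);
    while (n > 0 && klog_len < KLOG_LINE_MAX - 1)
        klog_line[klog_len++] = digits[--n];
    klog_line[klog_len] = '\0';
}

/* Hand the finished line to an output hook (assumed interface; may be
 * NULL) and start a new, empty line. */
void flush_kernel_log(void (*sink)(const char *line, size_t len))
{
    if (sink != NULL)
        sink(klog_line, klog_len);
    klog_len = 0;
    klog_line[0] = '\0';
}
```

Building a message such as "Found 3 devices" is then two add_string calls around one add_integer call, followed by a flush; there is no format string to parse and no variable argument list.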


Cheers,

Brendan

Re: Newbie questions

Posted: Sun Sep 16, 2018 1:23 pm
by Thpertic
So you're saying that I should add the user-space libraries after I've made the kernel, right? Sorry, but I followed Meaty Skeleton and there are some standard libraries in it.

Re: Newbie questions

Posted: Sun Sep 16, 2018 1:42 pm
by Brendan
Hi,
Thpertic wrote:So you're saying that I should add the user-space libraries after I've made the kernel, right?
In my opinion, yes.
Thpertic wrote:Sorry, but I followed Meaty Skeleton and there are some standard libraries in it.
There's an extremely tiny number of C standard library functions that:
  • don't depend on an already existing kernel
  • don't use "errno"
  • do provide functionality that is likely to actually be needed by a kernel
  • have a signature (inputs and outputs) that is hard to improve
It'd make sense to implement these in the kernel so that they match the C standard library functions (same name, same behaviour, same signature); even if the only reason for making it match is familiarity (so people know what the function does). Of course they also wouldn't need to be in a library - they could just be in a small "misc.c" file no different to any of the other source code.
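
As a rough illustration, functions such as memcpy(), memset(), memcmp() and strlen() plausibly fit those criteria (that selection is an assumption, not a list given in the post). The sketch below shows what such a "misc.c" might contain. The "k" prefix is there only so the code can be compiled and tested on a hosted system without clashing with the host libc; in a kernel built with -ffreestanding you would likely drop it and keep the standard names and signatures.

```c
#include <stddef.h>

/* Copy n bytes; the regions must not overlap (same contract as memcpy). */
void *kmemcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Fill n bytes with the low 8 bits of c (same contract as memset). */
void *kmemset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}

/* Compare n bytes as unsigned chars; returns <0, 0 or >0 like memcmp. */
int kmemcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *x = a, *y = b;
    for (; n--; x++, y++)
        if (*x != *y)
            return *x < *y ? -1 : 1;
    return 0;
}

/* Length of a NUL-terminated string, not counting the terminator. */
size_t kstrlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

None of these touch "errno" or need any service from an existing kernel, which is why they are easy to carry into one.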


Cheers,

Brendan

Re: Newbie questions

Posted: Sun Sep 16, 2018 11:48 pm
by glauxosdever
Hi,


libk is a stripped-down build of libc, used for the kernel. sortie, who is the author and maintainer of Meaty Skeleton, designed it this way in order to avoid code duplication and to use familiar functions in the kernel. It's probably derived from his own OS, Sortix, where he goes to great lengths to supply as many libc functions as possible in the kernel (even malloc() and most of the stdio.h stuff, see FREEOBJS in the Makefile from Sortix libc). If you just want to do POSIX, it's indeed feasible. But if you want to do something different, you will probably want different functions in the kernel than in libc, and possibly not even have a libc if you don't care about compatibility with programs not written specifically for your OS.

Hope this helps. :-)


Regards,
glauxosdever

Re: Newbie questions

Posted: Mon Sep 17, 2018 5:25 am
by Antti
Brendan wrote:However, for all of the formatted output functions, using run-time generated format strings (e.g. like "printf(myString);") is a potential security disaster, and if the format string is constant it's more efficient to explicitly break it up into smaller pieces (so that you avoid the unnecessary overhead of having to parse the format string, search for escapes, etc at run-time while also avoiding "variable args" nonsense); so you can just implement "add_string_to_kernel_log()", "add_integer_to_kernel_log()" and "flush_kernel_log()" style functions that are significantly easier to implement anyway.
As a general comment, I used those kinds of procedures ("add_string", "add_integer", "flush") in my boot loader and it was very awkward to build output strings manually. Things would change if a programming language (or macros) did this automatically, e.g. if a programmer-friendly representation of formatted strings got expanded into a set of those simple calls. Then the "hard to work with" argument would not matter that much, although the overall code size is likely to increase compared to a simple printf-like implementation. That depends on calling conventions, though, and perhaps those conventions could be optimized enough to make this argument false too.
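
One way macros can do part of this in plain C11 is a _Generic selection that picks the helper from the argument's static type. This is only a sketch of the idea: the "strbuf" type, the ADD macro and the helper signatures are invented for illustration and are not code from anyone's boot loader.

```c
#include <stddef.h>

/* Hypothetical local string builder. */
struct strbuf {
    char   data[128];
    size_t len;
};

static void add_string(struct strbuf *b, const char *s)
{
    while (*s != '\0' && b->len < sizeof b->data - 1)
        b->data[b->len++] = *s++;
    b->data[b->len] = '\0';
}

static void add_integer(struct strbuf *b, long v)
{
    char tmp[24];
    size_t n = 0;
    /* Negate in unsigned arithmetic so LONG_MIN does not overflow. */
    unsigned long m = v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;

    if (v < 0)
        add_string(b, "-");
    do {
        tmp[n++] = (char)('0' + m % 10);
        m /= 10;
    } while (m != 0);
    while (n > 0 && b->len < sizeof b->data - 1)
        b->data[b->len++] = tmp[--n];
    b->data[b->len] = '\0';
}

/* Choose the helper at compile time from the type of x (C11 _Generic).
 * No format string is parsed at run time. */
#define ADD(b, x) _Generic((x),        \
        char *:       add_string,      \
        const char *: add_string,      \
        int:          add_integer,     \
        long:         add_integer)((b), (x))
```

With this, ADD(&b, "IRQ "); ADD(&b, 5); expands to one add_string call and one add_integer call. It removes the need to pick the right function by hand, but each piece is still a separate statement, so it only partly addresses the awkwardness described above.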

Re: Newbie questions

Posted: Mon Sep 17, 2018 7:38 am
by Thpertic
I want compatibility. Is your advice to continue using libk, or to use other functions in the kernel (despite the code duplication) and then add libc? Sorry, but I couldn't understand...

Re: Newbie questions

Posted: Mon Sep 17, 2018 8:03 am
by glauxosdever
Hi,


Sorry if I wasn't clear. I mean, you can already start implementing a libc that also comes as libk, if you want (sortie has done it in Sortix and recommends it in Meaty Skeleton). However, you may or may not want to take that as far as sortie has. It all boils down to choice, and you will see later what works best for you. Do you want to have malloc() in the kernel? Put that in libk too (possibly with a different implementation). Do you want to have a different function instead of malloc(), possibly one that takes more parameters? Then don't put malloc() in libk.

But since you are taking a well-documented route, that is doing a UNIX-like system as far as I understand, since you want compatibility, you might want to listen to one of the masters of doing a UNIX-like system. So, yes, do a libc already.

Hope this helps. :-)


Regards,
glauxosdever

Re: Newbie questions

Posted: Mon Sep 17, 2018 8:20 am
by Thpertic
Ok then... But for the libc to work properly I have to add the exact stdlib, stdio and so on, right?

I'm thinking of not worrying about actual compatibility and code duplication for now, since I'm still a beginner. I'll just use a sort of printf only in the kernel. Thanks for all the answers.

Re: Newbie questions

Posted: Mon Sep 17, 2018 1:41 pm
by Brendan
Hi,
Thpertic wrote:Ok then... But for the libc to work properly I have to add the exact stdlib, stdio and so on, right?
For the C standard library, there's a large difference between "hosted" and "freestanding".

Hosted is for normal user-space software, where you've got a massive amount of stuff that the software/library can rely on (file systems, time zone database, unicode/wchar handling, scheduler, etc). For this you need everything.

Freestanding is for when there is nothing the software/library can rely on (e.g. kernels that implement things like file systems, etc). It only includes float.h, iso646.h, limits.h, stdalign.h, stdarg.h, stdbool.h, stddef.h, stdint.h, and stdnoreturn.h.
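
To make the distinction concrete, here is a small sketch of kernel-style helper code that relies on nothing but freestanding headers, so it should compile unchanged with -ffreestanding. The two helpers (align_up and fits_below_4g) are hypothetical examples chosen for illustration, not functions from any tutorial.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round an address up to the next multiple of `align`.
 * `align` must be a power of two (e.g. a 4096-byte page size). */
static inline uintptr_t align_up(uintptr_t addr, size_t align)
{
    return (addr + (align - 1)) & ~((uintptr_t)align - 1);
}

/* True if the range [base, base + len) lies entirely below 4 GiB.
 * Written so that base + len is never computed and cannot overflow. */
static inline bool fits_below_4g(uint64_t base, uint64_t len)
{
    const uint64_t limit = UINT64_C(0x100000000);
    return len <= limit && base <= limit - len;
}
```

Everything used here (fixed-width integers, size_t, bool) comes from the compiler rather than from an operating system, which is why these headers are available before a kernel exists.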

Essentially; when the language was designed (about 46 years ago), the people that designed it knew that a lot of the C standard library (including stdlib, stdio, etc) doesn't make any sense for kernels.
Thpertic wrote:I'm thinking of not worrying about actual compatibility and code duplication for now, since I'm still a beginner. I'll just use a sort of printf only in the kernel. Thanks for all the answers.
For a kernel, the idea of "printing" doesn't really make sense once you get past "temporary hello world experiment". After boot (e.g. after the user starts a GUI, etc) you don't want kernel or drivers ruining the screen with trash just because (e.g.) someone plugged in a USB device. Instead, you want some kind of logging system that appends data to file/s on disk, so that users/admin can do (e.g.) "cat /log/kern.log" whenever they feel like seeing what happened. During boot (where you might want to display kernel initialisation stuff, even though it's all going to scroll off the top of the screen faster than anyone can read it and will look a lot uglier than a nice splash screen) you mostly want to add text to a log in memory (so that it can be handed off to a "system logger" daemon and/or saved to disk later), and then have multiple different things in your boot code that can show the log to the user (e.g. one piece of code that shows the log using a local video card in text mode, another piece of code that can show the log using a local video card in graphics mode, another piece of code that sends the log to a serial port if it's a headless system without any video card, etc).

When you get tasks working you'll have to worry about race conditions and synchronisation. For example if one task is initialising the network and wants to add a multi-line message to the log and another task is initialising a file system and wants to add a different message to the log at the same time, then you don't want these messages to end up as a jumbled mess (e.g. you don't want "Network info:\nIP address is Bad blocks found at sectors: 123.45.67.89\n #84, #85, Netmask is #86, #87, 255.255.255.0\n #88\n" when one task is trying to say "Network info:\nIP address is 123.45.67.89\nNetmask is 255.255.255.0" and another task is trying to say "Bad blocks found at sectors: #84, #85, #86, #87, #88\n"). For this reason you end up needing to have "local string builders" that create the text as a single string, and then an "atomically append string to log" in the kernel.

Also note that good operating systems don't use plain text for logging either. For example, each message in the log might be an entry with a header; where the header might have a 32-bit entry size field, a 64-bit timestamp field, a 32-bit "where it came from" field, an 8-bit "type" field, and an 8-bit "severity" field; with the text after the header. This allows you to quickly index, filter and sort the messages (e.g. so that the user can look at "all messages from e1000 driver only" or "all messages from anywhere sorted in order of severity" or "all messages with severity > informational sorted in order of where it came from", or "all messages from 8:00 am to 10:00 am yesterday morning when the server seemed incredibly slow" or ...).
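
A sketch of what such an entry header and a simple filter over it might look like in C. The field widths follow the description above; the struct name, field order, the convention that entry_size covers header plus text, and the count_entries() helper are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct log_entry_header {
    uint32_t entry_size;   /* assumed: header plus text, in bytes */
    uint32_t source_id;    /* "where it came from" */
    uint64_t timestamp;
    uint8_t  type;
    uint8_t  severity;
    /* message text follows the header */
};

/* Copy a header out of the byte stream one byte at a time, so the
 * entry does not need to be aligned in the log area. */
static struct log_entry_header read_header(const uint8_t *p)
{
    struct log_entry_header h;
    uint8_t *dst = (uint8_t *)&h;
    for (size_t i = 0; i < sizeof h; i++)
        dst[i] = p[i];
    return h;
}

/* Count entries from one source with at least the given severity.
 * Walks the log by hopping from header to header via entry_size. */
size_t count_entries(const uint8_t *area, size_t size,
                     uint32_t source_id, uint8_t min_severity)
{
    size_t offset = 0, matches = 0;

    while (size - offset >= sizeof(struct log_entry_header)) {
        struct log_entry_header h = read_header(area + offset);
        if (h.entry_size < sizeof h || h.entry_size > size - offset)
            break;                 /* corrupt or truncated entry */
        if (h.source_id == source_id && h.severity >= min_severity)
            matches++;
        offset += h.entry_size;
    }
    return matches;
}
```

Because every entry starts with its own size, a filter like this can skip from header to header without scanning any message text, which is what makes the indexing and sorting described above cheap.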

Essentially; you might end up with function/s to build the text (that might use something like "ksprintf()" multiple times to build the whole piece of text but might use a set of significantly simpler and more efficient "add_string(), add_decimal_integer(), add_hex_integer()" functions instead); then have a function like "atomically_append_to_log(int source_ID, int severity, char *message);" that determines the length of the string, gets a time stamp, acquires a lock, allocates space at the old end of the log (using something like "address = log_end; log_end += sizeof(header)+length_of_text;"), copies all the "header + text" data into the allocated space, then releases the lock, then broadcasts some kind of "log was updated" notification to whatever is listening (so that it can be displayed and/or sent to a serial port and/or added to files on disk and/or whatever else).
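
The steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the fixed 4096-byte log area, the spinlock built on C11 atomic_flag, the fake tick counter standing in for a real clock, the const-qualified message parameter and the int return value are all choices made here. The stdatomic.h header is not in the freestanding list quoted earlier, but GCC and Clang ship it as a compiler header.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct log_header {
    uint32_t entry_size;                /* header plus text, in bytes */
    uint32_t source_id;
    uint64_t timestamp;
    uint8_t  type;
    uint8_t  severity;
};

#define LOG_CAPACITY 4096               /* hypothetical fixed-size region */

static uint8_t     log_area[LOG_CAPACITY];
static size_t      log_end;             /* offset of the first free byte */
static atomic_flag log_lock = ATOMIC_FLAG_INIT;
static atomic_uint_fast64_t fake_ticks; /* stand-in for a real clock */

static uint64_t get_timestamp(void)
{
    return atomic_fetch_add(&fake_ticks, 1) + 1;
}

static void copy_bytes(uint8_t *dst, const void *src, size_t n)
{
    const uint8_t *s = src;
    while (n--)
        *dst++ = *s++;
}

/* Returns 0 on success, -1 if the entry does not fit in the log area. */
int atomically_append_to_log(int source_ID, int severity, const char *message)
{
    size_t length_of_text = 0;
    while (message[length_of_text] != '\0')
        length_of_text++;
    if (length_of_text > LOG_CAPACITY - sizeof(struct log_header))
        return -1;

    struct log_header header = {
        .entry_size = (uint32_t)(sizeof header + length_of_text),
        .source_id  = (uint32_t)source_ID,
        .timestamp  = get_timestamp(),
        .severity   = (uint8_t)severity,
    };

    while (atomic_flag_test_and_set_explicit(&log_lock, memory_order_acquire))
        ;                               /* spin until the lock is free */

    int result = -1;
    if (header.entry_size <= LOG_CAPACITY - log_end) {
        uint8_t *address = log_area + log_end;   /* old end of the log */
        log_end += header.entry_size;
        copy_bytes(address, &header, sizeof header);
        copy_bytes(address + sizeof header, message, length_of_text);
        result = 0;
    }

    atomic_flag_clear_explicit(&log_lock, memory_order_release);
    /* A real kernel would broadcast a "log was updated" notification here. */
    return result;
}
```

The lock is what keeps two tasks' entries from interleaving: each caller builds its complete message first, and only the short reserve-and-copy step runs while the lock is held. If the function can also be called from interrupt handlers, a kernel would probably need to disable interrupts around the locked region as well, which this sketch does not do.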


Cheers,

Brendan

Re: Newbie questions

Posted: Mon Sep 17, 2018 4:05 pm
by Thpertic
Brendan wrote: then have a function like "atomically_append_to_log(int source_ID, int severity, char *message);" that determines the length of the string, gets a time stamp, acquires a lock, allocates space at the old end of the log (using something like "address = log_end; log_end += sizeof(header)+length_of_text;"), copies all the "header + text" data into the allocated space, then releases the lock, then broadcasts some kind of "log was updated" notification to whatever is listening (so that it can be displayed and/or sent to a serial port and/or added to files on disk and/or whatever else).
Sorry, but I'm just starting out, so I couldn't understand the part about the allocated space (do I have to make a buffer and then use it everywhere I want to print?), the question of the lock (like... CPU interrupts?) and the "allocates space at the old end of the log" part (can't I just print/use that previous buffer, free it and reuse it, or just take another buffer?). Another question: adding numbers to the buffer is OK, but should I add them in printable form (1 stored as '1')?

Re: Newbie questions

Posted: Mon Sep 17, 2018 5:26 pm
by Brendan
Hi,
Thpertic wrote:
Brendan wrote: then have a function like "atomically_append_to_log(int source_ID, int severity, char *message);" that determines the length of the string, gets a time stamp, acquires a lock, allocates space at the old end of the log (using something like "address = log_end; log_end += sizeof(header)+length_of_text;"), copies all the "header + text" data into the allocated space, then releases the lock, then broadcasts some kind of "log was updated" notification to whatever is listening (so that it can be displayed and/or sent to a serial port and/or added to files on disk and/or whatever else).
Sorry, but I'm just starting out, so I couldn't understand the part about the allocated space (do I have to make a buffer and then use it everywhere I want to print?), the question of the lock (like... CPU interrupts?) and the "allocates space at the old end of the log" part (can't I just print/use that previous buffer, free it and reuse it, or just take another buffer?).
Depending on how you feel like doing memory management, "allocating space" could just be "realloc()" (but I'd probably reserve an area of kernel space and map pages at the end of the buffer if/when necessary instead). The only place that adds data to the buffer would be the kernel's "atomically_append_to_log()" function and nothing else. Other code (e.g. device drivers) would just call the kernel's "atomically_append_to_log()" function and would never have any reason to touch the buffer directly.

The entire point of a log is to allow people (in the future) to see what happened (in the past). You can't destroy the data by recycling a previous buffer (and ruin the entire point of having a log) until/unless you store that data somewhere else first (e.g. in a file on disk).

Note that displaying the log on the screen is a mostly useless anti-feature (e.g. normal users hate having a wall of gibberish shoved in their face). It's just something OS developers do to reassure themselves that the code they wrote 2 minutes ago actually does what they hoped it would; and when you gain more experience/confidence you probably won't bother displaying it at all (unless there's a major error that causes the OS to be unable to boot, where you might display the error message on its own without the hundreds of lines of irrelevant stuff that preceded it).
Thpertic wrote:Another question: adding numbers to the buffer is OK, but should I add them in printable form (1 stored as '1')?
You'd convert everything (numbers, etc) into ASCII or Unicode characters.
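
For the ASCII case, the conversion is just "digit value plus the character '0'", repeated for each digit. A minimal sketch, where the function name u64_to_ascii, its signature and the supported base range are assumptions made for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Write `value` in the given base (2 to 16) as ASCII text into `out`,
 * NUL-terminate it, and return the number of characters written.
 * `out` must have room for 65 bytes (64 binary digits plus NUL).
 * Returns 0 and writes an empty string for an unsupported base. */
size_t u64_to_ascii(uint64_t value, unsigned base, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[64];
    size_t n = 0;

    if (base < 2 || base > 16) {
        out[0] = '\0';
        return 0;
    }
    do {                            /* least significant digit first */
        tmp[n++] = digits[value % base];
        value /= base;
    } while (value != 0);

    size_t len = n;
    while (n > 0)                   /* reverse into the output buffer */
        *out++ = tmp[--n];
    *out = '\0';
    return len;
}
```

So the number 1 ends up in the buffer as the character '1' (byte value 0x31 in ASCII), not as the byte value 1.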


Cheers,

Brendan

Re: Newbie questions

Posted: Tue Sep 18, 2018 1:51 am
by Thpertic
I understand, but can you explain to me how to differentiate kernel and user space? Should I set something like CS to an address and protect it? Or set up a protected buffer and load the kernel there? I don't know #-o