Page 3 of 3

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Sun Feb 17, 2013 6:18 pm
by Love4Boobies
zeusk wrote:I use C because it's simple,
Its apparent simplicity comes from a lack of (often useful) features. But it's actually not simple at all; it has the most obscene pitfalls, which programmers are forced to learn if they wish to avoid them. And I'm not talking about one or two; I'm talking about hundreds of such instances. It is for this reason that most people think they know C when they actually only have a superficial understanding of it. And I wouldn't blame them too much, as many of the assumptions they make about the language are natural.
zeusk wrote:i know it well
Just because you know something doesn't mean you should close yourself off from learning new things. Suppose you want to eat some steak. If you know how to use a spoon but not a fork, do you think it is a better idea to use the spoon or to learn about the fork?
zeusk wrote:and it gets the job done.
Just like a spoon can get the job done. But often, it gets the job done in a lousy manner.
zeusk wrote:I've also used various other languages such as C#, C++ etc at work but never felt that the functionality they provide over C are worth the size, performance and portability issues they bring with them for use inside an OS.
First of all, most general-purpose languages don't impose performance requirements, so we're really talking about language implementations. Once upon a time, C++ was shunned for being considered inherently slow. Nowadays, programs written in C++ seem to exhibit performance characteristics that are extremely similar to those of their C equivalents. You can't really argue about languages, only about the current unavailability of satisfactory tools.

Now that that's out of the way, let's focus on what people really care about. I have the following points to make:
  • The primary cause of slow software is bad design. The next is the use of inefficient algorithms.
  • Things don't need to be perfect; they need to be good enough (even C makes compromises that reflect this reality). According to the 80-20 rule, 80% of the time is spent on 20% of the code. So why should the remaining 80% of the code be slower to develop, more difficult to maintain, and contain more bugs?
If you want high-quality software, the conclusion is inescapable. You should properly handle the upstream prerequisites of your project (problem definition, project planning, requirements, architecture, and design). You should use the appropriate tools (in the case of programming languages, you want them to be very expressive and to do as much as possible to make your life easier, e.g., automatic memory management). You want the code to have deep readability/comprehensibility (remember, code is merely a textual specification of software behavior). You want to profile your code and improve the parts that matter (are there any design-induced bottlenecks that you could avoid, and are you using good algorithms and data structures?). Finally, for the remaining situations where this is not enough (this happens quite rarely), provide optimized routines in an appropriate language (this could even mean assembly...). Huzzah! The number of lines from your code base written in bad languages (like C) is either zero or much, much closer to zero.
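The profile-then-optimize step can be sketched with Python's standard profiler on a toy workload (all function names here are invented for the sketch, not taken from any real project):

```python
import cProfile
import io
import pstats

def build_report():
    # Stand-in for the hot 20% of the code.
    return sum(i * i for i in range(200_000))

def format_output():
    # Stand-in for the cold 80%.
    return "done"

def program():
    build_report()
    format_output()

profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print("build_report" in report)  # the hot spot shows up in the profile -> True
```

The point of the sketch is only the workflow: measure first, then spend optimization effort (or drop to a lower-level language) solely on what the profile says matters.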
zeusk wrote:(or if OS code needs to be portable at all, usually it's well defined for a few architectures/VM only),
Linux seems to be very successful in contradicting your little rule of thumb. It has been ported to a huge number of platforms, so it can be done. I'm sure you think it's a good idea, too. Why dismiss as a bad requirement something you already know can be satisfied?
zeusk wrote:just more easier to port than equivalent C#, C++ stuff.
I am uncertain about what you mean here so I don't know how to respond. Do you mean in general or for embedded systems, which you mention in the following sentence?
zeusk wrote:Having worked with embedded stuff gave me another reason to use C, C compilers support far more architectures/CPUs than C# (which is mostly run on a MS VM) or C++.
I will grant you that C# is not as prevalent, especially in the embedded world (in the non-embedded world, you still have a few cross-platform options, such as Mono). And this is not because people are afraid of managed languages in that arena... Were you aware that the software that runs on most SIM cards is written in Java?

As for C++, it is commonplace even on obscure architectures. It's used for everything from the fuel injectors in your car to the Large Hadron Collider.
zeusk wrote:although one could argue my C++ code sucks as I am no expert with it
It's not your fault. Very few people actually are, because C++ is an overly complex beast. I was being unbiased above when defending C++; please do not interpret the points I made as affection towards it. I think it is a plague.
zeusk wrote:
Love4Boobies wrote: I think it is mainly useful for embedded programming on resource-scarce platforms, as no better alternatives exist there.
Yes, it certainly is. But i don't get your argument of strictly confining it to resource-scarce platforms. Just because you have a few hundred thousand extra cycles you'll use something that isn't efficient and hence waste energy + time ?
I believe I covered this above, but the short answer is: "usually, yes!" since, as I've said, things only need to be good enough. Not only that, but even C implementations don't escape this trade-off: they make the same compromise relative to hand-optimized assembly. Furthermore, the waste is more than reasonable; we're not talking about orders of magnitude. Where do you draw the line? Simple: focus on the development process and, implicitly, on reliability; only care about micro-optimizing when you must. I know I've pretty much repeated what I've said several paragraphs above, but I want to stress that managing complexity is a software engineer's greatest responsibility.
zeusk wrote:Although the argument hasn't been raised here, The advantage of standard libraries in such managed languages and C++ doesn't matter at all, If done perfectly, You can have a well written library in C too. (ie. dlmalloc)
Well, the most important thing standard libraries do is provide functionality which either describes the underlying platform or uses the underlying platform in some non-portable way. A regular library can surely do all this but, since it is not standard, it is not guaranteed to exist on all platforms where the language is implemented. Things like random number generators, sorting routines, etc. are only there for the sake of convenience.

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Mon Feb 18, 2013 2:48 pm
by OSwhatever
rootnode wrote:I beg to differ. At Mosa we are writing a managed operating system in C#, with a complete toolchain (written in C#). To make it short: it's 100% C#. No C/C++ involved.
How do you inject architecture specific code if you use 100% C#? This would be simple things like disable interrupts, synchronization instructions, CPU settings etc?

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Mon Feb 18, 2013 3:06 pm
by jnc100
OSwhatever wrote:How do you inject architecture specific code if you use 100% C#? This would be simple things like disable interrupts, synchronization instructions, CPU settings etc?
You can't do this in standard C either, without using compiler-specific inline assembly. As regards doing it in C#, various projects do it in different ways. You can either call assembly functions to do the required functionality (exactly the same as some do from C) or attempt to inline the assembly yourself. Given that each project is writing the compiler themselves, it is relatively easy to intercept calls to various methods defined as 'extern' to actually inline certain asm statements into the code. For example, by referencing a support library from the C# code my compiler provides a number of 'extern' calls which can be used to either emit inline assembly or call an external assembly function if the inlining framework for that particular function is not implemented. An example for x86_64 is here.
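For illustration only, a toy sketch of that interception scheme in Python (the method names and the string-based "instructions" are invented; a real compiler of this kind works on CIL and actual machine encodings, not strings):

```python
# Toy model of the interception described above: the compiler keeps a table of
# 'extern' methods it knows how to inline; a call to a method in the table is
# replaced by its asm template, and anything else becomes a call to an
# external assembly stub.
INLINE_ASM = {
    "Cpu.Cli": ["cli"],   # disable interrupts
    "Cpu.Sti": ["sti"],   # enable interrupts
    "Cpu.Hlt": ["hlt"],   # halt until next interrupt
}

def lower_call(method_name):
    """Return the instructions emitted for a call to an extern method."""
    if method_name in INLINE_ASM:
        return list(INLINE_ASM[method_name])           # inlined in place
    return ["call " + method_name.replace(".", "_")]   # external asm stub

print(lower_call("Cpu.Cli"))   # -> ['cli']
print(lower_call("Cpu.Lgdt"))  # -> ['call Cpu_Lgdt'] (no inline template yet)
```

The fall-through branch mirrors the behaviour described: if the inlining framework for a particular function is not implemented, the call is simply routed to an external assembly function.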

Regards,
John.

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Mon Feb 18, 2013 8:26 pm
by Brendan
Hi,
jnc100 wrote:
OSwhatever wrote:How do you inject architecture specific code if you use 100% C#? This would be simple things like disable interrupts, synchronization instructions, CPU settings etc?
You can't do this in standard C either, without using compiler-specific inline assembly. As regards doing it in C#, various projects do it in different ways. You can either call assembly functions to do the required functionality (exactly the same as some do from C) or attempt to inline the assembly yourself. Given that each project is writing the compiler themselves, it is relatively easy to intercept calls to various methods defined as 'extern' to actually inline certain asm statements into the code. For example, by referencing a support library from the C# code my compiler provides a number of 'extern' calls which can be used to either emit inline assembly or call an external assembly function if the inlining framework for that particular function is not implemented. An example for x86_64 is here.
So basically you're saying that the only way for the kernel to be 100% safe/managed code is to have unsafe/unmanaged code?

There is one other way - shift the unsafe code into the compiler so they're part of the language. Of course even in this case the unsafe code still exists (it's just in a different place, hidden from the programmer). Most "managed language" advocates would probably think this counts as "safe", because these advocates have a nasty habit of ignoring the fact that any bugs in the compiler's implementation would lead to "safe" source code being converted into "unsafe" executable code.

For me personally, I like to think of it as "total surface area" - the total amount of code where bugs could lead to vulnerabilities in the final system. For something like a micro-kernel written in assembly there's maybe 100 KiB of kernel code plus 700 KiB of assembler that could have bugs/vulnerabilities. For something like a monolithic kernel written in C you're looking at maybe 5 MiB of kernel code and 70 MiB of compiler and linker; and for managed languages you could be looking at 75 MiB of compiler and linker (which is no better than unmanaged C, and about 100 times worse than the "micro-kernel in assembly" best case). :roll:
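The arithmetic behind the "about 100 times worse" figure can be checked directly from those estimates (the sizes are the rough guesses above, not measurements of real systems):

```python
# Back-of-the-envelope version of the "total surface area" comparison.
KiB = 1024
MiB = 1024 * KiB

asm_microkernel = 100 * KiB + 700 * KiB   # kernel source + assembler
c_monolithic = 5 * MiB + 70 * MiB         # kernel source + compiler/linker
managed = 75 * MiB                        # compiler + linker alone

print(c_monolithic / asm_microkernel)  # -> 96.0, i.e. roughly 100x worse
print(managed / asm_microkernel)       # -> 96.0 as well
```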

Of course this isn't too realistic. It's very easy to check whether an assembler does what it should (e.g. compare a disassembly with the original source code) and very hard to guarantee a complex optimising compiler is correct; so in practice the "micro-kernel in assembly" might be 200 times safer than the "monolithic kernel in C#" approach. :mrgreen:

[EDIT]: I also think that sometimes trolling can be an effective way to get people to re-evaluate the hype they've been told.[/EDIT]


Cheers,

Brendan

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Mon Feb 18, 2013 9:26 pm
by Love4Boobies
Brendan wrote:So basically you're saying that the only way for the kernel to be 100% safe/managed code is to have unsafe/unmanaged code?
Why does 100% of a kernel need to be written in a safe language? The point of using safe languages is to improve reliability. If the majority of your code is written in a safe language, then the majority of your code is more likely to be correct. If you are going to rely on a VM for protection, then it's applications which must be fully managed, not the kernel, which is trusted by definition.
Brendan wrote:There is one other way - shift the unsafe code into the compiler so they're part of the language. Of course even in this case the unsafe code still exists (it's just in a different place, hidden from the programmer). Most "managed language" advocates would probably think this counts as "safe", because these advocates have a nasty habit of ignoring the fact that any bugs in the compiler's implementation would lead to "safe" source code being converted into "unsafe" executable code.
Actually, I don't think most managed language advocates would consider this an elegant solution. At any rate, even if you rely on virtual memory for protection, you're not out of the woods. CPUs have bugs, too.
Brendan wrote:For me personally, I like to think of it as "total surface area" - the total amount of code where bugs could lead to vulnerabilities in the final system. For something like a micro-kernel written in assembly there's maybe 100 KiB of kernel code plus 700 KiB of assembler that could have bugs/vulnerabilities. For something like a monolithic kernel written in C you're looking at maybe 5 MiB of kernel code and 70 MiB of compiler and linker; and for managed languages you could be looking at 75 MiB of compiler and linker (which is no better than unmanaged C, and about 100 times worse than the "micro-kernel in assembly" best case). :roll:
Except that the state of today's compilers is quite good. Almost all bugs I've encountered and researched, including my own, came from the programs rather than the compilers used on them. So, if you use a mature safe-language compiler, you'll really be decreasing the number of bugs despite the increased amount of machine code. You should probably be comparing SL / (SBp + SBc) vs. AL / (ABp + ABa), where
  • SL is the amount of safe code (measured in LoC).
  • SBp is the estimated average number of bugs per line introduced by the programmer.
  • SBc is the estimated average number of bugs per line introduced by the compiler.
  • AL is the amount of assembly code (measured in LoC).
  • ABp is the estimated average number of bugs per line introduced by the programmer.
  • ABa is the estimated average number of bugs per line introduced by the assembler.
This is for application programs. For kernels, there would also be a very small constant representing the likelihood that an MMU bug is triggered (a managed OS wouldn't use one). The following relations are almost certain to be true:
  • SL < AL (by many orders of magnitude)
  • SBp < ABp (by a large degree, although I have not made a survey)
  • SBc > ABa (the latter should be very close to zero)
A code base that gets assembled to 700 KiB actually has more opportunities for bugs because:
  • It is less expressive/intuitive.
  • A lot of work that could be automated is not (e.g., memory management).
  • Tools used for tracking down bugs are not particularly useful for assembly (I'm talking about many things here ranging from compilers enforcing type safety and out of bounds checks to static analysis and computer-aided verification).
  • Every small step in a HLL spans over several assembly steps (this equivalence only needs to be implemented once in the compiler, whereas assembly programmers have to do the same tedious things over and over again; e.g., routine prologues and epilogues, preparing the input for routines or different operations, etc.).
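With entirely made-up bug rates plugged in (none of these figures come from any survey; they are chosen only to satisfy the three relations above), the comparison looks like:

```python
def ratio(lines, bugs_per_line_programmer, bugs_per_line_toolchain):
    # SL / (SBp + SBc) for the safe language; AL / (ABp + ABa) for assembly.
    return lines / (bugs_per_line_programmer + bugs_per_line_toolchain)

# Hypothetical inputs: the safe-language code base is far smaller (SL < AL),
# its programmers introduce fewer bugs per line (SBp < ABp), and its compiler
# introduces more bugs than an assembler does (SBc > ABa, with ABa near zero).
safe_ratio = ratio(10_000, 0.010, 0.0010)    # SL, SBp, SBc
asm_ratio = ratio(500_000, 0.050, 0.00001)   # AL, ABp, ABa

print(safe_ratio, asm_ratio)
```

The sketch only shows how the quantities combine; any actual conclusion depends entirely on what the real values of these rates turn out to be.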

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Mon Feb 18, 2013 9:36 pm
by Love4Boobies
Since I went on about managed operating systems for some time, I think this would be a good time to mention that although I think they're a good idea for the most part, safe languages are what I really want to advocate.

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Mon Feb 18, 2013 10:21 pm
by Brendan
Hi,
Love4Boobies wrote:
Brendan wrote:So basically you're saying that the only way for the kernel to be 100% safe/managed code is to have unsafe/unmanaged code?
Why does 100% of a kernel need to be written in a safe language?
For a summary of preceding conversation (for context) - notes in italics are mine:

Casm: The point is that so called "managed" operating systems can only run on top of something which is effectively an unmanaged operating system. [Suggesting that "100% safe" is impossible when you look at the big picture]

rootnode: I beg to differ. At Mosa we are writing a managed operating system in C#, with a complete toolchain (written in C#). To make it short: it's 100% C#. No C/C++ involved. [Claiming that "100% safe" is possible]

OSwhatever: How do you inject architecture specific code if you use 100% C#? This would be simple things like disable interrupts, synchronization instructions, CPU settings etc? [Questioning how "100% safe" can be possible]

jnc100: You can't do this in standard C either, without using compiler-specific inline assembly [Suggesting "100% safe" is possible by not being 100% safe]

Brendan: So basically you're saying that the only way for the kernel to be 100% safe/managed code is to have unsafe/unmanaged code? [Pointing out that jnc100 is agreeing with OSwhatever and refuting rootnode's "100% safe" claim]

Brendan (continued): For me personally, I like to think of it as "total surface area" [Attempting to highlight the "bigger picture" that rootnode failed to notice, which is the compiler responsible for providing the "safe abstract machine", where "compiler providing safe abstract machine" is loosely analogous to the "effectively an unmanaged operating system" in Casm's original comment]
Love4Boobies wrote:
Brendan wrote:For me personally, I like to think of it as "total surface area" - the total amount of code where bugs could lead to vulnerabilities in the final system. For something like a micro-kernel written in assembly there's maybe 100 KiB of kernel code plus 700 KiB of assembler that could have bugs/vulnerabilities. For something like a monolithic kernel written in C you're looking at maybe 5 MiB of kernel code and 70 MiB of compiler and linker; and for managed languages you could be looking at 75 MiB of compiler and linker (which is no better than unmanaged C, and about 100 times worse than the "micro-kernel in assembly" best case). :roll:
Except that the state of today's compilers is quite good. Almost all bugs I've encountered and researched, including my own, came from the programs rather than the compilers used on them. So, if you use a mature safe-language compiler, you'll really be decreasing the number of bugs despite the increased amount of machine code. You should probably be comparing SL / (SBp + SBc) vs. AL / (ABp + ABa), where
  • SL is the amount of safe code (measured in LoC).
  • SBp is the estimated average number of bugs per line introduced by the programmer.
  • SBc is the estimated average number of bugs per line introduced by the compiler.
  • AL is the amount of assembly code (measured in LoC).
  • ABp is the estimated average number of bugs per line introduced by the programmer.
  • ABa is the estimated average number of bugs per line introduced by the assembler.
The following relations are almost certain to be true:
  • SL < AL (by many orders of magnitude)
  • SBp < ABp (by a large degree, although I have not made a survey)
  • SBc > ABa (the latter should be very close to zero)
I disagree with "SL < AL" (the idea that a monolithic kernel will have less code than a micro-kernel seems rather unlikely to me). I'd assume SBp = ABp; however SBp and ABp should be "estimated average number of bugs that weren't detected at compile/assemble time" (and with this modification, SBp should be less than ABp).

With these changes, using "SL / (SBp + SBc) vs. AL / (ABp + ABa)", the micro-kernel in assembly probably is safer than (or at least similar to) the monolithic kernel in C# with a mature compiler. Of course I think jnc100 is right ("Given that each project is writing the compiler themselves") and that SBc is likely to be far worse (and therefore the micro-kernel in assembly is likely to be much safer).
Love4Boobies wrote:A code base that gets assembled to 700 KiB actually has more opportunities for bugs because:
  • It is less expressive/intuitive.
  • A lot of work that could be automated is not (e.g., memory management).
  • Tools used for tracking down bugs are not particularly useful for assembly (I'm talking about many things here ranging from compilers enforcing type safety and out of bounds checks to static analysis and computer-aided verification).
  • Every small step in a HLL spans over several assembly steps (this equivalence only needs to be implemented once in the compiler, whereas assembly programmers have to do the same tedious things over and over again; e.g., routine prologues and epilogues, preparing the input for routines or different operations, etc.).
Do people in the Arctic shiver more than people living near the equator; or do they get used to local conditions and wear more clothes, and end up doing a similar amount of shivering? Do people using very low level languages write more bugs than people using very high level languages, or do they just become more careful and end up with the same amount of bugs?


Cheers,

Brendan

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Wed Feb 20, 2013 5:16 pm
by jnc100
Brendan wrote:jnc100: You can't do this in standard C either, without using compiler-specific inline assembly [Suggesting "100% safe" is possible by not being 100% safe]

Brendan: So basically you're saying that the only way for the kernel to be 100% safe/managed code is to have unsafe/unmanaged code? [Pointing out that jnc100 is agreeing with OSwhatever and refuting rootnode's "100% safe" claim]
I was not trying to make a point about safe/unsafe code, rather just that, as far as I am aware, it is impossible to write an operating system completely in any high-level language due to the lack of abstractions for underlying processor features (e.g. lgdt/mov cr3/wrmsr etc). This does not mean that resorting either to compiler-specific mechanisms to emit these instructions in the outputted code or to calling an assembly stub is inherently 'unsafe' (in a memory-safe kind of way). Memory safety is a function of the code, rather than the language. The benefit of languages like Java or those which target the CIL (e.g. C#/VB.net/F#/ipy etc), however, is that they make it very easy to determine, at load time, whether code is memory-safe or whether it cannot be proved to be memory-safe (e.g. those programs using constructs like the C# 'unsafe' keyword), and a loader can simply reject those which cannot be proven to be safe. This has great benefits for things like device drivers, for example, if you consider that the vast majority of blue screens in Windows are due to dodgy drivers rather than the kernel itself.
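A toy sketch of that load-time policy (the module format and method names are invented; a real CIL verifier inspects bytecode, not a dict of booleans):

```python
# Toy model of the load-time policy: every method in a module either verifies
# as memory-safe or it doesn't (e.g. it used an 'unsafe' construct), and the
# loader rejects any module containing unverifiable code.
def load_module(methods):
    """methods maps a method name to True if it verifies as memory-safe."""
    unprovable = sorted(name for name, ok in methods.items() if not ok)
    if unprovable:
        raise ValueError("rejected; unverifiable methods: " + ", ".join(unprovable))
    return "loaded"

print(load_module({"Driver.Read": True, "Driver.Write": True}))  # -> loaded
try:
    # A driver using unsafe pointer arithmetic fails verification and is
    # rejected before it ever runs.
    load_module({"Driver.Read": True, "Driver.PokePort": False})
except ValueError as e:
    print(e)  # -> rejected; unverifiable methods: Driver.PokePort
```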

Going back to my original point, it's entirely possible to prove that the insertion of small amounts of asm code into the output stream is safe. For example, consider the code to switch CR3. If this is merely a function like mov cr3, a_value; ret; all you need to do to prove its safety is show that 1) a_value is a valid PD/PDPT/PML4T (depending on architecture), 2) RIP is still valid and 3) all variables used by the following code are where they are expected to be. All of these points are entirely provable, and thus we can say that this particular piece of asm is safe. What we cannot do, on the other hand, is easily determine the safety of an arbitrary piece of asm code provided at run time.

In conclusion, I am refuting the '100% C#' claim (excluding any compiler-specific extensions) but not the '100% safe' claim (although I accept it's very difficult when things like DMA come into play).

Regards,
John.

Re: C pros and cons (was: I want to create an OS!!!)

Posted: Wed Feb 20, 2013 11:11 pm
by Brendan
Hi,
jnc100 wrote:Going back to my original point, it's entirely possible to prove that the insertion of small amounts of asm code into the output stream is safe. For example, consider the code to switch CR3. If this is merely a function like mov cr3, a_value; ret; all you need to do to prove its safety is show that 1) a_value is a valid PD/PDPT/PML4T (depending on architecture), 2) RIP is still valid and 3) all variables used by the following code are where they are expected to be. All of these points are entirely provable, and thus we can say that this particular piece of asm is safe. What we cannot do, on the other hand, is easily determine the safety of an arbitrary piece of asm code provided at run time.

In conclusion, I am refuting the '100% C#' claim (excluding any compiler-specific extensions) but not the '100% safe' claim (although I accept it's very difficult when things like DMA come into play).
I'd refute both. For example, consider the code to switch CR3 - a compiler can't guarantee that the (tiny) inline assembly is safe, and it also can't guarantee that the virtual address space being switched to is safe. Maybe the PML4 contains entries that point to the wrong PDPTs and the system crashes even though the inline assembly itself is perfectly correct, because there are bugs in the "100% safe" C# code responsible for creating the PML4.

The compiler can guarantee that a PML4 is created safely but can't guarantee that the resulting PML4 is safe to use as a PML4; the compiler can guarantee that data describing a new thread is created safely but can't guarantee that the data describing the new thread is safe for the scheduler's task switching code to use; the compiler can guarantee that an IDT entry is created safely but can't guarantee that the IDT entry is safe for the CPU to use; etc.

Basically, to ensure safety the compiler relies on assumptions that are true for normal software; but a kernel has to break those assumptions. For all cases where these assumptions are being broken there's some unsafe code (e.g. assembly) somewhere - e.g. loading CR3, doing a task switch, doing "LIDT" or "LGDT", etc.

The fact that unsafe code was needed is enough (on its own) to indicate that you're doing something that the compiler isn't aware of, and that "100% safe" is no longer possible.


Cheers,

Brendan