What do you think about managed code and OSes written in it?

willedwards · Post by **willedwards** » Tue Jan 27, 2015 3:50 pm

One can argue about the exact definition of a "managed language", and perhaps LISP isn't one, but it's an interesting departure from mainstream C for OS development nonetheless.

https://github.com/froggey/Mezzano has just been put on GitHub.

(I am not the author; I just saw it, and thought it super cool and worth sharing.)

Another OS that was written in a garbage-collected bounds-checked language was Oberon http://en.wikipedia.org/wiki/Oberon_%28 ... _system%29

Oberon had an interesting graphical tiling manager, code-in-doc, pcode ... lots so far ahead of its time that it's a shame the world went Windows.


Brendan · Post by **Brendan** » Tue Jan 27, 2015 5:15 pm

Hi,

embryo wrote:
Brendan wrote:Bugs can be categorised as follows:
- Problems that can be detected at compile time
- Problems that can't be detected at compile time, but can be detected by either software or hardware at run-time
- Problems that can't be detected at compile time or by hardware, but can be detected by software
- Problems that can't be detected in an automated/systematic way, where human effort (in the form of unit tests, bug reports from end users, etc) is required
Put more briefly, it can be presented as a matrix with the following rows and columns:
Rows:
- compile time detection
- run time detection
Columns:
- automated detection
- manual operations

So your phrase:
Brendan wrote:For a suitably well designed language (e.g. not C with all its silly undefined behaviour) that's used by a suitably competent programmer, the majority of problems fall into the first category or last category.
Can be compared with this - the majority of problems fall into the manual operations column.

You've shifted "automatically detected at compile time" (my first category) into your "manual operations" column, and this mistake has led you to a very incorrect conclusion.

embryo wrote:
Brendan wrote:For the minority of problems that fall into the second and third categories, you can't assume that the code isn't malicious (e.g. deliberately designed to exploit inevitable bugs in the compiler or environment) and hardware security/isolation is a significant part of the defence against this. Also, the security/isolation provided by the hardware is often unavoidable (e.g. paging is required for other reasons that don't involve security/isolation so it costs nothing extra to also use it for security/isolation). For these reasons; for problems in the second category it would be foolish to rely on software checks alone (rather than hardware checks alone, or both software and hardware checks).
Of course we can use hardware protection even within a managed environment. But the question is about the unmanaged environment's insistence on hardware protection only. Everything outside of the hardware protection is supposed to be handed to the programmer, whether it is security, isolation or whatever. Here again I should repeat my automation-hungry position: why should I do those bothersome things myself instead of having some program do them for me?

Because the amount of extra work that "managed" helps you avoid is almost nothing (because very few problems were in my third category) and because you take pride in the quality of the end product (and don't want to push unnecessary overhead onto a large number of end users just because you lack either competence or confidence).

embryo wrote:
Brendan wrote:For bugs in the third category; the additional complexity in the compiler/environment, the overhead of run-time checks, and the performance loss caused by preventing the programmer from using lower level approaches (e.g. inline assembly); combined with the very small number of bugs that fall into the third category; mean that the advantages of software checks at run-time ("managed") are not justified by the disadvantages.
I see some overcomplication here. A lot of complex entities are mixed into one sentence, followed by a short conclusion about "not justified by the disadvantages". But there are too many entities here to draw such a simple conclusion without any detailed explanation.

So I ask a simple question: do you want software to be able to free you from some tedious work? If yes, then it's all about making our development environment managed. It's as simple as that.

Given the choice between an "unmanaged" language that frees me from some tedious work and detects most problems at compile-time, and a managed language that is bloated with boilerplate and hassles that only detects problems at run-time; I'll choose whichever language gives me the freedom I need to produce the highest quality/best performing software because the quality/performance of the end product is more important than me being lazy and not bothering to do my job.

Cheers,

Brendan

Brendan · Post by **Brendan** » Tue Jan 27, 2015 5:50 pm

Hi,

Rusky wrote:
Brendan wrote:For overflows all of the bugs can be found at compile time, and fixed by increasing variable sizes or reducing input values with no (manually inserted or automatically inserted) run-time checks. For statically allocated arrays all the bugs can be found at compile time, and fixed by using ranged types for indexing with no run-time (manually inserted or automatically inserted) checks.
None of these are true, because of (among other things) I/O.

For example, statically-sized stack buffers used to read in or operate on data from the user, the network, the disk, other programs, etc. must be bounds-checked at run-time, either directly through an index range check, or indirectly through ranged types or simply the structure of the code working with the buffer.

Wrong. You're thinking of specific problems with a specific language that don't apply to all possible unmanaged languages.

For example, for the message passing I typically use for my OSs, all IO is transferred via "message buffers". The messages appear as fixed size (2 MiB) arrays of bytes (regardless of actual message size) and the compiler can statically check that you're not reading/writing beyond the end of the 2 MiB array of bytes, and no run-time bounds-checking is needed for any IO of any kind.

For an alternative example, a higher level language could be designed that does IO via something like "myArray << fileDescriptor" and "myArray >> fileDescriptor", where the compiler infers the max. number of bytes to be read/written from the size of the array.
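As a rough illustration of that second idea (this is not Brendan's language, which the thread never shows), Rust's const generics can express "read at most as many bytes as the array holds". The helper name `fill_from` is hypothetical; `std::io::Read::read` is the real standard-library call, and it never writes past the slice it is given.

```rust
use std::io::{self, Read};

/// Reads at most `N` bytes from `src` into `buf`.
/// `N` is part of the array's type, so the upper bound on the transfer
/// is known at compile time. `fill_from` is a hypothetical helper name.
pub fn fill_from<const N: usize>(src: &mut impl Read, buf: &mut [u8; N]) -> io::Result<usize> {
    // `read` copies no more than the slice length; how many bytes actually
    // arrive is still only known at run time and is returned to the caller.
    src.read(&mut buf[..])
}
```

With a 4-byte array and the input `b"hello world"`, one call returns 4 and leaves `hell` in the buffer. The caller still has to look at the returned count before interpreting the bytes.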

Rusky wrote:Ranged types to prevent overflow/overrun can be a great tool to manage those checks, but you often end up with a value whose range is too big, either from I/O or from normal operations on the value. How do you convert it to a type with the correct range? A run-time check, somewhere on the spectrum between manually and compiler-inserted.

As far as I'm concerned, all IO is bytes, and for input those bytes typically need to be converted into another form and validated during that conversion. If you end up with a value whose range is too big, either from I/O or from normal operations, then you have a bug that can and should be detected at compile time.

Rusky wrote:
Brendan wrote:Except "slightly slower" can be a 20 times performance difference
In a properly-written system that is rare and bypassable without turning off the checks by default.

If you can turn the checks off, then it's no longer "managed" (e.g. malicious code can turn the checks off).

Rusky wrote:
Brendan wrote:"accidental remote code execution vulnerabilities" is a straw man.
No, it is the leading result of not doing proper bounds checking, and is thus the biggest problem solved by either compile-time or run-time, manual or enforced or automatic, overflow/bounds checking.

It's the leading result of not doing proper bounds checking, and the leading cause is languages like C that suck. Switch to a better unmanaged language and the problem can disappear (a managed language is *not* necessary).

Cheers,

Brendan

Rusky · Post by **Rusky** » Tue Jan 27, 2015 6:50 pm

Brendan wrote:For example, for the message passing I typically use for my OSs, all IO is transferred via "message buffers". The messages appear as fixed size (2 MiB) arrays of bytes (regardless of actual message size) and the compiler can statically check that you're not reading/writing beyond the end of the 2 MiB array of bytes, and no run-time bounds-checking is needed for any IO of any kind.

The compiler can only statically check that you're not reading/writing outside the 2MiB array if the values you use for indices never originate from or pass through any operations that could push them out of bounds. Ranged types can statically verify when this does happen, but if you ever need to calculate such an index from a value that didn't originate as a numeric literal in the program source, or whose calculation is undecidable at compile time, there must be a check somewhere, whether it's directly "0 <= i < 2MiB" or indirectly "i & 0x1fffff".
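The two forms of check mentioned here can be sketched in Rust as follows. This is only an illustration under stated assumptions: `MSG_SIZE`, `read_checked` and `read_masked` are made-up names, and the buffer is assumed to be exactly 2 MiB long.

```rust
/// Size of one message buffer as described in the thread: 2 MiB.
pub const MSG_SIZE: usize = 2 * 1024 * 1024;

/// Direct check: yields `None` unless `i` is a valid index into `buf`
/// (for a full-size buffer, unless 0 <= i < 2 MiB).
pub fn read_checked(buf: &[u8], i: usize) -> Option<u8> {
    buf.get(i).copied()
}

/// Indirect check: masking with `MSG_SIZE - 1` (0x1f_ffff) forces the index
/// into 0..MSG_SIZE, at the price of silently wrapping out-of-range values.
/// Assumes `buf.len() == MSG_SIZE`; Rust would panic rather than read out of
/// bounds if the slice were shorter.
pub fn read_masked(buf: &[u8], i: usize) -> u8 {
    buf[i & (MSG_SIZE - 1)]
}
```

Either way an instruction runs at run time: a compare-and-branch in the first case, an AND in the second. The masked version cannot fail, but it turns an out-of-range index into a read of the wrong byte instead of an error.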

Brendan wrote:As far as I'm concerned, all IO is bytes, and for input those bytes typically need to be converted into another form and validated during that conversion. If you end up with a value whose range is too big, either from I/O or from normal operations, then you have a bug that can and should be detected at compile time.

That conversion and validation is precisely the sort of run-time check I'm talking about.

Brendan wrote:If you can turn the checks off, then ... malicious code can turn the checks off.

Not necessarily. One example is iterators: normal array access can be bounds-checked by default, but for the case of iterating through an array you can use iterators instead, which fold all the checks into the loop condition, for no overhead. And in any case, you can do things like force untrusted code not to disable the checks, while allowing it in specifically-marked sections of trusted programs the compiler can point out to you.

This is the approach taken by Mozilla Servo, for example: they use a compiler switch to disallow Rust unsafe blocks everywhere but in specific places where they're using it to build safe abstractions (sort of like how you allow the compiler to generate code that would generally be unsafe but is not because of its analysis).
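The iterator point can be shown with a small Rust sketch. The function names are hypothetical, and whether the optimiser also removes the check in the indexed version depends on the compiler.

```rust
/// Indexed loop: each `data[i]` is bounds-checked by default. An optimiser
/// may remove the check here, but the language does not promise that.
pub fn sum_indexed(data: &[u32]) -> u64 {
    let mut total = 0u64;
    for i in 0..data.len() {
        total += u64::from(data[i]);
    }
    total
}

/// Iterator loop: the only test is "has the iterator reached the end?",
/// which doubles as the loop condition, so no separate per-element index
/// check exists to begin with.
pub fn sum_iter(data: &[u32]) -> u64 {
    data.iter().map(|&x| u64::from(x)).sum()
}
```

Both functions return the same result for any slice; the difference lies only in where the bounds reasoning happens.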

Brendan wrote:It's the leading result of not doing proper bounds checking, and the leading cause is languages like C that suck. Switch to a better unmanaged language and the problem can disappear (a managed language is *not* necessary).

If you'll go back and read my posts, you'll notice I never said anything about managed languages being necessary to solve the problem, only about run-time bounds checking being necessary. On the other hand, I did say that run-time bounds checking was a relatively small part of the overhead of a managed language, because you claimed it was a big reason not to use managed languages.

Brendan · Post by **Brendan** » Tue Jan 27, 2015 8:30 pm

Hi,

Rusky wrote:
Brendan wrote:For example, for the message passing I typically use for my OSs, all IO is transferred via "message buffers". The messages appear as fixed size (2 MiB) arrays of bytes (regardless of actual message size) and the compiler can statically check that you're not reading/writing beyond the end of the 2 MiB array of bytes, and no run-time bounds-checking is needed for any IO of any kind.
The compiler can only statically check that you're not reading/writing outside the 2MiB array if the values you use for indices never originate from or pass through any operations that could push them out of bounds. Ranged types can statically verify when this does happen, but if you ever need to calculate such an index from a value that didn't originate as a numeric literal in the program source, or whose calculation is undecidable at compile time, there must be a check somewhere, whether it's directly "0 <= i < 2MiB" or indirectly "i & 0x1fffff".

The range/s of the result of an expression is always decidable at compile time (unless the range/s of variables used within the expression aren't known, which is impossible).

Note 1: A compiler must be able to figure out how much space a variable consumes and whether it's signed/unsigned or floating point. If it doesn't know these things it can't generate code. From the information it must know, the compiler can determine the worst case range of the variable. E.g. if the only thing you know about a variable is that it's an unsigned integer that takes up 2 bytes, then you can determine that all values the variable could possibly hold will be in the range from 0 to 65535.

Note 2: I've said "range/s" here meaning "one range or 2 ranges" (and never more than 2 ranges). Tracking one range is enough for everything except signed division.
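The kind of range tracking described in these notes can be sketched as simple interval arithmetic. This is a toy model under assumptions: the `Interval` type and its methods are hypothetical, only addition and multiplication are covered, and the two-range case for signed division is left out.

```rust
/// Closed integer interval `[lo, hi]`, stored as i64 so that arithmetic on
/// 16-bit operand ranges cannot itself overflow.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Interval {
    pub lo: i64,
    pub hi: i64,
}

impl Interval {
    /// Worst-case range implied by the type alone: an unsigned 2-byte integer.
    pub fn of_u16() -> Self {
        Interval { lo: 0, hi: i64::from(u16::MAX) }
    }

    /// Range of `a + b` given the ranges of `a` and `b`.
    pub fn add(self, other: Interval) -> Interval {
        Interval { lo: self.lo + other.lo, hi: self.hi + other.hi }
    }

    /// Range of `a * b`: the extremes are among the four corner products.
    pub fn mul(self, other: Interval) -> Interval {
        let corners = [
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        ];
        Interval {
            lo: *corners.iter().min().unwrap(),
            hi: *corners.iter().max().unwrap(),
        }
    }

    /// True if every value in `self` also lies in `target`.
    pub fn fits_in(self, target: Interval) -> bool {
        target.lo <= self.lo && self.hi <= target.hi
    }
}
```

Adding two values of range 0 to 65535 gives a result range of 0 to 131070, which no longer fits in 2 bytes, so a checker built on this idea would flag the assignment. What the programmer then does about it (widen the variable, or narrow the inputs somewhere) is exactly what the two sides of this thread disagree on.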

Rusky wrote:
Brendan wrote:As far as I'm concerned, all IO is bytes, and for input those bytes typically need to be converted into another form and validated during that conversion. If you end up with a value whose range is too big, either from I/O or from normal operations, then you have a bug that can and should be detected at compile time.
That conversion and validation is precisely the sort of run-time check I'm talking about.

For example, if my application creates a dialog box asking the user to enter a prime number and the user enters the number 12, then my application has to:

- check to see if the number was a prime number or not
- construct a nice human readable error message if it isn't a prime number
- display that error message in an appropriate way (e.g. another dialog box, and not just spewing it out of STDERR)

For another example, if my HTTP server receives a request that's supposed to contain a file name for a web page, then my HTTP server has to:

- check to see if the file is present/readable
- construct a nice machine readable error packet (e.g. a "404" response)
- send that error packet in an appropriate way (e.g. using sockets, and not just spewing it out of STDERR)

Managed languages are inadequate for the checking, inadequate for constructing a suitable error, and inadequate for delivering the error appropriately. They're awesome!
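The prime-number example might look like this as application code. It is a sketch only: `is_prime` and `parse_prime` are hypothetical names, and the dialog box is reduced to a returned message string.

```rust
/// Trial-division primality test; adequate for small interactive inputs.
pub fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2u64;
    // `d <= n / d` is equivalent to `d * d <= n` but cannot overflow.
    while d <= n / d {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Converts the user's text into a number and validates it in one step,
/// producing a human-readable message when the input is rejected.
pub fn parse_prime(input: &str) -> Result<u64, String> {
    let text = input.trim();
    let n: u64 = text
        .parse()
        .map_err(|_| format!("\"{}\" is not a whole number.", text))?;
    if is_prime(n) {
        Ok(n)
    } else {
        Err(format!("{} is not a prime number; please try again.", n))
    }
}
```

The check, the message and the choice of how to deliver it all live in the application. A generic run-time bounds or type check knows nothing about primes or dialog boxes.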

Rusky wrote:
Brendan wrote:If you can turn the checks off, then ... malicious code can turn the checks off.
Not necessarily. One example is iterators: normal array access can be bounds-checked by default, but for the case of iterating through an array you can use iterators instead, which fold all the checks into the loop condition, for no overhead. And in any case, you can do things like force untrusted code not to disable the checks, while allowing it in specifically-marked sections of trusted programs the compiler can point out to you.

Things we've mistakenly assumed today:

- Iterators are necessary
- Iterators can't be checked at compile time
- End users can tell the difference between trusted code and malicious code just by seeing if (e.g.) the project's makefile tells the compiler to allow its checks to be disabled.

Cheers,

Brendan

Rusky · Post by **Rusky** » Wed Jan 28, 2015 12:03 am

That's a wonderful list of red herrings. Here's what I actually said:

- Regardless of how well-known a range is at compile time, the range can end up outside what it needs to be for the next stage of an operation. This requires a runtime check, whether it got there by direct user input or some other way that may or may not exist in your magical rainbow unicorn language.
- I'm not talking about managed languages, I'm talking about bounds and overflow checking. Unmanaged languages still have to do bounds checks in some form or another, and when they're not done you get security vulnerabilities.
- I never said iterators were necessary, only that they are a tool that can eliminate bounds checks statically, because you complained that run-time bounds checks could cause a 20x performance drop (I'd like a citation for that number, by the way).
- I never said iterators couldn't be compile-time range-checked. In fact (again, are you listening this time?) I used them as an example of how to do exactly that by moving unnecessary run-time checks to compile time.
- I never said that the user would be the one to care about whether code is trusted. In my example of a browser engine, the application developers use the concept of trusted vs untrusted to limit the amount of code they have to check manually. In the context of someone putting together all the pieces of an OS for distribution, they might choose to trust some core libraries but not arbitrary applications, for maintenance and/or performance reasons.
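The first point in the list above, a value whose known range is wider than the next stage accepts, can be sketched with a ranged type whose constructor holds the single run-time check. `Percent` and `remaining` are hypothetical names chosen for illustration.

```rust
/// A value known to lie in 0..=100. The field is private, so the only way
/// to obtain a `Percent` is through `new`, which performs the range check.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Percent(u8);

impl Percent {
    /// The one run-time check: narrows a wide `u32` (range 0..=4294967295)
    /// to the range the next stage requires.
    pub fn new(raw: u32) -> Option<Percent> {
        if raw <= 100 {
            Some(Percent(raw as u8))
        } else {
            None
        }
    }

    pub fn get(self) -> u8 {
        self.0
    }
}

/// Needs no check of its own: the type guarantees `p <= 100`,
/// so the subtraction cannot underflow.
pub fn remaining(p: Percent) -> u8 {
    100 - p.get()
}
```

For example, `Percent::new(30).map(remaining)` gives `Some(70)` and `Percent::new(101)` gives `None`. The types make every later use check-free, but the narrowing itself still happens at run time.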

Roman · Post by **Roman** » Wed Jan 28, 2015 12:54 am

After this thread I became a bit confused about the need for managed code. And a question: why not just include native boundary checking (if we know the exact size of buffers)? Why do we need managed code/VMs to do it? And I disagree with Brendan: it's not always possible to know the size at compile-time, because buffers can be dynamic.

Roman · Post by **Roman** » Wed Jan 28, 2015 1:06 am

Initially, I was going to develop a managed OS, but now (after some more research) I think that superior security can be achieved without performance losses and interpreted code, using hardware security/isolation techniques, if they are used properly.

Combuster · Post by **Combuster** » Wed Jan 28, 2015 1:21 am

Note that whatever the church of Rusky, and in particular the church of Brendan have been preaching, when all other factors are equal toolchain-enforced security always comes at the cost of some performance, and that not having toolchain-enforced security is always less secure because of human error factors.

Strike your own balance.

Brendan · Post by **Brendan** » Wed Jan 28, 2015 1:23 am

Hi,

Rusky wrote:
Regardless of how well-known a range is at compile time, the range can end up outside what it needs to be for the next stage of an operation. This requires a runtime check, whether it got there by direct user input or some other way that may or may not exist in your magical rainbow unicorn language.

The value can end up outside of the variable's range if:

- the compiler/language doesn't support compile time checking to prevent it, even though it's 100% possible to do so; or
- other code (the kernel, a bus mastering driver, the compiler) is buggy; or
- the hardware is faulty

This only "requires" a run-time check if you redefine the word "requires" to mean "if you didn't feel like doing something to prevent it at compile time and actually care in the first place". Note that there are languages (e.g. C) where the language designers didn't care in the first place.

Rusky wrote:
I'm not talking about managed languages, I'm talking about bounds and overflow checking. Unmanaged languages still have to do bounds checks in some form or another, and when they're not done you get security vulnerabilities.

Where "in some form or another" includes just letting the hardware do it (e.g. dereferencing a null pointer), and "security vulnerabilities" is a very unlikely worst case in a properly designed system and isn't even close to a guaranteed outcome for extremely badly designed systems.

Rusky wrote:
I never said that the user would be the one to care about whether code is trusted. In my example of a browser engine, the application developers use the concept of trusted vs untrusted to limit the amount of code they have to check manually. In the context of someone putting together all the pieces of an OS for distribution, they might choose to trust some core libraries but not arbitrary applications, for maintenance and/or performance reasons.

In your example of a browser engine; a managed OS (that relies on a managed language for security/isolation) can not trust the browser because the browser is not "100% managed". Whether or not the malicious code's author thinks you should trust their virus (because they wrote most of the virus in "safe" code and a small part in "unsafe" code) is not particularly relevant when it comes to the OS and/or end user's trust.

Cheers,

Brendan

Roman · Post by **Roman** » Wed Jan 28, 2015 2:33 am

JavaScript/Unsafe code, interpreted by safe managed code, becomes safe too, if the interpreter is coded properly.

willedwards · Post by **willedwards** » Wed Jan 28, 2015 3:54 am

Roman wrote:JavaScript/Unsafe code, interpreted by safe managed code, becomes safe too, if the interpreter is coded properly.

I think we are equating "managed code" to "languages with memory safety" in this thread. Assuming this definition, I think this claim that unsafe code interpreted by safe code is safe is flawed.

When you compile a C program to JavaScript with emscripten, the vulnerabilities in the C program are ported too.

A memory-safe language running on a VM which is itself written in an unsafe language (e.g. the HotSpot JVM is written largely in C++, and most JavaScript engines are written in C/C++) is also vulnerable to memory safety bugs in the VM itself.

A memory-safe language running on a VM written in a memory-safe language is still vulnerable to memory safety bugs in the compiler toolchain itself.

Generally, think of it as a chain only being as strong as the weakest link.
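The emscripten point can be illustrated with a toy model in safe Rust. Everything here is hypothetical: the `Guest` type, the 16-byte memory and the field layout (a name at bytes 0..8, an "admin" flag at byte 8) are invented for the sketch and are not how emscripten lays out memory.

```rust
/// Offset of the guest's "admin" flag, directly after its 8-byte name field.
pub const ADMIN_OFF: usize = 8;

/// A guest program's flat memory, owned by a memory-safe host.
pub struct Guest {
    mem: [u8; 16],
}

impl Guest {
    pub fn new() -> Self {
        Guest { mem: [0; 16] }
    }

    /// Emulates a guest routine that copies `input` into the name field
    /// without checking the field's length, as buggy C code might.
    /// The host only checks against the size of the whole guest memory.
    pub fn guest_set_name(&mut self, input: &[u8]) -> Result<(), &'static str> {
        for (i, &byte) in input.iter().enumerate() {
            let cell = self
                .mem
                .get_mut(i)
                .ok_or("guest trapped: access outside its memory")?;
            *cell = byte;
        }
        Ok(())
    }

    pub fn is_admin(&self) -> bool {
        self.mem[ADMIN_OFF] != 0
    }
}
```

A 9-byte name overwrites the flag, so `is_admin()` becomes true even though the host never performed an out-of-bounds access. The host is protected from the guest; the guest is not protected from its own bug.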

HoTT · Post by **HoTT** » Wed Jan 28, 2015 4:04 am

Brendan wrote:the compiler/language doesn't support compile time checking to prevent it, even though it's 100% possible to do so; or

Let's not forget that array bounds checking is undecidable in the general case. If you tell us that one should use a programming language that statically prevents array out of bounds errors, you need to talk about the tradeoffs as well.

Many languages use bounds checking by default and remove the checks where it can be statically proven they are not needed. Another approach would be to reject all programs where it cannot be proven, which usually requires adding a manual check.

Comes down to the ~same number of run time checks.

However none of this requires managed code. Whatever that is.

embryo · Post by **embryo** » Wed Jan 28, 2015 6:43 am

willedwards wrote:One can argue about the exact definition of a "managed language", and perhaps LISP isn't one, but its an interesting departure from mainstream C for OS development nonetheless

https://github.com/froggey/Mezzano has just been put on github.

Nice OS

I hope its author will announce the OS here and then we can talk about its advantages.

embryo · Post by **embryo** » Wed Jan 28, 2015 7:23 am

HoTT wrote:What exactly makes a language a managed one? I think there are at least two definitions flying around, both constantly changing. The discussion doesn't make much sense this way.

Originally it was Microsoft who coined this term. They assume a runtime engine performs some management tasks.

But it is clear that part of runtime management can be avoided if it is known what exactly a program does. Brendan insists that all management tasks can be avoided at runtime, but my position is that his statement is overconfident, and I try to show where he is missing the point.

Brendan wrote:You've shifted "automatically detected at compile time" (my first category) into your "manual operations" column, and this mistake has led you to a very incorrect conclusion.

But how can I shift something whose name includes the word "automatically" out of a column named "automatic operations"?

Brendan wrote:Given the choice between an "unmanaged" language that frees me from some tedious work and detects most problems at compile-time, and a managed language that is bloated with boilerplate and hassles that only detects problems at run-time; I'll choose whichever language gives me the freedom I need to produce the highest quality/best performing software because the quality/performance of the end product is more important than me being lazy and not bothering to do my job.

Well, let's look closely at those intermixed entities that you have coupled within this reply.

1) It is "managed" vs "unmanaged" languages discussion.
2) It is a discussion about how much work can be performed at compile time.
3) It is a discussion about what would be left to the runtime management.
4) It is about how much managed environments are "bloated with boilerplate and hassles".
5) It is about what existing managed environment compilers can detect at compile time.
6) It is about your understanding of terms in the phrase "highest quality/best performing software".
7) It is about your understanding of the word "freedom" in the context of programming tools.
8) It is the question of what is more important for the majority of developers and how those priorities relate to your vision of the importance of some features of an end product.
9) It is about how far you can extend your fight with laziness and "not bothering to do my job".
10) It is about how eager other developers are to accept your fight with laziness and how long they would agree to follow your standards of quality and performance (in your understanding).

This statement can be accepted as your position declaration, but it in no way can be accepted as a discussion related proof.

For a correct discussion it is important to determine the discussion goal and its space. The 10 mentioned entities define a very broad space and make the goal very unclear.

But I'll try to get the discussion back on track. To do that, I can represent your view of your preferred language as something that prevents all bugs just by applying some checks at compile time. From that definition we can draw some conclusions. First, as I see it, it is impossible to detect all bugs at compile time. If some bugs are still with us, we cannot trust our code at run time. And if we cannot trust our code, then we should take care of its execution and manage the possible outcomes of the uncaught bugs. So my statement is about the need for a managing environment that is able to manage code at run time and prevent its bugs from being too dangerous. Another benefit of the managed environment is its ability to manage software maintenance, software performance monitoring, software bug reporting, the software development cycle, detection of software usage patterns, software run time optimization, enhanced environment security and reliability, hardware independence and most probably something else that I've missed here.

OSDev.org
