Brendan wrote:
In the most general possible terms, I'd question whether very fine grained isolation (e.g. "per object") is worth the extra/unavoidable overhead (regardless of how it's done - with segmentation, managed code or something else); and whether coarse grained isolation (e.g. isolating processes from each other) is a much better compromise between granularity and overhead.

In situations where managed code (in the sense of compiler-verified memory access, not wrt GC et al.) is acceptable, it's actually a pretty easy decision. When your compiler can verify, with no runtime overhead, that all memory accesses are in bounds, why would you not use that method?
Functional Programming (was Pitfalls of lacking a MMU)
Re: Pitfalls of lacking a MMU
Hi,
Rusky wrote:
In situations where managed code (in the sense of compiler-verified memory access, not wrt GC et al.) is acceptable, it's actually a pretty easy decision. When your compiler can verify, with no runtime overhead, that all memory accesses are in bounds, why would you not use that method?

If a compiler can verify, with no runtime overhead, that all memory accesses are in bounds, then:
- The code being compiled is so simple that it doesn't matter what you do, or
- The language used is so restrictive that it's virtually useless for anything of reasonable complexity, or
- The performance of the resulting code is so bad that you wish you could've used a language that does run-time checks instead, or
- You've broken the world record for the most complex (and most error prone) compiler in history
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Pitfalls of lacking a MMU
False. Learn about dependent types.
For example, see this paper, this paper on eliminating array bounds checks with dependent types, and this paper on applying it to imperative languages, along with a lot more of that guy's work. Among other things, he applies dependent types, and with them compile-time array bounds checking, to ML, a widely used and decidedly not useless language. This is done conservatively, meaning it doesn't restrict what the language can do.
Without restricting things to super-simple examples, restricting the language, or over-complicating the compiler, they get performance increases of 12-79%, depending on the algorithm, by eliminating array bounds checks at compile time. Of course, there are still runtime checks of some variables, but only the ones that are necessary and only when they're not redundant. There's plenty of example code in those papers, and it doesn't require lots of arcane type incantations (in fact, some of the work in dependent types is on type inference).
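To make the papers' idea concrete, here is a rough sketch in plain C (not taken from the papers) of the pattern a dependently typed compiler automates: a range value is validated once at its construction, and every access that uses it afterwards relies on that established invariant instead of re-checking. The `range`/`make_range` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: the only runtime check lives in the "constructor";
 * in a dependently typed language the compiler tracks the invariant
 * (start <= end <= array length) in the index's type, so sum_range needs
 * no per-access bounds test. */
typedef struct { size_t start, end; } range;

/* Runtime check at the boundary: returns 0 on failure, 1 on success. */
int make_range(size_t start, size_t end, size_t len, range *out) {
    if (start > end || end > len) return 0;   /* the only check needed */
    out->start = start;
    out->end = end;
    return 1;
}

/* No bounds checks here: a valid range already guarantees safety. */
int sum_range(const int *a, range r) {
    int sum = 0;
    for (size_t i = r.start; i < r.end; i++)
        sum += a[i];                          /* provably in bounds */
    return sum;
}
```

The point is that redundant checks are eliminated: the check runs once, not on every access.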
Re: Pitfalls of lacking a MMU
Hi,
Rusky wrote:
Of course, there are still runtime checks of some variables, but only the ones that are necessary and only when they're not redundant.

Might be a good time to change your forum password. The "Rusky impersonator" who posted the previous message was talking about "When your compiler can verify, with no runtime overhead, that all memory accesses are in bounds", not "When your compiler can verify, with some runtime overhead, that most memory accesses are in bounds".
Cheers,
Brendan
Re: Pitfalls of lacking a MMU
Might be good to learn some computing theory.
It's not overhead if it's actually impossible to get by, securely, without it. The checks that are necessary are for things outside of what the compiler can possibly know in this universe, such as reading random bytes from the network. The compiler verifies, without requiring unnecessary or redundant runtime checks, that all memory access is in bounds, not only some.
Yes, that's all, 100%, every memory access. All of them. They are all verified to be in range at compile time. This is partly accomplished by disallowing them in certain branches where addresses are not guaranteed to be in bounds. Yes, that means there will be runtime checks, but there will not be extra, redundant, overhead-causing ones.
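The branch discipline described above can be shown in a few lines of C. A flow-sensitive checker would accept this function because the array access appears only inside the branch where the index is known to be in bounds; an unguarded `table[i]` would be rejected at compile time. (This is an illustrative sketch, not code from the thread.)

```c
#include <assert.h>

enum { TABLE_SIZE = 4 };
const int table[TABLE_SIZE] = {10, 20, 30, 40};

/* i might be anything, e.g. a byte read from the network. The access is
 * only permitted in the branch where the guard proves it is in bounds. */
int lookup(int i) {
    if (i >= 0 && i < TABLE_SIZE)
        return table[i];   /* in-bounds branch: access allowed */
    return -1;             /* out-of-bounds branch: no access occurs */
}
```

The runtime check exists, but it is the one that is genuinely necessary; nothing redundant is added.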
Re: Pitfalls of lacking a MMU
Hi,
Rusky wrote:
Might be good to learn some computing theory.
It's not overhead if it's actually impossible to get by, securely, without it. The checks that are necessary are for things outside of what the compiler can possibly know in this universe, such as reading random bytes from the network. The compiler verifies, without requiring unnecessary or redundant runtime checks, that all memory access is in bounds, not only some.

Think of it in terms of certainty. If the code says "let i = 3" then the compiler can be reasonably certain that "i == 3". If the documentation for a kernel API function says "This function will only ever return values from 0 to 3", then the programmer can be reasonably certain that the kernel API function will only ever return values from 0 to 3, but the compiler can't/won't know that. For IO, depending on where the data comes from (e.g. reading from an "untrusted" file vs. IPC from another instance of the same executable), the programmer may be certain about the values the data contains, but the compiler can't/won't know. Then there are things that in theory the compiler could know, but in practice doesn't, because they're too complicated to figure out (e.g. no sane programmer will ever expect any compiler to generate 100% optimal code in every possible case, including removing unnecessary checks - 100% optimal code is like absolute zero, in that it exists but isn't achievable in practice).
Rusky wrote:
Yes, that's all, 100%, every memory access. All of them. They are all verified to be in range at compile time. This is partly accomplished by disallowing them in certain branches where addresses are not guaranteed to be in bounds. Yes, that means there will be runtime checks, but there will not be extra, redundant, overhead-causing ones.

I don't think you understand the difference between 100% and 99.9999995%.
Cheers,
Brendan
Re: Pitfalls of lacking a MMU
Brendan wrote:
Think of it in terms of certainty. If the code says "let i = 3" then the compiler can be reasonably certain that "i == 3". If the documentation for a kernel API function says "This function will only ever return values from 0 to 3", then the programmer can be reasonably certain that the kernel API function will only ever return the values from 0 to 3, but the compiler can't/won't know that. For IO, depending on where the data comes from (e.g. reading from an "untrusted" file vs. IPC from another instance of the same executable) the programmer may be certain about the values the data contains, but the compiler can't/won't know.

This is why I gave you those papers. If, instead of merely documenting the return value's range in a comment, you put that restriction in the function's type, the compiler can verify it and then rely on it for further analysis. This is more difficult when e.g. crossing IPC boundaries, but not impossible, depending on the language/compiler design.
Brendan wrote:
Then there's things that in theory the compiler could know, but in practice the compiler doesn't know because it's too complicated to figure out (e.g. no sane programmer will ever expect any compiler to generate 100% optimal code in every possible case, including removing unnecessary checks - 100% optimal code is like absolute zero in that it exists but isn't achievable in practice).

What are some examples of this, in the case of bounds checking? I've given some pretty strong support for dependent types' abilities; you've just asserted that some things are "too complicated" for the compiler.
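Rusky's "range in the type" point can be approximated even in C, where the closest stand-in for a ranged return type is an enum. The function and its name below are hypothetical, invented purely to illustrate the propagation: if the type guarantees a result in 0..3, the caller may index a 4-element array with it and no check is ever needed. In C the guarantee is only a convention; with dependent types the compiler would verify both sides of the contract.

```c
#include <assert.h>

/* Hypothetical API whose type (approximated by an enum) promises a
 * result in 0..3 - the analogue of the "returns 0 to 3" kernel function
 * discussed above. */
typedef enum { Q0 = 0, Q1 = 1, Q2 = 2, Q3 = 3 } quadrant;

quadrant get_quadrant(int x, int y) {
    /* Bit 0: left half-plane; bit 1: lower half-plane. Result is 0..3. */
    return (quadrant)((x >= 0 ? 0 : 1) | (y >= 0 ? 0 : 2));
}

const char *names[4] = { "NE", "NW", "SE", "SW" };

const char *quadrant_name(int x, int y) {
    /* With a ranged return type, this index is provably in [0,4):
     * no runtime bounds check required. */
    return names[get_quadrant(x, y)];
}
```

Note that C itself does not enforce the enum's range, which is exactly Brendan's objection; the sketch only shows what information a stronger type system would carry.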
Re: Pitfalls of lacking a MMU
Hi,
Assume that the caller is in a separate object file.
Rusky wrote:
What are some examples of this, in the case of bounds checking? I've given some pretty strong support for dependent types' abilities, you've just asserted that some things are "too complicated" for the compiler.

Here's one:
Code:
int myArray[1234];

int foo(int index) {
    return myArray[index];
}
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Pitfalls of lacking a MMU
When compiling this code, the compiler reports an error that index can be out of range. To fix it, you either specify in the type of foo that index must be in [0,1234), or you add a check for whether it's in range. No knowledge of other files is necessary.
In the first case, when compiling the caller, you apply the same logic to whatever gets passed to foo. If it comes in from the network, for example, you have to add a runtime check at some point. If it comes from some other function (like that hypothetical API function that returns [0,3]), the compiler can use that information as well.
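Of the two fixes listed above, C can only express the second one directly. Here is a minimal sketch of that option: the check moves inside `foo` and out-of-range callers get a defined result. The first option (declaring in foo's type that index must be in [0,1234)) cannot be written in C at all; in a dependently typed language that declaration would let the compiler discharge this check at compile time wherever the argument is already known to be in range. (`foo_checked` and its -1 error convention are my own naming, not from the thread.)

```c
#include <assert.h>

int myArray[1234];

/* Option 2 from the discussion: a runtime range check inside the
 * function, giving a defined result for bad indices. */
int foo_checked(int index) {
    if (index < 0 || index >= 1234)
        return -1;              /* defined behaviour for bad callers */
    return myArray[index];      /* access only happens when in bounds */
}
```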
I'm beginning to think you didn't so much as glance at any of those papers or even google dependent types...
Code:
int myArray[1234];

int foo(int index) {
    return myArray[index];
}
Re: Pitfalls of lacking a MMU
Hi,
Rusky wrote:
When compiling this code, the compiler reports an error that index can be out of range. To fix it, you either specify in the type of foo that index must be in [0,1234), or you add a check for whether it's in range. No knowledge of other files necessary.
Code:
int myArray[1234];

int foo(int index) {
    return myArray[index];
}

So to avoid having any run-time checks, you force the caller to check (possibly at run-time) that the value it uses for "index" is within range before calling, or you check at run-time inside "foo()" that "index" is within range.

Here's a slight variation:

Code:
int myArray[10];

int foo(int value) {
    return myArray[value & 15];
}

Are you going to write a compiler that has a special integer type for numbers "from 0 to 9, or from 16 to 25, or from 32 to 41, or ..."?

Do I need to get more complex (maybe "return myArray[ sqrt(value) & 251];")?
Rusky wrote:
I'm beginning to think you didn't so much as glance at any of those papers or even google dependent types...

I did glance at the second one briefly. To be perfectly honest, I saw "ML" and put it in the "wanky academic crap" pile.
Cheers,
Brendan
Re: Pitfalls of lacking a MMU
Brendan wrote:
Are you going to write a compiler that has a special integer type for numbers "from 0 to 9, or from 16 to 25, or from 32 to 41, or ..."?
Do I need to get more complex (maybe "return myArray[ sqrt(value) & 251];")?

How about a compiler with general types for integers that satisfy certain conditions? sqrt is trivial to deal with in that way; bitwise operators are a little trickier but still possible, AFAIK. (If not, how useful is indexing an array with the result of a bitwise AND anyway?)
Think of those functions and operators as having more specific types that specify their domain and range more precisely. The compiler will end up having to do some computation, but that's addressed in the literature...
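The range reasoning for bitwise operators is straightforward to state in concrete terms (this sketch is mine, not from the papers): for any value `v`, `v & 15` is provably in [0,16), so indexing a 16-element table with it needs no runtime check at all, while masking into Brendan's 10-element array is exactly the case such a compiler would reject at compile time.

```c
#include <assert.h>

const int table16[16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};

/* The mask itself proves the index is in [0,16), so a range-tracking
 * compiler accepts this access with zero runtime checks. The same
 * expression against a 10-element array would be a compile-time error. */
int lookup16(unsigned v) {
    return table16[v & 15];
}
```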
In any case, my original point still stands that a well-designed managed language can gain a substantial performance increase through dependent types, and that hardware memory protection is unnecessary and harmful in that case.
Brendan wrote:
I did glance at the second one briefly. To be perfectly honest, I saw "ML" and put it in the "wanky academic crap" pile.

If you could get over your pretensions, you would probably sound a bit less silly.
Re: Pitfalls of lacking a MMU
Rusky wrote:
How about a compiler with general types for integers that satisfy certain conditions? sqrt is trivial to deal with in that way; bitwise operators are a little trickier but still possible AFAIK. (If not, how useful is indexing an array with the result of a bitwise and anyway?)

Hash tables (if modulo is overkill)?
Rusky wrote:
If you could get over your pretensions, you would probably sound a bit less silly.

"Rain increases the chance of car accidents. To improve safety, I invented a "functional car" - as soon as it detects the smallest drop of water it jams on the brakes, turns off the engine and refuses to move. People laugh at my "functional car" and call it funny names (like "non-functioning car"). I've been pushing the idea for over 50 years and still can't understand why nobody wants it."
"State increases the chance of programming mistakes. To improve "correctness", I invented "functional programming" - as soon as it detects the smallest amount of state, it spits out an error message and refuses to compile. People laugh at my "functional programming" and call it funny names (like "non-functioning programming"). I've been pushing the idea for over 50 years and still can't understand why nobody wants it."
Cheers,
Brendan
Re: Pitfalls of lacking a MMU
Brendan wrote:
Hash tables (if modulo is overkill)?

Modulo is significantly closer to what you mean (not to mention significantly simpler to handle with dependent types), and any optimizing compiler from the past few decades ought to be able to perform strength reduction on it.
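The strength reduction being referred to is easy to demonstrate: for unsigned values and a power-of-two table size, `hash % 16` and `hash & 15` are identical, so the hash-table index can be written as a modulo (which a range analysis reasons about easily) while the compiler emits the cheap AND. (A small illustration of mine, not code from the thread.)

```c
#include <assert.h>
#include <stdint.h>

/* What you write: a modulo, whose result is trivially in [0,16). */
unsigned bucket_mod(uint32_t hash) { return hash % 16u; }

/* What an optimizing compiler emits after strength reduction:
 * for unsigned operands and a power-of-two divisor, the two agree. */
unsigned bucket_and(uint32_t hash) { return hash & 15u; }
```

Note this equivalence only holds for unsigned operands; for negative signed values, `%` and `&` differ in C.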
Brendan wrote:
"Rain increases the chance of car accidents. To improve safety, I invented a "functional car" - as soon as it detects the smallest drop of water it jams on the brakes, turns off the engine and refuses to move. People laugh at my "functional car" and call it funny names (like "non-functioning car"). I've been pushing the idea for over 50 years and still can't understand why nobody wants it."
"State increases the chance of programming mistakes. To improve "correctness", I invented "functional programming" - as soon as it detects the smallest amount of state, it spits out an error message and refuses to compile. People laugh at my "functional programming" and call it funny names (like "non-functioning programming"). I've been pushing the idea for over 50 years and still can't understand why nobody wants it."

... and there you go again. ML does, in fact, allow mutable state. In addition, one of the papers I linked to applies dependent types to a regular old imperative language. And further, if that's your attitude toward functional programming, I'm sorry.
Re: Pitfalls of lacking a MMU
Hi,
Rusky wrote:
Modulo is significantly closer to what you mean (not to mention significantly simpler to handle with dependent types), and any optimizing compiler from the past few decades ought to be able to perform strength reduction on it.

You're avoiding the issue. How about, to avoid run-time checks, internally the compiler converts "int foo(int value)" to "int_with_standard_range foo(int_that_accepts_values_1to3_45_23_1983_6to15_4321to4444_elephant_and_69)", in a similar way to the name mangling C++ does to ensure type correctness during linking?
Rusky wrote:
... and there you go again. ML does, in fact, allow mutable state. In addition, one of the papers I linked to applies dependent types to a regular old imperative language. And further, if that's your attitude toward functional programming, I'm sorry.

You're right - to avoid confusion I should've said "pure functional programming" instead.
In the same way, I should've said "pure functional car", so people don't get confused and think I'm talking about cars that were wrapped in plastic to work-around the idiocy of the primary design goal (preventing people from driving in the rain).
The third paper applies dependent types to some contrived language they had to invent specifically for the purpose - it's hardly "regular" or "old".
Cheers,
Brendan
- Combuster
Re: Pitfalls of lacking a MMU
You both seem to have a problem:
Haskell for example has a special "monad" object called IO that indicates system state.
As for array indices, it is possible as per the proof I posted earlier that you can generate the least amount of checks needed for any terminating algorithm. It is however completely infeasible to do so as it has not yet turned out to be achievable in polynomial time. Therefore compilers use proving techniques (with heuristics to find correctness proofs) to eliminate many access checks. If you have loops over arrays you can for instance check that the index is bounded by zero and the array size, and that the array size does not change during the course of the loop. This becomes more and more difficult as the amount of code in the loop changes and becomes impossible if the effects of called code on any argument becomes unavailable. Whole-program optimizers can pull the boundary of knowledge over the file border and make completions of correctness proofs more feasible.
But in general, compiler-generated access checking is not efficient at all for any sufficiently complex code.
Brendan wrote:
I should've said "pure functional programming" instead.

Rusky wrote:
ML does, in fact, allow mutable state.

Every single functional language allows state, which works as follows:
Code:
/* "state" stands for the whole mutable world, threaded through explicitly: */
typedef struct { int world; } state;

typedef struct
{
    state a;        /* the new state, returned alongside the results */
    int return_1;
    int return_2;
} reply_data;

reply_data pure_functional_function(const state s, const int arg_1, const int arg_2);
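The loop case Combuster describes above can be made concrete (my sketch, not his code): the index is bounded below by zero and above by the array length, and the length does not change inside the loop, so a prover can discharge every access with one proof for the whole loop instead of a check per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Loop invariant: 0 <= i < len, and len is not modified in the body.
 * A compiler that proves this once eliminates the per-iteration bounds
 * check on a[i] entirely. */
int sum(const int *a, size_t len) {
    int total = 0;
    for (size_t i = 0; i < len; i++)
        total += a[i];      /* covered by the single loop-wide proof */
    return total;
}
```

As the quoted post notes, this gets harder as the loop body grows and becomes impossible once the effects of called code on the array are unknown, which is where whole-program analysis helps.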