Page 5 of 9

Re: Self hosting OS/Compiler from scratch

Posted: Thu Jan 23, 2014 11:59 am
by brunexgeek
HoTT wrote: What do you expect to gain from this? Keep in mind: Java has no value types except for primitives, it's semantic relies on a garbage collector, so no way around it, it heap allocates everything, so you'll need a sophisticated optimizer _before_ generating c-code, last time I looked the layout of java classes was not specified etc ..
I expect, by example, create a library in Java and allow use it in programs in other languages (C, python, etc.). Internally the library can take advantage of the OO language and still compatible with C. Should be the same what you do when creating a C++ library.

About the C data types, I could use primitives and structures at least. C structures and pointers could be handled via abstractions as is done in JNA. I don't understand the point about the GC.

Re: Self hosting OS/Compiler from scratch

Posted: Thu Jan 23, 2014 12:10 pm
by brunexgeek
Kevin wrote:Did you consider using gcj? Then you don't have the unnecessary step with C in the middle, but simply another GCC backend.
Yes, I did. But initially I wanted a simple compiler, using a small subset of Java syntax. Thus I considered simpler to implement a parser itself. You can imagine the C code generator as a backend that can be replaced later. Moreover, I planned inject meta-information about the classes in the binary file (e.g. custom ELF section) and AFAIK GCJ don't allows that.
Kevin wrote:Interfacing between C and Java is not completely straightforward with gcj because Java can't use C types, but you can declare things as extern "Java" and use them in C(++) code.
AFAIK GCJ use a C++ scheme to generate Java native code (as if it convert Java to C++ and compile). This "extern" keyword allows to export methods as a C functions too or just C++ classes?

Re: Self hosting OS/Compiler from scratch

Posted: Thu Jan 23, 2014 1:58 pm
by Brendan
Hi,
skeen wrote:
Brendan wrote:Surely if one level of stupidity makes things simpler, then 2 levels of stupidity should make things twice as simple. I mean; why write code (templates) that produce code at compile time when you could write code (meta-templates) that produce code (templates) that produce code at compile time?

What we really should be doing is using something like Perl to generate meta-templates that generate templates that generate code; so that we can have a "factory factory factory" and make everything 3 times as simple!
"Everything Should Be Made as Simple as Possible, But Not Simpler" - Quote; Some smart guy

So what you're suggesting against, is that we add a factories which;
  • * Translates from assembly to machine code (one level of stupidity)
    * Translates from some low level portable code to assembly (two levels of stupidity)
    * Translates from some high level code to some low level portable code (three levels of stupidity)
I honestly believe that these three levels of stupidity has helped me out a lot in dealing with complexity alike. I honestly prefer writing C++ over hex, but that could just be my lack of experience 'programming?' in hex, also I guess that portability is going to be an issue with hex.

I do get what your trying to say, but my point is that, we should settle at the abstraction we find the best. If you're happy dealing with hex, then do so, however if you'd rather deal with C, then do so. I personality dislike reinventing the wheel each time I need a list, and hence I like writing generic code. Heck I enjoy writing meta-template programming; I haven't really done a lot of meta-meta programming though.

But if someone was to do meta-n,...,meta-programming and was comfortable doing that, then I'd not be telling them they were doing something wrong; I'd be amazed!

Imagine a triangle like this:

Code: Select all

                  |\
                  |_\
                  |  \
  Skill/knowledge |   \
                  |    \
                  |     \
                  |      \
                  |_______\
                   Number of people
A huge area (from the bottom of the triangle up) represents about 6.5 billion people who have no experience programming at all. At the tip, most of that small area represents normal programmers (ranging in skill). A very tiny part at the very top of the triangle represents people that understand the more advanced things (like template meta-programming). The highest point of the triangle represents a single person who is likely to be smarter than Einstien.

Near the bottom you've got (e.g.) 4 year-olds that understand concepts like conditions "if (you don't eat that spinnach) { you get no ice cream for dessert }". Above them you've got 10 year-olds that understand basic algebra like "x = y*2 + k", and can easily understand concepts like loops and functions.

Now have a look at something like Etoys. I've seen videos (here's one done by Alan Kay himself) where kids are doing programming to model the behaviour of thousands of ants; not because they're learning programming, but because they're learning about ants. On the other hand, we've got people spending tens of thousands of dollars and 4 years (or more) of time to go to university, not to learn about complex things (like the biology of ants that leads to complex group behaviors) but to learn simple things (like programming).

Stupid people like Bjarne Stroustrup and Simon Peyton Jones try their hardest to destroy programming for 95% of the people on earth; and amazing people like Alan Kay are trying their hardest to bring programming to 100% of the people on earth.


Please understand that there are 2 completely different (and separate) types of complexity. I'm talking about the complexity of programming languages above (which has nothing to do with the inherent complexity of large projects). I was not talking about the inherent complexity of large projects (which has nothing to do with the complexity of programming languages).

To deal with the inherent complexity of large projects, you need software engineering tools and not programming tools. These are things like UML diagrams, and breaking large/complex systems into smaller/simpler things when the system is being designed (before anything is implemented at all). People that think things like generics are going to help with the "inherent complexity of large projects" problem are trying to solve the wrong problems in the wrong tools - these things do not help at all, they only make the "complexity of programming languages" problem worse.
skeen wrote:
Brendan wrote:What I do think is that the benefits are not worth the increase in difficulty/complexity; regardless of how large/small the increase is.
Just because you don't see the benefits of generics, does this mean it's always bad?
No. For example, stabbing a stranger in the face is almost always bad, but there are cases (e.g. a terrorist about to detonate a bomb and kill thousands of people) where it could be a very good thing. In the same way, generics are almost always bad, but there are cases (e.g. a terrorist trying to program a nuclear missile guidance system to use for killing thousands of people) where it could be a very good thing.

Also note that I *do* see the benefits of generics. In my eyes these benefits are small. I also see the disadvantages. In my eyes the disadvantages are massive.


Cheers,

Brendan

Re: Self hosting OS/Compiler from scratch

Posted: Thu Jan 23, 2014 8:22 pm
by Owen
Brendan wrote:
skeen wrote:Why would it break with strings? - The idea of a container is mainly that you store the data, and is able to retrieve it later. Whether I hand you back an int, or a string doesn't matter a lot.
Why wouldn't it break with strings? For integers, something like "if(400 < 100)" makes sense. For strings, something like "if( "400 BC" < 100)" has several different (incompatible) possibilities for what is/isn't considered correct behaviour (is the string less than 100 characters, is the year 400 BC before the year 100 AD, does the pointer point to a virtual address that's lower than 100, etc).
Spoken like a person who hasn't done the research.

Firstly, I'm going to assume you meant if("400 BC" < "100"), because anything else makes no sense (How did you manage to pass an int to a method that takes a string?)

In C++, the third template argument is the type of a comparator. The default value for this parameter is std::less<KeyType>, which is a simple functor* in the form

Code: Select all

template<typename T>
struct less {
    static inline bool operator()(const T& l, const T& r) { return l < r; }
};
Lets say you needed to do something special. I don't know what? Maybe you want your map to be case insensitive?

In that case you'd implement something like

Code: Select all

struct less_case_insensitive {
    static inline bool operator()(const std::string& l, const std::string& r) {
        return stricmp(l.c_str(), r.c_str()) < 0;
    }
};
and then use something like std::map<std::string, whatever, less_case_insensitive>.

Understand: Templates and generics exist because sometimes you want to do something that the language doesn't natively support, and believe me large innovations have come from that direction.

There is a sense in the C++ community that, yes, template programming and more so meta programming are "advanced guru" level stuff. Template programming (especially container classes) isn't too bad, meta programming is pretty complex. There is work under way to simplify matters, for example a feature which has made it into C++14 (if memory serves) is that you can declare function parameters with type "auto" and the compiler will infer a template, and there is desire to simplify matters.

But C++ can never be simple. The feature which made it a success (C compatibility) has many, many implications on its' ability to be simple. Plus, it has done lots to harm the image of C++ (lots of bad C++ codebases come from C developers treating it as 'just' extended C).

Its' one of the reasons, IMO, that Rust has been smart by going for new syntax. It makes it feel different, which means you're prepared for the different way things work.

I think you could learn quite a bit from Rust. Its' owned and borrowed pointers plug the memory safety holes in your language (Though implementing them is quite a lot of work, as I gather...)

* This takes the form of a class for unfortunate reasons due to C++'s low level nature and C compatibility

Re: Self hosting OS/Compiler from scratch

Posted: Thu Jan 23, 2014 9:13 pm
by Brendan
Hi,
Owen wrote:
Brendan wrote:
skeen wrote:Why would it break with strings? - The idea of a container is mainly that you store the data, and is able to retrieve it later. Whether I hand you back an int, or a string doesn't matter a lot.
Why wouldn't it break with strings? For integers, something like "if(400 < 100)" makes sense. For strings, something like "if( "400 BC" < 100)" has several different (incompatible) possibilities for what is/isn't considered correct behaviour (is the string less than 100 characters, is the year 400 BC before the year 100 AD, does the pointer point to a virtual address that's lower than 100, etc).
Spoken like a person who hasn't done the research.
Spoken like someone that has no idea what was being talked about in the first place. We were talking about a template that was written for integers only, with an unspecified purpose and unknown number of parameters, that someone decided to use for strings.
Owen wrote:Firstly, I'm going to assume you meant if("400 BC" < "100"), because anything else makes no sense (How did you manage to pass an int to a method that takes a string?)
Bad assumption. This would've originally been something like "if(x < 100)" until someone decided they can use the template for strings too and ended up with 'x' being the string "400 BC".
Owen wrote:In C++, the third template argument is the type of a comparator.
Wrong. The third template argument is whatever whoever wrote the template felt using (assuming it exists in the first place, which we can't).
Owen wrote:There is a sense in the C++ community that, yes, template programming and more so meta programming are "advanced guru" level stuff. Template programming (especially container classes) isn't too bad, meta programming is pretty complex. There is work under way to simplify matters, for example a feature which has made it into C++14 (if memory serves) is that you can declare function parameters with type "auto" and the compiler will infer a template, and there is desire to simplify matters.

But C++ can never be simple. The feature which made it a success (C compatibility) has many, many implications on its' ability to be simple. Plus, it has done lots to harm the image of C++ (lots of bad C++ codebases come from C developers treating it as 'just' extended C).
It's worse than that - new versions of C++ have to be compatible with C and all previous versions of C++.

Also note that (for this topic in general) we're talking about writing/designing a new language. C++ is just used an example of things you wouldn't want in a new language.
Owen wrote:Its' one of the reasons, IMO, that Rust has been smart by going for new syntax. It makes it feel different, which means you're prepared for the different way things work.

I think you could learn quite a bit from Rust. Its' owned and borrowed pointers plug the memory safety holes in your language (Though implementing them is quite a lot of work, as I gather...)
I don't know Rust. I took a look at a google selected Rust pointer guide and all I can imagine is everyone using the wrong type of pointer all the time. Also note that Rust uses GC, where I don't.


Cheers,

Brendan

Re: Self hosting OS/Compiler from scratch

Posted: Sat Jan 25, 2014 9:23 am
by Owen
Brendan wrote:
Owen wrote:Spoken like a person who hasn't done the research.
Spoken like someone that has no idea what was being talked about in the first place. We were talking about a template that was written for integers only, with an unspecified purpose and unknown number of parameters, that someone decided to use for strings.
So lets say it wasn't written using a comparator parameter.

It still works because strings define operator< (Come up with a good reason for not allowing overloading of that...)
Brendan wrote:
Owen wrote:Firstly, I'm going to assume you meant if("400 BC" < "100"), because anything else makes no sense (How did you manage to pass an int to a method that takes a string?)
Bad assumption. This would've originally been something like "if(x < 100)" until someone decided they can use the template for strings too and ended up with 'x' being the string "400 BC".
And why exactly was there a constant 100 in there?
Brendan wrote:
Owen wrote:In C++, the third template argument is the type of a comparator.
Wrong. The third template argument is whatever whoever wrote the template felt using (assuming it exists in the first place, which we can't).
OK: The third template argument is traditionally a comparator. Happy?
Brendan wrote:
Owen wrote:There is a sense in the C++ community that, yes, template programming and more so meta programming are "advanced guru" level stuff. Template programming (especially container classes) isn't too bad, meta programming is pretty complex. There is work under way to simplify matters, for example a feature which has made it into C++14 (if memory serves) is that you can declare function parameters with type "auto" and the compiler will infer a template, and there is desire to simplify matters.

But C++ can never be simple. The feature which made it a success (C compatibility) has many, many implications on its' ability to be simple. Plus, it has done lots to harm the image of C++ (lots of bad C++ codebases come from C developers treating it as 'just' extended C).
It's worse than that - new versions of C++ have to be compatible with C and all previous versions of C++.

Also note that (for this topic in general) we're talking about writing/designing a new language. C++ is just used an example of things you wouldn't want in a new language.
Actually, bits get deprecated and dropped.
Brendan wrote:
Owen wrote:Its' one of the reasons, IMO, that Rust has been smart by going for new syntax. It makes it feel different, which means you're prepared for the different way things work.

I think you could learn quite a bit from Rust. Its' owned and borrowed pointers plug the memory safety holes in your language (Though implementing them is quite a lot of work, as I gather...)
I don't know Rust. I took a look at a google selected Rust pointer guide and all I can imagine is everyone using the wrong type of pointer all the time. Also note that Rust uses GC, where I don't.
As has already been stated:
  • Rust does not have GC (Obsolete guide is obsolete)
  • "Everybody using the wrong pointer all the time" isn't possible (your code won't compile)

Re: Self hosting OS/Compiler from scratch

Posted: Sat Jan 25, 2014 12:41 pm
by Brendan
Hi,
Owen wrote:
Brendan wrote:Spoken like someone that has no idea what was being talked about in the first place. We were talking about a template that was written for integers only, with an unspecified purpose and unknown number of parameters, that someone decided to use for strings.
So lets say it wasn't written using a comparator parameter.

It still works because strings define operator< (Come up with a good reason for not allowing overloading of that...)
What does the '<' operator do for strings? Maybe the original (integer) version was doing "if(x < 100)" because they're printing "Customer ref number: %u" on a fixed width label and numbers with more than 2 digits caused line wrap and need to be shifted to the next line. Maybe the original (integer) version was doing "if(x < y)" to decide if 'x' should be before or after something else; and you want strings containing names to be in alphabetical order, strings containing years to be in chronological order and strings containing CPU names to be in order of when they were released.

The original idea was that templates are easy, because you can just convert a function that used one specific data type into a template that works with any data type simply by changing the data type to T (without caring about anything else). Integer and string were only silly examples - my actual sentence used "(.e.g) integer" and "(e.g.) string", but perhaps that wasn't enough to prevent people focusing on an irrelevant throw-away example and ignoring the main point (which was that you can't convert functions to templates and assume the template will actually work for other data types). Maybe the original function was designed for integers and now someone is using the resulting template for some "arbitrary precision rational number" type. Maybe the original function used bool and now someone is using the template with a complex structure or class. Maybe you're lucky and it will work for all data types, but maybe you weren't lucky and it won't. You can't just assume it will work.
Owen wrote:
Brendan wrote:
Owen wrote:Firstly, I'm going to assume you meant if("400 BC" < "100"), because anything else makes no sense (How did you manage to pass an int to a method that takes a string?)
Bad assumption. This would've originally been something like "if(x < 100)" until someone decided they can use the template for strings too and ended up with 'x' being the string "400 BC".
And why exactly was there a constant 100 in there?
The constant 100 is there because it happened to make sense for the original "integer only" function, before someone decided to convert it into a template by only changing the "int" data type to "T" (and not changing the constant 100).
Owen wrote:
Brendan wrote:
Owen wrote:Its' one of the reasons, IMO, that Rust has been smart by going for new syntax. It makes it feel different, which means you're prepared for the different way things work.

I think you could learn quite a bit from Rust. Its' owned and borrowed pointers plug the memory safety holes in your language (Though implementing them is quite a lot of work, as I gather...)
I don't know Rust. I took a look at a google selected Rust pointer guide and all I can imagine is everyone using the wrong type of pointer all the time. Also note that Rust uses GC, where I don't.
As has already been stated:
  • Rust does not have GC (Obsolete guide is obsolete)
  • "Everybody using the wrong pointer all the time" isn't possible (your code won't compile)
A language that was only released 2 years ago has obsolete guides already, because fundamental feature (memory management) was changed?

All I can imagine is everyone using the wrong type of pointer all the time (and getting compile time errors all the time).


Cheers,

Brendan

Re: Self hosting OS/Compiler from scratch

Posted: Sat Jan 25, 2014 2:49 pm
by Owen
Brendan wrote:
Owen wrote:
Brendan wrote:Spoken like someone that has no idea what was being talked about in the first place. We were talking about a template that was written for integers only, with an unspecified purpose and unknown number of parameters, that someone decided to use for strings.
So lets say it wasn't written using a comparator parameter.

It still works because strings define operator< (Come up with a good reason for not allowing overloading of that...)
What does the '<' operator do for strings? Maybe the original (integer) version was doing "if(x < 100)" because they're printing "Customer ref number: %u" on a fixed width label and numbers with more than 2 digits caused line wrap and need to be shifted to the next line. Maybe the original (integer) version was doing "if(x < y)" to decide if 'x' should be before or after something else; and you want strings containing names to be in alphabetical order, strings containing years to be in chronological order and strings containing CPU names to be in order of when they were released.

The original idea was that templates are easy, because you can just convert a function that used one specific data type into a template that works with any data type simply by changing the data type to T (without caring about anything else). Integer and string were only silly examples - my actual sentence used "(.e.g) integer" and "(e.g.) string", but perhaps that wasn't enough to prevent people focusing on an irrelevant throw-away example and ignoring the main point (which was that you can't convert functions to templates and assume the template will actually work for other data types). Maybe the original function was designed for integers and now someone is using the resulting template for some "arbitrary precision rational number" type. Maybe the original function used bool and now someone is using the template with a complex structure or class. Maybe you're lucky and it will work for all data types, but maybe you weren't lucky and it won't. You can't just assume it will work.
Your original example was a data structure, at least as far as I could tell, an associative map

Obviously you can't just take any arbitrary function and make it templated.

Seriously, whats the point in arguing if you're going to just go and turn the argument into an arbitrary straw-man?
Brendan wrote:
Owen wrote:
Brendan wrote:I don't know Rust. I took a look at a google selected Rust pointer guide and all I can imagine is everyone using the wrong type of pointer all the time. Also note that Rust uses GC, where I don't.
As has already been stated:
  • Rust does not have GC (Obsolete guide is obsolete)
  • "Everybody using the wrong pointer all the time" isn't possible (your code won't compile)
A language that was only released 2 years ago has obsolete guides already, because fundamental feature (memory management) was changed?

All I can imagine is everyone using the wrong type of pointer all the time (and getting compile time errors all the time).
A language which appeared with version number 0.1 2 years ago, and an explicit warning that everything was still subject to change. That has maintained that warning till recently (Now the changes being made are smaller and more incremental, dedicated to finalizing the language for a v1.0 with stable syntax)

And no, something fundamental hasn't changed. GC was always optional, its' just been made more so by moving it into a library

Well, now everything isn't still subject to change. The GC is gone* (As of the latest release - but note that it was deprecated in the previous release)

(* Actually its there, but disabled by default, because its replacement - the Gc<T> type - uses it internally for now)

Rust has two types of pointers - Owned (I.E. delete when this pointer's lifetime ends) and borrowed (Intrinsically limited in lifetime to the lifetime of the object they were borrowed from - that is, if you borrow from an owned pointer, the compiler will statically verify that the borrow cannot outlive the owned pointer's lifetime)

Well, it has a third - unsafe - which have to be used inside unsafe blocks and exist so you can write various bits of plumbing (the std::Rc<T> reference counted type is implemented using unsafe pointers, for example), but that one comes with the proviso of be careful.

Re: Self hosting OS/Compiler from scratch

Posted: Sat Jan 25, 2014 3:48 pm
by Brendan
Hi,
Owen wrote:
Brendan wrote:
Owen wrote:It still works because strings define operator< (Come up with a good reason for not allowing overloading of that...)
What does the '<' operator do for strings? Maybe the original (integer) version was doing "if(x < 100)" because they're printing "Customer ref number: %u" on a fixed width label and numbers with more than 2 digits caused line wrap and need to be shifted to the next line. Maybe the original (integer) version was doing "if(x < y)" to decide if 'x' should be before or after something else; and you want strings containing names to be in alphabetical order, strings containing years to be in chronological order and strings containing CPU names to be in order of when they were released.

The original idea was that templates are easy, because you can just convert a function that used one specific data type into a template that works with any data type simply by changing the data type to T (without caring about anything else). Integer and string were only silly examples - my actual sentence used "(.e.g) integer" and "(e.g.) string", but perhaps that wasn't enough to prevent people focusing on an irrelevant throw-away example and ignoring the main point (which was that you can't convert functions to templates and assume the template will actually work for other data types). Maybe the original function was designed for integers and now someone is using the resulting template for some "arbitrary precision rational number" type. Maybe the original function used bool and now someone is using the template with a complex structure or class. Maybe you're lucky and it will work for all data types, but maybe you weren't lucky and it won't. You can't just assume it will work.
Your original example was a data structure, at least as far as I could tell, an associative map

Obviously you can't just take any arbitrary function and make it templated.

Seriously, whats the point in arguing if you're going to just go and turn the argument into an arbitrary straw-man?
This part of the conversation began with skeen saying this:

"However I do think you're exaggerating, on how advanced it is to write templates; for containers it's mostly replacing your payload data type with T. For template meta-programming, then yes, things may seem somewhat advances, but this is mainly because you leave OOP and procedural programming behind, and venture into the world of functional programming (which is truly a challenge if you never been there before).'

Skeen might have been right for containers, but not all templates are containers. I was more interested in "all templates" rather than some restricted subset, and replied with this:

"If all you do is replace your payload data type with T, then you've probably ended up with a template that works with some data types (e.g. integers) and fails with others (e.g. strings); and while you may think that was easy some other sucker that tries to use it only to find it doesn't work probably won't think it's so easy."

Now; skeen was talking about containers, but never mentioned any specific type of container (and certainly didn't mention associative maps). My comment was mostly about templates in general (rather than just one use of templates) and didn't mention any containers (and certainly didn't mention associative maps). For the entire topic (not just this part of the conversation) I'm quite sure that nobody has mentioned "associative map" anywhere at all.

Basically; you didn't know what you were responding to and now (by saying "Obviously you can't just take any arbitrary function and make it templated") you're agreeing with what I was saying in the first place. Of course facts are inconvenient, so you accuse me of making a straw-man while at the same time pulling "associative maps" out of some undisclosed magic orifice. Nice job.


Cheers,

Brendan

Re: Self hosting OS/Compiler from scratch

Posted: Sat Jan 25, 2014 5:02 pm
by Kevin
Brendan wrote:"If all you do is replace your payload data type with T, then you've probably ended up with a template that works with some data types (e.g. integers) and fails with others (e.g. strings); and while you may think that was easy some other sucker that tries to use it only to find it doesn't work probably won't think it's so easy."
The solution is that you tell the compiler what the operations are that the generic function/class/whatever requires, and when you specialise it, you tell it what the implementation of this operation is.

Some source says more than a thousand words, so here we go (it has become a bit lengthy, but I wanted to keep it a complete compilable test program and use both integers, string and something a bit more interesting for the third example):

Code: Select all

with Ada.Text_IO;
with Ada.Integer_Text_IO;

use Ada.Text_IO;
use Ada.Integer_Text_IO;

procedure test is

    generic
        type T is private;
        type TArray is Array(Natural range <>) of T;
        with function ">" (a: T; b: T) return boolean is <>;
    procedure sort(x: in out TArray);

    procedure sort(x: in out TArray) is
        tmp: T;
    begin
        for i in reverse x'first .. x'last loop
            for j in 0 .. i - 1 loop
                if (x(j) > x(j + 1)) then
                    tmp := x(j);
                    x(j) := x(j + 1);
                    x(j + 1) := tmp;
                end if;
            end loop;
        end loop;
    end;


    -- Instantiate it for integers (using the normal > operator)
    type IntArray is Array (Natural range <>) of Integer;
    procedure IntegerSort is new sort(T => Integer, TArray => IntArray);

    x: IntArray := (1337, 20, 11, 5, 13);

    -- Instantiate it for strings (using the normal > operator)
    subtype MyString is String(1..3);
    type StrArray is Array(Natural range <>) of MyString;
    procedure StringSort is new sort(T => MyString, TArray => StrArray);

    y: StrArray := ("foo", "bar", "baz");

    -- Now some fun with structs that are sorted by the sum of their fields
    -- Not using operator overloading here to make Brendan happy.
    type TestStruct is record
        a, b: Integer;
    end record;

    function StructCompare(a, b: in TestStruct) return Boolean is
    begin
        return a.a + a.b > b.a + b.b;
    end;

    type StructArray is Array(Natural range <>) of TestStruct;
    procedure StructSort is new sort(T => TestStruct, TArray => StructArray,
                                     ">" => StructCompare);

    z: StructArray := ((a => 12, b => 7),
                       (a => 5,  b => 1),
                       (a => 42, b => 17));

begin
    IntegerSort(x);
    for i in x'range loop
        Put(x(i), 0);
        New_Line;
    end loop;

    StringSort(y);
    for i in y'range loop
        Put_Line(y(i));
    end loop;

    StructSort(z);
    for i in z'range loop
        Put(z(i).a, 0);
        Put(", ");
        Put(z(i).b, 0);
        New_Line;
    end loop;
end;
This is generics, and in my book it isn't code generation any more than the use of procedures instead of copying code is code generation (in the end it all generates assembly, but that's not really relevant for me as a user). I also can't see what is so bad, complex or error prone about it. In fact, it reduces complexity by having only one copy of the sort algorithm, and the moment someone cares enough to fix it to use something more efficient, all cases will take advantage from it at once.

So I consider this a valuable language feature.

Re: Self hosting OS/Compiler from scratch

Posted: Sat Jan 25, 2014 7:01 pm
by Brendan
Hi,
Kevin wrote:
Brendan wrote:"If all you do is replace your payload data type with T, then you've probably ended up with a template that works with some data types (e.g. integers) and fails with others (e.g. strings); and while you may think that was easy some other sucker that tries to use it only to find it doesn't work probably won't think it's so easy."
The solution is that you tell the compiler what the operations are that the generic function/class/whatever requires, and when you specialise it, you tell it what the implementation of this operation is.

Some source says more than a thousand words, so here we go (it has become a bit lengthy, but I wanted to keep it a complete compilable test program and use both integers, string and something a bit more interesting for the third example):
I don't know if this is confusing because it's in an "alien" language that few people understand (Ada?), or if it's confusing because it uses generics, or both. I suspect both; but I also assume that I would have been able to understand it (despite not having a clue about the language) if you didn't use generics.
Kevin wrote:This is generics, and in my book it isn't code generation any more than the use of procedures instead of copying code is code generation (in the end it all generates assembly, but that's not really relevant for me as a user).
As far as I'm concerned, the compiler has to figure out how each generic is being used, generate functions for each usage, then compile the auto-generated functions. That sound a lot like code generation to me.
Kevin wrote:I also can't see what is so bad, complex or error prone about it. In fact, it reduces complexity by having only one copy of the sort algorithm, and the moment someone cares enough to fix it to use something more efficient, all cases will take advantage from it at once.
Nonsense. Your "only one copy" ends up being multiple specialisations (duplication) and/or operator overloading (more mess) and/or relying on function pointers (that point to different pieces of code and therefore doesn't avoid duplication).

The moment anyone tries to optimise it for one specific case they break it or make it worse for other different cases. For example, the "sorting an array" example you've posted is fine for small arrays but bad for huge arrays (twice as much memory), but you can't just change it to something like quicksort (where a second array isn't needed) without breaking the semantics and breaking existing code (that expects the source array to be unmodified). In the same way, you can't use threads to make it faster for huge arrays without making it slower for small arrays.


Cheers,

Brendan

Re: Self hosting OS/Compiler from scratch

Posted: Sun Jan 26, 2014 3:24 am
by Kevin
Brendan wrote:I don't know if this is confusing because it's in an "alien" language that few people understand (Ada?), or if it's confusing because it uses generics, or both. I suspect both; but I also assume that I would have been able to understand it (despite not having a clue about the language) if you didn't use generics.
Yes, the choice of an "alien" language was intended. We're discussing the usefulness of programming language features in general, and I felt this discussion somehow got a wrong focus on C++. And seeing something new can't hurt, right? Anyone around here, especially those as clever as you, should be able to understand the program.

So if you claim that you can't understand the code (I think it's more likely you don't want to) and suspect that it's because of generics, let me repeat the lines that are related to generics, so that you can tell me which part in specific you find confusing. Perhaps we can find a better solution of it for our imaginary programming language then.

The first couple of lines related to generics is the declaration of the generic procedure:

Code: Select all

    generic
        type T is private;
        type TArray is Array(Natural range <>) of T;
        with function ">" (a: T; b: T) return boolean is <>;
    procedure sort(x: in out TArray);
The rest are the specialisation, let me pick a random one:

Code: Select all

    procedure StructSort is new sort(T => TestStruct, TArray => StructArray,
                                     ">" => StructCompare);
So, which of the both (or even both?) are confusing to you?
As far as I'm concerned, the compiler has to figure out how each generic is being used, generate functions for each usage, then compile the auto-generated functions. That sound a lot like code generation to me.
Well, yes, like I said, obviously compilers do generate code. That's not really news.

I guess the difference to make is between "stupid", for example text-based, code generation like with C macros and more intelligent code generation that fits in the type system, is explicit about required operations etc. The code generation with this one is comparable to the code generation with inline functions. Yes, it ends up as multiple copies in the binary, but in the source code, there is only one copy to be maintained. The complexity in the source code is what a programmer has to worry about (and in the binary it isn't worse than your individual copy-and-paste copies).
Nonsense. Your "only one copy" ends up being multiple specialisations (duplication) and/or operator overloading (more mess) and/or relying on function pointers (that point to different pieces of code and therefore doesn't avoid duplication).
How does the resulting machine code affect the complexity for the programmer?
The moment anyone tries to optimise it for one specific case they break it or make it worse for other different cases. For example, the "sorting an array" example you've posted is fine for small arrays but bad for huge arrays (twice as much memory), but you can't just change it to something like quicksort (where a second array isn't needed) without breaking the semantics and breaking existing code (that expects the source array to be unmodified). In the same way, you can't use threads to make it faster for huge arrays without making it slower for small arrays.
My code sorts in-place, so the first part of this isn't applicable to my example. But yes, there may be changes that change the interface. Then you need to change the code that uses it. You do the same when you change the parameter list of a function. But once you did that (and the compiler reminds you of them if you forget one), they all get the improvement.

And if I ever notice that I have some cases where threaded sorting would be so much better, I would implement threaded sorting and change my specialisation for this case. I can still keep the other implementation for cases where single-threaded is faster.

Generics aren't about one-size-fits-all solutions to any thinkable requirement, but about reusing code where the requirements are the same. Really, compare it to the kind of abstraction that functions provide. You could use exactly the same pseudo-arguments against functions. You always get exactly the same behaviour and can't adapt it to the specific requirements, the compiler may choose to inline them and generate code for umpteen copies, causing insane complexity, it is error-prone because you might pass parameters for which the function doesn't work etc.

Re: Self hosting OS/Compiler from scratch

Posted: Sun Jan 26, 2014 5:09 pm
by Brendan
Hi,
Kevin wrote:
Brendan wrote:I don't know if this is confusing because it's in an "alien" language that few people understand (Ada?), or if it's confusing because it uses generics, or both. I suspect both; but I also assume that I would have been able to understand it (despite not having a clue about the language) if you didn't use generics.
Yes, the choice of an "alien" language was intended. We're discussing the usefulness of programming language features in general, and I felt this discussion somehow got a wrong focus on C++. And seeing something new can't hurt, right? Anyone around here, especially those as clever as you, should be able to understand the program.

So if you claim that you can't understand the code (I think it's more likely you don't want to) and suspect that it's because of generics, let me repeat the lines that are related to generics, so that you can tell me which part in specific you find confusing. Perhaps we can find a better solution of it for our imaginary programming language then.
You're right - I didn't understand it because I didn't want to. I probably could've wasted several days learning Ada, or wasted several hours only learning the pieces of Ada you've used, and after wasting more time than it's worth I could've understood your code, but I chose not to.

Code: Select all

        type T is private;
Why do you need to bother? Is there any way that something outside the template can access the template's local stuff?

Code: Select all

        type TArray is Array(Natural range <>) of T;
What is a "Natural range" and how does it differ from an unnatural range? What is "<>" supposed to be, and is it setting limits for the range like "values in this range are not less than (unspecified) and not greater than (unspecified)" where you could have specified something like "8<>12" to say the values range between 8 and 12, but didn't?

Code: Select all

        with function ">" (a: T; b: T) return boolean is <>;
What are "a:" and "b:" here (I guess they aren't floppy disk drives but they weren't defined or used before)? I still don't understand the "<>" part (does this "<>" have the same meaning/purpose as the previous one?).

Code: Select all

    procedure sort(x: in out TArray);
What is the difference between a procedure and a function (isn't a procedure a function that doesn't return one value)? We've got the "x:" again - are you defining 'x' as an "in out TArray"? What is an "in out Tarray" (is it an array that can be read and written to)?

Code: Select all

    procedure StructSort is new sort(T => TestStruct, TArray => StructArray,
                                     ">" => StructCompare);
I recognise the "new" keyword - it allocates memory for a new object. Can I assume this code is allocating a new object called StructSort, then passing a bunch of boolean values to the new object's constructor? Why do you care if "T" is greater than or equal to "TestStruct" anyway?
Kevin wrote:
As far as I'm concerned, the compiler has to figure out how each generic is being used, generate functions for each usage, then compile the auto-generated functions. That sound a lot like code generation to me.
Well, yes, like I said, obviously compilers do generate code. That's not really news.
Yes, and like I said, this isn't a case of "compiler generates code" it's a case of "compiler generates functions that generate code". For the purpose of finding/fixing bugs; you should be able to tell the compiler to "expand generics only", so that you can examine the resulting functions it generated (in the same way that most compilers can be asked to pre-process only, so you can examine the code after pre-processing).
Kevin wrote:I guess the difference to make is between "stupid", for example text-based, code generation like with C macros and more intelligent code generation that fits in the type system, is explicit about required operations etc. The code generation with this one is comparable to the code generation with inline functions. Yes, it ends up as multiple copies in the binary, but in the source code, there is only one copy to be maintained. The complexity in the source code is what a programmer has to worry about (and in the binary it isn't worse than your individual copy-and-paste copies).
Exactly - generics are like a preprocessor macro that create functions, except that (unlike preprocessor macros) it doesn't break type checking completely. Of course it still breaks type checking partially - e.g. accidentally passing the wrong type of arguments will cause a new function to be generated instead of a compile-time error.
Kevin wrote:
Nonsense. Your "only one copy" ends up being multiple specialisations (duplication) and/or operator overloading (more mess) and/or relying on function pointers (that point to different pieces of code and therefore doesn't avoid duplication).
How does the resulting machine code affect the complexity for the programmer?
Who said anything about machine code? I was talking about the source code the programmer wrote. For example, for your Ada sorting example, how many specialisations did you have to write? To avoid that you could've used a function pointer to a comparator (I hear that's a popular third argument) and then wrote 3 different comparators; or you could've overloaded the "less than" operator and used 3 different pieces of code for comparisons that way.

Of course different types of data are different, and therefore some duplication can't be avoided. Because you can't avoid having some different code for different types anyway; instead of having a nice easily understood function for sorting strings you end up with a piece of common sorting code in one place, plus another piece somewhere else in a specialisation and/or other piece/s somewhere else in operator overloading and/or other piece/s somewhere else used via. function pointer.

By not using generics, you end up with more code, but all the code is simple and in one place. For example, for your Ada sorting, you could've had 3 separate functions for sorting (one for integers, one for strings and one for structures) and each of them would've been simple.

It's like comparing 3 little kittens (no generics) to several "pieces of kitten" splattered around 20 meter "hand grenade blast" radius that add up to 2 kittens. Sure there's less total kitten, but they aren't happy kittens that your friends can play with.
Kevin wrote:
The moment anyone tries to optimise it for one specific case they break it or make it worse for other different cases. For example, the "sorting an array" example you've posted is fine for small arrays but bad for huge arrays (twice as much memory), but you can't just change it to something like quicksort (where a second array isn't needed) without breaking the semantics and breaking existing code (that expects the source array to be unmodified). In the same way, you can't use threads to make it faster for huge arrays without making it slower for small arrays.
My code sorts in-place, so the first part of this isn't applicable to my example. But yes, there may be changes that change the interface. Then you need to change the code that uses it. You do the same when you change the parameter list of a function. But once you did that (and the compiler reminds you of them if you forget one), they all get the improvement.

And if I ever notice that I have some cases where threaded sorting would be so much better, I would implement threaded sorting and change my specialisation for this case. I can still keep the other implementation for cases where single-threaded is faster.
So instead of having one "sort" specialisation to sort strings, you've got 2 "sort" specialisations that are both used for strings (e.g. one for large arrays of strings, and one for small arrays of strings)? I'd be surprised if that was possible - I would've assumed you'd need to create a whole new "sort_threaded" generic thing (with a bunch of new specialisations) so the compiler could figure out which sort you want in each case.
Kevin wrote:Generics aren't about one-size-fits-all solutions to any thinkable requirement, but about reusing code where the requirements are the same. Really, compare it to the kind of abstraction that functions provide. You could use exactly the same pseudo-arguments against functions. You always get exactly the same behaviour and can't adapt it to the specific requirements, the compiler may choose to inline them and generate code for umpteen copies, causing insane complexity, it is error-prone because you might pass parameters for which the function doesn't work etc.
Think of it in terms of "levels of complexity". Something like "0*5+1" is very simple for people to deal with. Something like "x*5+1" is a little harder for people because now they have to think about ranges of values rather than just one set of values (e.g. they need to think about whether some value/s of 'x' can cause an overflow). Converting that into a function makes it slightly more complex because now they have to care about all possible values of 'x' (rather than only caring about their own values of 'x'). Converting that into a generic/template increases complexity again - instead of caring about all possible values of 'x' they have to care about all possible values for all possible types of 'x'.

Of course you aren't looking at language complexity at all - you're only looking at "amount of typing". For some reason you think less code makes things simpler. It's like comparing 1 million lines of simple code to 1 thousand lines of impossible to understand code that does the exact same thing, and deciding that 1 thousand is less than 1 million and therefore "impossible to understand" is simpler. It's a completely wrong and broken way to look a language complexity.

Now, if you assume generics makes things 3 times as complicated and also makes things use 3 times less code; then the question is whether the amount of code is more or less important than complexity. Amount of code is a trivial to deal with and isn't much of a problem at all; while language complexity is a significant (and growing) problem that is magnified by the complexity of the software being written.

Basically; amount of code isn't very important while language complexity is; so anything that reduces the amount of code and increases language complexity is bad idea (unless you're deliberately trying to make it hard for people to write code - e.g. to increase university enrolments or increase your own job security).


Cheers,

Brendan

Re: Self hosting OS/Compiler from scratch

Posted: Mon Jan 27, 2014 3:14 am
by Combuster
It appears people don't want to understand each other here. I'm not sure if Owen and co is bad at explaining, or Brendan is being unnecessarily thick or something in between - after all he writes everything in assembly because he seems to be comfortable with anything else.

Templates and generics are only ways to add functionality without writing more code - and many an implementation is basically nothing but type-safe #defines.

Given an object of type List. What kind of objects does it contain? You don't know because I didn't tell you anything else.
if we explicitly write List<Person>, we get:
- We know on sight the list is supposed to contain people, reducing lookup time and chances of bugs.
- The compiler can enforce that only people can be added to the list, and that anything else returns an error. This also reduces the chance of bugs significantly.

Assembly can never achieve the second advantage due to being untyped, the first can only be achieved with apt naming and coding discipline (where most programmers tend to be big failures at that).

C can achieve both, but only if individual functions are created for each type. The only way to achieve this is to create a series of macros that expand to the implementation, then invoke those implementation macros for each type you want to use that way. Because errors are printed on the line of the macro invocation, you'll find it really hard to isolate the bug from the error context - have fun finding your syntax error in half a page of code.

in Java, you can use actual generic syntax, which means you get the errors for the implementation on the exact line you wrote them, and the errors for the invocation where you actually call the generic. And because you explicitly used an generic, the compiler can split error reporting at the boundary between call and implementation, and gives the user an explicit error that it is mixing types when it does so, instead of having to pass that type all the way through the macro to see if it works the way you think it does - where the macro in question might have expanded to something that looks acceptable to the compiler when it isn't. And also a feature in Java, a generic costs space exactly once, not for each type you throw at it.

In all three languages, you would write about the same amount of code, but with generics (as in Java) the language prevents more bugs, and gives the most accurate error messages.

Therefore, generics are useful.


And if you wish to stick to a language without (or with less safe) generics for various other reasons, that's power to you.

Re: Self hosting OS/Compiler from scratch

Posted: Mon Jan 27, 2014 5:12 am
by Kevin
Brendan wrote:You're right - I didn't understand it because I didn't want to. I probably could've wasted several days learning Ada, or wasted several hours only learning the pieces of Ada you've used, and after wasting more time than it's worth I could've understood your code, but I chose not to.
Well, I see two options. Either you could understand the code if you wanted without spending hours of research on Ada, and you're just playing the fool to escape a discussion specifically about generics. Then you're not really interested in the discussion and further attempts to have it are futile.

Or you really aren't capable of reading this code. I would expect of anyone who takes part in a discussion about programming languages in general that he has seen a few languages (probably including at least one Pascal-style language) and can recognise patterns like argument lists. If you're not capable of this, you're not really qualified for the topic, so further attempts to have a meaningful discussion are futile under this assumption as well.


But because I can't resist feeding trolls, here's a heavily commented version (that you probably still won't want to understand).

Code: Select all

    -- The following block describes the types and functions that must be provided to specialise the procedure
    generic
        -- T can be any type
        type T is private;
        -- TArray must be an array with elements of the same type
        -- Its indices can be any range of numbers within the type Natural
        type TArray is Array(Natural range <>) of T;
        -- An operator > must exist that takes two operands a and b of type T
        -- and that returns a boolean.
        with function ">" (a: T; b: T) return boolean is <>;
    -- This defines the generic procedure, named "sort" that takes a TArray argument
    -- that it can read from and write to (i.e. passed by reference), and that doesn't have
    -- a return value (which is the difference from a function).
    procedure sort(x: in out TArray);

Code: Select all

    -- StructSort is the specialisation of the generic procedure "sort", with the following
    -- specialisations that were required by the declaration of the generic procedure:
    -- * The generic type T is specialised as the concrete type TestStruct
    -- * The generic type TArray is specialised as the concrede type StructArray
    -- * The operation ">" is implemented by the StructCompare function
    -- All of these match the type requirements made in the generic block of "sort"
    procedure StructSort is new sort(T => TestStruct, TArray => StructArray,
                                     ">" => StructCompare);
For the purpose of finding/fixing bugs; you should be able to tell the compiler to "expand generics only", so that you can examine the resulting functions it generated (in the same way that most compilers can be asked to pre-process only, so you can examine the code after pre-processing).
Most compilers don't have an "expand function calls only" mode either. Who said that the compiler has to generate three instances of the sort procedure? Why can't it just statically check the types and then use the same code internally for all three instances, with a function pointer for the compare operation?

Generics are not stupid code generation/macro expansion, they are an abstraction element of programming languages like functions. Abstractions generally make it harder to see what's going on under the hood, but they do reduce complexity on the source code level of the high level language.
Exactly - generics are like a preprocessor macro that create functions, except that (unlike preprocessor macros) it doesn't break type checking completely. Of course it still breaks type checking partially - e.g. accidentally passing the wrong type of arguments will cause a new function to be generated instead of a compile-time error.
Care to give an example? Assume we have another type BrendanStruct, how are you going to accidentally pass it to the wrong function or generate a new function, where a compile time error should have occured? Feel free to use non-Ada pseudo code as long as you keep the fundamental mode of operation intact.
For example, for your Ada sorting example, how many specialisations did you have to write? To avoid that you could've used a function pointer to a comparator (I hear that's a popular third argument) and then wrote 3 different comparators; or you could've overloaded the "less than" operator and used 3 different pieces of code for comparisons that way.
A specialisation is a declaration "I want to use function X with type T under the name Y". That's hardly comparable to writing an implementation.

I did use three different comparators, because obviously there isn't a single working comparator for integers, string and TestStructs. But I didn't have to tinker around with function pointers. I could indeed have used a function pointer as an argument to sort() instead of requiring the operator ">" in the generic block, and thanks to generics it would even be a type-safe function pointer, but why would I want to?

In order to get fully rid of generics, I wouldn't only have to remove the comparator from the generic section, but also the type T. And then I wouldn't have type safety any more.

Look, this is sorting without generics, C does it:

Code: Select all

void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
It uses untyped pointers, passes the element size and a function pointer for the comparison explicitly on each invokation, whereas generics move it to the specialisation. (It also passes the array size because C has, I guess you would say less complex, arrays that don't know their size.) Do you really think that having the complexity on each caller, without any type checks, would make the code better?
By not using generics, you end up with more code, but all the code is simple and in one place.
No. You end up with a string sorting code all in one place. And you end up with integer sorting code in one place. And with struct sorting code in one place. But certainly not with all sorting code in one place.

In my book, have three implementations for sorting (especially once it gets a bit cleverer than bubble sort) add to more complexity than having one implementation for sorting and three trivial specialisation declarations for different data types.
I would've assumed you'd need to create a whole new "sort_threaded" generic thing (with a bunch of new specialisations) so the compiler could figure out which sort you want in each case.
That's what I meant. I would implement sort_threaded() and then specialise that one for StrSort() instead of the old sort().
Think of it in terms of "levels of complexity". Something like "0*5+1" is very simple for people to deal with. Something like "x*5+1" is a little harder for people because now they have to think about ranges of values rather than just one set of values (e.g. they need to think about whether some value/s of 'x' can cause an overflow). Converting that into a function makes it slightly more complex because now they have to care about all possible values of 'x' (rather than only caring about their own values of 'x'). Converting that into a generic/template increases complexity again - instead of caring about all possible values of 'x' they have to care about all possible values for all possible types of 'x'.
Indeed. So once you've got rid of generics, the next thing to get rid of are functions, right? And then counting loops, which could just be unrolled. Sequential code is so much easier to follow, right? Or let's just get rid of all jumps and loops, because Turing completeness is overrated anyway. Let's make all programs "0*5+1".
Of course you aren't looking at language complexity at all - you're only looking at "amount of typing". For some reason you think less code makes things simpler.
I think abstraction can simplify things. You seem to think that abstraction is always uselessly added complexity.
Now, if you assume generics makes things 3 times as complicated and also makes things use 3 times less code
Then your assumption is wrong. Generics avoid code duplication and having to manage code duplication is an important reason why a project gets complex.
Basically; amount of code isn't very important while language complexity is; so anything that reduces the amount of code and increases language complexity is bad idea (unless you're deliberately trying to make it hard for people to write code - e.g. to increase university enrolments or increase your own job security).
So you're arguing against functions once again...