[CompilerDev] Resolving operator overloading
Posted: Thu Apr 28, 2016 5:36 pm
Hello, AltaLang'ers!
I have a problem that I managed to solve with huge effort, but at the cost of a 2-2.5x performance degradation. So I should probably take a step back and rethink whether I'm solving the right problem.
Currently I have a bunch of built-in operators, which are overloaded: you can use '+' with ints and floats, you can use '&' with ints and bools, you can assign a value of any type into a reference to that type, etc.
I started with a bunch of ifs, then refactored that into a table containing all possible signatures - with only built-in types, the number of signatures was finite.
That was a temporary solution, as I'm planning to implement ints of different capacities as generics. But for now I don't have generics, so I just limited it to the commonly used capacities - 8, 16, 32 and 64 bits.
But now I'm implementing tuples - the first user-defined types. There are two operations on tuples: assignment and member access. Member access is handled separately, because it involves member lookup. But assignment feels like a normal infix operation on two expressions.
For tuples there are two overloaded versions of the assignment operator: the normal one assigns an entire tuple into a reference to a tuple; the unpacking version assigns a tuple into a tuple of references.
So I tried to switch from specifying exact types in the operator table to using type patterns. Now, when checking whether a certain operator can be used, I need to find a type that matches the generic type pattern and is a supertype of the argument types.
I've ended up with patterns that consist of:
* Variables - X
* Constants - anything the resolver handles in an opaque way
* Lists - zero or more expressions
* For-All expressions - lists of unknown size whose elements all have the same structure
Lists are used to represent both arrays and the product types (structures) of the compiler data structures that describe types.
For modelling sum types (tagged unions), I'm using 2-element lists where the first element is a tag and the second one is a payload.
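To make this concrete, the pattern terms could be sketched like this - a rough Python model; all names are illustrative, not from my actual code:

```python
from dataclasses import dataclass

# Rough model of the four pattern term kinds (illustrative names).
@dataclass(frozen=True)
class Var:          # a pattern variable, e.g. X
    name: str

@dataclass(frozen=True)
class Const:        # opaque to the resolver, compared only for equality
    value: object

@dataclass(frozen=True)
class List_:        # a fixed list of zero or more sub-expressions
    items: tuple

@dataclass(frozen=True)
class ForAll:       # a list of unknown size whose elements share a structure
    elem: object

# A sum type (tagged union) is modelled as a 2-element list: (tag, payload).
int32 = List_((Const("Int"), Const(32)))
ref_to_int32 = List_((Const("Ref"), int32))
```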
The type resolver tries to solve a statement built out of:
* Relation - a binary relation on two expressions. The resolver has a built-in equality relation and a table of user-defined relations, subtyping in particular
* And/Or - containers that require all/any of their elements to be true
* For-All statement - a statement about the elements of a For-All expression
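The statement forms could be modelled similarly (again a rough Python sketch with illustrative names):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:     # binary relation; "eq" is built in, others user-defined
    name: str
    lhs: object
    rhs: object

@dataclass(frozen=True)
class And:          # true iff all parts are true
    parts: tuple

@dataclass(frozen=True)
class Or:           # true iff any part is true
    parts: tuple

@dataclass(frozen=True)
class ForAllStmt:   # a statement about the elements of a For-All expression
    body: object

# "the argument type is a subtype of X, and X equals Int32":
goal = And((Relation("subtype", "ArgType", "X"),
            Relation("eq", "X", "Int32")))
```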
With this representation, here are my subtyping rules:
Code:
// Any type is a subtype of itself, for tuples and functions there are distinct rules
T != Tuple, T != Function ⊢ (T, X) <: (T, X)
// This rule enables automatic de-referencing of L-values
T != Ref, (T, X) <: (U, Y) ⊢ (Ref, (T, X)) <: (U, Y)
// Bottom type is a subtype of any type. 'break', 'continue', 'return', 'throw' have bottom type
⊢ (Bottom, ()) <: (T, X)
// Tuple is a subtype of other tuple, if they have identical field names and type of each field in the first tuple is a subtype of the type of the field in the second tuple
[T <: U for N, T, U in c] ⊢ (Tuple, [(N, T) for N, T, U in c]) <: (Tuple, [(N, U) for N, T, U in c])
// Function result types are covariant, argument types are contravariant
R <: Q, Y <: X ⊢ (Function, (X, R)) <: (Function, (Y, Q))
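For ground types (no pattern variables), the rules above can be read as a direct recursive check. This sketch only illustrates what the rules mean, not how my resolver works - types are modelled as ("Tag", payload) pairs:

```python
# Direct recursive reading of the subtyping rules, for ground types only.
# A type is a ("Tag", payload) pair; a Tuple payload is a sequence of
# (field_name, field_type) pairs; a Function payload is (arg, result).

def is_subtype(a, b):
    atag, apay = a
    btag, bpay = b
    # Reflexivity (Tuple and Function have their own rules).
    if a == b and atag not in ("Tuple", "Function"):
        return True
    # Auto-dereferencing of L-values: strip Ref on the left.
    if atag == "Ref":
        return is_subtype(apay, b)
    # Bottom is a subtype of any type.
    if atag == "Bottom":
        return True
    # Tuples: identical field names, fieldwise subtyping.
    if atag == btag == "Tuple":
        return (len(apay) == len(bpay)
                and all(n1 == n2 and is_subtype(t1, t2)
                        for (n1, t1), (n2, t2) in zip(apay, bpay)))
    # Functions: covariant result, contravariant argument.
    if atag == btag == "Function":
        (ax, ar), (bx, br) = apay, bpay
        return is_subtype(bx, ax) and is_subtype(ar, br)
    return False

i32 = ("Int", 32)
assert is_subtype(("Ref", i32), i32)      # auto-deref of an L-value
assert is_subtype(("Bottom", ()), i32)    # bottom type
```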
Resolving algorithm:
1. Normalize the input statement into DNF and store a list of alternatives
2. While the list of alternatives is not empty, take one alternative and try to solve it
3. Processing an alternative may fail (no solutions for this alternative), succeed and produce a solution, or spawn new alternatives
4. Add newly spawned alternatives to the list and repeat
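The loop in steps 1-4 could be sketched like this, with hypothetical helpers `to_dnf` (normalizes a statement into a list of alternatives) and `step` (processes one alternative):

```python
# Sketch of the alternative-processing loop (steps 1-4).
# `to_dnf` and `step` are hypothetical helpers supplied by the caller;
# `step` returns ("failed", None), ("solved", solution),
# or ("forked", [new_alternatives]).

def resolve(statement, to_dnf, step):
    alternatives = list(to_dnf(statement))
    solutions = []
    while alternatives:
        alt = alternatives.pop()
        outcome, payload = step(alt)
        if outcome == "solved":
            solutions.append(payload)
        elif outcome == "forked":
            alternatives.extend(payload)  # newly spawned alternatives
        # on "failed" the alternative is simply dropped
    return solutions
```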
For each alternative, statements are kept sorted and processed in the following order:
1. First, equality relations
1.1. First, equality of a constant to a constant, list or for-all - this fails bad alternatives early
1.2. Equality of a variable to something
1.3. Equality of two lists
1.4. Other equalities
2. User-defined relations
2.1. Relations over expressions with constant prefixes - they have the minimum possible number of alternatives
2.2. Other relations
3. For-All statements - never processed directly; they should be expanded when matching the corresponding for-all expression
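This ordering can be expressed as a simple sort key. The statement classes here are hypothetical labels; lower keys are processed first:

```python
# The processing order above as a sort key (lower = processed earlier).
# Statements are classified into hypothetical (kind, shape) labels.
ORDER = {
    ("eq", "const-vs-structural"): (1, 1),  # fails bad alternatives early
    ("eq", "var-vs-any"):          (1, 2),
    ("eq", "list-vs-list"):        (1, 3),
    ("eq", "other"):               (1, 4),
    ("rel", "const-prefix"):       (2, 1),  # fewest possible alternatives
    ("rel", "other"):              (2, 2),
    ("forall", "any"):             (3, 1),  # never processed directly
}

def sort_statements(stmts):
    return sorted(stmts, key=lambda s: ORDER[s])

queue = sort_statements([("rel", "other"),
                         ("forall", "any"),
                         ("eq", "var-vs-any")])
```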
User-defined relations are processed by matching them against the conclusions of the rules; a matched relation gets replaced with an Or of the premises of the matching rules. This is the source of forking of alternatives.
The first implementation of the resolver slowed my tests down 10x; after several optimizations it is now at about 2-2.5x. In particular, for real inputs resolving does not fork alternatives at all, and all rules are matched by an O(1) lookup in a hashtable indexed by the constant prefixes of the related items. Around 70% of resolving time is spent on variable substitution and about 10% on normalizing statements.
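The constant-prefix index might look roughly like this (illustrative sketch; constants are modelled as plain strings, anything else ends the prefix):

```python
# Rough sketch of O(1) rule lookup keyed by the constant prefixes of the
# two related items. Constants are modelled as strings; anything else
# (variables, nested lists) terminates the prefix.

def constant_prefix(term):
    prefix = []
    for item in term:
        if isinstance(item, str):
            prefix.append(item)
        else:
            break
    return tuple(prefix)

rules_index = {}

def add_rule(concl_lhs, concl_rhs, premises):
    key = (constant_prefix(concl_lhs), constant_prefix(concl_rhs))
    rules_index.setdefault(key, []).append(premises)

def matching_rules(lhs, rhs):
    return rules_index.get((constant_prefix(lhs), constant_prefix(rhs)), [])

# The tuple rule's conclusion is keyed by ("Tuple",) on both sides:
add_rule(["Tuple", ("var", "X")], ["Tuple", ("var", "Y")],
         ["fieldwise subtyping premises"])
```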
I could probably come up with more optimization ideas and bring the number down to 1.5-2x - but that's still a lot. It means that matching operators takes as much time as parsing and the rest of semantic analysis altogether. That's very frustrating.
Should I just stop worrying about this and continue developing new features on top? Or should I keep optimizing? Or is there a way to do things in a dramatically simpler way?