CVS difference for ai12s/ai12-0242-1.txt

Differences between 1.1 and version 1.2

--- ai12s/ai12-0242-1.txt	2018/01/11 06:57:10	1.1
+++ ai12s/ai12-0242-1.txt	2018/01/16 06:55:09	1.2
@@ -2898,18 +2898,18 @@
 From: Edmond Schonberg
 Sent: Saturday, January 6, 2018  10:40 AM
 
-I think that the introduction of an attribute ‘Reduce is so far the clearest
+I think that the introduction of an attribute 'Reduce is so far the clearest
 choice: it makes the syntax clear without the need for a new keyword, and it
 avoids the ambiguities / obscurities that Randy has found in previous
 proposals. Using an attribute also allows for a compact notation when applied to
-a composite that supports iteration:   V’Reduce (“+”, 0)  is clear and compact
+a composite that supports iteration:   V'Reduce ("+", 0)  is clear and compact
 (almost like +/V !).  But the prefix can be any iterated_component_association:
 
-   (for J in V’range => V (J) ** 2)’Reduce (F, V0)
+   (for J in V'range => V (J) ** 2)'Reduce (F, V0)
 
        which can also be written as:
 
-   (for E of V => E ** 2)’Reduce (F, 0).
+   (for E of V => E ** 2)'Reduce (F, 0).
 
 The first argument of the attribute must be a combiner function or operator,
 i.e. a binary operation with operands and result of the same type.  Being able
@@ -2917,3 +2917,2117 @@
 natural generalization.  Randy, is this any more palatable?
 
 ****************************************************************
+
+From: Brad Moore
+Sent: Saturday, January 6, 2018  11:45 AM
+
+>>> "Integer*Min(5, {for I in 1..10 => I})" I am not sure I even understood.
+>>> Is this a strange way of spelling "1"? I presume that the I in the
+>>> iteration range of the example is a typo, no? If not, what does it
+>>> mean?
+>>> Sales_Price : constant Integer := Integer'Min(5, (for I in I .. 10
+>>> => I));
+>
+>> The * between Integer and Min is a typo. It should be ' , since 'Min
+>> is the attribute.
+>
+> Sorry, my typo. I understood that part. But you did not answer my
+> question(s). What is the outcome? and was the I in the range
+> intentional and I simply did not understand the idea?
+
+The idea was that Integer'Min was to be treated as the combiner function, and
+the curly braces indicated the values to be fed into the combiner function as
+the second parameter.
+
+As a combiner function, the other argument gives the initial value (5) to be
+passed as the first parameter; for subsequent iterations, the result of the
+previous call to Integer'Min is passed back in as the first parameter instead.
+The more I look at that example, the more I see your confusion, and the more I
+dislike that whole proposal. It's way too subtle and non-intuitive.
+
+Integer'Min(5, {for I in 1 .. 10 => I});
+
+Compare to:
+
+(for I in 1 .. 10 => I)'Reduce(Integer'Min, 5)
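+
+Either way, the result works out to
+
+   Integer'Min( ... Integer'Min(Integer'Min(5, 1), 2) ..., 10)
+
+which is 1 (so yes, it was in effect a roundabout way of spelling "1").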
+
+The latter follows Ed's latest suggested tweaks, which to me make it even
+clearer and simpler. I agree with Ed, and I think the 'Reduce attribute idea is
+a clear winner over anything we've discussed before.
+
+I will take a stab at updating the AI to reflect this latest idea.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Saturday, January 6, 2018  12:57 PM
+
+I like Ed's refinement of Brad's 'Reduce proposal as well.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Saturday, January 6, 2018  2:37 PM
+
+> The first argument of the attribute must be a combiner function or
+> operator, i.e. a binary operation with operands and result of the same
+> type.
+
+Can we relax this restriction a bit and say that it is a binary operation,
+and the type of the 2nd argument of the attribute must match at least one of the
+arguments of the binary operation as well as the result type of the binary
+operation? The type of the values being reduced must match the type of the other
+parameter of the binary operation.
+
+This would then allow reductions such as concatenation where the result type can
+be an array type, but the elements being concatenated are the elements of the
+array.
+
+e.g.
+
+(for Letter in 'A' .. 'Z' => Letter)'Reduce("&", "")
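+
+which should yield the string "ABCDEFGHIJKLMNOPQRSTUVWXYZ".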
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Saturday, January 6, 2018  4:17 PM
+
+Agreed, we should support the case where we have Vector op Component => Vector.
+
+****************************************************************
+
+From: Erhard Ploedereder
+Sent: Saturday, January 6, 2018  4:08 PM
+
+>    (for J in V'range => V (J) ** 2)'Reduce (F, V0)
+>
+>        which can also be written as:
+>
+>    (for E of V => E ** 2)'Reduce (F, 0).
+>
+> The first argument of the attribute must be a combiner function or
+> operator, i.e. a binary operation with operands and result
+> of the same type.  Being able to have an expression on the prefix that
+> yields an iterable construct is also
+> a natural generalization.  Randy, is this any more palatable?
+
+
+I find this model and syntax a lot better.
+
+Semantic question: In the case of
+ (for E of V => g(E))'Reduce(f, 0)
+where g has a side effect, what will the semantics say about the guaranteed
+sequence of g and f calls?
+
+1) all g calls before all f calls ?  Then you lose all ability to stream the
+   values to and through the Reduce.
+
+2) g, f, g, f, ... ?
+
+3) (almost) anything goes
+
+?
+------------------------
+
+1) is the semantics that you undoubtedly would be guaranteed to get if the
+   prefix is to be treated as a normal array aggregate. (=> need for a different
+   syntax, if you do not want that - and I sure do not want this canonical
+   semantics)
+
+3) is necessary if parallelization of g calls is supposed to play here.
+
+2) maybe as a selectable option? It clearly makes implementation easier, since
+   no buffer for results of g calls is needed, as in case 3), and it still avoids
+   storing the intermediate array.
+
+--------------------
+
+****************************************************************
+
+From: Edmond Schonberg
+Sent: Saturday, January 6, 2018  4:40 PM
+
+Given the interest in parallelizability, I think the choice must be 3), i.e.
+unspecified order. If this is to be parallelizable, g better be side-effect free
+anyway.
+
+****************************************************************
+
+From: Edmond Schonberg
+Sent: Saturday, January 6, 2018  4:47 PM
+
+> Agreed, we should support the case where we have Vector op Component =>
+> Vector.
+
+I still think that this is an abuse of notation. It is amusing that it may fall
+out of the proposed semantics of reduction, but a construct that builds an array
+from scalar components does not look like it's reducing anything :-)! We have
+plenty of constructs for aggregates already; hiding this one in this new
+notation is redundant (not to mention the awkward implementation it requires).
+So I'd rather retain the consistent-type requirement. (Not a deal breaker
+obviously, but also simpler to describe.)
+
+****************************************************************
+
+From: Brad Moore
+Sent: Saturday, January 6, 2018  5:05 PM
+
+A good point.
+
+Another advantage of this restriction would be that we would not need the
+'Reducer aspect for parallelism, since the combiner function would then always
+be a reducer function (i.e. the function used to combine results from
+independent tasklets).
+
+We would still need the Identity aspect, but it would simplify the parallelism
+part of this AI, and reduce the number of aspects needed to be placed on various
+library calls in the RM.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Saturday, January 6, 2018  8:44 PM
+
+I have a few questions.
+
+1) I presume we want 'Reduce to work on multidimensional arrays?
+
+   Example from RM:
+
+  G : constant Matrix :=
+    (for I in 1 .. 4 =>
+       (for J in 1 .. 4 =>
+          (if I=J then 1.0 else 0.0))); -- Identity matrix
+
+  Sum : constant Float := G'Reduce("+", 0.0);
+
  Reason: seems like a useful thing to be able to do, and easy enough to understand and implement.
+
+
+2) I presume we are probably *not* interested in defining a version of 'Reduce
+   that operates on a single dimension of a multidimensional array?
+
+  e.g.
+  Sum_1 : constant Float := G'Reduce(1, "+", 0.0);
+
+  Where 1 represents the first dimension, similar to 'Length(1)
+
+  Reason: while this could probably be implemented about as easily as the other
+  version of 'Reduce, I suspect there might not be enough need for this to
+  justify it. We could always add this later, if needed.
+
+
+3) It's not clear to me whether we'd want to be able to apply 'Reduce to
+   aggregate expressions. I don't see a reason not to, and I suspect it would be
+   allowed based on Ed's outline of the rules.
+
+That is, I would think one would be able to write something like this:
+
+  Sum : constant Float :=
+     Matrix'(for I in 1 .. 4 =>
+               (for J in 1 .. 4 =>
+                  (if I=J then 1.0 else 0.0)))'Reduce("+", 0.0);
+
+4) And if the answer to 3 is yes, then I presume it would also work for other
+   forms of array aggregates, not just ones involving
+   iterated_component_associations?
+
+e.g.
+
+Sum : constant Float := Matrix'(1 => (1.1, 1.2, 1.3), 2 => (2.1, 2.2, 2.3))'Reduce("+", 0.0);
+
+****************************************************************
+
+From: Jean-Pierre Rosen
+Sent: Sunday, January 7, 2018  12:42 AM
+
+I have been following this thread with mixed feelings...
+
+Although I see the benefit of the attribute, I don't like having:
+(very_complicated_expression_spanning_several_lines)'Reduce()
+
+I would suggest doing it the other way round:
+For every function F with appropriate profile (TBD), there is a function
+F'Reduce that operates on arrays of Elems, with the following semantics:
+1) if the array is empty, Constraint_Error is raised
+2) if the array has one element, the element is returned
+3) otherwise, the result is the successive application of F to elements in an
+   arbitrary order (needs better definition, it's just to give the idea).
+
+This has the following benefits:
+- Easy to read and understand
+- No need for Identity
+- For parallelism, easy to specify the reduction function as
+  an aspect Reduction => F'Reduce
+
+Some examples:
+
+Old:
+(for E of V => E ** 2)'Reduce (F, 0)
+
+Proposed:
+F'Reduce ((for E of V => E ** 2))
+
+Old:
+(for I in 1 .. 10 => I)'Reduce(Integer'Min, 5)
+
+Proposed:
+Integer'Min'Reduce ((5, for I in 1..10 => I))
+
+
+Old:
+(for Letter in 'A' .. 'Z' => Letter)'Reduce("&", "")
+
+Proposed:
+"&"'Reduce ((for Letter in 'A' .. 'Z' => Letter))
+
+****************************************************************
+
+From: Peter Chapin
+Sent: Sunday, January 7, 2018  9:17 AM
+
+Hello!
+
+I'm just an observer here, for now, so I feel some reluctance to jump in with
+too many comments. I feel like I'm still gathering background information.
+That said, I did want to say a few things about the debate regarding reduction
+expressions...
+
+I have extensive background with the functional language Scala and I've worked
+with OCaml and Haskell as well. I can appreciate that Ada is not, at its core, a
+functional language and that making it into one is fraught with potential
+difficulties. On the other hand, there are two forces at work that are pushing
+Ada to grow more functional features: contracts and now parallelism. Contracts
+benefit from powerful expressions, and parallelism benefits from pure
+expressions, both things that functional languages are good at. So it makes
+sense to me to borrow ideas from the functional world to support these things in
+Ada. The trick is doing so in a reasonable (for Ada) way!
+
+In Scala, at least, there is a distinction between "reduce" and "fold." Folds
+take an initial value, can be applied to an empty sequence, and use a combiner
+function that might have parameters of different types (for example when folding
+a sequence of characters into a string). In contrast, reductions can't be
+applied (without exception) to an empty sequence and use a combiner function
+taking parameters of the same type as the sequence elements. The AI talks about
+"reducer functions" and "reduction expressions" which I found confusing at first
+since the "reduction expression" is really a fold in the functional sense. I'm
+not sure if this matters to anything, but speaking as someone coming from the
+functional community I had to read the AI twice before I de-confused myself on
+this point.
+
+Regarding syntax... I realize we aren't talking about adding general lambdas to
+Ada, but does it make sense to choose a syntax that wouldn't prohibit such an
+addition in the future?
+
+I see a pretty clear similarity between:
+
+(1 .. 10)'Reduce(3 + <>)
+
+and the Scala equivalent:
+
+(1 to 10).fold(3)(_ + _)
+
+Scala allows methods with multiple parameter lists, and for obscure technical
+reasons (not one of Scala's better features) related to type inference, the
+initial value needs to be put into its own parameter list. So the syntax
+logically is trying to be:
+
+(1 to 10).fold(3, _ + _ )
+
+where the weird looking "_ + _" is Scala's highly abbreviated way of writing a
+lambda. The underscores are placeholders for the unnamed function arguments,
+similar to the <> notation in the proposed 'Reduce syntax. Spelling out the
+anonymous combiner function longhand gives:
+
+(1 to 10).fold(3, (x: Int, y: Int) => x + y)
+
+I guess that's all I have to say for now. I don't have a specific suggestion or
+feeling at the moment about what is best for Ada here. I'm still absorbing the
+very interesting discussion. I did want to throw in my support for the idea of,
+in some fashion, adding folds, er... reduction expressions, to Ada. Certainly
+they are widely used in languages that currently have them and they would enrich
+Ada's expression sublanguage considerably.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Sunday, January 7, 2018  10:30 AM
+
+...
+> I would suggest doing it the other way round:
+> For every function F with appropriate profile (TBD), there is a
+> function F'Reduce that operates on arrays of Elems, with the following
+> semantics:
+
+I believe it is a serious mistake to have to create an array to do a reduction.
+There are many cases when you want to do a reduction over a series of values
+that are produced in various ways.  Limiting it to arrays means you end up
+having to figure out how to create the array, and furthermore, you need to know
+in advance how big an array to allocate, or undergo the dynamic allocation
+implicit in a growable vector.
+
+Using a generalized iterator (including a filter of some sort) is significantly
+more flexible and potentially hugely more time and space efficient.
+
+****************************************************************
+
+From: Erhard Ploedereder
+Sent: Sunday, January 7, 2018  12:02 PM
+
+> I believe it is a serious mistake to have to create an array to do a
+> reduction.
+
+Indeed. Argument to the Reduction should be a Generator (which, as a very
+special case, can be an array iterator).
+
+****************************************************************
+
+From: Brad Moore
+Sent: Sunday, January 7, 2018  12:05 PM
+
+> I have been following this thread with mixed feelings...
+>
+> Although I see the benefit of the attribute, I don't like having:
+> (very_complicated_expression_spanning_several_lines)'Reduce()
+>
+> I would suggest doing it the other way round:
+> For every function F with appropriate profile (TBD), there is a
+> function F'Reduce that operates on arrays of Elems, with the following semantics:
+> 1) if the array is empty, Constraint_Error is raised
+
+I don't think we want the user to have to worry about handling such constraint
+errors. That's why we have the initial_value. You get that result if you are
+trying to apply 'Reduce to a null array, or an empty
+iterated_component_association. Having to deal with exceptions makes it more
+difficult to use with contracts for example.
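+
+For example (an illustrative sketch):
+
+   Empty : constant array (1 .. 0) of Integer := (others => 0);
+   Sum   : constant Integer :=
+      (for E of Empty => E)'Reduce ("+", 0);  -- yields 0, no exception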
+
+> 2) if the array has one element, the element is returned
+> 3) otherwise, the result is the successive application of F to
+> elements in an arbitrary order (needs better definition, it's just to give
+> the idea).
+>
+> This has the following benefits:
+> - Easy to read and understand
+> - No need for Identity
+> - For parallelism, easy to specify the reduction function as an
+>   aspect Reduction => F'Reduce
+
+I think we can (and probably should) get rid of the 'Identity aspect with either
+proposal. I don't think it is needed. For tasklets where Initial_Value is not
+being applied, the initial value can be the value of the first item. If there
+is more than one item per tasklet, the Reducer function can be applied to
+combine the first and subsequent results.
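+
+As a sketch (the names here are illustrative only), each tasklet could do
+something like:
+
+   declare
+      Partial : Element_Type := Chunk (Chunk'First);  -- first item, no identity
+   begin
+      for I in Chunk'First + 1 .. Chunk'Last loop
+         Partial := F (Partial, Chunk (I));
+      end loop;
+      -- the tasklets' Partial results are then combined pairwise,
+      -- in iteration order, using the Reducer function
+   end;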
+
+Another criticism of this idea is that it would significantly expand our concept
+of what an attribute is.
+
+With the previous 'Reduce, the arguments of the attribute translate to arguments
+of a function call, which we have already with other attributes in the language.
+
+With your suggestion, the argument of the attribute is an expression that needs
+to be parsed, which is something we don't have yet in the language.
+
+> Some examples:
+>
+> Old:
+> (for E of V => E ** 2)'Reduce (F, 0)
+>
+> Proposed:
+> F'Reduce ((for E of V => E ** 2))
+>
+> Old:
+> (for I in 1 .. 10 => I)'Reduce(Integer'Min, 5)
+>
+> Proposed:
+> Integer'Min'Reduce ((5, for I in 1..10 => I))
+
+I also find the expression (5, for I in 1..10 => I) strange. It doesn't seem to
+mean anything by itself; it only gains meaning from the context outside its
+enclosing parens, as part of one particular attribute.
+
+With the 'old' syntax, there are no meaningless expressions.
+
+Otherwise your suggestion appears to be mainly a cosmetic one, of just moving
+things around. I can see your reasoning for doing so, but I don't think it can
+be justified given these criticisms.
+
+****************************************************************
+
+From: Jean-Pierre Rosen
+Sent: Sunday, January 7, 2018  12:20 PM
+
+> I believe it is a serious mistake to have to create an array to do a
+> reduction.  There are many cases when you want to do a reduction over
+> a series of values that are produced in various ways.  Limiting it to
+> arrays means you end up having to figure out how to create the array,
+> and furthermore, you need to know in advance how big an array to
+> allocate, or undergo the dynamic allocation implicit in a growable
+> vector.
+>
+> Using a generalized iterator (including a filter of some sort) is
+> significantly more flexible and potentially hugely more time and space
+> efficient.
+
+Indeed. This proposal does not intend to really create an array. It's 100% "as
+if".
+
+I was not explicit about this, but since F'Reduce would be 100% in the
+compiler's realm, it could be defined as having convention => intrinsic and
+aspect Inline. Since it deals only with elements, it could be considered an
+optimization that F is simply called inline on all the elements without needing
+to construct an actual array.
+
+OTOH, a compiler would be /allowed/ to construct an array if it is more
+convenient or efficient in some cases.
+
+I think it would be easier to understand and explain with this model.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Sunday, January 7, 2018  1:00 PM
+
+Your proposed syntax does not feel like an improvement from a readability point
+of view, and seems to muddy the waters significantly semantically, in my view.
+
+Personally, I prefer the order where the iterator comes first, so you know what
+you are talking about, and then the reduction operation.  There are various
+syntaxes that have been proposed that have that ordering.  I happen to still
+prefer the one that ParaSail uses, but the one with:
+
+   (iterator)'Reduce(func, params)
+
+is second best, in my view.
+
+I think we should not eliminate the 'Identity aspect, and in fact might want
+additional aspects such as 'Commutative and 'Associative, but the basic
+semantics of the reduction expression should not depend on having any of these
+aspects specified.  Specifying them can provide more ways to optimize things,
+but without them, I would argue the semantics should be very simple, and not
+require the operation to be commutative, and perhaps not even associative
+(though that is a bit more stringent).
+
+The semantics of "(iterator)'Reduce(func, initial_val) would be simply:
+
+    tmp := initial_val;
+    while more in (iterator) loop
+        tmp := func (tmp, <next-val-of-(iterator)>);
+    end loop;
+    return tmp;
+
+Parallelization would be optional, and could use properties of "func" as defined
+by various aspects (e.g. Identity, Reducer, Commutative, Associative, etc.), or
+built-in knowledge in case of language-defined operators, to achieve maximum
+speedup.
+
+Overload resolution for "func" would expect a function with a profile of
+
+    function func (Left : <initial_val'Type>; Right : <iterator-element'Type>) return <initial_val'Type>;
+
+For the first version of this, we could require that <initial_val'Type> and
+<iterator-element'Type> be the same, though that doesn't seem necessary, and
+interferes with operations that are, for example, taking individual elements and
+building up a set.  I agree it makes parallelization easier, if you presume func
+has two operands of same type, and is associative, though having an aspect that
+identifies a "Reducer" function that has those properties can provide the same
+advantage.
+
+You might also want to allow a second version of this, that could perhaps use
+the identical syntax, and be chosen when the "Reduce" operation parameter is
+actually a procedure rather than a function:
+
+    (iterator)'Reduce(proc, initial_val)
+
+would expand into
+
+    tmp := initial_val
+    while more in (iterator) loop
+         proc(tmp, <next-val-of-(iterator)>);
+    end loop;
+    return tmp;
+
+This could be significantly more efficient when the reduction is producing a
+more complex data structure, such as a hashed set, or a large matrix.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Sunday, January 7, 2018  1:57 PM
+
+> I think we should not eliminate the 'Identity aspect, and in fact
+> might want additional aspects such as 'Commutative and 'Associative,
+> but the basic semantics of the reduction expression should not depend
+> on having any of these aspects specified.  Specifying them can provide
+> more ways to optimize things, but without them, I would argue the
+> semantics should be very simple, and not require the operation to be
+> commutative, and perhaps not even associative (though that is a bit more
+> stringent).
+
+I was actually thinking of writing the AI to replace the Identity aspect with an
+Associative Boolean aspect. That seems more useful, since parallelism is not
+possible if a reducer function is not associative.
+
+If the reducer function does not have the Associative aspect, the 'Reduce
+attribute would normally apply sequential processing. On the other hand, if the
+reduction function does have it, then parallel execution can potentially be
+applied by the implementation.
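+
+For example, a user-written reducer could be marked with the aspect proposed
+here:
+
+   function Min (Left, Right : Integer) return Integer
+     with Associative;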
+
+A problem with the Identity aspect is that it is difficult to apply everywhere.
+
+For example, the language-defined bitwise "and" of:
+
+type Fingers is mod 5;
+
+is hard to define. You could say it is Fingers'Last, but then "and"ing that with
+3 wouldn't work (Fingers'Last is 4, and "4 and 3" yields 0, not 3). You could
+say it is 7, but then the value is outside the range of values for the type,
+and its use could trigger constraint errors.
+
+Using the logic that Jean-Pierre proposed, you don't have to define this, and
+the 'Reduce attribute works.
+
+I think the Associative aspect is a better replacement for 'Identity, because it
+can easily be defined on all subprograms that need it. (And it's simpler, being
+a Boolean aspect.)
+
+I think we should not include the Identity aspect in this proposal unless
+someone can show a valid need for it. I would want to see some sort of
+explanation on how it can improve parallelism or understanding.
+
+As it stands, it just appears to add complexity to the model in an inconsistent
+manner without any real benefit. We can always add it later, if we can find a
+valid use for it.
+
+The associative aspect already seems to have validity. It can be used to prevent
+errors, and help in the decision of whether to use parallelism or not.
+
+A Commutative aspect might be useful for future consideration, but my
+experience with Paraffin is that having the implementation assume all reductions
+are non-commutative does not add noticeable overhead to parallelism. So for now,
+I would recommend we leave the Commutative aspect out of the AI.
+
+> Overload resolution for "func" would expect a function with a profile
+> of
+>
+>    function func (Left : <initial_val'Type>; Right : <iterator-element'Type>)
+>    return <initial_val'Type>;
+>
+> For the first version of this, we could require that
+> <initial_val'Type> and <iterator-element'Type> be the same, though
+> that doesn't seem necessary, and interferes with operations that are,
+> for example, taking individual elements and building up a set.  I
+> agree it makes parallelization easier, if you presume func has two
+> operands of same type, and is associative, though having an aspect
+> that identifies a "Reducer" function that has those properties can provide the same advantage.
+
+By first version, I assume you mean first version of this AI/feature.
+
+I agree we can always consider later whether to add a 'Reducer aspect and
+support reducer functions with differing types of parameters, based on feedback
+from users. I think for now assuming all parameters are of the same type
+probably covers a majority of the uses.
+
+> You might also want to allow a second version of this, that could
+> perhaps use the identical syntax, and be chosen when the "Reduce"
+> operation parameter is actually a procedure rather than a function:
+>
+>    (iterator)'Reduce(proc, initial_val)
+>
+> would expand into
+>
+>    tmp := initial_val
+>    while more in (iterator) loop
+>         proc(tmp, <next-val-of-(iterator)>);
+>    end loop;
+>    return tmp;
+
+I agree allowing a procedure for a reducer is useful, and probably worth
+considering for this version of the AI. We could say that it can be a procedure
+that accepts two parameters of the same type, where the first parameter is an in
+out parameter.
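+
+For example (an illustrative profile only):
+
+   procedure Combine (Accumulator : in out Matrix; Partial : in Matrix);
+
+so in Tucker's expansion, proc(tmp, <next-val>) would update tmp in place
+rather than returning a new object.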
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Sunday, January 7, 2018  3:24 PM
+
+...
+> I was actually thinking of writing the AI to replace the Identity
+> aspect with an Associative Boolean aspect. That seems more useful,
+> since parallelism is not possible if a reducer function is not associative.
+
+Both are relevant.  It is not an either/or.
+
+> If the reducer function does not have the Associative aspect, the
+> 'Reduce attribute would normally apply sequential processing. On
+> the other hand, if the reduction function does have it, then parallel
+> execution can potentially be applied by the implementation.
+>
+> A problem with the Identity aspect is that it is difficult to apply
+> everywhere.
+
+That's OK.  If there is no well-defined identity, then so be it.
+
+> For example, the language defined bitwise "and" of;
+>
+> type Fingers is mod 5;
+>
+> is hard to define. You could say it is Fingers'Last, but then "and"ing
+> that with 3, wouldn't work. You could say it is 7, but then the value
+> is outside the range of values for the type, and its use could trigger
+> constraint errors.
+
+Right.  These non-power-of-2 modular types are weird in various ways, and
+certainly don't have well-defined bit-wise boolean operators.  So these
+operators wouldn't have a well-defined Identity aspect, which is fine, since
+no-one in their right mind would use bit-wise boolean operators to reduce a
+bunch of non-power-of-2 modular values.  But clearly addition and multiplication
+have well-defined Identity aspects, even for such oddball modular types.  No
+need to throw out the baby with the non-power-of-2 modular type bathwater!
+
+> Using the logic that Jean-Pierre proposed, you don't have to define
+> this, and the 'Reduce attribute works.
+>
+> I think the Associative aspect is a better replacement for 'Identity,
+> because it can easily be defined on all subprograms that need it. (And
+> it's simpler, being a Boolean aspect.)
+
+There might still be reasons to define an Identity aspect, as it might be useful
+for some purposes.  I don't see this as one "replaces" the other.  They should
+all be optional, and only used for certain optimizations.
+
+> I think we should not include the Identity aspect in this proposal
+> unless someone can show a valid need for it. I would want to see some
+> sort of explanation on how it can improve parallelism or understanding.
+
+Well, "normal" reductions use identity quite a bit, but it seems fine to omit
+it.  In general, I would try to separate this proposal into what is needed for
+basic functionality, and what is useful for optimizations and parallelization.
+The aspects should only be needed for optimizations, and never be required for
+basic functionality.  And if we cannot agree on the aspects, or want to leave
+them for a future AI, that seems fine.
+
+>> ...
+>> You might also want to allow a second version of this, that could
+>> perhaps use the identical syntax, and be chosen when the "Reduce"
+>> operation parameter is actually a procedure rather than a function:
+>>
+>>   (iterator)'Reduce(proc, initial_val)
+>>
+>> would expand into
+>>
+>>   tmp := initial_val
+>>   while more in (iterator) loop
+>>        proc(tmp, <next-val-of-(iterator)>);
+>>   end loop;
+>>   return tmp;
+>
+> I agree allowing a procedure for a reducer is useful, and probably
+> worth considering for this version of the AI. We could say that it can
+> be a procedure that accepts two parameters of the same type, where the first
+> parameter is an in out parameter.
+
+Right.  Though again, we could allow the second parameter to be of a different
+type.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Monday, January 8, 2018  3:04 PM
+
+...
+> Regarding syntax... I realize we aren't talking about adding general
+> lambdas to Ada, but does it make sense to choose a syntax that wouldn't
+> prohibit such an addition in the future?
+
+There is a separate AI on lambdas.  The latest syntax proposal for reduction
+expressions (at least last time I checked!) is compatible with using a lambda
+expression for the function name.  E.g.
+
+    (for E of V => E**2)'Reduce("+", 0)  -- sum of squares
+
+could have a lambda expression in place of the "+":
+
+    (for E of V => E**2)'Reduce(<lambda-expression>, 0)
+
+The proposed syntax for lambda expressions is currently defined in:
+
+   http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ai12s/ai12-0190-1.txt?rev=1.5
+
+and uses a syntax like:
+
+   (function (X, Y) return X + Y)
+
+So the above example could be written as:
+
+   (for E of V => E**2)'Reduce( (function (X, Y) return X+Y) , 0)
+
+Obviously not a great example, but you could imagine more interesting functions.
+
+As far as terminology, "fold" and "reduce" are clearly very similar in
+semantics. The annoying thing about "fold" is that there are so many different
+versions of it (foldl, foldl1, foldr, ...).  So we have chosen to use the term
+"reduction" even though in some cases it might be closer to a "fold."
+
+****************************************************************
+
+From: Peter Chapin
+Sent: Monday, January 8, 2018  3:24 PM
+
+Thanks! I'll take a look at that lambdas AI. As for the terminology, I don't
+think it's a big deal. We have to call it something and "fold" is really
+somewhat of an odd term in many ways. "Reduce" has the advantage of sounding
+more like what is going on.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Monday, January 8, 2018  7:17 PM
+
+In private email, Jean-Pierre was questioning the need for an initial value as
+an argument to the 'Reduce attribute, since attempting to reduce 0 items could
+be considered an error.
+
+This is something I've struggled with as well.
+
+This got me to thinking that there probably are times when an exception is
+wanted, and times when it is not.
+
+For example,
+
+-- Determine the maximum number of widgets that can be found at a particular
+-- store brand near me. If the answer is 0 because there are no such stores,
+-- or 0 because the stores that do exist are out of stock, I don't care; the
+-- answer is still 0.
+Max_Widgets : constant Integer :=
+   (for Location of Stores when Location.City = Calgary
+       => Location.Widget_Count)'Reduce(Integer'Max, 0);
+-- (Widget_Count stands in for whatever component holds a store's stock.)
+
+-- On the other hand, there may not always be a clear value to use. Is it clear
+-- that a result of Temperature'Min means that no cities were found, or is that
+-- an actual value for the coldest settlement in Antarctica?
+Max_Temp : constant Temperature :=
+   (for City of World
+       when City.Continent = Antarctica => City.Max_Temperature)'Reduce(Temperature'Max, Temperature'Min);
+
+A nice feature of the 'Reduce attribute concept is that we can have more than
+one version of the attribute.
+
+We could also have a version that only accepts one argument, the reducer
+subprogram, which raises Constraint_Error if there are no items to be reduced.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Monday, January 8, 2018  7:41 PM
+
+> In private email, Jean-Pierre was questioning the need for an initial
+> value as an argument to the 'Reduce attribute, since attempting to reduce
+> 0 items could be considered an error.
+> ...
+> A nice feature of the 'Reduce attribute concept is that we can have more than
+> one version of the attribute.
+>
+> We could also have a version that only accepts one argument, the
+> reducer subprogram, which raises Constraint_Error if there are no items to be
+> reduced.
+
+I suppose, but I think you are over-thinking this.  It is very rare that you
+really *want* an exception.  Let's keep this simple.  Yes there are a million
+options imaginable, but to me that is where the foldl, foldl1, foldr, foldr1
+madness arose.
+
+A simple pragma Assert can always be inserted if you really want an exception
+under certain conditions.
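+
+e.g. something like:
+
+   pragma Assert (V'Length > 0);  -- raise if there is nothing to reduce
+   Max : constant Integer :=
+      (for E of V => E)'Reduce (Integer'Max, Integer'First);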
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Monday, January 8, 2018  9:28 PM
+
+> I believe it is a serious mistake to have to create an array to do a
+> reduction.  There are many cases when you want to do a reduction over
+> a series of values that are produced in various ways.  Limiting it to
+> arrays means you end up having to figure out how to create the array,
+> and furthermore, you need to know in advance how big an array to
+> allocate, or undergo the dynamic allocation implicit in a growable
+> vector.
+
+I don't think anyone was suggesting that the array would have to actually
+be materialized. But the prefix of this attribute seems like it has to be
+an array aggregate, as it has the syntax of an array aggregate. Saying that
+the semantics of some construct is determined by 7 characters at the very
+end -- many lines after the start -- seems like the wrong way to go.
+
+So, my view was that the prefix was an object of some appropriate type, and
+there would be a rule that the order of evaluation of the parts of the prefix
+and the 'Reduce itself are unspecified (probably would take a notwithstanding
+rule). I suppose that means that the attribute would have to be able to handle
+an existing array object, but that doesn't seem like a problem.
+
+> Using a generalized iterator (including a filter of some
+> sort) is significantly more flexible and potentially hugely more time
+> and space efficient.
+
+Since container aggregates need to be able to contain generalized iterators, I
+don't see a difference here. Certainly, as-if optimizations are always allowed,
+and we'll have to make the order of evaluation unspecified if any parallelism is
+going to be applied.
+
+---
+
+One note here is that an aggregate standing by itself isn't currently a legal
+attribute prefix. But a qualified aggregate is legal, so I don't think there
+would be any significant issue with allowing an aggregate as a prefix. Probably
+would want to allow any expression so long as it was in parens. (Note that we
+just rejected this idea in terms of Obj'Image, but I guess we're allowed to
+change our mind.)
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Monday, January 8, 2018  10:34 PM
+
+...
+> The semantics of "(iterator)'Reduce(func, initial_val) would be
+> simply:
+
+There's a stray quote above that confused me for a while.
+
+>     tmp := initial_val;
+>     while more in (iterator) loop
+>         tmp := func (tmp, <next-val-of-(iterator)>);
+>     end loop;
+>     return tmp;
+
+Or more conventionally:
+
+    declare
+       tmp : <reducer type> := initial_val;
+    begin
+       for <id> <rest of iterator> loop
+           tmp := func (tmp, <id>);
+       end loop;
+       Result := tmp; -- Or "return tmp" if this is a function.
+    end;
+
+(Here, Tucker's "(iterator)" is expanded a bit to be clearer.)
+
+And this illustrates the problem I have with this whole idea. My goal for
+Ada 2020 is to have safe-by-construction parallelism features that can be added
+to improve the performance of *existing* Ada code.
+
+Thus, my hope would be that one could add parallelism to a loop simply by adding
+"parallel" to the iterator specification (with possibly a small amount of change
+to that specification). That assumes, of course, that the operations in the body
+can be operated in parallel.
+
+The problem is, of course, that that isn't quite possible for loops that involve
+some sort of result. Just adding "parallel" to the loop ought to get an error
+because the use of "tmp" is not task-safe.
+
+The solution to that problem that we're proposing is to completely abandon
+conventional loops and switch to a functional notation. That's going to be hard
+to do for existing code; discovering the reduction function and variable is
+going to be hard in real loops (which are a lot more complex than the above, and
+often have a number of variables to construct a result). It requires turning the
+entire loop inside-out (and often creating helper types and functions, if there
+is more than a single result object).
+
+New code won't be much better; users will be required to think functionally
+about reductions, but Ada is nowhere near a complete functional language (and it
+would be impractical to allow things like general closures, so that isn't likely
+to change), so one will then have to switch mental gears halfway through. It
+sounds like the worst of both worlds to me.
+
+At one point (Pittsburgh), we had a reasonable but unfinished solution that
+looked a whole lot more like standard Ada. My recollection was that the main
+problem was that the accumulator objects could be used as arrays, but I don't
+know why we would make that property visible. (No one needs to access these
+explicitly as arrays).
+
+That would give something like:
+
+    declare
+       tmp : parallel <reducer type> := initial_val;
+    begin
+       parallel for <id> <rest of iterator> loop
+           tmp := func (tmp, <id>);
+       end loop;
+       Result := tmp; -- Or "return tmp" if this is a function.
+    end;
+
+The idea being that "tmp" acts like a set of objects, one per tasklet, inside of
+a parallel construct, but as a single object outside of such constructs. To get
+from the set to the single object, the reducer/identity for the reducer type
+would be used by default, or one could specify them with aspects as needed.
+
+This looks a lot more like conventional Ada code, and certainly would be easier
+to apply to existing loops than a reduction expression. And much less likely to
+require frequent gear-switching in new code.
+
+****************************************************************
+
+From: Jean-Pierre Rosen
+Sent: Monday, January 8, 2018  11:42 PM
+
+> In private email, Jean-Pierre was questioning the need for an initial
+> value as an argument to the 'Reduce attribute, since attempting to reduce
+> 0 items could be considered an error.
+
+Oops, I didn't want to make it private. Thunderbird changed its UI recently, and
+I keep hitting "Reply" instead of "Reply to all"...
+
+For the record, here it is:
+-------------------------------------------
+On 07/01/2018 at 19:04, Brad Moore wrote:
+> I don't think we want the user to have to worry about handling such
+> constraint errors. That's why we have the initial_value. You get that
+> result if you are trying to apply 'Reduce to a null array, or an empty
+> iterated_component_association. Having to deal with exceptions makes
+> it more difficult to use with contracts for example.
+
+But isn't reducing 0 values an error? The maximum of 0 elements is certainly not
+Integer'First!
+
+[...]
+> I think we can (and probably should) get rid of the 'Identity aspect
+> with either proposal. I don't think it is needed. For tasklets where
+> Initial_Value is not being applied, the initial value can be the value
+> of the first item. If there are more than one item per tasklet, the
+> Reducer function can be applied to combine the first and subsequent
+> results.
+
+Agreed, that's what I propose.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Thursday, January 11, 2018  1:56 AM
+
+> ...
+>> The semantics of "(iterator)'Reduce(func, initial_val) would be
+>> simply:
+>
+> There's a stray quote above that confused me for a while.
+>
+>>     tmp := initial_val;
+>>     while more in (iterator) loop
+>>         tmp := func (tmp, <next-val-of-(iterator)>);
+>>     end loop;
+>>     return tmp;
+>
+> Or more conventionally:
+>
+>    declare
+>       tmp : <reducer type> := initial_val;
+>    begin
+>       for <id> <rest of iterator> loop
+>           tmp := func (tmp, <id>);
+>       end loop;
+>       Result := tmp; -- Or "return tmp" if this is a function.
+>    end;
+>
+> (Here, Tucker's "(iterator)" is expanded a bit to be clearer.)
+>
+> And this illustrates the problem I have with this whole idea. My goal
+> for Ada 2020 is to have safe-by-construction parallelism features
+> that can be added to improve the performance of *existing* Ada code.
+>
+> Thus, my hope would be that one could add parallelism to a loop simply
+> by adding "parallel" to the iterator specification (with possibly a
+> small amount of change to that specification). That assumes, of
+> course, that the operations in the body can be operated in parallel.
+>
+> The problem is, of course, that that isn't quite possible for loops
+> that involve some sort of result. Just adding "parallel" to the loop
+> ought to get an error because the use of "tmp" is not task-safe.
+>
+> The solution to that problem that we're proposing is to completely
+> abandon conventional loops and switch to a functional notation. That's
+> going to be hard to do for existing code; discovering the reduction
+> function and variable is going to be hard in real loops (which are a
+> lot more complex than the above, and often have a number of variables to
+> construct a result). It requires turning the entire loop inside-out (and
+> often creating helper types and functions, if there is more than a single
+> result object).
+
+I really think in some cases users won't mind deleting an existing loop such as:
+
+declare
+   Sum : Integer := 0;
+begin
+
+   for I in 1 .. 1000 loop
+      Sum := Sum + A(I);
+   end loop;
+
+   Put_Line ("Sum of A=" & Integer'Image(Sum));
+end;
+
+and replacing it with:
+
+Put_Line ("Sum of A=" & Integer'Image(A'Reduce("+", 0)));
+
+> New code won't be much better; users will be required to think
+> functionally about reductions, but Ada is nowhere near a complete
+> functional language (and it would be impractical to allow things like
+> general closures, so that isn't likely to change), so one will then
+> have to switch mental gears halfway through. It sounds like the worst of
+> both worlds to me.
+>
+> At one point (Pittsburgh), we had a reasonable but unfinished solution
+> that looked a whole lot more like standard Ada. My recollection was
+> that the main problem was that the accumulator objects could be used
+> as arrays, but I don't know why we would make that property visible.
+> (No one needs to access these explicitly as arrays).
+>
+> That would give something like:
+>
+>    declare
+>       tmp : parallel <reducer type> := initial_val;
+>    begin
+>       parallel for <id> <rest of iterator> loop
+>           tmp := func (tmp, <id>);
+>       end loop;
+>       Result := tmp; -- Or "return tmp" if this is a function.
+>    end;
+>
+> The idea being that "tmp" acts like a set of objects, one per tasklet,
+> inside of a parallel construct, but as a single object outside of such
+> constructs. To get from the set to the single object, the
+> reducer/identity for the reducer type would be used by default, or one
+> could specify them with aspects as needed.
+>
+> This looks a lot more like conventional Ada code, and certainly would
+> be easier to apply to existing loops than a reduction expression. And
+> much less likely to require frequent gear-switching in new code.
+
+I can see that having both 'Reduce and a parallel loop with reduction capabilities
+would be appreciated by users.
+
+I think there will be times when one or the other is a better choice for a
+particular usage, and a 'Reduce attribute I believe will also have utility
+beyond just parallelism.
+
+What I am finding is that both forms need common support for parallelism.
+
+I think both need subprograms to be annotated with an Associated aspect, as well
+as the Reducer aspect that we had in earlier versions of the AI.
+
+What I'd like to do is carve out another AI from AI12-0119 that just has
+wording for describing tasklets, executors, the Associated aspect, and the
+Reducer aspect. This would also have the changes to the containers and
+libraries to have these aspects added.
+
+
+It looks to me like this will be the most work to nail down in the RM. I think
+the Associated aspect can borrow much of the wording that we currently have for
+the NonBlocking aspect, except changing occurrences of NonBlocking to Associated.
+
+Then the Parallel Operations AI can be a lot better focused. The 'Reduce AI I
+think will also be quite a lot simpler to write up, as both can reference the
+tasklet AI.
+
+Sound OK?
+
+By having these in separate AI's we can discuss the pros and cons of each
+feature based on their own merits, without having to look at the common support
+as much, since that is needed for either solution anyway.
+
+And yes, Randy, I will be sure to take your latest version of AI12-0119 as a
+starting point. Thanks for your input.
+
+
+On an aside, I did an experiment today where I wrote a parallel loop that
+calculates 3 results: the sum of a set of values, the min of a set of values, and
+the max of a set of values.
+
+I was curious to see if writing this as three separate loops would be better or
+worse than creating a composite object and writing a reducer function that can
+combine results of this record type.
+
+I found that writing it as 3 separate loops was significantly faster than 1 loop.
+
+For a loop that involves 4_000_000_000 iterations, the 3 loop solution took
+11.17 seconds. The single loop took 18.29 seconds. Why bring this up?
+
+The 'Reduce approach encourages users to write 3 separate 'Reduce expressions,
+whereas the parallel loop approach encourages users to write a single loop with
+3 results, which I found is easiest to implement in Paraffin if the 3 results
+are combined into a single result type.
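+
+The composite version was along these lines (a sketch only, not the actual
+Paraffin code):
+
+   type Summary is record
+      Sum : Long_Long_Integer := 0;
+      Min : Integer := Integer'Last;
+      Max : Integer := Integer'First;
+   end record;
+
+   function Reduce (Left, Right : Summary) return Summary is
+     (Sum => Left.Sum + Right.Sum,
+      Min => Integer'Min (Left.Min, Right.Min),
+      Max => Integer'Max (Left.Max, Right.Max));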
+
+So it may be that using 'Reduce can result in significantly better performance
+results than a parallel loop approach, for certain problems at least.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, January 11, 2018  3:23 AM
+
+...
+> I really think in some cases users won't mind deleting an existing loop
+> such as:
+>
+> declare
+>    Sum : Integer := 0;
+> begin
+>
+>    for I in 1 .. 1000 loop
+>       Sum := Sum + A(I);
+>    end loop;
+>
+>    Put_Line ("Sum of A=" & Integer'Image(Sum));
+> end;
+>
+> and replacing it with:
+>
+> Put_Line ("Sum of A=" & Integer'Image(A'Reduce("+", 0)));
+
+Actually, they probably will, because the latter might very well run 10 times
+slower. (It certainly will if the compiler tries to parallelize it, since
+setup/teardown of a tasklet probably will take a lot more than 250 integer adds;
+it might even if left sequential, as compilers have been optimizing the former
+for 40 years and probably will generate only 4 or 5 x86 machine instructions for
+the former -- but the patterns are likely to be very specific to the existing
+format and won't work on the slightly different organization of a reduction.)
+
+Indeed, some hardware has instructions specifically for loops like this. It
+might end up running in microcode and there's no way that any other feature
+could beat that.
+
+Besides, real loops that need to run faster don't look like this. They look
+like:
+
+	-- Now, calculate the various running totals:
+	for I in 1 .. Actual_Len loop
+	    if Sample_Data(I).Check_Item then
+		-- Note: We used to calculate Sample_Count,
+		-- Actual_Sum, and Actual_Sqr_Sum here, but we've
+		-- removed these as this is likely the innermost loop
+		-- for the program.
+		if Sample_Data(I).Score <= 0.0 then
+		    Negative_Count := Negative_Count + 1;
+		end if;
+		Sample_Sum := Sample_Sum + Sample_Data(I).Score;
+		Sample_Sqr_Sum := Sample_Sqr_Sum + (Sample_Data(I).Score*Sample_Data(I).Score);
+		Product_Sum := Product_Sum +
+	            (Actual_Data(I).Weighted_Worse_Ratio*Sample_Data(I).Score);
+		Diff_Sum := Diff_Sum +
+		    (Actual_Data(I).Weighted_Worse_Ratio - Sample_Data(I).Score);
+		Diff_Sqr_Sum := Diff_Sqr_Sum +
+		    (Actual_Data(I).Weighted_Worse_Ratio - Sample_Data(I).Score)**2;
+	        if Result_Debug or else (Limited_Calc_Debug and then (I <= Debug_Calc_Limit or else I >= Actual_Len - Debug_Calc_Limit)) then
+		    Ada.Text_IO.Put_Line ("Item " & Actual_Data(I).Item_Name & "; cnt="
+			    & Natural'Image(Actual_Count));
+		    Ada.Text_IO.Put("Sample Score="); LFlt_IO.Put (Sample_Data(I).Score, 10,6,0);
+		    Ada.Text_IO.Put("; Actual Score="); LFlt_IO.Put (Actual_Data(I).Weighted_Worse_Ratio, 10,6,0);
+	            Ada.Text_IO.New_Line;
+		    --Ada.Text_IO.Put("Sample Sum="); LFlt_IO.Put (Sample_Sum, 10,6,0);
+		    --Ada.Text_IO.Put("; Sample Sum**2="); LFlt_IO.Put (Sample_Sqr_Sum, 10,6,0);
+	            --Ada.Text_IO.New_Line;
+		    --Ada.Text_IO.Put("Actual Sum="); LFlt_IO.Put (Actual_Sum, 10,6,0);
+		    --Ada.Text_IO.Put("; Actual Sum**2="); LFlt_IO.Put (Actual_Sqr_Sum, 10,6,0);
+	            --Ada.Text_IO.New_Line;
+		    --Ada.Text_IO.Put("Product Sum="); LFlt_IO.Put (Product_Sum, 10,6,0);
+		    --Ada.Text_IO.Put("; Diff Sum="); LFlt_IO.Put (Diff_Sum, 10,6,0);
+		    --Ada.Text_IO.Put("; Diff Sum**2="); LFlt_IO.Put (Diff_Sqr_Sum, 10,6,0);
+	            --Ada.Text_IO.New_Line;
+		-- else no trace
+		end if;
+	    -- else do not score this pair.
+	    end if;
+	end loop;
+
+This is the operative loop of a correlation calculator, comparing an array of
+actual values to an array of (proposed) estimated values. The array length is
+tends to be as much as 4000 items. (Note that the two values that only depend on
+Actual_Data are precalculated, since Actual_Data never changes for a particular
+run, while the Sample_Data is regenerated for each trial.)
+
+Among other things, one wonders where the debugging would go in a reduction. If
+the reduction gives dubious results, you'll have to end up writing a
+conventional loop to figure out why (no I/O is allowed in parallel operations,
+since it allows blocking, and even when it is, you can't trace the intermediate
+results).
+
+Another thing about this code is that common subexpression elimination means
+that many of the memory reads are eliminated (or should be, anyway). For
+instance, Sample_Data(I).Score should be loaded into a register and only read
+once. The reduced memory traffic provides a substantial speed-up to the code.
+
+...
+> It looks to me like this will be the most work to nail down in the RM.
+> I think the Associated aspect can borrow much of the wording that we
+>> currently have for the NonBlocking aspect, except changing occurrences
+>> of NonBlocking to Associated.
+
+Why? I thought "associated" just specified that an operation is associative; that
+doesn't change the semantics of an operation that calls it. All of the
+(substantial) complication of nonblocking comes because one can't allow a call
+(anywhere) of an operation that allows blocking, so the property has to
+propagate. Don't see why "associative" would have to propagate.
+
+Or is "associated" supposed to be something completely different? I don't
+remember seeing anything about it previously...
+
+> Then the Parallel Operations AI can be a lot better focused.
+> The 'Reduce AI I think
+> will also be quite a lot simpler to write up, as both can
+> reference the tasklet AI.
+>
+> Sound OK?
+
+I don't really see the point, since 'Reducer is trivial (it's a Boolean and
+isn't transitive), the same should be true for 'Associative, and the basic
+parallel loop proposal (which doesn't support any reductions) can provide all of
+the tasklet stuff (there isn't much).
+
+I'd expect that any reduction proposal would be in a separate AI (or perhaps
+alternative AIs). If we decided to adopt multiple proposals, we could worry
+about eliminating duplication, but I do that sort of thing all of the time.
+
+...
+> On an aside, I did an experiment today where I wrote a parallel loop
+> that calculates 3 results: the sum of a set of values, the min of a set
+> of values, and the max of a set of values.
+>
+> I was curious to see if writing this as three separate loops would be
+> better or worse than creating a composite object and writing a reducer
+> function that can combine results of this record type.
+>
+> I found that writing it as 3 separate loops was significantly faster than
+> 1 loop.
+>
+> For a loop that involves 4_000_000_000 iterations, the 3 loop solution
+> took 11.17 seconds. The single loop took
+> 18.29 seconds. Why bring this up?
+
+Because you're testing the wrong thing?
+
+All you've proved here is that Ada has lousy support for tuple-based
+programming. I'd be very surprised if any formulation that uses functions
+returning composite types had decent performance doing anything.
+
+If you *really* wanted to test this, you would have used a Reducer *procedure*
+to combine the records. The overhead of a single in out parameter is likely to
+be many times less than a function return (which is creating an object for each
+iteration).
+
+But you'd also have to test whether the approach of writing a single
+conventional loop is faster or slower than any parallelism. The overhead of a
+loop, especially a parallel loop, can be substantial.
+
+> The 'Reduce approach encourages users to write 3 separate 'Reduce
+> expressions, whereas the parallel loop approach encourages users to
+> write a single loop with 3 results, which I found is easiest to
+> implement in Paraffin if the
+> 3 results are combined into a single result type.
+>
+> So it may be that using 'Reduce can result in significantly better
+> performance results than a parallel loop approach, for certain
+> problems at least.
+
+Comparing one insane solution (writing a function returning a record, requiring
+extra type and function declarations) to a second insane solution (writing the
+iterator part repeatedly, with all of the possibilities of error with that)
+hardly seems valuable. Both solutions require writing a lot of extra text
+compared to the sequential loop, and both seem to require turning the loop
+inside out.
+
+The big thing this demonstrates is that Ada is a long way from really supporting
+functional programming, and it probably never will be able to do that AND have
+performance. (Ada doesn't handle tuples easily, and IMHO it shouldn't.)
+
+In any case, a parallel loop is not going to fix every performance problem. Any
+problem made up of reducers that are simple machine instructions (like "+" or
+"Min") are probably going to be fastest without any parallelism at all. (And
+will take the least text if there is only one loop!) You'd need 100s of
+thousands of iterations for those to matter, and there aren't that many real
+problems with that many iterations.
+
+But there are a lot of problems with expensive iterations that might be helped.
+And with expensive setups that would be costly to repeat. (And note the above
+comments about common subexpression elimination. If you converted my example
+into 5 loops, you'd have to redo all of the memory reads of .Score 5 times,
+while the original loop only would do each read once. That would add a lot of
+slowdown.)
+
+Perhaps you'd like to retry your experiment with my example above? I suspect the
+results would be somewhat (or maybe a lot) different.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Thursday, January 11, 2018  3:59 PM
+
+> ...
+>> I really think in some cases users won't mind deleting an existing
+>> loop such as:
+>>
+>> declare
+>>    Sum : Integer := 0;
+>> begin
+>>
+>>    for I in 1 .. 1000 loop
+>>       Sum := Sum + A(I);
+>>    end loop;
+>>
+>>    Put_Line ("Sum of A=" & Integer'Image(Sum));
+>> end;
+>>
+>> and replacing it with:
+>>
+>> Put_Line ("Sum of A=" & Integer'Image(A'Reduce("+", 0)));
+>
+> Actually, they probably will, because the latter might very well run
+> 10 times slower. (It certainly will if the compiler tries to
+> parallelize it, since setup/teardown of a tasklet probably will take a
+> lot more than 250 integer adds; it might even if left sequential, as
+> compilers have been optimizing the former for 40 years and probably
+> will generate only 4 or 5
+> x86 machine instructions for the former -- but the patterns are likely
+> to be very specific to the existing format and won't work on the
+> slightly different organization of a reduction.)
+
+I disagree with this assessment. I would expect that a 'Reduce implementation
+would generate the same code as a corresponding parallel loop, under the covers,
+assuming only a single reduction result is needed.
+
+For instance, if a Paraffin-like library were being called under the covers,
+the same calls could be made.
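+
+Schematically (this is a made-up reduction generic, not the actual Paraffin
+interface), A'Reduce("+", 0) might expand into something like:
+
+   generic
+      type Result_Type is private;
+      with function Combine (L, R : Result_Type) return Result_Type;
+      with function Element (Index : Positive) return Result_Type;
+   procedure Hypothetical_Reduce
+     (First, Last : Positive;
+      Initial     : Result_Type;
+      Result      : out Result_Type);
+   --  Splits First .. Last into chunks, reduces each chunk's Elements
+   --  with Combine in its own tasklet, then combines the partial results.
+
+   function A_Element (Index : Positive) return Integer is (A (Index));
+   procedure Sum_Reduce is new Hypothetical_Reduce (Integer, "+", A_Element);
+   ...
+   Sum_Reduce (A'First, A'Last, Initial => 0, Result => Sum);
+
+A parallel loop with a single "+" reduction could expand to the very same
+calls.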
+
+...
+> Among other things, one wonders where the debugging would go in a reduction.
+> If the reduction gives dubious results, you'll have to end up writing
+> a conventional loop to figure out why (no I/O is allowed in parallel
+> operations, since it allows blocking, and even when it is, you can't
+> trace the intermediate results).
+
+'Reduce is simple enough that there's typically not much to debug.
+
+Reducers like Integer'Min, "+", etc. don't need debugging, and the rest of the
+parallelism is provided by the implementation. There's a lot less to go wrong.
+
+I think if a loop is complex, then you are better off using a parallel loop;
+then you can also insert debugging code more easily.
+
+> Another thing about this code is that common subexpression elimination
+> means that many of the memory reads are eliminated (or should be,
+> anyway). For instance, Sample_Data(I).Score should be loaded into a
+> register and only read once. The reduced memory traffic provides a
+> substantial speed-up to the code.
+>
+> ...
+>> It looks to me like this will be the most work to nail down in the RM.
+>> I think the Associated aspect can borrow much of the wording that we
+>> currently have for the NonBlocking aspect, except change occurrences
+>> of NonBlocking for Associated.
+>
+> Why? I thought "associated" just specified that an operation is
+> associative; that doesn't change the semantics of an operation that
+> calls it. All of the
+> (substantial) complication of nonblocking comes because one can't
+> allow a call (anywhere) of an operation that allows blocking, so the
+> property has to propagate. Don't see why "associative" would have to
+> propagate.
+
+I think we'd also want to do something similar, where the compiler might warn
+the user that a 'Reduce cannot be done in parallel. Similarly, you might want to
+allow 'Associative on generic formals; then you would need to disallow an actual
+that isn't associative. You might also want to query an attribute, like you can
+for NonBlocking.
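+
+For instance, something like this (speculative syntax):
+
+   generic
+      type T is private;
+      --  Instantiating with an actual that doesn't have the
+      --  Associative aspect would be illegal:
+      with function Combine (L, R : T) return T with Associative;
+   package Generic_Reducing is
+      --  ... and an attribute query, by analogy with Nonblocking:
+      pragma Assert (Combine'Associative);
+   end Generic_Reducing;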
+
+> Or is "associated" supposed to be something completely different? I
+> don't remember seeing anything about it previously...
+
+The idea is that it is a boolean aspect that can be applied to a binary function
+where both parameters and the result are of the same type, or to a binary
+procedure where both parameters are of the same type and one of the parameters
+is an in out parameter.
+
+All it says is that semantically, you get the same result, regardless which
+actual gets passed in which parameter. (e.g  A + B = B + A) for Integer "+"
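+
+In declaration form, that would look something like this (a sketch only;
+Matrix is just a placeholder type):
+
+   function "+" (Left, Right : Matrix) return Matrix
+      with Associative;
+
+   procedure Add (Total : in out Matrix; Item : in Matrix)
+      with Associative;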
+
+>> Then the Parallel Operations AI can be a lot better focused.
+>> The 'Reduce AI I think
+>> will also be quite a lot simpler to write up, as both can reference
+>> the tasklet AI.
+>>
+>> Sound OK?
+>
+> I don't really see the point, since 'Reducer is trivial (it's a
+> Boolean and isn't transitive), the same should be true for
+> 'Associative, and the basic parallel loop proposal (which doesn't
+> support any reductions) can provide all of the tasklet stuff (there isn't
+> much).
+
+The Reducer aspect I am talking about is an aspect that can be applied to a
+binary function or procedure where the two parameters are not of the same type.
+It identifies another subprogram that has the Associative aspect, to be used
+when combining results from a subprogram that has the Reducer aspect.
+
+e.g.
+
+In Ada.Containers.Vectors...
+
+   function "&" (Left, Right : Vector) return Vector with Associative;
+
+   function "&" (Left : Vector; Right : Element_Type) return Vector with Reducer => "&";
+
+The Reducer aspect names another function that has the Associative aspect.
+
+You can then, for instance, append Characters to a Vector in parallel.
+Otherwise, you could only concatenate vectors together in parallel.
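+
+For instance, one might write something like this (a sketch; Some_String is
+just a placeholder):
+
+   package Char_Vectors is new Ada.Containers.Vectors
+     (Index_Type => Positive, Element_Type => Character);
+   use Char_Vectors;
+
+   V : constant Vector :=
+     (for C of Some_String => C)'Reduce ("&", Empty_Vector);
+
+Each tasklet appends its characters to a local Vector with the
+Vector-Character "&", and the partial Vectors are then combined with the
+Associative Vector-Vector "&" named by the Reducer aspect.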
+
+> I'd expect that any reduction proposal would be in a separate AI (or
+> perhaps alternative AIs). If we decided to adopt multiple proposals,
+> we could worry about eliminating duplication, but I do that sort of thing all of the time.
+>
+> ...
+>> On an aside, I did an experiment today where I wrote a parallel loop
+>> that calculates 3 results, a sum of a set of values, the min of a set
+>> of values, and the max of a set of values.
+>>
+>> I was curious to see if writing this as three separate loops would be
+>> better or worse than creating a composite object and writing a
+>> reducer function that can combine results of this record type.
+>>
+>> I found that writing it as 3 separate loops was significantly faster
+>> than 1 loop.
+>>
+>> For a loop that involves 4_000_000_000 iterations, the 3 loop
+>> solution took 11.17 seconds. The single loop took
+>> 18.29 seconds. Why bring this up?
+>
+> Because you're testing the wrong thing?
+>
+> All you've proved here is that Ada has lousy support for tuple-based
+> programming. I'd be very surprised if any formulation that uses
+> functions returning composite types had decent performance doing anything.
+
+Actually this works surprisingly well even for composites. I have updated my
+program to produce more information, and I think the results are quite
+interesting.
+
+I have also implemented Randy's test program from his previous email, and yes,
+that also produces some interesting results that are quite different. See
+further below.
+
+
+With optimization turned off in GNAT on 4 cores, I ran the following tests on
+my program, which produced the following results.
+
+1. Calculate the sum of adding 1, 8_000_000_000 times, using a sequential loop.
+2. Do the same, but with a parallel loop.
+3. Calculate the sum as before, but also calculate the min and max of the loop
+   index, in 3 separate sequential loops.
+4. Do 3 again, but using 3 separate parallel loops.
+5. Do all 3 calculations on a composite result in a single parallel loop using
+   functional reduction. The calculation of results in each tasklet is done
+   using a procedure, but the reduction between tasklets is done using a
+   function, which is also used to provide the final result.
+6. Do 5, except use a procedural reduction library. All reduction calls and the
+   final result are via a procedure's in out parameter.
+7. Do 2, except instead of using my own Paraffin library, use Ada code passed
+   to an OpenMP library.
+8. Do 4, except use my OpenMP bindings from Ada.
+9. Do 6, except use my OpenMP bindings from Ada.
+
+We see here that using 3 separate parallel loops ends up being twice as fast as
+sequential. However, using composite record types, the performance is not even
+worthwhile, as it is worse than or at best roughly equivalent to sequential.
+
+My suspicion is that having to use record types in comparison with elementary
+types can add a significant amount of overhead. Elementary types can probably
+be kept in hardware registers, etc.
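+
+For reference, the composite formulation in tests 5 and 6 is roughly this
+(names simplified; test 5 combines partial results with a function like the
+one below, test 6 with an equivalent procedure taking an in out parameter):
+
+   type Results is record
+      Sum : Long_Long_Integer := 0;
+      Min : Long_Long_Integer := Long_Long_Integer'Last;
+      Max : Long_Long_Integer := Long_Long_Integer'First;
+   end record;
+
+   --  Combines two partial results into one.
+   function Reduce (L, R : Results) return Results is
+     (Sum => L.Sum + R.Sum,
+      Min => Long_Long_Integer'Min (L.Min, R.Min),
+      Max => Long_Long_Integer'Max (L.Max, R.Max));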
+
+Interesting things happen when I turn on optimization though. Skip down to see
+those results.
+
+
+************* Parallel Framework Test *************
+  Physical Processors= 4
+
+Single operation in a sequential Loop
+
+{Elapsed = 00:00:19.25
+Sum= 8000000000
+
+
+Single Operation Parallelism Loop
+
+{Elapsed = 00:00:05.30
+Sum= 8000000000
+
+
+3 operations in a sequential Loop
+
+{Elapsed = 00:00:28.14
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+3 Separate Parallelism Loops
+
+{Elapsed = 00:00:15.64
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+Single Functional Parallelism Loops
+
+{Elapsed = 00:00:29.70
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+Single Procedural Parallelism Loops
+
+{Elapsed = 00:00:30.28
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+Single Operation OpenMP Parallelism Loop
+
+{Elapsed = 00:00:04.66
+Sum= 8000000000
+
+
+3 Separate OpenMP Parallelism Loops
+
+{Elapsed = 00:00:15.55
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+Single OpenMP Parallelism Loops
+
+{Elapsed = 00:02:39.06
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+----------------
+
+Now after turning on full optimization in GNAT, I get the following results for
+the same tests.
+
+
+************* Parallel Framework Test *************
+  Physical Processors= 4
+
+Single operation in a sequential Loop
+
+{Elapsed = 00:00:00.00
+Sum= 8000000000
+
+
+Single Operation Parallelism Loop
+
+{Elapsed = 00:00:02.07
+Sum= 8000000000
+
+
+3 operations in a sequential Loop
+
+{Elapsed = 00:00:08.24
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+3 Separate Parallelism Loops
+
+{Elapsed = 00:00:04.14
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+Single Functional Parallelism Loops
+
+{Elapsed = 00:00:05.30
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+Single Procedural Parallelism Loops
+
+{Elapsed = 00:00:05.31
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+Single Operation OpenMP Parallelism Loop
+
+{Elapsed = 00:00:02.09
+Sum= 8000000000
+
+
+3 Separate OpenMP Parallelism Loops
+
+{Elapsed = 00:00:04.19
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+
+Single OpenMP Parallelism Loops
+
+{Elapsed = 00:00:17.29
+Sum= 8000000000, Min= 1, Max= 8000000000
+
+Interesting that a single operation (sum) doesn't even register a time.
+But when you try to do more than one, there can be parallelism benefits.
+
+Even with optimization turned on, you get better performance using 3
+separate loops, although now there is at least some benefit to using a
+single loop.
+
+> If you *really* wanted to test this, you would have used a Reducer
+> *procedure* to combine the records. The overhead of a single in out
+> parameter is likely to be many times less than a function return
+> (which is creating an object for each iteration).
+
+In all my testing with Paraffin, I have surprisingly found that there is never
+much of a difference between using a function and a procedure. You can see that
+in the results above.
+
+> But you'd also have to test whether the approach of writing a single
+> conventional loop is faster or slower than any parallelism. The
+> overhead of a loop, especially a parallel loop, can be substantial.
+
+I show that above in the results also. You can see the speedup vs sequential
+code.
+
+>> The 'Reduce approach encourages users to write 3 separate 'Reduce
+>> expressions, whereas the parallel loop approach encourages users to
+>> write a single loop with 3 results, which I found is easiest to
+>> implement in Paraffin if the
+>> 3 results are combined into a single result type.
+>>
+>> So it may be that using 'Reduce can result in significantly better
+>> performance results than a parallel loop approach, for certain
+>> problems at least.
+>
+> Comparing one insane solution (writing a function returning a record,
+> requiring extra type and function declarations) to a second insane
+> solution (writing the iterator part repeatedly, with all of the
+> possibilities of error with that) hardly seems valuable. Both
+> solutions require writing a lot of extra text compared to the
+> sequential loop, and both seem to require turning the loop inside out.
+>
+> The big thing this demonstrates is that Ada is a long way from really
+> supporting functional programming, and it probably never will be able
+> to do that AND have performance. (Ada doesn't handle tuples easily,
+> and IMHO it
+> shouldn't.)
+
+I think the results above show that it actually does handle tuples pretty well.
+
+> In any case, a parallel loop is not going to fix every performance problem.
+> Any problem made up of reducers that are simple machine instructions
+> (like "+" or "Min") are probably going to be fastest without any
+> parallelism at all. (And will take the least text if there is only one
+> loop!) You'd need 100s of thousands of iterations for those to matter,
+> and there aren't that many real problems with that many iterations.
+
+I see benefits to parallelism with optimization turned both off and on, even
+for these simple operations.
+
+> But there are a lot of problems with expensive iterations that might
+> be helped. And with expensive setups that would be costly to repeat.
+> (And note the above comments about common subexpression elimination.
+> If you converted my example into 5 loops, you'd have to redo all of
+> the memory reads of .Score 5 times, while the original loop only would
+> do each read once. That would add a lot of slowdown.)
+>
+> Perhaps you'd like to retry your experiment with my example above? I
+> suspect the results would be somewhat (or maybe a lot) different.
+
+I did just that, and created another test program.
+
+This program runs the code sequentially, then as separate parallel loops, one
+for each result, then as a single parallel loop using a composite result type.
+
+First are the results with optimization turned off.
+
+************* Parallel Framework Test *************
+  Physical Processors= 4
+
+Multiple operations in a sequential Loop
+
+{Elapsed = 00:00:01.45
+Negative_Count= 0
+Sample_Sum= 1.99965350799589E+07
+Sample_Sqr_Sum= 1.33299985832170E+07
+Product_Sum= 1.99965350799589E+07
+Diff_Sum= 2.00046739200409E+07
+Diff_Sqr_Sum= 1.33381374236478E+07
+
+
+Multiple parallel loops
+
+{Elapsed = 00:00:01.38
+Negative_Count= 0
+Sample_Sum= 1.99965350797953E+07
+Sample_Sqr_Sum= 1.33299985832053E+07
+Product_Sum= 1.99965350797953E+07
+Diff_Sum= 2.00046739202047E+07
+Diff_Sqr_Sum= 1.33381374236362E+07
+
+
+Single parallel loop
+
+{Elapsed = 00:00:00.42
+Negative_Count= 0
+Sample_Sum= 1.99965350797952E+07
+Sample_Sqr_Sum= 1.33299985832053E+07
+Product_Sum= 1.99965350797952E+07
+Diff_Sum= 2.00046739202047E+07
+Diff_Sqr_Sum= 1.33381374236363E+07
+
+---------------------------------
+
+Then I run it again with optimization turned on....
+
+************* Parallel Framework Test *************
+  Physical Processors= 4
+
+Multiple operations in a sequential Loop
+
+{Elapsed = 00:00:00.62
+Negative_Count= 0
+Sample_Sum= 1.99965350799589E+07
+Sample_Sqr_Sum= 1.33299985832170E+07
+Product_Sum= 1.99965350799589E+07
+Diff_Sum= 2.00046739200409E+07
+Diff_Sqr_Sum= 1.33381374236478E+07
+
+
+Multiple parallel loops
+
+{Elapsed = 00:00:00.94
+Negative_Count= 0
+Sample_Sum= 1.99965350797953E+07
+Sample_Sqr_Sum= 1.33299985832053E+07
+Product_Sum= 1.99965350797952E+07
+Diff_Sum= 2.00046739202048E+07
+Diff_Sqr_Sum= 1.33381374236362E+07
+
+
+Single parallel loops
+
+{Elapsed = 00:00:00.20
+Negative_Count= 0
+Sample_Sum= 1.99965350797950E+07
+Sample_Sqr_Sum= 1.33299985832052E+07
+Product_Sum= 1.99965350797950E+07
+Diff_Sum= 2.00046739202050E+07
+Diff_Sqr_Sum= 1.33381374236362E+07
+
+
+With optimization turned off, there is only a slight benefit to using separate
+loops, but there is a big advantage to using a single loop.
+
+With optimization turned on, using parallelism with separate loops is quite a
+bit worse than sequential.
+
+However, using a single loop is about 3 times faster. So the results from this
+program are pretty much the reverse of what you get with my program.
+
+So to summarize all this,
+
+I'd say that as the complexity of a loop increases, it is better and easier to
+use a parallel loop.
+
+As the complexity of a loop decreases, it is likely better and easier to use a
+'Reduce call.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, January 11, 2018  10:40 PM
+
+...
+> > Actually, they probably will, because the latter might very well run
+> > 10 times slower. (It certainly will if the compiler tries to
+> > parallelize it, since setup/teardown of a tasklet probably will take
+> > a lot more than 250 integer adds; it might even if left sequential,
+> > as compilers have been optimizing the former for 40 years and
+> > probably will generate only 4 or 5
+> > x86 machine instructions for the former -- but the patterns are
+> > likely to be very specific to the existing format and won't work on
+> > the slightly different organization of a reduction.)
+>
+> I disagree with this assessment. I would expect that a 'Reduce
+> implementation would generate the same code as a corresponding
+> parallel loop, under the covers, assuming only a single reduction
+> result is needed.
+
+I think you missed my point. For the 1000 iterations in the original example,
+highly optimized sequential loop code probably would clobber any attempt at a
+parallel loop, as creation/ending synch on the tasklets would far outweigh the
+cost of doing 1000 iterations (most likely with all of the critical stuff in
+registers).
+
+You saw an extreme case of this in your simple test program with optimization
+on, as GNAT recognized that the loop didn't do anything very interesting other
+than waste time and replaced the entire thing with the result of
+8_000_000_000. Not doing a loop is going to beat doing a loop any time. ;-)
+
+The other part of my point is that a *sequential* reduce expression might not
+get the benefit of those optimizations, because the setup and organization would
+be somewhat different. Some of these optimizations tend to be pattern matching,
+where a known pattern is recognized and replaced with something much simpler; if
+the code is slightly different, the pattern doesn't match and the optimization
+doesn't get done. Obviously, that would be very implementation-specific, so it's
+hard to say much more about that.
+
+...
+> I think if a loop is complex, then you are better off using a parallel
+> loop; then you can also insert debugging code more easily.
+
+Other than that we don't have a way to do reductions currently in a parallel
+loop. :-)
+
+Note that I'd be happy to finish the current version of the parallel loop AI
+without worrying about reducers, and deal with the reducer issues separately.
+(That would at least let us make some progress.)
+
+> > ...
+> >> It looks to me like this will be the most work to nail down in the RM.
+> >> I think the Associated aspect can borrow much of the wording that
+> >> we currently have for the NonBlocking aspect, except change
+> >> occurrences of NonBlocking for Associated.
+> >
+> > Why? I thought "associated" just specified that an operation is
+> > associative; that doesn't change the semantics of an operation that
+> > calls it. All of the (substantial) complication of nonblocking comes
+> > because one can't allow a call (anywhere) of an operation that
+> > allows blocking, so the property has to propagate. Don't see why "associative"
+> > would have to propagate.
+>
+> I think we'd also want to do something similar, where the compiler
+> might warn the user that a 'Reduce cannot be done in parallel.
+
+Warnings are not the Standard's concern. OTOH, I would like to have a way to
+tell the compiler whether to parallelize (or not) a reduction expression, since
+it doesn't in general have the information needed to make that choice. It can't
+find out the number of iterations nor how expensive the reduction function is
+(in special cases it can, like statically bounded iterations with a predefined
+operator as a reduction function). Having to make a runtime choice would mean
+essentially generating all of the code twice (once for a parallel reduction, and
+once for a sequential reduction), and always choosing parallel would make most
+reductions pretty slow (you need a significant amount of work - either
+iterations or in the function body - before parallelizing will help anything).
+
+> Similarly, you might want to allow 'Associative on generic formals;
+> then you would need to disallow an actual that isn't associative. You
+> might also want to query an attribute, like you can for NonBlocking.
+
+That's still many times simpler than Nonblocking, because Nonblocking controls
+legality rules in bodies for which it is true, and we have cases where you don't
+know the value until the generic is instantiated. You don't need any of that.
+
+> > Or is "associated" supposed to be something completely different? I
+> > don't remember seeing anything about it previously...
+>
+> The idea is that it is a boolean aspect that can be applied to a
+> binary function where both parameters and the result are of the same
+> type, or to a binary procedure where both parameters are of the same
+> type and one of the parameters is an in out parameter.
+>
+> All it says is that semantically, you get the same result, regardless
+> which actual gets passed in which parameter. (e.g A + B = B + A) for
+> Integer "+"
+
+Um, that's "commutative". :-)
+
+An associative operator is one for which (A + B) + C = A + (B + C). My
+understanding is that was the only requirement needed to allow parallel
+reduction (because we don't know which one we'd want to do first, but it is easy
+enough to keep Left things on the left and Right things on the right).
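+
+For example (illustrative pseudo-Ada), with 4 tasklets over 1000 elements:
+
+   --  Each tasklet reduces its own chunk, in index order:
+   P1 := Reduce_Chunk (A (  1 .. 250));
+   P2 := Reduce_Chunk (A (251 .. 500));
+   P3 := Reduce_Chunk (A (501 .. 750));
+   P4 := Reduce_Chunk (A (751 .. 1000));
+
+   --  Combine keeping left operands on the left; associativity alone
+   --  guarantees this equals the sequential left-to-right result:
+   Result := F (F (P1, P2), F (P3, P4));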
+
+> >> Then the Parallel Operations AI can be a lot better focused.
+> >> The 'Reduce AI I think
+> >> will also be quite a lot simpler to write up, as both can reference
+> >> the tasklet AI.
+> >>
+> >> Sound OK?
+> >
+> > I don't really see the point, since 'Reducer is trivial (it's a
+> > Boolean and isn't transitive), the same should be true for
+> > 'Associative, and the basic parallel loop proposal (which doesn't
+> > support any reductions) can provide all of the tasklet stuff (there
+> > isn't much).
+>
+> The Reducer aspect I am talking about is an aspect that can be applied
+> to a binary function or procedure where the two parameters are not of
+> the same type. It identifies another subprogram that has the
+> Associative aspect, to be used when combining results from a
+> subprogram that has the Reducer aspect.
+>
+> e.g.
+>
+> In Ada.Containers.Vectors...
+>
+>    function "&" (Left, Right : Vector) return Vector with Associative;
+>
+>    function "&" (Left : Vector; Right : Element_Type) return Vector
+> with Reducer => "&";
+>
+> The Reducer aspect names another function that has the Associative
+> aspect.
+>
+> You can then, for instance, append Characters to a Vector in parallel.
+> Otherwise, you could only concatenate vectors together in parallel.
+
+Yes, but this is still trivial (perhaps fairly large, but that is different from
+hard). In any case, I don't want these aspects mixed in with the basic
+parallel loop proposal (which actually is nearing completion, I think -- it just
+needs wording polishing).
+
+Reduction stuff is its own thing, regardless of what context it appears in.
+
+...
+> So to summarize all this,
+>
+> I'd say that as the complexity of a loop increases, it is better and
+> easier to use a parallel loop.
+
+...assuming that we have a way to write a reduction in such a loop. :-)
+
+> As the complexity of a loop decreases, it is likely better and easier
+> to use a 'Reduce call.
+
+Makes sense, but I at least rarely write such loops (outside of string
+processing, which wouldn't have enough iterations to be worth parallelizing,
+and which generally aren't reductions anyway). YMMV.
+
+****************************************************************
+
+From: Jean-Pierre Rosen
+Sent: Thursday, January 11, 2018  11:35 PM
+
+>> All it says is that semantically, you get the same result, regardless
+>> which actual gets passed in which parameter. (e.g A + B = B + A) for
+>> Integer "+"
+> Um, that's "commutative". :-)
+>
+> An associative operator is one for which (A + B) + C = A + (B + C). My
+> understanding is that was the only requirement needed to allow
+> parallel reduction (because we don't know which one we'd want to do
+> first, but it is easy enough to keep Left things on the left and Right things
+> on the right).
+
+And especially, note that "&" is NOT commutative...
+
+****************************************************************
+
+From: Brad Moore
+Sent: Friday, January 12, 2018  1:13 PM
+
+...
+> I think you missed my point. ...
+
+OK
+
+> ...
+>> I think if a loop is complex, then you are better off using a
+>> parallel loop; then you can also insert debugging code more easily.
+>
+> Other than that we don't have a way to do reductions currently in a
+> parallel loop. :-)
+>
+> Note that I'd be happy to finish the current version of the parallel
+> loop AI without worrying about reducers, and deal with the reducer
+> issues separately. (That would at least let us make some progress.)
+
+I think that's a good idea for now, for that AI.
+I'd probably still try to create a separate AI on the Associative and Reducer
+aspects, as that would be needed for the 'Reduce AI, but it would be better to
+separate that part out.
+
+> I would like to have a way to
+> tell the compiler whether to parallelize (or not) a reduction
+> expression, since it doesn't in general have the information needed to make that choice.
+> It can't find out the number of iterations nor how expensive the
+> reduction function is (in special cases it can, like statically
+> bounded iterations with a predefined operator as a reduction
+> function). Having to make a runtime choice would mean essentially
+> generating all of the code twice (once for a parallel reduction, and
+> once for a sequential reduction), and always choosing parallel would
+> make most reductions pretty slow (you need a significant amount of
+> work - either iterations or in the function body - before parallelizing will
+> help anything).
+
+Maybe we could allow a parallel do block for this purpose. It indicates that
+parallelism is desired, and we could also say that the "and" is optional.
+
+parallel
+do
+    Foo := A'Reduce(F,0);
+end do;
+
+If the "and" is present, then parallelism is first applied to the separate
+blocks, and then optionally applied internally to each block, if more cores are
+available.
+
+e.g.
+
+parallel
+do
+    Foo := A'Reduce(F,0);
+and
+    Bar := A'Reduce(G,0);
+end do;
+
+Though if the compiler were unsure about reducing via the F reducer, it would
+probably fall back to the default behavior that would apply if the enclosing
+parallel do wasn't there.
+
+If one really wanted the F reducer to be done in parallel for this example, one
+could write:
+
+parallel
+do
+    parallel do
+       Foo := A'Reduce(F,0);
+    end do;
+and
+    Bar := A'Reduce(G,0);
+end do;
+
+...
+>> > Or is "associated" supposed to be something completely different? I
+>> > don't remember seeing anything about it previously...
+>>
+>> The idea is that it is a boolean aspect that can be applied to a
+>> binary function where both parameters and the result are of the same
+>> type, or to a binary procedure where both parameters are of the same
+>> type and one of the parameters is an in out parameter.
+>>
+>> All it says is that semantically, you get the same result, regardless
+>> which actual gets passed in which parameter. (e.g A + B = B + A) for
+>> Integer "+"
+>
+> Um, that's "commutative". :-)
+
+Yes, sorry, my mistake for typing out this email too fast without thinking.
+The intent was to write the formula
+    (A + B) + C = A + (B + C)
+
+And so my description of associative is also incorrect. It is not about being
+able to switch the order of parameters to the function, but rather about being
+free to apply the function across a series of values without concern for how
+the applications are grouped.
+
+> In any case, I don't want these aspects stuff mixed in with the basic
+> parallel loop proposal (which actually is nearing completion, I think
+> -- it just needs wording polishing).
+>
+> Reduction stuff is its own thing, regardless of what context it appears in.
+
+Agreed.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Friday, January 12, 2018  1:17 PM
+
+...
+> And especially, note that "&" is NOT commutative...
+
+Yes, sorry for the confusion of typing the wrong formula for associativity.
+
+But note that while "&" is not commutative, it is associative, and thus is a
+candidate for parallelization. The intent of the proposal is to support both
+commutative and non-commutative reducer functions.
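+
+For example (a sketch using the proposed notation):
+
+   S : constant String :=
+     (for I in 1 .. 5 =>
+        Character'Val (Character'Pos ('0') + I))'Reduce ("&", "");
+   --  Must yield "12345" no matter how the work is chunked; that's
+   --  fine because "&" is associative, even though it isn't commutative.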
+
+****************************************************************
+
